You can check the status of a scraper job and fetch results manually using the job ID. This is useful if you’re not using webhooks or need to monitor job progress in your system.

Check Job Status

To check whether a job is still running or finished:

curl --request GET \
  --url 'https://api.hasdata.com/scrapers/crawler/jobs/<jobId>' \
  --header 'x-api-key: <your-api-key>'

Example Response

{
  "id": "dd1a8c53-2d47-4444-977d-8d653a6a3c82",
  "status": "in_progress",
  "data": {
    "csv": "https://api.hasdata.com/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/b6cc6733-6d0e-4e44-9e94-38688aad3884.csv",
    "json": "https://api.hasdata.com/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/9cb592e3-6700-42ff-b58c-e7da3f478f28.json",
    "xlsx": "https://api.hasdata.com/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/ecea853c-e0ca-4a23-ae74-eea0588e54b6.xlsx"
  },
  "input": {
    "limit": 25,
    "urls": ["https://hasdata.com", "https://example.com"],
    "maxDepth": 5,
    "includeOnlyPaths": "(blog/.+|articles/.+)",
    "webhook": {
      "url": "https://example.com/webhook",
      "events": ["scraper.job.started", "scraper.job.finished", "scraper.data.rows_added"]
    }
  },
  "dataRows": 20,
  "creditsSpent": 200
}
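
If you handle this response in code, a small helper can reduce it to the fields you usually act on. The following is a minimal Python sketch rather than an official client: the function name summarize_job is hypothetical, and it assumes the data object may be missing until export files exist.

```python
import json


def summarize_job(payload: str) -> dict:
    """Pull the commonly used fields out of a job-status response body."""
    job = json.loads(payload)
    return {
        "id": job["id"],
        "status": job["status"],
        # Counters default to 0 if the response omits them (an assumption).
        "rows": job.get("dataRows", 0),
        "credits": job.get("creditsSpent", 0),
        # "data" maps a file format (csv, json, xlsx) to a download URL.
        # Assumed here to be absent until result files exist.
        "downloads": job.get("data", {}),
    }
```

For the example response above, summarize_job(body)["downloads"]["csv"] would give the CSV download link.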

Job Statuses

  • pending — Waiting to be processed
  • in_progress — Currently running
  • finished — Completed
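
Without webhooks, you would typically poll the status endpoint until the job reports finished. The sketch below shows one way to do that with only the Python standard library. The names fetch_status and wait_until_finished, the 10-second interval, and the limit of 60 checks are illustrative assumptions, not part of the API; the URL and header are copied from the curl example above.

```python
import json
import time
import urllib.request
from typing import Callable

# Endpoint taken from the curl example in this section.
BASE_URL = "https://api.hasdata.com/scrapers/crawler/jobs"


def fetch_status(job_id: str, api_key: str) -> dict:
    """GET the job-status endpoint and decode the JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/{job_id}", headers={"x-api-key": api_key}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_until_finished(
    get_status: Callable[[], dict],
    interval: float = 10.0,  # assumed polling interval, in seconds
    max_checks: int = 60,    # assumed upper bound on status checks
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_status until the job reports "finished" or checks run out."""
    for attempt in range(max_checks):
        job = get_status()
        if job["status"] == "finished":
            return job
        # "pending" and "in_progress" both mean: wait and check again.
        if attempt < max_checks - 1:
            sleep(interval)
    raise TimeoutError(f"job not finished after {max_checks} checks")
```

A call might look like wait_until_finished(lambda: fetch_status("<jobId>", "<your-api-key>")). Passing the status getter and the sleep function as arguments keeps the loop easy to test without network access.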

Fetch Results

Once the job status is finished, you can retrieve results:

curl --request GET \
  --url 'https://api.hasdata.com/v1/scrapers/crawler/jobs/<jobId>/results?page=1&limit=100' \
  --header 'x-api-key: <your-api-key>'

Example Response

{
  "page": 1,
  "limit": 100,
  "total": 50,
  "results": [
    {
      "url": "https://example.com/page1",
      "statusCode": 200,
      "text": "Extracted content...",
      "title": "Example Page",
      "depth": 1
    },
    {
      "url": "https://example.com/page2",
      "statusCode": 404,
      "error": "Page not found",
      "depth": 1
    }
  ]
}

The maximum value of limit is 100 per request. To retrieve more results than that, request further pages with the page parameter.
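
Because each request returns at most 100 results, collecting a larger result set means walking through the pages. The following Python sketch illustrates one approach. It assumes that pages are numbered from 1 and that total is the number of results across all pages, which the example response suggests but does not state; fetch_page and iter_results are hypothetical helper names, and the URL is copied from the curl example above.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Endpoint taken from the curl example in this section.
RESULTS_URL = "https://api.hasdata.com/v1/scrapers/crawler/jobs/{job_id}/results"


def fetch_page(job_id: str, api_key: str, page: int, limit: int = 100) -> dict:
    """GET one page of results and decode the JSON body."""
    query = urllib.parse.urlencode({"page": page, "limit": limit})
    request = urllib.request.Request(
        RESULTS_URL.format(job_id=job_id) + "?" + query,
        headers={"x-api-key": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_results(get_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every result row, requesting pages until `total` rows are seen."""
    page, seen = 1, 0
    while True:
        body = get_page(page)
        rows = body.get("results", [])
        yield from rows
        seen += len(rows)
        # Stop on an empty page as well, so a mismatched total cannot loop forever.
        if not rows or seen >= body.get("total", 0):
            return
        page += 1
```

For example, iter_results(lambda page: fetch_page("<jobId>", "<your-api-key>", page)) yields each result in turn. Rows that carry an error field, like the 404 entry in the example response, can then be filtered out by the caller.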