The Websites Crawler lets you crawl and extract content from multiple pages of a website by following internal links.

You submit one or more starting URLs and control how deep the crawler follows internal links with maxDepth. You can also restrict which paths are followed by passing a regular expression in includeOnlyPaths.

This scraper job is asynchronous. You’ll receive a jobId, and results can be fetched via polling or delivered to a webhook.

Example Request

curl --request POST \
  --url 'https://api.hasdata.com/scrapers/crawler/jobs' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <your-api-key>' \
  --data '{
    "urls": [
      "https://example.com"
    ],
    "maxDepth": 3,
    "includeOnlyPaths": "(blog/.+|articles/.+)",
    "outputFormat": ["text", "json"],
    "webhook": {
      "url": "https://yourdomain.com/webhook",
      "events": ["scraper.job.started", "scraper.job.finished", "scraper.data.rows_added"]
    }
  }'
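
The response returns the job's jobId, which you'll use to check status or to match webhook events. The exact response may carry additional fields; the snippet below is illustrative, reusing the jobId format from the webhook example later in this page:

{
  "jobId": "dd1a8c53-2d47-4444-977d-8d653a6a3c82"
}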

Use Web Scraping API Params

You can use any parameter from the Web Scraping API inside a Websites Crawler job, including:

  • extractRules
  • aiExtractRules
  • headers
  • proxyType / proxyCountry
  • blockResources, jsScenario, outputFormat, and more

All parameters are applied to each crawled page individually.
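
For example, a crawler job that routes every page through a US residential proxy and applies CSS extraction rules might look like the following sketch. The extractRules shape shown here (field name mapped to a CSS selector) and the proxy values are illustrative; see the Web Scraping API reference for the exact schema:

curl --request POST \
  --url 'https://api.hasdata.com/scrapers/crawler/jobs' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <your-api-key>' \
  --data '{
    "urls": ["https://example.com"],
    "maxDepth": 2,
    "proxyType": "residential",
    "proxyCountry": "US",
    "extractRules": {
      "title": "h1",
      "intro": "p.intro"
    }
  }'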

Get Scraper Job Status

To get the status of an existing scraper job, send a GET request to /scrapers/jobs/:jobId:

curl --request GET \
  --url 'https://api.hasdata.com/scrapers/jobs/:jobId' \
  --header 'x-api-key: <your-api-key>'
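
If you're not using a webhook, you can poll this endpoint until the job completes. A minimal bash sketch, assuming the response carries the job state in a status (or jobStatus) field and that a finished job reports "finished"; adjust the field names to the actual response:

#!/bin/bash
# Poll the job status every 5 seconds until it reports "finished".
# Requires curl and jq. The field names (.status / .jobStatus) and the
# terminal value "finished" are assumptions, not confirmed by this page.
JOB_ID="dd1a8c53-2d47-4444-977d-8d653a6a3c82"
API_KEY="<your-api-key>"

while true; do
  STATUS=$(curl --silent "https://api.hasdata.com/scrapers/jobs/$JOB_ID" \
    --header "x-api-key: $API_KEY" | jq -r '.status // .jobStatus')
  echo "Job status: $STATUS"
  [ "$STATUS" = "finished" ] && break
  sleep 5
done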

Webhook

The webhook notifies you of the events you subscribed to in the webhook.events array. Here is an example payload for the scraper.data.rows_added event:

{
  "event": "scraper.data.rows_added",
  "timestamp": "2025-04-11T14:30:00Z",
  "jobId": "dd1a8c53-2d47-4444-977d-8d653a6a3c82",
  "jobStatus": "in_progress",
  "dataRows": [
    {
      "text": "Extracted text here...",
      "statusCode": 200,
      "statusText": "OK",
      "url": "https://hasdata.com/blog",
      "depth": 1,
      "title": "Blog | HasData"
    }
  ],
  "dataRowsCount": 50,
  "creditsSpent": 50
}
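
Before running a real crawl, you can exercise your webhook endpoint by replaying the sample payload above with curl. The URL below is the placeholder endpoint from the example request, and the empty dataRows array simply keeps the test payload small:

curl --request POST \
  --url 'https://yourdomain.com/webhook' \
  --header 'Content-Type: application/json' \
  --data '{
    "event": "scraper.data.rows_added",
    "timestamp": "2025-04-11T14:30:00Z",
    "jobId": "dd1a8c53-2d47-4444-977d-8d653a6a3c82",
    "jobStatus": "in_progress",
    "dataRows": [],
    "dataRowsCount": 0,
    "creditsSpent": 0
  }'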