The Websites Crawler lets you crawl and extract content from multiple pages of a website by following internal links. You submit one or more starting URLs and define how deep the crawler should go using maxDepth. You can also restrict which paths are followed by passing a regex to includePaths.

This scraper job is asynchronous. You’ll receive a jobId, and results can be fetched via polling or delivered to a webhook.
Example Request
curl --request POST \
  --url 'https://api.hasdata.com/scrapers/crawler/jobs' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <your-api-key>' \
  --data '{
    "urls": [
      "https://example.com"
    ],
    "maxDepth": 3,
    "includePaths": "(blog/.+|articles/.+)",
    "outputFormat": ["text", "json"],
    "webhook": {
      "url": "https://yourdomain.com/webhook",
      "events": ["scraper.job.started", "scraper.job.finished", "scraper.data.scraped"]
    }
  }'
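The response to this request returns the job identifier used everywhere below. A minimal sketch of the response body; the exact shape is an assumption, and the field name jobId is inferred from the prose above and the webhook payload:

{
  "jobId": "dd1a8c53-2d47-4444-977d-8d653a6a3c82"
}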
Use Web Scraping API Params
You can use any parameters from the Web Scraping API inside a Websites Crawler job, including extractRules, aiExtractRules, headers, proxyType / proxyCountry, blockResources, jsScenario, outputFormat, and more. All parameters are applied to each crawled page individually.
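For example, here is a sketch of a crawler job body that extracts the same fields from every crawled page. The extractRules value shown (a name-to-rule mapping with selector and output keys) is an assumption for illustration; the exact schema is defined in the Web Scraping API reference:

{
  "urls": ["https://example.com"],
  "maxDepth": 2,
  "includePaths": "blog/.+",
  "extractRules": {
    "title": { "selector": "h1", "output": "text" }
  },
  "outputFormat": ["json"]
}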
Get Scraper Job Status
To get the status of an existing scraper job, make a GET request to the endpoint /scrapers/jobs/:jobId:
curl --location 'https://api.hasdata.com/scrapers/jobs/:jobId' \
  --header 'x-api-key: <your-api-key>'
{
  "id": "dd1a8c53-2d47-4444-977d-8d653a6a3c82",
  "status": "finished",
  "creditsSpent": 200,
  "dataRowsCount": 20,
  "data": {
    "csv": "https://api.hasdata.com/scrapers/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/b6cc6733-6d0e-4e44-9e94-38688aad3884.csv",
    "json": "https://api.hasdata.com/scrapers/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/9cb592e3-6700-42ff-b58c-e7da3f478f28.json",
    "xlsx": "https://api.hasdata.com/scrapers/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/ecea853c-e0ca-4a23-ae74-eea0588e54b6.xlsx"
  },
  "input": {
    "limit": 25,
    "urls": ["https://hasdata.com", "https://example.com"],
    "maxDepth": 5,
    "includePaths": "(blog/.+|articles/.+)",
    "webhook": {
      "url": "https://example.com/webhook",
      "events": ["scraper.job.started", "scraper.job.finished", "scraper.data.scraped"]
    }
  }
}
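If you did not configure a webhook, poll this endpoint until the job completes. A minimal polling sketch in Python; the in_progress and finished statuses come from the examples on this page, and treating any other status as terminal is an assumption:

import time

import requests

API_KEY = "<your-api-key>"

def wait_for_job(job_id: str, interval: float = 10.0) -> dict:
    # Poll the job status endpoint until the job leaves "in_progress".
    url = f"https://api.hasdata.com/scrapers/jobs/{job_id}"
    while True:
        resp = requests.get(url, headers={"x-api-key": API_KEY})
        resp.raise_for_status()
        job = resp.json()
        if job["status"] != "in_progress":
            return job
        time.sleep(interval)

job = wait_for_job("dd1a8c53-2d47-4444-977d-8d653a6a3c82")
if job["status"] == "finished":
    # "data" holds download links for each requested output format.
    print(job["data"]["json"])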
Webhook
The webhook will notify you of events related to the scraper job. Here is an example webhook payload for the scraper.data.scraped event:
{
  "event": "scraper.data.scraped",
  "timestamp": "2025-04-11T14:30:00Z",
  "jobId": "dd1a8c53-2d47-4444-977d-8d653a6a3c82",
  "jobStatus": "in_progress",
  "data": [
    {
      "text": "Extracted text here...",
      "statusCode": 200,
      "statusText": "OK",
      "url": "https://hasdata.com/blog",
      "depth": 1,
      "title": "Blog | HasData"
    }
  ]
}
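On the receiving side, the webhook URL you registered simply needs to accept these JSON POSTs. A minimal receiver sketch using Flask; the scraper.data.scraped fields follow the example above, while the shape of the other events and any delivery signing or retry behavior are assumptions to verify against the docs:

from flask import Flask, request

app = Flask(__name__)

@app.post("/webhook")
def handle_webhook():
    payload = request.get_json()
    if payload["event"] == "scraper.data.scraped":
        # Pages arrive in batches while the job is still in progress.
        for page in payload["data"]:
            print(page["depth"], page["url"], page["title"])
    elif payload["event"] == "scraper.job.finished":
        # Final event: fetch the result files via the job status endpoint.
        print("Job", payload["jobId"], "finished")
    return "", 200

if __name__ == "__main__":
    app.run(port=8000)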