The Websites Crawler lets you crawl and extract content from multiple pages of a website by following internal links.
You submit one or more starting URLs and set how deep the crawler goes with maxDepth. You can also restrict which paths are followed by passing a regular expression in includePaths.
Crawler jobs are asynchronous: creating a job returns a jobId, and you can then poll the job status endpoint for results or have them delivered to a webhook.
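To check an includePaths pattern locally before submitting a job, the sketch below tests candidate URLs against the regex from the example request. It assumes the pattern is searched (unanchored) against the URL path rather than the full URL; the service may apply it differently, and would_follow is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse

# The includePaths pattern used in the example request.
INCLUDE_PATHS = r"(blog/.+|articles/.+)"


def would_follow(url: str, pattern: str = INCLUDE_PATHS) -> bool:
    """Local approximation of the includePaths filter.

    Assumption: the pattern is searched (not fully matched) against the
    URL path. The real crawler may match it another way.
    """
    path = urlparse(url).path
    return re.search(pattern, path) is not None


for candidate in [
    "https://example.com/blog/first-post",
    "https://example.com/articles/2024/launch",
    "https://example.com/about",
    "https://example.com/blog/",
]:
    print(candidate, would_follow(candidate))
```

Under that assumption, https://example.com/blog/ itself would not be followed, because .+ needs at least one character after the slash.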
Example Request
curl --request POST \
--url 'https://api.hasdata.com/scrapers/crawler/jobs' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <your-api-key>' \
--data '{"urls":["https://example.com"],"maxDepth":3,"includePaths":"(blog/.+|articles/.+)","outputFormat":["text","json"],"webhook":{"url":"https://yourdomain.com/webhook","events":["scraper.job.started","scraper.job.finished","scraper.data.scraped"]}}'
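The same request in Python, as a minimal sketch using only the standard library (urllib.request). It builds the request without sending it; build_crawl_request is a hypothetical helper name, and because the shape of the job-creation response is not shown here, the commented-out call only decodes it.

```python
import json
import urllib.request

API_URL = "https://api.hasdata.com/scrapers/crawler/jobs"


def build_crawl_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the cURL example."""
    payload = {
        "urls": ["https://example.com"],
        "maxDepth": 3,
        "includePaths": "(blog/.+|articles/.+)",
        "outputFormat": ["text", "json"],
        "webhook": {
            "url": "https://yourdomain.com/webhook",
            "events": [
                "scraper.job.started",
                "scraper.job.finished",
                "scraper.data.scraped",
            ],
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


req = build_crawl_request("<your-api-key>")
# To actually submit the job (the response includes the job's ID):
# with urllib.request.urlopen(req) as resp:
#     job = json.load(resp)
```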
Use Web Scraping API Params
You can pass any Web Scraping API parameter in a Websites Crawler job, including:
extractRules
aiExtractRules
headers
proxyType / proxyCountry
blockResources, jsScenario, outputFormat, and more
All parameters are applied to each crawled page individually.
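A sketch of a job body that combines crawler settings with per-page Web Scraping API parameters. It assumes those parameters sit at the top level of the body, next to urls and maxDepth, as outputFormat does in the cURL example. The extractRules, headers, proxyType and proxyCountry values are illustrative placeholders, not documented formats, and build_job_body is a hypothetical helper.

```python
import json

# Crawler-specific settings (from the example request).
crawler_params = {
    "urls": ["https://example.com"],
    "maxDepth": 3,
    "includePaths": "(blog/.+|articles/.+)",
}

# Web Scraping API parameters, applied to every crawled page.
# The values are assumptions; check the Web Scraping API reference.
page_params = {
    "extractRules": {"title": "h1"},  # hypothetical selector rule
    "headers": {"Accept-Language": "en-US"},
    "proxyType": "residential",  # assumed value
    "proxyCountry": "US",  # assumed value
    "outputFormat": ["text", "json"],
}


def build_job_body(crawler: dict, per_page: dict) -> dict:
    """Merge crawler and per-page parameters into one flat request body.

    Assumption: scraping parameters live at the top level of the job body.
    Overlapping keys are rejected so neither set silently overwrites the other.
    """
    overlap = crawler.keys() & per_page.keys()
    if overlap:
        raise ValueError(f"conflicting keys: {sorted(overlap)}")
    return {**crawler, **per_page}


body = build_job_body(crawler_params, page_params)
print(json.dumps(body, indent=2))
```

Rejecting overlapping keys is a design choice for the sketch: it keeps a per-page setting from quietly replacing a crawler setting when the two dictionaries are maintained separately.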
Get Scraper Job Status
To get the status of an existing scraper job, send a GET request to /scrapers/jobs/:jobId, replacing :jobId with the ID returned when the job was created:
curl --request GET \
--url 'https://api.hasdata.com/scrapers/jobs/:jobId' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <your-api-key>'
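A polling sketch in Python for the status endpoint: fetch_status performs the GET shown above, and wait_for_job repeats a status call until the job reaches a terminal status. Only "finished" appears in these docs, so TERMINAL_STATUSES is an assumption; if the service reports a failure status, add it there, otherwise the loop waits until the timeout. The 10-second interval and one-hour timeout are arbitrary defaults, not documented limits.

```python
import json
import time
import urllib.request
from typing import Callable

STATUS_URL = "https://api.hasdata.com/scrapers/jobs/{job_id}"

# Assumption: "finished" is the only terminal status shown in these docs;
# other terminal values (for example a failure state) may exist.
TERMINAL_STATUSES = {"finished"}


def fetch_status(job_id: str, api_key: str) -> dict:
    """GET /scrapers/jobs/:jobId and decode the JSON body."""
    req = urllib.request.Request(
        STATUS_URL.format(job_id=job_id),
        headers={"x-api-key": api_key},
        method="GET",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(
    get_status: Callable[[], dict],
    interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        sleep(interval)


# Usage (makes real HTTP calls):
# job = wait_for_job(lambda: fetch_status("<job-id>", "<your-api-key>"))
```

The status function and sleep are passed in so the loop can be exercised without network calls.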
Example response:
{
  "id": "dd1a8c53-2d47-4444-977d-8d653a6a3c82",
  "status": "finished",
  "creditsSpent": 200,
  "dataRowsCount": 20,
  "data": {
    "csv": "https://api.hasdata.com/scrapers/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/b6cc6733-6d0e-4e44-9e94-38688aad3884.csv",
    "json": "https://api.hasdata.com/scrapers/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/9cb592e3-6700-42ff-b58c-e7da3f478f28.json",
    "xlsx": "https://api.hasdata.com/scrapers/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/ecea853c-e0ca-4a23-ae74-eea0588e54b6.xlsx"
  },
  "input": {
    "limit": 25,
    "urls": ["https://hasdata.com", "https://example.com"],
    "maxDepth": 5,
    "includePaths": "(blog/.+|articles/.+)",
    "webhook": {
      "url": "https://example.com/webhook",
      "events": ["scraper.job.started", "scraper.job.finished", "scraper.data.scraped"]
    }
  }
}
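Once status is finished, data holds one download link per export format (csv, json and xlsx in this example). A small helper to pick one, assuming those keys; result_url is a hypothetical name, and this section does not say whether downloading a link requires the x-api-key header.

```python
import json

# Trimmed copy of the example status response.
example = json.loads("""
{
  "id": "dd1a8c53-2d47-4444-977d-8d653a6a3c82",
  "status": "finished",
  "creditsSpent": 200,
  "dataRowsCount": 20,
  "data": {
    "csv": "https://api.hasdata.com/scrapers/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/b6cc6733-6d0e-4e44-9e94-38688aad3884.csv",
    "json": "https://api.hasdata.com/scrapers/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/9cb592e3-6700-42ff-b58c-e7da3f478f28.json",
    "xlsx": "https://api.hasdata.com/scrapers/jobs/dd1a8c53-2d47-4444-977d-8d653a6a3c82/results/ecea853c-e0ca-4a23-ae74-eea0588e54b6.xlsx"
  }
}
""")


def result_url(job: dict, fmt: str = "json") -> str:
    """Return the download link for one export format of a finished job."""
    if job.get("status") != "finished":
        raise RuntimeError(f"job {job.get('id')} is {job.get('status')!r}, not finished")
    links = job.get("data") or {}
    if fmt not in links:
        raise KeyError(f"no {fmt!r} export; available: {sorted(links)}")
    return links[fmt]


print(result_url(example, "csv"))
print(example["dataRowsCount"], "rows,", example["creditsSpent"], "credits")
```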
Webhook
The webhook notifies you of the job events listed in webhook.events. Here is an example payload for the scraper.data.scraped event:
{
  "event": "scraper.data.scraped",
  "timestamp": "2025-04-11T14:30:00Z",
  "jobId": "dd1a8c53-2d47-4444-977d-8d653a6a3c82",
  "jobStatus": "in_progress",
  "data": [
    {
      "text": "Extracted text here...",
      "statusCode": 200,
      "statusText": "OK",
      "url": "https://hasdata.com/blog",
      "depth": 1,
      "title": "Blog | HasData"
    }
  ]
}
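A framework-neutral handler sketch: pass it the raw request body from whatever HTTP server receives the webhook. Field names follow the example payload. The payloads of scraper.job.started and scraper.job.finished are not shown here, so reading jobId from them is an assumption, and skipping non-200 pages is a choice made for the example, not API behavior. handle_webhook is a hypothetical name.

```python
import json
from typing import Dict, List


def handle_webhook(raw_body: bytes) -> List[Dict]:
    """Route a crawler webhook delivery by its event name.

    Returns the pages carried by scraper.data.scraped events and an empty
    list for other events. Field names follow the example payload.
    """
    event = json.loads(raw_body)
    name = event.get("event")
    if name == "scraper.data.scraped":
        pages = []
        for page in event.get("data", []):
            # Skip pages that did not return HTTP 200 (a policy choice, not an API rule).
            if page.get("statusCode") != 200:
                continue
            pages.append({
                "url": page.get("url"),
                "depth": page.get("depth"),
                "title": page.get("title"),
                "text": page.get("text"),
            })
        return pages
    if name == "scraper.job.finished":
        # Assumption: this payload also carries jobId; a natural point to
        # fetch the full results from the job status endpoint.
        print(f"job {event.get('jobId')} finished")
    return []
```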