Quickstart
The Websites Crawler lets you crawl and extract content from multiple pages of a website by following internal links.
You submit one or more starting URLs and define how deep the crawler should go using `maxDepth`. You can also limit which paths are followed using a regex with `includeOnlyPaths`.
Crawler jobs are asynchronous: you'll receive a `jobId`, and results can be fetched via polling or delivered to a webhook.
Example Request
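A minimal sketch of submitting a crawler job in Python. The base URL, authentication header, submission path (`POST /v1/scrapers/jobs`, inferred from the status endpoint below), and the field name for the starting URLs are assumptions to verify against the API reference.

```python
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.example.com"  # placeholder; use your real API base URL

# Assumed submission endpoint and auth scheme; verify both in the API reference.
resp = requests.post(
    f"{BASE_URL}/v1/scrapers/jobs",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "urls": ["https://example.com"],  # one or more starting URLs (field name assumed)
        "maxDepth": 2,                    # follow internal links up to two levels deep
        "includeOnlyPaths": ["^/blog/"],  # regex: only follow /blog/ paths
    },
)
resp.raise_for_status()
job_id = resp.json()["jobId"]  # keep this to poll status or match webhook events
print(job_id)
```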
Use Web Scraping API Params
You can use any parameters from the Web Scraping API inside a Websites Crawler job, including `extractRules`, `aiExtractRules`, `headers`, `proxyType`/`proxyCountry`, `blockResources`, `jsScenario`, `outputFormat`, and more.
All parameters are applied to each crawled page individually.
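For illustration, a hypothetical job payload combining crawler-level settings with per-page Web Scraping API parameters. The parameter names come from the list above, but the request shape, rule syntax, and accepted values are assumptions.

```python
import json

# Hypothetical crawler job payload: crawler-level settings plus per-page
# Web Scraping API parameters (exact request shape and rule syntax assumed).
job = {
    "urls": ["https://example.com"],
    "maxDepth": 1,
    # Applied to each crawled page individually:
    "extractRules": {"title": "h1"},  # extract the first h1 as "title" (syntax assumed)
    "blockResources": True,           # skip images/CSS for faster page loads
    "outputFormat": "markdown",       # assumed value; check the supported formats
}
print(json.dumps(job, indent=2))
```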
Get Scraper Job Status
To get the status of an existing scraper job, make a GET request to the endpoint `/v1/scrapers/jobs/:jobId`:
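A polling sketch in Python. The endpoint path comes from the text above; the auth header, the `status` field, and its terminal values are assumptions to check against the actual response schema.

```python
import time

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.example.com"  # placeholder; use your real API base URL

def wait_for_job(job_id: str, interval: float = 5.0) -> dict:
    """Poll GET /v1/scrapers/jobs/:jobId until the job reaches a terminal state."""
    while True:
        resp = requests.get(
            f"{BASE_URL}/v1/scrapers/jobs/{job_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
        )
        resp.raise_for_status()
        job = resp.json()
        # The "status" field and its values are assumed; check the response schema.
        if job.get("status") in ("done", "failed"):
            return job
        time.sleep(interval)
```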
Webhook
The webhook notifies you of events related to the scraper job, for example the `scraper.data.rows_added` event, which is emitted when new result rows become available.
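As a sketch of the receiving side, a minimal Flask handler for that event. The payload field names (`event`, `jobId`) are assumptions; take the real schema from the example payload in the API reference.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/crawler", methods=["POST"])
def crawler_webhook():
    payload = request.get_json(force=True)
    # Field names "event" and "jobId" are assumed; verify against the
    # actual webhook payload schema.
    if payload.get("event") == "scraper.data.rows_added":
        print(f"New rows added for job {payload.get('jobId')}")
    return "", 200

if __name__ == "__main__":
    app.run(port=8000)
```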