Overview
Job Parameters
These parameters control how the crawler behaves and which pages it visits.
List of starting URLs. The crawler will begin from these.
Maximum number of pages to scrape. 0 means no limit.
How many link levels to follow from each starting URL. 1 means only direct links. 2 means also follow links from those pages, and so on.
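As a rough illustration of how the page limit and the link depth interact, here is a minimal sketch of a breadth-first crawl over an in-memory link graph. It is not the crawler's actual implementation: the function name `crawl_order`, the `links` dictionary that stands in for real page fetches, and the argument names `max_depth` and `max_pages` are all hypothetical. It also assumes that starting URLs count toward the page limit, which the documentation does not state.

```python
from collections import deque


def crawl_order(start_urls, links, max_depth, max_pages=0):
    """Return pages in the order a breadth-first crawl would visit them.

    links     -- dict mapping a URL to the URLs it links to (hypothetical
                 stand-in for fetching and parsing a page)
    max_depth -- link levels to follow from each starting URL
    max_pages -- cap on visited pages; 0 means no limit
    """
    starts = list(dict.fromkeys(start_urls))  # drop duplicates, keep order
    seen = set(starts)
    queue = deque((url, 0) for url in starts)
    visited = []

    while queue:
        # Assumption: starting URLs count toward the page limit.
        if max_pages and len(visited) >= max_pages:
            break
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # deep enough: visit the page but do not follow its links
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


# Hypothetical site structure used only for this example.
site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/b": ["/a", "/b/1"],
    "/a/1": ["/a/1/x"],
}

print(crawl_order(["/"], site, max_depth=1))
print(crawl_order(["/"], site, max_depth=2))
print(crawl_order(["/"], site, max_depth=2, max_pages=4))
```

With a depth of 1 only the pages linked directly from the starting URL are visited. With a depth of 2 the pages those link to are visited as well, unless the page limit is reached first.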
Only follow URLs that match this regex. Example: (blog/.+|about/.+) matches /blog/post-1, /about/company, etc.
Skip URLs that match this regex. Example: (admin/.+|private/.+) skips /admin/login, /private/settings, etc.
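The two regex filters can be tried out locally before starting a job. The sketch below is only an approximation of the behavior described above: the helper `should_follow` and its `include` and `exclude` arguments are hypothetical names, the patterns are assumed to match anywhere in the URL (as `re.search` does), and the exclude pattern is assumed to win when both match. The crawler itself may differ on either point.

```python
import re


def should_follow(url, include=None, exclude=None):
    """Decide whether a URL passes the include/exclude regex filters.

    Assumptions (not confirmed by the documentation): patterns match
    anywhere in the URL, and exclusion takes precedence over inclusion.
    """
    if exclude and re.search(exclude, url):
        return False
    if include and not re.search(include, url):
        return False
    return True


INCLUDE = r"(blog/.+|about/.+)"
EXCLUDE = r"(admin/.+|private/.+)"

for url in ["/blog/post-1", "/about/company", "/pricing",
            "/admin/login", "/private/settings"]:
    print(url, should_follow(url, include=INCLUDE, exclude=EXCLUDE))
```

Under these assumptions, /blog/post-1 and /about/company pass, /pricing fails the include pattern, and /admin/login and /private/settings are rejected by the exclude pattern.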
You can also use any Web Scraping API Params. They apply to every page the crawler visits, including outputFormat, extractRules, headers, proxyType, and others.