Job Parameters

These parameters control how the crawler behaves and which pages it visits.

urls

array of strings

required

List of starting URLs. The crawler will begin from these.

limit

integer

default: 0

Maximum number of pages to scrape. 0 means no limit.

maxDepth

integer

How many link levels to follow from each starting URL. 1 follows only the links found on the starting pages. 2 also follows the links found on those pages, and so on.
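As a rough illustration of how limit and maxDepth interact, the sketch below walks a small in-memory link graph breadth-first. It is not the crawler's actual implementation: the crawl_order function and the links dictionary are hypothetical, and counting the starting URLs toward limit is an assumption.

```python
from collections import deque


def crawl_order(start_urls, links, max_depth, limit=0):
    """Return URLs in the order a breadth-first crawl would visit them.

    links is a hypothetical dict mapping each URL to the URLs it links to.
    max_depth=1 follows only links found on the starting pages.
    limit=0 means no limit (assumption: starting URLs count toward it).
    """
    starts = list(dict.fromkeys(start_urls))  # drop duplicates, keep order
    seen = set(starts)
    queue = deque((url, 0) for url in starts)
    visited = []
    while queue:
        if limit and len(visited) >= limit:
            break  # page budget exhausted
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the allowed depth
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical site structure used only for this example.
site = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": []}
print(crawl_order(["/"], site, max_depth=1))
print(crawl_order(["/"], site, max_depth=2, limit=3))
```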

includePaths

string

Only follow URLs that match this regex. Example: (blog/.+|about/.+) matches /blog/post-1, /about/company, etc.

excludePaths

string

Skip URLs that match this regex. Example: (admin/.+|private/.+) skips /admin/login, /private/settings, etc.
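To check a pattern before submitting a job, the two example regexes above can be tried locally. This is only a sketch: the should_follow helper is hypothetical, and it assumes the crawler matches the pattern anywhere in the URL (like re.search) and that excludePaths takes precedence over includePaths. Neither behavior is confirmed here.

```python
import re


def should_follow(url, include_paths=None, exclude_paths=None):
    """Approximate the include/exclude decision for a single URL.

    Assumptions: patterns may match anywhere in the URL, and an
    exclude match wins over an include match.
    """
    if exclude_paths and re.search(exclude_paths, url):
        return False
    if include_paths and not re.search(include_paths, url):
        return False
    return True


include = r"(blog/.+|about/.+)"
exclude = r"(admin/.+|private/.+)"

for url in ["/blog/post-1", "/about/company", "/pricing", "/admin/login"]:
    print(url, should_follow(url, include, exclude))
```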

You can also pass any of the Web Scraping API Params, such as outputFormat, extractRules, headers, and proxyType. They apply to every page the crawler visits.
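A job's parameters might be assembled as in the sketch below. The build_job_params helper is hypothetical, the endpoint and authentication are not covered in this section and are left out, and passing headers as a JSON object is an assumption about the Web Scraping API Params.

```python
import json


def build_job_params(urls, limit=0, max_depth=None, include_paths=None,
                     exclude_paths=None, **scraping_params):
    """Assemble crawl job parameters as a plain dict (hypothetical helper)."""
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    params = {"urls": list(urls), "limit": limit}
    # Optional crawl parameters are included only when set.
    if max_depth is not None:
        params["maxDepth"] = max_depth
    if include_paths is not None:
        params["includePaths"] = include_paths
    if exclude_paths is not None:
        params["excludePaths"] = exclude_paths
    # Extra Web Scraping API params apply to every page the crawler visits.
    params.update(scraping_params)
    return params


params = build_job_params(
    ["https://example.com"],
    limit=50,
    max_depth=2,
    include_paths=r"(blog/.+|about/.+)",
    headers={"Accept-Language": "en-US"},  # assumed shape for headers
)
print(json.dumps(params, indent=2))
```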
