https://api.hasdata.com/scrape/web
with a JSON body using the fields below.
Basic Configuration
The URL of the page to scrape. Must be a valid absolute URI (e.g. https://example.com).
Proxy Settings
Type of proxy to use. Options: datacenter, residential. Required if you’re targeting geo-restricted or bot-protected content.
ISO 3166-1 alpha-2 country code for proxy location (e.g. US, DE, IN).
Data Extraction
CSS selectors for field-level extraction. Example: { "title": "h1", "link": "a @href" }.
Structured AI rules for LLM-based extraction. Supports types: string, number, boolean, list, item. To learn more, see LLM Extraction.
Capture a screenshot of the page.
Extract all email addresses found in the page content.
Extract all hyperlinks (<a href="...">) from the page.
Timing
Delay (in milliseconds) after page load before scraping. Max: 30000.
CSS selector to wait for before scraping begins. Example: .product-listing
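The two timing options can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the field names `wait` and `waitFor` are placeholders assumed for illustration (the parameter names are not given in this text), and only the 30000 ms ceiling comes from the description above.

```python
MAX_DELAY_MS = 30000  # documented upper bound for the post-load delay


def timing_options(delay_ms=None, wait_selector=None):
    """Build the timing part of a request body, rejecting invalid values.

    The keys "wait" and "waitFor" are hypothetical placeholders; check the
    API reference for the real field names.
    """
    options = {}
    if delay_ms is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(delay_ms, bool) or not isinstance(delay_ms, int):
            raise TypeError("delay_ms must be an integer number of milliseconds")
        if not 0 <= delay_ms <= MAX_DELAY_MS:
            raise ValueError(f"delay_ms must be between 0 and {MAX_DELAY_MS}")
        options["wait"] = delay_ms  # hypothetical field name
    if wait_selector is not None:
        if not wait_selector.strip():
            raise ValueError("wait_selector must be a non-empty CSS selector")
        options["waitFor"] = wait_selector  # hypothetical field name
    return options
```

For example, `timing_options(2000, ".product-listing")` returns a dict with both entries, while `timing_options(30001)` raises `ValueError` locally instead of spending a request on a value the service would presumably reject.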
Resource Control
Block loading of images and stylesheets.
Block common ad scripts and tracking pixels.
Block any network requests containing these substrings or domains. Example: ["googleanalytics", "doubleclick"]
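To get a feel for which requests a block list would catch, the matching can be approximated locally. This sketch assumes plain, case-sensitive substring containment against the full request URL; the service's actual matching rules are not specified here, and `is_blocked` is a hypothetical helper, not an API function.

```python
def is_blocked(request_url, block_patterns):
    """Return True if any pattern occurs as a substring of the request URL.

    Assumption: matching is plain substring containment. The real service
    may normalize URLs or match differently.
    """
    return any(pattern in request_url for pattern in block_patterns)


patterns = ["googleanalytics", "doubleclick"]

# Matches: "doubleclick" appears in the host name.
blocked_ad = is_blocked("https://ad.doubleclick.net/pixel", patterns)

# Does not match under this assumption: the host contains "google-analytics"
# (with a hyphen), which is not the same substring as "googleanalytics".
blocked_ga = is_blocked("https://www.google-analytics.com/analytics.js", patterns)
```

Under this assumption, a pattern has to appear in the URL exactly as written, so it is worth checking patterns against the real request URLs seen in the browser's network panel.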
JavaScript Options
Enable JavaScript rendering (required for SPAs or dynamic content).
List of JavaScript actions to run on the page (click, scroll, wait, evaluate, etc.). To learn more, see Page Interactions.
Advanced Settings
Custom headers to include in the request. Example: { "User-Agent": "custom-agent" }. To learn more, see Custom Headers and Cookies.
Response format(s). Options: html, text, markdown, json. Multiple formats allowed.
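Putting several of the fields together, a request could be assembled as sketched below. Only the endpoint URL and the allowed option values come from this page. The JSON field names (`url`, `proxyType`, `proxyCountry`, `outputFormat`), the `x-api-key` header, and the use of POST are assumptions made for illustration and should be checked against the API reference. The function only builds the request object and does not send it.

```python
import json
import urllib.request
from urllib.parse import urlparse

ENDPOINT = "https://api.hasdata.com/scrape/web"
PROXY_TYPES = {"datacenter", "residential"}
OUTPUT_FORMATS = {"html", "text", "markdown", "json"}


def build_request(api_key, url, proxy_type=None, proxy_country=None,
                  formats=("html",)):
    """Validate the documented constraints and build (not send) a request."""
    parsed = urlparse(url)
    if not (parsed.scheme and parsed.netloc):
        raise ValueError("url must be an absolute URI, e.g. https://example.com")

    unknown = set(formats) - OUTPUT_FORMATS
    if unknown:
        raise ValueError(f"unsupported output formats: {sorted(unknown)}")

    # All keys below are hypothetical placeholders for the real field names.
    body = {"url": url, "outputFormat": list(formats)}

    if proxy_type is not None:
        if proxy_type not in PROXY_TYPES:
            raise ValueError("proxy_type must be 'datacenter' or 'residential'")
        body["proxyType"] = proxy_type

    if proxy_country is not None:
        # ISO 3166-1 alpha-2: two uppercase ASCII letters (shape check only).
        ok = (len(proxy_country) == 2 and proxy_country.isascii()
              and proxy_country.isalpha() and proxy_country.isupper())
        if not ok:
            raise ValueError("proxy_country must look like 'US', 'DE', 'IN'")
        body["proxyCountry"] = proxy_country

    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumed authentication header
        },
        method="POST",  # assumed, since the endpoint takes a JSON body
    )
```

A call such as `build_request("YOUR_API_KEY", "https://example.com", proxy_type="residential", proxy_country="US", formats=["html", "markdown"])` returns a `urllib.request.Request` that could then be passed to `urllib.request.urlopen`. Validating locally first keeps malformed URLs, unknown proxy types, and unsupported formats from reaching the service.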