Use aiExtractRules to define custom rules for extracting structured data from any web page using large language models (LLMs). This is ideal when you don’t want to write manual CSS selectors and need clean, field-level data in JSON format.
Each key you define represents a field you want to extract. You provide a type and (optionally) a description to help the model understand what data to look for.
Supported Types
- string – plain text value
- number – numeric value
- boolean – true or false
- list – an array of values
- item – a nested object (with its own structure under output)
You can also use enum to restrict a string to a fixed set of values.
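The type rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: validate_rules, SUPPORTED_TYPES, and FLAT_TYPES are hypothetical names, and the sketch assumes that flat list outputs are limited to the scalar types and that enum applies only to string fields.

```python
SUPPORTED_TYPES = {"string", "number", "boolean", "list", "item"}
# Assumption: a flat list output names one of the scalar types.
FLAT_TYPES = {"string", "number", "boolean"}


def validate_rules(rules, path=""):
    """Return a list of human-readable problems found in an aiExtractRules dict."""
    problems = []
    for name, rule in rules.items():
        where = f"{path}{name}"
        if not isinstance(rule, dict):
            problems.append(f"{where}: rule must be an object")
            continue
        rule_type = rule.get("type")
        if not isinstance(rule_type, str) or rule_type not in SUPPORTED_TYPES:
            problems.append(f"{where}: unsupported type {rule_type!r}")
            continue
        if "enum" in rule and rule_type != "string":
            problems.append(f"{where}: enum is only documented for string fields")
        output = rule.get("output")
        if rule_type == "item":
            # A nested object describes its own fields under "output".
            if isinstance(output, dict):
                problems.extend(validate_rules(output, f"{where}."))
            else:
                problems.append(f"{where}: item needs an object under output")
        elif rule_type == "list":
            # A list holds either objects (dict) or flat values (type name).
            if isinstance(output, dict):
                problems.extend(validate_rules(output, f"{where}."))
            elif not (isinstance(output, str) and output in FLAT_TYPES):
                problems.append(
                    f"{where}: list output must be a flat type name or an object"
                )
    return problems


rules = {
    "company": {"description": "company name", "type": "string"},
    "clients": {"type": "list", "output": "string"},
    "trial": {
        "type": "item",
        "output": {
            "available": {"type": "boolean"},
            "type": {"type": "string", "enum": ["paid", "free"]},
        },
    },
}
bad_rules = {"price": {"type": "float"}, "tags": {"type": "list"}}

print(validate_rules(rules))
print(validate_rules(bad_rules))
```

The API may accept definitions that this check rejects, so treat the result as an early warning rather than a guarantee.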
Example Request
curl --request POST \
  --url 'https://api.hasdata.com/scrape/web' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <your-api-key>' \
  --data '{
    "url": "https://hasdata.com",
    "aiExtractRules": {
      "company": {
        "description": "company name",
        "type": "string"
      },
      "reviews": {
        "type": "list",
        "output": {
          "review": {
            "description": "review text",
            "type": "string"
          },
          "author": {
            "type": "string"
          }
        }
      },
      "clients": {
        "type": "list",
        "output": "string"
      },
      "trial": {
        "type": "item",
        "output": {
          "available": {
            "type": "boolean"
          },
          "type": {
            "type": "string",
            "enum": ["paid", "free"]
          }
        }
      },
      "yearFounded": {
        "type": "number"
      }
    }
  }'
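The same request can be assembled from Python. This is a sketch using only the standard library: the endpoint and header names are copied from the curl example, while build_request, fetch_ai_response, and the 60-second timeout are illustrative assumptions rather than an official client.

```python
import json
import urllib.request

API_URL = "https://api.hasdata.com/scrape/web"  # endpoint from the curl example


def build_request(api_key, page_url, rules):
    """Build (but do not send) the POST request carrying aiExtractRules."""
    payload = {"url": page_url, "aiExtractRules": rules}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def fetch_ai_response(request, timeout=60):
    """Send the request and return the aiResponse object from the JSON body.

    The timeout value is an assumption; adjust it to your needs.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body.get("aiResponse")


request = build_request(
    "<your-api-key>",
    "https://hasdata.com",
    {
        "company": {"description": "company name", "type": "string"},
        "yearFounded": {"type": "number"},
    },
)
# ai_response = fetch_ai_response(request)  # uncomment to make the network call
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network traffic occurs.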
Example Response
{
  "requestMetadata": {
    "id": "784b9b3a-8426-431c-a516-beec621183a0",
    "status": "ok"
  },
  "content": "<!DOCTYPE html><html lang=\"en\"><head>...</body></html>",
  "aiResponse": {
    "company": "HasData",
    "reviews": [
      {
        "review": "Roman from HasData went above and beyond to help us with our scraping needs...",
        "author": "Michael Bonacina"
      },
      {
        "review": "I found HasData, which is one of the best scraping services I have ever used...",
        "author": "Hussein Ali"
      }
    ],
    "clients": [
      "Stanford",
      "Salesforce",
      "Samsung",
      "Nvidia",
      "Mailchimp",
      "Harvard",
      "Copyleaks",
      "LosAngelesTimes",
      "SurveySparrow"
    ],
    "trial": {
      "available": true,
      "type": "free"
    },
    "yearFounded": null
  }
}
Notes
- Descriptions are optional but highly recommended for accuracy.
- list fields can output flat values ("output": "string") or objects ("output": { ... }).
- Fields with no match will return null.
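Because unmatched fields come back as null, it can help to list them before using the result. The sketch below walks a parsed aiResponse and reports the path of every null value. missing_fields is a hypothetical helper, the sample data is a trimmed copy of the example response, and the assumption that nested fields can also be null is not confirmed by this page.

```python
def missing_fields(value, path=""):
    """Return the paths of all fields whose value is null (None in Python)."""
    if value is None:
        return [path]
    if isinstance(value, dict):
        found = []
        for key, child in value.items():
            found.extend(missing_fields(child, f"{path}.{key}" if path else key))
        return found
    if isinstance(value, list):
        found = []
        for index, child in enumerate(value):
            found.extend(missing_fields(child, f"{path}[{index}]"))
        return found
    return []  # a plain value that was extracted successfully


# Trimmed copy of the aiResponse object from the example response.
ai_response = {
    "company": "HasData",
    "clients": ["Stanford", "Salesforce"],
    "trial": {"available": True, "type": "free"},
    "yearFounded": None,
}

print(missing_fields(ai_response))  # prints ['yearFounded']
```

A field that shows up here is a natural candidate for a more specific description in its rule.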