LLM Extraction

Use aiExtractRules to define custom rules for extracting structured data from any web page with a large language model (LLM). This is useful when you don’t want to write CSS selectors by hand and need clean, field-level data in JSON format. Each key you define names a field to extract. For each field you provide a type and, optionally, a description that helps the model understand what data to look for.

Supported Types

string – plain text value
number – numeric value
boolean – true or false
list – an array of values
item – a nested object (with its own structure under output)

You can also use enum to restrict a string to a fixed set of values.
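
The types above can be combined into a rules object programmatically. The following is a minimal sketch in Python. The helper names field, list_of, and item are hypothetical conveniences and not part of the API; only the resulting dictionary shape follows the rule format described here.

```python
import json


def field(type_name, description=None, enum=None):
    """Build a rule for a scalar field (string, number, or boolean)."""
    rule = {"type": type_name}
    if description is not None:
        rule["description"] = description
    if enum is not None:
        rule["enum"] = list(enum)  # restricts a string to fixed values
    return rule


def list_of(output):
    """Build a list rule; output is a type name or a dict of nested rules."""
    return {"type": "list", "output": output}


def item(output):
    """Build a nested-object rule whose structure lives under 'output'."""
    return {"type": "item", "output": output}


rules = {
    "company": field("string", "company name"),
    "clients": list_of("string"),
    "trial": item({
        "available": field("boolean"),
        "type": field("string", enum=["paid", "free"]),
    }),
    "yearFounded": field("number"),
}

# Print the JSON that would go under "aiExtractRules" in the request body.
print(json.dumps(rules, indent=2))
```

Building rules from small helpers keeps the nesting under output consistent for list and item fields.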

Example Request

curl --request POST \
  --url 'https://api.hasdata.com/scrape/web' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <your-api-key>' \
  --data '{
    "url": "https://hasdata.com",
    "aiExtractRules": {
      "company": {
        "description": "company name",
        "type": "string"
      },
      "reviews": {
        "type": "list",
        "output": {
          "review": {
            "description": "review text",
            "type": "string"
          },
          "author": {
            "type": "string"
          }
        }
      },
      "clients": {
        "type": "list",
        "output": "string"
      },
      "trial": {
        "type": "item",
        "output": {
          "available": {
            "type": "boolean"
          },
          "type": {
            "type": "string",
            "enum": ["paid", "free"]
          }
        }
      },
      "yearFounded": {
        "type": "number"
      }
    }
  }'
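
The same request can be sketched in Python using only the standard library. The function names build_request and scrape, and the 60-second timeout, are assumptions made for illustration. The endpoint, headers, and body shape mirror the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.hasdata.com/scrape/web"  # endpoint from the curl example


def build_request(api_key, page_url, rules):
    """Prepare the POST request without sending it."""
    body = json.dumps({"url": page_url, "aiExtractRules": rules}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def scrape(api_key, page_url, rules, timeout=60):
    """Send the request and return the decoded JSON response (network I/O)."""
    request = build_request(api_key, page_url, rules)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


rules = {
    "company": {"description": "company name", "type": "string"},
    "clients": {"type": "list", "output": "string"},
    "yearFounded": {"type": "number"},
}

# Building the request is side-effect free; call scrape() to actually send it.
request = build_request("<your-api-key>", "https://hasdata.com", rules)
print(request.get_method(), request.full_url)
```

Separating request construction from sending makes it possible to inspect the payload before spending an API call.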

Example Response

{
  "requestMetadata": {
    "id": "784b9b3a-8426-431c-a516-beec621183a0",
    "status": "ok"
  },
  "content": "<!DOCTYPE html><html lang=\"en\"><head>...</body></html>",
  "aiResponse": {
    "company": "HasData",
    "reviews": [
      {
        "review": "Roman from HasData went above and beyond to help us with our scraping needs...",
        "author": "Michael Bonacina"
      },
      {
        "review": "I found HasData, which is one of the best scraping services I have ever used...",
        "author": "Hussein Ali"
      }
    ],
    "clients": [
      "Stanford",
      "Salesforce",
      "Samsung",
      "Nvidia",
      "Mailchimp",
      "Harvard",
      "Copyleaks",
      "LosAngelesTimes",
      "SurveySparrow"
    ],
    "trial": {
      "available": true,
      "type": "free"
    },
    "yearFounded": null
  }
}
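
Reading the extracted fields out of such a response might look like the sketch below. It parses a trimmed copy of the example response (the content HTML and the reviews are left out for brevity). The helper missing_fields is a hypothetical name introduced here to show how null values surface in Python.

```python
import json

# Trimmed copy of the example response above.
raw = """
{
  "requestMetadata": {
    "id": "784b9b3a-8426-431c-a516-beec621183a0",
    "status": "ok"
  },
  "aiResponse": {
    "company": "HasData",
    "clients": ["Stanford", "Salesforce", "Samsung"],
    "trial": {"available": true, "type": "free"},
    "yearFounded": null
  }
}
"""


def missing_fields(ai_response):
    """Return names of top-level fields the model could not fill (JSON null)."""
    return [name for name, value in ai_response.items() if value is None]


data = json.loads(raw)
ai = data["aiResponse"]

print(ai["company"])
print(ai["trial"]["available"])
print(missing_fields(ai))
```

JSON null becomes None in Python, so a plain `is None` check is enough to separate unmatched fields from legitimate falsy values such as false or 0.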

Notes

Descriptions are optional but highly recommended for accuracy.
list fields can output flat values ("output": "string") or objects ("output": { ... }).
Fields with no match are returned as null, as yearFounded is in the example response.
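
A client-side check of a rules object before sending it could be sketched as follows. This is not the validation the API itself performs. The function validate_rules is hypothetical, and it assumes that a flat list output may name any scalar type, whereas the example only shows "string".

```python
SCALAR_TYPES = {"string", "number", "boolean"}


def validate_rules(rules, path=""):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, rule in rules.items():
        where = f"{path}{name}"
        if not isinstance(rule, dict) or "type" not in rule:
            problems.append(f"{where}: missing 'type'")
            continue
        kind = rule["type"]
        output = rule.get("output")
        if kind in SCALAR_TYPES:
            # enum is documented for strings only
            if "enum" in rule and kind != "string":
                problems.append(f"{where}: 'enum' applies to string fields")
        elif kind == "list":
            if isinstance(output, dict):
                problems += validate_rules(output, where + ".")
            elif not isinstance(output, str) or output not in SCALAR_TYPES:
                # Assumption: flat lists may use any scalar type name.
                problems.append(f"{where}: list 'output' must be a type name or an object")
        elif kind == "item":
            if isinstance(output, dict):
                problems += validate_rules(output, where + ".")
            else:
                problems.append(f"{where}: item 'output' must be an object")
        else:
            problems.append(f"{where}: unsupported type {kind!r}")
    return problems


rules = {
    "clients": {"type": "list", "output": "string"},
    "reviews": {"type": "list", "output": {"author": {"type": "string"}}},
    "founded": {"type": "date"},
}
print(validate_rules(rules))
```

Catching a malformed rule locally avoids spending a request on a payload that cannot produce the intended structure.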
