Use aiExtractRules to define custom rules for extracting structured data from a web page with a large language model (LLM). It is a good fit when you don’t want to write CSS selectors by hand and need clean, field-level data in JSON format.

Each key in aiExtractRules names a field you want to extract. For each field, provide a type and, optionally, a description that tells the model what data to look for.

Supported Types

  • string – plain text value
  • number – numeric value
  • boolean – true or false
  • list – an array of flat values or objects (the element shape is set under output)
  • item – a single nested object (its fields are defined under output)

You can also use enum to restrict a string to a fixed set of values.
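
The type rules above can be checked locally before a request is sent. The following is a minimal sketch, not part of the HasData API: validate_rules is a hypothetical helper, and it assumes that flat list outputs are limited to string, number, or boolean and that enum applies only to string fields, which is all this page describes.

```python
SUPPORTED_TYPES = {"string", "number", "boolean", "list", "item"}
# Assumption: flat lists may only hold these primitive types.
PRIMITIVE_TYPES = {"string", "number", "boolean"}


def validate_rules(rules, path=""):
    """Return a list of problems found in an aiExtractRules-style dict."""
    problems = []
    for name, rule in rules.items():
        where = f"{path}{name}"
        if not isinstance(rule, dict):
            problems.append(f"{where}: rule must be an object")
            continue
        rule_type = rule.get("type")
        if not isinstance(rule_type, str) or rule_type not in SUPPORTED_TYPES:
            problems.append(f"{where}: unsupported type {rule_type!r}")
            continue
        if "enum" in rule and rule_type != "string":
            problems.append(f"{where}: enum is only described for string fields")
        output = rule.get("output")
        if rule_type == "item":
            # An item needs a nested set of rules under output.
            if isinstance(output, dict):
                problems.extend(validate_rules(output, f"{where}."))
            else:
                problems.append(f"{where}: item needs an object under output")
        elif rule_type == "list":
            # A list takes either nested rules or a primitive type name.
            if isinstance(output, dict):
                problems.extend(validate_rules(output, f"{where}[]."))
            elif not isinstance(output, str) or output not in PRIMITIVE_TYPES:
                problems.append(f"{where}: list needs output as a type name or object")
    return problems


rules = {
    "company": {"type": "string", "description": "company name"},
    "clients": {"type": "list", "output": "string"},
    "trial": {
        "type": "item",
        "output": {
            "available": {"type": "boolean"},
            "type": {"type": "string", "enum": ["paid", "free"]},
        },
    },
}
print(validate_rules(rules))  # prints []
```

The API may accept shapes this sketch rejects, so treat a reported problem as a prompt to re-read the rule rather than as a definitive error.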

Example Request

curl --request POST \
  --url 'https://api.hasdata.com/scrape/web' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <your-api-key>' \
  --data '{
    "url": "https://hasdata.com",
    "aiExtractRules": {
      "company": {
        "description": "company name",
        "type": "string"
      },
      "reviews": {
        "type": "list",
        "output": {
          "review": {
            "description": "review text",
            "type": "string"
          },
          "author": {
            "type": "string"
          }
        }
      },
      "clients": {
        "type": "list",
        "output": "string"
      },
      "trial": {
        "type": "item",
        "output": {
          "available": {
            "type": "boolean"
          },
          "type": {
            "type": "string",
            "enum": ["paid", "free"]
          }
        }
      },
      "yearFounded": {
        "type": "number"
      }
    }
  }'
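
For readers calling the API from code rather than the command line, the same request can be sketched with Python's standard library. The endpoint and header names are copied from the curl example; build_request is a hypothetical helper name, and the function only constructs the request so it can be inspected without network access.

```python
import json
import urllib.request

API_URL = "https://api.hasdata.com/scrape/web"  # endpoint from the curl example


def build_request(api_key, page_url, rules):
    """Build (but do not send) a POST request carrying aiExtractRules."""
    payload = {"url": page_url, "aiExtractRules": rules}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


request = build_request(
    "<your-api-key>",
    "https://hasdata.com",
    {
        "company": {"description": "company name", "type": "string"},
        "clients": {"type": "list", "output": "string"},
    },
)

# To send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```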

Example Response

{
  "requestMetadata": {
    "id": "784b9b3a-8426-431c-a516-beec621183a0",
    "status": "ok"
  },
  "content": "<!DOCTYPE html><html lang=\"en\"><head>...</body></html>",
  "aiResponse": {
    "company": "HasData",
    "reviews": [
      {
        "review": "Roman from HasData went above and beyond to help us with our scraping needs...",
        "author": "Michael Bonacina"
      },
      {
        "review": "I found HasData, which is one of the best scraping services I have ever used...",
        "author": "Hussein Ali"
      }
    ],
    "clients": [
      "Stanford",
      "Salesforce",
      "Samsung",
      "Nvidia",
      "Mailchimp",
      "Harvard",
      "Copyleaks",
      "LosAngelesTimes",
      "SurveySparrow"
    ],
    "trial": {
      "available": true,
      "type": "free"
    },
    "yearFounded": null
  }
}
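
A response shaped like the one above can be unpacked as follows. This is a sketch under two assumptions: that "ok" is the status value indicating success (it is the only value shown on this page), and that the extracted fields always sit under aiResponse. extract_ai_fields is a hypothetical helper name.

```python
import json


def extract_ai_fields(response_text):
    """Parse a response body and return its aiResponse object."""
    body = json.loads(response_text)
    status = body.get("requestMetadata", {}).get("status")
    if status != "ok":  # assumption: "ok" marks a successful request
        raise ValueError(f"unexpected request status: {status!r}")
    if "aiResponse" not in body:
        raise KeyError("response has no aiResponse object")
    return body["aiResponse"]


# Shortened copy of the example response.
sample = """
{
  "requestMetadata": {"id": "784b9b3a-8426-431c-a516-beec621183a0", "status": "ok"},
  "aiResponse": {
    "company": "HasData",
    "clients": ["Stanford", "Salesforce"],
    "trial": {"available": true, "type": "free"},
    "yearFounded": null
  }
}
"""

fields = extract_ai_fields(sample)
print(fields["company"])           # prints HasData
print(fields["trial"]["type"])     # prints free
print(fields["yearFounded"])       # prints None (JSON null)
```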

Notes

  • Descriptions are optional but strongly recommended, because they help the model extract the right data.
  • list fields can output flat values ("output": "string") or objects ("output": { ... }).
  • Fields with no match are returned as null (see yearFounded in the example response).
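
Because unmatched fields come back as null, it can help to list them before using the data. The sketch below walks a parsed aiResponse and reports the path of every null value. null_paths is a hypothetical helper, and the idea that nested fields can also be null is an assumption; this page only shows a top-level example.

```python
def null_paths(value, path=""):
    """Return the paths of all null (None) values in a parsed aiResponse."""
    if value is None:
        return [path]
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            found.extend(null_paths(child, f"{path}.{key}" if path else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(null_paths(child, f"{path}[{index}]"))
    return found


ai_response = {
    "company": "HasData",
    "trial": {"available": True, "type": "free"},
    "yearFounded": None,
}
print(null_paths(ai_response))  # prints ['yearFounded']
```

A non-empty result is a hint that the field's description could be more specific, or that the page does not contain the value.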