Use Batch Scraping to submit up to 10,000 URLs in a single API call. This is useful when you need to extract the same type of data from a large number of pages, such as scraping product pages or company profiles at scale.

Unlike the standard Web Scraping API, which accepts a single url, Batch Scrape takes an array of URLs in the urls field. Every URL in the batch is processed with the same parameters.

When to Use Batch Scraping

  • Extracting titles, authors, and publish dates from a list of blog or news article URLs
  • Running aiExtractRules across a set of company websites to collect structured data like founding year, services, and contact info
  • Gathering legal notices or disclaimers from the footer pages of 5,000+ policy URLs

Submit a Batch Scrape Job

curl --request POST \
  --url 'https://api.hasdata.com/scrape/batch/web/' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <your-api-key>' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "outputFormat": ["text", "html"],
    "aiExtractRules": {
      "company": { "type": "string" },
      "email": { "type": "string" },
      "yearFounded": { "type": "number" },
      "isHiring": { "type": "boolean" }
    }
  }'

Response

{
  "jobId": "9a35f32e-4f9c-4d49-9c6e-7c4de4a091e0",
  "status": "ok"
}

This means the batch job was accepted and is being processed asynchronously.
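If you are scripting the workflow, you can capture the jobId from the submit response and reuse it in the follow-up calls below. A minimal sketch, assuming the jq CLI is installed; batch-request.json is a placeholder for the JSON payload shown above:

# Submit the batch and keep only the jobId from the response
JOB_ID=$(curl --silent --request POST \
  --url 'https://api.hasdata.com/scrape/batch/web/' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <your-api-key>' \
  --data @batch-request.json \
  | jq -r '.jobId')
echo "Submitted batch job: $JOB_ID"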

Get Job Status & Results

To check the status of your batch job:

curl --request GET \
  --url 'https://api.hasdata.com/scrape/batch/web/9a35f32e-4f9c-4d49-9c6e-7c4de4a091e0' \
  --header 'x-api-key: <your-api-key>'
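Because the job runs asynchronously, you typically poll this endpoint until it reports completion. A minimal polling sketch follows; the "finished" status value is an assumption, so check the actual status response for the exact field names and terminal values:

# Poll the job status every 10 seconds until it reaches a terminal state
while true; do
  STATUS=$(curl --silent \
    --url "https://api.hasdata.com/scrape/batch/web/$JOB_ID" \
    --header 'x-api-key: <your-api-key>' \
    | jq -r '.status')
  echo "Job status: $STATUS"
  # "finished" is an assumed terminal value; replace with the real one
  [ "$STATUS" = "finished" ] && break
  sleep 10
done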

To retrieve results once ready (supports pagination):

curl --request GET \
  --url 'https://api.hasdata.com/scrape/batch/web/9a35f32e-4f9c-4d49-9c6e-7c4de4a091e0/results?page=1&limit=100' \
  --header 'x-api-key: <your-api-key>'
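To collect every result, page through this endpoint until a page comes back with fewer entries than the limit. A hedged sketch: the page and limit parameters and the results field match the examples in this guide, but the stop condition is an assumption.

# Fetch all pages and append each result as one JSON object per line
PAGE=1
LIMIT=100
while true; do
  BODY=$(curl --silent \
    --url "https://api.hasdata.com/scrape/batch/web/$JOB_ID/results?page=$PAGE&limit=$LIMIT" \
    --header 'x-api-key: <your-api-key>')
  echo "$BODY" | jq -c '.results[]' >> results.ndjson
  # A short page means we have reached the last one (assumed behavior)
  COUNT=$(echo "$BODY" | jq '.results | length')
  [ "$COUNT" -lt "$LIMIT" ] && break
  PAGE=$((PAGE + 1))
done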

Example Result

{
  "page": 1,
  "limit": 100,
  "total": 3,
  "results": [
    {
      "url": "https://example.com/page1",
      "result": {
        "content": "<html>...</html>",
        "text": "Extracted text here...",
        "aiResponse": {
          "company": "HasData",
          "email": "roman@hasdata.com",
          "yearFounded": 2022,
          "isHiring": true
        }
      }
    },
    {
      "url": "https://example.com/page2",
      "error": "Url unreachable"
    }
  ]
}

Each result matches the format of a regular Web Scraping API response, except that it is returned as an element of the results array.
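Note that each entry contains either a result object or an error string, so split the two when post-processing. For example, with the results.ndjson file produced by the pagination sketch above:

# Successful scrapes (no error field)
jq -c 'select(.error == null)' results.ndjson > succeeded.ndjson

# Failed URLs, for logging or retrying
jq -c 'select(.error != null)' results.ndjson > failed.ndjson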

Notes

  • Maximum batch size: 10,000 URLs
  • All URLs are processed using the same parameters
  • Failed URLs do not consume credits (see the retry sketch after this list)
  • All outputs are returned as an array of Web Scraping API-style results
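Since failed URLs do not consume credits, it can be worth resubmitting them as a follow-up batch. A sketch building on the failed.ndjson file from the example above:

# Collect the failed URLs into a compact JSON array and resubmit them
FAILED=$(jq -sc '[.[].url]' failed.ndjson)
curl --request POST \
  --url 'https://api.hasdata.com/scrape/batch/web/' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <your-api-key>' \
  --data "{\"urls\": $FAILED, \"outputFormat\": [\"text\", \"html\"]}"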