gottem API

Universal scraper as a service. One API for every major scraping vendor. Tiered ladder, race, hedge, force-provider. Bring-your-own-key supported. Credit-based billing.

Authenticate every request with Authorization: Bearer gtm_<your_key>. Credits decrement per request based on the vendor that handled the fetch (cost-actual × markup). Use force_provider to lock to one route for testing.

Endpoints

POST /scrape

Fetch a URL. Request body:

{
  "url":            "https://example.com",
  "mode":           "ladder",
  "force_provider": null,
  "tier_min":       0,
  "tier_max":       9,
  "budget_mc":      1000,
  "require_js":     false,
  "max_retries":    5,
  "routes":         [],
  "headers":        [],
  "geo":            null,
  "render_wait_ms": null,
  "timeout_ms":     null,
  "extra":          {},
  "hedge_delay_ms": 3000,
  "hedge_count":    1
}
  • mode: ladder (cheapest-first sequential, escalates on failure) | race (parallel across selected routes) | hedge (primary + staggered backups).
  • force_provider: set to a route id like "spider.cloud.smart" to bypass the ladder entirely. Pure single-route execution — no escalation, no fallback. Test mode.
  • tier_min / tier_max: clamp which tiers the ladder may use.
  • headers: per-request headers — honored by HTTP-flavored adapters; ignored by spider's local adapter.
  • extra: free-form fields some adapters need (e.g. captcha siteKey + captchaType).

Response on success:

{
  "url":                "https://example.com",
  "status":             200,
  "provider":           "spider",
  "route":              "spider.cloud.smart",
  "adapter":            "http_jsonl_stream",
  "tier":               7,
  "cost_milli":         100,
  "cost_dollars":       "0.0100",
  "cost_actual_units":  85,
  "cost_actual_unit":   "credits",
  "elapsed_ms":         1843,
  "attempt":            2,
  "content_bytes":      12345,
  "content":            "# Page heading\n\n..."
}
  • provider is the vendor prefix from the route id (e.g. "firecrawl", "spider", "zyte").
  • route is the canonical id.
  • adapter is which adapter type ran (http_json, chrome_cdp, browser_use, etc.).
  • cost_milli is the static expected cost from the catalog. cost_actual_units + cost_actual_unit are present when the vendor reports per-request cost (ZenRows + ScrapingBee credits, Spider Cloud credits, Oxylabs dollars). Other vendors omit those fields.

POST /probe

Walk tiers sequentially, return the first one that returns valid content. Request:

{ "url": "https://hard-to-scrape.test", "tier_min": 0, "tier_max": 9, "min_bytes": 500 }

Response includes a winner (or null if all tiers exhausted) and an attempts array with per-tier outcomes.

GET /routes

Returns the full catalog of available routes — id, tier, adapter, auth env vars, capabilities. No body. Use this to see what force_provider values are valid.

GET /stats

Returns waterfall stats — per (route, domain) success/failure counts and EMA latency. Used by the orchestrator to promote proven routes past the ladder warmup; exposed here for observability.

GET /healthz

{ "ok": true, "version": "...", "routes": N, "adapters": M }. No auth.

GET /v1/credits/balance

{ "balance": "12345.67890000", "currency": "credits" }. 1 credit = $0.0001. 10,000 credits = $1.

GET /v1/credits/ledger

Paginated. Each row: { delta, balance_after, reason, scrape_request_id, created_at }. Reasons include topup, auto_recharge, scrape, subscription_grant, refund, adjustment.

POST /v1/keys

Create a new key. Body: { "name": "prod", "tier": "starter" } (tier optional, falls back to account default).

Response: { "id": "...", "key": "gtm_...", "prefix": "gtm_abcd", "created_at": "..." }.

The key value is returned exactly once. Store it. If you lose it, revoke and create another.

GET /v1/keys

List your keys. Returns prefixes + metadata only — never the raw key.

DELETE /v1/keys/{id}

Revoke. Existing in-flight requests using that key are NOT cancelled, but new requests are rejected immediately.

POST /v1/byok/keys

Store a vendor API key. Body: { "vendor": "spider_cloud", "key": "sk-..." }. The key is AES-256-GCM encrypted at rest. When a request uses this vendor, we decrypt and pass it through INSTEAD of our pooled key, and you're billed only the flat infra fee (no markup).

GET /v1/byok/keys

List stored vendor keys. Returns { vendor, fingerprint, created_at, last_verified_at } only — raw keys never leave the database in plaintext.

POST /v1/billing/topup

Start a Stripe Checkout session for adding credits. Body: { "amount_dollars": 25 }. Response: { "checkout_url": "https://checkout.stripe.com/..." }. Redirect the user there.

POST /v1/webhooks/stripe

Internal — Stripe POSTs to this endpoint. Verifies signature and credits accounts on checkout.session.completed, payment_intent.succeeded, invoice.payment_succeeded.

Rate limits

Tiered. Every response includes X-RateLimit-Remaining headers:

Tier per minute per hour concurrent
free 10 100 2
starter 60 2,000 10
growth 300 20,000 50
scale 1,500 200,000 200

429 Too Many Requests includes a Retry-After header.

Error shape

{ "error": "human-readable message", "code": "MACHINE_READABLE_CODE" }

Codes: INVALID_URL, BAD_FORCE_PROVIDER, INSUFFICIENT_CREDITS, RATE_LIMIT_EXCEEDED, AUTH_REQUIRED, KEY_REVOKED, VENDOR_AUTH_MISSING, VENDOR_ERROR, EXHAUSTED.