gottem API
Universal scraper as a service: one API in front of every major scraping vendor. Execution modes: tiered ladder, race, hedge, and force-provider. Bring-your-own-key (BYOK) supported. Credit-based billing.
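Authenticate every request (except GET /healthz) with the header Authorization: Bearer gtm_<your_key>. Credits are decremented per request based on the vendor that handled the fetch: actual vendor cost × markup, or a flat infra fee with no markup when your own stored vendor key is used (BYOK). Use force_provider to lock a request to a single route for testing.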
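A minimal Python sketch of an authenticated call, using only the standard library. The base URL (api.gottem.example) and the build_request helper are hypothetical; substitute the real host. The request is built but not sent.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical host: this reference does not state the API's base URL.
API_BASE = "https://api.gottem.example"


def build_request(api_key: str, method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (without sending) an authenticated gottem API request."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(API_BASE + path, data=data, method=method)
    # Every authenticated endpoint expects: Authorization: Bearer gtm_<key>
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Pin a scrape to one route while testing (force_provider).
req = build_request("gtm_your_key", "POST", "/scrape",
                    {"url": "https://example.com", "force_provider": "spider.cloud.smart"})
# To send it: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```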
Endpoints
- POST /scrape: Fetch a URL. Returns content + the provider/route used + cost.
- POST /probe: Walk tiers in order, report which one returns valid content for this URL.
- GET /routes: List the route catalog (id, tier, cost, adapter, auth env).
- GET /stats: Waterfall stats — per-(route, domain) success rate + EMA latency.
- GET /healthz: Server health + version.
- GET /v1/credits/balance: Current credit balance.
- GET /v1/credits/ledger: Recent credit movements (paginated).
- POST /v1/keys: Create a new API key. The raw key is returned ONCE.
- GET /v1/keys: List your API keys.
- DELETE /v1/keys/{id}: Revoke an API key.
- POST /v1/byok/keys: Store a vendor API key for use in BYOK requests.
- GET /v1/byok/keys: List your stored vendor keys (fingerprints, no raw values).
- POST /v1/billing/topup: Start a Stripe Checkout session to add credits.
- POST /v1/webhooks/stripe: Stripe webhook receiver (internal).
POST /scrape
Fetch a URL. Request body:
{
"url": "https://example.com",
"mode": "ladder",
"force_provider": null,
"tier_min": 0,
"tier_max": 9,
"budget_mc": 1000,
"require_js": false,
"max_retries": 5,
"routes": [],
"headers": [],
"geo": null,
"render_wait_ms": null,
"timeout_ms": null,
"extra": {},
"hedge_delay_ms": 3000,
"hedge_count": 1
}
- mode: ladder (cheapest-first, sequential; escalates on failure) | race (parallel across the selected routes) | hedge (primary plus staggered backups).
- force_provider: set to a route id such as "spider.cloud.smart" to bypass the ladder entirely. Pure single-route execution: no escalation, no fallback. Intended for testing.
- tier_min / tier_max: clamp which tiers the ladder may use.
- headers: per-request headers. Honored by HTTP-flavored adapters; ignored by spider's local adapter.
- extra: free-form fields that some adapters need (e.g. captcha siteKey + captchaType).
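The ladder and force_provider behavior described above can be modeled as a short loop. This is an illustrative sketch, not gottem's implementation: the Route/Attempt types, the fetch callable (returning None on failure), and ordering by (tier, cost) on the assumption that lower tiers are cheaper are all hypothetical. Race, hedge, budget_mc, and max_retries are not modeled.

```python
from typing import Callable, NamedTuple, Optional


class Route(NamedTuple):  # hypothetical catalog entry
    id: str
    tier: int
    cost: int  # expected cost from the catalog


class Attempt(NamedTuple):
    route: str
    ok: bool


def run_ladder(
    routes: list[Route],
    fetch: Callable[[Route], Optional[str]],
    tier_min: int = 0,
    tier_max: int = 9,
    force_provider: Optional[str] = None,
) -> tuple[Optional[str], list[Attempt]]:
    """Hypothetical ladder: try routes cheapest-first, escalate on failure."""
    if force_provider is not None:
        # Single-route execution: no escalation, no fallback, no tier clamp.
        candidates = [r for r in routes if r.id == force_provider]
        if not candidates:
            raise ValueError(f"unknown route {force_provider!r}")
    else:
        # Clamp to [tier_min, tier_max], then order cheapest-first.
        candidates = sorted(
            (r for r in routes if tier_min <= r.tier <= tier_max),
            key=lambda r: (r.tier, r.cost),
        )
    attempts: list[Attempt] = []
    for route in candidates:
        content = fetch(route)  # None signals a failed fetch
        attempts.append(Attempt(route.id, content is not None))
        if content is not None:
            return content, attempts
    return None, attempts
```

With force_provider set, a failure ends the request immediately, which is why that mode suits testing one route in isolation.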
Response on success:
{
"url": "https://example.com",
"status": 200,
"provider": "spider",
"route": "spider.cloud.smart",
"adapter": "http_jsonl_stream",
"tier": 7,
"cost_milli": 100,
"cost_dollars": "0.0100",
"cost_actual_units": 85,
"cost_actual_unit": "credits",
"elapsed_ms": 1843,
"attempt": 2,
"content_bytes": 12345,
"content": "# Page heading\n\n..."
}
- provider: the vendor prefix of the route id (e.g. "firecrawl", "spider", "zyte").
- route: the canonical route id.
- adapter: the adapter type that ran (http_json, chrome_cdp, browser_use, etc.).
- cost_milli: the static expected cost from the catalog.
- cost_actual_units + cost_actual_unit: present only when the vendor reports a per-request cost (ZenRows and ScrapingBee credits, Spider Cloud credits, Oxylabs dollars). Other vendors omit both fields.
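A small helper for reading a success response. It assumes cost_milli is counted in credits at $0.0001 each, which reproduces the sample's cost_dollars of "0.0100" but is not stated outright; summarize is a hypothetical name.

```python
from decimal import Decimal
from typing import Any, Optional

# Assumption: cost_milli is in credits (1 credit = $0.0001); this matches the
# sample response, where cost_milli 100 pairs with cost_dollars "0.0100".
DOLLARS_PER_CREDIT = Decimal("0.0001")


def summarize(resp: dict[str, Any]) -> dict[str, Any]:
    """Pull the routing and cost fields out of a successful /scrape response."""
    actual: Optional[tuple[Any, str]] = None
    # Only vendors that report per-request cost include these two fields.
    if "cost_actual_units" in resp and "cost_actual_unit" in resp:
        actual = (resp["cost_actual_units"], resp["cost_actual_unit"])
    expected = Decimal(resp["cost_milli"]) * DOLLARS_PER_CREDIT
    return {
        "provider": resp["route"].split(".", 1)[0],  # vendor prefix of the route id
        "route": resp["route"],
        "tier": resp["tier"],
        "expected_dollars": expected.quantize(Decimal("0.0001")),
        "actual_cost": actual,
    }
```

Callers should treat actual_cost as optional, since most vendors never send it.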
POST /probe
Walks tiers sequentially and returns the first one that produces valid content. Request body:
{ "url": "https://hard-to-scrape.test", "tier_min": 0, "tier_max": 9, "min_bytes": 500 }
The response includes a winner (null if all tiers are exhausted) and an attempts array with per-tier outcomes.
GET /routes
Returns the full catalog of available routes: id, tier, cost, adapter, auth env vars, and capabilities. No request body. Use this to see which force_provider values are valid.
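Since force_provider must be a route id from the catalog, a client can check it locally before calling /scrape. This sketch assumes /routes returns a JSON array of objects carrying at least id and tier; the exact response shape is not specified here, and both helper names are hypothetical.

```python
from typing import Any


def eligible_routes(catalog: list[dict[str, Any]], tier_min: int = 0,
                    tier_max: int = 9) -> list[str]:
    """Route ids whose tier lies in [tier_min, tier_max], lowest tier first."""
    in_range = [r for r in catalog if tier_min <= r["tier"] <= tier_max]
    return [r["id"] for r in sorted(in_range, key=lambda r: r["tier"])]


def is_valid_force_provider(catalog: list[dict[str, Any]], route_id: str) -> bool:
    """True if route_id names a catalog route (not just a vendor prefix)."""
    return any(r["id"] == route_id for r in catalog)
```

Checking locally saves a round trip that would otherwise end in a BAD_FORCE_PROVIDER error.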
GET /stats
Returns waterfall stats — per (route, domain) success/failure counts and EMA latency. Used by the orchestrator to promote proven routes past the ladder warmup; exposed here for observability.
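For reference, an exponential moving average updates as ema = alpha * sample + (1 - alpha) * ema, so recent latencies count more than old ones. The sketch below keeps per-(route, domain) counters that way. The smoothing factor 0.2, the RouteStats class, and feeding failed attempts into the latency average are assumptions; the orchestrator's actual parameters are not documented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

ALPHA = 0.2  # hypothetical smoothing factor; the real value is not documented


@dataclass
class RouteStats:
    successes: int = 0
    failures: int = 0
    ema_latency_ms: Optional[float] = None

    def record(self, ok: bool, latency_ms: float) -> None:
        if ok:
            self.successes += 1
        else:
            self.failures += 1
        # Assumption: failed attempts also feed the latency average.
        if self.ema_latency_ms is None:
            self.ema_latency_ms = latency_ms  # seed with the first sample
        else:
            self.ema_latency_ms = ALPHA * latency_ms + (1 - ALPHA) * self.ema_latency_ms

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 0.0


# One entry per (route, domain) pair, created on first use.
stats = defaultdict(RouteStats)
stats[("spider.cloud.smart", "example.com")].record(True, 1843)
```

A larger alpha reacts faster to a route slowing down but is noisier; a smaller one is steadier but lags.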
GET /healthz
{ "ok": true, "version": "...", "routes": N, "adapters": M }. No authentication required.
GET /v1/credits/balance
{ "balance": "12345.67890000", "currency": "credits" }.
1 credit = $0.0001. 10,000 credits = $1.
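Because balances come back as decimal strings, parse them with Decimal rather than float. A sketch of the 1 credit = $0.0001 conversion (function names are hypothetical):

```python
from decimal import Decimal
from typing import Union

CREDITS_PER_DOLLAR = Decimal(10_000)  # 1 credit = $0.0001


def credits_to_dollars(credits: Union[str, int, Decimal]) -> Decimal:
    # Decimal parses the string balance exactly, avoiding binary float rounding.
    return Decimal(credits) / CREDITS_PER_DOLLAR


def dollars_to_credits(dollars: Union[str, int, Decimal]) -> Decimal:
    return Decimal(dollars) * CREDITS_PER_DOLLAR
```

With these helpers, the sample balance of 12345.67890000 credits is $1.23456789, and a $25 top-up corresponds to 250,000 credits, assuming top-ups carry no fees or bonuses (this reference does not say).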
GET /v1/credits/ledger
Paginated. Each row: { delta, balance_after, reason, scrape_request_id, created_at }. Reasons include topup, auto_recharge, scrape, subscription_grant, refund, adjustment.
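One way to audit a ledger page: total the deltas per reason and check that each balance_after equals the previous row's balance_after plus its own delta. Assumed here, not documented: delta is signed (negative for scrape), amounts may arrive as strings, and created_at is ISO 8601 UTC so it sorts chronologically. ledger_report is a hypothetical name.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Any


def ledger_report(rows: list[dict[str, Any]]) -> tuple[dict[str, Decimal], list[int]]:
    """Sum deltas per reason and flag rows whose balance_after does not chain."""
    # Assumption: ISO 8601 UTC timestamps, so string order is time order.
    ordered = sorted(rows, key=lambda r: r["created_at"])
    totals: dict[str, Decimal] = defaultdict(Decimal)
    gaps: list[int] = []
    for i, row in enumerate(ordered):
        delta = Decimal(str(row["delta"]))
        totals[row["reason"]] += delta
        if i > 0:
            expected = Decimal(str(ordered[i - 1]["balance_after"])) + delta
            if Decimal(str(row["balance_after"])) != expected:
                gaps.append(i)  # index into the time-ordered list
    return dict(totals), gaps
```

Rows that share a created_at value may need a secondary sort key before the chain check is meaningful.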
POST /v1/keys
Create a new key. Body: { "name": "prod", "tier": "starter" } (tier optional, falls back to account default).
Response: { "id": "...", "key": "gtm_...", "prefix": "gtm_abcd", "created_at": "..." }.
The key value is returned exactly once. Store it. If you lose it, revoke and create another.
GET /v1/keys
List your keys. Returns prefixes + metadata only — never the raw key.
DELETE /v1/keys/{id}
Revokes the key. In-flight requests already using it are NOT cancelled, but new requests with it are rejected immediately.
POST /v1/byok/keys
Store a vendor API key. Body: { "vendor": "spider_cloud", "key": "sk-..." }.
The key is AES-256-GCM encrypted at rest. When a request uses this vendor, we decrypt and pass it through INSTEAD of our pooled key, and you're billed only the flat infra fee (no markup).
GET /v1/byok/keys
List stored vendor keys. Returns { vendor, fingerprint, created_at, last_verified_at } only — raw keys never leave the database in plaintext.
POST /v1/billing/topup
Start a Stripe Checkout session for adding credits. Body: { "amount_dollars": 25 }.
Response: { "checkout_url": "https://checkout.stripe.com/..." }. Redirect the user there.
POST /v1/webhooks/stripe
Internal: Stripe POSTs to this endpoint. It verifies the Stripe signature and credits accounts on checkout.session.completed, payment_intent.succeeded, and invoice.payment_succeeded events.
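For readers curious what signature verification involves: Stripe sends a Stripe-Signature header of the form t=<timestamp>,v1=<signature>, where the signature is a hex HMAC-SHA256 of "<timestamp>.<raw body>" keyed with the endpoint's signing secret. The standard-library sketch below checks that scheme with a 300-second tolerance (the default in Stripe's own libraries). It is not gottem's code; in production, stripe.Webhook.construct_event from the official stripe package performs the same check.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(payload: bytes, sig_header: str, secret: str,
                            tolerance_s: int = 300,
                            now: Optional[float] = None) -> bool:
    """Check a Stripe-Signature header (t=<ts>,v1=<hex>) against the raw body."""
    pairs = [item.split("=", 1) for item in sig_header.split(",") if "=" in item]
    timestamps = [v.strip() for k, v in pairs if k.strip() == "t"]
    signatures = [v.strip() for k, v in pairs if k.strip() == "v1"]
    if not timestamps or not signatures:
        return False
    ts = timestamps[0]
    try:
        age = (time.time() if now is None else now) - int(ts)
    except ValueError:
        return False
    if abs(age) > tolerance_s:  # reject stale or replayed events
        return False
    # Stripe signs "<timestamp>.<raw body>" with HMAC-SHA256.
    expected = hmac.new(secret.encode(), ts.encode() + b"." + payload,
                        hashlib.sha256).hexdigest()
    # Constant-time comparison against every v1 signature present.
    return any(hmac.compare_digest(expected.encode(), sig.encode())
               for sig in signatures)
```

The payload must be the raw request bytes; re-serializing parsed JSON changes the bytes and breaks the HMAC.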
Rate limits
Limits are tiered, and every response includes X-RateLimit-Remaining headers. Per-tier limits:
| Tier | Requests per minute | Requests per hour | Concurrent requests |
|---|---|---|---|
| free | 10 | 100 | 2 |
| starter | 60 | 2,000 | 10 |
| growth | 300 | 20,000 | 50 |
| scale | 1,500 | 200,000 | 200 |
429 Too Many Requests responses include a Retry-After header.
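A sketch of client-side handling for 429s: honor Retry-After when present (it may be delay-seconds or an HTTP-date), and otherwise fall back to capped exponential backoff with full jitter. The backoff constants and the helper name are hypothetical; headers is any mapping, such as the case-insensitive HTTPMessage that urllib exposes on HTTPError.headers.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay_s(headers: Mapping[str, str], attempt: int,
                  base_s: float = 1.0, cap_s: float = 60.0,
                  now: Optional[datetime] = None) -> float:
    """Seconds to wait before retrying a 429, preferring the server's Retry-After."""
    value = headers.get("Retry-After")
    if value:
        value = value.strip()
        if value.isdigit():  # delay-seconds form, e.g. "7"
            return float(value)
        try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable Retry-After: exponential backoff with full jitter.
    return random.uniform(0.0, min(cap_s, base_s * 2 ** attempt))
```

Jitter spreads retries from many workers so they do not all hit the limit again at the same instant.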
Error shape
{ "error": "human-readable message", "code": "MACHINE_READABLE_CODE" }
Codes: INVALID_URL, BAD_FORCE_PROVIDER, INSUFFICIENT_CREDITS, RATE_LIMIT_EXCEEDED, AUTH_REQUIRED, KEY_REVOKED, VENDOR_AUTH_MISSING, VENDOR_ERROR, EXHAUSTED.
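A minimal sketch that turns the error shape into a Python exception. Which codes are worth retrying is not stated in this reference; the RETRYABLE_CODES set below is an illustrative client policy, and GottemError and parse_error are hypothetical names.

```python
import json
from typing import Optional

# Illustrative client policy only: this reference does not say which codes
# are safe to retry.
RETRYABLE_CODES = {"RATE_LIMIT_EXCEEDED", "VENDOR_ERROR"}


class GottemError(Exception):
    def __init__(self, status: int, code: Optional[str], message: str) -> None:
        super().__init__(f"HTTP {status} {code or 'UNKNOWN'}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_error(status: int, body: bytes) -> GottemError:
    """Turn an error body ({"error": ..., "code": ...}) into an exception."""
    try:
        data = json.loads(body)
    except ValueError:  # not JSON, e.g. an HTML page from a proxy
        return GottemError(status, None, body.decode("utf-8", "replace"))
    if not isinstance(data, dict):
        return GottemError(status, None, str(data))
    return GottemError(status, data.get("code"), str(data.get("error", "")))
```

Branching on code rather than on the human-readable error string keeps clients stable if message wording changes.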