extractor.arkintel.com

Files in. Structured data out.

Send a file and a JSON schema. We map the file into the shape you asked for, even when the file and the schema look nothing alike. No chunking, no prompt engineering, no post-processing on your side.

// we handle OCR, vision and metadata fusion, and schema validation. you write the schema. you send the files.

[diagram: 01 send: your schema.json plus a file (PDF, DOCX, PNG) to POST /v1/extract · 02 extract: six lenses (metadata, OCR, layout: traditional; vision, reasoning: ai; consensus: fusion), arbitrated in one round trip · result: validated, typed JSON in your shape, e.g. {"invoice_no": "INV-2026-0042", "issue_date": "2026-04-21", "total": 1400.00, "currency": "EUR"}]

what you can send

documents
  • pdf
  • docx
  • doc
  • pptx
  • xlsx
  • csv
  • html
  • txt
  • md
images
  • jpg
  • png
  • heic
  • webp
  • tiff
  • bmp
  • gif
mail
  • eml
  • msg

// no audio, no video — yet
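If you want to reject unsupported files before spending a request on them, a small client-side check is enough. The sketch below is hypothetical (category_of is not part of any Arkintel SDK) and copies its extension list from the list above, so it will drift if that list changes. Variants that are not listed, such as .jpeg or .tif, are rejected by this check even if the service happens to accept them.

```python
from pathlib import Path
from typing import Optional

# Extensions listed under "what you can send" (a snapshot of this page).
SUPPORTED = {
    "documents": {"pdf", "docx", "doc", "pptx", "xlsx", "csv", "html", "txt", "md"},
    "images": {"jpg", "png", "heic", "webp", "tiff", "bmp", "gif"},
    "mail": {"eml", "msg"},
}


def category_of(filename: str) -> Optional[str]:
    """Return the page's category for a file name, or None if its extension is not listed."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED.items():
        if ext in extensions:
            return category
    return None  # e.g. audio or video, which the page says are not supported yet


if __name__ == "__main__":
    for name in ["invoice.PDF", "scan.heic", "thread.eml", "call.mp3"]:
        print(f"{name}: {category_of(name) or 'not supported'}")
```

The comparison is case-insensitive, so invoice.PDF counts as a document.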

01 playground

Try it now.

Pick a preset schema, drop in a sample file, and inspect the JSON we hand back. The playground uses the same extraction backend as production.


// presets for now. with API access you'd send your own schema in the request body.

sandboxed · no signup · nothing stored

02 the engine

Six lenses on the file. One JSON in your shape.

Most extractors pick one horse (pure OCR, pure vision, or one giant LLM call) and lose what the others would have caught. We run six lenses against the same file in a single round trip, weigh them against each other, and commit a value only once they agree. The answer comes back in your schema, with your field names.

// no single method gets every field right. we run six and let them argue.

// the three lens types in the stack

  • traditional
  • ai
  • fusion

POST /v1/extract
multipart/form-data

// your schema

    vendor      string
    invoice_no  string
    issued_at   date
    due_at      date
    total_eur   number
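The playground shows the schema as a field/type table. Written out as a JSON Schema file, the same preset might look like the sketch below. Two things here are assumptions rather than documented behaviour: that the API takes standard JSON Schema keywords, and that the playground's date type corresponds to a string with format "date". Marking every field required and forbidding extra properties is a choice made in this sketch, not a stated default; FIELDS, TYPE_MAP and build_schema are hypothetical names.

```python
import json
from pathlib import Path

# The playground preset, as field name -> type label.
FIELDS = {
    "vendor": "string",
    "invoice_no": "string",
    "issued_at": "date",
    "due_at": "date",
    "total_eur": "number",
}

# Assumed translation of the playground's type labels into JSON Schema keywords.
TYPE_MAP = {
    "string": {"type": "string"},
    "number": {"type": "number"},
    "date": {"type": "string", "format": "date"},  # full-date, e.g. "2026-04-12"
}


def build_schema(fields: dict[str, str]) -> dict:
    """Turn a field -> type-label mapping into a JSON Schema object definition."""
    return {
        "type": "object",
        "properties": {name: dict(TYPE_MAP[label]) for name, label in fields.items()},
        "required": list(fields),        # a choice made here: every field is required
        "additionalProperties": False,   # and nothing outside the schema is allowed
    }


if __name__ == "__main__":
    schema = build_schema(FIELDS)
    Path("invoice.schema.json").write_text(json.dumps(schema, indent=2))
    print(json.dumps(schema, indent=2))
```

Running it writes invoice.schema.json, the file name the curl example on this page uploads.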

// your file

atlas_invoice.pdf

scanned · OCR

// six lenses on the same file

  • metadata (traditional): embedded text, EXIF, dates
  • ocr (traditional): printed + handwritten characters
  • layout (traditional): columns, table cells, key/value
  • vision (ai): logos, signatures, charts
  • reasoning (ai): schema-aware LLM pass
  • consensus (fusion): cross-checks every field
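The page does not publish how the consensus lens weighs the others. One simple way to get "commit a value only once they agree" is a weighted majority vote per field, sketched below: each lens proposes a value or abstains, near-identical strings are normalised before counting, and a value is committed only if its share of the weight cast clears a threshold; otherwise the field comes back empty. The lens weights, the 0.6 threshold and the arbitrate function are all hypothetical, and this is not Arkintel's engine.

```python
from collections import defaultdict
from typing import Any, Optional

# Hypothetical per-lens weights; the real engine's weighting is not published.
WEIGHTS = {"metadata": 1.0, "ocr": 1.0, "layout": 1.0, "vision": 1.0, "reasoning": 1.5}


def normalise(value: Any) -> Any:
    """Ignore case and surrounding whitespace when comparing string candidates."""
    return value.strip().casefold() if isinstance(value, str) else value


def arbitrate(candidates: dict[str, Any], threshold: float = 0.6) -> Optional[Any]:
    """Weighted majority vote over one field's scalar candidates.

    candidates maps lens name -> proposed value, or None if the lens abstained.
    Returns the winning value as first proposed, or None if no value reaches
    `threshold` of the total weight cast.
    """
    scores: dict[Any, float] = defaultdict(float)
    first_seen: dict[Any, Any] = {}
    total = 0.0
    for lens, value in candidates.items():
        if value is None:
            continue  # abstentions do not count toward the total
        weight = WEIGHTS.get(lens, 1.0)
        key = normalise(value)
        scores[key] += weight
        first_seen.setdefault(key, value)
        total += weight
    if total == 0:
        return None
    best, score = max(scores.items(), key=lambda item: item[1])
    return first_seen[best] if score / total >= threshold else None
```

For total_eur, if OCR, layout and reasoning read 4280.5 while vision misreads 4230.5, the agreeing lenses hold 3.5 of 4.5 weight (about 78%) and 4280.5 is committed. If only OCR and vision answer issued_at and they propose different dates, the vote splits 50/50 and nothing is committed.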

// merged answer — your shape

200 OK
{
  "vendor":     "Atlas Logistics GmbH",
  "invoice_no": "INV-2026-0418",
  "issued_at":  "2026-04-12",
  "due_at":     "2026-05-12",
  "total_eur":  4280.50
}
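The response is described as validated and typed on the server. If your own pipeline should still fail loudly when a response stops matching what it expects (say, after you change the schema), a few lines of client-side checking cover this preset. The helper below is hypothetical, not an Arkintel API, and assumes date fields arrive as ISO 8601 strings such as "2026-04-12", as in the 200 OK body above.

```python
import json
from datetime import date

# Expected fields for the invoice preset; "date" is assumed to be an ISO 8601 string.
EXPECTED = {
    "vendor": "string",
    "invoice_no": "string",
    "issued_at": "date",
    "due_at": "date",
    "total_eur": "number",
}


def matches(label: str, value: object) -> bool:
    """Check a single value against one of the preset's type labels."""
    if label == "string":
        return isinstance(value, str)
    if label == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if label == "date":
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown type label: {label}")


def problems(payload: dict) -> list[str]:
    """List every field that is missing, unexpected, or of the wrong type."""
    issues = [f"missing: {k}" for k in EXPECTED if k not in payload]
    issues += [f"unexpected: {k}" for k in payload if k not in EXPECTED]
    issues += [
        f"wrong type: {k}"
        for k, label in EXPECTED.items()
        if k in payload and not matches(label, payload[k])
    ]
    return issues


if __name__ == "__main__":
    body = (
        '{"vendor": "Atlas Logistics GmbH", "invoice_no": "INV-2026-0418", '
        '"issued_at": "2026-04-12", "due_at": "2026-05-12", "total_eur": 4280.50}'
    )
    print(problems(json.loads(body)) or "response matches the preset")
```

On the example response, problems() returns an empty list.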

03 security

Built for the files you can’t afford to leak.

Extract runs as a managed EU cloud service with zero retention. Files and responses are deleted as soon as we hand back your JSON, traffic is encrypted in transit (TLS) and while processing, and your data never trains a model.

// same engine also runs inside your own network.

arkintel cloud

Managed EU cloud

Hit /v1/extract. We handle everything else.

  • zero retention — files and responses deleted after we hand back your JSON
  • encrypted in transit (TLS) and while processing
  • your data is never used to train models
  • EU-hosted API and storage
  • LLM steps may use vetted third-party hosted model providers
  • per-tenant audit log on request

Want the longer argument? Read the sovereignty brief

04 the wire

One endpoint. Boring on purpose.

Multipart POST. The schema goes in a JSON form field named "schema", the files go in file fields named "files", and the response is your data: validated, in your shape, in the same round trip. No clever protocol to learn.

// what you don’t have to do

  • install an SDK
  • wire up webhooks
  • speak a streaming protocol
  • presign upload URLs
  • track temporary file IDs
  • poll, batch, or coordinate jobs

// the whole integration fits on a postcard.

extract.sh
# ARKINTEL_API_KEY must be set in your environment
curl -X POST https://api.arkintel.com/v1/extract \
  -H "Authorization: Bearer $ARKINTEL_API_KEY" \
  -F 'schema=@invoice.schema.json;type=application/json' \
  -F "files=@invoice.pdf"
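If you would rather not shell out to curl, the same request can be built with the Python standard library alone. The sketch mirrors extract.sh: a schema part sent as application/json, one files part per file, and the bearer token. The URL and the part names schema and files come from the curl example; the function names, the schema.json part filename and the 120-second timeout are assumptions. Sending several files by repeating the files part is inferred from the field name, not documented here.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://api.arkintel.com/v1/extract"


def encode_multipart(schema: dict, files: list[Path]) -> tuple[bytes, str]:
    """Build a multipart/form-data body: one JSON 'schema' part, one 'files' part per file."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []

    def add_part(headers: str, payload: bytes) -> None:
        # Each part: delimiter line, part headers, blank line, payload, CRLF.
        parts.append(f"--{boundary}\r\n{headers}\r\n\r\n".encode() + payload + b"\r\n")

    add_part(
        'Content-Disposition: form-data; name="schema"; filename="schema.json"\r\n'
        "Content-Type: application/json",
        json.dumps(schema).encode(),
    )
    for path in files:
        ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        add_part(
            f'Content-Disposition: form-data; name="files"; filename="{path.name}"\r\n'
            f"Content-Type: {ctype}",
            path.read_bytes(),
        )
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()  # closing delimiter
    return body, f"multipart/form-data; boundary={boundary}"


def build_request(schema: dict, files: list[Path], api_key: str) -> urllib.request.Request:
    """Assemble the POST without sending it."""
    body, content_type = encode_multipart(schema, files)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )


def extract(schema: dict, files: list[Path], api_key: str, timeout: float = 120) -> dict:
    """Send one synchronous extraction request and return the decoded JSON body."""
    req = build_request(schema, files, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

A call might look like extract(json.loads(Path("invoice.schema.json").read_text()), [Path("invoice.pdf")], os.environ["ARKINTEL_API_KEY"]). Because the data comes back in the same round trip, there is no job ID to poll.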

05 ship it

Ship your own schema in production.

The schemas in this playground are examples for testing. With production access you pass your own schema in every request — no whitelisting, no waiting on us. Drop us a line and we’ll get you a key.

// typical reply within one business day.

// contact


Email us — humans, not a ticket queue.