Send any file and a JSON schema. We map the file into the shape you asked for — even if the file and the schema look nothing alike. No chunking, no prompt engineering, no post-processing.
// we handle OCR, vision and metadata fusion, and schema validation. you write the schema. you send the files.
what you can send
- docx
- doc
- pptx
- xlsx
- csv
- html
- txt
- md
- jpg
- png
- heic
- webp
- tiff
- bmp
- gif
- eml
- msg
// no audio, no video — yet
01 playground
Try it now.
Pick a preset schema, drop in a sample file, and inspect the JSON we hand back. The playground uses the same extraction backend as production.
// presets for now. with API access you'd send your own schema in the request body.
sandboxed · no signup · nothing stored
02 the engine
Six lenses on the file. One JSON in your shape.
Most extractors pick a horse (pure OCR, pure vision, or one giant LLM call) and lose what the others would have caught. We run six lenses against the same file in a single round trip, weigh them against each other, and only commit a value once they agree. The answer comes back in your schema, with your field names.
// no single method gets it right. we run six and let them argue.
// the three colours in the stack
- traditional
- ai
- fusion
// your schema
- vendor: string
- invoice_no: string
- issued_at: date
- due_at: date
- total_eur: number
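The field list above could be written as a JSON Schema document along these lines. This is only a sketch: the exact schema dialect and keywords the service accepts are an assumption, and `INVOICE_SCHEMA`, `conforms`, and `schema_json` are hypothetical names. `conforms` is a tiny local check for the keywords used here, not a full validator.

```python
import json
from datetime import date

# Hypothetical JSON Schema for the five fields listed above.
# Assumption: the service accepts standard JSON Schema keywords like these.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_no": {"type": "string"},
        "issued_at": {"type": "string", "format": "date"},
        "due_at": {"type": "string", "format": "date"},
        "total_eur": {"type": "number"},
    },
    "required": ["vendor", "invoice_no", "issued_at", "due_at", "total_eur"],
    "additionalProperties": False,
}


def conforms(doc: dict, schema: dict) -> bool:
    """Check a response against the subset of keywords used in INVOICE_SCHEMA."""
    props = schema["properties"]
    if any(key not in doc for key in schema.get("required", [])):
        return False
    if schema.get("additionalProperties") is False and any(k not in props for k in doc):
        return False
    for key, rule in props.items():
        if key not in doc:
            continue
        value = doc[key]
        if rule["type"] == "string":
            if not isinstance(value, str):
                return False
            if rule.get("format") == "date":
                try:
                    date.fromisoformat(value)  # expects ISO dates such as 2026-04-12
                except ValueError:
                    return False
        elif rule["type"] == "number":
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return False
    return True


def schema_json() -> str:
    """Serialize the schema, e.g. to save it as invoice.schema.json."""
    return json.dumps(INVOICE_SCHEMA, indent=2)
```

Writing `schema_json()` to a file named `invoice.schema.json` would give you something to attach as the `schema` form field.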
// your file
atlas_invoice.pdf
scanned · OCR
// six lenses on the same file
- metadata · traditional · embedded text, EXIF, dates
- ocr · traditional · printed + handwritten characters
- layout · traditional · columns, table cells, key/value
- vision · ai · logos, signatures, charts
- reasoning · ai · schema-aware llm pass
- consensus · fusion · cross-checks every field
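To make the consensus idea concrete, here is a minimal sketch of one way per-field agreement could be computed: a weighted vote with an agreement threshold. This is not the engine's actual method, which the page does not describe. The weights, the threshold, and the names `LENS_WEIGHTS` and `fuse` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-lens weights; the real engine's weighting is not documented here.
LENS_WEIGHTS = {"metadata": 1.0, "ocr": 1.0, "layout": 1.0, "vision": 1.0, "reasoning": 1.5}


def fuse(candidates: dict, threshold: float = 0.6) -> dict:
    """Merge per-lens guesses into one value per field.

    candidates maps lens name -> {field: value}. A field is committed only when
    the best value holds at least `threshold` of the weight cast for that field;
    otherwise it is returned as None. Fields no lens voted on are omitted.
    """
    votes = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for lens, fields in candidates.items():
        weight = LENS_WEIGHTS.get(lens, 1.0)
        for field, value in fields.items():
            if value is None:
                continue  # a lens that saw nothing for this field abstains
            votes[field][value] += weight
            totals[field] += weight
    merged = {}
    for field, tally in votes.items():
        value, score = max(tally.items(), key=lambda kv: kv[1])
        merged[field] = value if score / totals[field] >= threshold else None
    return merged
```

With this sketch, a field that every lens reads the same way is committed, while a field where the lenses split below the threshold comes back as `None` rather than as a guess.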
// merged answer — your shape
{
  "vendor": "Atlas Logistics GmbH",
  "invoice_no": "INV-2026-0418",
  "issued_at": "2026-04-12",
  "due_at": "2026-05-12",
  "total_eur": 4280.50
}
03 security
Built for the files you can’t afford to leak.
Extract runs as a managed EU cloud service with zero retention. Files and responses are deleted once we hand back your JSON, traffic is encrypted in transit (TLS) and while processing, and your data is never used to train a model.
// same engine also runs inside your own network.
arkintel cloud
Managed EU cloud
Hit /v1/extract. We handle everything else.
- zero retention — files and responses deleted after we hand back your JSON
- encrypted in transit (TLS) and while processing
- your data is never used to train models
- EU-hosted API and storage
- LLM steps may use vetted third-party hosted model providers
- per-tenant audit log on request
Want the longer argument? Read the sovereignty brief
04 the wire
One endpoint. Boring on purpose.
Multipart POST. The schema is a JSON form field, the files are file fields, the response is your data — validated, in your shape, in the same round trip. No clever protocol to learn.
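For callers who would rather not shell out, the same multipart request can be assembled with nothing but the Python standard library. This is a hedged sketch: the endpoint, the bearer header, and the `schema` and `files` field names come from the curl example on this page, while `build_request` and the `schema.json` part filename are hypothetical. The response format is not specified here, so the sketch stops at building the request.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.arkintel.com/v1/extract"  # endpoint from the curl example


def build_request(schema: dict, files: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the multipart POST.

    files maps filename -> raw bytes. Filenames are assumed to be plain ASCII
    without quotes; real code should escape them.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # The schema travels as a JSON part, mirroring curl's "schema=@...;type=application/json"
    parts.append(
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="schema"; filename="schema.json"\r\n'
            "Content-Type: application/json\r\n\r\n"
        ).encode()
        + json.dumps(schema).encode()
        + b"\r\n"
    )
    # Each document goes in its own "files" part
    for name, data in files.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="files"; filename="{name}"\r\n'
                "Content-Type: application/octet-stream\r\n\r\n"
            ).encode()
            + data
            + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. Reading the key from an `ARKINTEL_API_KEY` environment variable, as the curl example does, keeps it out of source code.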
// what you don’t have to do
- install an SDK
- wire up webhooks
- speak a streaming protocol
- presign upload URLs
- track temporary file IDs
- poll, batch, or coordinate jobs
// the whole integration fits on a postcard.
curl -X POST https://api.arkintel.com/v1/extract \
  -H "Authorization: Bearer $ARKINTEL_API_KEY" \
  -F 'schema=@invoice.schema.json;type=application/json' \
  -F "files=@invoice.pdf"
05 ship it
Ship your own schema in production.
The schemas in this playground are examples for testing. With production access you pass your own schema in every request — no whitelisting, no waiting on us. Drop us a line and we’ll get you a key.
// typical reply within one business day.