Convert PDF to JSON at scale with Mindee's API
Transform all your PDF files into JSON. Mindee's OCR API is built for document analysis, document understanding, and document processing.
Registration required
Create a free account to process your PDF.
Visual example before/after PDF processing
Advanced features you will benefit from
Object detection results
Automatically locate and identify specific elements, signatures, pictures, text fields, and bounding boxes within your documents. Eliminate manual data entry with high‑precision, AI-driven data capture.
AI OCR image analysis
Leverage advanced deep learning models to instantly "read" and convert unstructured images and scanned documents into highly accurate, structured, and actionable JSON data.
Batch processing support
Effortlessly scale your operations by processing large volumes of documents simultaneously. Utilize our asynchronous API endpoints to handle massive multi-page files or bulk uploads with maximum efficiency.
Custom schema mapping
Tailor the extraction process to your unique business needs. Easily define your own data models (using Mindee's docBuilder) to extract only the specific fields, line items, and values that your custom workflow requires.
Developer-friendly API
Achieve immediate time-to-value with seamless integration. Build faster using our comprehensive documentation, robust SDKs in major programming languages, and ready‑to‑use no‑code connectors.
4.8/5 on G2
(30+ reviews)
4.9/5 on Capterra
(10+ reviews)
How PDF to JSON conversion works
Effortlessly extract structured data from PDF documents
Mindee's developer-first API converts complex PDFs, whether "born-digital" files with embedded text layers or heavy, image-based scans, directly into a clean, structured JSON payload. When a multi-page PDF is submitted via the API or an SDK, the engine determines the optimal extraction path for it.
For scanned documents, it rasterizes each page into high-quality images before applying advanced preprocessing and optical character recognition (OCR). The AI then semantically analyzes the entire document context across multiple pages. In milliseconds, it returns a comprehensive JSON response containing precise key-value pairs, page-specific bounding box coordinates, and spatial awareness, making it incredibly simple to integrate complex, multi-page data into your databases, ERPs, and automated workflows.
Key considerations about PDF to JSON conversion
When converting PDFs to JSON, developers must account for the unique structural challenges of the Portable Document Format. A primary consideration is handling the variability between native vector PDFs and lower-quality rasterized scans, which can impact processing speed and extraction logic. You must also manage multi-page pagination, embedded fonts, complex tables, and file size limitations. While Mindee's robust deep learning models effortlessly parse these mixed-content layouts, ensuring enterprise-grade security remains paramount, as PDFs frequently encapsulate dense financial reports or personally identifiable information (PII).
Mindee guarantees full GDPR and SOC 2 compliance for all API calls. By utilizing the granular, field-level confidence scores embedded in our JSON output, you can seamlessly build smart routing rules to flag ambiguous pages or complex document structures for manual human review.
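The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Mindee's documented response schema: the field layout (a dict mapping each field name to a `value` and a `confidence`), the 0.85 threshold, and the function names are all assumptions made for the example.

```python
from typing import Any

# Hypothetical threshold; tune it against your own documents.
REVIEW_THRESHOLD = 0.85


def fields_needing_review(
    fields: dict[str, dict[str, Any]], threshold: float = REVIEW_THRESHOLD
) -> list[str]:
    """Return the names of fields whose confidence is missing or below threshold."""
    flagged = []
    for name, field in fields.items():
        confidence = field.get("confidence")
        if confidence is None or confidence < threshold:
            flagged.append(name)
    return flagged


def route(fields: dict[str, dict[str, Any]], threshold: float = REVIEW_THRESHOLD) -> str:
    """Send a document to manual review as soon as one field is uncertain."""
    if fields_needing_review(fields, threshold):
        return "manual_review"
    return "auto_accept"


# Assumed, simplified shape of the extracted fields.
sample = {
    "invoice_id": {"value": "INV-001", "confidence": 0.99},
    "total_amount": {"value": 120.5, "confidence": 0.62},
}

print(route(sample), fields_needing_review(sample))
```

Treating a missing confidence score as "needs review" is a deliberately cautious choice; a more permissive pipeline could skip such fields instead.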
Manage multi-page complexity
Don't forget enterprise security standards

FAQ to know more about Mindee's API
What file formats are supported for JSON conversion?
The supported document formats are:
- PDF (application/pdf) – While technically a document format rather than an image, it is natively supported for all extraction models.
- JPEG / JPG (image/jpeg)
- PNG (image/png) – Must be non-animated.
- WebP (image/webp)
- TIFF / TIF (image/tiff) – Both single-page and multi-page TIFFs are supported and processed similarly to PDFs.
- HEIC (image/heic) – Apple's High-Efficiency Image Container format.
📌 Important technical limitations to keep in mind:
To ensure the OCR pipeline runs smoothly and successfully returns your JSON, make sure your files adhere to the following constraints:
- Maximum file size: 100 MB per file.
- Maximum page count: Up to 200 pages per document.
- File state: The files cannot be encrypted, and PDFs must not be password-protected.
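A client-side pre-flight check can catch some of these problems before any upload. The sketch below covers only the file type and the 100 MB limit; checking the page count or detecting encryption would need a PDF parsing library and is left out. The function names are hypothetical, and reading "100 MB" as 100,000,000 bytes is a conservative assumption.

```python
from pathlib import Path

# Assumption: "100 MB" is read conservatively as 100 * 10^6 bytes.
MAX_FILE_SIZE_BYTES = 100 * 1000 * 1000

# Extensions mapped to the MIME types listed in the supported formats.
SUPPORTED_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".heic": "image/heic",
}


def preflight_problems(name: str, size_bytes: int) -> list[str]:
    """Return human-readable problems; an empty list means the file looks OK."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in SUPPORTED_TYPES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


def preflight_path(path: Path) -> list[str]:
    """Convenience wrapper that reads the size from disk."""
    return preflight_problems(path.name, path.stat().st_size)
```

Keeping the check as a pure function of name and size makes it easy to run on upload metadata as well as on local files.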
Is it possible to batch convert PDFs?
Yes, it is absolutely possible to batch process and convert PDFs with Mindee. The best approach depends on how your "batch" of images or documents is organized.
Here is how you can handle batch processing based on Mindee's architecture:
1. Multiple separate image files (e.g., a folder of JPGs or PNGs)
By default, Mindee's standard API endpoint processes one file per HTTP request. To batch convert a large folder of separate documents, you simply handle the orchestration on your client side:
- Concurrent API Calls: You can write a script (using Python, Node.js, etc.) to loop through your files and send multiple API requests concurrently.
- Asynchronous Endpoints: For enterprise-scale volumes, Mindee provides an asynchronous API (/predict_async). Instead of holding the connection open, you push your batch of PDFs into a processing queue and then use webhooks (or polling) to retrieve the structured JSON data as each file finishes processing.
2. Multiple documents clustered in a single file
If your "batch" is actually a single large file containing multiple different documents (like a 50-page PDF of mixed mail, or a single photograph containing four different receipts laid out on a table), Mindee has native AI tools specifically built for this:
- Auto-Crop: If you upload a single document that contains several distinct items, Mindee's Auto-crop feature can automatically detect, isolate, and crop each item into a clean, individual file ready for data extraction.
- Auto-Split: If you upload a large multi-page batch scan, the API's intelligent boundary detection detects where each individual document begins and ends, automatically slicing the massive file into discrete, logical records.
- Auto-Classify: Once the batch is separated, the routing engine categorizes each document by type (e.g., separating invoices from contracts) and sends it to the correct extraction pipeline.
By combining your own asynchronous loops for individual files with Mindee's built-in Crop, Split, and Classify features for grouped files, you can build a highly efficient, automated batch-processing pipeline!
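For the first case (many separate files), the client-side orchestration could look like the sketch below. It is deliberately independent of any SDK: `send_file` is a hypothetical function you supply that wraps whichever HTTP call or SDK method you use and returns the parsed JSON for one file. The worker count of 4 is an arbitrary assumption; keep it within your plan's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Any, Callable


def process_batch(
    paths: list[Path],
    send_file: Callable[[Path], Any],  # hypothetical: uploads one file, returns its JSON
    max_workers: int = 4,
) -> tuple[dict[Path, Any], dict[Path, Exception]]:
    """Send files concurrently and collect results and failures separately."""
    results: dict[Path, Any] = {}
    errors: dict[Path, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its file so failures can be attributed.
        futures = {pool.submit(send_file, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file must not stop the batch
                errors[path] = exc
    return results, errors
```

Returning failures alongside results lets you retry or manually inspect only the files that did not go through, instead of restarting the whole batch.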
Can I try Mindee before subscribing?
Yes, you can absolutely try the solution before subscribing. Mindee offers a 14-day free trial, and no credit card is required to sign up.
This free trial allows you to fully test the platform and includes:
- Processing for up to 200 pages.
- The ability to make API calls so you can test the technical integration directly with your own stack.
- Access to all model types (receipts, invoices, contracts, IDs, etc.) and the full documentation.
- Access to optional features so you can test specific use cases.
Once the 14-day trial period ends, or when you reach the 200-page limit, you will need to choose one of Mindee's subscription plans (Starter, Pro, Business, or Enterprise) that matches your expected processing volume to continue using the service.
Is my data secured and protected during the conversion process?
Yes, your data is secured and protected throughout the conversion process. As an enterprise-grade OCR solution, Mindee takes data privacy very seriously. Here are the key security measures in place:
- Certified compliance: Mindee's infrastructure is SOC 2 Type II certified and fully GDPR compliant, ensuring strict industry standards for data protection.
- Encryption: All documents and extracted data are fully encrypted both in transit (during the API call) and at rest.
- Data localization: You have control over where your data is processed, with the option to host and process your data on either EU or US servers.
- Strict privacy: Mindee respects data confidentiality. Your processed documents are not shared with third parties, and your private data is not used to train global AI models.
In short, your sensitive documents are processed in a secure, isolated environment and remain completely under your control.
How do I guarantee a valid, structured JSON format?
Getting JSON is step one; getting valid JSON is step two. Most modern APIs, like Mindee's, allow you to define a data schema. To keep malformed data out of your database:
- Use Pydantic (Python) or Zod (TypeScript) to validate the API output.
- If the extraction doesn't meet the schema (e.g., a missing mandatory invoice_id), flag it for human review.
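The two steps above can be sketched as follows. To stay dependency-free, this example uses a plain dataclass and manual checks as a stand-in for what Pydantic or Zod would do declaratively. The `total_amount` field and the flat JSON shape are assumptions for illustration, not Mindee's actual output format.

```python
import json
from dataclasses import dataclass
from typing import Optional


class SchemaError(ValueError):
    """Raised when the extracted data does not match the expected schema."""


@dataclass(frozen=True)
class Invoice:
    invoice_id: str      # mandatory, as in the example above
    total_amount: float  # hypothetical second field


def parse_invoice(raw_json: str) -> Invoice:
    """Parse and validate one extraction result, or raise SchemaError."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("expected a JSON object")
    invoice_id = data.get("invoice_id")
    if not isinstance(invoice_id, str) or not invoice_id.strip():
        raise SchemaError("missing mandatory field: invoice_id")
    total = data.get("total_amount")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise SchemaError("total_amount must be a number")
    return Invoice(invoice_id=invoice_id, total_amount=float(total))


def validate_or_flag(raw_json: str) -> tuple[Optional[Invoice], Optional[str]]:
    """Return (invoice, None) when valid, or (None, reason) to flag for human review."""
    try:
        return parse_invoice(raw_json), None
    except SchemaError as exc:
        return None, str(exc)
```

With Pydantic, the `Invoice` class would become a model and the manual checks would collapse into a single validation call; the flag-for-review branch stays the same.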
Can I integrate Mindee API with a business tool (ERP, CRM, etc.)?
Mindee’s API is RESTful and returns data in JSON format. XML is not returned.
To connect Mindee to your business tool you can use Mindee’s REST + JSON API via an ERP/CRM connector, HTTP steps, or webhooks. Most ERP/CRM tools integrate either through HTTP actions/nodes (low-code) or an automation platform.
If your ERP/CRM supports inbound webhooks, you can also use Mindee webhooks to receive results on your server endpoint (recommended for heavy production usage).
