Auto-splitting document API, for multi-page file processing

Seamless batch segmentation for streamlined workflows and high‑precision data capture

Enhance processing speed through intelligent boundary detection that isolates multi-page files into discrete records

Try it for free

4.8/5  (30+ reviews)

Trusted by top-tier teams worldwide

v2-Carlabella
v2-Spendesk
v2 Payfit
v2 Lucca
v2 Circula

Without

Auto-split

Generalist LLMs struggle with structural speed, requiring heavy resource overhead to deliver accurate splitting results

Prone to missing boundaries in long files

You pay for thousands of tokens just to find a "page break"

Difficult to admit when it's "unsure" of a type

Trained on the open web; lacks deep "document-type" nuance

With

Auto-split

+95%

Document-level split accuracy

Handles 1-page receipts and 50-page contracts.

Scans and slices a 100-page batch in milliseconds

Built-in metrics to trigger human review only when needed

Trained on millions of real-world business documents

Add “Split” to your document workflow in seconds

Available for every plan

From Mindee’s platform, create a new pre‑processing model by clicking on the “Split” utility

You will find it at the bottom of the user interface. If you are already familiar with this type of pre-processing model, you can use it directly; see the Documentation for more details.

User interface showing document templates including Invoice, Receipt, Resume, Financial Document, and International ID, with document utilities below: Crop, Split (highlighted by cursor), and OCR.
User interface for adding document classes with fields labeled Invoice, Receipt, Driving License, and Other, plus buttons to cancel or create utility.

Custom to your needs

Enter the document categories that correspond to your needs

Before final pre-processing, you need to define the appropriate categories. Be sure to manually add an “Undefined” category. If a file doesn’t match any of your main document categories, it will be placed in the “Undefined” one.

PDF, HEIC, PNG, JPEG... Multiple formats

Upload your documents without friction: universal PDF and image support

Accelerate ingestion with native support for PDFs and all image formats. From high-res scans to mobile captures, Mindee API handles any input, ensuring your data is always ready for extraction.

User interface displaying document classification results with thumbnails on the left and a JSON response on the right detailing page ranges and document types like receipts, driving licenses, and invoices.

Full document processing stack

Find all your files categorized in standard JSON format, ready for extraction based on categories

Pre-processing via auto-split can then be combined with other Mindee API features to further improve granularity, or to directly extract data based on each category classification.
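As a rough illustration of what "ready for extraction based on categories" can look like in practice, the sketch below groups split segments by category so each group can be sent to its own extraction step. The JSON shape used here (`splits`, `document_type`, `page_range`) is an assumption modeled on the screenshot description above, not Mindee's documented schema; check the Documentation for the real field names.

```python
import json
from collections import defaultdict

# Hypothetical response shape; the real field names may differ.
SAMPLE_RESPONSE = """
{
  "splits": [
    {"document_type": "receipt", "page_range": [0, 0]},
    {"document_type": "driving_license", "page_range": [1, 2]},
    {"document_type": "invoice", "page_range": [3, 5]},
    {"document_type": "receipt", "page_range": [6, 6]}
  ]
}
"""


def group_by_category(raw_json: str) -> dict[str, list[tuple[int, int]]]:
    """Map each category to the inclusive page ranges assigned to it."""
    grouped = defaultdict(list)
    for segment in json.loads(raw_json)["splits"]:
        start, end = segment["page_range"]
        grouped[segment["document_type"]].append((start, end))
    return dict(grouped)


ranges_by_category = group_by_category(SAMPLE_RESPONSE)
for category, ranges in ranges_by_category.items():
    print(category, ranges)
```

With a mapping like this, each category can be routed to a dedicated extraction model, while anything filed under an “Undefined” category can be set aside for manual review.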

Use auto-split and more to optimize your document workflow

1

Capture

2

Pre-processing

3

Data extraction

4

Enrichment

5

Validation

Top view of a coffee cup, pen, manila folder with envelopes and sticky notes, and IRS tax forms on a dark surface.

Smart capture, from poor-quality phone pictures and handwritten notes to native PDFs

Bridge the gap between noisy inputs and structured data. Mindee API cleans low-quality phone captures, analyzes handwriting, and isolates multiple documents on a single page or picture.

Older man with gray hair and beard reviewing a large stack of papers at a desk under red text that reads 'X TIME-CONSUMING'.

AI-powered classification that identifies document "DNA" (Invoices vs. Contracts) and automates batch splitting

Manual document sorting is a bottleneck of the past. Our routing engine acts as a digital architect, instantly classifying documents and directing them to the correct business logic.

User interface showing extracted fields from a supplier document including supplier logo, name Joanna Binet, line items with quantity 2 and unit price 400, and SWIFT code 1293290221079 with confidence levels.

Extract data from any layout with outstanding accuracy: complex tables, key-value pairs, and handwritten annotations supported

Move beyond simple character recognition. Our extraction layer leverages neural networks to understand your data contextually, turning static unstructured files into dynamic, structured assets in standard JSON format.

Logos of software platforms Sage, Salesforce, Odoo, Oracle, Sellsy, HubSpot, SAP, and Microsoft Dynamics 365 above two labeled blocks 'SDKs' and 'NO-CODE' with arrows pointing to 'mindee' logo at the bottom.

Real-time synchronization with ERP/CRM master data and automated third-party API validation (VAT, Compliance)

Data in a vacuum has limited utility. The "Enrich" phase bridges the gap between a document and your entire enterprise ecosystem (ERP, CRM, PLM) thanks to integrations.

Flowchart showing payment validation steps: if certainty is certain or high, validate payment; if medium, trigger human review.

Automated business rule validation and high-efficiency Human-in-the-Loop workflows for edge-case validation.

Go beyond simple extraction. Build resilient document pipelines that automatically verify data against your custom business rules. Our API manages the friction between automated confidence scores and human edge-case validation, ensuring your production data is always clean, compliant, and actionable.
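The validation flow described above (validate when confidence is Certain or High, trigger human review when it is Medium) can be sketched as a small routing function. This is a minimal illustration, not Mindee's implementation: the `Confidence` enum, the `route` function, and the choice to send Low or missing confidence to human review are all assumptions made for the example.

```python
from enum import Enum


class Confidence(Enum):
    # Hypothetical confidence levels, named after the flowchart labels.
    CERTAIN = "certain"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Levels that are allowed to skip human review.
AUTO_VALIDATE = {Confidence.CERTAIN, Confidence.HIGH}


def route(field_confidences: dict[str, Confidence]) -> str:
    """Return "validate" only if every field is Certain or High."""
    if not field_confidences:
        # Nothing was extracted, so there is nothing safe to auto-validate.
        return "human_review"
    if all(level in AUTO_VALIDATE for level in field_confidences.values()):
        return "validate"
    return "human_review"


print(route({"total_amount": Confidence.HIGH, "swift": Confidence.CERTAIN}))
print(route({"total_amount": Confidence.HIGH, "swift": Confidence.MEDIUM}))
```

Routing on the weakest field keeps the rule conservative: a single uncertain value is enough to put the whole document in front of a reviewer.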

Puzzle pieces displaying programming language logos including Ruby, Node.js, Python, Java, and PHP, with text below reading 'Also available on' followed by logos for Zapier, Make, and n8n.

Integrate Mindee into your workflow in minutes with SDKs & no-code tools

Go live in minutes using our verified Zapier and Make.com apps with zero coding, or integrate seamlessly via our well-documented REST API built for developers. SDKs are available for Python, Node.js, Java, Ruby, and PHP.

Integrations details

security soc2 and gdpr

Enterprise-grade security

Our API runs on SOC 2 Type II certified infrastructure and is GDPR compliant, so your file information remains protected at all times.

EU or US hosting available

GDPR, CCPA Compliant

Learn more

Developers and technical teams already use it!

Add modern AI-based Mindee OCR API to your product, in minutes.

Mindee is an integrated document processing platform backed by reliable AI technology. The service has an intuitive and user-friendly interface and provides highly accurate results extracting data from various document types, especially financial receipts and invoices, which are relatively complex and require specialized optical character recognition (OCR) services. The platform provides seamless integration with our current data processing workflows through customizable APIs, allowing for efficient data extraction and automation.

quote

on G2

Mindee is a software that helps us to convert all of our physical business data like bills, invoices, warranty cards, calendar, recipts received to us into a digital documents that can be stored in our drive and can be uploaded in different type of Excel sheets so that all the updates can be maintained and a proper analytics of transactions can be kept by the financial team

quote

on G2

Mindee is a web based tool that help us in scanning and reading different type of documents like identity cards, invoices, proposal plans etc and extract all the information with its AI and then it provides all the information and data associated with these documents a structured way.

quote

on G2

Excellent. In addition to their great product, the sales team has always been proactive on how they could help us leverage the maximum results from their product. It was like having an additional product manager on our side

quote

on Capterra

Mindee works reliably and delivers good performance. The OCR data is accurate, and the API is stable. It works like a charm.

quote

on Capterra


+15M documents processed monthly
Start auto-splitting files and extracting data

Already +500 active users

14-day free trial

No credit card

Screenshot of a software interface showing extracted fields from an invoice including supplier phone number, customer company registration, JSON data, and highlighted text boxes for employee ID and pay date.

FAQ: learn more about Mindee's API

What is automated document splitting?

Automated document splitting is a pre-processing technology that analyzes multi-page file uploads (like a 50‑page PDF) and automatically breaks them down into separate, logical documents. Instead of a human manually reviewing a file to see where one invoice ends and another begins, the AI detects document boundaries—based on layout changes, page numbering, or content shifts—to split the batch into distinct, standalone records ready for data extraction.

What are examples of automated document splitting?

In the real-world operational landscape, auto-splitting is a game-changer for accounts payable departments that frequently receive bulk PDF attachments containing dozens of different invoices and credit notes from a single vendor that must be processed as individual records.

It is equally essential for two-way matching and reconciliation workflows, where a single scan might bundle a purchase order (PO) with its corresponding delivery note; the API identifies the boundary between these two distinct records so they can be cross-referenced automatically for audit purposes.

For customer onboarding, this technology allows a new client to upload a single "onboarding packet" containing their ID, a utility bill, and a signed contract, which the system then splits and routes to specialized extraction models for instant verification.

Similarly, in vehicle fleet management, auto-splitting enables the seamless digitization of maintenance folders where insurance certificates, logbooks, and repair invoices are often scanned together, ensuring each document is correctly identified and filed under the right vehicle asset without any manual sorting.

You can find more real-life examples of how companies leverage this technology in our customer stories.

How does automated document splitting work?

Mindee’s splitting solution uses a multi-layered approach to detect document boundaries accurately.

  • Visual continuity analysis: The AI looks for visual cues, such as consistent headers, logos, or page footers (e.g., "Page 1 of 3").
  • Logical boundary detection: It identifies "breaking points," such as a new invoice number, a different date, or a sudden change in document layout (e.g., shifting from a legal contract to a utility bill).
  • Batch decomposition: Once the boundaries are confirmed, the system "cuts" the multi-page file into separate digital records.
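To make the idea of boundary detection concrete, here is a deliberately naive sketch that relies on a single cue from the list above: page footers such as "Page 1 of 3". It is not Mindee's algorithm, which the text describes as combining visual and logical signals; the `find_boundaries` function and its rules are hypothetical and only show why one cue alone is fragile.

```python
import re

# Matches footers such as "Page 2 of 3" (case-insensitive).
PAGE_FOOTER = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def find_boundaries(page_texts: list[str]) -> list[int]:
    """Return the indices of pages that start a new document.

    A page starts a new document when it is the first page of the batch,
    when its footer says "Page 1 of N", or when it has no footer at all
    (treated here as a standalone single-page document).
    """
    starts = []
    for index, text in enumerate(page_texts):
        match = PAGE_FOOTER.search(text)
        if index == 0 or match is None or int(match.group(1)) == 1:
            starts.append(index)
    return starts


pages = [
    "Invoice INV-001 ... Page 1 of 2",
    "Totals ... Page 2 of 2",
    "Receipt, thank you for your purchase",
    "Service agreement ... Page 1 of 3",
    "Clause 4 ... Page 2 of 3",
    "Signatures ... Page 3 of 3",
]
print(find_boundaries(pages))  # [0, 2, 3]
```

A rule like this breaks as soon as footers are missing, inconsistent, or misread by OCR, which is why a production splitter needs to weigh several signals rather than one.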

How do I auto-split a large file containing multiple documents at once?

Manually splitting PDFs is a massive productivity killer. To automate this, you should implement an API with native splitting capabilities rather than trying to build custom logic in Python.

Mindee’s auto-splitting feature handles this within the API call itself.

When you upload a batch file, the solution detects the record boundaries and provides a structured output of separated documents. This allows developers to build "one-click" upload features where users can drop an entire day's worth of paperwork into a single box, and the system handles the sorting, splitting, and extraction in the background.
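On the consuming side, a background job might turn the returned boundaries into per-document page lists before extraction. The sketch below assumes the split result has already been parsed into `Segment` objects with inclusive page indices; `Segment` and `slice_batch` are hypothetical names for this example and are not part of any Mindee SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    category: str
    start: int  # first page index, inclusive
    end: int    # last page index, inclusive


def slice_batch(pages: list[str], segments: list[Segment]) -> list[tuple[str, list[str]]]:
    """Cut a batch into per-document page lists, rejecting gaps and overlaps."""
    expected_start = 0
    documents = []
    for segment in sorted(segments, key=lambda s: s.start):
        if segment.start != expected_start or segment.end < segment.start:
            raise ValueError(f"gap or overlap at page {segment.start}")
        documents.append((segment.category, pages[segment.start:segment.end + 1]))
        expected_start = segment.end + 1
    if expected_start != len(pages):
        raise ValueError("segments do not cover the whole batch")
    return documents


batch = [f"page-{i}" for i in range(5)]
segments = [
    Segment("invoice", 0, 1),
    Segment("credit_note", 2, 2),
    Segment("invoice", 3, 4),
]
documents = slice_batch(batch, segments)
for category, doc_pages in documents:
    print(category, doc_pages)
```

Checking that the segments cover every page exactly once is a cheap safeguard: a page that is silently dropped or duplicated is much harder to notice after the documents have been filed separately.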