Unlock Your Desired Data with Custom Data Field Extraction

Streamline your document workflows with accurate data extraction models. Little to no human intervention required.
Up to 99% information extraction accuracy
Retain full control over your data capture process

Trusted by 1000+ brands worldwide

Start with an online demo or contact us.

Generate Useful Data with Custom Data Field Capture

Extract any information for your business’s needs. Define which data fields you want to capture with AI-powered OCR.  
Customize data fields
Extract solely the information you need and reduce document processing times
Personalize new models
Create new models and personalize the data fields you want to extract
Continuous improvement
Add new data fields to an existing model to enhance its performance

High extraction accuracy in an instant
Enjoy up to 99% data extraction accuracy by regularly feeding data to the model
Full control over data field capture
Be in control of the data used to train the models and the extraction output
Data extraction from various documents
Unlock relevant data from a multitude of document types with a single extraction model

How To Train your Data Extraction Model with Klippa

Don’t let document processing overwhelm you. Follow these five simple steps to train your data extraction models for document automation.

Define data fields

The first step is to determine the data fields you want the model to extract. 

Based on the documents you work with, these specific data fields could be:
Document numbers
Names
Dates
Amounts
Many more…

Gather and annotate data

After defining which data fields you’d like to have extracted, our team prepares for the model training. 

Before starting, you provide us with a large amount of data. Depending on your use case and document type, the required amount could be at least 500 documents or samples. Upon receiving the data, we start the annotation process. 

Data annotation is the process of labeling text, and even images, into defined categories. This procedure is very important because it influences the accuracy of the data extraction models.

You have the option to annotate the data with your own team or you can leverage Klippa’s in-house experts to do that for you.

Upload annotated data

After annotating the data, we define categories for the data fields. Based on these, we create the dataset and upload it to the model.

To ensure the model is properly trained, we use the 80/20 rule. Let’s say you want to extract the “invoice number” field. From 100 invoices, we train the model on 80 of those invoices and hold back the remaining 20 for benchmarking.
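As an illustration of the 80/20 rule, here is a minimal Python sketch of such a split. The file names, the `split_dataset` helper, and the fixed random seed are assumptions made for this example; they do not describe Klippa's internal tooling.

```python
import random

def split_dataset(documents, train_fraction=0.8, seed=42):
    """Shuffle annotated documents, then split them into a training set
    and a benchmark set according to train_fraction."""
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for 100 annotated invoices.
invoices = [f"invoice_{i:03d}.pdf" for i in range(100)]
train_set, benchmark_set = split_dataset(invoices)
print(len(train_set), len(benchmark_set))  # 80 20
```

Shuffling before the cut keeps the benchmark set from being biased by the order in which the documents were collected.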

Because the model is trained on a specific document type, such as an invoice, it can swiftly recognize the invoice number when data extraction is needed.

Data extraction

After the model has been trained on the document type/field of your choice, it is time to extract the data. Our AI-powered OCR software reads the document, then identifies and extracts the data fields that were defined in the beginning.

The more data you feed into the model, the better it becomes at recognizing and extracting relevant data fields through machine learning. This process increases the model’s accuracy and recall.

Evaluate model performance

To evaluate the performance of the model, we upload a test set that the model has never seen before. This is the 20% of the annotated data that was held back for benchmarking.
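To make the benchmarking idea concrete, the sketch below compares extracted values against the annotations of a held-back test set and reports the share that match. The `field_accuracy` helper and the sample records are hypothetical and only illustrate the principle.

```python
def field_accuracy(predictions, ground_truth, field):
    """Share of benchmark documents where the extracted value for `field`
    exactly matches the annotated value."""
    correct = sum(
        1
        for pred, truth in zip(predictions, ground_truth)
        if pred.get(field) == truth.get(field)
    )
    return correct / len(ground_truth)

# Hypothetical annotations and model output for four held-back invoices.
truth = [
    {"invoice_number": "INV-001"},
    {"invoice_number": "INV-002"},
    {"invoice_number": "INV-003"},
    {"invoice_number": "INV-004"},
]
preds = [
    {"invoice_number": "INV-001"},
    {"invoice_number": "INV-002"},
    {"invoice_number": "INV-008"},  # wrong value
    {},                             # field not found
]
accuracy = field_accuracy(preds, truth, "invoice_number")
print(accuracy)  # 0.5
```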

With the confidence score defined in the data extraction output (JSON), you can determine if the model is at your desired level or needs further training.
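One way to act on the confidence score is to flag every field that falls below a threshold you choose. The Python sketch below assumes a JSON layout with `fields`, `name`, `value`, and `confidence` keys; that layout and the 0.9 threshold are illustrative assumptions, not the documented output format.

```python
import json

# Hypothetical extraction output; the real JSON structure may differ.
response_text = """
{
  "fields": [
    {"name": "invoice_number", "value": "INV-2024-001", "confidence": 0.98},
    {"name": "total_amount", "value": "149.95", "confidence": 0.71}
  ]
}
"""

def fields_needing_review(payload, threshold=0.9):
    """Return the names of fields whose confidence is below the threshold."""
    return [f["name"] for f in payload["fields"] if f["confidence"] < threshold]

payload = json.loads(response_text)
low_confidence = fields_needing_review(payload)
print(low_confidence)  # ['total_amount']
```

If many fields end up below the threshold on the test set, that is a signal the model may need further training data for those fields.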

Lastly, we go over the iterations and select the model that performs the best for your specific use case.

Data Fields You Can Extract with Klippa

With our data extraction software, you can already capture the following data fields, out-of-the-box:
Date of Birth (DOB)
Document Numbers
Income
Employment Date
Insurance Numbers
Total Amounts
Merchant Names
And Many More

Enjoy the Benefits of Custom Data Field Extraction

Reduce Cost
Spend less money on manually extracting data with custom data capture.
Improve Speed
Shorten the turnaround time of your document-related processes.
Prevent Fraud
Easily recognize incorrect business information on documents.
Minimize Errors
Prevent manual data verification errors with a reliable custom data field extraction solution.

We Take Your Data Privacy & Security Seriously

“It is extremely pleasant to work together with a party that is as ambitious as we are. The willingness and speed with which Klippa implemented specific modifications for us is impressive.”
Leon Backbier
IT Manager, Banijay Benelux
Get Started Now!
Let Klippa’s experts show you how you can customize data field extraction within seconds using intelligent automation.

Frequently Asked Questions

What is custom data field extraction?

Custom data field extraction is the process of extracting and processing only a particular data field from a specific document. This data field is predetermined before the extraction process.

The data field extraction process is simple and straightforward:

First, you send your documents to our OCR API which does data extraction. Second, Klippa converts the documents to TXT format, classifies the document type, extracts the relevant data fields, and then converts the extracted data to JSON format. 

And there you have it! You get your data back in a machine-readable format, ready for further processing, within seconds.
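The stages described above (text conversion, classification, field extraction, JSON output) can be sketched in a few lines of Python. This is a toy illustration only: the keyword classifier and regular expressions stand in for trained models, and every function name and pattern here is an assumption rather than part of Klippa's API.

```python
import json
import re

def classify(text):
    """Toy classifier: a keyword check standing in for a trained model."""
    return "invoice" if "invoice" in text.lower() else "unknown"

# Hypothetical patterns for the data fields defined up front.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice number:\s*(\S+)"),
    "total_amount": re.compile(r"Total:\s*([\d.]+)"),
}

def extract_fields(text):
    """Return only the predefined fields that were found in the text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields

def process(text):
    """Classify the document, extract its fields, and serialize to JSON."""
    return json.dumps(
        {"document_type": classify(text), "fields": extract_fields(text)}
    )

# Text as it might look after the document has been converted to TXT.
sample_text = "Invoice number: INV-001\nTotal: 149.95\n"
result = process(sample_text)
print(result)
```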

Which documents can I process?

With Klippa’s custom data field extraction solution, you can extract a large variety of data fields from multiple documents, including but not limited to:

How can DocHorizon be implemented in my systems?

Our API will be useful for you if you want to build your own information extraction and archiving pipeline and connect it with your existing software systems.  

Our SDK, on the other hand, enables you to turn your mobile devices into data capture devices with the ability to digitize documents and successfully archive them in a digital environment. This is useful if you want to add data classification and sorting features to your existing or soon-to-be-released mobile app.

If you’re not yet sure which choice fits you best, take a look at our documentation for more information. 

How about data privacy and security?

By default, Klippa does not store any customer data. Data is always processed under a data processing agreement (DPA), and all Klippa services are compliant with GDPR.

All data transfer is done via secure SSL connections. Our servers are ISO certified and by default are located in Amsterdam, the Netherlands. A custom server in a location of your choice is possible anywhere in the world.

Our security is regularly tested through third-party penetration testing to ensure state-of-the-art security at all times.

Which industries use custom data field extraction?

Klippa’s custom data field extraction can be integrated in multiple industries, especially where the processed documents present certain particularities. These particularities can be a specific document number, logo or address.

With a model specifically trained to extract custom data fields from invoices, bank statements, or salary slips, for instance, Klippa can be of great help for:
Financial and banking sector
Marketing and retail sector
Logistics and transport
Legal and public sector