Unlock Your Desired Data with Custom Data Field Extraction

Streamline your document workflows with accurate data extraction models. Little to no human intervention required.

Up to 99% information extraction accuracy

Retain full control over your data capture process

Trusted by 1000+ brands worldwide

Start with an online demo or contact us.

Generate Useful Data with Custom Data Field Capture

Extract any information for your business’s needs. Define which data fields you want to capture with AI-powered OCR.

Customize data fields

Extract solely the information you need and reduce document processing times

Personalize new models

Create new models and personalize the data fields you want to extract

Continuous improvement

Add new data fields to an existing model to enhance its performance

High extraction accuracy in an instant

Enjoy up to 99% data extraction accuracy by regularly feeding data to the model

Full control over data field capture

Be in control of the data used to train the models and the extraction output

Data extraction from various documents

Unlock relevant data from a multitude of document types with a single extraction model

How To Train Your Data Extraction Model with Klippa

Don’t let document processing overwhelm you. Follow these five simple steps to train your data extraction models for document automation.


Define data fields

The first step is to determine the data fields you want the model to extract.

Depending on the documents you work with, these data fields could include:

Document numbers

Names

Dates

Amounts

Many more…

Gather and annotate data

After defining which data fields you’d like to have extracted, our team prepares for the model training.

Before training starts, you provide us with a large amount of data. Depending on your use case and document type, at least 500 documents or samples may be required. Once we receive the data, we start the annotation process.

Data annotation is the process of labeling text, and even images, with predefined categories. This step is very important because it directly influences the accuracy of the data extraction models.

You have the option to annotate the data with your own team or you can leverage Klippa’s in-house experts to do that for you.
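To make annotation more concrete, a single annotated sample can be thought of as the document text plus labeled character spans, one per data field. The record layout below (`Annotation`, `start`/`end` offsets, the field names and the sample invoice) is a hypothetical illustration, not Klippa’s actual annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One labeled span: a data field name plus character offsets into the text (hypothetical format)."""
    field: str
    start: int  # inclusive offset
    end: int    # exclusive offset


def labeled_values(text: str, annotations: list[Annotation]) -> dict[str, str]:
    """Resolve each annotation to the text it covers, rejecting spans outside the document."""
    values = {}
    for ann in annotations:
        if not 0 <= ann.start < ann.end <= len(text):
            raise ValueError(f"span for {ann.field!r} is outside the document")
        values[ann.field] = text[ann.start:ann.end]
    return values


# Hypothetical invoice text with two annotated fields.
text = "Invoice number: INV-2024-001\nTotal amount: 149.95 EUR"
annotations = [
    Annotation("invoice_number", 16, 28),
    Annotation("total_amount", 43, 49),
]
print(labeled_values(text, annotations))
# {'invoice_number': 'INV-2024-001', 'total_amount': '149.95'}
```

Checking that every span actually resolves to text is a cheap sanity check: badly placed labels are one of the ways poor annotation quality ends up hurting extraction accuracy.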

Upload annotated data

After annotating the data, we define categories for the data fields. Based on these, we create the dataset and upload it to the model.

To ensure the model is properly trained and tested, we use an 80/20 split. Let’s say you want to extract the “invoice number” field. From 100 invoices, we train the model on 80 and keep the remaining 20 for benchmarking.

This way, the model is trained on a specific document type, such as invoices, so that it can swiftly recognize the invoice number when data extraction is performed.
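The 80/20 split can be sketched in a few lines of Python. Shuffling with a fixed seed before splitting is an assumption made here so that the split is random but reproducible; the function name `split_80_20` and the file names are hypothetical.

```python
import random


def split_80_20(samples, seed=42):
    """Shuffle a copy of the samples and split it into 80% training / 20% benchmark sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split on every run
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


invoices = [f"invoice_{i:03d}.pdf" for i in range(100)]
train, benchmark = split_80_20(invoices)
print(len(train), len(benchmark))  # 80 20
```

Shuffling first matters: if the invoices were sorted by date or supplier, taking the last 20 without shuffling would give a benchmark set that is not representative of the training data.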

Data extraction

After the model has been trained on the document type/field of your choice, it is time to extract the data. Our AI-powered OCR software reads the document, then identifies and extracts the data fields that were defined in the beginning.

The more data you feed into the model, the better it becomes at recognizing and extracting relevant data fields, thanks to machine learning and AI. This increases the model’s accuracy and recall.

Evaluate model performance

To evaluate the model’s performance, we run it on a test set that it has never seen before: the 20% of the annotated data that was held out for benchmarking.
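One common way to score a model on such a held-out set is field-level precision and recall: each predicted field value is compared with the annotated value. The matching rule below (exact string match, with a wrong value counted as both a false positive and a false negative) is an assumption for illustration; how Klippa scores its own models is not specified here.

```python
def precision_recall(predictions, ground_truth):
    """Field-level precision and recall over a test set.

    predictions / ground_truth: lists of dicts (field name -> value), aligned by document.
    """
    tp = fp = fn = 0
    for pred, gold in zip(predictions, ground_truth):
        for field in set(pred) | set(gold):
            p, g = pred.get(field), gold.get(field)
            if p is not None and p == g:
                tp += 1  # correct value extracted
            else:
                if p is not None:
                    fp += 1  # value extracted, but wrong or not annotated
                if g is not None:
                    fn += 1  # annotated value missed or extracted incorrectly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


gold = [{"invoice_number": "INV-001", "total": "10.00"},
        {"invoice_number": "INV-002", "total": "25.50"}]
pred = [{"invoice_number": "INV-001", "total": "10.00"},
        {"invoice_number": "INV-002"}]  # the model missed one total amount
print(precision_recall(pred, gold))  # (1.0, 0.75)
```

Precision answers “how often is an extracted value correct?”, while recall answers “how many of the annotated values did the model find?”; a model can score well on one and poorly on the other, which is why both are tracked.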

Using the confidence scores included in the JSON extraction output, you can determine whether the model performs at your desired level or needs further training.
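For example, you might flag every field whose confidence score falls below a threshold you choose. The JSON layout below (a `fields` list with `name`, `value` and `confidence` keys) and the 0.90 threshold are hypothetical; the actual output schema is described in the API documentation.

```python
import json

# Hypothetical extraction output; the real JSON schema may differ.
raw_output = """
{
  "document_type": "invoice",
  "fields": [
    {"name": "invoice_number", "value": "INV-2024-001", "confidence": 0.98},
    {"name": "invoice_date", "value": "2024-03-15", "confidence": 0.95},
    {"name": "total_amount", "value": "149.95", "confidence": 0.71}
  ]
}
"""


def low_confidence_fields(output: str, threshold: float = 0.90) -> list[str]:
    """Return the names of fields whose confidence score is below the threshold."""
    data = json.loads(output)
    return [f["name"] for f in data["fields"] if f["confidence"] < threshold]


print(low_confidence_fields(raw_output))  # ['total_amount']
```

If low-confidence fields keep appearing for the same field type across the test set, that is a signal the model needs more annotated examples of that field.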

Finally, we compare the training iterations and select the model that performs best for your specific use case.

Data Fields You Can Extract with Klippa

With our data extraction software, you can capture the following data fields out of the box:

Addresses

Date of Birth (DOB)

Tax ID Numbers

Chamber of Commerce ID

Social Security Numbers

Document Numbers

Income

Employment Date

Insurance Numbers

Total Amounts

Merchant Names

And Many More


Enjoy the Benefits of Custom Data Field Extraction

Reduce Cost

Spend less money on manually extracting data with custom data capture.

Improve Speed

Shorten the turnaround time of your document-related processes.

Prevent Fraud

Easily recognize incorrect business information on documents.

Minimize Errors

Prevent manual data verification errors with a reliable custom data field extraction solution.

We Take Your Data Privacy & Security Seriously

“It is extremely pleasant to work together with a party that is as ambitious as we are. The willingness and speed with which Klippa implemented specific modifications for us is impressive.”

Leon Backbier

IT Manager, Banijay Benelux

Get Started Now!

Let Klippa’s experts show you how you can customize data field extraction within seconds using intelligent automation.

Frequently Asked Questions


What is custom data field extraction?

Custom data field extraction is the process of extracting and processing only particular data fields from a specific document. These data fields are defined before the extraction process starts.

The process itself is simple and straightforward:

First, you send your documents to our OCR API. Klippa then converts the documents to TXT format, classifies the document type, extracts the relevant data fields, and converts the extracted data to JSON format.

And there you have it! You get your data back in a machine-readable format, ready for further processing, within seconds.
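The classify, extract and convert-to-JSON steps can be illustrated with a toy pipeline that starts from text already produced by OCR. Keyword matching and regular expressions stand in for Klippa’s trained classification and extraction models, and every name here (`FIELD_PATTERNS`, `classify`, `extract`) is hypothetical; this is a sketch of the idea, not Klippa’s implementation.

```python
import json
import re

# Hypothetical predefined fields per document type: field name -> regex with one capture group.
FIELD_PATTERNS = {
    "invoice": {
        "invoice_number": r"Invoice number:\s*(\S+)",
        "total_amount": r"Total amount:\s*([\d.,]+)",
    },
    "salary_slip": {
        "employee_name": r"Employee:\s*(.+)",
        "net_income": r"Net pay:\s*([\d.,]+)",
    },
}


def classify(text: str) -> str:
    """Toy keyword classifier standing in for a trained document classifier."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "net pay" in lowered:
        return "salary_slip"
    return "unknown"


def extract(text: str) -> str:
    """Classify the text, extract only the predefined fields, and return them as JSON."""
    doc_type = classify(text)
    fields = {}
    for name, pattern in FIELD_PATTERNS.get(doc_type, {}).items():
        match = re.search(pattern, text)
        fields[name] = match.group(1).strip() if match else None
    return json.dumps({"document_type": doc_type, "fields": fields})


ocr_text = "Invoice number: INV-2024-001\nTotal amount: 149.95 EUR"
print(extract(ocr_text))
# {"document_type": "invoice", "fields": {"invoice_number": "INV-2024-001", "total_amount": "149.95"}}
```

Note how the fields are defined up front per document type: only those fields are extracted, which mirrors the “predetermined data fields” idea behind custom extraction.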

Which documents can I process?

With Klippa’s custom data field extraction solution, you can extract a large variety of data fields from many document types, including but not limited to:

Retail documents

How can DocHorizon be implemented in my systems?

Our API will be useful for you if you want to build your own information extraction and archiving pipeline and connect it with your existing software systems.

Our SDK, on the other hand, enables you to turn your mobile devices into data capture devices with the ability to digitize documents and successfully archive them in a digital environment. This is useful if you want to add data classification and sorting features to your existing or soon-to-be-released mobile app.

If you’re not yet sure which choice fits you best, take a look at our documentation for more information.

How about data privacy and security?

By default, Klippa does not store any customer data. Data is always processed under a data processing agreement (DPA), and all Klippa services are GDPR-compliant.

All data is transferred via secure SSL connections. Our servers are ISO certified and are located in Amsterdam, the Netherlands, by default. Custom servers can be set up in any location worldwide.

Our security is regularly tested through third-party penetration testing to ensure state-of-the-art security at all times.

Which industries use custom data field extraction?

Klippa’s custom data field extraction can be integrated across multiple industries, especially where the processed documents have specific characteristics, such as a particular document number, logo, or address.

With a model specifically trained to extract custom data fields from invoices, bank statements, or salary slips, for instance, Klippa can be of great help for:

Financial and banking sector

Marketing and retail sector

Logistics and transport

Legal and public sector