Unlock Your Desired Data with Custom Data Field Extraction
Streamline your document workflows with accurate data extraction models. Little to no human intervention required.
Trusted by 1000+ brands worldwide
Start with an online demo or contact us.
Generate Useful Data with Custom Data Field Capture
Extract any information for your business’s needs. Define which data fields you want to capture with AI-powered OCR.
How To Train your Data Extraction Model with Klippa
Don’t let document processing overwhelm you. Follow these five simple steps to train your data extraction models for document automation.
Define data fields
The first step is to determine the data fields you want the model to extract.
Based on the documents you work with, these specific data fields could be:
Gather and annotate data
After defining which data fields you’d like to have extracted, our team prepares for the model training.
Before starting, you provide us with a large amount of data. Depending on your use case and document type, the required amount could be at least 500 documents or samples. Upon receiving the data, we start the annotation process.
Data annotation is the process of labeling text, and even images, into defined categories. This procedure is very important because it influences the accuracy of the data extraction models.
You have the option to annotate the data with your own team or you can leverage Klippa’s in-house experts to do that for you.
Upload annotated data
After annotating the data, we define categories for the data fields. Based on these, we create the dataset and upload it to the model.
To ensure the model is properly trained, we use the 80/20 rule. Let’s say you want to extract the “invoice number” field. From 100 invoices, we train the model on 80 of those invoices and keep the last 20 for benchmarking.
The model is therefore trained on a specific document type, such as an invoice. When data extraction needs to be done, it can then swiftly recognize the invoice number.
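As a rough illustration of the 80/20 rule, the split could be sketched as follows. This is a minimal example, not Klippa's actual tooling: the function name `split_train_test`, the file names and the fixed random seed are all assumptions made for the illustration. It also shuffles the documents before splitting, which is a common practice but not something the steps above prescribe.

```python
import random


def split_train_test(documents, train_fraction=0.8, seed=42):
    """Shuffle annotated documents and split them into a training and a test set.

    Hypothetical helper for illustration only.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(documents)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 annotated invoices (made-up file names).
invoices = [f"invoice_{i:03d}.pdf" for i in range(1, 101)]
train_set, test_set = split_train_test(invoices)
print(len(train_set), len(test_set))  # 80 20
```

The 20 held-out invoices are never shown to the model during training, so they can later serve as an honest benchmark.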
Data extraction
After the model has been trained on the document type/field of your choice, it is time to extract the data. Our AI-powered OCR software reads the document, then identifies and extracts the data fields that were defined in the beginning.
The more data you feed into the model, the better it becomes at recognizing and extracting relevant data fields through machine learning. This increases the accuracy and recall of the model.
Evaluate model performance
To evaluate the performance of the model, we run it on a test set that the model has never seen before. This is the 20% of the annotated data that was set aside for benchmarking.
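One simple way to benchmark a model on the held-out test set is exact-match accuracy per field. The sketch below assumes predictions and annotations are available as plain dictionaries keyed by a document ID. The function name `field_accuracy` and the data layout are hypothetical and only serve to illustrate the idea.

```python
def field_accuracy(predictions, ground_truth, field):
    """Share of test documents where the extracted value exactly matches the annotation.

    Hypothetical helper; a missing prediction counts as a miss.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    correct = sum(
        1
        for doc_id, expected in ground_truth.items()
        if predictions.get(doc_id, {}).get(field) == expected.get(field)
    )
    return correct / len(ground_truth)


# Annotated test documents (made-up values).
ground_truth = {
    "doc-1": {"invoice_number": "INV-001"},
    "doc-2": {"invoice_number": "INV-002"},
    "doc-3": {"invoice_number": "INV-003"},
    "doc-4": {"invoice_number": "INV-004"},
}
# Model output for the same documents.
predictions = {
    "doc-1": {"invoice_number": "INV-001"},
    "doc-2": {"invoice_number": "INV-002"},
    "doc-3": {"invoice_number": "INV-008"},  # wrong value
    # doc-4 is missing: the model extracted nothing
}

accuracy = field_accuracy(predictions, ground_truth, "invoice_number")
print(accuracy)  # 0.5
```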
With the confidence score defined in the data extraction output (JSON), you can determine if the model is at your desired level or needs further training.
Lastly, we go over the iterations and select the model that performs the best for your specific use case.
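The confidence check mentioned above could look roughly like this. Note that the JSON layout used here (a `fields` object with `value` and `confidence` per field) is an assumption made for the example and not Klippa's documented output schema. The threshold of 0.9 is likewise arbitrary.

```python
import json


def fields_below_threshold(extraction_json, threshold=0.9):
    """Return the names of extracted fields whose confidence is below the threshold.

    Assumes a hypothetical layout: {"fields": {name: {"value": ..., "confidence": ...}}}.
    """
    result = json.loads(extraction_json)
    return sorted(
        name
        for name, field in result["fields"].items()
        if field["confidence"] < threshold
    )


# Made-up extraction output for a single invoice.
sample_output = json.dumps({
    "fields": {
        "invoice_number": {"value": "INV-001", "confidence": 0.98},
        "total_amount": {"value": "125.50", "confidence": 0.72},
        "vendor_name": {"value": "Acme B.V.", "confidence": 0.91},
    }
})

needs_review = fields_below_threshold(sample_output, threshold=0.9)
print(needs_review)  # ['total_amount']
```

Fields that regularly fall below your chosen threshold are a sign that the model may need further training on that field.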
Data Fields You Can Extract with Klippa
With our data extraction software, you can already capture the following data fields, out-of-the-box:
Enjoy the Benefits of Custom Data Field Extraction
Reduce Cost
Spend less money on manually extracting data with custom data capture.
Improve Speed
Shorten the turnaround time of your document-related processes.
Prevent Fraud
Easily recognize incorrect business information on documents.
Minimize Errors
Prevent manual data verification errors with a steady custom data field extraction solution.
We Take Your Data Privacy & Security Seriously
“It is extremely pleasant to work together with a party that is as ambitious as we are. The willingness and speed with which Klippa implemented specific modifications for us is impressive.”
Get Started Now!
Let Klippa’s experts show you how you can customize data field extraction within seconds using intelligent automation.
Frequently Asked Questions
What is custom data field extraction?
Custom data field extraction is the process of extracting and processing only a particular data field from a specific document. This data field is defined before the extraction starts.
Data field extraction is a straightforward process:
First, you send your documents to our OCR API, which performs the data extraction. Second, Klippa converts the documents to TXT format, classifies the document type, extracts the relevant data fields, and then converts the extracted data to JSON format.
And there you have it! You get your data back in a machine-readable format, ready for further processing, within seconds.
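To make the flow more concrete, here is a toy sketch of the classify, extract and convert-to-JSON steps, starting from text that has already been produced by OCR. It is not Klippa's implementation: a trained model does the real work, whereas this example swaps in a keyword check and a regular expression. All names (`FIELD_PATTERNS`, `classify`, `extract_fields`, `process`) are hypothetical.

```python
import json
import re

# Hypothetical stand-in for a trained model: one regex per field and document type.
FIELD_PATTERNS = {
    "invoice": {
        "invoice_number": re.compile(
            r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
        ),
    },
}


def classify(text):
    """Very naive document classification based on a keyword."""
    return "invoice" if "invoice" in text.lower() else "unknown"


def extract_fields(text, document_type):
    """Extract the fields defined for this document type."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.get(document_type, {}).items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields


def process(text):
    """Classify the document, extract its fields and return the result as JSON."""
    document_type = classify(text)
    return json.dumps(
        {"document_type": document_type, "fields": extract_fields(text, document_type)}
    )


# Made-up OCR text for a single invoice.
ocr_text = "ACME B.V.\nInvoice number: INV-2024-001\nTotal: 125.50 EUR"
result = json.loads(process(ocr_text))
print(result["document_type"], result["fields"]["invoice_number"])
```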
Which documents can I process?
With Klippa’s custom data field extraction solution, you can extract a large variety of data fields from multiple documents, including but not limited to:
How can DocHorizon be implemented in my systems?
Our API will be useful for you if you want to build your own information extraction and archiving pipeline and connect it with your existing software systems.
Our SDK, on the other hand, enables you to turn your mobile devices into data capture devices with the ability to digitize documents and successfully archive them in a digital environment. This is useful if you want to add data classification and sorting features to your existing or soon-to-be-released mobile app.
If you’re not yet sure which choice fits you best, take a look at our documentation for more information.
How about data privacy and security?
By default, Klippa does not store any customer data. Data is always processed under a data processing agreement (DPA) and all Klippa services are compliant with GDPR.
All data transfer is done via secure SSL connections. Our servers are ISO certified and by default are located in Amsterdam, the Netherlands. A custom server in a location of your choice is possible anywhere in the world.
On a regular basis, our security is tested via third-party penetration testing to ensure state-of-the-art security at all times.
Which industries use custom data field extraction?
Klippa’s custom data field extraction can be used in multiple industries, especially where the processed documents have distinctive features, such as a specific document number, logo or address.
With a specifically-trained model to extract custom data fields from invoices, bank statements or salary slips for instance, Klippa can be of great help for: