Gathering product and pricing data from supermarket receipts can be a challenging endeavor. Harvesting accurate data from a large volume of receipts requires time, precision, and of course, the willingness of numerous shoppers to send in their receipts. Once the receipts are gathered, you need an army of colleagues to process copies upon copies of them into usable data.
Perhaps you are considering back-office outsourcing or crowdsourcing, such as Amazon’s Mechanical Turk, to have this tedious work completed. Unfortunately, no matter how thoroughly you instruct people, manual data entry rarely matches the accuracy and consistency of software.
As the saying goes, our mistakes are what make us human. But besides accuracy, processing cost and turnaround time are also relevant. On both of these topics, computers tend to beat humans as well. So the question is: how do you get the software to do all of the receipt processing for you? Klippa has a savvy solution for you.
What is supermarket receipt scanning and what is its goal?
Supermarket receipt scanning is the process of reading receipts with OCR (Optical Character Recognition), identifying all relevant data fields, and converting the text into a usable structured data format.
With the Klippa API, the lion’s share of this process is completely automated. With no effort at all, you will have huge amounts of data at your fingertips.
You can use this data to conduct product research, instigate product improvements, analyze buyer behavior, research pricing strategies, set up marketing campaigns, and much more.
How does OCR on supermarket receipts work?
It’s all well and good to know that customers send a photo to the API and structured data rolls out, but what happens in between? The API is like the waiter, taking your order, moving to the kitchen where the order is processed, and returning the food to your table.
Well, here is a simplified overview of the steps the API takes:
- The customer uploads a photo of a receipt with the click of a button.
- The API takes the image and scans it.
- The image is corrected by an AI, using blur and glare detection, to make the text on the receipt easier to read.
- Using OCR software, the text is read and extracted into a TXT document.
- Through machine learning, important data points and categories are identified and this data is then transformed into JSON.
- The API serves up the JSON data within a few seconds, and it is now at your disposal.
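To make the last steps more tangible, here is a minimal Python sketch of turning OCR text into JSON. It is an illustration only: a regular expression stands in for the machine learning step, and the sample text, the field names, and the function `parse_receipt_text` are assumptions, not Klippa’s actual implementation or output format.

```python
import json
import re

# Hypothetical OCR output; real receipt text is usually much noisier.
OCR_TEXT = """\
FRESH MARKET
2 x MILK 1L        2.58
BREAD WHOLEGRAIN   1.99
TOTAL              4.57
"""

# Optional "N x " quantity prefix, a description, then a price at the end of the line.
LINE_PATTERN = re.compile(r"^(?:(\d+)\s*x\s+)?(.+?)\s+(\d+\.\d{2})$")


def parse_receipt_text(text: str) -> dict:
    """Turn plain OCR text into a small structured record (illustrative schema)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumption: the first non-empty line is the merchant name.
    receipt = {"merchant_name": lines[0], "line_items": [], "total": None}
    for line in lines[1:]:
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # skip lines that do not end in a price
        quantity, description, amount = match.groups()
        if description.upper() == "TOTAL":
            receipt["total"] = float(amount)
        else:
            receipt["line_items"].append({
                "description": description,
                "quantity": int(quantity or 1),
                "amount": float(amount),
            })
    return receipt


print(json.dumps(parse_receipt_text(OCR_TEXT), indent=2))
```

A fixed pattern like this breaks as soon as a receipt uses a different layout, which is the gap the machine learning step is meant to close.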
So who is the cook standing in the kitchen to prepare your meal? In this case, the cook is an AI, trained with numerous examples of receipts, tickets, invoices, and other forms of documents. The AI learns to determine what a data field constitutes, for instance, whether a data field is a product line, price, merchant address, or something else.
Over time, this AI has become a very adept chef, almost perfecting its ability to detect specific data automatically. This form of machine learning has enabled the engine to reach an accuracy of more than 95%, with the capacity to process huge volumes.
The AI rarely misfires and automatically produces your JSON data in a matter of seconds. This allows the API to serve up a dish that is very close to perfect.
What data can you extract from receipts?
In essence, any data that is on a receipt can be extracted and adapted to your specific needs.
The following will give you a brief overview of data examples, which can all be combined to form a complete data set for thorough research purposes.
Product data
The product data on a receipt consists of more than just the product name. It can include all kinds of contextual information, such as descriptions, brands, ingredients, or even country of origin. These line item descriptions are usually accompanied by data points such as quantity or price.
Product classification
Products can be divided into classes, such as food & drinks (vegetables, snacks, dairy products, soda, juice), but also electronics, cleaning, personal care, clothing, and so on. On custom order, these classifications can also be made in terms of nutritional values or as containing specific ingredients.
Location and merchant data
The name, address, website and other contact details of the merchant are extractable from the receipt, which gives you general insight into the location and brand of the stores that are on the receipt.
Pricing data
The product price, total basket size, VAT amounts and percentages, and currency are all part of the data set that can be extracted from a receipt. All data relating to pricing is valuable for research purposes.
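Because VAT amounts and percentages sit next to the prices, you can run simple sanity checks on the extracted data yourself. The sketch below assumes VAT-inclusive (gross) amounts, as is common on consumer receipts in the EU. The function names and the one-cent tolerance are illustrative assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def vat_from_gross(gross: Decimal, rate_percent: Decimal) -> Decimal:
    """VAT contained in a VAT-inclusive amount: gross * rate / (100 + rate)."""
    vat = gross * rate_percent / (Decimal(100) + rate_percent)
    return vat.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def vat_is_consistent(gross: Decimal, rate_percent: Decimal, stated_vat: Decimal,
                      tolerance: Decimal = Decimal("0.01")) -> bool:
    """Check that the VAT amount on the receipt matches its percentage."""
    return abs(vat_from_gross(gross, rate_percent) - stated_vat) <= tolerance


# A gross amount of 10.90 at 9% VAT contains 10.90 * 9 / 109 = 0.90 in VAT.
print(vat_from_gross(Decimal("10.90"), Decimal("9")))
print(vat_is_consistent(Decimal("12.10"), Decimal("21"), Decimal("2.10")))
```

Decimal is used instead of float so that amounts in cents stay exact.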
Detecting different types of fraud
Unfortunately, fraud is very much a part of supermarket receipt scanning, especially when it is implemented in programs involving rewards. Fraudsters can be very creative when it comes to manipulating receipts in their favor.
Luckily, Klippa’s API is able to detect such cases of fraud. Fraud detection is customized on request, but the following are three examples of the type of fraud Klippa can catch:
Catch duplicate receipts
The API is able to determine whether a receipt has already been entered before. Fraudsters might try to fool the system by requesting multiple rewards with a single receipt (for example, across multiple accounts), but it can also occur accidentally. The system is able to detect such an entry by image and data hashing, identifying overlapping information between different entries.
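As a rough illustration of the data hashing idea, the Python sketch below builds a fingerprint from a few normalized fields and remembers which account submitted it first. The field names, the choice of fields, and the `DuplicateDetector` class are assumptions made for this example, not Klippa’s actual method. Image hashing is not covered here.

```python
import hashlib
from typing import Dict, Optional


def receipt_fingerprint(receipt: dict) -> str:
    """Hash the fields that identify a purchase, ignoring case and stray spaces."""
    parts = [
        receipt["merchant_name"].strip().lower(),
        receipt["date"],
        receipt["receipt_number"].strip(),
        f"{receipt['total']:.2f}",
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


class DuplicateDetector:
    """Keeps fingerprints in memory; a real system would use a database."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # fingerprint -> first submitting account

    def check(self, account_id: str, receipt: dict) -> Optional[str]:
        """Return the account that first submitted this receipt, or None if it is new."""
        fingerprint = receipt_fingerprint(receipt)
        first_account = self._seen.get(fingerprint)
        if first_account is None:
            self._seen[fingerprint] = account_id
        return first_account


detector = DuplicateDetector()
receipt = {"merchant_name": "Fresh Market", "date": "2024-03-01",
           "receipt_number": "000123", "total": 4.57}
print(detector.check("account-a", receipt))  # first submission, so None
print(detector.check("account-b", receipt))  # same receipt from another account
```

An exact hash only catches entries whose normalized fields match completely. Catching near-duplicates, such as a second photo with one misread digit, needs fuzzier matching.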
Catch Photoshop manipulations
These days, it is not too difficult to manipulate a photo with programs such as Adobe Photoshop. This makes it easier for fraudsters to attempt to replace line items or change the price, date or time of the purchase. Klippa’s API is able to detect inconsistent pixel structures and will recognize a ‘photoshopped’ image.
Fake receipts
It is possible for someone with bad intentions to create a fake receipt from scratch or based on an existing receipt. Regardless of the quality of pixel manipulation, the API is able to cross-reference information on a receipt such as addresses, chamber of commerce numbers, phone numbers and more. Any mistake a fraudster makes can be caught.
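The cross-referencing idea can be sketched as a lookup against a merchant registry. Everything in the Python example below is hypothetical: the in-memory `KNOWN_MERCHANTS` table, its made-up entry, the field names, and the `cross_reference` function are assumptions for illustration, not a description of how Klippa’s API works internally.

```python
from typing import List

# Hypothetical registry keyed by chamber of commerce number (made-up data).
KNOWN_MERCHANTS = {
    "12345678": {"name": "fresh market", "phone": "+31 20 555 0100"},
}


def normalize_phone(phone: str) -> str:
    """Keep digits only so that formatting differences do not matter."""
    return "".join(ch for ch in phone if ch.isdigit())


def cross_reference(receipt: dict) -> List[str]:
    """Return a list of inconsistencies; an empty list means nothing was found."""
    record = KNOWN_MERCHANTS.get(receipt.get("coc_number", ""))
    if record is None:
        return ["unknown chamber of commerce number"]
    issues = []
    if receipt.get("merchant_name", "").strip().lower() != record["name"]:
        issues.append("merchant name does not match registry")
    if normalize_phone(receipt.get("phone", "")) != normalize_phone(record["phone"]):
        issues.append("phone number does not match registry")
    return issues


suspicious = {"coc_number": "12345678", "merchant_name": "Fresh Market",
              "phone": "+31 20 555 0199"}
print(cross_reference(suspicious))
```

Each mismatch is reported separately, so a reviewer can see which detail on the receipt does not add up.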
What can you do with supermarket receipt scanning?
Once the data rolls out, there are numerous possibilities in applying the data to your needs. The following are examples of use cases:
Receipt scanning for Market Basket Analysis
In order to understand your customer behavior, one of your tasks will be to perform a market basket analysis (MBA).
To gather all the product data you need for an accurate analysis, you set up a campaign to encourage customers to provide demographic information and upload photos of their supermarket receipts. These photos need to be transformed into usable and accurate data, such as product types, brands, or any other product attribute.
Enter Klippa. Each photo a customer uploads will automatically be scanned and transformed into segmented text. This TXT will then be processed into a structured format (JSON), which grants you the opportunity to perform an effective affinity analysis.
You can discover patterns in buying behavior, such as products that are often bought together. A large retailer would then be able to instigate promotions and marketing campaigns to increase sales.
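Once line items are available as structured data, a basic affinity analysis takes only a few lines. The Python sketch below computes two standard measures, support and confidence, on a handful of made-up baskets. The product names and function names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Map each product pair (alphabetical order) to the share of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


# Made-up baskets, one list of products per receipt.
BASKETS = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "milk"],
]

print(pair_support(BASKETS)[("bread", "butter")])   # 2 of 4 baskets: 0.5
print(confidence(BASKETS, "butter", "bread"))       # every butter basket has bread: 1.0
```

In this toy data, butter buyers always buy bread, while only two out of three bread buyers buy butter. Asymmetries like that are what a promotion can build on.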
Supermarket receipt scanning for geographical pricing analysis
To determine the pricing strategy of your business, you will need to analyze competitor pricing in your area. For example, you can analyze competitor pricing in your neighborhood or province and determine what pricing would best represent your brand and draw customers to your business.
You gather customer receipts and process them via the Klippa API. The API is able to automatically read and extract all fields on the receipt, including product name, pricing, VAT, and merchant. These elements combined will enable you to map the pricing in your vicinity.
Opting for an API as opposed to outsourcing and crowdsourcing will reward you with a process that is faster, more accurate, and less expensive.
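As an illustration of the pricing analysis described in this section, the Python sketch below averages unit prices per product and region. The observations are made up, and the assumption that each record already carries a `region` derived from the merchant address is ours.

```python
from collections import defaultdict
from statistics import mean


def average_price_by_region(observations):
    """Map (product, region) to the mean unit price, rounded to cents."""
    grouped = defaultdict(list)
    for obs in observations:
        grouped[(obs["product"], obs["region"])].append(obs["unit_price"])
    return {key: round(mean(prices), 2) for key, prices in grouped.items()}


# Made-up price observations, one per line item on a scanned receipt.
OBSERVATIONS = [
    {"product": "milk 1l", "region": "Groningen", "merchant": "Store A", "unit_price": 1.29},
    {"product": "milk 1l", "region": "Groningen", "merchant": "Store B", "unit_price": 1.19},
    {"product": "milk 1l", "region": "Utrecht", "merchant": "Store C", "unit_price": 1.39},
]

for (product, region), price in sorted(average_price_by_region(OBSERVATIONS).items()):
    print(f"{product} in {region}: {price:.2f}")
```

Grouping by merchant instead of region works the same way if you want to compare individual competitors.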
Receipt scanning for cashback and loyalty campaigns
Supermarket receipt scanning is ideal for automated cashback processing or a loyalty points system. You can set up a campaign to increase customer exposure to a new product and increase direct sales.
Customers will upload a receipt containing the product that is part of the marketing campaign in order to receive cashback. When you receive these photos in the thousands or even millions, you will be in need of a quick and accurate way to process these so that customers will receive their cashback swiftly.
The API will deploy OCR to read the line items of receipts and extract these into a JSON format that will allow you to automatically detect campaign products and set in motion the clearing of payment. All with the speed and reliability that is certain to satisfy your customers.
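Detecting campaign products in the extracted line items can be sketched as follows. The keyword list, the reward amount, and the receipt structure are hypothetical. Amounts are kept in whole cents to avoid rounding issues.

```python
CAMPAIGN_KEYWORDS = ("oat drink", "oat milk")  # hypothetical campaign products
CASHBACK_CENTS_PER_UNIT = 50                   # hypothetical reward per unit


def campaign_units(receipt: dict) -> int:
    """Count purchased units whose description mentions a campaign product."""
    return sum(
        item["quantity"]
        for item in receipt["line_items"]
        if any(keyword in item["description"].lower() for keyword in CAMPAIGN_KEYWORDS)
    )


def cashback_cents(receipt: dict) -> int:
    """Cashback owed for one receipt, in cents."""
    return campaign_units(receipt) * CASHBACK_CENTS_PER_UNIT


RECEIPT = {
    "line_items": [
        {"description": "OAT DRINK 1L", "quantity": 2, "amount": 3.98},
        {"description": "BREAD WHOLEGRAIN", "quantity": 1, "amount": 1.99},
    ]
}

print(cashback_cents(RECEIPT))  # 2 campaign units at 50 cents each: 100
```

In a real campaign, you would run a check like this only after the receipt has passed the duplicate and fraud checks.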
Frequently Asked Questions
What fields can Klippa extract from supermarket receipts?
The default extracted data fields are listed below. These can be customized for each customer. Additional fields can be extracted on request.
- Document type
- Image quality
- Country of origin
- Receipt language
- Merchant name
- Merchant address details
- Merchant contact details
- Merchant website
- Payment method
- Card number
- Amount of change
- Date of purchase
- Total amount and currency
- VAT amounts
- VAT percentages
- Line item descriptions, quantity, prices, and category
- Receipt number
- Chamber of Commerce number
- VAT number
- And many more fields
Does it work on low-quality photos?
Our API already includes image pre-processing capabilities that will improve and rotate bad-quality images for better processing. Besides that, Klippa offers a scanning SDK that can be implemented in mobile apps.
This SDK includes image processing capabilities like perspective correction and glare and blur detection to clarify the content of photos.
This way, you will only receive photos that contain valid information for further processing. It avoids the "garbage in, garbage out" problem on the customer’s side, at the moment the photo is taken.
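One widely used heuristic for blur detection is the variance of the Laplacian: sharp images have strong edges, so the Laplacian response varies a lot, while blurred images give a flat response. The pure-Python sketch below illustrates that heuristic on a tiny grayscale grid. It is not taken from Klippa’s SDK, and the threshold is an arbitrary assumption you would need to tune on real photos.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a list of rows of brightness values and must be at least 3x3.
    """
    responses = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def is_blurry(gray, threshold=100.0):
    """Flag an image as blurry when its Laplacian variance is below the threshold."""
    return laplacian_variance(gray) < threshold


# A checkerboard has hard edges everywhere; a smooth gradient has none.
SHARP = [[255 if (x + y) % 2 == 0 else 0 for x in range(5)] for y in range(5)]
SMOOTH = [[10 * x for x in range(5)] for y in range(5)]

print(is_blurry(SHARP), is_blurry(SMOOTH))
```

Running a check like this on the device lets an app ask for a new photo before anything is uploaded.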
How accurate is the OCR API?
Supermarket receipts can be scanned and data extracted with >95% accuracy. This means that the AI will rarely misunderstand any line on a receipt.
You can safely assume that it will accurately process receipts. If you do see room for improvement, we can custom-train our models to support your use case and bring the best value to your business.
Does it work on all products?
There can be a variety of products on a supermarket receipt. Not only your average groceries, but also an electric toothbrush or a frying pan. Regardless of the product type, Klippa can identify every product line on a receipt using OCR and machine learning.
It will also be able to recognize product categories such as food & drinks, personal care, cleaning, clothing, electronics, and more.
What about privacy and GDPR?
As with all other services that Klippa offers, supermarket receipt scanning is fully secure and GDPR compliant. By default, we use ISO-certified servers within the European Union for processing receipts.
Are you located outside of the EU? We can set up a custom server in your region very quickly. A data processor agreement is in place. We do not store any of your or your customers’ data after processing.
In which countries can you use it?
Klippa’s engine works best in Western languages. Common languages we work with are English, Dutch, German, French, Spanish, Portuguese, Swedish, Norwegian, Danish, Finnish, and Italian.
Any other language can be supported on request. We are able to use machine learning for every language under the sun.
Get acquainted with Klippa
At Klippa, we would love to help you out with all your document processing needs.
If you have a challenge with processing receipts or any other document, feel free to send us a message or plan a 30-minute online demonstration with one of our experts.