Azure Document Intelligence (Form Recognizer)

Mar 19, 2024

Introduction

In today’s fast-paced digital landscape, organizations generate more structured and unstructured data than they can comfortably handle. When accuracy and turnaround time matter, manually extracting meaningful information from documents, text, invoices, and images is an arduous task. This is where Azure Document Intelligence (formerly known as Form Recognizer) steps in, providing a solution that accelerates information extraction and automates document processing and analysis.

Overview of Azure Document Intelligence

Azure Document Intelligence is a suite of cloud services and tools that uses Artificial Intelligence (AI) and Machine Learning (ML) techniques for document processing and analysis. It helps organizations efficiently process, analyze, and extract valuable insights from a wide array of document types. Optical Character Recognition (OCR) is a fundamental component: it extracts text from documents such as invoices, receipts, forms, and contracts. This automates digitization, making it easier to extract and analyze textual information from scanned documents, PDFs, and other supported formats. The retrieved text can then be matched against field names in a database. For an invoice, for example, Document Intelligence can identify specific information such as name, address, total value, and tax amount.

How does Document Intelligence work?

Document Intelligence uses OCR to extract printed and handwritten text from documents and then assigns meaning to that text. The process runs in five steps:

Document Ingestion: The first step is to provide the document, for example a receipt, to the Azure Document Intelligence service. The document can be a scanned image, a PDF file, or another supported format.
Image Preprocessing: The document image is optimized to improve the accuracy of text recognition, typically through noise reduction, deskewing, and contrast enhancement.
Text Detection: The service detects words and text blocks within the document.
Character Recognition: Once text regions are identified, OCR algorithms analyze the shapes and patterns within these regions to recognize individual characters. This involves comparing the shapes of detected text fragments to a library of known characters.
Text Extraction: Recognized characters are assembled into words, sentences, and paragraphs, reconstructing the textual content of the document. Machine learning models can then interpret the data in a document or form because they are trained to recognize patterns in bounding box coordinate locations and text.
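To make the last step more concrete, here is a minimal sketch of how recognized characters could be assembled back into lines of text from their positions. It is a toy illustration, not the service's actual algorithm: the `Glyph` class, the `assemble_text` function, the pixel coordinates, and the `space_gap` threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One recognized character with its position on the page, in pixels."""
    char: str
    x: int      # left edge of the character
    y: int      # vertical position of the text line it sits on
    width: int  # character width


def assemble_text(glyphs: list[Glyph], space_gap: int = 6) -> str:
    """Rebuild lines of text from positioned characters.

    Characters are grouped into lines by their y coordinate and ordered left
    to right. A space is inserted wherever the horizontal gap between two
    neighbours is larger than `space_gap` pixels (a hypothetical threshold).
    """
    lines: dict[int, list[Glyph]] = {}
    for glyph in glyphs:
        lines.setdefault(glyph.y, []).append(glyph)

    output = []
    for y in sorted(lines):
        row = sorted(lines[y], key=lambda g: g.x)
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            gap = cur.x - (prev.x + prev.width)
            if gap > space_gap:
                text += " "
            text += cur.char
        output.append(text)
    return "\n".join(output)


# Hypothetical recognizer output, deliberately listed out of order.
glyphs = [
    Glyph("5", 50, 40, 10),
    Glyph("T", 0, 10, 10), Glyph("o", 12, 10, 10), Glyph("t", 24, 10, 10),
    Glyph("a", 36, 10, 10), Glyph("l", 48, 10, 10),
    Glyph("9", 74, 10, 10), Glyph("0", 86, 10, 10),
    Glyph("T", 0, 40, 10), Glyph("a", 12, 40, 10), Glyph("x", 24, 40, 10),
]

print(assemble_text(glyphs))
```

A real OCR engine has to cope with skewed lines, varying font sizes, and multiple columns, so treat this only as a picture of the idea: position information is what turns isolated characters back into readable text.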

Although OCR can decipher both printed and handwritten documents, its output is typically unstructured, which makes it hard to store in a database or process analytically. Document Intelligence goes beyond this limitation by discerning the underlying structure of the text, including key-value relationships and tabular data.

The ability to extract text, layout, and key-value pairs from an image is known as document analysis. Document analysis provides the location of each piece of text on a page as a bounding box. The coordinates of these boxes help machine learning models extract and interpret the text.

Information on an invoice is stored as key-value pairs. For example, the key can be the item ordered and the value can be Cordon Bleu. Document analysis could record the location of the field value as bounding box coordinates [4.2, 2.1], [4.3, 2.2], [4.3, 2.4]. Because machine learning models are trained to discern patterns in bounding box coordinates and textual content, they can comprehend the data in a document or form. The extracted text can then be integrated into downstream systems or database applications for further processing, reporting, or decision-making.
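As a rough illustration of the idea, the sketch below stores one key-value pair together with the coordinates quoted above and derives a simple enclosing rectangle from them. The `ExtractedField` class and its `bounds` method are hypothetical names invented for this example; they are not part of any Azure SDK.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class ExtractedField:
    """A key-value pair plus the points that locate the value on the page."""
    key: str
    value: str
    polygon: list[Point]

    def bounds(self) -> tuple[float, float, float, float]:
        """Return (min_x, min_y, max_x, max_y) of the enclosing rectangle."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return (min(xs), min(ys), max(xs), max(ys))


# The coordinates are the ones used in the example above.
field = ExtractedField(
    key="item ordered",
    value="Cordon Bleu",
    polygon=[(4.2, 2.1), (4.3, 2.2), (4.3, 2.4)],
)

print(field.key, "=", field.value, "located within", field.bounds())
```

Keeping the location next to the value is what allows a downstream system to highlight the field on the original page or to check that a value was read from the expected area.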

Models in Azure Document Intelligence

Prebuilt Models: Prebuilt models are pre-trained models that process common document types such as invoices, business cards, ID documents, and more. They are engineered to extract text, key-value pairs, tables, and structures from forms and documents. These models can extract:

Client and supplier particulars from invoices
Sales and transaction specifics from receipts
Taxable remuneration, mortgage interest, student loan specifics, and more from tax forms

Custom Models: Custom models can be trained to extract information that the prebuilt models do not cover, such as distinct data from forms and documents specific to your use case.

For more → Azure Document Intelligence Models

Getting started with DI Studio

To begin, access the Azure Portal and sign in. If you’re new to Azure, you’ll need to create an account first. Once signed in, locate the “Create a resource” button.

Next, initiate the creation of a new resource by searching for “Document Intelligence” in the Azure Marketplace and selecting it to proceed.

During configuration, input the required details such as your subscription, resource group, and resource name. Additionally, choose a pricing tier that aligns with your specific requirements. It’s advisable to opt for a region close to your users to minimize latency and enhance performance.

Once all necessary information is provided, review your selections carefully and click “Create” to initiate the deployment process. Azure will then proceed to deploy your Document Intelligence resource according to the specified parameters.

Upon successful deployment, navigate to your Document Intelligence resource and open its “Keys and Endpoint” page. Make a note of the key and the endpoint, as every API call to the service requires them.
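To show where the key and endpoint end up, here is a minimal sketch that builds (but does not send) an analyze request using only the Python standard library. It assumes the `2023-07-31` version of the REST API and its `/formrecognizer/documentModels/{modelId}:analyze` path; newer API versions may use a different path, so check the version your resource supports. The function name `build_analyze_request`, the resource name, the key, and the document URL are placeholders for illustration.

```python
import json
import urllib.request


def build_analyze_request(endpoint: str, key: str, model_id: str,
                          document_url: str,
                          api_version: str = "2023-07-31") -> urllib.request.Request:
    """Build a POST request asking the service to analyze a document by URL.

    Assumes the 2023-07-31 REST API layout; adjust the path and version for
    the API version your resource actually exposes.
    """
    url = (f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
           f"{model_id}:analyze?api-version={api_version}")
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The key copied from the "Keys and Endpoint" page goes here.
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


# Placeholder values: substitute the endpoint and key of your own resource.
request = build_analyze_request(
    endpoint="https://my-resource.cognitiveservices.azure.com/",
    key="<your-key>",
    model_id="prebuilt-invoice",
    document_url="https://example.com/sample-invoice.pdf",
)

print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it. With this API version the analysis runs asynchronously: the response should carry an `Operation-Location` header with a URL to poll for the result. In practice, the official Azure SDKs wrap these steps for you, and keys are better loaded from an environment variable or a secret store than written into code.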

Using prebuilt models

On the Document Intelligence Studio home page, under prebuilt models, select the Invoice model.
Upload an invoice file as input.
Click “Run analysis”.

On the right-hand side, we can see the predefined fields in this model, each extracting a specific piece of information from the invoice.
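The same fields are available when the model is called through the API. The sketch below walks over a trimmed, hand-written sample shaped like the JSON result of the prebuilt invoice model and prints each field with its confidence score. The sample values (vendor, amounts, confidences) are invented for illustration, a real response contains many more properties, and `summarize_fields` is a hypothetical helper name.

```python
def summarize_fields(result: dict) -> dict[str, tuple[str, float]]:
    """Map each extracted field name to its (text content, confidence)."""
    summary = {}
    for document in result["analyzeResult"]["documents"]:
        for name, field in document["fields"].items():
            summary[name] = (field.get("content", ""),
                             field.get("confidence", 0.0))
    return summary


# Hand-written sample, trimmed to the parts used here. Values are made up.
sample_result = {
    "status": "succeeded",
    "analyzeResult": {
        "documents": [
            {
                "docType": "invoice",
                "fields": {
                    "VendorName": {"type": "string", "content": "Contoso",
                                   "confidence": 0.93},
                    "InvoiceTotal": {"type": "currency", "content": "$110.00",
                                     "confidence": 0.97},
                    "TotalTax": {"type": "currency", "content": "$10.00",
                                 "confidence": 0.95},
                },
            }
        ]
    },
}

for name, (text, confidence) in summarize_fields(sample_result).items():
    print(f"{name}: {text} (confidence {confidence:.2f})")
```

The confidence score is useful in practice: fields that fall below a threshold you choose can be routed to a person for review instead of being written straight to a database.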

If a use case needs information that the prebuilt model is not trained to extract, use a custom invoice processing model. It augments the behavior of the prebuilt invoice model by adding new fields to be extracted in addition to the default ones.

Scope of Azure Document Intelligence

Azure Document Intelligence offers a comprehensive set of benefits for businesses that want to enhance their document processing and analysis capabilities and boost productivity, accuracy, and creativity across their operations.

Efficiency: Azure Document Intelligence automates manual document processing tasks, saving time and resources. It accelerates workflows by quickly extracting and organizing information from documents, reducing the need for manual labor.
Accuracy: By leveraging advanced AI and machine learning algorithms, Azure Document Intelligence improves extraction accuracy, minimizing errors and improving data quality. This leads to more reliable and trustworthy information for decision-making.
Enhanced Customer Experience: By automating document-related processes, Azure Document Intelligence improves the customer experience by reducing wait times and errors. It enables organizations to respond more quickly to customer inquiries and requests.
Flexibility: Azure Document Intelligence supports text extraction from a wide array of document types and formats, making it suitable for diverse use cases. It can adapt to evolving business needs and requirements and drive innovation in today’s competitive landscape.
Reduced Costs: Automated document analysis lowers error rates, which in turn reduces the cost of error correction.

Challenges of Azure Document Intelligence

Automating the process of document analysis poses a challenge due to the diverse formats in which forms and documents are presented. For instance, while both tax forms and driver’s license documents contain an individual’s name, the bounding box coordinates for the name may vary. This necessitates training separate machine learning models to ensure high-quality results for different forms and documents. At times, prebuilt machine learning models trained on commonly encountered document formats may suffice. However, in other cases, customizing a machine learning model becomes necessary to recognize a unique document format.
Businesses heavily reliant on documentation may overlook the need to enhance document quality. Often, documents are handwritten or scanned, resulting in poor image quality. Consequently, document intelligence processing systems may struggle to accurately extract information.

Thanks for giving a read :)