How to Build a Generative AI Tool for Information Extraction from Receipts | by Robert Martin-Short | Apr, 2024

DALLE-2's interpretation of "A futuristic industrial document scanning facility"

Use LangChain and OpenAI tools to extract structured information from images of receipts stored in Google Drive

This article details how we can use open source Python packages such as LangChain, pytesseract and PyPDF, along with gpt-4-vision and gpt-3.5-turbo, to identify and extract key information from images of receipts. The resulting dataset could be used for a "chat to receipts" application. Check out the full code here.

Paper receipts come in all sorts of styles and formats and represent an interesting target for automated information extraction. They also provide a wealth of itemized costs that, if aggregated into a database, could be very useful for anyone interested in tracking their spend at a more detailed level than offered by bank statements.

Wouldn't it be cool if you could take a photo of a receipt, upload it to some application, then have its information extracted and appended to your personal database of expenses, which you could then query in natural language? You could then ask questions of the data like "what did I buy when I last visited IKEA?" or "what items do I spend most money on at Safeway?". Such a system might also naturally extend to corporate finance and expense tracking. In this article, we'll build a simple application that deals with the first part of this process, namely extracting information from receipts ready to be stored in a database. Our system will monitor a Google Drive folder for new receipts, process them and append the results to a .csv file.

Technically, we'll be doing a type of automated information extraction called template filling. We have a pre-defined schema of fields that we want to extract from our receipts, and the task will be to fill these out, or leave them blank where appropriate. One major issue here is that the information contained in images or scans of receipts is unstructured, and although Optical Character Recognition (OCR) or PDF text extraction libraries might do a decent job at finding the text, they are not good at preserving the relative positions of words in a document, which can make it difficult to match an item's name to its price, for example.

Traditionally, this issue is solved by template matching, where a pre-defined geometric template of the document is created and extraction is only run in the regions known to contain important information. A great description of this can be found here. However, this approach is rigid. What if a new format of receipt is added?
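Before moving on, here is a rough sketch of what a geometric template could look like in code; the coordinates are invented purely for illustration:

import pytesseract
from PIL import Image

# Pixel regions where one particular receipt layout puts each field,
# given as (left, upper, right, lower)
TEMPLATE = {
    "vendor_name": (0, 0, 600, 80),
    "total_after_tax": (350, 900, 600, 960),
}

def extract_with_template(image_path: str) -> dict:
    # OCR only the regions of the scan defined by the template
    image = Image.open(image_path)
    return {
        field: pytesseract.image_to_string(image.crop(box)).strip()
        for field, box in TEMPLATE.items()
    }

Every new layout would need its own set of coordinates, which is exactly the rigidity we would like to avoid.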

To get around this, more advanced services like AWS Textract and AWS Rekognition use a combination of pre-trained deep learning models for object detection, bounding box generation and named entity recognition (NER). I haven't actually tried out these services on the problem at hand, but it would be really interesting to do so in order to compare the results against what we build with OpenAI's LLMs.

Large Language Models (LLMs) such as gpt-3.5-turbo are also great at information extraction and template filling from unstructured text, especially after being given a few examples in their prompt. This makes them much more flexible than template matching or fine-tuning, since adding a few examples of a new receipt format is far quicker and cheaper than re-training the model or building a new geometric template.

If we're going to use gpt-3.5-turbo on text extracted from receipts, the question then becomes: how do we build the examples from which it can learn? We could of course do this manually, but that wouldn't scale well. Here we will explore the option of using gpt-4-vision for this. This version of gpt-4 can handle conversations that include images, and appears particularly good at describing the content of images. Given an image of a receipt and a description of the key information we want to extract, gpt-4-vision should therefore be able to do the job in one shot, provided that the image is sufficiently clear.

Why wouldn't we just use gpt-4-vision alone for this task and abandon gpt-3.5-turbo or other smaller LLMs? Technically we could, and the result might even be more accurate. But gpt-4-vision is very expensive and API calls are rate limited, so this approach also won't scale. Perhaps in the not-too-distant future though, vision LLMs will become a standard tool in this domain of information extraction from documents.

Another motivation for this article is to explore how we can build this system using Langchain, a popular open source LLM orchestration library. In order to force an LLM to return structured output, prompt engineering is required and Langchain has some excellent tools for this. We will also try to make sure that our system is built in a way that is extensible, because this is just the first part of what could become a larger "chat to receipts" project.

With a brief background out of the way, let's get started with the code! I will be using Python 3.9 and Langchain 0.1.14 here, and full details can be found in the repo.

We need a convenient place to store our raw receipt data. Google Drive is one choice, and it provides a Python API that is relatively easy to use. To capture the receipts I use the GeniusScan app, which can upload .pdf, .jpeg or other file types from the phone directly to a Google Drive folder. The app also does some useful pre-processing such as automatic document cropping, which helps with the extraction process.

To set up API access to Google Drive, you'll need to create service account credentials, which can be generated by following the instructions here. For reference, I created a folder in my drive called "receiptchat" and set up a key pair that allows reading of data from that folder.

The following code can be used to set up a drive service object, which gives you access to various methods for querying Google Drive:

import os
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials


class GoogleDriveService:

    SCOPES = ["https://www.googleapis.com/auth/drive"]

    def __init__(self):
        # The directory where your credentials are stored
        base_path = os.path.dirname(os.path.dirname(os.path.dirname(__file__)))

        # The name of the file containing your credentials
        credential_path = os.path.join(base_path, "gdrive_credential.json")
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credential_path

    def build(self):
        # Get the credentials into the required format
        creds = ServiceAccountCredentials.from_json_keyfile_name(
            os.getenv("GOOGLE_APPLICATION_CREDENTIALS"), self.SCOPES
        )

        # Set up the Gdrive service object
        service = build("drive", "v3", credentials=creds, cache_discovery=False)

        return service

In our simple application, we only really need to do two things: list all the files in the drive folder and download some of them. The following class handles this:

import io
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload
import googleapiclient.discovery
from typing import List


class GoogleDriveLoader:

    # These are the types of files we want to download
    VALID_EXTENSIONS = [".pdf", ".jpeg"]

    def __init__(self, service: googleapiclient.discovery.Resource):

        self.service = service

    def search_for_files(self) -> List:
        """
        See https://developers.google.com/drive/api/guides/search-files#python
        """

        # This query searches for objects that are not folders and
        # contain the valid extensions
        query = "mimeType != 'application/vnd.google-apps.folder' and ("
        for i, ext in enumerate(self.VALID_EXTENSIONS):
            if i == 0:
                query += "name contains '{}' ".format(ext)
            else:
                query += "or name contains '{}' ".format(ext)
        query = query.rstrip()
        query += ")"

        # create drive api client
        files = []
        page_token = None
        try:
            while True:
                response = (
                    self.service.files()
                    .list(
                        q=query,
                        spaces="drive",
                        fields="nextPageToken, files(id, name)",
                        pageToken=page_token,
                    )
                    .execute()
                )
                for file in response.get("files"):
                    # Process change
                    print(f'Found file: {file.get("name")}, {file.get("id")}')

                    file_id = file.get("id")
                    file_name = file.get("name")

                    files.append(
                        {
                            "id": file_id,
                            "name": file_name,
                        }
                    )

                page_token = response.get("nextPageToken", None)
                if page_token is None:
                    break

        except HttpError as error:
            print(f"An error occurred: {error}")
            files = None

        return files

    def download_file(self, real_file_id: str) -> bytes:
        """
        Downloads a single file
        """

        try:
            file_id = real_file_id
            request = self.service.files().get_media(fileId=file_id)
            file = io.BytesIO()
            downloader = MediaIoBaseDownload(file, request)
            done = False
            while done is False:
                status, done = downloader.next_chunk()
                print(f"Download {int(status.progress() * 100)}.")

        except HttpError as error:
            print(f"An error occurred: {error}")
            file = None

        return file.getvalue()

Running this gives the following:

service = GoogleDriveService().build()
loader = GoogleDriveLoader(service)
all_files = loader.search_for_files()  # returns a list of unique file ids and names
pdf_bytes = loader.download_file({some_id})  # returns the bytes for that file

Great! So now we can connect to Google Drive and bring image or pdf data onto our local machine. Next, we must process it and extract the text.

Several well-documented open source libraries exist for extracting raw text from pdfs and images. For pdfs we will use PyPDF here, although for a more comprehensive view of similar packages I recommend this article. For images in jpeg format, we will make use of pytesseract, which is a wrapper for the tesseract OCR engine. Installation instructions for that can be found here. Finally, we also want to be able to convert pdfs into jpeg format. This can be done with the pdf2image package.

Both PyPDF and pytesseract provide high level methods for extracting text from documents. They both also have options for tuning this. pytesseract, for example, can extract both text and bounding boxes (see here), which may be useful in future if we want to feed the LLM more information about the layout of the receipt whose text it is processing. pdf2image provides a method to convert pdf bytes to a jpeg image, which is exactly what we want to do here. To convert jpeg bytes to an image that can be visualized, we'll use the PIL package.
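As a quick aside, here is a minimal sketch (not part of this project's code) of how pytesseract can return word-level bounding boxes alongside the text, assuming tesseract is installed locally:

import io
import pytesseract
from PIL import Image

def extract_words_with_boxes(jpeg_bytes: bytes) -> list:
    # image_to_data returns parallel lists of recognized words and their
    # pixel coordinates, which could later be passed to the LLM as layout hints
    image = Image.open(io.BytesIO(jpeg_bytes))
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    words = []
    for text, left, top, width, height in zip(
        data["text"], data["left"], data["top"], data["width"], data["height"]
    ):
        if text.strip():
            words.append({"text": text, "box": (left, top, width, height)})
    return words

With that aside, here are the conversion classes used in this project: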

from abc import ABC, abstractmethod
from pdf2image import convert_from_bytes
import numpy as np
from PyPDF2 import PdfReader
from PIL import Image
import pytesseract
import io

DEFAULT_DPI = 50


class FileBytesToImage(ABC):

    @staticmethod
    @abstractmethod
    def convert_bytes_to_jpeg(file_bytes):
        raise NotImplementedError

    @staticmethod
    @abstractmethod
    def convert_bytes_to_text(file_bytes):
        raise NotImplementedError


class PDFBytesToImage(FileBytesToImage):

    @staticmethod
    def convert_bytes_to_jpeg(file_bytes, dpi=DEFAULT_DPI, return_array=False):
        jpeg_data = convert_from_bytes(file_bytes, fmt="jpeg", dpi=dpi)[0]
        if return_array:
            jpeg_data = np.asarray(jpeg_data)
        return jpeg_data

    @staticmethod
    def convert_bytes_to_text(file_bytes):
        pdf_data = PdfReader(
            stream=io.BytesIO(initial_bytes=file_bytes)
        )
        # receipt data should only have one page
        page = pdf_data.pages[0]
        return page.extract_text()


class JpegBytesToImage(FileBytesToImage):

    @staticmethod
    def convert_bytes_to_jpeg(file_bytes, dpi=DEFAULT_DPI, return_array=False):
        jpeg_data = Image.open(io.BytesIO(file_bytes))
        if return_array:
            jpeg_data = np.array(jpeg_data)
        return jpeg_data

    @staticmethod
    def convert_bytes_to_text(file_bytes):
        jpeg_data = Image.open(io.BytesIO(file_bytes))
        text_data = pytesseract.image_to_string(image=jpeg_data, nice=1)
        return text_data

The code above makes use of abstract base classes to improve extensibility. Let's say we want to add support for another file type in future. If we write the relevant class and inherit from FileBytesToImage, we are forced to write convert_bytes_to_jpeg and convert_bytes_to_text methods in it. This makes it less likely that our classes will introduce errors downstream in a large application.
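For example, a hypothetical handler for .png files (not part of the original codebase, just an illustration of the pattern) might look like this:

class PngBytesToImage(FileBytesToImage):
    """Hypothetical extension showing how a new file type would slot in."""

    @staticmethod
    def convert_bytes_to_jpeg(file_bytes, dpi=DEFAULT_DPI, return_array=False):
        # PIL reads PNG bytes directly; convert to RGB so the result behaves
        # like the jpeg outputs of the other classes
        image_data = Image.open(io.BytesIO(file_bytes)).convert("RGB")
        if return_array:
            image_data = np.array(image_data)
        return image_data

    @staticmethod
    def convert_bytes_to_text(file_bytes):
        image_data = Image.open(io.BytesIO(file_bytes)).convert("RGB")
        return pytesseract.image_to_string(image=image_data)

Forgetting either method would raise a TypeError as soon as the class is instantiated, which is exactly the safety net we want.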

The PDF and JPEG classes can be used as follows:

bytes_to_image = PDFBytesToImage()
image = PDFBytesToImage.convert_bytes_to_jpeg(pdf_bytes)
text = PDFBytesToImage.convert_bytes_to_text(pdf_bytes)
Example of text extracted from a pdf document using the code above. Since receipts contain PII, here we're just demonstrating with a random document uploaded to Google Drive. Image generated by the author.

Now let's use Langchain to prompt gpt-4-vision to extract some information from our receipts. We can start by using Langchain's support for Pydantic to create a model for the output.

from langchain_core.pydantic_v1 import BaseModel, Field
from typing import List


class ReceiptItem(BaseModel):
    """Information about a single item on a receipt"""

    item_name: str = Field(description="The name of the purchased item")
    item_cost: str = Field(description="The cost of the item")


class ReceiptInformation(BaseModel):
    """Information extracted from a receipt"""

    vendor_name: str = Field(
        description="The name of the company that issued the receipt"
    )
    vendor_address: str = Field(
        description="The street address of the company that issued the receipt"
    )
    datetime: str = Field(
        description="The date and time that the receipt was printed in MM/DD/YY HH:MM format"
    )
    items_purchased: List[ReceiptItem] = Field(description="List of purchased items")
    subtotal: str = Field(description="The total cost before tax was applied")
    tax_rate: str = Field(description="The tax rate applied")
    total_after_tax: str = Field(description="The total cost after tax")

This is very powerful because Langchain can use this Pydantic model to construct format instructions for the LLM, which can be included in the prompt to force it to produce a json output with the specified fields. Adding new fields is as simple as updating the model class.
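As a quick illustration (not shown in the original walkthrough), you can inspect the formatting instructions that Langchain derives from this model; the same parser is used inside the chain below:

from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser(pydantic_object=ReceiptInformation)

# Prints a block of text describing the JSON schema of ReceiptInformation,
# which gets appended to the prompt sent to the model
print(parser.get_format_instructions())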

Next, let's build the prompt, which will just be static:

from dataclasses import dataclass


@dataclass
class VisionReceiptExtractionPrompt:
    template: str = """
    You are an expert at information extraction from images of receipts.

    Given this image of a receipt, extract the following information:
    - The name and address of the vendor
    - The names and costs of each of the items that were purchased
    - The date and time that the receipt was issued. This must be formatted like 'MM/DD/YY HH:MM'
    - The subtotal (i.e. the total cost before tax)
    - The tax rate
    - The total cost after tax

    Do not guess. If some information is missing, just return "N/A" in the relevant field.
    If you determine that the image is not of a receipt, just set all the fields in the formatting instructions to "N/A".

    You must obey the output format under all circumstances. Please follow the formatting instructions exactly.
    Do not return any additional comments or explanation.
    """

Now we need to build a class that will take in an image and send it to the LLM along with the prompt and format instructions.

from langchain.chains import TransformChain
from langchain_core.messages import HumanMessage
from langchain_core.runnables import chain
from langchain_core.output_parsers import JsonOutputParser
import base64
from langchain.callbacks import get_openai_callback


class VisionReceiptExtractionChain:

    def __init__(self, llm):
        self.llm = llm
        self.chain = self.set_up_chain()

    @staticmethod
    def load_image(path: dict) -> dict:
        """Load image and encode it as base64."""

        def encode_image(path):
            with open(path, "rb") as image_file:
                return base64.b64encode(image_file.read()).decode("utf-8")

        image_base64 = encode_image(path["image_path"])
        return {"image": image_base64}

    def set_up_chain(self):
        extraction_model = self.llm
        prompt = VisionReceiptExtractionPrompt()
        parser = JsonOutputParser(pydantic_object=ReceiptInformation)

        load_image_chain = TransformChain(
            input_variables=["image_path"],
            output_variables=["image"],
            transform=self.load_image,
        )

        # build a custom chain that includes an image
        @chain
        def receipt_model_chain(inputs: dict) -> dict:
            """Invoke model"""
            msg = extraction_model.invoke(
                [
                    HumanMessage(
                        content=[
                            {"type": "text", "text": prompt.template},
                            {"type": "text", "text": parser.get_format_instructions()},
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{inputs['image']}"
                                },
                            },
                        ]
                    )
                ]
            )
            return msg.content

        return load_image_chain | receipt_model_chain | JsonOutputParser()

    def run_and_count_tokens(self, input_dict: dict):
        with get_openai_callback() as cb:
            result = self.chain.invoke(input_dict)

        return result, cb

The main method to understand here is set_up_chain, which we will walk through step by step. These steps were inspired by this blog post.

  • Initialize the prompt, which in this case is just a block of text with some general instructions
  • Create a JsonOutputParser from the Pydantic model we made above. This converts the model into a set of formatting instructions that can be added to the prompt
  • Make a TransformChain that allows us to incorporate custom functions (in this case, the load_image function) into the overall chain. Note that the chain will take in a variable called image_path and output a variable called image, which is a base64-encoded string representing the image. This is one of the formats accepted by gpt-4-vision.
  • To the best of my knowledge, ChatOpenAI doesn't yet natively support sending both text and images. Therefore, we need to make a custom chain that invokes the instance of ChatOpenAI we made with the encoded image, prompt and formatting instructions.

Note that we're also making use of openai callbacks to count the tokens and spend associated with each call.

To run this, we can do the following:

from langchain_openai import ChatOpenAI
from tempfile import NamedTemporaryFile

model = ChatOpenAI(
    api_key={your openai api key},
    temperature=0,
    model="gpt-4-vision-preview",
    max_tokens=1024,
)

extractor = VisionReceiptExtractionChain(model)

# image from PDFBytesToImage.convert_bytes_to_jpeg()
prepared_data = {
    "image": image
}

with NamedTemporaryFile(suffix=".jpeg") as temp_file:
    prepared_data["image"].save(temp_file.name)
    res, cb = extractor.run_and_count_tokens(
        {"image_path": temp_file.name}
    )

Given our random document above, the result looks like this:

{'vendor_name': 'N/A',
'vendor_address': 'N/A',
'datetime': 'N/A',
'items_purchased': [],
'subtotal': 'N/A',
'tax_rate': 'N/A',
'total_after_tax': 'N/A'}

Not too exciting, but at least it's structured in the correct way! When a valid receipt is provided, these fields are filled out, and my assessment from running a few tests on different receipts is that it is very accurate.

Our callbacks look like this:

Tokens Used: 1170
Prompt Tokens: 1104
Completion Tokens: 66
Successful Requests: 1
Total Cost (USD): $0.01302

This is important for tracking costs, which can quickly grow during testing of a model like gpt-4.

Let's assume that we've used the steps in part 4 to generate some examples and saved them as a json file. Each example consists of some extracted text and corresponding key information as defined by our ReceiptInformation Pydantic model. Now, we want to inject these examples into a call to gpt-3.5-turbo, in the hope that it can generalize what it learns from them to a new receipt. Few-shot learning is a powerful tool in prompt engineering and, if it works, would be great for this use case because every time a new format of receipt is detected we can generate one example using gpt-4-vision and append it to the list of examples used to prompt gpt-3.5-turbo. Then when a similarly formatted receipt comes along, gpt-3.5-turbo can be used to extract its content. In a way this is like template matching, but without the need to manually define the template.
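A minimal sketch of that feedback loop might look like the following. The record structure here is an assumption based on how the examples are loaded later in this article (a file_details.extracted_text field alongside the extracted receipt fields):

import json

def append_example(
    extracted_text: str,
    vision_result: dict,
    examples_path: str = "receiptchat/datasets/example_extractions.json",
) -> None:
    # vision_result is the dict returned by VisionReceiptExtractionChain
    record = {"file_details": {"extracted_text": extracted_text}, **vision_result}
    try:
        with open(examples_path) as f:
            examples = json.load(f)
    except FileNotFoundError:
        examples = []
    examples.append(record)
    with open(examples_path, "w") as f:
        json.dump(examples, f, indent=2)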

There are many ways to encourage text-based LLMs to extract structured information from a block of text. One of the most recent and most powerful that I've found is described here in the Langchain documentation. The idea is to create a prompt that contains a placeholder for some examples, then inject the examples into the prompt as if they were being returned by some function that the LLM had called. This is done with the model.with_structured_output() functionality, which you can read about here. Note that this is currently in beta and so may change!

Let's look at the code to see how this is done. We'll first write the prompt.

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


@dataclass
class TextReceiptExtractionPrompt:
    system: str = """
    You are an expert at information extraction from images of receipts.

    Given this image of a receipt, extract the following information:
    - The name and address of the vendor
    - The names and costs of each of the items that were purchased
    - The date and time that the receipt was issued. This must be formatted like 'MM/DD/YY HH:MM'
    - The subtotal (i.e. the total cost before tax)
    - The tax rate
    - The total cost after tax

    Do not guess. If some information is missing, just return "N/A" in the relevant field.
    If you determine that the image is not of a receipt, just set all the fields in the formatting instructions to "N/A".

    You must obey the output format under all circumstances. Please follow the formatting instructions exactly.
    Do not return any additional comments or explanation.
    """

    prompt: ChatPromptTemplate = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                system,
            ),
            MessagesPlaceholder("examples"),
            ("human", "{input}"),
        ]
    )

The prompt text is exactly the same as it was in part 4, except that we now have a MessagesPlaceholder to hold the examples that we're going to insert.

import uuid
from typing import List, TypedDict

from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, ToolMessage
from langchain_core.pydantic_v1 import BaseModel


class Example(TypedDict):
    """A representation of an example consisting of text input and expected tool calls.

    For extraction, the tool calls are represented as instances of the pydantic model.
    """

    input: str
    tool_calls: List[BaseModel]


class TextReceiptExtractionChain:

    def __init__(self, llm, examples: List):

        self.llm = llm
        self.raw_examples = examples
        self.prompt = TextReceiptExtractionPrompt()
        self.chain, self.examples = self.set_up_chain()

    @staticmethod
    def tool_example_to_messages(example: Example) -> List[BaseMessage]:
        """Convert an example into a list of messages that can be fed into an LLM.

        This code is an adapter that converts our example to a list of messages
        that can be fed into a chat model.

        The list of messages per example corresponds to:

        1) HumanMessage: contains the content from which content should be extracted.
        2) AIMessage: contains the extracted information from the model
        3) ToolMessage: contains confirmation to the model that the model requested a tool correctly.

        The ToolMessage is required because some of the chat models are hyper-optimized for agents
        rather than for an extraction use case.
        """
        messages: List[BaseMessage] = [HumanMessage(content=example["input"])]
        openai_tool_calls = []
        for tool_call in example["tool_calls"]:
            openai_tool_calls.append(
                {
                    "id": str(uuid.uuid4()),
                    "type": "function",
                    "function": {
                        # The name of the function right now corresponds
                        # to the name of the pydantic model.
                        # This is implicit in the API right now,
                        # and will be improved over time.
                        "name": tool_call.__class__.__name__,
                        "arguments": tool_call.json(),
                    },
                }
            )
        messages.append(
            AIMessage(content="", additional_kwargs={"tool_calls": openai_tool_calls})
        )
        tool_outputs = example.get("tool_outputs") or [
            "You have correctly called this tool."
        ] * len(openai_tool_calls)
        for output, tool_call in zip(tool_outputs, openai_tool_calls):
            messages.append(ToolMessage(content=output, tool_call_id=tool_call["id"]))
        return messages

    def set_up_examples(self):

        examples = [
            (
                example["input"],
                ReceiptInformation(
                    vendor_name=example["output"]["vendor_name"],
                    vendor_address=example["output"]["vendor_address"],
                    datetime=example["output"]["datetime"],
                    items_purchased=[
                        ReceiptItem(
                            item_name=example["output"]["items_purchased"][i][
                                "item_name"
                            ],
                            item_cost=example["output"]["items_purchased"][i][
                                "item_cost"
                            ],
                        )
                        for i in range(len(example["output"]["items_purchased"]))
                    ],
                    subtotal=example["output"]["subtotal"],
                    tax_rate=example["output"]["tax_rate"],
                    total_after_tax=example["output"]["total_after_tax"],
                ),
            )
            for example in self.raw_examples
        ]

        messages = []

        for text, tool_call in examples:
            messages.extend(
                self.tool_example_to_messages(
                    {"input": text, "tool_calls": [tool_call]}
                )
            )

        return messages

    def set_up_chain(self):

        extraction_model = self.llm
        prompt = self.prompt.prompt
        examples = self.set_up_examples()
        runnable = prompt | extraction_model.with_structured_output(
            schema=ReceiptInformation,
            method="function_calling",
            include_raw=False,
        )

        return runnable, examples

    def run_and_count_tokens(self, input_dict: dict):

        # inject the examples here
        input_dict["examples"] = self.examples
        with get_openai_callback() as cb:
            result = self.chain.invoke(input_dict)

        return result, cb

TextReceiptExtractionChain is going to take in a list of examples, each of which has input and output keys (note how these are used in the set_up_examples method). For each example, we make a ReceiptInformation object. Then we format the result into a list of messages that can be passed into the prompt. All the work in tool_example_to_messages is there just to convert between different Langchain formats.

Running this looks very similar to what we did with the vision model:

import json

# Load the examples
EXAMPLES_PATH = "receiptchat/datasets/example_extractions.json"
with open(EXAMPLES_PATH) as f:
    loaded_examples = json.load(f)

loaded_examples = [
    {"input": x["file_details"]["extracted_text"], "output": x}
    for x in loaded_examples
]

# Set up the LLM caller
llm = ChatOpenAI(
    api_key=secrets["OPENAI_API_KEY"],
    temperature=0,
    model="gpt-3.5-turbo",
)
extractor = TextReceiptExtractionChain(llm, loaded_examples)

# convert a PDF file from Google Drive into text
text = PDFBytesToImage.convert_bytes_to_text(downloaded_data)

extracted_information, cb = extractor.run_and_count_tokens(
    {"input": text}
)

Even with 10 examples, this call is less than half the cost of the gpt-4-vision one and also a lot faster to return. As more examples get added, you may need to use gpt-3.5-turbo-16k to avoid exceeding the context window.

Having collected some receipts, you can run the extraction methods described in sections 4 and 5 and collect the results in a dataframe. This then gets stored and can be appended to every time a new receipt appears in the Google Drive.

Sample of the output dataset, showing fields extracted from several receipts. Image generated by the author.
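Putting the pieces together, a simplified end-to-end run might look like the sketch below. It assumes all the files are pdfs, skips duplicate checking, and uses an arbitrary csv file name:

import pandas as pd

service = GoogleDriveService().build()
loader = GoogleDriveLoader(service)

rows = []
for file_info in loader.search_for_files():
    file_bytes = loader.download_file(file_info["id"])
    text = PDFBytesToImage.convert_bytes_to_text(file_bytes)
    # extractor is the TextReceiptExtractionChain set up in the previous section
    extracted, _ = extractor.run_and_count_tokens({"input": text})
    rows.append({"file_name": file_info["name"], **extracted.dict()})

# Append the new rows to the csv that acts as our receipts database
pd.DataFrame(rows).to_csv("receipt_database.csv", mode="a", index=False, header=False)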

Once my database of extracted receipt information grows a bit larger, I plan to explore LLM-based question answering on top of it, so look out for that article soon! I'm also interested in exploring a more formal evaluation methodology for this project and comparing the results to what can be obtained via AWS Textract or similar products.

Thanks for making it to the end! Please feel free to explore the full codebase here: https://github.com/rmartinshort/receiptchat. Any suggestions for improvement or extensions to the functionality would be much appreciated!
