Skip to content

Conversation

SurajBaloni
Copy link
Collaborator

Leveraging Multimodal Inputs for Information Extraction Task Using a Third-Party Large Language Model

…ion Task Using a Third-Party Large Language Model
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@SurajBaloni SurajBaloni requested a review from 1297rohit August 4, 2025 05:45
@SurajBaloni SurajBaloni requested a review from BP-Ent August 26, 2025 04:27
@SurajBaloni
Copy link
Collaborator Author

@BP-Ent Please review this sample NB.

Copy link

review-notebook-app bot commented Aug 26, 2025

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2025-08-26T22:01:24Z
----------------------------------------------------------------

Use multimodal inputs for information extraction with a third-party large language model


Copy link

review-notebook-app bot commented Aug 26, 2025

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2025-08-26T22:01:25Z
----------------------------------------------------------------

As businesses increasingly digitize operations, vast amounts of transactional data—such as sales receipts—are often stored as scanned images or photos. Extracting meaningful, structured information from these image-based documents can support a variety of analytical and operational workflows, such as sales tracking, sales management, and customer insights.

Recent advancements in large language models (LLMs) have opened new possibilities for interpreting and extracting information from such inputs with greater accuracy and flexibility. The GeoAI toolbox in ArcGIS Pro supports integration of third-party language models, allowing users to process and analyze text using external AI services. Custom third-party models can be wrapped in ESRI Deep Learning Package (.dlpk) files and used within GeoAI tools and the arcgis.learn API. In this sample, we demonstrate how one such model, GPT-4o from OpenAI, can be used with the Process Text Using AI Model tool to extract relevant entities from receipt images. Third-party model support in ArcGIS Pro and the arcgis.learn API allows users to bring in AI models (whether hosted by providers like OpenAI, Azure, etc., built from open-source code, or fine-tuned for a specific task) to enhance natural language processing directly within ArcGIS workflows.

For this use case, we use a set of sales receipt images and perform entity extraction to identify key pieces of information. The extracted entities can be used for downstream tasks like sales analytics, customer profiling, inventory management, or tax auditing. The model extracts the following entities:

  • Sale Date
  • Customer Name
  • Product
  • Quantity
  • Unit Price
  • Total Amount
  • Tax Rate
  • Payment Method

This use case highlights how advanced AI models can streamline the transformation of unstructured receipt data into structured, analysis-ready datasets within the ArcGIS ecosystem.


Copy link

review-notebook-app bot commented Aug 26, 2025

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2025-08-26T22:01:26Z
----------------------------------------------------------------

For this use case, we use a sample set of images of printed sales receipts from various retail outlets like Walmart, Costco Wholesale, etc.


Copy link

review-notebook-app bot commented Aug 26, 2025

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2025-08-26T22:01:26Z
----------------------------------------------------------------


Copy link

review-notebook-app bot commented Aug 26, 2025

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2025-08-26T22:01:27Z
----------------------------------------------------------------

In this use case, we use a third-party hosted model (GPT-4o) to extract key entities from scanned sales receipt images.


Copy link

review-notebook-app bot commented Aug 26, 2025

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2025-08-26T22:01:28Z
----------------------------------------------------------------

The __init__ method initializes instance variables like namedescription, and other attributes essential for the NLP function.


Copy link

review-notebook-app bot commented Aug 26, 2025

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2025-08-26T22:01:28Z
----------------------------------------------------------------

Define initialize function

The initialize method is called at the start of the custom Python NLP function. Within this function, we will set up the necessary variables. It accepts two parameters via kwargs:

Parameters in kwargs

  • model: The path to the ESRI Model Definition (.emd) file.
  • device: The name of the device (either GPU or CPU). This is particularly important for on-premises models.

initialize reads the ESRI Model Definition (.emd) file and configures the essential variables needed for inference.


Copy link

review-notebook-app bot commented Aug 26, 2025

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2025-08-26T22:01:29Z
----------------------------------------------------------------

The getParameterInfo function is designed to collect parameters from the user through GeoAI tools. For our use case, it gathers the text prompt used to instruct the language model on which entities to extract from the input receipt text, as well as the connection file required to authenticate with the third-party model provider. Refer to the getParameterInfo section to get a detailed explanation of the getParameterInfo function.


Copy link

review-notebook-app bot commented Aug 26, 2025

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2025-08-26T22:01:30Z
----------------------------------------------------------------

For our use case, we are using a large language model (LLM) provided by an external AI service, GPT-4o from OpenAI. To enable communication between our custom Python NLP function and the OpenAI endpoint, we need a connection file. You can refer to the Create AI Service Connection File section for instructions on how to create one.

This connection file securely stores the required credentials that will be retrieved and used to authenticate and initialize the third-party model. These saved credentials will be accessed from the connection file and set within the current class context in the following function.


Copy link

review-notebook-app bot commented Aug 26, 2025

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2025-08-26T22:01:31Z
----------------------------------------------------------------

This .dlpk file is now ready for use with the Process Text Using AI Model tool inside ArcGIS Pro.


Copy link

review-notebook-app bot commented Aug 26, 2025

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2025-08-26T22:01:31Z
----------------------------------------------------------------

This guide demonstrates how powerful AI models like GPT-4o can be seamlessly integrated into ArcGIS Pro through the Process text Using AI Model tool to extract structured information from unstructured receipt images. By using the Process Text Using AI Model tool and a custom LLM wrapped in a Deep Learning Package (.dlpk), users can automate the extraction of key entities like sale date, customer name, product details, and payment method.

This approach reduces the need for manual data entry or annotation, while delivering high-quality, analysis-ready outputs that can support a range of GIS and business workflows like sales analytics, inventory tracking, and customer insights. With built-in support for third-party AI services, ArcGIS Pro enables users to bring the latest in NLP and machine learning directly into their geospatial data pipelines.


Copy link
Collaborator

@BP-Ent BP-Ent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SurajBaloni Suggested changes made on reviewNB

Copy link

review-notebook-app bot commented Aug 27, 2025

View / edit / reply to this conversation on ReviewNB

1297rohit commented on 2025-08-27T04:23:09Z
----------------------------------------------------------------

we can remove the imports that are not needed.


Copy link

review-notebook-app bot commented Aug 27, 2025

View / edit / reply to this conversation on ReviewNB

1297rohit commented on 2025-08-27T04:23:09Z
----------------------------------------------------------------

Can we add that this function will be called only once so we want to load anything it is best to add it here ?


Copy link

review-notebook-app bot commented Aug 27, 2025

View / edit / reply to this conversation on ReviewNB

1297rohit commented on 2025-08-27T04:23:10Z
----------------------------------------------------------------

Should we also show how to access device from the kwargs?


Copy link

review-notebook-app bot commented Aug 27, 2025

View / edit / reply to this conversation on ReviewNB

1297rohit commented on 2025-08-27T04:23:10Z
----------------------------------------------------------------

This function is designed to collect parameters from the user through the GeoAI tools.

Does this function collect parameter from the user ? This function tells what all paramaters to load in the GeoAI tool, its data type and other values.


Copy link

review-notebook-app bot commented Aug 27, 2025

View / edit / reply to this conversation on ReviewNB

1297rohit commented on 2025-08-27T04:23:11Z
----------------------------------------------------------------

The getConfiguration method sets up and manages the parameters required by the NLP function. It receives keyword arguments (kwargs) that contain the values provided by the user—either through the GeoAI tool interface or programmatically in the getParameterInfo function.

This method is responsible for:

  • Extracting user-specified values input from the tool (e.g., prompt, AI connection file path).
  • Storing these values in class-level variables or a configuration dictionary.
Controlling how the model processes the input and generates the output based on the updated parameters.

Does it control how the model process the input ? It seems getConfiguration simply gets the value from the user input.


Copy link

review-notebook-app bot commented Aug 27, 2025

View / edit / reply to this conversation on ReviewNB

1297rohit commented on 2025-08-27T04:23:12Z
----------------------------------------------------------------

Should we add few more detail about what to add in predict and how to return the procesed value that will be shown in pro ?


Copy link

review-notebook-app bot commented Aug 27, 2025

View / edit / reply to this conversation on ReviewNB

1297rohit commented on 2025-08-27T04:23:12Z
----------------------------------------------------------------

Do we zip the folder or just the files ?


@SurajBaloni
Copy link
Collaborator Author

Thanks @BP-Ent and @1297rohit I’ve added your suggestions — they were really helpful.

Copy link
Collaborator

@1297rohit 1297rohit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SurajBaloni SurajBaloni requested a review from BP-Ent August 27, 2025 05:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants