-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Leveraging Multimodal Inputs for Information Extraction Task Using a Third-Party Large Language Model #2323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…ion Task Using a Third-Party Large Language Model
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
@BP-Ent Please review this sample NB. |
View / edit / reply to this conversation on ReviewNB BP-Ent commented on 2025-08-26T22:01:24Z Use multimodal inputs for information extraction with a third-party large language model
|
View / edit / reply to this conversation on ReviewNB BP-Ent commented on 2025-08-26T22:01:25Z As businesses increasingly digitize operations, vast amounts of transactional data—such as sales receipts—are often stored as scanned images or photos. Extracting meaningful, structured information from these image-based documents can support a variety of analytical and operational workflows, such as sales tracking, sales management, and customer insights. Recent advancements in large language models (LLMs) have opened new possibilities for interpreting and extracting information from such inputs with greater accuracy and flexibility. The GeoAI toolbox in ArcGIS Pro supports integration of third-party language models, allowing users to process and analyze text using external AI services. Custom third-party models can be wrapped in ESRI Deep Learning Package (.dlpk) files and used within GeoAI tools and the arcgis.learn API. In this sample, we demonstrate how one such model, GPT-4o from OpenAI, can be used with the Process Text Using AI Model tool to extract relevant entities from receipt images. Third-party model support in ArcGIS Pro and the For this use case, we use a set of sales receipt images and perform entity extraction to identify key pieces of information. The extracted entities can be used for downstream tasks like sales analytics, customer profiling, inventory management, or tax auditing. The model extracts the following entities:
This use case highlights how advanced AI models can streamline the transformation of unstructured receipt data into structured, analysis-ready datasets within the ArcGIS ecosystem. |
View / edit / reply to this conversation on ReviewNB BP-Ent commented on 2025-08-26T22:01:26Z For this use case, we use a sample set of images of printed sales receipts from various retail outlets like Walmart, Costco Wholesale, etc. |
View / edit / reply to this conversation on ReviewNB BP-Ent commented on 2025-08-26T22:01:26Z
|
View / edit / reply to this conversation on ReviewNB BP-Ent commented on 2025-08-26T22:01:27Z In this use case, we use a third-party hosted model (GPT-4o) to extract key entities from scanned sales receipt images. |
View / edit / reply to this conversation on ReviewNB BP-Ent commented on 2025-08-26T22:01:28Z The |
View / edit / reply to this conversation on ReviewNB BP-Ent commented on 2025-08-26T22:01:28Z Define |
View / edit / reply to this conversation on ReviewNB BP-Ent commented on 2025-08-26T22:01:29Z
The |
View / edit / reply to this conversation on ReviewNB BP-Ent commented on 2025-08-26T22:01:30Z For our use case, we are using a large language model (LLM) provided by an external AI service, GPT-4o from OpenAI. To enable communication between our custom Python NLP function and the OpenAI endpoint, we need a connection file. You can refer to the Create AI Service Connection File section for instructions on how to create one. This connection file securely stores the required credentials that will be retrieved and used to authenticate and initialize the third-party model. These saved credentials will be accessed from the connection file and set within the current class context in the following function. |
View / edit / reply to this conversation on ReviewNB BP-Ent commented on 2025-08-26T22:01:31Z This |
View / edit / reply to this conversation on ReviewNB BP-Ent commented on 2025-08-26T22:01:31Z This guide demonstrates how powerful AI models like GPT-4o can be seamlessly integrated into ArcGIS Pro through the This approach reduces the need for manual data entry or annotation, while delivering high-quality, analysis-ready outputs that can support a range of GIS and business workflows like sales analytics, inventory tracking, and customer insights. With built-in support for third-party AI services, ArcGIS Pro enables users to bring the latest in NLP and machine learning directly into their geospatial data pipelines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SurajBaloni Suggested changes made on reviewNB
View / edit / reply to this conversation on ReviewNB 1297rohit commented on 2025-08-27T04:23:09Z we can remove the imports that are not needed. |
View / edit / reply to this conversation on ReviewNB 1297rohit commented on 2025-08-27T04:23:09Z Can we add that this function will be called only once so we want to load anything it is best to add it here ? |
View / edit / reply to this conversation on ReviewNB 1297rohit commented on 2025-08-27T04:23:10Z Should we also show how to access |
View / edit / reply to this conversation on ReviewNB 1297rohit commented on 2025-08-27T04:23:10Z This function is designed to collect parameters from the user through the GeoAI tools. Does this function collect parameter from the user ? This function tells what all paramaters to load in the GeoAI tool, its data type and other values. |
View / edit / reply to this conversation on ReviewNB 1297rohit commented on 2025-08-27T04:23:11Z The
This method is responsible for:
Controlling how the model processes the input and generates the output based on the updated parameters. Does it control how the model process the input ? It seems getConfiguration simply gets the value from the user input. |
View / edit / reply to this conversation on ReviewNB 1297rohit commented on 2025-08-27T04:23:12Z Should we add few more detail about what to add in predict and how to return the procesed value that will be shown in pro ? |
View / edit / reply to this conversation on ReviewNB 1297rohit commented on 2025-08-27T04:23:12Z Do we zip the folder or just the files ? |
Thanks @BP-Ent and @1297rohit I’ve added your suggestions — they were really helpful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Leveraging Multimodal Inputs for Information Extraction Task Using a Third-Party Large Language Model