This project demonstrates a few end-to-end proof-of-concepts for extracting structured information from images of U.S. identification documents (drivers licenses, state IDs, and passports) using Fireworks AI.
This was done as a demo for Fireworks AI job application process in early 2025, which took 6 weeks, 6 interviews, a demo take home-home, defense of that take home, and then on completion, I got a no thank you from them. So, approximately 2 full weeks of unpaid time. They also made me sign paperwork to allow them to use the code as they see fit before the demo. This was the first time they heard of a validation agent according to the interviewer of the demo.
streamlit run app.py
stremalit run test_passports.py
This is an agentic workflow for OCR processing. It sends the image ot Google for OCR, and then back to the DocType Operator to start the pipeline. You can see the approximate path below.
-
Document Inlining:
The solution leverages Fireworks AI’s Document Inlining feature to process mulitple images by embedding image data directly (via Base64 encoding) into the API request. -
JSON Mode Structured Responses:
We use Fireworks AI’s JSON Mode to instruct the model to return results in a well-structured JSON format. This allows easy validation and further processing of the extracted data.
-
Clone the repository:
git clone <your-repo-url> cd <your-repo-directory>
-
Create a .env file with your key:
touch .env
Add these values to your
.env:BASE_URL="https://api.fireworks.ai/inference/v1" FIREWORKS_API_KEY="xxxx" GOOGLE_API_KEY="xxxx" -
Run the apps:
pip install -r requirements.txtTo run the single image test:
streamlit run app.pyTo run the multimage testing, you'll need to have images installed in
synthetic_passports, along with an appropriatejsonlfile. You can use thegenerate_passports.pyfile under thefine_tuningfolder to select a template (hard coded), and then it should populate. From there, you simply come back, and run:streamlit run test_passports.py
