Skip to content

lesreaper/ocr-fireworks-agentic-workflow

Repository files navigation

Identification Document Extraction using Fireworks AI

This project demonstrates a few end-to-end proof-of-concepts for extracting structured information from images of U.S. identification documents (drivers licenses, state IDs, and passports) using Fireworks AI.

This was done as a demo for Fireworks AI job application process in early 2025, which took 6 weeks, 6 interviews, a demo take home-home, defense of that take home, and then on completion, I got a no thank you from them. So, approximately 2 full weeks of unpaid time. They also made me sign paperwork to allow them to use the code as they see fit before the demo. This was the first time they heard of a validation agent according to the interviewer of the demo.

To Run Apps

streamlit run app.py
stremalit run test_passports.py

What's Inside

This is an agentic workflow for OCR processing. It sends the image ot Google for OCR, and then back to the DocType Operator to start the pipeline. You can see the approximate path below.

worflow

Key Features in Document Inlining

  • Document Inlining:
    The solution leverages Fireworks AI’s Document Inlining feature to process mulitple images by embedding image data directly (via Base64 encoding) into the API request.

  • JSON Mode Structured Responses:
    We use Fireworks AI’s JSON Mode to instruct the model to return results in a well-structured JSON format. This allows easy validation and further processing of the extracted data.

Setup and Installation (Docker removed for now)

  1. Clone the repository:

    git clone <your-repo-url>
    cd <your-repo-directory>
  2. Create a .env file with your key:

    touch .env

    Add these values to your .env:

    BASE_URL="https://api.fireworks.ai/inference/v1"
    FIREWORKS_API_KEY="xxxx"
    GOOGLE_API_KEY="xxxx"
    
  3. Run the apps:

    pip install -r requirements.txt
    

    To run the single image test:

    streamlit run app.py
    

    To run the multimage testing, you'll need to have images installed in synthetic_passports, along with an appropriate jsonl file. You can use the generate_passports.py file under the fine_tuning folder to select a template (hard coded), and then it should populate. From there, you simply come back, and run:

    streamlit run test_passports.py
    

About

This is an agent workflow for OCR using Fireworks AI

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published