GitHub

Automated Credit Analysis

Our goal is to create a scalable proof-of-concept of automated credit analysis.

Our CEO asked us for some analysis.
Our input data is this bank statement.
Docsumo.ipynb calls Docsumo and saves output to disk.
Analysis.ipynb calls OpenAI, computes some ratios, and outputs this Credit analysis.

Focus on IDFC borrower

I opted to focus on the Indian borrower because his bank statements told a story. It was eye-opening to see him shuffling loans.

The techniques used should be extendable to different bank statements easily.

Approach

To win the trust of the CEO, we'll address most of the points in his requests.

State withdrawals and deposits for each month.
Detect transactions with lenders.
Comment on large transactions.
Raise red flags that should be deal-breakers for lending to the applicant.
Write up the findings.

I decided the highest reward-to-effort was to focus on the lender's current debt burden.

Doc parsing

I evaluated a half dozen options for parsing the document.

LLMSherpa solves a lot of problems with using LLMs to parse large PDFS, but it fails badly at parsing tables out of docs.

The best free options for parsing tables out of PDFS are Camelot and Tabula. I used Tabula, because from what I read Camelot required more setup. Cleaning up Tabula's output made the code brittle, so I evaluated doc parsing services.

Docparser and Parseur produced worse output than Tabula.
Nanonets requires a human to use a GUI to select ranges in a document.
CaptureFast requires talking to a salesperson to get an API key
DocSumo worked fantastic the output was accurate and it had great features such as confidence measures on each unit of data. I performed my analysis on the output of Docsumo.

Classification

The prompt pushed us to lean into LLMs to parse the doc, so I used ChatGPT to identify financial transactions. If we had data it would be better to use supervised learning. It may be worth paying to classify the docs, for example with Plaid Enrich.

ChatGPT caught most financial transactions but usually misclasified one of the two large transactions. With additional context, such as a list of financial institutions or classified list of line items, ChatGPTs performance may improve.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
docsumo_api_responses		docsumo_api_responses
statements		statements
.gitignore		.gitignore
README.md		README.md
analysis.ipynb		analysis.ipynb
credit_analysis.pdf		credit_analysis.pdf
docsumo.ipynb		docsumo.ipynb
hello.pdf		hello.pdf
prompt.md		prompt.md
requirements.txt		requirements.txt
tabula.ipynb		tabula.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Automated Credit Analysis

Focus on IDFC borrower

Approach

Doc parsing

Classification

About

Releases

Packages

Languages

owenbrown/casca

Folders and files

Latest commit

History

Repository files navigation

Automated Credit Analysis

Focus on IDFC borrower

Approach

Doc parsing

Classification

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages