Our goal is to create a scalable proof-of-concept of automated credit analysis.
- Our CEO asked us for some analysis.
- Our input data is this bank statement.
- Docsumo.ipynb calls Docsumo and saves output to disk.
- Analysis.ipynb calls OpenAI, computes some ratios, and outputs this Credit analysis.
I opted to focus on the Indian borrower because his bank statements told a story. It was eye-opening to see him shuffling loans.
The techniques used should be extendable to different bank statements easily.
To win the trust of the CEO, we'll address most of the points in his requests.
- State withdrawals and deposits for each month.
- Detect transactions with lenders.
- Comment on large transactions.
- Raise red flags that should be deal-breakers for lending to the applicant.
- Write up the findings.
I decided the highest reward-to-effort was to focus on the lender's current debt burden.
I evaluated a half dozen options for parsing the document.
LLMSherpa solves a lot of problems with using LLMs to parse large PDFS, but it fails badly at parsing tables out of docs.
The best free options for parsing tables out of PDFS are Camelot and Tabula. I used Tabula, because from what I read Camelot required more setup. Cleaning up Tabula's output made the code brittle, so I evaluated doc parsing services.
- Docparser and Parseur produced worse output than Tabula.
- Nanonets requires a human to use a GUI to select ranges in a document.
- CaptureFast requires talking to a salesperson to get an API key
- DocSumo worked fantastic the output was accurate and it had great features such as confidence measures on each unit of data. I performed my analysis on the output of Docsumo.
The prompt pushed us to lean into LLMs to parse the doc, so I used ChatGPT to identify financial transactions. If we had data it would be better to use supervised learning. It may be worth paying to classify the docs, for example with Plaid Enrich.
ChatGPT caught most financial transactions but usually misclasified one of the two large transactions. With additional context, such as a list of financial institutions or classified list of line items, ChatGPTs performance may improve.