Experiment with Llama3.2-vision and/or Llama3.3-vision to determine the elements that can be (somewhat) reliably detected on a variety of receipts.
Based on the above define a "schema" (in JSON?) that represents a receipt. Make it a superset of what can be detected (not every element needs to be reliably detected).
Implement, on top of App Engine's persistence service, the ability to persist a collection of such schemas. Make the mechanism flexible for all possible applications by using linked chain or doubly linked chain (or even 3D linking mechanism).