This is a best practice worth adopting. It originated in the "Perfect AI Docs Assistant" project: https://github.com/RocketChat/google-summer-of-code/blob/main/google-summer-of-code-2025.md#-perfect-ai-docs-assistant-app
The dataset should include receipts (possibly base64-encoded) for use in our unit tests, and it should be organized so that the tests can consume it easily.
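One possible way to organize such a dataset is sketched below: each receipt is a typed fixture holding the base64 payload alongside the values a test should expect back. This is only an illustration under stated assumptions. The `ReceiptFixture` shape, its field names, and the helpers `isWellFormedBase64` and `validateFixture` are all hypothetical and not part of any existing Rocket.Chat code, and the sample payload is just the 8-byte PNG signature rather than a real receipt.

```typescript
// Hypothetical shape of one dataset entry; every field name here is an assumption.
interface ReceiptFixture {
  id: string;
  mimeType: "image/png" | "image/jpeg";
  imageBase64: string; // receipt image, base64-encoded
  expected: { merchant: string; total: number; currency: string };
}

// Placeholder entry: the payload is only the PNG file signature, not a real receipt.
const fixtures: ReceiptFixture[] = [
  {
    id: "sample-001",
    mimeType: "image/png",
    imageBase64: "iVBORw0KGgo=",
    expected: { merchant: "Example Cafe", total: 12.5, currency: "USD" },
  },
];

// Checks padded, standard-alphabet base64 without decoding it.
function isWellFormedBase64(s: string): boolean {
  return s.length > 0 && s.length % 4 === 0 && /^[A-Za-z0-9+\/]*={0,2}$/.test(s);
}

// Returns a list of human-readable problems; an empty list means the entry looks usable.
function validateFixture(f: ReceiptFixture): string[] {
  const problems: string[] = [];
  if (!isWellFormedBase64(f.imageBase64)) {
    problems.push(f.id + ": imageBase64 is not well-formed base64");
  }
  if (!(f.expected.total >= 0)) {
    problems.push(f.id + ": expected.total must be a non-negative number");
  }
  return problems;
}
```

A unit test could then loop over `fixtures`, run `validateFixture` as a sanity check on the dataset itself, and compare the output of the receipt parser under test against each entry's `expected` values. Keeping the expected values next to the image payload means a new receipt can be added without touching the test code.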
In the future, we may also experiment with using the dataset to fine-tune or post-train an rc-receipts-friendly vision model.