info method - expose information about the model #1178

Open
martinmodrak opened this issue Aug 10, 2023 · 4 comments

Comments

@martinmodrak

Summary:

Add a new info method that lets tools (or users) get information about the model, e.g. to help decide whether the model needs recompilation.

Description:

Currently, much of the metadata about a compiled model can be retrieved only by running inference (Stan version, stanc flags) or not at all (cpp compile options, compiler, ...). This limits the ability of wrappers like cmdstanr to make good decisions on when to recompile a model, because a change in compiler options or Stan version is not detected by checking modification times.
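To illustrate the kind of decision a wrapper could make with this metadata, here is a minimal sketch. The metadata keys (`stan_version`, `stanc_flags`, `cxxflags`) and the function name are hypothetical, chosen only for illustration; they are not an existing CmdStan or cmdstanr interface.

```python
def needs_recompile(recorded: dict, wanted: dict) -> bool:
    """Return True if any setting the caller asks for differs from what was
    recorded when the executable was built (a missing key counts as a change)."""
    return any(recorded.get(key) != value for key, value in wanted.items())


# Hypothetical metadata that the executable would report about its own build.
recorded = {"stan_version": "2.32.2", "stanc_flags": ["--O1"], "cxxflags": ["-O3"]}

# Settings the wrapper is about to use.
same = {"stan_version": "2.32.2", "stanc_flags": ["--O1"]}
changed = {"stan_version": "2.32.2", "stanc_flags": []}

print(needs_recompile(recorded, same))     # False
print(needs_recompile(recorded, changed))  # True
```

Neither case touches the Stan file, so a modification-time check would treat both the same way; only the recorded metadata distinguishes them.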

Beyond choosing whether to recompile, other tools may benefit from this metadata. In my own work, the SBC package caches its results and needs to check whether the model it is given is substantially equivalent to the model the cached results were produced with. It could rely on modification time, but this could result in unnecessary recomputation (e.g. when the model is modified, found to be problematic, and the change is then reverted). For these more extended uses by tools, it might make sense to also include the Stan code and the contents of user-provided extra .hpp files in the stored information.
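A rough sketch of the caching scenario, assuming the cache is keyed by a content hash of the Stan code and the optional user header rather than by modification time. The function name and the in-memory `cache` dictionary are hypothetical and are not how the SBC package is implemented.

```python
import hashlib


def model_fingerprint(stan_code: str, header_code: str = "") -> str:
    """Hash the model source and the optional user header into one hex digest."""
    h = hashlib.sha256()
    for part in (stan_code, header_code):
        data = part.encode("utf-8")
        # A length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


cache = {}
original = "parameters { real y; } model { y ~ normal(0, 1); }"
edited = original.replace("normal(0, 1)", "normal(0, 2)")

cache[model_fingerprint(original)] = "cached results"

# Edit, then revert: the fingerprint matches again, so the cache entry is reused.
print(model_fingerprint(edited) in cache)    # False
print(model_fingerprint(original) in cache)  # True
```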

Since all of the information is known at compile time, a simple implementation would create a .json file (or a file in some other format) at compile time and embed it as a resource. The output format could be configurable, but since the primary consumers are likely to be tools, it should probably default to JSON or similar.

Additional information:

If there is an agreement on implementing this, I'd be happy to write a PR.

Current Version:

v2.32.2

@WardBrian
Member

We already have something like this (it’s even called info, #1010), but it’s definitely not the most informative.

@martinmodrak
Author

Oh, I didn't see it in the help or the manual, so I thought it didn't exist :-D . So then the proposal would be to add stanc options, cpp options, and possibly also compiler info and the model and header code (and to update the documentation)...

@WardBrian
Member

Model code might not always be desired (I'm imagining a situation where someone wants to provide a proprietary package without the source), but we could have it as an option, or we could include a hash of the source instead.

@martinmodrak
Author

So, to be a bit more specific, I propose to:

  1. CmdStan would create a .json file containing metadata about the model. I believe this should be possible entirely within make targets, but in the worst case it might need a small compiled utility.
  2. The metadata would by default contain all options passed to stanc and the C++ compiler, info about the compiler (CXX_YYY variables), a hash of the source code (or potentially a hash of the normalized source code) and a hash of the user header. There would be an option to also add the full source code. (I am not sure I could get hashing to be reliable across platforms within make, so that might require the special utility program.)
  3. The .json file would be included as a binary resource in the executable. (There appears to be a bewildering array of ways to actually do this, so I'll need to investigate which approach would be the most reliable.)
  4. The info method would load the metadata from the .json resource and combine it with the currently stored metadata.
  5. The info method would get an argument for the output format: either the key:value format currently used, or JSON for something more readable by tools (once again, especially for including the whole source code).
  6. The info method would be documented in the CmdStan User's Guide.
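The metadata content and the two output formats from the list above can be sketched as follows. This is only an illustration in Python, not the proposed make-based implementation: the key names, the `normalize` rule (unify line endings, strip trailing whitespace) and the flat `key = value` rendering are all assumptions, and the exact layout of the current info output is not reproduced here.

```python
import hashlib
import json


def normalize(source: str) -> str:
    # Assumed normalization: unify line endings and strip trailing whitespace.
    lines = source.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines).strip() + "\n"


def build_metadata(stan_code, stanc_flags, cxx, cxxflags, include_source=False):
    """Collect build information; the full source is included only on request."""
    digest = hashlib.sha256(normalize(stan_code).encode("utf-8")).hexdigest()
    meta = {
        "stanc_flags": stanc_flags,
        "cxx": cxx,
        "cxxflags": cxxflags,
        "model_sha256": digest,
    }
    if include_source:
        meta["model_code"] = stan_code
    return meta


def render(meta, fmt="text"):
    if fmt == "json":
        return json.dumps(meta, indent=2, sort_keys=True)
    # Flat text: one "key = value" line per entry. Values are JSON-encoded so
    # that multi-line source code still fits on a single line.
    return "\n".join(f"{k} = {json.dumps(v)}" for k, v in sorted(meta.items()))


meta = build_metadata("model { }\r\n", ["--O1"], "g++", ["-O3"])
print(render(meta))
print(render(meta, fmt="json"))
```

Hashing the normalized text means that a file differing only in line endings or trailing whitespace yields the same digest, which is one way the cross-platform concern in step 2 could be addressed.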

Would that make sense? Are there other people I should consult about this? Should I make a post on Discourse?

I find a file resource to be potentially more flexible (especially for adding the whole source code), but I am open to being convinced that just passing everything via macros and -D is better.
