-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Add experimental dims
module with objects that follow dim-based semantics (like xarray without coordinates)
#7820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
day_of_conception = datetime.date(2025, 6, 17) | ||
day_of_last_bug = datetime.date(2025, 6, 17) | ||
today = datetime.date.today() | ||
days_with_bugs = (day_of_last_bug - day_of_conception).days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wtf 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has two purposes: distract reviewers so they don't focus on the critical changes, and prove that OSS libraries can't be fun.
1cfde5b
to
f571e5d
Compare
Can this index using labels? |
f571e5d
to
56402aa
Compare
I don't know what Is Like in xarray, you can do You cannot do The new PyTensor objects have dims but not coords. It's not trivial to encode coord based operations in our backends. |
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
3eb6738
to
67a0eda
Compare
3234468
to
d4017fb
Compare
We should make this the 6.0 release. |
I agree, but would perhaps wait until we beta-tested it to the point it no longer feels too experimental |
dims
module with objects that follow dim-based semantics (like xarray)
f1f7478
to
56c05f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding one extra comment about myst syntax but outside reviewnb because it always messes up rendering of backticks.
I will try to play around and build/port some models one of these days, and look at the code itself while I do that
c2b3c7f
to
619b1ad
Compare
dims
module with objects that follow dim-based semantics (like xarray)dims
module with objects that follow dim-based semantics (like xarray without coordinates)
b1b624c
to
05e3033
Compare
2fb0a52
to
fc635ab
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7820 +/- ##
==========================================
- Coverage 92.92% 92.87% -0.05%
==========================================
Files 107 115 +8
Lines 18299 18816 +517
==========================================
+ Hits 17004 17476 +472
- Misses 1295 1340 +45
🚀 New features to boost your workflow:
|
33db15f
to
4ab90c7
Compare
Pasting the documentation link again The new notebook doesn't show up: |
How about adding here too: https://pymcio--7820.org.readthedocs.build/projects/docs/en/7820/api.html |
Also remove commented out code
* Use RV Op name when provided * More robust detection of observed data variables (after pymc-devs#7656 arbitrary graphs are allowed) * Remove self loops explicitly (closes pymc-devs#7722)
Co-authored-by: Oriol Abril-Pla <[email protected]> Co-authored-by: Allen Downey <[email protected]>
4ab90c7
to
36350e8
Compare
Thanks @williambdean it should now be listed. I also tried to add some stuff in the API but it's likely broken. Help appreciated @OriolAbril (last commit) |
The perfect PR doesnt exi... |
docs/source/api/dims.rst
Outdated
|
||
This submodule contains functions for defining distributions and operations that use explicit dimensions. | ||
|
||
The module is presented in :doc:`dims_module`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be a ref
type cross-reference: :ref:`dims_module`
.. currentmodule:: pymc.dims | ||
|
||
.. autosummary:: | ||
:toctree: generated/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be a toctree directive only as (for now) it is listing other doc pages, not yet pymc objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you propose the changes with github comment thing? I'm not sure if I should take autosummary or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to and didn't work, now it does allow me:
.. currentmodule:: pymc.dims | |
.. autosummary:: | |
:toctree: generated/ | |
.. toctree:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current preview of this page: https://pymcio--7820.org.readthedocs.build/projects/docs/en/7820/api/dims.html. I had to enter the url manually because it is not part of the toctree, it should be added to https://github.com/pymc-devs/pymc/blob/main/docs/source/api.rst?plain=1#L7-L25 (adding it here will also make the page have a sidebar like the other api pages)
Dims | ||
==== | ||
|
||
This submodule contains functions for defining distributions and operations that use explicit dimensions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably add a warning about the module here too
class Flat(DimDistribution): | ||
xrv_op = pxr.as_xrv(flat) | ||
|
||
@classmethod | ||
def dist(cls, **kwargs): | ||
return super().dist([], **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These currently end up having an empty docstring: https://pymcio--7820.org.readthedocs.build/projects/docs/en/7820/api/dims/generated/pymc.dims.Flat.html. Is it possible to dynamically copy the docstring from the "regular" distribution?
For vector distributions or transforms we already have specific docstrings (and I think we want that there) but scalar ones can probably share the docstring after a search/replace of tensor_like
-> xtensor_like
36350e8
to
341ffca
Compare
This builds on top of PyTensor xtensor module, to introduce distributions and model objects that follow xarray-like semantics (without coordinates). Example model:
Equivalently, with the traditional API:
More details in the new core notebook
Points of contention
People have expressed dissatisfaction with some aspects of the current approach. I'll try to summarize it here as fair as possible, and feel free to correct me in the comments
Dims being strings
Current design: dims are just strings that label the axis of a tensor.
There are 3 concerns with this simplicity:
item
ortime
to something less confusing likeproduct_dim = "item"
, or just use other labels.Obviously we can have
safe_name = get_discardable_new_dim_name(chol.type.dims)
, but the point still stands that it can be awkward to do abstract work on dims.x[1:]
come out with the same dims asx
is error-prone because you may then accidentally add them back and get an error only at runtime. I agree partially with this concern.Note: It's not that dims must be strings (although they do now). Dims could be made to be any hashable / pickleable object, not just strings without much work in PyTensor. When we talk of richer dims, we mean something like: pymc-devs/pytensor#407
Counterpoints:
I think the shape problem is mostly mitigated by using static shapes in PyTensor. XTensorVariables still have the
type.shape
attribute and if you try to addx + x[1:]
, if x had a static shape it would raise immediately. However we don't use static shapes in PyMC by default, all dims start as being mutable and can later be made constant with helpers likefreeze_data_and_dims
, which the jax / nutpie samplers do by default, but not the pymc ones.It's much easier to write the actual model. You can say
concat([x, y, x], dim="time")
, whereas with specialized dims this may not be possible or actively forbidden. Alternative APIs may look like some variant ofconcat([x, y, x], dim=(x.dim[0], y.dim[1], x.dim[0])
, which imo defeats the main purpose of having dims (not having to worry about dimension position in the underlying memory).Alternatively, richer structured dims introduced by PyTensor automatically (say when you do
x[::-1]
) could be made somehow comparable / matcheable to the original dims the user provided. So an operator like concat may be able to treat the dims ofx[::-1].dims
as identical to the originalx.dims
for the purpose of deciding which axis to align. If this can be achieved, it should require no user-facing changes in the current code and so the existing code could be merged as is.No coordinates
As you can see from the comment below by @twiecki (and it's indeed the first thing everyone asks), people wish to be able to do coordinate based operations, mainly selection like
x.sel(time="yesterday")
. That's not possible in the current implementation (a useful error message is raised if you try to use that method).Also it creates a hard problems for us. This is discussed in the new notebook, but in summary, the current PyMC API uses coords to actually define the model AND propagates these to xarray objects after sampling, whenever the user tells us that variable has the same dims. With the new dims model we always know the dims of every variable, so we should always propagate?
Not so easy. The original coordinates are not trivially valid for intermediate operations. They may have the wrong length or order. So what options do we have?
Also note that xarray Datasets always need coords, even if individual DataArrays do not. We have to decide what to do with coords one way or another :)
This PR takes the most dumb pragmatic approach. Keep coords unless you detect a shape inconsistency. In that case, refuse to propagate them with a warning. Options for users are to use custom coords for intermediate operations (in which case they also need a custom dim, since PyMC doesn't have a concept of coords per variable (#7852) or to use the old
pm.Deterministic
without dims (instead of pm.dims.Deterministic) and get the behavior mentioned above (no custom coords nor dims).If users care about coords they have to mend them manually after the fact. This is discussed at the end of the new notebook (still have to add example of mending).
So why don't we have coords in the graph?
I don't think this can be realistically done in a way that is both performant, simple and doesn't restrict us to static shapes.
Just give up on static shapes? Some people really doubt that we need this flexibility in PyTensor (JAX does just fine without it right?). I strongly disagree. Just to give one example: It rules out any sort of symbolic iterative algorithms unless you pad everything to the worst case scenarios. Saying this example is not worthwhile is not saying we never need runtime shapes! It's just one example where it's needed, there are more.
Besides getting rid of runtime shapes doesn't make it trivially easy to implement and work with coords abstractly. A lot of PyMC magic is transforming user code to go from graph A -> graph B. Adding coordinates increases code complexity.
These are related concerns
Talking about concerns for the lack of coords circles back to the concerns about dims listed above. coords in a sense supercharge dims in xarray. Look at how the behavior of
x[:-idx] + x[idx:]
changes when you have arrays with coordinates vs without.If we know the coordinates of variables, it's no longer confusing that things may have different shapes. There are clear rules how to solve the discrepancy (which are controlled by global variables in xarray..., which may affect how much you like it.)
Reminder: The current approach works the same as xarrays without coords. Some proposals for richer dims without coords would still have to decide what
x[idx:] + x[:-idx]
should do. Options and my opinionated downsides areFinal note
Perfection is the enemy of progress. We have had 5 solid years with deficient use of coords/dims. People seem to like it, and I don't think anyone is doing anything better out there. I believe this is a solid incremental step that brings dims to the model for real, while still handling coords deficiently.
I'm not against coords, I am against bad performance, code complexity and static-shape handicaps (in that order). Maybe these problems can be solved, but they won't be solved by me.
I'm also not against more structured dims at the PyTensor level. Happy to help out there.
I also don't think we should wait. The XTensor stuff, simple as it is, took a while. I opened a draft PR two summers ago, and there is still a lot to do. I would like to share it and start iterating with user feedback, instead of just theoretical concerns. I'm optimistic that this would both inform a more advanced approach and allow us to transition from a less "beta" stage than the one we are at now.
I'm also optimistic the community will accept the experimental / exploratory nature of the new approach, and that we may drop it for something else down the road, that we believe is even better.