Draft of potential masked array implementation. #849
Is the end goal of this PR (and further work) to be as close as possible to the ones in numpy? I ask because I'm a long time user of numpy, as is everyone in the company I work for, and nobody used this. In brief, I see 3 problems with this (not your PR specifically, but the concept)
What I would gladly use is a but this is somewhat irrelevant to the current discussion :)
|
I appreciate reading your sketch andrei, you're more productive than me, just having a go at a draft instead of trying to make something perfect. I think it's been mentioned before, yeah: the question of whether to have masked arrays or masked operations on arrays. I dread the complexity of either. Thanks nilgoyette for the candid thoughts too. I think we should start with masked operations. I think that's what a masked array type (if it were to exist) would need as a basis anyway. And it allows having a separate mask too, which should hopefully be more efficient (packed or sparse bitmap?)
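To make the "masked operations with a separate mask" idea concrete, here is a minimal sketch in plain Rust (no ndarray dependency) of a packed bitmap mask and one masked reduction. `BitMask` and `masked_sum` are hypothetical names invented for illustration; nothing here is taken from the PR or from ndarray's API, and the convention that a set bit means "masked out" is an assumption.

```rust
/// Hypothetical packed bitmap: one bit per element.
/// Assumption: a set bit means the element is masked out (ignored).
struct BitMask {
    words: Vec<u64>,
    len: usize,
}

impl BitMask {
    /// Create a mask of `len` elements with nothing masked.
    fn new(len: usize) -> Self {
        BitMask { words: vec![0; (len + 63) / 64], len }
    }

    /// Mark element `i` as masked out.
    fn set(&mut self, i: usize) {
        assert!(i < self.len, "index out of bounds");
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    /// Return true if element `i` is masked out.
    fn is_masked(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }
}

/// Masked operation: sum only the elements that are not masked out.
/// The data stays a plain slice; the mask lives separately.
fn masked_sum(data: &[f64], mask: &BitMask) -> f64 {
    assert_eq!(data.len(), mask.len, "data and mask must have the same length");
    data.iter()
        .enumerate()
        .filter(|(i, _)| !mask.is_masked(*i))
        .map(|(_, x)| *x)
        .sum()
}
```

Because the mask is its own value, the same data could be paired with different masks, and the bitmap costs one bit per element instead of one byte for a `bool`.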
|
I realise this is an older PR but I would vote in favour of masks. I'm by no means experienced in numpy so the following may have a different solution. My current use case is a 2D jagged array of ids. Each row represents the following:

    [parent_id, child_id, facet1_id, facet2_id, ...facetN_id]

Since numpy docs state 2D arrays must be rectangular, not jagged, I use Following that, I mask the

    facet_groups = [[A, B], [C]]
    filtered = arr[np.logical_and.reduce([np.isin(arr, facet_ids).any(axis=1) for facet_ids in facet_groups])]

Now I can remove the mask and extract all the parent_id/child_id values with a slice:

    filtered.mask = np.ma.nomask
    parent_ids = filtered[:,0]
    child_ids = filtered[:,1]

In addition, I use All in all, it's a very concise bit of code that performs very well for the small dataset of a few hundred thousand values.
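For comparison, the row filter described above could be sketched in plain Rust roughly as follows. This is an illustration under assumptions, not a translation of the numpy code: `filter_rows` is a hypothetical name, rows are stored as a jagged `Vec<Vec<u32>>` (so no padding or mask is needed), and only the facet columns (index 2 onward) are checked against the groups.

```rust
/// Hypothetical helper: keep rows that contain at least one facet id from
/// every facet group, and return their (parent_id, child_id) pairs.
/// Assumption: each row is [parent_id, child_id, facet ids...].
fn filter_rows(rows: &[Vec<u32>], facet_groups: &[Vec<u32>]) -> Vec<(u32, u32)> {
    rows.iter()
        // Skip malformed rows that lack a parent or child id.
        .filter(|row| row.len() >= 2)
        .filter(|row| {
            let facets = &row[2..];
            // AND across groups, OR within a group.
            facet_groups
                .iter()
                .all(|group| group.iter().any(|id| facets.contains(id)))
        })
        .map(|row| (row[0], row[1]))
        .collect()
}
```

With an empty `facet_groups`, every well-formed row is kept, since `all` over an empty iterator is true.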
|
Yes, I highly agree with @stuarta0, there needs to be a masked_fill feature in ndarray. Currently, you would do this on an n-dimensional array using loops: if a specific value is masked, replace it with z.
At least this is what I do; there is most definitely a simpler version out there somewhere.
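As a rough sketch of what such a `masked_fill` could look like, here is a plain-Rust version over flat slices. The signature is hypothetical and not part of ndarray; it assumes the array and its boolean mask have the same length and are traversed in the same order, which for an n-dimensional array would mean operating on its flattened, contiguous storage.

```rust
/// Hypothetical helper: overwrite every element whose mask entry is `true`
/// with a clone of `value`. Elements with a `false` mask entry are untouched.
fn masked_fill<T: Clone>(data: &mut [T], mask: &[bool], value: T) {
    assert_eq!(data.len(), mask.len(), "data and mask must have the same length");
    // Walk data and mask in lockstep instead of indexing by hand.
    for (x, &m) in data.iter_mut().zip(mask) {
        if m {
            *x = value.clone();
        }
    }
}
```

Pairing the two sequences with `zip` replaces the nested index loops; ndarray's own `Zip` type can probably express the same lockstep traversal directly on arrays.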
There are two files:
- `src/ma/mod.rs` - masked array implementation, all the types and traits live there.
- `tests/ma.rs` - a couple of tests that demonstrate the potential public API of masked array.

The main idea is to have a `Mask` trait which is pretty generic and can be implemented not just by `ArrayBase`, but by, for example, a set of whitelist/blacklist indices, a set of whitelisted/blacklisted values, etc.
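To illustrate how one generic trait could cover those different kinds of masks, here is a small self-contained sketch. It is not the trait from `src/ma/mod.rs`: the method signature, the flat `usize` index, and the types `IndexBlacklist` and `ValueBlacklist` are all assumptions made up for this example, and a `Vec<bool>` stands in for a boolean `ArrayBase`.

```rust
use std::collections::HashSet;

/// Hypothetical trait: decides whether the element at a flat `index`
/// holding `value` is masked out.
trait Mask<T> {
    fn is_masked(&self, index: usize, value: &T) -> bool;
}

/// Boolean-array mask (stand-in for an array of bools).
impl<T> Mask<T> for Vec<bool> {
    fn is_masked(&self, index: usize, _value: &T) -> bool {
        self[index]
    }
}

/// Mask defined by a set of blacklisted indices.
struct IndexBlacklist(HashSet<usize>);

impl<T> Mask<T> for IndexBlacklist {
    fn is_masked(&self, index: usize, _value: &T) -> bool {
        self.0.contains(&index)
    }
}

/// Mask defined by a list of blacklisted values.
struct ValueBlacklist<T>(Vec<T>);

impl<T: PartialEq> Mask<T> for ValueBlacklist<T> {
    fn is_masked(&self, _index: usize, value: &T) -> bool {
        self.0.contains(value)
    }
}

/// Collect the elements that the given mask does not hide.
fn unmasked<T: Clone, M: Mask<T>>(data: &[T], mask: &M) -> Vec<T> {
    data.iter()
        .enumerate()
        .filter(|(i, x)| !mask.is_masked(*i, *x))
        .map(|(_, x)| x.clone())
        .collect()
}
```

Passing both the index and the value to `is_masked` is one way to let index-based and value-based masks share a single trait; a whitelist variant would simply negate the lookup.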