
Upgraded metadata lookup for backwards compatibility, added option to input metadata manually #179

Merged
pbeaucage merged 43 commits into main from Issue178_FixLoadRun
Apr 1, 2025

Conversation

@PriyankaKetkarBNL (Collaborator) commented Feb 23, 2025

Addresses issue #178.

The mdLookup dictionary was updated so that beamline metadata key names can be entered as a list instead of a single value (or two values, in the case of the secondary_lookup table). This sets up infrastructure for backwards compatibility with the key names used historically at the beamline. The secondary_lookup dictionary was removed, as all historical key names are now consolidated into mdLookup. These changes were propagated to the code where Tiled is searched for metadata keys, as well as to the construction of the reverse lookup table (reverse_lut) in loadRun. The changes were tested with scan IDs 92849, 93065, and 91175, which include scans from before and after the beamline codebase upgrades in January 2025.
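The list-based lookup described above can be sketched as a first-match search over candidate beamline key names. This is an illustrative sketch, not the actual mdLookup contents; the key names below are made up:

```python
# Illustrative sketch of a list-based metadata lookup table: each
# PyHyper name maps to a list of candidate beamline key names, newest
# first, so scans taken before a beamline rename still resolve.
# (Key names here are hypothetical, not the real mdLookup entries.)
md_lookup = {
    "energy": ["en_energy_setpoint", "en_monoen_setpoint"],
    "polarization": ["en_polarization_setpoint", "epu_polarization"],
}

def resolve_key(pyhyper_name, scan_md, lookup=md_lookup):
    """Return the value for the first candidate key present in scan_md."""
    for candidate in lookup.get(pyhyper_name, []):
        if candidate in scan_md:
            return scan_md[candidate]
    raise KeyError(f"no known key for {pyhyper_name!r} in scan metadata")

# A pre-upgrade scan that only carries the older key name still resolves:
old_scan_md = {"en_monoen_setpoint": 285.0}
resolve_key("energy", old_scan_md)  # -> 285.0
```

When both old and new names are present, the newer name wins because it appears first in the candidate list.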

loader = phs.load.SST1RSoXSDB(corr_mode="none")

scanID = 92849  # Count scan with 50 repeats at constant energy and polarization
# scanID = 93065  # RSoXS energy scan with 1 repeat, post-January 2025
# scanID = 91175  # RSoXS energy scan with 1 repeat, December 2024

scan = loader.loadRun(scanID, dims=["time", "energy", "polarization"]).unstack('system')
scan

Also tested the following:

scanID_spiral = 92770

scan = loader.loadRun(scanID_spiral, dims=['sam_x', 'sam_y']).unstack('system')
scan

Additionally, an mdManual input was added to the loadRun function so that metadata values can be entered manually in case they were not originally written out with the scan. My understanding is that the current coords input cannot take more qualitative single-value entries (e.g., {"sample_notes": "details"}) that are meant to be stored in attrs. If it makes sense, I could think about how to consolidate these two inputs.
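The manual-entry behavior can be sketched as a merge step applied after the automatic lookup, with manual values taking precedence. This is a hypothetical illustration of the idea, not the actual mdManual handling in loadRun:

```python
# Hypothetical sketch: manually supplied metadata fills in values the
# scan never recorded, and overrides automatically looked-up ones.
# (Function and variable names are illustrative, not from the PR.)
def merge_metadata(looked_up_md, md_manual=None):
    """Combine auto-discovered metadata with manual entries; manual wins."""
    merged = dict(looked_up_md)
    if md_manual:
        merged.update(md_manual)
    return merged

auto_md = {"energy": 285.0}
merge_metadata(auto_md, {"sample_notes": "details", "energy": 290.0})
# -> {'energy': 290.0, 'sample_notes': 'details'}
```

A dict merge like this handles both numeric coordinates and qualitative attrs-style entries, which is why consolidating the two inputs could be feasible.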

@pbeaucage (Collaborator) left a comment

This is actually a pretty minor change to the data structure that maps metadata fields to PyHyper words, made far more complicated by changing a bunch of variable names and re-inventing basic function calls. Before merging, we should

  • revert the changes to variable names from lowercase/underscore to camel case, making them match the rest of the class. This made this quite difficult to review and I'd like to look again after this is fixed to ensure we didn't lose anything in the diff.
  • add a test that covers the new metadata names
  • ideally, add a documentation page on this variable name mechanism. I'd accept making an issue for this.

@pbeaucage (Collaborator) left a comment

I believe this is now good to go, assuming you are OK with the workaround for manual metadata documented in the issue.

Also, can you please add some tests for new data before we merge?

@PriyankaKetkarBNL (Collaborator, Author)

> I believe this is now good to go, assuming you are OK with the workaround for manual metadata documented in the issue.
>
> Also, can you please add some tests for new data before we merge?

Yes, your changes look good to me. I just added some tests.

I noticed before adding my tests that some of the tests had failed. Possibly from the recent commits, but I hadn't kept a close eye.

[screenshot: failing test runs]

@PriyankaKetkarBNL (Collaborator, Author)

> This is actually a pretty minor change to the data structure that maps metadata fields to PyHyper words, made far more complicated by changing a bunch of variable names and re-inventing basic function calls. Before merging, we should
>
> • revert the changes to variable names from lowercase/underscore to camel case, making them match the rest of the class. This made this quite difficult to review and I'd like to look again after this is fixed to ensure we didn't lose anything in the diff.
> • add a test that covers the new metadata names
> • ideally, add a documentation page on this variable name mechanism. I'd accept making an issue for this.

What do you mean regarding the third bullet? Is that for explaining PyHyperScattering variable names or beamline variable names?

@pbeaucage (Collaborator)

pbeaucage commented Apr 1, 2025

Tests failing: weird, transient Tiled errors. I was looking at that; hopefully they don't recur. You can click through to get the run log.

Variable names: a list of the standard PyHyper words and what they mean; energy, polarization, sam_x, sam_y, sam_th, etc. I can copy that into an issue.

Just curious: do those test load calls work without hinting the dimensions? I'm especially curious for the count scan, if it works by default. We can fix that if not.

@PriyankaKetkarBNL (Collaborator, Author)

> A few style changes here. In particular, it is bad form to put anything of consequence on the same line as a flow control block; really, that should be used only for minimal logic, and it is far more readable to add a newline. I tried to add some comments explaining the logic flow as well, for the benefit of future maintainers. If these all look good, the only thing left is to decide on override_md or similar; I will comment in the issue thread.

What do you mean by the first sentence? Is the "consequence" referring to some operation and the "flow control block" things like for, if, try, etc.?

@pbeaucage (Collaborator)

Blegh, tests are working, but the Tiled server is just dropping connections. I've messaged on NSLS2 slack.

@PriyankaKetkarBNL (Collaborator, Author)

> Tests failing: Weird, transient Tiled errors. I was looking at that. Hopefully they don't recur. You can click thru to get the run log.
>
> Variable names: a list of the standard PyHyper words and what they mean; energy, polarization, sam_x, sam_y, sam_th, etc. I can copy that into an issue.
>
> Just curious: do those test load calls work without hinting the dimensions? I'm especially curious for the count scan, if it works by default. We can fix that if not.

The energy scans (scan IDs 93065 and 91175) and the spiral (scan ID 92770) worked. Time scan (scan ID 92849) did not work.

loader.loadRun(run=92849).unstack('system')

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[187], line 1
----> 1 scan = loader.loadRun(run=92849).unstack('system')
      2 scan

File ~\.conda\envs\20250308_1\Lib\site-packages\xarray\core\dataarray.py:3019, in DataArray.unstack(self, dim, fill_value, sparse)
   2958 def unstack(
   2959     self,
   2960     dim: Dims = None,
   (...)
   2963     sparse: bool = False,
   2964 ) -> Self:
   2965     """
   2966     Unstack existing dimensions corresponding to MultiIndexes into
   2967     multiple new dimensions.
   (...)
   3017     DataArray.stack
   3018     """
-> 3019     ds = self._to_temp_dataset().unstack(dim, fill_value=fill_value, sparse=sparse)
   3020     return self._from_temp_dataset(ds)

File ~\.conda\envs\20250308_1\Lib\site-packages\xarray\core\dataset.py:5833, in Dataset.unstack(self, dim, fill_value, sparse)
   5829         result = result._unstack_full_reindex(
   5830             d, stacked_indexes[d], fill_value, sparse
   5831         )
   5832     else:
-> 5833         result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)
   5834 return result

File ~\.conda\envs\20250308_1\Lib\site-packages\xarray\core\dataset.py:5653, in Dataset._unstack_once(self, dim, index_and_vars, fill_value, sparse)
   5650 variables: dict[Hashable, Variable] = {}
   5651 indexes = {k: v for k, v in self._indexes.items() if k != dim}
-> 5653 new_indexes, clean_index = index.unstack()
   5654 indexes.update(new_indexes)
   5656 for idx in new_indexes.values():

File ~\.conda\envs\20250308_1\Lib\site-packages\xarray\core\indexes.py:1057, in PandasMultiIndex.unstack(self)
   1054 clean_index = remove_unused_levels_categories(self.index)
   1056 if not clean_index.is_unique:
-> 1057     raise ValueError(
   1058         "Cannot unstack MultiIndex containing duplicates. Make sure entries "
   1059         f"are unique, e.g., by  calling ``.drop_duplicates('{self.dim}')``, "
   1060         "before unstacking."
   1061     )
   1063 new_indexes: dict[Hashable, Index] = {}
   1064 for name, lev in zip(clean_index.names, clean_index.levels, strict=True):

ValueError: Cannot unstack MultiIndex containing duplicates. Make sure entries are unique, e.g., by  calling ``.drop_duplicates('system')``, before unstacking.
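For context, this failure mode can be reproduced with a bare pandas MultiIndex: a count scan held at constant energy and polarization yields repeated (energy, polarization) tuples along the stacked 'system' dimension, which unstack refuses. A minimal sketch with illustrative values, not the actual loadRun internals:

```python
import pandas as pd

# A count scan with repeats at constant energy/polarization produces
# duplicate entries in the stacked index (values are illustrative).
index = pd.MultiIndex.from_arrays(
    [[285.0, 285.0, 285.0], [0, 0, 0]], names=["energy", "polarization"]
)
index.is_unique  # False -- this is what makes unstack('system') raise

# Dropping duplicates restores uniqueness, but collapses the repeats to
# a single entry; including a genuinely unique dimension (as the
# dims=["time", "energy", "polarization"] hint does) keeps them all.
deduped = index.drop_duplicates()
```

This is consistent with the hinted call working: adding 'time' makes each (time, energy, polarization) tuple unique even though energy and polarization never change.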

@pbeaucage (Collaborator)

pbeaucage commented Apr 1, 2025

> What do you mean by the first sentence? Is the "consequence" referring to some operation and the "flow control block" things like for, if, try, etc.?

So yes, while it is technically valid Python syntax to write, for instance:

try: from wherever import thing
except ImportError: from elsewhere import thing

(or, yes, any other flow control statement like for or if), that is really only appropriate for things that are short and minimal, not things that require thought. You would usually put a newline and then indent:

try:
    from wherever import thing
except ImportError:
    from elsewhere import thing

One way to think about this is that Python is a language where whitespace and layout are meaningful, compared with, say, C, so following the expected patterns of whitespace and flow is helpful.

@pbeaucage (Collaborator)

Created #190, #191 for the last few items.

@PriyankaKetkarBNL (Collaborator, Author)

> The energy scans (scan IDs 93065 and 91175) and the spiral (scan ID 92770) worked. Time scan (scan ID 92849) did not work.

I'm testing more recent scans now and ran into a situation where scan ID 93983 loads fine with dims=["energy"] hinted but throws an error without hinting.

scan = loader.loadRun(run=93983).unstack('system')

Error:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[10], line 1
----> 1 scan = loader.loadRun(run=93983).unstack('system')
      2 scan

File ~\Anaconda3\envs\20250331_1\Lib\site-packages\PyHyperScattering\SST1RSoXSDB.py:740, in SST1RSoXSDB.loadRun(self, run, dims, coords, return_dataset, useMonitorShutterThinning)
    737     dims_to_join.append(val)
    738     dim_names_to_join.append(key)
--> 740 index = pd.MultiIndex.from_arrays(dims_to_join, names=dim_names_to_join)
    741 # handle the edge case of a partly-finished scan
    742 if len(index) != len(data["time"]):

File ~\Anaconda3\envs\20250331_1\Lib\site-packages\pandas\core\indexes\multi.py:533, in MultiIndex.from_arrays(cls, arrays, sortorder, names)
    530     if len(arrays[i]) != len(arrays[i - 1]):
    531         raise ValueError("all arrays must be same length")
--> 533 codes, levels = factorize_from_iterables(arrays)
    534 if names is lib.no_default:
    535     names = [getattr(arr, "name", None) for arr in arrays]

File ~\Anaconda3\envs\20250331_1\Lib\site-packages\pandas\core\arrays\categorical.py:3069, in factorize_from_iterables(iterables)
   3065 if len(iterables) == 0:
   3066     # For consistency, it should return two empty lists.
   3067     return [], []
-> 3069 codes, categories = zip(*(factorize_from_iterable(it) for it in iterables))
   3070 return list(codes), list(categories)

File ~\Anaconda3\envs\20250331_1\Lib\site-packages\pandas\core\arrays\categorical.py:3069, in <genexpr>(.0)
   3065 if len(iterables) == 0:
   3066     # For consistency, it should return two empty lists.
   3067     return [], []
-> 3069 codes, categories = zip(*(factorize_from_iterable(it) for it in iterables))
   3070 return list(codes), list(categories)

File ~\Anaconda3\envs\20250331_1\Lib\site-packages\pandas\core\arrays\categorical.py:3042, in factorize_from_iterable(values)
   3037     codes = values.codes
   3038 else:
   3039     # The value of ordered is irrelevant since we don't use cat as such,
   3040     # but only the resulting categories, the order of which is independent
   3041     # from ordered. Set ordered to False as default. See GH #15457
-> 3042     cat = Categorical(values, ordered=False)
   3043     categories = cat.categories
   3044     codes = cat.codes

File ~\Anaconda3\envs\20250331_1\Lib\site-packages\pandas\core\arrays\categorical.py:425, in Categorical.__init__(self, values, categories, ordered, dtype, fastpath, copy)
    422 elif isinstance(values, np.ndarray):
    423     if values.ndim > 1:
    424         # preempt sanitize_array from raising ValueError
--> 425         raise NotImplementedError(
    426             "> 1 ndim Categorical are not supported at this time"
    427         )
    428     values = sanitize_array(values, None)
    429 else:
    430     # i.e. must be a list

NotImplementedError: > 1 ndim Categorical are not supported at this time
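For reference, the root cause in the traceback can be reproduced directly: if one of the arrays handed to pd.MultiIndex.from_arrays is 2-D, pandas' Categorical factorization rejects it with exactly this NotImplementedError. A minimal sketch with made-up values, not the actual data loadRun produced for scan 93983:

```python
import numpy as np
import pandas as pd

# 1-D arrays, one per stacked dimension, work as expected:
ok = pd.MultiIndex.from_arrays([np.array([270.0, 285.0])], names=["energy"])

# But if a dimension array comes back 2-D (e.g. one value per detector
# frame rather than per scan point), factorization fails with the
# "> 1 ndim Categorical" error seen above.
try:
    pd.MultiIndex.from_arrays([np.zeros((2, 2))], names=["energy"])
    raised = False
except NotImplementedError:
    raised = True
```

So a likely line of investigation for the new issue is why the un-hinted dimension inference returns a 2-D array for one of the candidate dimensions.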

@pbeaucage (Collaborator)

OK, that sounds like a separate issue. Not sure what would be causing it offhand. Feel free to make a new thread to look into it.

@pbeaucage (Collaborator)

Tests pass! Merging.

@pbeaucage pbeaucage merged commit cc880b4 into main Apr 1, 2025
16 checks passed
@pbeaucage pbeaucage deleted the Issue178_FixLoadRun branch April 1, 2025 20:57
@PriyankaKetkarBNL (Collaborator, Author)

> OK, that sounds like a separate issue. Not sure what would be causing it offhand. Feel free to make new thread to look into.

Sounds good. Created issue #193.


Development

Successfully merging this pull request may close these issues.

Bug: loadRun is not working for 2025-1 cycle scans due to beamline codebase refactoring.
