
Add typed Units columns and UnitMetrics table for per-unit metrics #20

Merged
h-mayorquin merged 16 commits into main from heberto_metrics on Jun 9, 2026



@h-mayorquin

Contributor

I'd like to propose a different shape for spike-sorting metrics storage than the open MetricExtension PR (#18). Two pieces:

Cell properties as typed columns on the Units table
Some properties are intrinsic to the cell and independent of any analysis run, so they belong on nwbfile.units as typed VectorData columns. As a minimal implementation of the idea, I am adding FiringRate. Other candidates are peak-to-trough duration, trough half-width, and median amplitude; we can add them in follow-up PRs once the idea is refined.
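
A firing-rate column holds one value per row of nwbfile.units. As a minimal sketch, assuming the rate is defined as spike count divided by recording duration (the PR does not state the exact definition used by FiringRate), the values could be computed like this before being written to the column. `firing_rates` is a hypothetical helper, not part of the extension.

```python
# Hypothetical helper: mean firing rate per unit in Hz, assumed here to be
# spike count divided by the total recording duration in seconds.
def firing_rates(spike_times, duration_s):
    """spike_times: one list of spike times (s) per unit."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return [len(times) / duration_s for times in spike_times]


# Three units observed over a 10 s recording.
units_spike_times = [
    [0.5, 1.2, 3.3, 7.9],  # 4 spikes
    [2.0, 2.1],            # 2 spikes
    [],                    # silent unit
]
rates = firing_rates(units_spike_times, duration_s=10.0)
print(rates)  # [0.4, 0.2, 0.0]
```

Because the value does not depend on a particular analysis run, it fits a typed column on Units rather than a run-specific metrics table.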

A generic UnitMetrics table for run-dependent metrics
Beyond cell properties, the metrics extension has much more to cover. From the SpikeInterface side we have quality_metrics (defined by purpose), template_metrics (defined by source), and spiketrain_metrics (defined by source). As in #18, I use a generic DynamicTable to cover all of these cases, but with two differences from #18: (1) an explicit link to the Units table through a required unit DynamicTableRegion on every row, which ensures provenance; and (2) a per-row obs_intervals column that follows the NWB-core Units.obs_intervals ragged-column pattern, so we reuse the existing NWB convention for period attribution instead of introducing a new shape. This also covers the periods produced by SpikeInterface's valid_unit_periods extension.
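
To make the ragged-column pattern concrete: NWB stores a ragged column as a flat data array plus an index holding each row's exclusive end offset, and a DynamicTableRegion stores integer row indices into the target table. The sketch below models this in plain Python under that assumption; it is not the hdmf/pynwb API, and `flatten_ragged`, `get_row`, and the example values are hypothetical.

```python
# Library-free model of a UnitMetrics-like table's required `unit` region and
# ragged `obs_intervals` column (flat data + index of exclusive end offsets).

def flatten_ragged(rows):
    """Return (data, index) where index[i] is the exclusive end of row i."""
    data, index = [], []
    for row in rows:
        data.extend(row)
        index.append(len(data))
    return data, index


def get_row(data, index, i):
    """Recover row i from the flat data using the end offsets."""
    start = index[i - 1] if i > 0 else 0
    return data[start:index[i]]


# Row indices into the Units table (unit 1 is not covered by this analysis).
unit_region = [0, 2, 3]
# Per-row [start, stop] observation intervals, in seconds.
obs_intervals = [
    [[0.0, 120.0]],                # unit 0: one period
    [[0.0, 30.0], [45.0, 120.0]],  # unit 2: two periods
    [[10.0, 90.0]],                # unit 3: one period
]
data, index = flatten_ragged(obs_intervals)
print(index)                    # [1, 3, 4]
print(get_row(data, index, 1))  # [[0.0, 30.0], [45.0, 120.0]]
```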

Taken together, this PR is the structural base: typed canonical columns on nwbfile.units for cell properties, plus a generic UnitMetrics DynamicTable for run-dependent metrics. Future PRs can either canonize more typed columns on Units or subclass UnitMetrics for specific purposes (e.g., curation) with their own canonical columns.

@alejoe91

Contributor

As we discussed, since only a subset of metrics use the periods, individual columns could link to the ValidUnitPeriods table (if it was used for the computation) or to another TimeIntervals table for custom periods.

This is how to check whether valid unit periods were used in SI and which metrics used them:

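from spikeinterface.metrics import ComputeQualityMetrics

# `analyzer` is an existing SortingAnalyzer with the quality_metrics extension computed
qm_ext = analyzer.get_extension("quality_metrics")
# Periods were used only if the flag is on and periods were actually provided
use_valid_periods = qm_ext.params["use_valid_periods"] and qm_ext.params["periods"] is not None

# Collect the output column names of every metric that supports periods
metrics_with_periods = []
for m in ComputeQualityMetrics.metric_list:
    if m.supports_periods:
        metrics_with_periods.extend(m.metric_columns.keys())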

@h-mayorquin the only "issue" here is that spike train metrics (which are also in the quality metrics list) also support periods, but I think that's ok...

@h-mayorquin

Contributor Author

I implemented the logic for adding the time_support link only to metric columns whose underlying SpikeInterface metric class has supports_periods=True.
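
A minimal sketch of that column-level linking, using plain Python stand-ins: `MetricColumn` and `attach_time_support` are hypothetical names, not the extension's classes, and the set of period-supporting columns is written by hand here as an assumption rather than read from SpikeInterface's `supports_periods` flags. It also shows that every linked column references one shared intervals object instead of a copy.

```python
from dataclasses import dataclass


@dataclass
class MetricColumn:
    name: str
    data: list
    time_support: object = None  # stand-in for a link to a TimeIntervals-like table


def attach_time_support(columns, period_columns, intervals):
    """Link `intervals` to columns named in `period_columns`; leave others unset."""
    for col in columns:
        if col.name in period_columns:
            col.time_support = intervals  # shared reference, not a copy
    return columns


shared_periods = {"name": "valid_unit_periods", "intervals": [(0.0, 30.0), (45.0, 120.0)]}
columns = [
    MetricColumn("isi_violations_ratio", [0.01, 0.20]),
    MetricColumn("amplitude_cutoff", [0.02, 0.10]),
    MetricColumn("peak_to_valley", [0.0004, 0.0006]),
]
# Assumed for illustration; in practice this set would come from the
# metric classes' supports_periods flag.
period_columns = {"isi_violations_ratio", "amplitude_cutoff"}
attach_time_support(columns, period_columns, shared_periods)
print([c.name for c in columns if c.time_support is shared_periods])
# ['isi_violations_ratio', 'amplitude_cutoff']
```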

This PR is ready to go. Here is a summary of the current implementation:

  • UnitMetrics is meant to be a flexible way of attaching analysis results to the units in the Units table. A file can hold more than one, for example when more than one analysis is done. Properties should go directly in the Units table if they are definitive cell properties (e.g. cell_type, firing_rate, brain_region, peak_channel).

  • ValidUnitPeriods remains and is meant to be a generic container for per-unit intervals during which each unit's sort can be trusted; one or more may coexist per file. SpikeInterface's valid_unit_periods extension produces these by thresholding false-positive (refractory violations) and false-negative (amplitude cutoff) rates per time bin and merging contiguous good bins, but the type stays algorithm-agnostic so other methods can populate it too. When converting NWB back to a SortingAnalyzer, we restore the SpikeInterface extension with method="user_defined".

  • We now have a UnitVectorData column type that adds a time_support attribute recording which time domain the computation used. Depending on whether qm.params["periods"] is set, whether a SpikeInterface valid_unit_periods extension is also present, and whether the two period arrays match (checked with np.array_equal), we link time_support to a ValidUnitPeriods table; if no valid_unit_periods extension is available, we store a plain TimeIntervals table and link time_support to it. This will be simplified after your SpikeInterface PR.

  • Making time_support a column attribute serves two purposes: 1) flexibility, since each column can link to a different table if it was computed on a different time support; and 2) no data duplication, since all columns computed on the same time support link to a single table. I think this mechanism for storing computational provenance could be useful more generally in NWB, so I am happy to test it here.
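
The per-bin procedure described for ValidUnitPeriods, and the rule for choosing what a column's time_support points to, can be sketched as follows. This is an illustrative approximation, not SpikeInterface's implementation: `valid_periods_from_bins` and `choose_time_support` are hypothetical, the per-bin rates are assumed to be precomputed, the thresholds are arbitrary, and the outcomes when periods are unset or do not match the valid_unit_periods output are assumptions.

```python
import numpy as np


def valid_periods_from_bins(bin_edges, fp_rates, fn_rates, max_fp, max_fn):
    """Return an (n, 2) array of [start, stop] spans of consecutive good bins.

    bin_edges has len(fp_rates) + 1 entries, in seconds.
    """
    good = (np.asarray(fp_rates) <= max_fp) & (np.asarray(fn_rates) <= max_fn)
    periods, start = [], None
    for i, ok in enumerate(good):
        if ok and start is None:
            start = bin_edges[i]  # open a period at this bin's left edge
        elif not ok and start is not None:
            periods.append([start, bin_edges[i]])  # close it before the bad bin
            start = None
    if start is not None:
        periods.append([start, bin_edges[len(good)]])  # period runs to the end
    return np.array(periods, dtype=float).reshape(-1, 2)


def choose_time_support(periods, vup_periods):
    """Hypothetical rule for the table a metric column's time_support links to."""
    if periods is None:
        return None  # assumption: no periods used, so no time_support link
    if vup_periods is not None and np.array_equal(periods, vup_periods):
        return "ValidUnitPeriods"  # reuse the existing valid-periods table
    return "TimeIntervals"  # otherwise store a plain TimeIntervals table


edges = np.arange(0.0, 60.0, 10.0)  # 0, 10, ..., 50 s -> five bins
vup = valid_periods_from_bins(edges, [0.0, 0.01, 0.3, 0.0, 0.0],
                              [0.0, 0.0, 0.0, 0.05, 0.0], max_fp=0.1, max_fn=0.1)
print(vup.tolist())                          # [[0.0, 20.0], [30.0, 50.0]]
print(choose_time_support(vup.copy(), vup))  # ValidUnitPeriods
print(choose_time_support(vup, None))        # TimeIntervals
```

Returning an (n, 2) array keeps the output in the same [start, stop] shape as a row of an obs_intervals-style ragged column.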

Let me know if you want to take a second look. Otherwise, we can merge and make a release.

@alejoe91

alejoe91 commented Jun 9, 2026

Contributor

Good to merge for me @h-mayorquin !

@h-mayorquin

Contributor Author

I merged yours. Should we still merge this one?

@h-mayorquin

Contributor Author

I merged yours. Should we still merge this one?

Yes. I realize I have tests here.

@h-mayorquin h-mayorquin merged commit 6710e0a into main Jun 9, 2026
8 checks passed
@h-mayorquin h-mayorquin deleted the heberto_metrics branch June 9, 2026 16:54
