Unify report writing across single-rank and MPI runs using recording sites#65
Unify report writing across single-rank and MPI runs using recording sites#65
Conversation
bluecellulab/cell/core.py
Outdated
| else: | ||
| rec_name = section_to_variable_recording_str(sec, float(seg), variable_name) | ||
| if rec_name not in self.recordings: | ||
| self.add_variable_recording(variable=variable_name, section=sec, segx=float(seg)) |
There was a problem hiding this comment.
is there a reason we check for rec_name presence in self.recordings in the else but not in the if?
There was a problem hiding this comment.
Good catch, it should be checked in both cases. Thanks!
| """Write all configured reports (compartment and spike) in SONATA | ||
| format. | ||
|
|
||
| Parameters |
There was a problem hiding this comment.
Could you still keep Parameters in the docstring please?
There was a problem hiding this comment.
Added in the last commit
| def prepare_recordings_for_reports( | ||
| cells: Dict[CellId, Any], | ||
| simulation_config: Any, | ||
| ) -> tuple[dict[CellId, list[str]], dict[CellId, list[SiteEntry]]]: |
There was a problem hiding this comment.
docstring please! Especially with the complicated return type, would be nice to explain what it represents
There was a problem hiding this comment.
Added in my last commit
| ---------- | ||
| cells : dict | ||
| Mapping from (population, gid) → Cell object. | ||
| def prepare_recordings_for_reports( |
There was a problem hiding this comment.
This function is quite long. Would be nice to refactor it into something with less lines. This can be done in another PR though if we want to merge this fast.
There was a problem hiding this comment.
Refactored in my last commit
darshanmandge
left a comment
There was a problem hiding this comment.
Thanks! I have made some comments.
bluecellulab/cell/core.py
Outdated
| if sec is None: | ||
| self.add_variable_recording(variable=variable_name, section=None, segx=float(seg)) | ||
| sec_obj = self.soma | ||
| rec_name = section_to_variable_recording_str(sec_obj, float(seg), variable_name) | ||
| else: | ||
| rec_name = section_to_variable_recording_str(sec, float(seg), variable_name) | ||
| if rec_name not in self.recordings: |
There was a problem hiding this comment.
Do you want to check for duplicate recording for None (which results in soma recording) as you do in the else statement in L1049?
There was a problem hiding this comment.
Already fixed in one of the latest commit
bluecellulab/reports/utils.py
Outdated
| ): | ||
| """Build per-cell recording sites based on source type and report | ||
| configuration. | ||
| for (sec, sec_name, segx), rec_name in zip(sites, rec_names): |
There was a problem hiding this comment.
When len(rec_names) < len(sites) (as warned in L99 above), this zip silently drops the tail of sites. Those dropped sites never get a site_entry in sites_index or report_sites, which could cause a silent mismatch when payload_to_cells reconstructs cells on rank 0.
Could you log which specific (sec_name, segx) pairs were skipped?
There was a problem hiding this comment.
I updated configure_recording to returns (site, rec_name) pairs. Also, we log sec_name/segx in the exception path in configure_recording and I updated the AttributeError path to include sec_name/segx as well so skipped sites are identifiable in logs.
| sites_index: Mapping[CellId, list[SiteEntry]], | ||
| ) -> Dict[CellId, RecordedCell]: | ||
| """ | ||
| payload: {"pop_gid": {"recordings": {rec_name: [floats...]}}} |
There was a problem hiding this comment.
You are coding both the population name and gid as pop_gid. Population names sometimes have underscores, too. Do you want to maintain the tuple structure as in SONATA (population_name, node_id) in payload?
There was a problem hiding this comment.
Good catch ! I will switch the payload to use CellId directly
bluecellulab/reports/utils.py
Outdated
| report_sites = getattr(cell, "report_sites", None) | ||
| if not isinstance(report_sites, dict): | ||
| report_sites = {} | ||
| setattr(cell, "report_sites", report_sites) |
There was a problem hiding this comment.
Real Cell objects always have report_sites (set in __init__), so this getattr/setattr fallback will never execute for them — it only exists to handle objects that implement ReportConfigurableCell but forgot to include report_sites. Since the Protocol doesn't declare report_sites as a required attribute, those implementors have no way of knowing they need it. Adding it to the Protocol explicitly can be an option:
class ReportConfigurableCell(ReportSiteResolvable, Protocol):
report_sites: dict[str, list[dict]] # ← add this
There was a problem hiding this comment.
Right, I removed the protocol to reduce complexity. Both Cell and RecordedCell now define report_sites, so objects without it are no longer supported and the dynamic fallback isn’t needed anymore.
bluecellulab/type_aliases.py
Outdated
| TStim: TypeAlias = hoc_type | ||
|
|
||
| SectionMapping = Dict[str, NeuronSection] | ||
| SiteEntry: TypeAlias = dict[str, Any] |
There was a problem hiding this comment.
SiteEntry is used in at least 5 places with fixed string keys "report", "rec_name", "section", "segx". Using dict[str, Any] does not provide static checking. A TypedDict would catch key typos at type-check time:
class SiteEntry(TypedDict):
report: str
rec_name: str
section: str
segx: float
There was a problem hiding this comment.
Good point, I replaced dict[str, Any] with a TypedDict
| ) | ||
| spikes[pop][gid] = list(times) if times is not None else [] | ||
| except Exception: | ||
| spikes[pop][gid] = [] |
There was a problem hiding this comment.
You can add logging logger.warning("Failed to collect spikes for (%s, %d): %s", pop, gid, e, exc_info=True) for easier debugging.
There was a problem hiding this comment.
Good idea, I added logging but at debug level since missing spikes are expected for some cells and shouldn’t warn in large simulations.
Refactors report writing to operate on cell recordings instead of voltage trace dictionaries. Reports are now generated from configured recording sites, unifying single-rank and MPI workflows and making compartment reports consistent with the NEURON recording model.
What changed
cell.report_sites[report_name].
and write reports there.
The cells argument must be a mapping {CellId: cell-like} where entries expose:
Why
The previous pipeline supported multiple input shapes (live cells vs gathered trace
dictionaries). This made behavior harder to reason about and fragile, especially for
non-voltage variables and MPI execution.
The new pipeline separates responsibilities:
1. recording configuration
2. simulation
3. gathering
4. writing
MPI usage (Recommended)
Consumers running under MPI should gather locally recorded data and write only on rank 0:
This replaces the previous approach where users gathered trace dicts and passed cells_or_traces=traces.
Non-MPI usage
Single-rank usage remains simple:
Any consumer code passing cells_or_traces or trace dictionaries must migrate to:
• cells=sim.cells (single rank), or
• cells=payload_to_cells(...) on rank 0 after MPI gather.