HPC sharding for stability analysis #44

## Summary

Add SLURM-friendly parallelisation to `analysis/stability_analysis.py` so the simulation grid can be distributed across the tasks of a SLURM job array.

## New CLI flags

| Flag | Purpose |
|------|---------|
| `--reps INT` | Override `REPS_FULL` (default: 10). Set to 200 for publication. |
| `--task-id INT` | This worker's 0-based index. Typically `$SLURM_ARRAY_TASK_ID`. |
| `--num-tasks INT` | Total number of workers. |
| `--shard-dir PATH` | Write per-task JSON shards here (e.g. `results/shards/`). Each task writes `stab_{task_id}.json` and `cov_{task_id}.json`. |
| `--merge` | Merge mode: concatenate all shards from `--shard-dir` into final `stability_results.json` + `coverage_results.json` in `--output-dir`, then generate all figures. |
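A minimal `argparse` sketch of the new flags. It assumes the script already builds its CLI with `argparse` (not confirmed here). Only the five new flags are shown; the existing `--smoke`, `--plot-only`, `--coverage-only`, `--fmt` and `--output-dir` flags are omitted, and `build_parser` is a hypothetical name.

```python
import argparse
from pathlib import Path

REPS_FULL = 10  # current default, per the table above


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser containing only the proposed sharding flags."""
    parser = argparse.ArgumentParser(description="Stability analysis (sharding sketch)")
    parser.add_argument("--reps", type=int, default=REPS_FULL,
                        help="Override REPS_FULL (200 for publication).")
    parser.add_argument("--task-id", type=int, default=None,
                        help="0-based worker index, e.g. $SLURM_ARRAY_TASK_ID.")
    parser.add_argument("--num-tasks", type=int, default=None,
                        help="Total number of workers.")
    parser.add_argument("--shard-dir", type=Path, default=None,
                        help="Directory for stab_{task_id}.json and cov_{task_id}.json.")
    parser.add_argument("--merge", action="store_true",
                        help="Merge shards from --shard-dir, then generate figures.")
    return parser
```

Leaving `--task-id` and `--num-tasks` at `None` by default lets the script tell a sharded run apart from the existing single-process mode.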

## Sharding design

The unit of work is a **(setting, rep)** pair where setting = (n, p, true_rank).

1. Enumerate all valid settings: `[(n, p, k) for n, p, k in product(ns, ps, ks) if k < min(n, p)]` — currently 140.
2. Cross with reps: at `--reps 200` this gives 140 × 200 = 28,000 work units. Each unit runs all 4 missingness patterns and computes all metrics.
3. Partition the 28,000 units across `--num-tasks` workers using `units[task_id::num_tasks]`. The partition is interleaved (strided) rather than block-wise for load balancing: larger n×p settings are slower, and they sit next to each other in enumeration order.
4. Each task writes its shard of `_Trial` and `_CoverageTrial` records as JSON.
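Steps 1 to 3 can be sketched as follows. The grid values are hypothetical placeholders (they yield 33 settings, not the 140 of the real grid), and the helper names are illustrative, not existing functions in `stability_analysis.py`.

```python
from itertools import product

# Hypothetical grid values; the real ns/ps/ks live in stability_analysis.py.
ns = [50, 100, 200]
ps = [10, 20, 50]
ks = [1, 2, 5, 10]


def enumerate_settings(ns, ps, ks):
    """All valid (n, p, k) settings, in a fixed, deterministic order."""
    return [(n, p, k) for n, p, k in product(ns, ps, ks) if k < min(n, p)]


def enumerate_units(settings, reps):
    """Cross settings with reps: one (setting_index, setting, rep) per work unit."""
    return [(i, s, r) for i, s in enumerate(settings) for r in range(reps)]


def shard(units, task_id, num_tasks):
    """Interleaved partition: task t gets units t, t + T, t + 2T, ..."""
    return units[task_id::num_tasks]
```

Because the list order is deterministic, every worker can rebuild the full unit list independently and slice out its own shard without any coordination.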

## Merge step

`--merge` reads all `stab_*.json` and `cov_*.json` files from `--shard-dir`, concatenates them, deduplicates records by their `(n, p, k, miss, metric, rep)` key, validates the expected record count, writes the final JSON files to `--output-dir`, and generates all 9 figures.
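A minimal sketch of the merge logic, assuming each shard is a JSON list of flat records whose fields include the dedup key. The field names in `KEY_FIELDS` and the helper names are assumptions; coverage records may need a different key, which is why `key_fields` is a parameter. The expected count is passed in because it depends on the real grid, rep count and number of metrics.

```python
import glob
import json
import os

# Assumed record field names for the dedup key.
KEY_FIELDS = ("n", "p", "k", "miss", "metric", "rep")


def load_shards(shard_dir, prefix):
    """Concatenate every {prefix}_*.json shard (each a JSON list of records)."""
    records = []
    for path in sorted(glob.glob(os.path.join(shard_dir, f"{prefix}_*.json"))):
        with open(path) as fh:
            records.extend(json.load(fh))
    return records


def dedupe(records, key_fields=KEY_FIELDS):
    """Keep the first record seen for each key, preserving order."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def merge(shard_dir, output_dir, prefix, out_name, expected, key_fields=KEY_FIELDS):
    """Merge and dedupe shards, check the record count, write the final JSON."""
    records = dedupe(load_shards(shard_dir, prefix), key_fields)
    if len(records) != expected:
        raise ValueError(f"{prefix}: expected {expected} records, found {len(records)}")
    os.makedirs(output_dir, exist_ok=True)
    out_path = os.path.join(output_dir, out_name)
    with open(out_path, "w") as fh:
        json.dump(records, fh)
    return out_path
```

Deduplicating before the count check means a task that was resubmitted and wrote overlapping records does not fail validation, while a missing shard still does.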

## Example SLURM usage

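```bash
#!/bin/bash
#SBATCH --job-name=vbpca-stability
#SBATCH --array=0-559
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --time=02:00:00

# 28,000 units / 560 tasks = 50 units per task.
# --num-tasks must match the number of array tasks (0-559).
source activate vbpca-env
python analysis/stability_analysis.py \
    --reps 200 \
    --task-id "$SLURM_ARRAY_TASK_ID" \
    --num-tasks 560 \
    --shard-dir results/shards/ \
    --fmt png

# After all array tasks succeed, merge in a separate job submitted with
# `sbatch --dependency=afterok:<array_job_id> ...`. Pass the same --reps so
# the expected-record-count check uses the same rep count:
# python analysis/stability_analysis.py --merge --reps 200 --shard-dir results/shards/ --fmt png
```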
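Hard-coding `--num-tasks 560` alongside `--array=0-559` invites a mismatch if one is edited without the other. One option, sketched here as a hypothetical helper that is not part of the proposed CLI, is to fall back to SLURM's own `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT` environment variables when the flags are omitted. It assumes a contiguous, 0-based array.

```python
import os


def slurm_task_defaults(env=os.environ):
    """Return (task_id, num_tasks) from SLURM array variables, or (None, None).

    Assumes a contiguous 0-based array (e.g. --array=0-559), so that the array
    task ID is directly usable as a 0-based worker index.
    """
    if "SLURM_ARRAY_TASK_ID" not in env or "SLURM_ARRAY_TASK_COUNT" not in env:
        return None, None  # not running inside a job array
    task_id = int(env["SLURM_ARRAY_TASK_ID"])
    num_tasks = int(env["SLURM_ARRAY_TASK_COUNT"])
    if not 0 <= task_id < num_tasks:
        raise ValueError(
            f"task id {task_id} outside [0, {num_tasks}); is the array 0-based and contiguous?"
        )
    return task_id, num_tasks
```

Explicit `--task-id`/`--num-tasks` values would still take precedence; the helper only fills them in when both are absent.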

## Backward compatibility

- Without `--task-id`/`--num-tasks`, behaves exactly as before (single-process sequential)
- `--reps` alone (without sharding) just overrides the rep count for local runs
- Existing `--smoke`, `--plot-only`, `--coverage-only` flags remain unchanged
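The backward-compatible dispatch can be reduced to one selection step, sketched below with a hypothetical `select_units` function. `--reps` only changes how many units exist; this function decides which of them the current process runs. Requiring both sharding flags together is an assumed validation rule, not something stated above.

```python
def select_units(units, task_id=None, num_tasks=None):
    """Return the work units this process should run.

    With neither flag, the whole grid runs in one process (existing behaviour).
    With both flags, only this worker's interleaved shard runs.
    """
    if task_id is None and num_tasks is None:
        return list(units)
    if task_id is None or num_tasks is None:
        raise ValueError("--task-id and --num-tasks must be given together")
    if num_tasks < 1 or not 0 <= task_id < num_tasks:
        raise ValueError(f"need 0 <= task_id < num_tasks, got {task_id}, {num_tasks}")
    return units[task_id::num_tasks]
```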

## Seed handling

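Each (setting_index, rep) pair should derive a deterministic sub-seed, `seed = base_seed + setting_index * max_reps + rep`, and seed a fresh RNG used only by that unit. Because the seed depends only on the unit, not on which task runs it or in what order, results are identical regardless of sharding configuration. `max_reps` must be a fixed constant at least as large as any `--reps` value: if it tracked `--reps`, seeds would change with the rep count, and if `rep >= max_reps`, seeds would collide across settings.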
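A small check of the sharding-invariance claim, under stated assumptions: `BASE_SEED` and `MAX_REPS` are hypothetical values, `run_unit` is a stand-in for one simulation, and the stdlib `random.Random` stands in for the script's actual RNG (in NumPy code, `np.random.default_rng(seed)` would play the same role). The key property is that each unit draws only from a generator seeded with its own sub-seed.

```python
import random

BASE_SEED = 12345  # hypothetical base seed
MAX_REPS = 200     # fixed upper bound on reps, not the runtime --reps value


def unit_seed(setting_index, rep, base_seed=BASE_SEED, max_reps=MAX_REPS):
    """Deterministic sub-seed: base_seed + setting_index * max_reps + rep."""
    if not 0 <= rep < max_reps:
        raise ValueError(f"rep {rep} must be < max_reps ({max_reps}) to avoid collisions")
    return base_seed + setting_index * max_reps + rep


def run_unit(setting_index, rep):
    """Stand-in simulation: every draw comes from the unit's own RNG."""
    rng = random.Random(unit_seed(setting_index, rep))
    return [rng.random() for _ in range(3)]


def run_sharded(units, num_tasks):
    """Run every interleaved shard and collect results keyed by (setting_index, rep)."""
    results = {}
    for task_id in range(num_tasks):
        for s, r in units[task_id::num_tasks]:
            results[(s, r)] = run_unit(s, r)
    return results
```

Running the same units with 1, 3 or 7 tasks produces identical result dictionaries; a single shared RNG advanced across units would not have this property.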

## Acceptance criteria

- [ ] `--reps 25` overrides REPS_FULL
- [ ] `--task-id 0 --num-tasks 4` runs ~1/4 of the grid
- [ ] Shards are valid JSON, loadable as lists of `_Trial` / `_CoverageTrial`
- [ ] `--merge` produces identical results to a single-process run (modulo float ordering)
- [ ] Interleaved partitioning gives roughly equal wall time per task
- [ ] Works with existing `--smoke` flag for quick validation
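Before measuring real wall times, the load-balancing criterion can be sanity-checked offline. The sketch below assumes a hypothetical grid and a crude cost model (one unit costs about n·p); real per-unit cost will also depend on convergence behaviour, so this only illustrates why interleaving helps when slow settings are contiguous in enumeration order.

```python
from itertools import product

# Hypothetical grid and cost model: cost of one unit ~ n * p.
ns = [50, 100, 200, 500]
ps = [10, 20, 50, 100]
ks = [1, 2, 5, 10]
REPS = 20
NUM_TASKS = 8

settings = [(n, p, k) for n, p, k in product(ns, ps, ks) if k < min(n, p)]
units = [(s, r) for s in settings for r in range(REPS)]


def task_costs(partition):
    """Total modelled cost per task for a list of per-task unit lists."""
    return [sum(n * p for (n, p, _k), _r in part) for part in partition]


def imbalance(costs):
    """Slowest task relative to the mean task (1.0 means perfect balance)."""
    return max(costs) / (sum(costs) / len(costs))


interleaved = [units[t::NUM_TASKS] for t in range(NUM_TASKS)]
size = -(-len(units) // NUM_TASKS)  # ceiling division for contiguous blocks
block = [units[t * size:(t + 1) * size] for t in range(NUM_TASKS)]

print(f"interleaved imbalance: {imbalance(task_costs(interleaved)):.2f}")
print(f"block imbalance:       {imbalance(task_costs(block)):.2f}")
```

With block partitioning the last tasks receive only the largest-n settings, so the slowest task sets the array's total wall time; interleaving spreads every setting's reps across all tasks.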
