188 changes: 188 additions & 0 deletions apps/data-processing/scripts/README_ledger_export.md
@@ -0,0 +1,188 @@
# Ledger-Range Export — Issue #883

Repeatable export of raw Soroban contract events and normalized project state for a given Stellar ledger range. Intended for **incident debugging** by maintainers.

## Quick Start

```bash
# Export all data for ledger range 1000–2000
python scripts/export_ledger_range.py --start-ledger 1000 --end-ledger 2000

# Custom output directory
python scripts/export_ledger_range.py --start-ledger 1000 --end-ledger 2000 \
--output-dir /tmp/incident_exports

# Single-ledger export
python scripts/export_ledger_range.py --start-ledger 1500 --end-ledger 1500

# Override database URL
python scripts/export_ledger_range.py --start-ledger 1000 --end-ledger 2000 \
--database-url postgresql://user:pass@host:5432/lumenpulse
```

The script reads `DATABASE_URL` from the environment if `--database-url` is not provided.
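
The precedence described above can be sketched as follows. The helper name `resolve_database_url` is hypothetical and is not part of the script; the actual resolution presumably happens inside `LedgerRangeExporter` and may differ in detail.

```python
import os
from typing import Mapping, Optional


def resolve_database_url(
    cli_value: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical helper: prefer --database-url, else fall back to DATABASE_URL."""
    # An empty string is treated like "not provided" (an assumption of this sketch).
    if cli_value:
        return cli_value
    return env.get("DATABASE_URL")
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.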

## Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `--start-ledger` | int | ✓ | First ledger number, inclusive |
| `--end-ledger` | int | ✓ | Last ledger number, inclusive |
| `--output-dir` | string | | Output directory (default: `exports/ledger`) |
| `--database-url` | string | | Overrides `DATABASE_URL` env var |

Validation rules:
- Both ledgers must be non-negative integers.
- `start_ledger` must be ≤ `end_ledger`.
- `start_ledger == end_ledger` is valid (single-ledger export).
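
A minimal sketch of these rules, for illustration only. The real check is `_validate_ledger_range` in `src/ledger_export.py`, which is not shown here; the function below is a hypothetical stand-in that raises `TypeError` or `ValueError`, the two exception types the CLI script catches.

```python
def validate_ledger_range(start_ledger: int, end_ledger: int) -> None:
    """Hypothetical stand-in for the validation rules listed above."""
    for name, value in (("start_ledger", start_ledger), ("end_ledger", end_ledger)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {type(value).__name__}")
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
    if start_ledger > end_ledger:
        raise ValueError(
            f"start_ledger ({start_ledger}) must be <= end_ledger ({end_ledger})"
        )
```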

## Output File

A single JSON file is written:

```
<output-dir>/ledger_export_<start>_<end>.json
```

Example: `exports/ledger/ledger_export_1000_2000.json`
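
The naming pattern can be reproduced with `pathlib`, which is handy when a follow-up script needs to locate an export. `export_path` is a hypothetical helper name used only for this sketch.

```python
from pathlib import Path


def export_path(output_dir: str, start_ledger: int, end_ledger: int) -> Path:
    """Build <output-dir>/ledger_export_<start>_<end>.json (hypothetical helper)."""
    return Path(output_dir) / f"ledger_export_{start_ledger}_{end_ledger}.json"
```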

### Top-Level Structure

```json
{
  "metadata": {
    "startLedger": 1000,
    "endLedger": 2000,
    "exportTimestamp": "2026-06-25T12:00:00+00:00",
    "exportVersion": "1"
  },
  "raw": [ ... ],
  "normalized": {
    "project_views": [ ... ],
    "project_contributors": [ ... ],
    "project_milestones": [ ... ]
  }
}
```

### `metadata`

| Field | Type | Description |
|-------|------|-------------|
| `startLedger` | int | Inclusive start of the requested range |
| `endLedger` | int | Inclusive end of the requested range |
| `exportTimestamp` | ISO-8601 | UTC time of export run |
| `exportVersion` | string | Schema version (`"1"`) |
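
Before analysing an export it can help to sanity-check the `metadata` block. The sketch below is not part of the tool; `check_metadata` is a hypothetical helper that relies only on the fields in the table above.

```python
from datetime import datetime
from typing import Any, Dict


def check_metadata(meta: Dict[str, Any]) -> datetime:
    """Hypothetical check of the metadata block; returns the parsed timestamp."""
    if meta["exportVersion"] != "1":
        raise ValueError(f"unsupported exportVersion: {meta['exportVersion']!r}")
    if meta["startLedger"] > meta["endLedger"]:
        raise ValueError("startLedger must be <= endLedger")
    # fromisoformat accepts offsets such as "+00:00".
    timestamp = datetime.fromisoformat(meta["exportTimestamp"])
    if timestamp.utcoffset() is None:
        raise ValueError("exportTimestamp carries no UTC offset")
    return timestamp
```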

### `raw` — array of ContractEvent rows

Each object represents one raw Soroban event whose `ledger` column falls in `[startLedger, endLedger]`:

```json
{
  "id": 1,
  "contract_id": "CABC...",
  "event_id": "evt-1",
  "ledger": 1500,
  "event_type": "contribution",
  "project_id": 42,
  "contributor": "GBOB...",
  "amount": 100.0,
  "milestone_id": null,
  "status": "active",
  "topics": [],
  "raw_data": { "key": "value" },
  "timestamp": "2024-01-01T00:00:00+00:00"
}
```
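
A quick way to get oriented in a large `raw` array is to count events per type and confirm that every row falls inside the requested range. This is a sketch over the row shape shown above; `summarize_raw` is a hypothetical name, not a function provided by the tool.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize_raw(
    raw: List[Dict[str, Any]], start_ledger: int, end_ledger: int
) -> Tuple[Counter, List[Any]]:
    """Return (events per event_type, ids of rows outside the inclusive range)."""
    by_type = Counter(event["event_type"] for event in raw)
    out_of_range = [
        event["id"]
        for event in raw
        if not start_ledger <= event["ledger"] <= end_ledger
    ]
    return by_type, out_of_range
```

A non-empty second element would mean the export contains rows outside the range it claims to cover.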

### `normalized` — object with three arrays

Normalized rows are matched by their `last_event_ledger` / `last_contribution_ledger` column:

#### `project_views`

Rows from `project_views` where `last_event_ledger` is in range:

```json
{
  "id": 1,
  "project_id": 42,
  "contract_id": "CABC...",
  "owner": "GALICE...",
  "total_contributions": 100.0,
  "unique_contributors": 1,
  "status": "active",
  "last_event_ledger": 1500,
  "extra_data": {}
}
```

#### `project_contributors`

Rows from `project_contributors` where `last_contribution_ledger` is in range:

```json
{
  "id": 1,
  "project_id": 42,
  "contributor": "GBOB...",
  "total_contributed": 100.0,
  "first_contribution_ledger": 1500,
  "last_contribution_ledger": 1500,
  "extra_data": {}
}
```

#### `project_milestones`

Rows from `project_milestones` where `last_event_ledger` is in range:

```json
{
  "id": 1,
  "project_id": 42,
  "milestone_id": 1,
  "status": "pending",
  "approved_at": null,
  "last_event_ledger": 1500,
  "extra_data": {}
}
```

## Intended Debugging Workflow

1. Identify the approximate ledger range of an incident (e.g., from monitoring alerts or Stellar explorer).
2. Run the export:
   ```bash
   python scripts/export_ledger_range.py --start-ledger <start> --end-ledger <end>
   ```
3. Inspect `raw` to see exactly which contract events arrived in that window.
4. Compare `normalized` against expected project state — mismatches between raw events and normalized output indicate a processing bug.
5. Re-run as many times as needed; the tool never modifies source data and always overwrites the output file with a fresh snapshot.
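
Step 4 can be partly automated. The sketch below flags `project_views` rows whose `last_event_ledger` differs from the latest raw event ledger for the same project. It assumes that every event carrying a `project_id` advances `project_views.last_event_ledger`; that assumption is not confirmed by this document, so treat any hit as a lead to investigate rather than proof of a bug. `find_stale_views` is a hypothetical helper.

```python
from typing import Any, Dict, List


def find_stale_views(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flag views that lag behind the newest raw event of their project."""
    # Latest raw ledger seen per project within the exported range.
    latest_raw: Dict[Any, int] = {}
    for event in doc["raw"]:
        project_id = event.get("project_id")
        if project_id is None:
            continue
        latest_raw[project_id] = max(latest_raw.get(project_id, event["ledger"]), event["ledger"])

    stale = []
    for view in doc["normalized"]["project_views"]:
        raw_ledger = latest_raw.get(view["project_id"])
        if raw_ledger is not None and raw_ledger != view["last_event_ledger"]:
            stale.append(
                {
                    "project_id": view["project_id"],
                    "view_ledger": view["last_event_ledger"],
                    "raw_ledger": raw_ledger,
                }
            )
    return stale
```

Load the export with `json.load` and pass the resulting dictionary to the function; an empty list means no view lags behind the raw events in the range.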

## Python API

```python
from src.ledger_export import LedgerRangeExporter

exporter = LedgerRangeExporter(
    start_ledger=1000,
    end_ledger=2000,
    output_dir="exports/ledger",
)
result = exporter.export()
# result.path, result.raw_count, result.normalized_counts, result.status
```

## Running Tests

```bash
pytest tests/test_ledger_export.py -v
```

## Limitations

- **Normalized coverage**: `project_views` and `project_milestones` are matched by `last_event_ledger`; `project_contributors` by `last_contribution_ledger`. These columns record only the most recent ledger that touched a row, so a row affected by an event inside the range but updated again by a later ledger will not appear in the export.
- **No DB required for tests**: All tests use mocks; a live database is only needed for actual incident debugging.
- **Large ranges**: Rows are loaded entirely into memory. For very large ranges (millions of events), increase available memory or narrow the range.
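
One way to narrow a large range is to split it into fixed-size, inclusive sub-ranges and export each separately. `split_range` is a hypothetical helper and not part of the tool; each sub-range would produce its own output file under the naming scheme described earlier.

```python
from typing import Iterator, Tuple


def split_range(
    start_ledger: int, end_ledger: int, chunk_size: int
) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) sub-ranges covering [start_ledger, end_ledger]."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    low = start_ledger
    while low <= end_ledger:
        high = min(low + chunk_size - 1, end_ledger)
        yield low, high
        low = high + 1
```

Each pair can be passed as `start_ledger` and `end_ledger` to `LedgerRangeExporter`, so only one chunk's rows are held in memory at a time.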
73 changes: 73 additions & 0 deletions apps/data-processing/scripts/export_ledger_range.py
@@ -0,0 +1,73 @@
#!/usr/bin/env python3
"""
Export raw and normalized ledger data for incident debugging — Issue #883.

Usage:
    python scripts/export_ledger_range.py --start-ledger 1000 --end-ledger 2000
    python scripts/export_ledger_range.py --start-ledger 1000 --end-ledger 2000 \
        --output-dir /tmp/incident_exports
    python scripts/export_ledger_range.py --start-ledger 500 --end-ledger 500
"""

import argparse
import json
import logging
import sys
from pathlib import Path

# Allow running from repo root or scripts/ directory
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)
logger = logging.getLogger(__name__)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Export ledger-range data (raw events + normalized state) for incident debugging"
    )
    parser.add_argument(
        "--start-ledger", type=int, required=True, help="First ledger (inclusive)"
    )
    parser.add_argument(
        "--end-ledger", type=int, required=True, help="Last ledger (inclusive)"
    )
    parser.add_argument(
        "--output-dir",
        default="exports/ledger",
        help="Directory to write the export file (default: exports/ledger)",
    )
    parser.add_argument(
        "--database-url", default=None, help="Override DATABASE_URL env var"
    )
    return parser.parse_args()


def main() -> None:
    args = parse_args()

    # Imported here so the sys.path adjustment above has already taken effect
    from src.ledger_export import LedgerRangeExporter, _validate_ledger_range

    try:
        _validate_ledger_range(args.start_ledger, args.end_ledger)
    except (TypeError, ValueError) as exc:
        logger.error("Invalid ledger range: %s", exc)
        sys.exit(1)

    exporter = LedgerRangeExporter(
        start_ledger=args.start_ledger,
        end_ledger=args.end_ledger,
        output_dir=args.output_dir,
        database_url=args.database_url,
    )

    result = exporter.export()
    print(json.dumps(result.to_dict(), indent=2))


if __name__ == "__main__":
    main()