
Commit d80b6e7

feat: add --subnets flag to deploy multiple nodes per client (#136)
* feat: add isAggregator flag to validator configuration

  Add support for configuring nodes as aggregators through validator-config.yaml. This allows selective designation of nodes to perform aggregation duties by setting isAggregator: true in the validator configuration.

  Changes:
  - Add isAggregator field (default: false) to all validators in both local and ansible configs
  - Update parse-vc.sh to extract and export isAggregator flag
  - Modify all client command scripts to pass --is-aggregator flag when enabled
  - Add isAggregator status to node information output

* spin-node: add --subnets flag to deploy multiple nodes per client

  Adds --subnets N (1–5) to deploy N nodes of each client on their associated servers, each on a distinct attestation subnet.

  New files:
  - generate-subnet-config.py: expands validator-config.yaml into validator-config-subnets-N.yaml with unique node names, incremented ports (quic/metrics/api), fresh P2P private keys, and explicit subnet membership per entry. Also sets config.attestation_committee_count = N so each client correctly partitions validators across N committees.

  Changes:
  - parse-env.sh: add --subnets N and --dry-run flags
  - spin-node.sh:
    - expand validator-config before genesis setup when --subnets N given
    - select one aggregator per subnet randomly; print prominent summary
    - --dry-run: simulate full deployment without applying any changes (Ansible runs with --check --diff, local execs are echoed only)
  - run-ansible.sh: pass validator_config_basename extra var so playbooks use the active (possibly expanded) config; add --check --diff in dry-run
  - ansible/playbooks/deploy-nodes.yml: use validator_config_basename to sync the correct config file to remote hosts
  - ansible/playbooks/prepare.yml: open port ranges for all subnet nodes on a host by matching entries via IP, not just hostname
  - convert-validator-config.py: fall back to httpPort for Lantern nodes when generating Leanpoint upstreams
  - README.md: document --subnets and --dry-run; update --prepare firewall table to reflect port ranges when --subnets N is active

  Rules enforced by generate-subnet-config.py:
  - No two nodes on the same server may share a subnet (template validated)
  - Each subnet has exactly one node per client
  - N=1 is a no-op expansion (single-subnet baseline)
  - N capped at 5

* ansible: copy only the node's own hash-sig keys to each server

  Previously both deploy-nodes.yml and copy-genesis.yml synced the entire hash-sig-keys/ directory to every remote host, meaning every server received every validator's sk/pk pair.

  Now each playbook:
  1. Reads annotated_validators.yaml on the controller to look up the privkey_file entries for the node being deployed (inventory_hostname).
  2. Derives the pk filename by replacing _sk.ssz → _pk.ssz.
  3. Copies only those specific files to the target host.

  A server running zeam_0 (validator_0_sk.ssz / validator_0_pk.ssz) no longer receives validator_1_sk.ssz, validator_2_sk.ssz, etc.

* spin-node: assert exactly 1 aggregator per subnet after selection

* validator-config: add privkey for commented-out gean_0, lean_node_0, peam_0

* spin-node: derive subnet from config 'subnet' field, not node name suffix

  The old suffix-based detection (ethlambda_1 → subnet 1) broke when a config contained multiple nodes for the same client without --subnets (e.g. ethlambda_0..4 for redundancy), incorrectly creating 5 subnets and forcing ethlambda nodes as the sole aggregator on subnets 1-4.

  Subnet membership is now read from the explicit 'subnet:' field that generate-subnet-config.py writes for each entry. Nodes without this field (all standard configs) default to subnet 0, so a single-subnet deployment always selects exactly one aggregator from all active nodes regardless of numeric suffixes in their names.

* docs: add client integration guide with link from README

* spin-node: honour pre-existing isAggregator: true when no --aggregator flag is passed

  Previously the script always reset all flags and randomly re-selected an aggregator, ignoring any manual isAggregator: true already set in the YAML. This caused ethlambda_0 (user's choice) to be silently replaced by ethlambda_1 (random pick).

  Aggregator selection now follows a three-level priority:
  1. --aggregator <node> CLI flag
  2. Pre-existing isAggregator: true in the config (manual YAML edit)
  3. Random selection (fallback when neither is set)

  The preset node is validated against the active node list. If it no longer exists a warning is printed and random selection takes over.

* docs: clarify touch point 1 - both configs required, separate local/ansible examples

* docs: add note to contact zeam team for server IP assignment

* spin-node: fix associative array for bash 3.2 compatibility

* validator-config: use apiPort for lantern instead of httpPort

* fix: cadvisor deploy

* prepare: install jq alongside yq and docker

* fix: grandine address flag

* fix: grandine address flag ansible

* spin-node: skip aggregator selection when using --restart-client

* validator-config: enable gean_0 node

* run-ansible: derive inventory groups dynamically instead of hardcoding

  The hardcoded group list (zeam_nodes, ream_nodes, ...) caused newly added client types (e.g. gean_nodes) to never have their ansible_user updated. This meant --useRoot was silently ignored for those nodes, causing Ansible to SSH as the current local user (partha) instead of root, and fail.

* validator-config: add nlean_0 node

* ansible: add gean and nlean roles and wire into deploy

* docs: update adding-a-new-client guide with gean and nlean

* nlean: remove --pull=always for locally-built image

* nlean: use ghcr.io/nleaneth/nlean:latest as docker image

* fix: enable metrics flag for nlean

---------

Co-authored-by: Katya Ryazantseva <sibkatya@gmail.com>
1 parent 9765190 commit d80b6e7
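
The three-level aggregator priority described in the commit message can be sketched in Python. This is an illustrative model only, not the code in `spin-node.sh` (which is a shell script): the function name `select_aggregator` and its parameters are hypothetical, and raising an error for an unknown `--aggregator` value is an assumption, since the commit message only specifies the fallback behaviour for a stale preset.

```python
import random
import sys
from typing import Optional, Sequence


def select_aggregator(
    active_nodes: Sequence[str],
    cli_choice: Optional[str] = None,
    preset: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the aggregator for one subnet (hypothetical helper)."""
    if not active_nodes:
        raise ValueError("no active nodes to choose from")
    rng = rng or random.Random()

    # 1. An explicit --aggregator <node> value wins.
    if cli_choice is not None:
        if cli_choice not in active_nodes:
            # Assumption: the commit message does not say how this case is handled.
            raise ValueError(f"unknown aggregator node: {cli_choice}")
        return cli_choice

    # 2. A node already marked isAggregator: true in the YAML is honoured,
    #    provided it is still in the active node list.
    if preset is not None:
        if preset in active_nodes:
            return preset
        print(f"warning: preset aggregator {preset} is not active; "
              "falling back to random selection", file=sys.stderr)

    # 3. Otherwise fall back to a random pick.
    return rng.choice(list(active_nodes))
```

Calling `select_aggregator(["zeam_0", "ethlambda_0"], preset="ethlambda_0")` returns `ethlambda_0`, which is the manual-edit case the commit fixes.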

22 files changed

Lines changed: 1374 additions & 150 deletions


README.md

Lines changed: 60 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -11,7 +11,7 @@ A single command line quickstart to spin up lean node(s)
1111
- Uses PK's `eth-beacon-genesis` docker tool (not custom tooling)
1212
- Generates PQ keys based on specified configuration in `validator-config.yaml`
1313
- Force regen with flag `--forceKeyGen` when supplied with `generateGenesis`
14-
- ✅ Integrates zeam, ream, qlean, lantern, lighthouse, grandine, ethlambda
14+
- ✅ Integrates zeam, ream, qlean, lantern, lighthouse, grandine, ethlambda, gean, nlean, peam
1515
- ✅ Configure to run clients in docker or binary mode for easy development
1616
- ✅ Linux & Mac compatible & tested
1717
- ✅ Option to operate on single or multiple nodes or `all`
@@ -212,10 +212,17 @@ Every Ansible deployment automatically deploys an observability stack alongside
212212
15. `--prepare` verify and install the software required to run lean nodes on every remote server, and open + persist the necessary firewall ports.
213213
- **Ansible mode only** — fails with an error if `deployment_mode` is not `ansible`
214214
- Installs: `python3` (Ansible requirement), Docker CE + Compose plugin (all clients run as containers), `yq` (required by the `common` role at every deploy)
215-
- Opens per-node ports (`quicPort`/UDP, `metricsPort`/TCP, `apiPort`/TCP) read from `validator-config.yaml`, plus fixed observability ports (9090, 9080, 9098, 9100). Enables `ufw` with default deny incoming (persisted across reboots).
215+
- Opens per-node ports (`quicPort`/UDP, `metricsPort`/TCP, `apiPort`/TCP) read from the active validator config, plus fixed observability ports (9090, 9080, 9098, 9100). With `--subnets N`, all N nodes' port ranges are opened per host. Enables `ufw` with default deny incoming (persisted across reboots).
216216
- Prints a per-tool, per-host status summary (`✅ ok` / `❌ missing`) and `ufw status verbose`
217-
- `--node` is not required and is ignored; all other flags are also ignored except `--sshKey` and `--useRoot`
217+
- `--node` is not required; passing unsupported flags alongside `--prepare` produces a prominent error — only `--sshKey` and `--useRoot` are accepted
218218
- Example: `NETWORK_DIR=ansible-devnet ./spin-node.sh --prepare --sshKey ~/.ssh/id_ed25519 --useRoot`
219+
16. `--subnets N` expand the validator config to deploy N nodes of each client on the same server, where N is 1–5.
220+
- Generates `validator-config-subnets-N.yaml` from the template (without modifying the original)
221+
- Each subnet node gets a unique name (`{client}_0`, `{client}_1`, …), ports incremented by the subnet index, and a fresh P2P identity key for subnets > 0
222+
- Subnet assignment rule: each server contributes **exactly one node per subnet** — nodes on the same server are never in the same subnet
223+
- Every subnet contains the same set of client types
224+
- `N=1` renames nodes to `{client}_0` with no port changes (useful for canonical naming)
225+
- Example: `NETWORK_DIR=ansible-devnet ./spin-node.sh --node all --subnets 3 --sshKey ~/.ssh/id_ed25519 --useRoot`
219226

220227
### Preparing remote servers
221228

@@ -237,7 +244,7 @@ NETWORK_DIR=ansible-devnet ./spin-node.sh --prepare --sshKey ~/.ssh/id_ed25519 -
237244

238245
**Constraints:**
239246
- Only works in ansible mode (`deployment_mode: ansible` in your config, or `--deploymentMode ansible`)
240-
- Any other flags (e.g., `--node`, `--generateGenesis`) are silently ignored — only `--sshKey` and `--useRoot` are used
247+
- Passing unsupported flags (e.g. `--node`, `--generateGenesis`) alongside `--prepare` produces a prominent error — only `--sshKey` and `--useRoot` are accepted
241248
- `--node` is not required; the playbook runs on all remote hosts in the inventory
242249

243250
Once preparation succeeds, proceed with the normal deploy command:
@@ -246,6 +253,43 @@ Once preparation succeeds, proceed with the normal deploy command:
246253
NETWORK_DIR=ansible-devnet ./spin-node.sh --node all --generateGenesis --sshKey ~/.ssh/id_ed25519 --useRoot
247254
```
248255

256+
### Deploying multiple subnets
257+
258+
Use `--subnets N` to run N independent copies of each client on the same server. This is useful for testing multi-subnet P2P scenarios without provisioning additional machines.
259+
260+
```sh
261+
# Deploy 3 subnets of every client (ansible)
262+
NETWORK_DIR=ansible-devnet ./spin-node.sh --node all --subnets 3 \
263+
--generateGenesis --sshKey ~/.ssh/id_ed25519 --useRoot
264+
```
265+
266+
**How it works:**
267+
268+
`--subnets N` generates `validator-config-subnets-N.yaml` from the template (the original file is never modified). For each client in the template it creates N entries:
269+
270+
| Subnet index | Name | quicPort | metricsPort | apiPort |
271+
|---|---|---|---|---|
272+
| 0 | `zeam_0` | base | base | base |
273+
| 1 | `zeam_1` | base+1 | base+1 | base+1 |
274+
| … | … | … | … | … |
275+
| N-1 | `zeam_N-1` | base+N-1 | base+N-1 | base+N-1 |
276+
277+
**Rules enforced:**
278+
- `N` must be between 1 and 5
279+
- Each server contributes exactly one node per subnet (nodes on the same server are never in the same subnet)
280+
- Every subnet contains the same set of client types
281+
- Each node beyond subnet 0 gets a fresh P2P identity key
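
The expansion rules above can be illustrated with a short Python sketch. It is not the contents of `generate-subnet-config.py`; the function name `expand_validators` is hypothetical, the entry layout (`enrFields.quic`, `metricsPort`, `apiPort`, `privkey`) is assumed from the validator config shown in this commit, and `secrets.token_hex(32)` is only a stand-in for whatever key generation the real script uses.

```python
import copy
import secrets

MAX_SUBNETS = 5  # the README states N must be between 1 and 5


def expand_validators(template, n):
    """Return N entries per template entry, one per subnet (illustrative only)."""
    if not 1 <= n <= MAX_SUBNETS:
        raise ValueError(f"--subnets must be between 1 and {MAX_SUBNETS}")

    expanded = []
    for entry in template:
        # "zeam_0" -> "zeam"; rsplit keeps names such as "lean_node_0" intact.
        client = entry["name"].rsplit("_", 1)[0]
        for idx in range(n):
            node = copy.deepcopy(entry)  # never modify the original template
            node["name"] = f"{client}_{idx}"
            node["subnet"] = idx
            # Each port is the base value plus the subnet index.
            node["enrFields"]["quic"] += idx
            node["metricsPort"] += idx
            node["apiPort"] += idx
            if idx > 0:
                # Placeholder for a fresh P2P identity key (assumed format: 32 random bytes, hex).
                node["privkey"] = secrets.token_hex(32)
            expanded.append(node)
    return expanded
```

With a single `zeam_0` template entry on ports 9001/9095/5055 and `n=3`, the sketch yields `zeam_0`, `zeam_1`, `zeam_2` with QUIC ports 9001, 9002, 9003, matching the table above.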
282+
283+
**Running `--prepare` with subnets:**
284+
285+
Always run `--prepare` with the same `--subnets N` value before deploying, so the firewall opens all N port ranges per host:
286+
287+
```sh
288+
# Prepare firewall for 3 subnets
289+
NETWORK_DIR=ansible-devnet ./spin-node.sh --prepare --subnets 3 \
290+
--sshKey ~/.ssh/id_ed25519 --useRoot
291+
```
292+
249293
### Checkpoint sync
250294

251295
Checkpoint sync lets you restart clients by syncing from a remote checkpoint instead of from genesis. This is useful for joining an existing network (e.g., leanpoint mainnet) without replaying the full chain.
@@ -278,7 +322,7 @@ NETWORK_DIR=local-devnet ./spin-node.sh --restart-client zeam_0 \
278322
- **Local** (`NETWORK_DIR=local-devnet`): Uses Docker directly
279323
- **Ansible** (`NETWORK_DIR=ansible-devnet`): Uses Ansible to deploy to remote hosts
280324

281-
**Supported clients:** zeam, ream, qlean, lantern, lighthouse, grandine, ethlambda, peam
325+
**Supported clients:** zeam, ream, qlean, lantern, lighthouse, grandine, ethlambda, gean, nlean, peam
282326

283327
> **Note:** All clients accept `--checkpoint-sync-url`. Client implementations may use different parameter names internally; update client-cmd scripts if parameters change.
284328
@@ -293,9 +337,13 @@ Current following clients are supported:
293337
5. Lighthouse
294338
6. Grandine
295339
7. Ethlambda
296-
8. Peam
340+
8. Gean
341+
9. Nlean
342+
10. Peam
343+
344+
Adding a new client requires 6 small, well-defined steps. See the full integration guide:
297345

298-
However adding a lean client to this setup is very easy. Feel free to do the PR or reach out to the maintainers.
346+
📖 **[Adding a New Client](docs/adding-a-new-client.md)**
299347

300348
## How It Works
301349

@@ -806,7 +854,7 @@ ansible/
806854
│ └── all.yml # Global variables
807855
├── playbooks/
808856
│ ├── site.yml # Main playbook (clean + copy genesis + deploy)
809-
│ ├── prepare.yml # Bootstrap: install Docker, build-essential, yq, etc.
857+
│ ├── prepare.yml # Bootstrap: install Docker CE, yq; open firewall ports
810858
│ ├── clean-node-data.yml # Clean node data directories
811859
│ ├── generate-genesis.yml # Generate genesis files
812860
│ ├── copy-genesis.yml # Copy genesis files to remote hosts
@@ -846,13 +894,13 @@ The command runs `ansible/playbooks/prepare.yml` against all remote hosts in the
846894

847895
**Firewall rules opened (via `ufw`):**
848896

849-
Each host's ports are read directly from `validator-config.yaml`, so only the ports actually configured for that node are opened:
897+
Ports are read from the active validator config (the `--subnets`-expanded file when `--subnets N` is used, or `validator-config.yaml` otherwise). Entries are matched by IP address, so all N subnet nodes on a server are found and all their ports are opened:
850898

851899
| Port | Protocol | Source |
852900
|---|---|---|
853-
| `quicPort` | UDP | Per-node — QUIC/P2P transport (e.g. 9001) |
854-
| `metricsPort` | TCP | Per-node — Prometheus scrape endpoint (e.g. 9095) |
855-
| `apiPort` / `httpPort` | TCP | Per-node — REST API (e.g. 5055) |
901+
| `quicPort` … `quicPort+N-1` | UDP | Per-node — QUIC/P2P transport (e.g. 9001–9003 for N=3) |
902+
| `metricsPort` … `metricsPort+N-1` | TCP | Per-node — Prometheus scrape endpoint |
903+
| `apiPort`/`httpPort` … `+N-1` | TCP | Per-node — REST API |
856904
| 9090 | TCP | Observability — Prometheus |
857905
| 9080 | TCP | Observability — Promtail |
858906
| 9098 | TCP | Observability — cAdvisor |

ansible-devnet/genesis/validator-config.yaml

Lines changed: 21 additions & 9 deletions
@@ -54,7 +54,7 @@ validators:
5454
ip: "65.109.131.177"
5555
quic: 9001
5656
metricsPort: 9095
57-
httpPort: 5055
57+
apiPort: 5055
5858
isAggregator: false
5959
count: 1
6060

@@ -97,16 +97,28 @@ validators:
9797
isAggregator: false
9898
count: 1
9999

100-
# - name: "gean_0"
101-
# enrFields:
102-
# ip: "204.168.134.201"
103-
# quic: 9001
104-
# metricsPort: 9095
105-
# apiPort: 5055
106-
# isAggregator: false
107-
# count: 1
100+
- name: "gean_0"
101+
privkey: "df008e968231c25c3938d80fee9bcc93b4b9711312cf471c1b6f77e67ad68d08"
102+
enrFields:
103+
ip: "204.168.134.201"
104+
quic: 9001
105+
metricsPort: 9095
106+
apiPort: 5055
107+
isAggregator: false
108+
count: 1
109+
110+
- name: "nlean_0"
111+
privkey: "d94e3dc35e320440c891b66bd82d1aaf2079364162815b32c2633ecae009c84c"
112+
enrFields:
113+
ip: "95.216.164.165"
114+
quic: 9001
115+
metricsPort: 9095
116+
apiPort: 5055
117+
isAggregator: false
118+
count: 1
108119

109120
# - name: "lean_node_0"
121+
# privkey: "d94e3dc35e320440c891b66bd82d1aaf2079364162815b32c2633ecae009c84c"
110122
# enrFields:
111123
# ip: "95.217.19.42"
112124
# quic: 9001

ansible/playbooks/copy-genesis.yml

Lines changed: 17 additions & 6 deletions
@@ -153,20 +153,31 @@
153153
loop_control:
154154
label: "{{ item }}.key"
155155

156+
- name: Resolve hash-sig key files for this node
157+
vars:
158+
_av: "{{ lookup('file', genesis_dir + '/annotated_validators.yaml') | from_yaml }}"
159+
_assignments: "{{ _av[inventory_hostname] | default([]) }}"
160+
_sk_files: "{{ _assignments | map(attribute='privkey_file') | list }}"
161+
_pk_files: "{{ _sk_files | map('regex_replace', '_sk\\.ssz$', '_pk.ssz') | list }}"
162+
set_fact:
163+
node_hash_sig_files: "{{ _sk_files + _pk_files }}"
164+
when: hash_sig_keys_stat.stat.exists
165+
156166
- name: Create hash-sig-keys directory on remote
157167
file:
158168
path: "{{ actual_remote_genesis_dir }}/hash-sig-keys"
159169
state: directory
160170
mode: '0755'
161-
when: hash_sig_keys_stat.stat.exists
171+
when: hash_sig_keys_stat.stat.exists and (node_hash_sig_files | default([]) | length > 0)
162172

163-
- name: Copy hash-sig-keys directory to remote host
173+
- name: Copy hash-sig key files for this node only
164174
copy:
165-
src: "{{ genesis_dir }}/hash-sig-keys/"
166-
dest: "{{ actual_remote_genesis_dir }}/hash-sig-keys/"
167-
mode: '0644'
175+
src: "{{ genesis_dir }}/hash-sig-keys/{{ item }}"
176+
dest: "{{ actual_remote_genesis_dir }}/hash-sig-keys/{{ item }}"
177+
mode: '0600'
168178
force: yes
169-
when: hash_sig_keys_stat.stat.exists
179+
loop: "{{ node_hash_sig_files | default([]) }}"
180+
when: hash_sig_keys_stat.stat.exists and (node_hash_sig_files | default([]) | length > 0)
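
The key-file resolution performed by the `set_fact` task above can be mirrored in plain Python to make the filter chain easier to follow. This is a sketch under assumptions: the function name `node_hash_sig_files` is borrowed from the fact name for readability, and `annotated` is assumed to be the parsed `annotated_validators.yaml`, a mapping from node name to a list of entries that each carry a `privkey_file` key.

```python
import re


def node_hash_sig_files(annotated, node_name):
    """List the sk/pk files that belong to one node (illustrative only)."""
    # Equivalent of: _av[inventory_hostname] | default([])
    assignments = annotated.get(node_name, [])
    # Equivalent of: map(attribute='privkey_file') | list
    sk_files = [entry["privkey_file"] for entry in assignments]
    # Equivalent of: map('regex_replace', '_sk\\.ssz$', '_pk.ssz') | list
    pk_files = [re.sub(r"_sk\.ssz$", "_pk.ssz", name) for name in sk_files]
    return sk_files + pk_files
```

A node with no entry in the mapping resolves to an empty list, which is why the directory-creation and copy tasks are both guarded by the `length > 0` condition.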
170181

171182
- name: List files on remote genesis directory
172183
find:

ansible/playbooks/deploy-nodes.yml

Lines changed: 27 additions & 8 deletions
@@ -8,7 +8,7 @@
88
# - node key files (*.key)
99
# - config.yaml, validators.yaml, nodes.yaml
1010
# - genesis.ssz, genesis.json
11-
# - hash-sig-keys/ directory (if exists, for qlean nodes)
11+
# - hash-sig-keys/ directory (if exists): only the sk/pk files for this node's validators
1212

1313
- name: Parse and validate node names
1414
hosts: localhost
@@ -122,7 +122,10 @@
122122

123123
- name: Sync validator-config.yaml to remote host
124124
copy:
125-
src: "{{ local_genesis_dir }}/validator-config.yaml"
125+
# Use the expanded subnet config when --subnets was specified; fall back
126+
# to the standard validator-config.yaml otherwise. The destination is
127+
# always validator-config.yaml so client roles don't need to change.
128+
src: "{{ local_genesis_dir }}/{{ validator_config_basename | default('validator-config.yaml') }}"
126129
dest: "{{ genesis_dir }}/validator-config.yaml"
127130
mode: '0644'
128131
force: yes
@@ -165,23 +168,37 @@
165168
- deploy
166169
- sync
167170

171+
- name: Resolve hash-sig key files for this node
172+
vars:
173+
_av: "{{ lookup('file', local_genesis_dir + '/annotated_validators.yaml') | from_yaml }}"
174+
_assignments: "{{ _av[node_name] | default([]) }}"
175+
_sk_files: "{{ _assignments | map(attribute='privkey_file') | list }}"
176+
_pk_files: "{{ _sk_files | map('regex_replace', '_sk\\.ssz$', '_pk.ssz') | list }}"
177+
set_fact:
178+
node_hash_sig_files: "{{ _sk_files + _pk_files }}"
179+
when: hash_sig_keys_local.stat.exists
180+
tags:
181+
- deploy
182+
- sync
183+
168184
- name: Create hash-sig-keys directory on remote
169185
file:
170186
path: "{{ genesis_dir }}/hash-sig-keys"
171187
state: directory
172188
mode: '0755'
173-
when: hash_sig_keys_local.stat.exists
189+
when: hash_sig_keys_local.stat.exists and (node_hash_sig_files | default([]) | length > 0)
174190
tags:
175191
- deploy
176192
- sync
177193

178-
- name: Sync hash-sig-keys directory (for qlean nodes)
194+
- name: Copy hash-sig key files for this node only
179195
copy:
180-
src: "{{ local_genesis_dir }}/hash-sig-keys/"
181-
dest: "{{ genesis_dir }}/hash-sig-keys/"
182-
mode: '0644'
196+
src: "{{ local_genesis_dir }}/hash-sig-keys/{{ item }}"
197+
dest: "{{ genesis_dir }}/hash-sig-keys/{{ item }}"
198+
mode: '0600'
183199
force: yes
184-
when: hash_sig_keys_local.stat.exists
200+
loop: "{{ node_hash_sig_files | default([]) }}"
201+
when: hash_sig_keys_local.stat.exists and (node_hash_sig_files | default([]) | length > 0)
185202
tags:
186203
- deploy
187204
- sync
@@ -199,6 +216,8 @@
199216
- lighthouse
200217
- grandine
201218
- ethlambda
219+
- gean
220+
- nlean
202221
- peam
203222
- deploy
204223

ansible/playbooks/helpers/deploy-single-node.yml

Lines changed: 18 additions & 2 deletions
@@ -86,6 +86,22 @@
8686
- ethlambda
8787
- deploy
8888

89+
- name: Deploy Gean node
90+
include_role:
91+
name: gean
92+
when: client_type == "gean"
93+
tags:
94+
- gean
95+
- deploy
96+
97+
- name: Deploy Nlean node
98+
include_role:
99+
name: nlean
100+
when: client_type == "nlean"
101+
tags:
102+
- nlean
103+
- deploy
104+
89105
- name: Deploy Peam node
90106
include_role:
91107
name: peam
@@ -96,5 +112,5 @@
96112

97113
- name: Fail if unknown client type
98114
fail:
99-
msg: "Unknown client type '{{ client_type }}' for node '{{ node_name }}'. Expected: zeam, ream, qlean, lantern, lighthouse, grandine, ethlambda or peam"
100-
when: client_type not in ["zeam", "ream", "qlean", "lantern", "lighthouse", "grandine", "ethlambda", "peam"]
115+
msg: "Unknown client type '{{ client_type }}' for node '{{ node_name }}'. Expected: zeam, ream, qlean, lantern, lighthouse, grandine, ethlambda, gean, nlean or peam"
116+
when: client_type not in ["zeam", "ream", "qlean", "lantern", "lighthouse", "grandine", "ethlambda", "gean", "nlean", "peam"]
