
Conversation

@SupernaviX (Collaborator) commented Nov 5, 2025

Fixes #204

Implements a peer-network-interface module. This module runs the ChainSync and BlockFetch protocols against a small set of explicitly-configured peers. It follows the fork defined by the first peer in the list, but will switch to other forks if that peer disconnects.
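
A minimal sketch of that failover rule, with hypothetical names (this is not the module's actual API): peers stay in configuration order, and the module follows the fork of the first one that is still connected.

```rust
/// Hypothetical sketch of the failover rule: peers are kept in
/// configuration order, and we follow the chain of the first one
/// that still has a live connection.
#[derive(Debug)]
struct Peer {
    address: String,
    connected: bool,
}

/// Returns the peer whose fork we should follow, if any peer is up.
fn preferred_peer(peers: &[Peer]) -> Option<&Peer> {
    peers.iter().find(|p| p.connected)
}

fn main() {
    let peers = vec![
        Peer { address: "127.0.0.1:3001".into(), connected: false }, // disconnected
        Peer { address: "127.0.0.1:3002".into(), connected: true },  // takes over
        Peer { address: "127.0.0.1:3003".into(), connected: true },
    ];
    // With the first peer down, we switch to the second peer's fork.
    println!("{:?}", preferred_peer(&peers).map(|p| &p.address));
}
```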

The testing strategy combined unit tests of the ChainState struct with manual testing against three preview nodes on my laptop, which I randomly killed and revived.

Includes a small architecture diagram: https://github.com/input-output-hk/acropolis/blob/sg/peer-network-interface/modules/peer_network_interface/NOTES.md

Manual testing

To test it, you can run the omnibus process using the "local" configuration:

cd processes/omnibus
cargo run -- --config omnibus-local.toml

That configuration tries to connect to three Cardano nodes running against the preview environment, on ports 3001, 3002, and 3003. To create such a setup, you can use this gist: https://gist.github.com/SupernaviX/16627499dae71092abeac96434e96817
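
For orientation, the peer settings in omnibus-local.toml presumably look something like the sketch below. Only sync-point = "origin" is confirmed elsewhere in this thread; the section and key names here are assumptions, not the module's real schema:

```toml
# Hypothetical sketch of the peer settings in omnibus-local.toml.
# Section and key names are assumptions; only sync-point = "origin"
# is confirmed elsewhere in this thread.
[module.peer-network-interface]
# The three local preview nodes, in preference order: the module follows
# the first peer's fork and fails over to the others if it disconnects.
peers = [
    "127.0.0.1:3001",
    "127.0.0.1:3002",
    "127.0.0.1:3003",
]
# Where to begin syncing; "origin" replays the chain from genesis.
sync-point = "origin"
```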


@buddhisthead (Collaborator) commented:

I'd like to run this on my computer, if that's possible. Can you give some detailed instructions in the PR for how to do that, please?

@SupernaviX (Collaborator, Author) commented:

> I'd like to run this on my computer, if that's possible. Can you give some detailed instructions in the PR for how to do that, please?

Sorry for the delay, I missed this yesterday. I just attached setup instructions.

@buddhisthead (Collaborator) commented:

> I'd like to run this on my computer, if that's possible. Can you give some detailed instructions in the PR for how to do that, please?
>
> Sorry for the delay, I missed this yesterday. I just attached setup instructions.

Thanks. And how will I know that it's working? That it prints certain log messages? You mention killing and reviving the other nodes. Can you briefly describe that sequence and what I should expect?

@SupernaviX (Collaborator, Author) commented Nov 7, 2025

> I'd like to run this on my computer, if that's possible. Can you give some detailed instructions in the PR for how to do that, please?
>
> Sorry for the delay, I missed this yesterday. I just attached setup instructions.
>
> Thanks. And how will I know that it's working? That it prints certain log messages? You mention killing and reviving the other nodes. Can you briefly describe that sequence and what I should expect?

Yep! The main method I used to test failover was starting from the origin (by setting sync-point = "origin" in omnibus-local.toml), then using docker stop and docker start to stop and start the three Cardano nodes while it synced. The module emits a log line for every 1000 messages it produces, which happens pretty rapidly when syncing from origin. It also emits log lines when a node is disconnected (and it retries the connection every 5 seconds). So what you see in the logs with this setup (see the sketch after this list) is:

  • While all nodes are running, it just keeps logging that progress is being made every few seconds ("Published block 5999").
  • When one or two nodes stop running, it keeps logging that progress is made, and also logs that there are connection issues every 5 seconds or so.
  • When all nodes are stopped, it logs that there are connection issues and does not log that progress is being made.
  • When a previously-stopped node starts again, it stops logging connection issues (for that node) and resumes publishing blocks if it had stopped.
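
A minimal sketch of that logging cadence, under hypothetical names (the real module speaks the ChainSync/BlockFetch mini-protocols rather than raw TCP, so this only mirrors the logging and retry timing):

```rust
use std::time::Duration;

/// Hypothetical publisher that logs one progress line per 1000 messages,
/// so an origin sync stays readable.
struct Publisher {
    published: u64,
}

impl Publisher {
    fn publish(&mut self, block_number: u64) {
        self.published += 1;
        if self.published % 1000 == 0 {
            println!("Published block {block_number}");
        }
    }
}

#[tokio::main]
async fn main() {
    let mut publisher = Publisher { published: 0 };
    for block in 0..5000 {
        publisher.publish(block); // logs at blocks 999, 1999, ... in this toy run
    }

    // Reconnect loop: on failure, log the issue and retry every 5 seconds.
    let address = "127.0.0.1:3001";
    loop {
        match tokio::net::TcpStream::connect(address).await {
            Ok(_conn) => break, // hand the live connection back to the manager
            Err(e) => {
                eprintln!("connection to {address} failed: {e}; retrying in 5s");
                tokio::time::sleep(Duration::from_secs(5)).await;
            }
        }
    }
}
```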

There are other points it can start from too (sketched as an enum after this list):

  • from the last block in a newly-restored snapshot
  • from a local cache which it writes as it syncs (very slow and unreliable until #341, "Change format of upstream cache")
  • from the tip (pretty reliable, but blocks only come every ~20 seconds so the logs don't show many signs of life)
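
A rough sketch of those start points modeled as an enum; the actual types in configuration.rs may look different:

```rust
/// Hypothetical model of the start-point options listed above; the real
/// configuration.rs may represent these differently.
#[derive(Debug, Clone)]
enum SyncPoint {
    /// Replay the whole chain from genesis.
    Origin,
    /// Resume from the last block of a newly-restored snapshot.
    Snapshot,
    /// Resume from the local upstream cache written during earlier syncs.
    Cache,
    /// Start at the current tip; new blocks arrive roughly every 20 seconds.
    Tip,
}

fn main() {
    let point = SyncPoint::Origin;
    println!("starting sync from {point:?}");
}
```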

@buddhisthead (Collaborator) commented Nov 11, 2025

thread 'main' panicked at /Users/chris/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/caryatid_process-0.12.1/src/process.rs:125:30:
called `Result::unwrap()` on an `Err` value: Pointer cache directory 'cache' does not exist.

I ran this before I started any Cardano nodes. It probably shouldn't panic, but rather print an error message and exit cleanly.
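
Something like this sketch would do it (hypothetical code, not from caryatid_process): check for the directory up front, report the problem, and exit with a failure code instead of unwrapping.

```rust
use std::path::Path;
use std::process::ExitCode;

/// Sketch of a graceful startup check: report the missing pointer cache
/// directory and exit cleanly, instead of unwrap()-ing the error.
fn main() -> ExitCode {
    let cache_dir = Path::new("cache");
    if !cache_dir.is_dir() {
        eprintln!(
            "Pointer cache directory '{}' does not exist; create it or fix the config",
            cache_dir.display()
        );
        return ExitCode::FAILURE;
    }
    ExitCode::SUCCESS
}
```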

@buddhisthead (Collaborator) commented:

Sorry, but I'm stuck on setting up the environment for this test. I got as far as trying to start up the Cardano containers. Your gist hints at nice things but doesn't explain how to use them. I realize this is sort of common knowledge for most Cardano developers, so perhaps point to a document somewhere, or make a nice gist that explains it all and then just reference that in the future for testing.

@buddhisthead (Collaborator) commented:

Okay, I figured out that I have to run restore.sh before startup.sh to create the configuration directories. But the db restore fails because there is no aarch64 build for Mithril. It seems to start up anyway, but I imagine the Mithril chain fetch won't work.

@buddhisthead (Collaborator) commented:

I was able to start the Cardano images, but when I run the omnibus process, I still get:

thread 'main' panicked at /home/parallels/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/caryatid_process-0.12.1/src/process.rs:125:30:
called `Result::unwrap()` on an `Err` value: Pointer cache directory 'cache' does not exist.

So I need some help knowing what creates the cache directory.

Copilot finished reviewing on behalf of buddhisthead November 11, 2025 04:21
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR implements a new peer-network-interface module that provides a more robust alternative to the existing upstream chain fetcher. The module uses the ChainSync and BlockFetch protocols to fetch blocks from multiple configured upstream peers, following one preferred chain while supporting graceful failover to other peers during network issues.

Key changes:

  • Introduces the PeerNetworkInterface module with an event-driven architecture supporting multiple upstream peers (roughly sketched after this list)
  • Refactors UpstreamCache into common for reuse across both upstream chain fetcher implementations
  • Adds support for the preview network in genesis bootstrapper
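
A rough, hypothetical sketch of that event-driven shape, with one channel fanning peer events into a single state-updating task (none of these names come from the PR):

```rust
use tokio::sync::mpsc;

/// Hypothetical events flowing into the network manager; these names are
/// illustrative, not the PR's actual types.
enum NetworkEvent {
    BlockAnnounced { peer: usize, slot: u64 },
    PeerDisconnected { peer: usize },
}

/// Sketch of an event-driven manager: every peer connection sends events
/// into one channel, and a single task updates chain state in order.
async fn run(mut events: mpsc::Receiver<NetworkEvent>) {
    while let Some(event) = events.recv().await {
        match event {
            NetworkEvent::BlockAnnounced { peer, slot } => {
                println!("peer {peer} announced a block at slot {slot}");
            }
            NetworkEvent::PeerDisconnected { peer } => {
                println!("peer {peer} disconnected; following the next peer");
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(64);
    tx.send(NetworkEvent::BlockAnnounced { peer: 0, slot: 42 }).await.unwrap();
    tx.send(NetworkEvent::PeerDisconnected { peer: 0 }).await.unwrap();
    drop(tx); // close the channel so run() exits
    run(rx).await;
}
```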

Reviewed Changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| modules/peer_network_interface/src/peer_network_interface.rs | Main module implementation handling initialization, cache management, and block publishing |
| modules/peer_network_interface/src/network.rs | NetworkManager coordinating multiple peer connections and chain state |
| modules/peer_network_interface/src/chain_state.rs | ChainState tracking block announcements, rollbacks, and publishing queue across multiple peers |
| modules/peer_network_interface/src/connection.rs | PeerConnection managing individual peer connections using ChainSync and BlockFetch protocols |
| modules/peer_network_interface/src/configuration.rs | Configuration loading and sync point options |
| modules/peer_network_interface/config.default.toml | Default configuration with mainnet backbone nodes |
| modules/peer_network_interface/Cargo.toml | Package definition and dependencies |
| modules/peer_network_interface/README.md | Module documentation and usage guide |
| modules/peer_network_interface/NOTES.md | Architecture diagram and design notes |
| common/src/upstream_cache.rs | Refactored cache implementation moved from upstream_chain_fetcher for reuse |
| common/src/lib.rs | Export upstream_cache module |
| common/src/genesis_values.rs | Added kebab-case serde attribute for configuration deserialization |
| modules/upstream_chain_fetcher/src/upstream_chain_fetcher.rs | Updated to use refactored UpstreamCache from common |
| modules/upstream_chain_fetcher/src/body_fetcher.rs | Updated imports for UpstreamCache |
| modules/genesis_bootstrapper/src/genesis_bootstrapper.rs | Added preview network genesis support |
| modules/genesis_bootstrapper/build.rs | Download preview network genesis files |
| processes/omnibus/src/main.rs | Register PeerNetworkInterface module |
| processes/omnibus/Cargo.toml | Add peer_network_interface dependency |
| processes/omnibus/omnibus-local.toml | Local configuration for testing with preview network |
| processes/omnibus/.gitignore | Ignore upstream-cache directory |
| Cargo.toml | Add peer_network_interface to workspace members |
| Cargo.lock | Lock file updates for new module |


@SupernaviX (Collaborator, Author) commented:

> I was able to start the Cardano images, but when I run the omnibus process, I still get:
>
> thread 'main' panicked at /home/parallels/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/caryatid_process-0.12.1/src/process.rs:125:30:
> called `Result::unwrap()` on an `Err` value: Pointer cache directory 'cache' does not exist.
>
> So I need some help knowing what creates the cache directory.

It looked like the stake_delta_filter module was supposed to do this, but wasn't. It does now.
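
A sketch of that kind of fix, creating the pointer cache directory on startup if it is missing (illustrative only, not the actual stake_delta_filter code):

```rust
use std::fs;

/// Create the pointer cache directory on startup if it is missing,
/// instead of failing later when something tries to use it.
fn main() -> std::io::Result<()> {
    fs::create_dir_all("cache")?; // no-op if the directory already exists
    Ok(())
}
```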
