
Conversation

@SupernaviX (Collaborator) commented Nov 5, 2025

Fixes #204

Implements a peer-network-interface module. This module runs the ChainSync and BlockFetch protocols against a small set of explicitly-configured peers. It follows the fork defined by the first peer in the list, but will switch to other forks if that peer disconnects.
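
A minimal sketch of that failover rule, with hypothetical names (this is not the module's actual API): peers stay in configuration order, and the module follows the fork of the first one that is still connected.

```rust
/// Hypothetical sketch of the failover rule: peers are kept in
/// configuration order, and we follow the chain of the first one
/// that still has a live connection.
#[derive(Debug)]
struct Peer {
    address: String,
    connected: bool,
}

/// Returns the peer whose fork we should follow, if any peer is up.
fn preferred_peer(peers: &[Peer]) -> Option<&Peer> {
    peers.iter().find(|p| p.connected)
}

fn main() {
    let peers = vec![
        Peer { address: "127.0.0.1:3001".into(), connected: false }, // disconnected
        Peer { address: "127.0.0.1:3002".into(), connected: true },  // takes over
        Peer { address: "127.0.0.1:3003".into(), connected: true },
    ];
    // With the first peer down, we switch to the second peer's fork.
    println!("{:?}", preferred_peer(&peers).map(|p| &p.address));
}
```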

The testing strategy combined unit tests of the ChainState struct with manual testing against three preview nodes on my laptop, which I randomly killed and revived.

Includes a small architecture diagram: https://github.com/input-output-hk/acropolis/blob/sg/peer-network-interface/modules/peer_network_interface/NOTES.md

Manual testing

To test it, you can run the omnibus process using the "local" configuration:

cd processes/omnibus
cargo run -- --config omnibus-local.toml

That configuration tries to connect to three Cardano nodes running against the preview environment, on ports 3001, 3002, and 3003. To create such a setup, you can use this gist: https://gist.github.com/SupernaviX/16627499dae71092abeac96434e96817
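
For orientation, the peer settings in omnibus-local.toml presumably look something like the sketch below. Only sync-point = "origin" is confirmed elsewhere in this thread; the section and key names here are assumptions, not the module's real schema:

```toml
# Hypothetical sketch of the peer settings in omnibus-local.toml.
# Section and key names are assumptions; only sync-point = "origin"
# is confirmed elsewhere in this thread.
[module.peer-network-interface]
# The three local preview nodes, in preference order: the module follows
# the first peer's fork and fails over to the others if it disconnects.
peers = [
    "127.0.0.1:3001",
    "127.0.0.1:3002",
    "127.0.0.1:3003",
]
# Where to begin syncing; "origin" replays the chain from genesis.
sync-point = "origin"
```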


@buddhisthead (Collaborator) commented:

I'd like to run this on my computer, if that's possible. Can you give some detailed instructions in the PR for how to do that, please?

@SupernaviX (Collaborator, Author) commented:

> I'd like to run this on my computer, if that's possible. Can you give some detailed instructions in the PR for how to do that, please?

Sorry for the delay, I missed this yesterday. I just attached setup instructions.

@buddhisthead (Collaborator) commented:

> I'd like to run this on my computer, if that's possible. Can you give some detailed instructions in the PR for how to do that, please?
>
> Sorry for the delay, I missed this yesterday. I just attached setup instructions.

Thanks. And how will I know that it's working? That it prints certain log messages? You mention killing and reviving the other nodes. Can you briefly describe that sequence and what I should expect?

@SupernaviX (Collaborator, Author) commented Nov 7, 2025

> I'd like to run this on my computer, if that's possible. Can you give some detailed instructions in the PR for how to do that, please?
>
> Sorry for the delay, I missed this yesterday. I just attached setup instructions.
>
> Thanks. And how will I know that it's working? That it prints certain log messages? You mention killing and reviving the other nodes. Can you briefly describe that sequence and what I should expect?

Yep! The main method I used to test failover was starting from the origin (by setting sync-point = "origin" in omnibus-local.toml), then using docker stop and docker start to stop and start the three Cardano nodes while it synced. The module emits a log line for every 1000 messages it produces, which happens pretty rapidly when syncing from origin. It also emits log lines when a node is disconnected (and it retries the connection every 5 seconds). So what you see in the logs with this setup (see the sketch after this list) is:

  • While all nodes are running, it just keeps logging that progress is being made every few seconds ("Published block 5999").
  • When one or two nodes stop running, it keeps logging that progress is made, and also logs that there are connection issues every 5 seconds or so.
  • When all nodes are stopped, it logs that there are connection issues and does not log that progress is being made.
  • When a previously-stopped node starts again, it stops logging connection issues (for that node) and resumes publishing blocks if it had stopped.
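
A minimal sketch of that logging cadence, under hypothetical names (the real module speaks the ChainSync/BlockFetch mini-protocols rather than raw TCP, so this only mirrors the logging and retry timing):

```rust
use std::time::Duration;

/// Hypothetical publisher that logs one progress line per 1000 messages,
/// so an origin sync stays readable.
struct Publisher {
    published: u64,
}

impl Publisher {
    fn publish(&mut self, block_number: u64) {
        self.published += 1;
        if self.published % 1000 == 0 {
            println!("Published block {block_number}");
        }
    }
}

#[tokio::main]
async fn main() {
    let mut publisher = Publisher { published: 0 };
    for block in 0..5000 {
        publisher.publish(block); // logs at blocks 999, 1999, ... in this toy run
    }

    // Reconnect loop: on failure, log the issue and retry every 5 seconds.
    let address = "127.0.0.1:3001";
    loop {
        match tokio::net::TcpStream::connect(address).await {
            Ok(_conn) => break, // hand the live connection back to the manager
            Err(e) => {
                eprintln!("connection to {address} failed: {e}; retrying in 5s");
                tokio::time::sleep(Duration::from_secs(5)).await;
            }
        }
    }
}
```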

There are other points it can start from too (sketched as an enum after this list):

  • from the last block in a newly-restored snapshot
  • from a local cache which it writes as it syncs (very slow and unreliable until #341, "Change format of upstream cache")
  • from the tip (pretty reliable, but blocks only come every ~20 seconds so the logs don't show many signs of life)
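
A rough sketch of those start points modeled as an enum; the actual types in configuration.rs may look different:

```rust
/// Hypothetical model of the start-point options listed above; the real
/// configuration.rs may represent these differently.
#[derive(Debug, Clone)]
enum SyncPoint {
    /// Replay the whole chain from genesis.
    Origin,
    /// Resume from the last block of a newly-restored snapshot.
    Snapshot,
    /// Resume from the local upstream cache written during earlier syncs.
    Cache,
    /// Start at the current tip; new blocks arrive roughly every 20 seconds.
    Tip,
}

fn main() {
    let point = SyncPoint::Origin;
    println!("starting sync from {point:?}");
}
```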

@buddhisthead (Collaborator) commented Nov 11, 2025

thread 'main' panicked at /Users/chris/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/caryatid_process-0.12.1/src/process.rs:125:30:
called `Result::unwrap()` on an `Err` value: Pointer cache directory 'cache' does not exist.

I ran this before I started any Cardano nodes. It probably shouldn't panic, but rather print an error message and exit cleanly.
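
Something like this sketch would do it (hypothetical code, not from caryatid_process): check for the directory up front, report the problem, and exit with a failure code instead of unwrapping.

```rust
use std::path::Path;
use std::process::ExitCode;

/// Sketch of a graceful startup check: report the missing pointer cache
/// directory and exit cleanly, instead of unwrap()-ing the error.
fn main() -> ExitCode {
    let cache_dir = Path::new("cache");
    if !cache_dir.is_dir() {
        eprintln!(
            "Pointer cache directory '{}' does not exist; create it or fix the config",
            cache_dir.display()
        );
        return ExitCode::FAILURE;
    }
    ExitCode::SUCCESS
}
```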

@buddhisthead (Collaborator) commented:

Sorry, but I'm stuck on setting up the environment for this test. I got as far as trying to start up the Cardano containers. Your gist hints at nice things but doesn't explain how to use them. I realize this is sort of common knowledge for most Cardano developers, so perhaps point to a document somewhere, or make a nice gist that explains it all and then just reference that in the future for testing.

@buddhisthead (Collaborator) commented:

Okay, I figured out that I have to run restore.sh before startup.sh to create the configuration directories. But the db restore fails because there is no aarch64 build for Mithril. It seems to start up anyway, but I imagine the Mithril chain fetch won't work.

@buddhisthead (Collaborator) commented:

I was able to start the Cardano images, but when I run the omnibus process, I still get:

thread 'main' panicked at /home/parallels/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/caryatid_process-0.12.1/src/process.rs:125:30:
called `Result::unwrap()` on an `Err` value: Pointer cache directory 'cache' does not exist.

So I need some help knowing what creates the cache directory.

Copilot finished reviewing on behalf of buddhisthead November 11, 2025 04:21
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR implements a new peer-network-interface module that provides a more robust alternative to the existing upstream chain fetcher. The module uses the ChainSync and BlockFetch protocols to fetch blocks from multiple configured upstream peers, following one preferred chain while supporting graceful failover to other peers during network issues.

Key changes:

  • Introduces the PeerNetworkInterface module with an event-driven architecture supporting multiple upstream peers (roughly sketched after this list)
  • Refactors UpstreamCache into common for reuse across both upstream chain fetcher implementations
  • Adds support for the preview network in genesis bootstrapper
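
A rough, hypothetical sketch of that event-driven shape, with one channel fanning peer events into a single state-updating task (none of these names come from the PR):

```rust
use tokio::sync::mpsc;

/// Hypothetical events flowing into the network manager; these names are
/// illustrative, not the PR's actual types.
enum NetworkEvent {
    BlockAnnounced { peer: usize, slot: u64 },
    PeerDisconnected { peer: usize },
}

/// Sketch of an event-driven manager: every peer connection sends events
/// into one channel, and a single task updates chain state in order.
async fn run(mut events: mpsc::Receiver<NetworkEvent>) {
    while let Some(event) = events.recv().await {
        match event {
            NetworkEvent::BlockAnnounced { peer, slot } => {
                println!("peer {peer} announced a block at slot {slot}");
            }
            NetworkEvent::PeerDisconnected { peer } => {
                println!("peer {peer} disconnected; following the next peer");
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(64);
    tx.send(NetworkEvent::BlockAnnounced { peer: 0, slot: 42 }).await.unwrap();
    tx.send(NetworkEvent::PeerDisconnected { peer: 0 }).await.unwrap();
    drop(tx); // close the channel so run() exits
    run(rx).await;
}
```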

Reviewed Changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| modules/peer_network_interface/src/peer_network_interface.rs | Main module implementation handling initialization, cache management, and block publishing |
| modules/peer_network_interface/src/network.rs | NetworkManager coordinating multiple peer connections and chain state |
| modules/peer_network_interface/src/chain_state.rs | ChainState tracking block announcements, rollbacks, and publishing queue across multiple peers |
| modules/peer_network_interface/src/connection.rs | PeerConnection managing individual peer connections using ChainSync and BlockFetch protocols |
| modules/peer_network_interface/src/configuration.rs | Configuration loading and sync point options |
| modules/peer_network_interface/config.default.toml | Default configuration with mainnet backbone nodes |
| modules/peer_network_interface/Cargo.toml | Package definition and dependencies |
| modules/peer_network_interface/README.md | Module documentation and usage guide |
| modules/peer_network_interface/NOTES.md | Architecture diagram and design notes |
| common/src/upstream_cache.rs | Refactored cache implementation moved from upstream_chain_fetcher for reuse |
| common/src/lib.rs | Export upstream_cache module |
| common/src/genesis_values.rs | Added kebab-case serde attribute for configuration deserialization |
| modules/upstream_chain_fetcher/src/upstream_chain_fetcher.rs | Updated to use refactored UpstreamCache from common |
| modules/upstream_chain_fetcher/src/body_fetcher.rs | Updated imports for UpstreamCache |
| modules/genesis_bootstrapper/src/genesis_bootstrapper.rs | Added preview network genesis support |
| modules/genesis_bootstrapper/build.rs | Download preview network genesis files |
| processes/omnibus/src/main.rs | Register PeerNetworkInterface module |
| processes/omnibus/Cargo.toml | Add peer_network_interface dependency |
| processes/omnibus/omnibus-local.toml | Local configuration for testing with preview network |
| processes/omnibus/.gitignore | Ignore upstream-cache directory |
| Cargo.toml | Add peer_network_interface to workspace members |
| Cargo.lock | Lock file updates for new module |


@SupernaviX (Collaborator, Author) commented:

> I was able to start the Cardano images, but when I run the omnibus process, I still get:
>
> thread 'main' panicked at /home/parallels/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/caryatid_process-0.12.1/src/process.rs:125:30:
> called `Result::unwrap()` on an `Err` value: Pointer cache directory 'cache' does not exist.
>
> So I need some help knowing what creates the cache directory.

It looked like the stake_delta_filter module was supposed to do this, but wasn't. It does now.
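
A sketch of that kind of fix, creating the pointer cache directory on startup if it is missing (illustrative only, not the actual stake_delta_filter code):

```rust
use std::fs;

/// Create the pointer cache directory on startup if it is missing,
/// instead of failing later when something tries to use it.
fn main() -> std::io::Result<()> {
    fs::create_dir_all("cache")?; // no-op if the directory already exists
    Ok(())
}
```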
