@ch1bo ch1bo commented Oct 27, 2025

Adds a chapter that sets the context, explains what is important, and provides plenty of links to past and related work on the network/consensus design of Cardano.

Tagged @coot @pagio and @nfrisby as possible reviewers - you are authors on some of the linked documents and maybe have an opinion here.

@ch1bo ch1bo requested review from coot, nfrisby and pagio October 27, 2025 10:57
@ch1bo ch1bo changed the base branch from main to ch1bo/dependencies-interactions October 27, 2025 10:58
@ch1bo ch1bo changed the title Leios design: introduction and overview chapter Leios design: Write introduction and overview chapter Oct 27, 2025
@ch1bo ch1bo linked an issue Oct 27, 2025 that may be closed by this pull request
12 tasks
> TODO: (re-)introduce the main protocol flow of Leios?
As was the case for the [Praos variant of Ouroboros](https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.5.1), the specification embodied in the published and peer-reviewed [research paper for Ouroboros Leios](https://eprint.iacr.org/2025/1115.pdf) was not intended to be directly implementable. Initial research and development studies confirmed this expectation, identifying several unsolved problems with the fully concurrent block production design when considering the concrete Cardano ledger and what consequences this would have (TODO: cite suitable R&D reports, [Tech Report #2](https://github.com/input-output-hk/ouroboros-leios/blob/main/docs/technical-report-2.md#conflicts-ledger-and-incentives)).

The design presented in [CIP-164](https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md), also known as "Linear Leios", addresses these implementation challenges by focusing on the core insight of better utilizing network and computational resources during the necessary and eponymous "calm periods" of the Praos protocol. This approach provides an immediately implementable design that can deliver orders of magnitude higher throughput while preserving the security guarantees that make Cardano valuable.

Linear Leios doesn't utilise the network any better than Praos. It uses bursts of traffic, as Praos does, but with even higher bursts than Praos does - from that perspective, it's strictly worse than Praos. So we need to be a bit more careful here, what about:

Although Linear Leios does not improve the burstiness of the Praos protocol, it will utilise unused bandwidth during long gaps between Praos blocks. For better utilisation of the underlying TCP protocol, a protocol which uses constant high pressure on the network is required (e.g. some form of full Leios).

Member Author

@ch1bo ch1bo Oct 28, 2025

> Linear Leios doesn't utilise the network any better than Praos. It uses bursts of traffic, as Praos does, but with even higher bursts than Praos does - from that perspective, it's strictly worse than Praos

Hm, interesting take that Leios would have higher bursts than Praos. I'd like to challenge this viewpoint. In Leios we communicate five things: ranking blocks, which announce/certify EBs; votes, which lead to certificates; transactions submitted by users; EBs; and any "missing transactions".

A certifying ranking block (~10kB) is about 90% smaller than a full Praos block (90kB); a ranking block that does not certify is equivalent to a Praos block.

Voting on EBs could be considered a burst of network traffic (on the order of ~50kB per round), since votes are created at the same time. However, votes are created by nodes distributed across the network, which should even the load out further than the single-source full block one would expect in Praos?

Which leaves us with transaction submission and diffusion of EBs + missing transactions. How much transaction submission impacts network traffic depends on our understanding of the load scenario (whether traffic is organic or artificial, coming from one or many sources, etc.); it is what it is. Under any given high demand (otherwise we'd be using only Praos), Leios only adds the overhead of EBs (~40 bytes per tx) and the re-submission of a subset of endorsed transactions (the ones which were not diffused originally during submission). This means that in the average case the overhead is minimal, and in the worst case it boils down to the same burstiness as Praos would have at this load!?
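
To make the arithmetic above easier to check, here is a minimal sketch in Python. The size constants are the approximate figures quoted in this comment; `num_txs`, `avg_tx_bytes` and `missing_fraction` are hypothetical inputs chosen for illustration, not measured values.

```python
# Rough size arithmetic for the figures quoted in this comment.
# The constants are the approximate values from the discussion, not measurements.

PRAOS_BLOCK_BYTES = 90_000     # full Praos block (~90kB)
CERTIFYING_RB_BYTES = 10_000   # certifying ranking block (~10kB)
EB_BYTES_PER_TX = 40           # EB overhead per referenced transaction (~40 bytes)


def rb_saving_fraction() -> float:
    """Fraction by which a certifying ranking block is smaller than a Praos block."""
    return 1 - CERTIFYING_RB_BYTES / PRAOS_BLOCK_BYTES


def eb_overhead_bytes(num_txs: int) -> int:
    """Bytes an EB adds when every referenced transaction was already diffused."""
    return num_txs * EB_BYTES_PER_TX


def burst_bytes(num_txs: int, avg_tx_bytes: int, missing_fraction: float) -> float:
    """EB overhead plus the re-fetch of transactions a node has not seen yet.

    missing_fraction is a hypothetical parameter: 0.0 models the average case
    described above, 1.0 models a node that knows none of the transactions.
    """
    if not 0.0 <= missing_fraction <= 1.0:
        raise ValueError("missing_fraction must be between 0 and 1")
    return eb_overhead_bytes(num_txs) + missing_fraction * num_txs * avg_tx_bytes
```

With these numbers the saving of a certifying ranking block comes out at roughly 89%, and the gap between `missing_fraction=0.0` and `missing_fraction=1.0` is the range between the average and the worst case argued about in this thread.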

> For better utilisation of the underlying TCP protocol, a protocol which uses constant high pressure on the network is required (e.g. some form of full Leios)

Isn't this the case already due to transaction submission, no matter how consensus is achieved?


In the happy case, sure, but we should really be focused on improving the worst-case scenario, which, with linear Leios, can lead to 12MB of fresh data being downloaded over a short period, with the TCP window closed due to an idle period.
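
A back-of-the-envelope sketch of why a closed window matters for the 12MB figure, assuming an idealised TCP slow start after idle: the congestion window restarts at 10 segments of 1460 bytes and doubles every round trip, with no loss, no receiver-window limit and no bandwidth cap. These defaults are assumptions for illustration, not properties of the Cardano network stack.

```python
def slow_start_round_trips(
    payload_bytes: int,
    mss_bytes: int = 1460,              # assumed maximum segment size
    initial_window_segments: int = 10,  # assumed restart window after idle
) -> int:
    """Round trips needed to send payload_bytes under idealised slow start.

    The congestion window starts at initial_window_segments * mss_bytes and
    doubles every round trip. Loss, ssthresh, receiver window and link
    capacity are ignored, so the result is optimistic.
    """
    cwnd = initial_window_segments * mss_bytes
    sent = 0
    rounds = 0
    while sent < payload_bytes:
        sent += cwnd   # one full window is delivered per round trip
        cwnd *= 2      # exponential growth during slow start
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(slow_start_round_trips(12_000_000))  # 12MB burst
    print(slow_start_round_trips(90_000))      # one ~90kB Praos block
```

Under these assumptions the 12MB burst needs 10 round trips from a single upstream peer, against 3 for a ~90kB block, so the cost depends heavily on the round-trip time and on how many peers can serve the data in parallel.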

Member Author

@ch1bo ch1bo Oct 28, 2025

I agree with designing for the worst case (that's why I wrote that section in this PR!). The worst case for Leios is not the same as the worst case for Praos though - is it? What is the worst case scenario for EB diffusion?

I see two situations where urgent fetching of big transaction closures is required:

  1. A block is (maliciously) produced that announces a full EB with completely unknown transactions
  2. A block is produced that certifies a full EB with completely unknown transactions

Case 1 only potentially affects high throughput and would only be problematic if a) most stake is acting like this and/or b) it can be caused by a network attacker (e.g. by eclipsing or partitioning the majority of the network).

Only case 2 is on the critical path in Leios and needs to happen within the worst case $\Delta_{EB}$ (to not affect safety). However, in this situation it is guaranteed that $\tau - \alpha_A$ honest stake has the data available. Now, the million-dollar question is: would we only have a single upstream peer with a closed TCP window to reach that honest stake?

If not, that worst-case scenario is strictly less bad than you paint it. Am I missing something / is my thinking flawed?

Member Author

Meanwhile I have refined the wording in this paragraph: d9c975d. Is this better?


The implementation of Leios must be understood in the context of the Cardano node as a concurrent, reactive system operating under real-time constraints in an adversarial environment. While "real-time" in this context does not refer to the millisecond-level hard deadlines found in industrial control systems, timely action nonetheless remains crucial to protocol success and network security.

The currently deployed Praos implementation establishes clear [data diffusion targets](https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.5.1): blocks must reach 95% of nodes within the 5-second $\Delta$ parameter, with target performance at 98% and stretch goals at 99%. These targets are comfortably achieved most of the time, and blocks are regularly adopted within 1 second across the network. Even in the current system, however, there are situations where the target is not reached, for example due to reward calculations happening at the epoch boundary.
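
As an illustration of how these targets could be checked against observed data, the following Python sketch computes the share of nodes that received a block within $\Delta$. The function names and the shape of the input (a list of per-node arrival delays in seconds) are assumptions made for this example; only the 5-second bound and the 95% / 98% / 99% thresholds come from the text above.

```python
DELTA_SECONDS = 5.0
# Thresholds from the diffusion targets: minimum, target and stretch goal.
TARGETS = {"minimum": 0.95, "target": 0.98, "stretch": 0.99}


def fraction_within_delta(arrival_delays, delta=DELTA_SECONDS):
    """Share of nodes whose block arrival delay is at most delta seconds."""
    if not arrival_delays:
        raise ValueError("need at least one arrival delay")
    within = sum(1 for delay in arrival_delays if delay <= delta)
    return within / len(arrival_delays)


def targets_met(arrival_delays, delta=DELTA_SECONDS):
    """Map each named target to whether the observed share reaches it."""
    share = fraction_within_delta(arrival_delays, delta)
    return {name: share >= threshold for name, threshold in TARGETS.items()}


if __name__ == "__main__":
    # Hypothetical sample: 97 nodes adopt the block in 0.8s, 3 nodes are late.
    sample = [0.8] * 97 + [6.0] * 3
    print(fraction_within_delta(sample))
    print(targets_met(sample))
```

In the hypothetical sample, 97% of nodes are within $\Delta$, so the minimum is met while the target and the stretch goal are not.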

A digression: we've noticed some improvements in cardano-node-10.3 thanks to optimisations in the ledger - we should do more of that!

A discrete event simulation implemented in Rust models Leios message exchanges between nodes, abstracting lower-level details for speed. It runs orders of magnitude faster than real time, which enables statistical analysis over thousands of runs with complete observability and arbitrary adversarial behavior injection. This validates security arguments by systematically exploring protocol behavior under varying loads, expected data diffusion in small to medium sized network topologies, and adversarial scenarios like data withholding, and it allows exploration of protocol parameters before testnet deployment.
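
To illustrate the general technique, here is a minimal discrete-event sketch of message diffusion. It is written in Python for brevity and is not the Rust simulator's code or API: the topology format, the `withholders` parameter and the function name are all hypothetical.

```python
import heapq


def simulate_diffusion(links, origin, withholders=frozenset()):
    """Return the first arrival time of one message at every reachable node.

    links:       dict mapping a node to a list of (neighbour, latency_seconds)
    origin:      node that creates the message at time 0.0
    withholders: nodes that receive the message but never forward it
                 (a toy model of the data-withholding scenario)
    """
    arrival = {}
    events = [(0.0, origin)]  # priority queue of (time, node) delivery events
    while events:
        time, node = heapq.heappop(events)  # always process the earliest event
        if node in arrival:
            continue  # a faster copy of the message already arrived
        arrival[node] = time
        if node in withholders:
            continue  # adversarial node: do not relay
        for neighbour, latency in links.get(node, []):
            if neighbour not in arrival:
                heapq.heappush(events, (time + latency, neighbour))
    return arrival


if __name__ == "__main__":
    # Hypothetical three-node topology with link latencies in seconds.
    topology = {
        "a": [("b", 0.1), ("c", 0.5)],
        "b": [("c", 0.1)],
        "c": [],
    }
    print(simulate_diffusion(topology, "a"))
    print(simulate_diffusion(topology, "a", withholders={"b"}))
```

Because the loop only advances simulated time from event to event, a run costs a number of steps proportional to the number of messages rather than to wall-clock time, which is what makes thousands of runs affordable. In the example, node `c` hears the message via `b` in the honest run and only over the slower direct link when `b` withholds.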

Another simulation is Haskell-based and uses IOSim together with the actual network framework used in the `cardano-node`. This reduces model-implementation divergence while enabling detailed studies of dynamic behavior and resource management. While IOSim is used in the existing network and consensus layers through property-based testing, and extends naturally to Leios components, the simulator built from this was not able to scale to large networks.

I'd like to know what has prevented the simulation from scaling to a large network. Was it io-sim performance, or something else?

Member Author

@bwbush @nfrisby Do you recall what made the Haskell sim slow?

@ch1bo ch1bo force-pushed the ch1bo/dependencies-interactions branch from 0f7df84 to 5e93857 Compare October 29, 2025 08:44
@ch1bo ch1bo force-pushed the ch1bo/design-overview branch from 900273e to 73fd4b5 Compare October 29, 2025 08:47
@ch1bo ch1bo force-pushed the ch1bo/dependencies-interactions branch from 5e93857 to c536295 Compare October 29, 2025 08:54
@ch1bo ch1bo force-pushed the ch1bo/design-overview branch from 73fd4b5 to 445ad50 Compare October 29, 2025 08:54
@ch1bo ch1bo force-pushed the ch1bo/design-overview branch from 445ad50 to 958ba2c Compare October 29, 2025 09:19
@ch1bo ch1bo force-pushed the ch1bo/design-overview branch from ee87a8f to d048a4a Compare October 29, 2025 09:38
@ch1bo ch1bo requested review from coot and pagio October 29, 2025 09:39
@ch1bo ch1bo force-pushed the ch1bo/design-overview branch from d048a4a to 2ef5fa4 Compare October 29, 2025 09:57
@ch1bo ch1bo force-pushed the ch1bo/design-overview branch from 2ef5fa4 to 08201a3 Compare October 29, 2025 10:24

Development

Successfully merging this pull request may close these issues.

Write technical design and implementation plan

4 participants