Leios design: Write introduction and overview chapter #597
base: ch1bo/dependencies-interactions
docs/leios-design/README.md
Outdated
| > TODO: (re-)introduce the main protocol flow of Leios? | ||
| As was the case for the [Praos variant of Ouroboros](https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.5.1), the specification embodied in the published and peer-reviewed [research paper for Ouroboros Leios](https://eprint.iacr.org/2025/1115.pdf) was not intended to be directly implementable. Initial research and development studies confirmed this expectation, identifying several unsolved problems with the fully concurrent block production design when considering the concrete Cardano ledger and what consequences this would have (TODO: cite suitable R&D reports, [Tech Report #2](https://github.com/input-output-hk/ouroboros-leios/blob/main/docs/technical-report-2.md#conflicts-ledger-and-incentives)). | ||
| The design presented in [CIP-164](https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md), also known as "Linear Leios", addresses these implementation challenges by focusing on the core insight of better utilizing network and computational resources during the necessary and eponymous "calm periods" of the Praos protocol. This approach provides an immediately implementable design that can deliver orders of magnitude higher throughput while preserving the security guarantees that make Cardano valuable. |
Linear Leios doesn't utilise the network any better than Praos. It uses bursts of traffic, as Praos does, but with even higher bursts than Praos does - from that perspective, it's strictly worse than Praos. So we need to be a bit more careful here, what about:
> Although Linear Leios does not improve the burstiness of the Praos protocol, it will utilise unused bandwidth during long gaps between Praos blocks. For better utilisation of the underlying TCP protocol, a protocol which uses constant high pressure on the network is required (e.g. some form of full Leios).
> Linear Leios doesn't utilise the network any better than Praos. It uses bursts of traffic, as Praos does, but with even higher bursts than Praos does - from that perspective, it's strictly worse than Praos
Hm, interesting take that Leios would have higher bursts than Praos. I'd like to challenge this viewpoint: In Leios we communicate five things: Ranking blocks which announce/certify EBs, votes that lead to certificates, transactions submitted by users, EBs, any "missing transactions".
A certifying ranking block (~10kB) is about 90% smaller than a full Praos block (90kB); a non-certifying ranking block is the same size as a Praos block.
Voting on EBs could be considered a burst of network traffic (on the order of ~50kB per round), as votes are created at the same time. However, they are created across the whole network, which should even the load out more than the single-source full block one would expect in Praos?
Which leaves us with transaction submission and diffusion of EBs + missing transactions. How much transaction submission impacts network traffic depends on our understanding of the load scenario (whether traffic is organic or artificial, coming from one or many sources, etc.); it is what it is. Under any given high demand (otherwise we'd be using only Praos), Leios only adds the overhead of EBs (~40 bytes per tx) and the re-submission of a subset of endorsed transactions (the ones that were not diffused originally during submission). This means that in the average case the overhead is minimal, and in the worst case it boils down to the same burstiness as Praos would have at this load!?
> For better utilisation of the underlying TCP protocol, a protocol which uses constant high pressure on the network is required (e.g. some form of full Leios).
Isn't this the case already due to transaction submission, no matter how consensus is achieved?
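The size figures in this comment can be checked with a few lines of arithmetic. Below is a minimal sketch in Python that uses only the approximate numbers quoted above (~10kB certifying ranking block, 90kB Praos block, ~40 bytes of EB overhead per transaction). The function names and the 1,000-transaction example are hypothetical and not taken from any Leios code.

```python
# Back-of-the-envelope check of the sizes quoted in the comment above.
# All names are hypothetical; the constants are the approximate figures
# from the thread, not measured values.

CERTIFYING_RB_BYTES = 10_000    # ~10kB certifying ranking block
PRAOS_BLOCK_BYTES = 90_000      # 90kB full Praos block
EB_OVERHEAD_PER_TX_BYTES = 40   # ~40 bytes per transaction referenced in an EB


def size_reduction(small: int, large: int) -> float:
    """Fraction by which `small` is smaller than `large`."""
    return 1.0 - small / large


def eb_overhead_bytes(num_txs: int) -> int:
    """Extra bytes an EB adds when it references `num_txs` transactions."""
    return num_txs * EB_OVERHEAD_PER_TX_BYTES


if __name__ == "__main__":
    reduction = size_reduction(CERTIFYING_RB_BYTES, PRAOS_BLOCK_BYTES)
    print(f"certifying RB is {reduction:.0%} smaller than a full Praos block")
    # 1000 transactions is an arbitrary illustrative count.
    print(f"EB overhead for 1000 txs: {eb_overhead_bytes(1000)} bytes")
```

With these inputs the reduction comes out at roughly 89%, consistent with the "about 90%" figure, and an EB referencing 1,000 transactions adds about 40kB.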
In the happy case, sure, but we should really be focused on improving the worst-case scenario, which, with linear Leios, can lead to 12MB of fresh data being downloaded over a short period, with the TCP window closed due to an idle period.
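To put a rough number on the closed-window concern, here is a minimal sketch of an idealised TCP slow start in Python. The assumptions are not from this thread: 1460-byte segments, an initial window of 10 segments, a window that doubles every round trip, no loss, and no receive-window cap. The function name is hypothetical, and real stacks and links will behave differently.

```python
# Rough slow-start estimate: how many round trips it takes to fetch a payload
# over a TCP connection whose congestion window was reset after an idle period.
# Assumptions (not from the thread): 1460-byte segments, an initial window of
# 10 segments, window doubling every round trip, no loss, no receive-window cap.

MSS_BYTES = 1460
INITIAL_WINDOW_SEGMENTS = 10


def slow_start_round_trips(payload_bytes: int,
                           mss: int = MSS_BYTES,
                           initial_segments: int = INITIAL_WINDOW_SEGMENTS) -> int:
    """Number of round trips needed to deliver `payload_bytes` in pure slow start."""
    window = mss * initial_segments  # bytes that can be sent in the first round trip
    delivered = 0
    round_trips = 0
    while delivered < payload_bytes:
        delivered += window
        window *= 2  # idealised slow start: the window doubles each round trip
        round_trips += 1
    return round_trips


if __name__ == "__main__":
    print(slow_start_round_trips(12_000_000))  # the 12MB worst case mentioned above
    print(slow_start_round_trips(90_000))      # a full 90kB Praos block, for comparison
```

Under these assumptions the 12MB case needs 10 round trips before the last byte arrives, against 3 for a 90kB block. Multiplying by the round-trip time of a given link gives a lower bound on the download time from a cold connection.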
I agree with designing for the worst case (that's why I wrote that section in this PR!). The worst case for Leios is not the same as the worst case for Praos though - is it? What is the worst case scenario for EB diffusion?
I see two situations where urgent fetching of big transaction closures is required:
1. A block is (maliciously) produced that announces a full EB with completely unknown transactions
2. A block is produced that certifies a full EB with completely unknown transactions
Case 1 only potentially affects high throughput and would only be problematic if a) most stake is acting like this and/or b) this can be caused by a network attacker (e.g. by eclipsing / partitioning the majority of the network).
Only 2. is on the critical path in Leios and needs to happen within worst case
If not, that worst-case scenario is strictly less bad than you paint it. Am I missing something / is my thinking flawed?
Meanwhile I have refined the wording in this paragraph: d9c975d. Is this better?
| The implementation of Leios must be understood in the context of the Cardano node as a concurrent, reactive system operating under real-time constraints in an adversarial environment. While "real-time" in this context does not refer to the millisecond-level hard deadlines found in industrial control systems, timely action nonetheless remains crucial to protocol success and network security. | ||
| The currently deployed Praos implementation establishes clear [data diffusion targets](https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.5.1): blocks must reach 95% of nodes within the 5-second $\Delta$ parameter, with target performance at 98% and stretch goals at 99%. While these are comfortably achieved most of the time (blocks are regularly adopted within 1 second across the network), there are some situations even in the current system where the target is not reached, for example due to reward calculations happening at the epoch boundary. |
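A minimal sketch of how the diffusion targets in this paragraph could be checked against measured arrival times, assuming a list of per-node block arrival delays in seconds. The function names and the sample data are hypothetical. Only the 5-second Δ and the 95%/98%/99% thresholds come from the text.

```python
# Check measured block diffusion against the targets quoted above.
# Names and sample data are hypothetical; only the 5-second deadline and the
# 95% / 98% / 99% thresholds are taken from the paragraph.

DELTA_SECONDS = 5.0
TARGETS = {"required": 0.95, "target": 0.98, "stretch": 0.99}


def fraction_within(delays, deadline=DELTA_SECONDS):
    """Fraction of nodes whose block arrival delay is within the deadline."""
    if not delays:
        raise ValueError("no measurements")
    return sum(1 for d in delays if d <= deadline) / len(delays)


def targets_met(delays):
    """Map each target name to whether the measured fraction reaches it."""
    frac = fraction_within(delays)
    return {name: frac >= threshold for name, threshold in TARGETS.items()}


if __name__ == "__main__":
    # Hypothetical sample: 97 of 100 nodes see the block after 1 s, 3 after 6 s.
    sample = [1.0] * 97 + [6.0] * 3
    print(fraction_within(sample))
    print(targets_met(sample))
```

In this made-up sample 97% of nodes are within Δ, so the 95% requirement holds while the 98% and 99% levels do not.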
A digression: we've noticed some improvements in cardano-node-10.3 thanks to optimisations in the ledger - we should do more of that!
| A discrete event simulation implemented in Rust models Leios message exchanges between nodes, abstracting lower-level details for speed. It runs orders of magnitude faster than real time, enabling statistical analysis over thousands of runs with complete observability and arbitrary adversarial behavior injection. This validates security arguments by systematically exploring protocol behavior under varying loads, expected data diffusion in small to medium sized network topologies, and adversarial scenarios like data withholding, and allows exploring protocol parameters before testnet deployment. | ||
| A second simulation, written in Haskell, uses IOSim and the actual network framework used in the `cardano-node`. This reduces model-implementation divergence while enabling studies of the dynamic behavior and resource management in detail. While IOSim is used in the existing network and consensus layers through property-based testing, and extends naturally to Leios components, the simulator built from this was not able to scale to large networks. | ||
I'd like to know what has prevented the simulation from scaling to a large network. Was it io-sim performance, or something else?
Adds a chapter that should set the context, explain what is important, and provide plenty of links to past/related work on the network/consensus design of Cardano.
Tagged @coot @pagio and @nfrisby as possible reviewers - you are authors on some of the linked documents and maybe have an opinion here.