Skip to content

feat!: concrete errors #3161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 57 commits into
base: main
Choose a base branch
from
Open

feat!: concrete errors #3161

wants to merge 57 commits into from

Conversation

dignifiedquire
Copy link
Contributor

@dignifiedquire dignifiedquire commented Feb 3, 2025

Description

Fixes #2741

Breaking Changes

Notes & open questions

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.

@@ -37,6 +37,48 @@ pub(crate) mod streams;
#[cfg(not(wasm_browser))]
mod util;

/// Client related errors
#[allow(missing_docs)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is much value in documenting these variants currently, which is why I am skipping them

Copy link

github-actions bot commented Feb 3, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3161/docs/iroh/

Last updated: 2025-05-21T16:11:43Z

Copy link

github-actions bot commented Feb 3, 2025

Netsim report & logs for this PR have been generated and is available at: LOGS
This report will remain available for 3 days.

Last updated for commit: 2a8ce1b

@matheus23 matheus23 changed the title feat(iroh-relay)!: use concret errors for the client part of iroh-relay feat(iroh-relay)!: use concrete errors for the client part of iroh-relay Feb 3, 2025
Copy link
Contributor

@flub flub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fundamentally I'm not happy with these giant enum errors. I think at every point an error can be returned, each error variant of the type must be possible. The approach here just makes every source error a variant without any concern on whether this really helps anything.

@dignifiedquire dignifiedquire marked this pull request as draft February 3, 2025 17:13
@dignifiedquire dignifiedquire marked this pull request as ready for review February 17, 2025 19:43
Copy link
Member

@matheus23 matheus23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a little bit of trouble evaluating this PR. Not sure how to decide what kinds of errors are good / how to best group them etc.

I liked the change towards .expect in some places, where it'd really just be a bug in our code if there were an error.

Anyhow - given we want to move forward with this, we should get this into main. I can see some git conflicts on the horizon for two of my WIP branches 🙃

@dignifiedquire
Copy link
Contributor Author

I'm having a little bit of trouble evaluating this PR. Not sure how to decide what kinds of errors are good / how to best group them etc.

I think this is something we should explore, there is no fixed rule that works for every project I have found so far.

Copy link
Contributor

@flub flub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am struggling to figure out how to be constructive on this. I still don't particularly like this error style. I think it does not think enough about the human consumer. The error variants inside single enums are often of widely different "types".

OTOH I don't feel like particularly holding this up. Though I am a little worried if it becomes the model we use everywhere. If I were to properly handle this I'd create a counter-proposal as PR... but that takes time. Which I guess is part of the question, my approach would be a much more time-intensive I expect as it really asks for a deep understanding of each API.

Anyway, eh, feel free to merge?

@@ -9,8 +9,8 @@ use std::{
};

use curve25519_dalek::edwards::CompressedEdwardsY;
pub use ed25519_dalek::Signature;
use ed25519_dalek::{SignatureError, SigningKey, VerifyingKey};
pub use ed25519_dalek::{Signature, SignatureError};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be any regrets about re-exporting these without wrapping?

#[error("Invalid relay URL {0}")]
InvalidRelayUrl(Url),
#[error(transparent)]
Websocket(#[from] tokio_tungstenite_wasm::Error),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this error is self-describing. Maybe an #[error("Websocket transport error: {0}")] would be better? (If that is the correct interpretation)

Which immediately makes me wonder "but what if I'm not using websocket"? Why is there no generic IO or Transport error this falls under?

InvalidTlsServername,
#[error("No local address available")]
NoLocalAddr,
#[error("tls connection failed")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[error("tls connection failed")]
#[error("TLS connection failed")]

#[allow(missing_docs)]
#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum DialError {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading just the error types I am rather confused what the difference between ConnectError and DialError is. Sounds like the same to me.

It seems DialError is a variant of ConnectError so maybe it's more of a ConnectionError? But then also ConnectError is returned from Client::connect so it probably was meant as ConnectError 😵‍💫

Comment on lines 55 to 90
#[error(transparent)]
Websocket(#[from] tokio_tungstenite_wasm::Error),
#[error("invalid protocol message encoding")]
InvalidProtocolMessageEncoding,
#[error("Unexpected frame received {0}")]
UnexpectedFrame(crate::protos::relay::FrameType),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Websocket seems like part of the Io variant?
And InvalidProtocolMessageEncoding and UnexpectedFrame are part of the Protocol variant?

I think the io/protocol distinction is pretty decent on a high-level and that was what I was going for in the ConnSendError.

Also it might be possible to have ConnSendError and RecvError be the same type?

@dignifiedquire dignifiedquire marked this pull request as draft February 18, 2025 14:48
@dignifiedquire
Copy link
Contributor Author

Unfortunately I found a massive drawback to using thiserror type errors. You loose backtrace information for most errors that are wrapped, including std errors. This results in quite an awful debugging experience, especially in a test runner setup.

The reasons for this is that tldr rust-lang/rust#99301 is not stable yet and error libraries can't propagate backtrace information properly. This works with anyhow because it uses a singular error type, which can poperly propagate backtraces.

Until I can find a solution for this, using error enums is for now unacceptable.

If you ask, Yes I probably looked at your favourite error handling library, they all fail to do this on STABLE rust, including

  • eyre and it's various reporters
  • thiserror
  • snafu
  • anyhow combined with thiserror or snafu
  • testresult

dignifiedquire and others added 8 commits April 29, 2025 14:44
## Description

Use a new macro from nested-enum-utils to DRY the strum error enums. One
example.

## Breaking Changes

Adds a dependency.

## Notes & open questions

Should we reexport common_fields from n0_snafu?

## Change checklist
<!-- Remove any that are not relevant. -->
- [ ] Self-review.
- [ ] Documentation updates following the [style
guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text),
if relevant.
- [ ] Tests if relevant.
- [ ] All breaking changes documented.
- [ ] List all breaking changes in the above "Breaking Changes" section.
- [ ] Open an issue or PR on any number0 repos that are affected by this
breaking change. Give guidance on how the updates should be handled or
do the actual updates themselves. The major ones are:
    - [ ] [`quic-rpc`](https://github.com/n0-computer/quic-rpc)
    - [ ] [`iroh-gossip`](https://github.com/n0-computer/iroh-gossip)
    - [ ] [`iroh-blobs`](https://github.com/n0-computer/iroh-blobs)
    - [ ] [`dumbpipe`](https://github.com/n0-computer/dumbpipe)
    - [ ] [`sendme`](https://github.com/n0-computer/sendme)
@dignifiedquire dignifiedquire marked this pull request as ready for review May 7, 2025 10:55
@dignifiedquire
Copy link
Contributor Author

marking as ready for review, as this needs mostly review and cleanup now

pull bot pushed a commit to ApeWorX/iroh that referenced this pull request May 7, 2025
…ing the expected `NodeId` out of the `ServerName` in verifiers (n0-computer#3290)

## Description

This is code that will both update deps like n0-computer#3288 but also makes 0-RTT
tests pass again.

They were failing due to a [change in
`rustls`](https://github.com/rustls/rustls/releases/tag/v%2F0.23.24):
> Behavior change: Clients no longer offer resumption between different
`ClientConfig`s that share a resumption store but do not share server
certificate verification and client authentication credentials. If you
share a resumption store between multiple `ClientConfig`s, please ensure
their server certificate verification and client authentication
credentials are also shared. Please read the new documentation on the
`ClientConfig::resumption` item for details.
>
> Additionally, if you share a resumption store or ticketer between
multiple `ServerConfig`s, please see the new documentation on
`ServerConfig` about this.

Essentially what they're doing is [checking that the certificate
verifiers are `Arc::prt_eq` and depending on that allow or disallow
0-RTT](https://github.com/rustls/rustls/pull/2361/files#diff-9d58cb90e1df25604b88544aca7ea999ccd23bc5657384e196e77f407c0f4acfR256).

When running the tests with rustls logging enabled, I confirmed that
this is indeed what's happening:
> [TRACE rustls::msgs::persist] resumption not allowed between different
ServerCertVerifiers

---

The fix isn't straight-forward.
The reason we're creating new `ServerVerifier`s every time is that we're
passing in the expected remote node ID every time.

There used to be no other way of grabbing the other node's ID in the
verifier, until, coincidentally n0-computer#3146 landed, which changed the server
name that's set when connecting from always being `localhost` to
`<base32 NodeId>.iroh.invalid`.

I'm now pulling in that value from the `ServerName` that's passed to the
`ServerCertificateVerifier` in `verify_server_cert`.
As far as I understand that value *should* be a client-set value not a
value we get over the network.

The downside of doing this is that we'll not be compatible with iroh
versions that don't use the `<base32 NodeId>.iroh.invalid` server names.

---

There's an unrelated change in here that's changing `make_client_config`
and `make_server_config` infallible, as was done in n0-computer#3161, which will
definitely also have merge conflicts with this PR.

## Breaking Changes

- `Router::spawn` is now a plain function instead of an `async fn`
- `Router::spawn` is now infallible, instead of returning
`anyhow::Result<()>`

## Change checklist
<!-- Remove any that are not relevant. -->
- [x] Self-review.
- [x] Documentation updates following the [style
guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text),
if relevant.
- [x] All breaking changes documented.
- [x] List all breaking changes in the above "Breaking Changes" section.

---------

Co-authored-by: dignifiedquire <[email protected]>
Copy link
Member

@matheus23 matheus23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting some of my review comments so far (quite a big PR).

Comment on lines +44 to +45
[patch.crates-io]
n0-snafu = { git = "https://github.com/n0-computer/n0-snafu", branch = "expand-test-error" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to remember to replace this with a published version

pkarr = { version = "3.7", features = ["relays", "dht"], default-features = false }
rcgen = "0.13"
redb = "2.0.0"
redb = "=2.4.0" # 2.5.0 has MSRV 1.85
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
redb = "=2.4.0" # 2.5.0 has MSRV 1.85
redb = ">= 2.0.0, < 2.5" # 2.5.0 has MSRV 1.85

use clap::{Parser, ValueEnum};
use iroh::{
discovery::dns::{N0_DNS_NODE_ORIGIN_PROD, N0_DNS_NODE_ORIGIN_STAGING},
dns::DnsResolver,
NodeId,
};
use n0_snafu::{TestResult as Result, TestResultExt};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we shouldn't call it TestResult then 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I welcome better names

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just Result? :D
Maybe ReportResult?

@@ -29,7 +28,7 @@ pub async fn run_with_config_until_ctrl_c(config: Config) -> Result<()> {
store = store.with_mainline_fallback(bootstrap);
};
let server = Server::spawn(config, store, metrics).await?;
tokio::signal::ctrl_c().await?;
tokio::signal::ctrl_c().await.context("ctrl-c")?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These .contexts aren't super fun. Should we change them to .e()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🏗 In progress
Development

Successfully merging this pull request may close these issues.

switch to using custom error types, rather than generic anyhow::Error
5 participants