Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: move to Nix (simpler, easy to maintain version) #1320

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

storopoli
Copy link
Contributor

@storopoli storopoli commented Feb 4, 2024

Description

This is a simpler version of #1257.
Although not fully reproducible (we are fetching Cargo dependencies instead of declaring all of them).
It is WAY easier to maintain and provides most of the benefits and Dev-Experience from Nix.

You can switch flawlessly from Rust stable, MSRV, and nightly using either:

  • nix develop .: latest Rust stable
  • nix develop .#stable: latest Rust stable
  • nix develop .#msrv: Rust MRSV
  • nix develop .#nightly: Rust Nightly

It handles all the annoyances of Cargo.locks and cargo updates.

Also you don't need to worry about installing binaries from bitcoind or electrs crates,
since we fetch these from nixpkgs (with pinned versions) and add them to your local environment.
Additionally, all the extra dependencies that you need for WASM are handled as well.
Finally, no need to set ENV vars: they are also set all by default to you.

If you are committing typos, or failing to adhere to our guidelines in CONTRIBUTING.md;
we won't let you commit or push.
This is handled by Nix's automated pre-commit hooks which gives helpful error messages.

Please check the Nix.md file to learn more about Nix and what we are aiming to do with the Nix integration into BDK.

I am taking the liberty of tagging certain people that are Nix enthusiast to help the discussion and review.
Cc @notmandatory, @dpc, @casey-bowman, @plebhash, @matthiasdebernardini, @gudnuf, @oleonardolima.

Notes to the reviewers

  • Pre-commit checks for everything that we enforce:
    • Checks for typos in the source code and documentation.
    • Checks if all the commits are GPG-signed and follow the conventional commits style.
      (again, please check CONTRIBUTING.md)
    • Runs cargo clippy in all workspace.
    • Runs cargo fmt in all workspace.
    • Runs nixpgs-fmt in all workspace (for .nix files)
  • Instructions and rationale in Nix.md

Closes #1162. Superseds #1122, #1156, #1165, #1257.

Pinneds Dependencies:

Changelog notice

  • We moved to Nix-based CI and Local development environment to better onboard new developers and enhance the productivity of developers.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

@storopoli storopoli force-pushed the js/nix-simple branch 7 times, most recently from ab132c9 to d139091 Compare February 4, 2024 18:50
@dpc
Copy link

dpc commented Feb 4, 2024

This can be fixed with -- --test-threads=2. Shall we add?

Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?

@dpc
Copy link

dpc commented Feb 4, 2024

So just the dev shells, no crane etc.? Seems reasonable. Dev shells are like 80% of benefits for very little Nix involvment.

crates/bdk/Cargo.toml Outdated Show resolved Hide resolved
@storopoli
Copy link
Contributor Author

This can be fixed with -- --test-threads=2. Shall we add?

Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?

Good call I did this in a1997cf

bitcoind_conf.args.push("-rpcworkqueue=1024");
bitcoind_conf.args.push("-rpcthreads=64");

@evanlinjin
Copy link
Member

This can be fixed with -- --test-threads=2. Shall we add?

Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?

Good call I did this in a1997cf

bitcoind_conf.args.push("-rpcworkqueue=1024");
bitcoind_conf.args.push("-rpcthreads=64");

64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).

@storopoli
Copy link
Contributor Author

This can be fixed with -- --test-threads=2. Shall we add?

Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?

Good call I did this in a1997cf

bitcoind_conf.args.push("-rpcworkqueue=1024");
bitcoind_conf.args.push("-rpcthreads=64");

64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).

4 is the default. So no change then? I can easily delete a1997cf (that's why I did it in a separate commit 😄)

@dpc
Copy link

dpc commented Feb 4, 2024

64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).

Its a io bound workload. Absent async capabilities default 4 is laughable even for one core and I don't even know why bitcoind is doing it.

@evanlinjin
Copy link
Member

64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).

Its a io bound workload. Absent async capabilities default 4 is laughable even for one core and I don't even know why bitcoind is doing it.

I didn't realize it was io-bounded. However, in our test environment, I doubt there will be 64 concurrent calls. How about we meet in the middle and say 16?

@dpc
Copy link

dpc commented Feb 5, 2024

I was on my phone before, so let me elaborate.

I didn't realize it was io-bounded. However, in our test environment, I doubt there will be 64 concurrent calls. How about we meet in the middle and say 16?

From a perspective of most workloads e.g. asking for blockchain data, bitcoind (and electrs) acts very much like database software, e.g. posgresql. Postgresql is an interesting case because it utilizes process-based model - meaning it will spawn not just a thread, but a whole process for every client connection - waaay heavier than just a thread.

Even for that heavy model, the recommendations will be to limit it around around 500.

It takes many threads to saturate modern IO device. Just an example from a yt video I had on hand benchamarking filesystems. Each bar is number of workers. And these are dedicated IO-generating workers, generating heavy load in a tight loop. So in this benchmark, it will take like 4 workers. Possibly one worker is even more threads. But an rpc thread in bitcoind by necessity will be spending lots of time reading things and writing stuff to a tcp connection, doing some checks, re-encodings and what not. To actually get a maximum IO performance there will need to be a lot of them. I'd say 64 per core.

On my mostly idle laptop at this very moment there are 2k threads running:

image

On a modern Linux a thread is just a stack, which due to CoW optimizations will really amount to 1 or 2 pages (4K page size) of physically used memory plus a tiny (1K?) amount to track it all in the kernel.

There's really not much to save on thread counts in a grand scheme of things.

Personally, I like to run as lots and lots of tests in parallel, especially any sort of integration and e2e tests, which tend to have lots of dead time - waiting for stuff (like disk and network IO - latency-bound things). The usual limit to this is the available memory. Additional benefit is also that it tends to expose problems more often, as all operations tend to have more ... timing jitter.

So I'm really surprised that the default in bitcoind is 4. That's too little even for an old generation raspberry pi.

CONTRIBUTING.md Show resolved Hide resolved
CONTRIBUTING.md Show resolved Hide resolved
NIX.md Outdated Show resolved Hide resolved
NIX.md Outdated Show resolved Hide resolved
@matthiasdebernardini
Copy link

Can you have it print at the beginning, "First time takes a good long while..." or something like that. Some parts seem so slow that you'd think the shell crashed or something like that. Would be good for the user to know this.

Also I am having 8 tests fail, is that just me?

running 8 tests
test mempool_re_emits_if_tx_introduction_height_not_reached ... FAILED
test mempool_avoids_re_emission ... FAILED
test no_agreement_point ... FAILED
test mempool_during_reorg ... FAILED
test tx_can_become_unconfirmed_after_reorg ... FAILED
test test_sync_local_chain ... FAILED
test ensure_block_emitted_after_reorg_is_at_reorg_height ... FAILED
test test_into_tx_graph ... FAILED

failures:

---- mempool_re_emits_if_tx_introduction_height_not_reached stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- mempool_avoids_re_emission stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- no_agreement_point stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- mempool_during_reorg stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- tx_can_become_unconfirmed_after_reorg stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- test_sync_local_chain stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- ensure_block_emitted_after_reorg_is_at_reorg_height stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- test_into_tx_graph stdout ----
getting new addresses!
got new addresses!
mining block!
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)


failures:
    ensure_block_emitted_after_reorg_is_at_reorg_height
    mempool_avoids_re_emission
    mempool_during_reorg
    mempool_re_emits_if_tx_introduction_height_not_reached
    no_agreement_point
    test_into_tx_graph
    test_sync_local_chain
    tx_can_become_unconfirmed_after_reorg

test result: FAILED. 0 passed; 8 failed; 0 ignored; 0 measured; 0 filtered out; finished in 24.35s

error: test failed, to rerun pass `-p bdk_bitcoind_rpc --test test_emitter`
(nix:nix-shell-env) matthiasdebernardini@Matthiass-MacBook-Pro-2:~/Projects/bdk_nix$

@storopoli

This comment was marked as resolved.

flake.nix Outdated Show resolved Hide resolved
flake.nix Outdated Show resolved Hide resolved
@matthiasdebernardini
Copy link

I was finally able to get all the tests to pass using this command

nix develop -L .#msrv --command cargo test --all-features -- --test-threads=2

is there anything else to look after?

@storopoli
Copy link
Contributor Author

is there anything else to look after?

Try without the --test-threads=2 please?

@matthiasdebernardini
Copy link

Could not get it to work, which is why I added the test threads argument.

➜  bdk git:(js/nix-simple) nix develop -L .#msrv --command cargo test --all-features
warning: input 'rust-overlay' has an override for a non-existent input 'flake-utils'
    Finished test [unoptimized + debuginfo] target(s) in 0.40s
warning: the following packages contain code that will be rejected by a future version of Rust: libc v0.2.155
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 7`
     Running unittests src/lib.rs (target/debug/deps/bdk_bitcoind_rpc-7626c46bad197f9a)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/test_emitter.rs (target/debug/deps/test_emitter-dc28ba34d3b67736)

running 8 tests
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
test mempool_during_reorg ... FAILED
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
test mempool_avoids_re_emission ... FAILED
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
test no_agreement_point ... FAILED
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
test mempool_re_emits_if_tx_introduction_height_not_reached ... FAILED
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
test test_sync_local_chain ... FAILED
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
test tx_can_become_unconfirmed_after_reorg ... FAILED
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
test ensure_block_emitted_after_reorg_is_at_reorg_height ... FAILED
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
test test_into_tx_graph ... FAILED

failures:

---- mempool_during_reorg stdout ----
thread 'mempool_during_reorg' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/test/src/lib.rs:184:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- mempool_avoids_re_emission stdout ----
thread 'mempool_avoids_re_emission' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/test/src/lib.rs:184:5

---- no_agreement_point stdout ----
thread 'no_agreement_point' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/test/src/lib.rs:184:5

---- mempool_re_emits_if_tx_introduction_height_not_reached stdout ----
thread 'mempool_re_emits_if_tx_introduction_height_not_reached' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/test/src/lib.rs:184:5

---- test_sync_local_chain stdout ----
thread 'test_sync_local_chain' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/test/src/lib.rs:184:5

---- tx_can_become_unconfirmed_after_reorg stdout ----
thread 'tx_can_become_unconfirmed_after_reorg' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/test/src/lib.rs:184:5

---- ensure_block_emitted_after_reorg_is_at_reorg_height stdout ----
thread 'ensure_block_emitted_after_reorg_is_at_reorg_height' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/test/src/lib.rs:184:5

---- test_into_tx_graph stdout ----
getting new addresses!
got new addresses!
mining block!
thread 'test_into_tx_graph' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/test/src/lib.rs:184:5


failures:
    ensure_block_emitted_after_reorg_is_at_reorg_height
    mempool_avoids_re_emission
    mempool_during_reorg
    mempool_re_emits_if_tx_introduction_height_not_reached
    no_agreement_point
    test_into_tx_graph
    test_sync_local_chain
    tx_can_become_unconfirmed_after_reorg

test result: FAILED. 0 passed; 8 failed; 0 ignored; 0 measured; 0 filtered out; finished in 37.59s

error: test failed, to rerun pass '-p bdk_bitcoind_rpc --test test_emitter'

@storopoli
Copy link
Contributor Author

Ok, that's strange. Maybe a MacOS thing. I have 12 cores in my laptop and 32 in my desktop andi don't get these errors.

@dpc
Copy link

dpc commented Aug 6, 2024

Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

Possibly too low ulimit. See https://github.com/fedimint/fedimint/blob/d7135bce5bda0e623c84d89736d363961df7663e/flake.nix#L314

@matthiasdebernardini
Copy link

I'm on the latest macOS with a high core count macbook.

I ran this command

➜  bdk git:(js/nix-simple) ulimit -Sn 1024

and got the same error

@oleonardolima
Copy link
Contributor

I tried it on my side too and got the same issue now.
I would guess it's how the tests are structured.

@dpc
Copy link

dpc commented Aug 7, 2024

Try setting the ulimit to higher numbers. 1024 is still quite small for a lot of things running concurrently.

@dpc
Copy link

dpc commented Aug 7, 2024

Also, if you are using single bitcoind instance for all tests, you might want to increase the queue size and rpc thread count on it. Otherwise bitcoind randomly rejects connections, even when there isn't all that many of them yet. Though to that I'd add that the production code should probably do some retries as well.

Anyway, I'm guessing. Unfortunately non trivial tests that involve lots of different software pieces etc. often require some debugging and refactoring/tweaking to work reliably.

ordinals/ord#3364 (comment)

https://github.com/fedimint/fedimint/blob/1ca5227e0ceb8c4f3b0ca122364653e0f017a38c/devimint/src/cfg/bitcoin.conf#L9

LagginTimes pushed a commit to LagginTimes/bdk that referenced this pull request Aug 8, 2024
8bf8c7d chore: fix clippy lints (Jose Storopoli)

Pull request description:

  ### Description

  Fix some clippy CI lints

  ### Notes to the reviewers

  More caught by the Nix CI in bitcoindevkit#1320.

  ### Changelog notice

  chore: clippy lints

  ### Checklists

  #### All Submissions:

  * [x] I've signed all my commits
  * [x] I followed the [contribution guidelines](https://github.com/bitcoindevkit/bdk/blob/master/CONTRIBUTING.md)
  * [x] I ran `cargo fmt` and `cargo clippy` before committing

  #### New Features:

  * [ ] I've added tests for the new feature
  * [ ] I've added docs for the new feature

  #### Bugfixes:

  * [ ] This pull request breaks the existing API
  * [ ] I've added tests to reproduce the issue which are now passing
  * [ ] I'm linking the issue being fixed by this PR

ACKs for top commit:
  notmandatory:
    ACK 8bf8c7d

Tree-SHA512: 6b53cb739e506d79106a2f42aa2b8fa28ef226543fbbf100225f10ed82257f6fd0218f09c0f1291803fbc9c1c88e32ba1c7e02fe3f601ccc9dc5be8a6efea6e1
@oleonardolima
Copy link
Contributor

Try setting the ulimit to higher numbers. 1024 is still quite small for a lot of things running concurrently.

I tested with ulimit size up to 16384, and it still fails. I would guess it's something other than the ulimit.

Also, if you are using single bitcoind instance for all tests, you might want to increase the queue size and rpc thread count on it. Otherwise bitcoind randomly rejects connections, even when there isn't all that many of them yet. Though to that I'd add that the production code should probably do some retries as well.

I did a little bit of it and further testing, the main problem occurs in the bdk_testenv::TestEnv::mine_blocks, it's the same lineno failure for all the tests. Specifically when it's trying to first mine some blocks, and calling the RPC command generatetoaddress.

It's really weird, because I checked it and it's running the bitcoind on different ports, datadirs, and such 🤔.

https://github.com/fedimint/fedimint/blob/1ca5227e0ceb8c4f3b0ca122364653e0f017a38c/devimint/src/cfg/bitcoin.conf#L9

I think we should give it a try.

@oleonardolima
Copy link
Contributor

https://github.com/fedimint/fedimint/blob/1ca5227e0ceb8c4f3b0ca122364653e0f017a38c/devimint/src/cfg/bitcoin.conf#L9

I think we should give it a try.

FWIW, I'm not sure if I did something wrong, but it still didn't solve the issue 🤔

@storopoli
Copy link
Contributor Author

storopoli commented Sep 10, 2024

rebased to latest master.
Added the new checks in rust stable for the example-crates.
Once #1599 is merged I can delete the 3507055, since it is the same commit

notmandatory added a commit that referenced this pull request Sep 11, 2024
bc83e41 fix: typos (Jose Storopoli)

Pull request description:

  <!-- You can erase any parts of this template not applicable to your Pull Request. -->

  ### Description

  more typos caught up during rebasing of #1320.

  ### Notes to the reviewers

  ### Changelog notice

  <!-- Notice the release manager should include in the release tag message changelog -->
  <!-- See https://keepachangelog.com/en/1.0.0/ for examples -->

  ### Checklists

  #### All Submissions:

  * [x] I've signed all my commits
  * [x] I followed the [contribution guidelines](https://github.com/bitcoindevkit/bdk/blob/master/CONTRIBUTING.md)
  * [x] I ran `cargo fmt` and `cargo clippy` before committing

  #### New Features:

  * [ ] I've added tests for the new feature
  * [ ] I've added docs for the new feature

  #### Bugfixes:

  * [ ] This pull request breaks the existing API
  * [ ] I've added tests to reproduce the issue which are now passing
  * [ ] I'm linking the issue being fixed by this PR

ACKs for top commit:
  LagginTimes:
    ACK bc83e41
  oleonardolima:
    ACK bc83e41
  thunderbiscuit:
    ACK bc83e41.

Tree-SHA512: 1dc76d64ddb0273c60899a6617dcbc63fab75057447080260b7050bb5d178ef56c6f79f2c7f1ca5b7b4eb09e2b1e67d782be91598ab58e6a3b3179d6b5edf5c5
@storopoli
Copy link
Contributor Author

rebased to master after #1599 was merged

@thesimplekid
Copy link

I stole this for cdk just fyi been working well. https://github.com/cashubtc/cdk/blob/main/flake.nix

@storopoli
Copy link
Contributor Author

rebased master after #1561 was merged

Copy link
Member

@notmandatory notmandatory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks like a great use of Nix to setup a local dev env use the pre-checks to prevent common commit problems.

I would like to figure out the macos test threads issue, but that can be done in a follow-up PR. I also have a few other comments below.

Thanks all the hard work getting this to work.

Nix.md Outdated Show resolved Hide resolved
.github/workflows/cont_integration.yml Show resolved Hide resolved
flake.nix Outdated Show resolved Hide resolved
.github/workflows/update_flake_lock.yml Show resolved Hide resolved
Nix.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
justfile Outdated Show resolved Hide resolved
@storopoli
Copy link
Contributor Author

Thanks all the hard work getting this to work.

My pleasure.

Ok the trick that @thesimplekid has that does not need a Cargo.lock seems to work. Thanks! (I did some extra stuff that calls cargo update in the non-MSRV shellHooks)
I'll squash everything into a single commit and ask for a new round of reviews.

PS: Since we are changing the GitHub CI workflow names, we might need to update the branch protection rules for master after this gets merged.

You can switch flawlessly from Rust stable, MSRV, and nightly using
either:

- `nix develop .`: Rust MRSV
- `nix develop .#msrv`: Rust MRSV
- `nix develop .#stable`: Rust latest stable
- `nix develop .#nightly`: Rust Nightly

Also you don't need to worry about installing binaries from `bitcoind`
or `electrs` crates,
since we fetch these from `nixpkgs` (with pinned versions) and add them
to your local environment.
Additionally, all the extra dependencies that you need for WASM are
handled as well.
Finally, no need to set ENV vars: they are also set all by default to
you.

If you are committing typos, or failing to adhere to our guidelines in
[CONTRIBUTING.md](CONTRIBUTING.md);
we won't let you commit or push.
This is handled by Nix's automated `pre-commit` hooks which gives
helpful error messages.

Please check the [Nix.md](Nix.md) file to learn more about Nix and what
we are aiming to do with the Nix integration into BDK.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Needs Review
Development

Successfully merging this pull request may close these issues.

ci: workflows failing due to "Connection timed out" errors
10 participants