Bluetooth: Host: Move tx_processor to bt_taskq #97288

alwa-nordic · 2025-10-09T14:53:30Z

This PR moves tx_processor off the system workqueue to a dedicated workqueue to prevent deadlocks. See commits for details.

Core changes

- New bt_taskq workqueue: "for quick non-blocking Bluetooth tasks".
- Move tx_processor to bt_taskq: it is now safer to block on the system work queue (which the Host unfortunately does).

Fallout fixes

- Defer ATT user cb: user callbacks like write_cmd_cb stay on the system work queue.
- Grab some RAM: BT_MAX_CONN reduced from 62 to 61 in the peripheral_identity sample to fit bt_taskq.

Cleanups

- Fewer workarounds: bt_cmd_send_sync workaround disabled when tx_processor uses a dedicated thread.

ATT invokes user callbacks in its net_buf destroy function. It is common practice for these callbacks to block on bt_hci_cmd_alloc(). This is a deadlock when the net_buf_unref() happens inside the HCI driver, invoked from tx_processor. Blocking callbacks like this appear in our own samples. See further down for how this problem was detected.

tx_processor does not protect against blocking callbacks, so they are de facto forbidden. The Host should not equip net_bufs with dangerous destroy callbacks.

This commit makes ATT defer its net_buf destruction and user callback invocation to the system workqueue, so that net_buf_unref() is safe to call from non-blocking threads. In the case of the deadlock, the net_buf_unref() was below the tx_processor in the call stack, which (at the time of this commit) is on the system work queue, so deferring it to the system work queue preserves the existing behavior.

A future improvement may be to allow the user to provide their own workqueue for ATT callbacks.

This deadlock was detected because the following test was failing while moving tx_processor to the bt_taskq:

tests/bsim/bluetooth/ll/throughput/tests_scripts/gatt_write.sh

The above test has an ATT callback `write_cmd_cb` that invokes `bt_conn_le_param_update`, which can block waiting for `tx_processor`. The reason it was not failing while tx_processor was on the system work queue is that the GATT API has special non-blocking behavior when called from the system work queue.

Signed-off-by: Aleksander Wasaznik <[email protected]>
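The deferral described in this commit message can be illustrated with a small toy model. This is only a sketch in plain C, not the Host's code: `sketch_buf`, `sketch_buf_unref()`, `sketch_sysworkq_drain()` and the fixed-size FIFO are hypothetical stand-ins for a net_buf, net_buf_unref() and the system workqueue. The point it shows is that the last unref only queues the user callback, so the unref itself never runs user code on the calling thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_QUEUE_LEN 8

/* Hypothetical user callback type, standing in for an ATT user callback. */
typedef void (*user_cb_t)(void *user_data);

struct deferred_item {
	user_cb_t cb;
	void *user_data;
};

/* Toy FIFO standing in for the system workqueue (assumption, not Zephyr API). */
static struct deferred_item defer_queue[DEFER_QUEUE_LEN];
static size_t defer_head;
static size_t defer_tail;

/* Toy buffer with a reference count and an attached user callback. */
struct sketch_buf {
	int ref;
	user_cb_t cb;
	void *user_data;
};

/*
 * Drop one reference. On the last unref the user callback is only queued,
 * never called, so this function does not run user code on the caller's
 * thread. Returns false if the toy queue is full.
 */
static bool sketch_buf_unref(struct sketch_buf *buf)
{
	assert(buf->ref > 0);

	if (--buf->ref > 0) {
		return true;
	}
	if (defer_tail - defer_head == DEFER_QUEUE_LEN) {
		return false;
	}
	defer_queue[defer_tail % DEFER_QUEUE_LEN] =
		(struct deferred_item){ buf->cb, buf->user_data };
	defer_tail++;
	return true;
}

/*
 * Models the system workqueue thread: user callbacks run here, where
 * blocking is tolerated. Returns the number of callbacks that ran.
 */
static size_t sketch_sysworkq_drain(void)
{
	size_t ran = 0;

	while (defer_head != defer_tail) {
		struct deferred_item item = defer_queue[defer_head % DEFER_QUEUE_LEN];

		defer_head++;
		item.cb(item.user_data);
		ran++;
	}
	return ran;
}

/* Example user callback: counts its invocations through user_data. */
static void sketch_count_cb(void *user_data)
{
	(*(int *)user_data)++;
}
```

In this model, a caller that stands in for tx_processor can drop the last reference without the callback running; the callback only runs once `sketch_sysworkq_drain()` is called from the context that is allowed to block.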

Reduce BT_MAX_CONN from 62 to 61 to make it build on the integration platform qemu_cortex_m3/ti_lm3s6965 when we add bt_taskq in a subsequent commit.

Signed-off-by: Aleksander Wasaznik <[email protected]>

Add a new workqueue bt_taskq specifically designed for quick non-blocking work items in the Bluetooth subsystem.

Signed-off-by: Aleksander Wasaznik <[email protected]>

It's not safe for the tx_processor to share the system workqueue with work items that block the thread until tx_processor runs. This is a deadlock. The Bluetooth Host itself performs these operations, usually involving bt_hci_cmd_alloc(), on the system workqueue.

This change effectively gives tx_processor its own thread, like the BT TX thread that used to exist. But this time, the thread is intended to be shared with any other non-blocking Bluetooth Host tasks.

Under the bt_taskq rules, tx_processor is supposed to be non-blocking and only have code under our control on the thread stack. Unfortunately, this is not entirely true currently. But we consider it close enough for now and will ensure it starts adhering to the rules in the future.

Examples of problems:

- The tx_processor invokes bt_hci_send(), driver code which has no rules limiting what it can do on our thread.
- The tx_processor invokes net_buf_unref() on stack-external net_bufs, which executes user code on our thread.

Signed-off-by: Aleksander Wasaznik <[email protected]>
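The deadlock argued in this commit message can be reproduced in a toy model. The sketch below is plain C and deliberately does not use any Zephyr API: `struct workq`, `workq_step()` and `run_all()` are hypothetical names, and each queue models one thread that can only make progress on its head item. An item of kind `ITEM_BLOCK_UNTIL_TX` stands in for work that blocks until tx_processor has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WORKQ_CAP 4

enum item_kind {
	ITEM_BLOCK_UNTIL_TX, /* blocks its thread until tx_processor has run */
	ITEM_TX_PROCESSOR,   /* never blocks */
};

/* One workqueue is one thread: only the head item can make progress. */
struct workq {
	enum item_kind items[WORKQ_CAP];
	size_t head;
	size_t tail;
};

static void workq_submit(struct workq *q, enum item_kind kind)
{
	assert(q->tail < WORKQ_CAP);
	q->items[q->tail++] = kind;
}

/* Advance the head item if it can run. Returns true on progress. */
static bool workq_step(struct workq *q, bool *tx_done)
{
	if (q->head == q->tail) {
		return false;
	}
	if (q->items[q->head] == ITEM_TX_PROCESSOR) {
		*tx_done = true;
	} else if (!*tx_done) {
		return false; /* thread is blocked; everything behind it waits */
	}
	q->head++;
	return true;
}

/* Run all queues until none can progress. Returns true if all drained. */
static bool run_all(struct workq *queues[], size_t count)
{
	bool tx_done = false;
	bool progress = true;

	while (progress) {
		progress = false;
		for (size_t i = 0; i < count; i++) {
			progress |= workq_step(queues[i], &tx_done);
		}
	}
	for (size_t i = 0; i < count; i++) {
		if (queues[i]->head != queues[i]->tail) {
			return false; /* stuck: this models the deadlock */
		}
	}
	return true;
}

/* Blocking item and tx_processor share one queue, blocker first. */
static bool shared_queue_completes(void)
{
	struct workq sysq = { .head = 0 };
	struct workq *queues[] = { &sysq };

	workq_submit(&sysq, ITEM_BLOCK_UNTIL_TX);
	workq_submit(&sysq, ITEM_TX_PROCESSOR);
	return run_all(queues, 1);
}

/* tx_processor runs on its own queue, as this change proposes. */
static bool dedicated_queue_completes(void)
{
	struct workq sysq = { .head = 0 };
	struct workq taskq = { .head = 0 };
	struct workq *queues[] = { &sysq, &taskq };

	workq_submit(&sysq, ITEM_BLOCK_UNTIL_TX);
	workq_submit(&taskq, ITEM_TX_PROCESSOR);
	return run_all(queues, 2);
}
```

In the shared case the blocking item sits at the head of the only queue and the tx_processor item behind it never runs, so `shared_queue_completes()` returns false. With a second queue the tx_processor item runs independently and both queues drain. The model ignores priorities and preemption, which a real analysis would have to consider.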

The workaround in bt_cmd_send_sync is no longer needed when tx_processor runs on a dedicated bt_taskq and not on the system workqueue. But for defensive programming, we keep the workaround in place and log a warning if it's triggered. If CONFIG_TEST is enabled, we panic instead.

Signed-off-by: Aleksander Wasaznik <[email protected]>
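The "warn in production, panic under test" policy from this commit message could look roughly like the following. This is a hedged sketch in plain C, not the code in bt_cmd_send_sync: `SKETCH_CONFIG_TEST` is a hypothetical stand-in for CONFIG_TEST, and `sketch_send_sync()` and `sketch_report_unexpected()` are illustrative names. What the real workaround does is left out on purpose, since it is not described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for CONFIG_TEST; build with 1 to abort instead of warn. */
#ifndef SKETCH_CONFIG_TEST
#define SKETCH_CONFIG_TEST 0
#endif

/* Number of warnings emitted so far (only grows in non-test builds). */
static unsigned int sketch_warning_count;

/* Report that a path believed to be unreachable was taken anyway. */
static void sketch_report_unexpected(const char *what)
{
	assert(what != NULL);
#if SKETCH_CONFIG_TEST
	/* Test builds fail loudly so the regression is caught early. */
	fprintf(stderr, "panic: %s\n", what);
	abort();
#else
	/* Production builds keep running and leave a trace in the log. */
	sketch_warning_count++;
	fprintf(stderr, "warning: %s\n", what);
#endif
}

/*
 * Toy synchronous send. `workaround_needed` models the condition that
 * used to require the workaround. Returns true if the workaround path
 * was taken.
 */
static bool sketch_send_sync(bool workaround_needed)
{
	if (workaround_needed) {
		sketch_report_unexpected("send_sync workaround triggered");
		/* The existing workaround would still run here (omitted). */
		return true;
	}
	return false;
}
```

Keeping the fallback while reporting its use means a wrong assumption about the calling context shows up as a log line or a failed test rather than as a silent behavior change.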

sonarqubecloud · 2025-10-14T11:37:31Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

alwa-nordic · 2025-10-14T13:52:40Z

Future work: This should make CONFIG_BT_RECV_WORKQ_SYS=y less problematic, since tx_processor is no longer blocked by blocking work on the system work queue. Should we update our opinion about its safety? Is it safer to enable BT_RECV_WORKQ_SYS or BT_TASKQ_SYSTEM_WORKQUEUE if you have to choose? The stack size for dedicated bt_workq is smaller.

PavelVPV

I don't think it is correct to run TX processor on a generic workqueue (even if it is Bluetooth Host specific). As soon as the generic workq is used for command sending, a deadlock will occur.

Also, currently there is no use-case for bt_taskq. I don't think introducing it now is right.

alwa-nordic · 2025-10-14T13:53:41Z

> As soon as the generic workq is used for command sending

That's not allowed.

alwa-nordic · 2025-10-14T13:55:13Z

> Also, currently there is no use-case for bt_taskq. I don't think introducing it now is right.

Running tx_processor?

PavelVPV · 2025-10-14T13:56:48Z

I mean, the way it is defined is how it can be used. It is quite hard in the code to track which API may eventually allocate or send command to Controller (which I guess is the main work for Host).

TX processor needs its own thread for now. This solves the exact problem. I don't see a problem that BT task solves currently.

alwa-nordic · 2025-10-14T14:20:45Z

It's not a generic work queue. The reason I define a bt_taskq is to establish that blocking on this thread is an error. That includes any blocking in tx_processor. Maybe we will find errors in tx_processor, but then we know we have to fix them.

This is in contrast to simply a thread dedicated for tx_processor. Then it's not an error to block in tx_processor.

We are concerned about RAM usage. We simply can't afford many threads. I want us to have a ready-to-use place for non-blocking tasks. It makes adding more tasks later easy and rewards this. Finding RAM for taskq was hard enough. Adding more threads later will be even harder. I really don't want tx_processor to need its own thread.

> It is quite hard in the code to track which API may eventually allocate or send command to Controller (which I guess is the main work for Host).

Yeah. Writing good code is hard. We will have to maintain discipline with contracts on the taskq.

> TX processor needs its own thread for now.

Does it? Why? Let's fix that!

> This solves the exact problem. I don't see a problem that BT task solves currently.

What is 'this'?

jhedberg · 2025-10-14T15:02:26Z

I need to put some time aside to do a proper review. However, my initial question is whether this is complementary to #93033 or an alternate approach to the same issue.

PavelVPV · 2025-10-14T15:37:08Z

> It's not a generic work queue. The reason I define a bt_taskq is to establish that blocking on this thread is an error. That includes any blocking in tx_processor. Maybe we will find errors in tx_processor, but then we know we have to fix them.

This will end up in a situation where a blocking call that allocates a command buffer doesn't block the thread 99% of the time, but blocks the remaining 1%, thus changing application behavior.

> This is in contrast to simply a thread dedicated for tx_processor. Then it's not an error to block in tx_processor.

This is fine. Still, the thread should be used exclusively for the tx processor for now. Later this can be changed, but right now nothing requires it.

> We are concerned about RAM usage. We simply can't afford many threads. I want us to have a ready-to-use place for non-blocking tasks. It makes adding more tasks later easy and rewards this. Finding RAM for taskq was hard enough. Adding more threads later will be even harder. I really don't want tx_processor to need its own thread.

It doesn't change anything. This is now a new thread used by the tx processor. Nothing else is using it. It is obvious that it will require memory. But now the thread analyzer needs to be run to check how much is freed on sysworkq.

> > It is quite hard in the code to track which API may eventually allocate or send command to Controller (which I guess is the main work for Host).
>
> Yeah. Writing good code is hard. We will have to maintain discipline with contracts on the taskq.

Sure, but how are you going to ensure this if even the bug that triggered this change has been hiding since the removal of the tx processor thread?

> > TX processor needs its own thread for now.
>
> Does it? Why? Let's fix that!

I mean, this entire task is driven by the deadlock, as we discussed in the team not long ago. I will remind you of the ticket in a PM.

> > This solves the exact problem. I don't see a problem that BT task solves currently.
>
> What is 'this'?

This -> a dedicated thread for tx processor.
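The concern above, that a blocking allocation only blocks in the rare case where the pool is empty, can be made concrete with a toy pool. This is a minimal sketch in plain C with hypothetical names (`sketch_pool`, `sketch_alloc()`, `sketch_free()`); it does not model bt_hci_cmd_alloc() itself, only the general shape of a "wait forever" allocation from a fixed-size pool.

```c
#include <assert.h>
#include <stddef.h>

/* Toy fixed-size pool standing in for a command buffer pool. */
struct sketch_pool {
	size_t free_count;
};

enum alloc_result {
	ALLOC_OK,          /* buffer available, caller did not wait */
	ALLOC_WOULD_BLOCK, /* pool empty: a "wait forever" call would sleep here */
};

/* Take one buffer if available, otherwise report that the caller would block. */
static enum alloc_result sketch_alloc(struct sketch_pool *pool)
{
	assert(pool != NULL);

	if (pool->free_count == 0) {
		return ALLOC_WOULD_BLOCK;
	}
	pool->free_count--;
	return ALLOC_OK;
}

/* Return one buffer to the pool. */
static void sketch_free(struct sketch_pool *pool)
{
	assert(pool != NULL);
	pool->free_count++;
}
```

As long as buffers are available the call returns immediately, so a caller on a non-blocking thread looks correct in most runs. Only when the pool is exhausted does the same call site start to block, which is why such violations are hard to find by testing alone.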

alwa-nordic force-pushed the bt-taskq branch 7 times, most recently from a9fc732 to 03c365b, October 10, 2025 16:56

alwa-nordic changed the title ~~Bt taskq~~ Bluetooth: Host: Move tx_processor to bt_taskq Oct 10, 2025

alwa-nordic force-pushed the bt-taskq branch 6 times, most recently from a16c556 to 38bd9d0, October 14, 2025 07:59

alwa-nordic added 5 commits October 14, 2025 13:13

Bluetooth: Samples: Reduce RAM requirement of peripheral_identity

439b079

Bluetooth: Host: Add bt_taskq workqueue for quick non-blocking tasks

39ff09e

alwa-nordic force-pushed the bt-taskq branch from 38bd9d0 to 93c2d8d, October 14, 2025 11:17

alwa-nordic marked this pull request as ready for review October 14, 2025 13:33

zephyrbot added the area: Bluetooth Host, area: Samples, and area: Bluetooth labels Oct 14, 2025

zephyrbot requested review from HaavardRei, JarmouniA, cvinayak and hermabe October 14, 2025 13:36

zephyrbot requested review from PavelVPV, Thalley, jhedberg, kartben, nashif, rugeGerritsen and sjanc October 14, 2025 13:36

zephyrbot assigned jhedberg and alwa-nordic Oct 14, 2025

PavelVPV requested changes Oct 14, 2025

alwa-nordic added the Bluetooth Review (discussion in the Bluetooth WG meeting required) label Oct 14, 2025
