Skip to content

feat!: replace Pyro4/ZMQ transport with gRPC#569

Merged
chombourger merged 9 commits intosiemens:nextfrom
chombourger:grpc
Apr 2, 2026
Merged

feat!: replace Pyro4/ZMQ transport with gRPC#569
chombourger merged 9 commits intosiemens:nextfrom
chombourger:grpc

Conversation

@chombourger
Copy link
Copy Markdown
Collaborator

Replace the three-port Pyro4+ZMQ architecture with a single-port gRPC service (default port 5556):

  • Add mtda/grpc/ package: mtda.proto defines all RPCs including a server-streaming Subscribe RPC (replaces ZMQ PUB/SUB console/event stream) and a client-streaming StorageWrite RPC (replaces ZMQ PUSH/PULL storage data path).
  • Add QueueDataStream in mtda/storage/datastream.py; remove dead NetworkDataStream and its zmq dependency.
  • Rewrite mtda/console/remote.py: RemoteConsole and RemoteMonitor now subscribe via gRPC instead of ZMQ SUB sockets; topic bytes normalisation ensures correct dispatch.
  • Fix ConsoleLogger._print(): always call publish() regardless of socket state so console data reaches subscriber queues.
  • Rewrite Client in mtda/client.py with _GrpcImpl and storage socket adapters; extract session-name generation into a static method (ASCII-only word list prevents invalid gRPC metadata); remove zmq.
  • Fix mtda-www WebConsole: normalise topic bytes, add reconnect loop, create reader tasks after mtda_start().
  • Remove python3-zmq from mtda-service in debian/control; grpcio already present in mtda-common covers all packages transitively.
  • Update docs: remove obsolete console/data port entries from [remote] config docs; correct default www port (9080 -> 5000).

BREAKING CHANGE: the console (5557) and data (5558) ports no longer exist; all traffic flows over the single control port (default 5556).

Closes: #524

@fmoessbauer
Copy link
Copy Markdown
Member

Wow. That's amazing. I'll give it a try and report back.

@chombourger
Copy link
Copy Markdown
Collaborator Author

Wow. That's amazing. I'll give it a try and report back.

mostly tested with mtda-kvm. I have not flashed on the Pi4 yet

@fmoessbauer
Copy link
Copy Markdown
Member

I just gave this a try on a rpi4 (but the results should apply to all targets):

  • the console sometimes does not handle special characters (like backspace) correctly (^H instead of backspace).
  • storage operations work flawlessly (including bmap file)
  • power on / works

@chombourger chombourger requested a review from fmoessbauer March 17, 2026 16:03
@chombourger chombourger force-pushed the grpc branch 2 times, most recently from bffd5b7 to a6f64aa Compare March 20, 2026 11:37
@chombourger
Copy link
Copy Markdown
Collaborator Author

@chombourger With the latest changes, I get the following error on startup

-- Boot 86d661890f3c4682a4906a5503c5033a --
Mar 20 08:51:09 systemd[1]: Starting mtda.service - Multi-Tenant Device Access...
Mar 20 08:51:12 mtda-service[535]: Traceback (most recent call last):
Mar 20 08:51:12 mtda-service[535]:   File "/usr/sbin/mtda-service", line 31, in <module>
Mar 20 08:51:12 mtda-service[535]:     from mtda.grpc import mtda_pb2_grpc
Mar 20 08:51:12 mtda-service[535]:   File "/usr/lib/python3.13/dist-packages/mtda/grpc/mtda_pb2_grpc.py", line 6, in <module>
Mar 20 08:51:12 mtda-service[535]:     from mtda.grpc import mtda_pb2 as mtda__pb2
Mar 20 08:51:12 mtda-service[535]:   File "/usr/lib/python3.13/dist-packages/mtda/grpc/mtda_pb2.py", line 9, in <module>
Mar 20 08:51:12 mtda-service[535]:     from google.protobuf import runtime_version as _runtime_version
Mar 20 08:51:12 mtda-service[535]: ImportError: cannot import name 'runtime_version' from 'google.protobuf' (/usr/lib/python3/dist-packages/google/protobuf/__init__.py)
Mar 20 08:51:12 systemd[1]: mtda.service: Main process exited, code=exited, status=1/FAILURE
Mar 20 08:51:12 systemd[1]: mtda.service: Failed with result 'exit-code'.

Update: Actually I'm wondering if this is a bug in the debian grpc package.

Have not yet seen that one!

@chombourger chombourger added check and removed check labels Mar 20, 2026
Signed-off-by: Cedric Hombourger <cedric.hombourger@siemens.com>
Signed-off-by: Cedric Hombourger <cedric.hombourger@siemens.com>
Condition.wait_for() re-acquires the underlying lock before returning
whether the predicate matched or the timeout expired. The previous code
only called rx_lock.release() on the True (match) branch, so a timeout
left rx_lock permanently held. Every subsequent console operation
(run, wait, head, tail, clear, dump) would then block forever, and the
process_rx() reader thread would also stall, making recovery impossible.

Wrap the wait_for() call in a try/finally so rx_lock is always released.

Signed-off-by: Cedric Hombourger <cedric.hombourger@siemens.com>
run() acquired rx_lock and then called rx_cond.wait_for() twice with no
timeout argument, meaning an unresponsive DUT would hold rx_lock forever.
Since process_rx() must also acquire rx_lock to deliver new data, no new
notifications could ever arrive to satisfy the condition — a
self-reinforcing deadlock with no escape.

Add a timeout parameter (default 30 s) to both wait_for() calls and
return None on expiry. Wrap the entire body in try/finally so rx_lock is
released even if write() or any other call raises an exception.

Signed-off-by: Cedric Hombourger <cedric.hombourger@siemens.com>
Use 'or' instead of 'and' to skip non-STORAGE-WRITING events.
With 'and', events like 'FOO WRITING' or 'STORAGE FOO' would
pass the guard and reach app.imgname. Add a None guard to
safely ignore events when no write operation is in progress.

Signed-off-by: Cedric Hombourger <cedric.hombourger@siemens.com>
imgname was set only inside storage_write/storage_update,
so console_rx threads receiving STORAGE WRITING events in
other contexts raised AttributeError. Initialize it to None
in __init__ to guarantee the attribute always exists.

Signed-off-by: Cedric Hombourger <cedric.hombourger@siemens.com>
endswith() fails when the received data contains extra
characters after the prompt (e.g. trailing spaces or CR).
Use 'in' to match the prompt anywhere in the received line.

Signed-off-by: Cedric Hombourger <cedric.hombourger@siemens.com>
@chombourger chombourger added check and removed check labels Mar 20, 2026
@fmoessbauer
Copy link
Copy Markdown
Member

fmoessbauer commented Mar 20, 2026

@chombourger With the latest changes, I get the following error on startup

-- Boot 86d661890f3c4682a4906a5503c5033a --
Mar 20 08:51:09 systemd[1]: Starting mtda.service - Multi-Tenant Device Access...
Mar 20 08:51:12 mtda-service[535]: Traceback (most recent call last):
Mar 20 08:51:12 mtda-service[535]:   File "/usr/sbin/mtda-service", line 31, in <module>
Mar 20 08:51:12 mtda-service[535]:     from mtda.grpc import mtda_pb2_grpc
Mar 20 08:51:12 mtda-service[535]:   File "/usr/lib/python3.13/dist-packages/mtda/grpc/mtda_pb2_grpc.py", line 6, in <module>
Mar 20 08:51:12 mtda-service[535]:     from mtda.grpc import mtda_pb2 as mtda__pb2
Mar 20 08:51:12 mtda-service[535]:   File "/usr/lib/python3.13/dist-packages/mtda/grpc/mtda_pb2.py", line 9, in <module>
Mar 20 08:51:12 mtda-service[535]:     from google.protobuf import runtime_version as _runtime_version
Mar 20 08:51:12 mtda-service[535]: ImportError: cannot import name 'runtime_version' from 'google.protobuf' (/usr/lib/python3/dist-packages/google/protobuf/__init__.py)
Mar 20 08:51:12 systemd[1]: mtda.service: Main process exited, code=exited, status=1/FAILURE
Mar 20 08:51:12 systemd[1]: mtda.service: Failed with result 'exit-code'.

Update: Actually I'm wondering if this is a bug in the debian grpc package.

Have not yet seen that one!

Here's a patch to cherry pick which solves this issue:

From 1afe6e297f010ec06cad720c237c06a00897a7b3 Mon Sep 17 00:00:00 2001
From: Felix Moessbauer <felix.moessbauer@siemens.com>
Date: Fri, 20 Mar 2026 14:37:44 +0100
Subject: [PATCH 1/1] chore(mtda-isar): delete auto-generated pb files

These files must be re-generated for the target set of protobuf
dependencies (the ones in the image, not on the build system).
By deleting them prior to the build, we ensure that they are
regenerated.

Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
---
 meta-isar/recipes-python/mtda/mtda_git.bb | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/meta-isar/recipes-python/mtda/mtda_git.bb b/meta-isar/recipes-python/mtda/mtda_git.bb
index dc0b4db2..31dd8644 100644
--- a/meta-isar/recipes-python/mtda/mtda_git.bb
+++ b/meta-isar/recipes-python/mtda/mtda_git.bb
@@ -62,6 +62,8 @@ do_gen_working_repo() {
 	for file in ${MTDA_FILES}; do
 		cp -a ${LAYERDIR_mtda}/../$file ${S}/
 	done
+	# delete auto-generated files as they need to be re-generated on building
+	find ${S} -name 'mtda_pb2*.py' -delete
 	rm -f ${S}/debian/source/format
 }
 do_gen_working_repo[cleandirs] += "${S}"
-- 
2.53.0

update it still does not reliably solve the issue because the files are still copied in. Let me rework the patch.

update 2 I updated the patch to make it more robust. Now it reliably works.

@chombourger
Copy link
Copy Markdown
Collaborator Author

  • the console sometimes does not handle special characters (like backspace) correctly (^H instead of backspace).

input is now sent as binary data and I am now making sure that mtda-cli consumes all available data from stdin and transmits it as a single console_send RPC call to avoid fragmenting e.g. escape sequences

@fmoessbauer
Copy link
Copy Markdown
Member

  • the console sometimes does not handle special characters (like backspace) correctly (^H instead of backspace).

input is now sent as binary data and I am now making sure that mtda-cli consumes all available data from stdin and transmits it as a single console_send RPC call to avoid fragmenting e.g. escape sequences

I tested this again and can confirm that the escape sequences are no longer split (iow, it works as expected).

Still, I need the patch from #569 (comment) as I run pip install -e from the mtda repo tree and build the image from the same tree. Please include that as well.

@chombourger
Copy link
Copy Markdown
Collaborator Author

  • the console sometimes does not handle special characters (like backspace) correctly (^H instead of backspace).

input is now sent as binary data and I am now making sure that mtda-cli consumes all available data from stdin and transmits it as a single console_send RPC call to avoid fragmenting e.g. escape sequences

I tested this again and can confirm that the escape sequences are no longer split (iow, it works as expected).

Still, I need the patch from #569 (comment) as I run pip install -e from the mtda repo tree and build the image from the same tree. Please include that as well.

sure. could you git send-email that patch so we do not loose your credits?

chombourger and others added 2 commits April 1, 2026 15:17
usb_ids(), usb_add(), and command() called .splitlines() on
the return value of _cmd() / cmd() without checking for None.
_cmd() returns None when QEMU fails to start. Add the same
guard that status() already applies.

Signed-off-by: Cedric Hombourger <cedric.hombourger@siemens.com>
These files must be re-generated for the target set of protobuf dependencies (the ones in the image, not on the build system).
By deleting them prior to the build, we ensure that they are regenerated.

Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
@chombourger chombourger added check and removed check labels Apr 1, 2026
@chombourger
Copy link
Copy Markdown
Collaborator Author

@fmoessbauer CI passed with your patch added. thanks! kindly approve the PR so we can merge it

@fmoessbauer
Copy link
Copy Markdown
Member

@fmoessbauer CI passed with your patch added. thanks! kindly approve the PR so we can merge it

I will just give it another try on my RPI4 and then approve.

@chombourger chombourger requested a review from fmoessbauer April 1, 2026 18:24
Copy link
Copy Markdown
Member

@fmoessbauer fmoessbauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to update cip to 185b390af944e095d803c82d434bd51e74c999fd to be able to build the rpi4-ebg variant. But after doing so the new version worked flawlessly.

Many thanks for implementing this. Having an gRPC interface makes things so much easier when running in a larger lab.

@chombourger chombourger merged commit d351437 into siemens:next Apr 2, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants