tacd: error propagation instead of unwraping part 1 #43

KarlK90 · 2023-09-28T10:04:39Z

Instead of unwrapping Results and potentially panicking, we replace the unwraps with proper error propagation that can be acted upon in the future, e.g. to return different exit codes for systemd or to display error messages.

Signed-off-by: Stefan Kerkmann <[email protected]>

hnez

Hi,

thanks for taking on the task to make tacd less panicky!

I've added some comments. Mostly around Result propagation inside of async_std tasks and std threads.
This got me thinking about how tasks in the tacd should be monitored better in general.

hnez · 2023-09-29T07:29:31Z

src/digital_io.rs


    let (mut src, _) = topic.clone().subscribe_unbounded();

    spawn(async move {
        while let Some(ev) = src.next().await {
-            dst.set_value((ev ^ inverted) as _).unwrap();
+            dst.set_value((ev ^ inverted) as _)?;


An error returned from the spawned task propagates nowhere and is thus silently ignored.
In my opinion that's worse than crashing the tacd.
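
To illustrate the concern, here is a minimal sketch. It uses std::thread instead of async_std tasks to stay dependency-free, and set_value(), spawn_and_forget() and spawn_and_watch() are hypothetical names, not tacd code. The point is only that a Result returned from a spawned task is lost unless somebody looks at the JoinHandle:

```rust
use std::thread;

// Hypothetical stand-in for `dst.set_value(...)`; it always fails here.
fn set_value() -> Result<(), String> {
    Err("could not set line value".to_string())
}

/// The handle is dropped, so the task's `Err` is never looked at.
pub fn spawn_and_forget() {
    let handle = thread::spawn(|| -> Result<(), String> {
        set_value()?;
        Ok(())
    });
    drop(handle); // detaches the thread; its return value is lost
}

/// The handle is joined, so the error reaches the caller.
pub fn spawn_and_watch() -> Result<(), String> {
    let handle = thread::spawn(|| -> Result<(), String> {
        set_value()?;
        Ok(())
    });
    // Outer error: the thread panicked. Inner Result: what the task returned.
    handle.join().map_err(|_| "task panicked".to_string())?
}
```

In the first variant the `?` compiles fine but changes nothing observable, which is exactly the silent failure described above.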

We could build a wrapped async_std::task::spawn() that works something like BrokerBuilder in that you pass it around everywhere you want to spawn a task during init and it adds the spawned task to an internal list.
The JoinHandles could then be watched for returned errors in main(), so we can break out of it if something happens.

The usage would be something like:

src/digital_io.rs

    …
    fn handle_line_wo(
        bb: &mut BrokerBuilder,
        watched: &mut WatchedTaskBuilder,
        …
    ) -> Result<Arc<Topic<bool>>> {
        …
        watched.spawn(async move {
            …
        });
    }
    …

src/main.rs

    …
    #[async_std::main]
    async fn main() -> Result<()> {
        let mut watched = WatchedTaskBuilder::new();
        …
        let dig_io = DigitalIo::new(&mut bb, &mut watched, led.out_0.clone(), led.out_1.clone())?;
        …
        watched.run().await
    }

That would also give us the opportunity to define tasks that should complete before marking the initialization as successful (i.e. when to notify systemd that we are ready, so that the RAUC slot can be marked good by rauc-mark-good.service), e.g. by spawning them via a special WatchedTaskBuilder::spawn_init() or the like.
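
A rough sketch of the bookkeeping part of this idea, under loud assumptions: it uses std threads rather than async_std tasks, plain String errors rather than anyhow, and the WatchedTasks type with its spawn(name, …) and run() signatures is made up for illustration. It is not the code from any branch.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the proposed `WatchedTaskBuilder`.
pub struct WatchedTasks {
    handles: Vec<(String, JoinHandle<Result<(), String>>)>,
}

impl WatchedTasks {
    pub fn new() -> Self {
        Self { handles: Vec::new() }
    }

    /// Spawn a task and remember its handle instead of dropping it.
    pub fn spawn<F>(&mut self, name: &str, task: F)
    where
        F: FnOnce() -> Result<(), String> + Send + 'static,
    {
        self.handles.push((name.to_string(), thread::spawn(task)));
    }

    /// Wait for the tasks in spawn order and return the first failure.
    pub fn run(self) -> Result<(), String> {
        for (name, handle) in self.handles {
            match handle.join() {
                Ok(Ok(())) => {}
                Ok(Err(e)) => return Err(format!("task {name} failed: {e}")),
                Err(_) => return Err(format!("task {name} panicked")),
            }
        }
        Ok(())
    }
}
```

Joining in spawn order is a simplification: a failure in a later task is only noticed once all earlier tasks have ended. An async version would presumably wait on all handles at once so that main() can bail out on the first error.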

I quite like the idea. Let me mock something up real quick …

hnez · 2023-09-29T07:47:04Z

src/dut_power.rs

    pwr_line: &LineHandle,
    discharge_line: &LineHandle,
    fail_state: &AtomicU8,
-) {
-    pwr_line.set_value(1 - PWR_LINE_ASSERTED).unwrap();
-    discharge_line.set_value(DISCHARGE_LINE_ASSERTED).unwrap();
+) -> Result<()> {
+    pwr_line.set_value(1 - PWR_LINE_ASSERTED)?;
+    discharge_line.set_value(DISCHARGE_LINE_ASSERTED)?;


If setting the outputs fails the tacd should exit with an error code as fast as possible, so the fail safes can kick in and disable the output for us.
I am not saying that the error-propagating code cannot ensure that. I just wanted to write this down as a kind of mental note to check that it does.
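
As a sketch of what "exit with an error code" could look like once errors reach main(): the TacdError enum and the concrete exit codes below are purely hypothetical and only illustrate mapping a propagated error to a process exit status via std::process::ExitCode.

```rust
use std::process::ExitCode;

/// Hypothetical error categories; tacd does not define this enum.
#[derive(Debug, PartialEq)]
pub enum TacdError {
    Gpio(String),
    Other(String),
}

/// Map the final result to a numeric exit status (values are assumptions).
pub fn exit_status(res: &Result<(), TacdError>) -> u8 {
    match res {
        Ok(()) => 0,
        Err(TacdError::Gpio(_)) => 2,
        Err(TacdError::Other(_)) => 1,
    }
}

/// Log the error and turn it into an `ExitCode` that `main()` can return.
pub fn to_exit_code(res: Result<(), TacdError>) -> ExitCode {
    if let Err(e) = &res {
        eprintln!("tacd: fatal: {e:?}");
    }
    ExitCode::from(exit_status(&res))
}
```

A main() with the signature `fn main() -> ExitCode` could then end with `to_exit_code(run())`, where run() is whatever function the errors are propagated to.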

hnez · 2023-09-29T08:04:11Z

src/dut_power.rs

-                        thread_res_tx
-                            .try_send(Ok((tick, request.clone(), state.clone())))
-                            .unwrap();
+                        thread_res_tx.try_send(Ok((tick, request.clone(), state.clone())))?;

                        (tick_weak, request, state)
                    }
                    Err(e) => {
-                        thread_res_tx.try_send(Err(e)).unwrap();
+                        thread_res_tx.try_send(Err(e))?;


The Result returned from the spawned thread (this time an actual thread and not an async-std task) does not go anywhere (which is part of the reason we have the thread_res_tx solution in the first place).
In this case the tacd would wait forever for a message on thread_res_rx until the systemd watchdog kills it.
The .unwrap() based implementation would have at least printed an error message as a clue to what went wrong.

PS: Now that I've had a second look I think dropping thread_res_tx would generate an event on thread_res_rx because it makes the Stream end, so we would at least get a "didn't receive thread result" message.

If you really want to get rid of this .unwrap() (for aesthetic reasons, as this .unwrap() should never hit) you could use async_std::task::spawn_blocking() instead of thread::spawn to spawn a thread whose return value you can receive in an async fashion, and use e.g. futures_lite::future::race() to check both the thread_res_rx channel and the thread's JoinHandle for an event. In this case we would only communicate a success via the channel, meaning it no longer has to carry a Result.

That could also simplify the error handling in the thread a bit as we could ? in other places in the initialization and maybe also in the steady state.
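
A dependency-free sketch of that split, assuming std::sync::mpsc and thread::spawn in place of async_std::task::spawn_blocking() and race(): the channel only carries the success value, and the error travels via the thread's return value. The function name start(), the fail_init switch and the u32 payload are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Start a worker thread and wait until it reports readiness or fails.
pub fn start(fail_init: bool) -> Result<u32, String> {
    // The channel no longer carries a Result, only the success value.
    let (ready_tx, ready_rx) = mpsc::channel::<u32>();

    let handle = thread::spawn(move || -> Result<(), String> {
        if fail_init {
            // `?`-style early return; dropping `ready_tx` wakes the receiver.
            return Err("could not set up GPIO lines".into());
        }
        ready_tx.send(42).map_err(|e| e.to_string())?;
        // ... the steady state loop would go here ...
        Ok(())
    });

    match ready_rx.recv() {
        Ok(state) => Ok(state),
        // The sender was dropped without a message: ask the thread why.
        Err(_) => match handle.join() {
            Ok(Err(e)) => Err(e),
            Ok(Ok(())) => Err("thread exited without reporting readiness".into()),
            Err(_) => Err("thread panicked".into()),
        },
    }
}
```

Unlike race(), this checks the two sources one after the other, which works here because dropping the sender ends the receive, as noted in the PS above.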

hnez · 2023-09-29T08:07:32Z

src/dut_power.rs

@@ -442,7 +441,7 @@ impl DutPwrThread {
                            &pwr_line,
                            &discharge_line,
                            &state,
-                        );
+                        )?;


Once again the Results from the thread do not go anywhere and are just silently dropped.
If the WatchedTaskBuilder (the name is still up for discussion) idea manifests we could just add the thread to the tasks to watch.

hnez · 2023-09-29T08:18:01Z

src/dut_power.rs

-                            discharge_line
-                                .set_value(1 - DISCHARGE_LINE_ASSERTED)
-                                .unwrap();
-                            pwr_line.set_value(PWR_LINE_ASSERTED).unwrap();
+                            discharge_line.set_value(1 - DISCHARGE_LINE_ASSERTED)?;
+                            pwr_line.set_value(PWR_LINE_ASSERTED)?;
                            state.store(OutputState::On as u8, Ordering::Relaxed);
                        }
                        OutputRequest::Off => {
-                            discharge_line.set_value(DISCHARGE_LINE_ASSERTED).unwrap();
-                            pwr_line.set_value(1 - PWR_LINE_ASSERTED).unwrap();
+                            discharge_line.set_value(DISCHARGE_LINE_ASSERTED)?;
+                            pwr_line.set_value(1 - PWR_LINE_ASSERTED)?;
                            state.store(OutputState::Off as u8, Ordering::Relaxed);
                        }
                        OutputRequest::OffFloating => {
-                            discharge_line
-                                .set_value(1 - DISCHARGE_LINE_ASSERTED)
-                                .unwrap();
-                            pwr_line.set_value(1 - PWR_LINE_ASSERTED).unwrap();
+                            discharge_line.set_value(1 - DISCHARGE_LINE_ASSERTED)?;
+                            pwr_line.set_value(1 - PWR_LINE_ASSERTED)?;


Same as above

hnez · 2023-09-29T08:36:05Z

src/led.rs

-                let max = led.max_brightness().unwrap();
+                let max = led.max_brightness()?;


Once again an error that propagates nowhere.

hnez · 2023-09-29T08:38:54Z

src/led.rs

        });
    }

-    topic
+    Ok(topic)


handle_color() itself is still infallible. Errors from the spawned task do not propagate out.
No need for the -> Result<…> and Ok(topic).

hnez · 2023-09-29T08:46:56Z

src/measurement.rs

-        SystemTime::now().checked_sub(age).unwrap()
+        SystemTime::now()
+            .checked_sub(age)
+            .with_context(|| "couldn't get system time")


Getting the system time is infallible. What is actually checked here is whether SystemTime::now() - age can be expressed as a SystemTime (SystemTime::checked_sub()).
I don't quite know why I did not just use SystemTime::now() - age here, as SystemTime actually implements Sub and this failing would mean something really strange is happening.
Maybe just do that to get rid of this .unwrap().
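
For comparison, a small std-only sketch of the checked variant with an error message that names the actual failure. The helper timestamp_before() and the use of String as error type (instead of anyhow's with_context()) are assumptions for illustration:

```rust
use std::time::{Duration, SystemTime};

/// Return the point in time `age` before `now`, or an error if that
/// point cannot be represented as a `SystemTime`.
pub fn timestamp_before(now: SystemTime, age: Duration) -> Result<SystemTime, String> {
    now.checked_sub(age)
        .ok_or_else(|| format!("cannot represent a time {age:?} before {now:?}"))
}
```

The plain `now - age` form would panic in the same situation where checked_sub() returns None, so it is effectively the .unwrap() without the extra call.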

hnez · 2023-09-29T09:01:07Z

src/measurement.rs

+        match {
+            || {
+                let time = self.in_system_time()?;
+                let timestamp = time.duration_since(SystemTime::UNIX_EPOCH)?;
+                let js_timestamp = 1000.0 * timestamp.as_secs_f64();
+                anyhow::Ok(js_timestamp)
+            }
+        }() {


Well that took me a bit to figure out. We should find a solution my brain can parse in a matter of seconds, not minutes.

How about adding a fn as_js_timestamp(&self) -> Result<f64> method to Timestamp and using it here?
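
Roughly what I have in mind, as a sketch on a simplified, hypothetical Timestamp that just wraps a SystemTime (the real type goes through in_system_time() first) and returns std's SystemTimeError instead of an anyhow Result:

```rust
use std::time::{SystemTime, SystemTimeError};

/// Simplified stand-in for tacd's `Timestamp` (assumption: wraps a SystemTime).
pub struct Timestamp(pub SystemTime);

impl Timestamp {
    /// Milliseconds since the Unix epoch as a float, like JavaScript's Date.now().
    pub fn as_js_timestamp(&self) -> Result<f64, SystemTimeError> {
        // Fails if the timestamp lies before the Unix epoch.
        let since_epoch = self.0.duration_since(SystemTime::UNIX_EPOCH)?;
        Ok(1000.0 * since_epoch.as_secs_f64())
    }
}
```

The serializer would then only need a single match on self.as_js_timestamp() instead of the immediately invoked closure.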

hnez · 2023-09-29T09:04:37Z

src/measurement.rs

-        let _js_timestamp = f64::deserialize(deserializer)?;
-        // We need both Serialize and Deserialize for Topics, even when they
-        // are never deserialized in practice like Timestamps.
-        unimplemented!();
+        use serde::de::Error;
+        Err(Error::custom("unused implementation"))


This reminds me that I always wanted to change the BrokerBuilder so that you do not have to implement Serialize for write-only Topics (i.e. Topics that can only be written from the HTTP API but never read) or Deserialize for read-only Topics, but I did not get around to it yet.

Until then this is the nicer workaround.

hnez · 2023-09-29T15:09:53Z

Hi,

I've hacked together a prototype that keeps track of running tasks and should propagate errors down the line. I did not do any testing yet though, and as of now the cargo tests do not compile, but I think it's a good start.

Have a look at my watched-tasks branch.

hnez · 2023-10-02T14:28:09Z

I've now cleaned up my work in watched-tasks and created a pull request #47.
It would be cool to hear what you think about it.

hnez · 2023-11-23T12:19:15Z

I'd suggest rebasing this once #48 is merged, because it should pave the way to make some of these changes actually work.
I'll assign @KarlK90 to keep track of this.

KarlK90 force-pushed the topic/error-handling/all-the-unwraps-part1 branch 2 times, most recently from e4495ac to c0391af Compare September 28, 2023 10:10

KarlK90 force-pushed the topic/error-handling/all-the-unwraps-part1 branch from c0391af to a0cb336 Compare September 28, 2023 10:51

hnez requested changes Sep 29, 2023

hnez mentioned this pull request Oct 2, 2023

watched_tasks: maintain a list of spawned async tasks and propagate errors #47

Merged

hnez assigned KarlK90 Nov 23, 2023
