- Add field `is_retried` to Node model to detect retried jobs
- Add pipeline service `job_retry.py` to listen to node events with `node.state=done` and `result in ("fail", "incomplete")`
- Publish cloud event for scheduler (maybe introduce a new pubsub channel `retry`?)
- Decide frequency of retries and retry job based on `is_retried` flag
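A minimal sketch of the filtering logic `job_retry.py` could apply to each node event. It assumes a simplified, hypothetical `Node` dataclass (not the real model) and a hypothetical event payload for the proposed `retry` channel. It does not call any real pipeline or pubsub API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified stand-in for the Node model (the real model has more fields).
@dataclass
class Node:
    id: str
    state: str
    result: Optional[str]
    is_retried: bool = False  # proposed new field

RETRY_RESULTS = ("fail", "incomplete")


def should_retry(node: Node) -> bool:
    """True if the node finished with a failing result and is not itself a retry."""
    return (
        node.state == "done"
        and node.result in RETRY_RESULTS
        and not node.is_retried
    )


def handle_node_event(node: Node) -> Optional[dict]:
    """Return a hypothetical payload for the scheduler, or None if no retry is needed."""
    if not should_retry(node):
        return None
    # Assumption: the scheduler creates the new job node with is_retried=True,
    # so the retried job is never retried again.
    return {"channel": "retry", "node_id": node.id, "is_retried": True}
```

A boolean `is_retried` caps retries at one per job. Allowing more than one retry would need a counter (for example a retry count compared against a limit) in place of the flag.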