Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

downloader improvements during outages #2033

Merged
merged 6 commits into from
Apr 22, 2021
Merged

Conversation

eriknordmark
Copy link
Contributor

@eriknordmark eriknordmark commented Apr 20, 2021

If we have 100% packet loss on one interface, then we still try to use it for every POST to zedcloud. Since we are sending info messages for the device, apps, volumes, content-trees, and blobs, it means that EVE can be far behind in terms of reporting things to zedcloud. The first commit addresses that by using the test status from nim.

As we retry the state transitions seen by the controller are confusing, since the error is cleared when a retry starts. With these changes we don't clear it until the retry has succedded. Also, the log messages include that 1) the operation will be retried in N seconds, and 2) "retry N" has started

Each commit can be reviewed separately.

Copy link
Contributor

@rvs rvs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooph -- this is a bit complex -- need more time to properly review this.

In the meantime -- let it test!

Copy link
Contributor

@gkodali-zededa gkodali-zededa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@giggsoff
Copy link
Contributor

giggsoff commented Apr 21, 2021

Interfaces don't seem to return from an error state during verify. I add modifications for adam: lf-edge/adam#88 and apply diff to your PR and it works fine:

diff --git a/pkg/pillar/zedcloud/send.go b/pkg/pillar/zedcloud/send.go
index db5d05d3a..9c45436d4 100644
--- a/pkg/pillar/zedcloud/send.go
+++ b/pkg/pillar/zedcloud/send.go
@@ -196,10 +196,17 @@ func VerifyAllIntf(ctx *ZedCloudContext,
                case types.SenderStatusRefused, types.SenderStatusCertInvalid:
                        remoteTemporaryFailure = true
                }
-               if resp != nil &&
-                       (resp.StatusCode >= http.StatusInternalServerError &&
-                               resp.StatusCode <= http.StatusNetworkAuthenticationRequired) {
-                       remoteTemporaryFailure = true
+               if resp != nil {
+                       if resp.StatusCode >= http.StatusInternalServerError &&
+                               resp.StatusCode <= http.StatusNetworkAuthenticationRequired {
+                               remoteTemporaryFailure = true
+                       }
+                       if resp.StatusCode == http.StatusForbidden {
+                               log.Tracef("VerifyAllIntf: Zedcloud reachable via interface %s", intf)
+                               intfStatusMap.RecordSuccess(intf)
+                               intfSuccessCount++
+                               continue
+                       }
                }
                if err != nil {
                        log.Errorf("Zedcloud un-reachable via interface %s: %s",

@eriknordmark
Copy link
Contributor Author

Interfaces don't seem to return from an error state during verify. I add modifications for adam: lf-edge/adam#88 and apply diff to your PR and it works fine:

Thanks for explaining this.
My PR tried to optimize it so that when there is a choice, skip the failed interfaces.
But since nim only updates the failed/succeeded every 5 minutes and as you discovered, we can potentially have false notions of failed during the onboarding/registration, I need to have the code in send.go try the failed ones if there are no other choices. So added a commit for that and let's see if it passes.

Copy link
Contributor

@rvs rvs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kicking off Eden

@eriknordmark
Copy link
Contributor Author

@rvs do you still need more time to review?

@rvs
Copy link
Contributor

rvs commented Apr 22, 2021

LGTM! Thanks for being patient @eriknordmark !

Lets merge this

@eriknordmark eriknordmark merged commit 47ef16d into lf-edge:master Apr 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants