Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle sendto() function failure conditions #333

Merged

Conversation

pranit-sirsat-imgtec
Copy link
Contributor

The sendto() function can fail because of multiple conditions.
Currently we are handling few conditions. If there is an error due
to another condition, the process gets stuck in repeatedly trying
to call sendto() function and failing with the same error condition.
For example, when awa session is established and after that
network interface goes down(ex. ethernet cable is removed),
the sendto() fuction fails with error 'ENETUNREACH'. This error
is not handled and the process gets stuck in repeatedly calling
sendto() fuction. There are also other possible error conditions.
In this change handling all error conditions, so that process
does not get stuck in endless loop.

Ref: AWA-311
Signed-off-by: Pranit Tanaji Sirsat [email protected]

The sendto() function can fail because of  multiple conditions.
Currently we are handling few conditions. If there is an error due
to another condition, the process gets stuck in repeatedly trying
to call sendto() function and failing with the same error condition.
For example, when awa session is established and after that
network interface goes down(ex. ethernet cable is removed),
the sendto() fuction fails with error 'ENETUNREACH'. This error
is not handled and the process gets stuck in repeatedly calling
sendto() fuction. There are also other possible error conditions.
In this change handling all error conditions, so that process
does not get stuck in endless loop.

Ref: AWA-311
Signed-off-by: Pranit Tanaji Sirsat <[email protected]>
@boyvinall
Copy link
Contributor

Ref: AWA-311

@pranit-sirsat-imgtec do you mean #311 ?

@boyvinall boyvinall added this to the 0.2.5 milestone Jan 20, 2017
@boyvinall boyvinall added the bug label Jan 20, 2017
@pranit-sirsat-imgtec
Copy link
Contributor Author

@pranit-sirsat-imgtec do you mean #311 ?

yes, i added that in commit message.
i am using the github for first time so don't know if something has gone wrong

@boyvinall
Copy link
Contributor

@pranit-sirsat-imgtec No problem. To refer to github issues or pull requests, take a look at https://help.github.com/articles/autolinked-references-and-urls/. Always worth checking the "preview" tab when you're writing comments too, at least until you get used to it.

@cheekyhalf cheekyhalf merged commit e9e2c84 into ConnectivityFoundry:master Jan 20, 2017
@datachi7d
Copy link
Collaborator

@boyvinall - this looks like it removes error checking. Can you review this again please?

@datachi7d
Copy link
Collaborator

@pranit-sirsat-imgtec is it possible to write some tests that show the issue without your fix? This will help us understand how your changes fix the issue you saw.

@pranit-sirsat-imgtec
Copy link
Contributor Author

Hi @datachi7d, thank you for your comment.

can you please explain how it removes the error checking ?

the if else condition was previously checking for few possible errors on which sendto() could not have recovered and returning NetworkSocketError_SendError. for other error conditions, considering those conditions will recover, it was continuing in while loop and calling sendto(). Now if the error condition is unrecoverable then the process is stuck in while loop and useless after that. one example as wrote in commit message was of ENETUNREACH.

now in the new code, as per my understanding EWOULDBLOCK and EINTR are immediately recoverable so remaining in while loop, otherwise for other conditions it will return NetworkSocketError_SendError. this way it will never get stuck in while loop. in case the error was recoverable, the NetworkSocket_Send() function will get called again after some time with same coap data (at least i had checked about coap_transactions)

list of possible errors http://pubs.opengroup.org/onlinepubs/009695399/functions/sendto.html

@pranit-sirsat-imgtec
Copy link
Contributor Author

as we use the awa_clientd daemon
to reproduce the issue,

1] get awa_clientd daemon connected with device server
2] call awa-client-get on command line, should work
3] down the interface, ifconfig down ......
4] wait for few second i think the poll duration in main loop (some 3-4 sec.)
5] call awa-client-get on command line, should fail, with timeout error

for test case, if i get time i will take look at other tests, but for this we will need to up/down interface. do you have fixed name for interfaces on test device ? will that affect the CI system ?

@boyvinall
Copy link
Contributor

@datachi7d I don't think this does remove error checks? Previously, the code was testing a bunch of lastError values but handling them all the same way. There were potentially some values of lastError that were not tested for. The patch has reduced identical handling into a single else which also catches any other values of lastError .. so actually it's adding error handling?

@boyvinall
Copy link
Contributor

@datachi7d However, the point about a test case to catch this in future is a good one. @pranit-sirsat-imgtec we would welcome that if you get the chance, thanks.

@datachi7d
Copy link
Collaborator

@boyvinall my mistake - I was confused by the title combined with the removal of code (I was expecting something to be added). So is there a possibility that some of the returned values caught by the else are not errors? For example EAGAIN.

@datachi7d
Copy link
Collaborator

@pranit-sirsat-imgtec yes this looks tricky to test. Having a mechanism to test this scenario would help with creating transparency of what's happening with the network stack (#319 or #313) as it would allow testing of conditions that lead to network related errors. Using a tun/tap device to create a pseudo interface that can be managed from the CI tests (ifconfig tun0 down) could be a place to start.

@DavidAntliff do we run our CI jobs in docker containers? If the tests aren't in a container getting permission to create the tun/tap device means sudo-ing the tests (very bad).

@datachi7d
Copy link
Collaborator

@DavidAntliff looks like the Jenkins CI jobs are in a container but what about the Travis job?

@DavidAntliff
Copy link
Collaborator

@datachi7d yes, the tests run in docker containers. I don't recall there being any issue with using tun/tap inside them, but I can't be sure. Travis uses Docker too, but implicitly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants