Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Ensure that all autoevent resource scheduler gets invoked properly during every boot of the target machine #445

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

TuhinDasgupta-eaton
Copy link

@TuhinDasgupta-eaton TuhinDasgupta-eaton commented Feb 28, 2023

Autoevents resources are getting triggered properly during first boot of the target. But, we are facing a flaky EdgeX service issue where all the device autoevents doesn't get triggered when we reboot the target. It has been found that whenever our sdk tries to add device to metadata through edgex_add_device API, there is a device existing check inside that and if the device exists already, then an API edgex_device_release gets called which further calls edgex_device_autoevent_stop. If suppose the respective device autoevents are already invoked and running, it stops all autoevents and further no autoevent gets called ever. Due to this we are getting 502 error while acquiring autoevent resources value from edgex developer's UI.

This change would ensure that all the device service autoevents gets triggered properly during every boot...

Signed-off-by: Tuhin Dasgupta [email protected]

If your build fails due to your commit message not passing the build checks, please review the guidelines here: https://github.com/edgexfoundry/device-sdk-c/blob/main/.github/Contributing.md

PR Checklist

Please check if your PR fulfills the following requirements:

  • I am not introducing a breaking change (if you are, flag in conventional commit message with BREAKING CHANGE: describing the break)
  • I am not introducing a new dependency (add notes below if you are)
  • I have added unit tests for the new feature or bug fix (if not, why?)
  • I have fully tested (add details below) this the new feature or bug fix (if not, why?)
  • I have opened a PR for the related docs change (if not, why?)

Testing Instructions

New Dependency Instructions (If applicable)

@TuhinDasgupta-eaton TuhinDasgupta-eaton marked this pull request as draft February 28, 2023 14:49
@@ -28,6 +28,12 @@ typedef struct devsdk_nvpairs
struct devsdk_nvpairs *next;
} devsdk_nvpairs;

typedef struct devsdk_new_device
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of defining a new structure to hold a list of device names, use a string map:
iot_data_t *devices;
Initialisation (create empty map):
devices = iot_data_alloc_map (IOT_DATA_STRING);
Add entry:
iot_data_map_add (devices, iot_data_alloc_string (name, IOT_DATA_COPY), iot_data_alloc_null ());
Test for existing name:
if (iot_data_string_map_get (devices, name))
Clean up
iot_data_free (devices);

Also, store the map as a member of the devmap or service structs rather than a global

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @iain-anderson for this suggestion... This shall be more efficient to be used...

result = search_devsdk_device(device_first, dup->name);
if (result)
{
edgex_device_autoevent_start (map->svc, dup);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what effect this is intended to have - looks as though autoevents will only be started for devices added via edgex_add_device?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what effect this is intended to have - looks as though autoevents will only be started for devices added via edgex_add_device?

@iain-anderson Thanks for the review... I am putting my observation and analysis below point wise:

The autoevents gets invoked in 2 ways for 2 different scenarios:

1. When we boot target for the very first time:
In this scenario the device profile and default device interfaces (viz. eth0 in this case) gets added to metadata for the very first time.
After the successful addition of the device the autoevents gets scheduled and invoked from the API's in the below order:

edgex_devmap_populate_devices---->add_locked---->edgex_device_autoevent_start---->ae_runner

2. When we reboot the target or poweroff and start the target:
In this scenario the device profile and default device interfaces (viz. eth0 in this case) is already in the entry and hence when sdk tries
to add the device in metadata, it goes for a check initially where it is found that device already exits in the entry and hence the device
profile and device just needs to be updated in the sdk in order to invoke autoevents with the help of device profile update callback.

Issue in target machine:
It runs fine when target gets booted for the very first time. But after reboot a flaky issue is found. Only 1st autoevent is invoked and rest
all of them are not appearing. Hence, in this case when we try to acquire value of rest of the autoevent resources from edgex developer's UI
we get 502 error.

Root cause of the issue:
After reboot when service starts, sometimes the below order of API's gets triggered first before the device exist check while device addition.

edgex_devmap_populate_devices---->add_locked--->edgex_device_autoevent_start---->ae_runner

Since the device is already in metadata even before the existing check, the above API order tries to invoke the autoevents for the device with schedulers.
The ae_runner works in schedulers running parallely. Meanwhile there is an attempt for adding device where it is found under the check that
device already exists. Under the same check there is an API edgex_device_release call which has call for edgex_device_autoevent_stop inside
which actually stops and deletes the remaining autoevent schedulers which are yet to be run. Once stopped further no autoevents gets triggered ever.
In this scenario no device profile update callback gets invoked which also triggers the autoevents.

Possible fix:
We can make a data structure which would contain the list of only newly added device. We will do a search operation on the data structure just before edgex_device_autoevent_start
during the API order call as mentioned above earlier. This will make sure that autoevents gets invoked with this API order call only for newly added device.

edgex_devmap_populate_devices---->add_locked---->search_devsdk_device--->{if search for a new device successful}---->edgex_device_autoevent_start---->ae_runner

For devices already in the metadata entry, the autoevents will be automatically scheduled and invoked by the device profile update callback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants