prov/verbs: Reload the list of verbs devices on each call to fi_getinfo() #10881

@sydidelot

Is your feature request related to a problem? Please describe.
The verbs provider loads the list of verbs devices available on the node only once, on the first call to fi_getinfo(). If a network device is added (hot-plugged) after this initial call to fi_getinfo(), it won't be visible in libfabric.
A subsidiary problem is that fi_getinfo() only returns network adapters with active links:
[`if (!ifa->ifa_addr || !(ifa->ifa_flags & IFF_UP) ||`](https://github.com/ofiwg/libfabric/blob/main/prov/verbs/src/verbs_info.c#L1152)
If the link is initially inactive and becomes active after the first call to fi_getinfo(), this interface will not be visible in libfabric.

This is particularly a problem for long-running services, where restarting a process to discover newly added network devices is not an option.

Describe the solution you'd like
I would like the list of verbs devices to be reloaded on each call to fi_getinfo(), so that hot-plugged devices and interfaces whose links become active later are picked up.

Describe alternatives you've considered
Restarting the process is the only workaround I know of.

Additional context
