Description
Describe your problem
A clear and concise description of what your problem is. It might be a bug,
a feature request, or just a problem that needs support from the vineyard team.
When RDMAClient::Connect() calls fi_connect and then fi_eq_sread, it takes about 18.5 s before Status RDMAServer::GetEvent(VineyardEventEntry& vineyard_entry) receives the message. That is far too long.
By the way, how can I set the client's reconnect interval? Thanks.
Status RDMAClient::Connect() {
  CHECK_ERROR(!fi_connect(ep, fi->dest_addr, NULL, 0), "fi_connect failed.");
  fi_eq_cm_entry entry;
  uint32_t event;
  CHECK_ERROR(
      fi_eq_sread(eq, &event, &entry, sizeof(entry), -1, 0) == sizeof(entry),
      "fi_eq_sread failed.");
  if (event != FI_CONNECTED || entry.fid != &ep->fid) {
    return Status::Invalid("Unexpected event:" + std::to_string(event));
  }
  return Status::OK();
}
Status RDMAServer::GetEvent(VineyardEventEntry& vineyard_entry) {
  struct fi_eq_cm_entry entry;
  uint32_t event;
  while (true) {
    int rd = fi_eq_sread(eq, &event, &entry, sizeof entry, 500, 0);
    if (rd < 0 && (rd != -FI_ETIMEDOUT && rd != -FI_EAGAIN)) {
      return Status::IOError("fi_eq_sread broken. ret:" + std::to_string(rd));
    }
    if (rd == -FI_ETIMEDOUT || rd == -FI_EAGAIN) {
      if (state == STOPED) {
        return Status::Invalid("Server is stoped.");
      }
      continue;
    }
    if (event == FI_SHUTDOWN) {
      fid_ep* closed_ep = container_of(entry.fid, fid_ep, fid);
      RemoveClient(closed_ep);
      continue;
    }
    vineyard_entry.fi = entry.info;
    vineyard_entry.event_id = event;
    vineyard_entry.fid = entry.fid;
    return Status::OK();
  }
}
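To make the reconnect-interval question concrete, here is a minimal sketch of the kind of control I am asking about. It is an assumption, not an existing vineyard or libfabric API: `RetryWithInterval` is a hypothetical helper that retries a callable with a fixed pause between attempts and logs how long each attempt took, which would also help show whether the 18.5 s is spent inside a single Connect() call.

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

// Hypothetical helper, not part of vineyard or libfabric.
// Calls `attempt` up to `max_attempts` times, sleeps `interval` between
// failed attempts, and logs the wall-clock duration of every attempt.
inline bool RetryWithInterval(const std::function<bool()>& attempt,
                              int max_attempts,
                              std::chrono::milliseconds interval) {
  using Clock = std::chrono::steady_clock;
  for (int i = 1; i <= max_attempts; ++i) {
    const auto start = Clock::now();
    const bool ok = attempt();
    const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        Clock::now() - start);
    std::cerr << "attempt " << i << (ok ? " succeeded" : " failed")
              << " after " << elapsed.count() << " ms\n";
    if (ok) {
      return true;
    }
    if (i < max_attempts) {
      std::this_thread::sleep_for(interval);  // the "reconnect interval"
    }
  }
  return false;
}
```

With the code above it might be used as `RetryWithInterval([&] { return client.Connect().ok(); }, 5, std::chrono::milliseconds(200));`, assuming `Status` exposes `ok()`. Note that Connect() passes a timeout of -1 to fi_eq_sread, so a single attempt blocks until an event arrives; the interval here only applies between attempts.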
If it is a bug report, to help us reproduce this bug, please provide the information below:
- Your Operating System version (uname -a):
- The version of vineyard you use (vineyard.__version__):
- Versions of crucial packages, such as gcc, numpy, pandas, etc.:
- Full stack of the error (if there is a crash):
- Minimized code to reproduce the error:
If it is a feature request, please provide a clear and concise description of what you want to happen:
What is the problem:
The behaviour that you expect to work:
Additional context
Add any other context about the problem here.