Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

configurable timeout for ovsdb connection #2790

Closed
wants to merge 1 commit into from

Conversation

jxiaobin
Copy link

@jxiaobin jxiaobin commented Feb 2, 2022

currently ovsdb connection timeout is hardcoded to 20s, it might
not be sufficient when running large scale test.

this commit makes the timeout value configurable during startup
by specifying arg --ovsdb-connect-timeout, e.g.

--ovsdb-connect-timeout 15s

scripts and spec templates were also updated, so that the timeout
value can be set through daemonset.sh, e.g.

./daemonset.sh --ovsdb-connect-timeout=25s

Signed-off-by: xqu [email protected]

currently ovsdb connection timeout is hardcoded to 20s, it might
not be sufficient when running large scale test.

this commit makes the timeout value configurable during startup
by specifying arg `--ovsdb-connect-timeout`, e.g.

`--ovsdb-connect-timeout 15s`

scripts and spec templates were also updated, so that the timeout
value can be set through `daemonset.sh`, e.g.

`./daemonset.sh --ovsdb-connect-timeout=25s`

Signed-off-by: xqu <[email protected]>
MonitorAll: true,
LFlowCacheEnable: true,
RawClusterSubnets: "10.128.0.0/14/23",
OVSDBConnectTimeout: types.OVSDBTimeout,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like some spaces and orders were changed. I would avoid changing anything than what the patch intend to do. It helps the review.

Copy link
Author

@jxiaobin jxiaobin Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dougsland, only some spaces were added by gofmt, because the new field OVSDBConnectTimeout is longer than the rest ones, gofmt added spaces for all the fields.
I agree with you, but for this case, if I want to avoid the space change, I need either shorten the field name to make it less meaningful, or break gofmt convention, adding spaces seem more acceptable to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me it's fine. I just noticed a bunch of spaces added :) lgtm

@dcbw
Copy link
Contributor

dcbw commented Feb 5, 2022

@jxiaobin do you see this on the initial connection, or on reconnections?

The reason I ask is that we know reconnections are sub-optimal because libovsdb doesn't yet use the LastTransactionID feature of MonitorCondSince. That's being worked on in ovn-org/libovsdb#283 and will very likely solve the problem for reconnect.

When I did scale tests, the majority of the time in reconnects is taken by parsing and processing the JSON received from the DB, and that will be greatly reduced by the libovsdb PR above.

So hopefully when that PR lands, we won't need a configurable timeout?

@hzhou8
Copy link
Contributor

hzhou8 commented Feb 7, 2022

@jxiaobin do you see this on the initial connection, or on reconnections?

The reason I ask is that we know reconnections are sub-optimal because libovsdb doesn't yet use the LastTransactionID feature of MonitorCondSince. That's being worked on in ovn-org/libovsdb#283 and will very likely solve the problem for reconnect.

When I did scale tests, the majority of the time in reconnects is taken by parsing and processing the JSON received from the DB, and that will be greatly reduced by the libovsdb PR above.

So hopefully when that PR lands, we won't need a configurable timeout?

@dcbw it was seen mostly during reconnection but it is also possible that the initial connection could take a long time, right? I agree with you that MonitorCondSince would help reconnection fast in most cases, but there are still exceptions, e.g. when transaction history on the server happen to be discarded before reconnecting (if the transaction rate is very high), and it would fall back to download the whole data. So I think it may still be good to have it configurable instead of hardcoded, and of course make sure the default value is good enough for average use cases. What do you think?

@girishmg
Copy link
Contributor

girishmg commented Feb 8, 2022

@jxiaobin @hzhou8 I am confused with ths PR..

  1. We now have two timeouts types.OVSDBTimeout (default value is 10s) and config.Defualt.OVSDBConnectTimeout (also 10s)
  2. We used to use 20s as timeout here, but now it has become 10s
 const connectTimeout time.Duration = types.OVSDBTimeout * 2. <====== used to be 20s .. but now 10s?
 logger := klogr.New()
 options := []client.Option{
 	// Reading and parsing the DB after reconnect at scale can (unsurprisingly)
 	// take longer than a normal ovsdb operation. Give it a bit more time so
 	// we don't time out and enter a reconnect loop.
 	client.WithReconnect(connectTimeout, &backoff.ZeroBackOff{}),
  1. I see that we set the Context timeout to this config.Defualt.OVSDBConnectTimeout just before we call monitorAll
    	ctx, cancel := context.WithTimeout(context.Background(), config.Defualt.OVSDBConnectTimeout)
    go func() {
    	<-stopCh
    	cancel()
    }()
    
    // Only Monitor Required SBDB tables to reduce memory overhead
    _, err = c.Monitor(ctx,
    	c.NewMonitor(
    
   So, it is not just for `client.Connect`, looks like we are using it form OVSDB method like `monitor`

		

@jxiaobin
Copy link
Author

jxiaobin commented Feb 8, 2022

@girishmg

  • types.OVSDBTimeout is a constant, config.Default.OVSDBConnectTimeout seems confusing, it's not default value, but only holds value for arg --ovsdb-connect-timeout passed from cmd line; if no value is specified, the constant value will be assigned to config.Default.OVSDBConnectTimeout. other code should refer to config.Default.OVSDBConnectTimeout instead of types.OVSDBTimeout.
  • users need to specify --ovsdb-connect-timeout in cmd line to override 10s default timeout
  • yes, the monitors need a timeout setting anyway, so use config.Default.OVSDBConnectTimeout for consistency.

@dcbw
Copy link
Contributor

dcbw commented May 18, 2022

LastTransactionID support was added to libovsdb in ovn-org/libovsdb#292 and should be in ovn-kube for a couple months.

@dcbw
Copy link
Contributor

dcbw commented May 18, 2022

Tim asks whether we should split the timeout apart, so that ConnectTimeout really means "I can open a TCP connection to the DB" that is separate from the actual reading of a large database from the DBserver after connecting.

eg, if we're still receiving data and doing work, should that really be a timeout?

@jxiaobin
Copy link
Author

I went through both ovn-kubernetes and libovsdb code, as far as I can see, for reconnect case, the timeout context is shared by both connect and monitor (downloading data).
https://github.com/ovn-org/libovsdb/blob/2e36bf99be1bb31c4968ece8f807d74fe6b4cdee/client/client.go#L296
If we want to have separate connect and read timeout, we may need to either split reconnect logic from connect function, or create separate timeout context for connect and reconnect respectively, also remove context from current function signature.

@dcbw could you please advise?

@jxiaobin
Copy link
Author

jxiaobin commented Jun 1, 2022

@dcbw created a libovsdb PR ovn-org/libovsdb#313, PTAL.

@jxiaobin jxiaobin closed this Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants