Failures while assigning public IP via enableStaticNat #10313
Comments
Just a small update.
Another possibly interesting detail: when I tried to add a public IP to a VM that already had one, I immediately got the corresponding error.
Releasing IPs was also very slow, though I did it from the UI and it did not fail. The IPs disappeared from the allocated list fairly quickly, but notifications about the IP release only started appearing in the UI after some 40 seconds. Releasing all 16 IPs I had took about 2 minutes.
then failed with
Can anybody suggest Java parameters to improve this?
I found this warning in the management log; it seems relevant
The part of log corresponding to |
@akrasnov-drv, do you have webhooks enabled? It seems like the issue is not in enableStaticNat itself.
@akrasnov-drv |
@DaanHoogland I don't think webhooks are the cause here. The logs shared are for the API call that finished in around 10s. I see multiple agent-server Command-Answer exchanges taking a few seconds each. Also, multiple errors like,
Maybe the API needs optimization, or there is an underlying network issue.
It looks like @akrasnov-drv uses redundant VRs; the error happens in the BACKUP VR, which does not have a default route because its public NIC is DOWN.
First of all, thanks for the attention and care. @DaanHoogland I tried using webhooks in the past, but when I started getting various issues I recreated the cluster without webhooks. Here is my agent config
Nevertheless (I believe I reported it before) there is
and I had some doubts about it, though I do not see how it can be related to the current NAT issue. I created 100 VMs via the API without a problem; only this call fails with a timeout.
@weizhouapache I have workers set to 50 in the global config, but as you can see above, the agent has it set to 5, and that is not something I set. I can increase it, no problem, but I really doubt the number of workers is relevant to a failure in a single API call.
@shwstppr I'll clean the env and start it again to provide a wider log, covering both the successful executions and the subsequent failure.
In the meantime, has anybody tried my flow? Did it work (in which case the failure is specific to my env)?
@akrasnov-drv enabling hundred(s) of static NAT addresses is not a usual case, but it should work. We'll have to investigate what might go wrong. Do you have a clean environment to experiment in? I doubt either is the culprit, but we'll have to start simple.
I removed all VMs and the network, and recreated them with the standard isolated network offering with static NAT and a single VR.
It managed to assign about 40 IPs before failing.
Attaching logs from the management server and from the VR for the time of the above.
@DaanHoogland it was not our intention. We are just trying to use a CloudStack fleet in Jenkins. The only plugin supporting CloudStack is the jcloud plugin, and it requires a public IP and static NAT.
Noted @akrasnov-drv, this is added to the backlog and has to be investigated. I am afraid I don't have a workaround off the top of my head.
problem
enableStaticNat starts failing after just a few uses. Actually, I see that the IP is assigned, but the call still fails with a timeout after just 3-4 uses.
For my test I created a number of VMs, and then tried to assign a public IP to each of them sequentially:
I tried different configurations of network and VR, and it always failed, in the best case after 6-7 successful assignments.
A large VR with 4 CPUs and several GB of memory did not help either.
time for all failing ones shows real 0m10.350s or slightly more.
Started from #10184
versions
CloudStack 4.20.0.0 with https://github.com/apache/cloudstack/pull/10254/files applied (PR does not help with this)
Ubuntu 22.04.5 LTS
libvirt 8.0.0-1ubuntu7.10
isolated network over VLAN
about 1000 public IPs in /20
The steps to reproduce the bug
1. Create a number of VMs
2. associateIpAddress to get public IP ID
3. enableStaticNat with VM ID and IP ID
4. Repeat 2-3 for VMs in 1. till it starts failing (just after about 3-4 cycles in my case)
Initially enableStaticNat takes 9 seconds, then increases to 10, and then just starts failing with timeout.
Here is a cycle doing the above
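The reporter's own loop is not reproduced here. As a rough illustration of the steps above, here is a minimal Python sketch. It assumes a hypothetical `call_api(command, **params)` wrapper around whatever CloudStack client is in use, and assumes that wrapper returns the new IP's ID under the key `id` and raises an exception on timeout. The function name `assign_static_nat` and the `network_id` argument are made up for this example; only the API command names and the `ipaddressid` / `virtualmachineid` / `networkid` parameters come from the CloudStack API.

```python
import time


def assign_static_nat(call_api, vm_ids, network_id, clock=time.monotonic):
    """Acquire a public IP and enable static NAT for each VM in turn.

    call_api is a hypothetical callable: call_api(command, **params) -> dict.
    Returns a list of (vm_id, ip_id, succeeded, seconds) tuples and stops
    at the first enableStaticNat failure.
    """
    results = []
    for vm_id in vm_ids:
        # Step 2: acquire a public IP and remember its ID.
        ip = call_api("associateIpAddress", networkid=network_id)
        ip_id = ip["id"]  # assumption: the wrapper exposes the IP ID as "id"

        # Step 3: enable static NAT, timing only this call.
        start = clock()
        try:
            call_api("enableStaticNat", ipaddressid=ip_id, virtualmachineid=vm_id)
            succeeded = True
        except Exception:
            # Assumption: the wrapper signals a timeout or API error by raising.
            succeeded = False
        elapsed = clock() - start

        results.append((vm_id, ip_id, succeeded, elapsed))
        if not succeeded:
            break  # Step 4: repeat only until the first failure.
    return results
```

Recording the elapsed time per call makes it easy to see whether enableStaticNat slows down gradually (9 seconds, then 10) before the first timeout, and the returned IP IDs can be used afterwards to check whether the IP was in fact assigned despite the failed call.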
What to do about it?
It looks like the call takes too long to return and should be optimized.