-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
do not start a second GetNextMessageAsync thread in case one is already running #4354
base: master
Are you sure you want to change the base?
do not start a second GetNextMessageAsync thread in case one is already running #4354
Conversation
/azp run |
Commenter does not have sufficient privileges for PR 4354 in repo microsoft/azure-pipelines-agent |
Hi @echalone , thank you for the analysis and detailed write up. For accepting this change, it would be helpful if you could create a knob that enables / disables the behavior. The new behavior should be off by default. That would help us with validating and testing before we make it the new default behavior. You'll be able to enable the behavior by setting the env variable defined by the knob.. Sorry that we don't have docs on this pattern currently: |
Hello @merlynomsft, I've implemented the enabling of the bugfix via environment variable as you described and the default behavior is to not have the bugfix enabled. In our case the hash check isn't successful when our Azure DevOps 2022 Server tries to downgrade to agent version from 2.218.1 to 2.210.1, maybe that helps to recreate a testing scenario. |
Prevent the agent from starting a second GetNextMessageAsync thread in case one is already running. This might happen for example if an automatic agent update was triggered due to a pipeline job coming in and the agent update then failing for some reason. In this case the agent would have started another GetNextMessageAsync thread in the background, which would wait for another message from the server, while trying to update the agent. But when the agent update fails this first thread will cycle back and also start another GetNextMessageAsync thread trying to wait for another message from the server.
However, since only one thread per agent session is "allowed" to wait for a response from the Azure DevOps REST API the second thread will immediately return instead of having to wait 50 seconds for a response as is usual and needed for the agent to work properly. This means that while there's one thread that will keep doing what it should do and check for new messages from the server in a 50 second polling cycle, the other thread will constantly call the same REST API uri and immediately return because the server doesn't let it wait for messages, then will again call the same REST API and again immediatly return, doing this in an endless loop and not just generating extreme network overhead (multiple calls per agent per second) but also fill up the IIS log of the Azure DevOps server so that any disc would be full in just a few days or even hours, thereby crashing the server (as happened to us).
This would happen if you have an agent with a higher version than needed by your server, and the server would try to downgrade the agent version with the first build or deployment job coming in, and then the update of the agent fails. It would not happen if the first job coming in is actually the agent update request itself, say because you clicked onto "Update all targets" on the server, it will also not happen if the agent update was successful, since the affected agent is shut down after the update and the new agent takes over with a new polling loop that is not affected.
Tested with Azure DevOps Server 2022 OnPremis and agent version 2.218.1 as well as with an agent compiled from the latest master branch.
Now, one can disable agent auto update for build agents, but there's actually no way yet to disable auto update for deployment agents in the GUI, which was what got us and resulted in a crashed server due to too little disk space (because of extreme large IIS logs). One needs to use the REST API to disable agent auto update for deployment agents, but that's probably another topic.