- 
                Notifications
    You must be signed in to change notification settings 
- Fork 3
fix: Ensure the LambdaRuntimeAPI server is up before launching INIT phase #42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Contexttl;dr there's a race condition when trying to initialise the RIE before starting the Lambda Runtime API server. The solution is to block and poll the runtime server until we're certain it's up. ProblemWhen building a Sandbox, the RIE automatically starts the runtime API server in the background. 
 lambda-runtime-init/cmd/localstack/main.go Lines 214 to 215 in 9be92e0 
 
 
 
 lambda-runtime-init/lambda/rapid/sandbox.go Line 106 in 9be92e0 
 
 lambda-runtime-init/cmd/localstack/main.go Line 231 in 9be92e0 
 SolutionInstead of relying on the good graces of the scheduler to start the  lambda-runtime-init/lambda/rapi/router.go Lines 29 to 32 in 9be92e0 
 | 
3cac6da    to
    66e356c      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! I think this should resolve the issue, however I am a bit concerned about the possible increase in startup time. Let's discuss the timeout before merging!
        
          
                cmd/localstack/runtime.go
              
                Outdated
          
        
      | Timeout: 5 * time.Second, | ||
| } | ||
|  | ||
| ticker := time.NewTicker(500 * time.Millisecond) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you per chance test how many - on average - retries we get here, and how this affects the average startup times? Does it make sense to reduce this to 10 - 50ms, for example? More pings, but less delay? The startup time should be significantly less than 10ms ideally anyway, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dfangl Fair points. I saw these in the order of milliseconds based on my local runs (keeping in mind that I'm on ARM64 and don't have issues with this).
{"file":"lambda-runtime-init/lambda/rapi/server.go:108","func":"go.amzn.com/lambda/rapi.(*Server).Listen","level":"debug","msg":"Runtime API Server listening on 127.0.0.1:9001","time":"2025-10-17T19:35:58+02:00"}
{"file":"lambda-runtime-init/lambda/rapi/middleware/middleware.go:76","func":"go.amzn.com/lambda/rapi/middleware.AccessLogMiddleware.func1.1","level":"debug","msg":"API request - GET /2018-06-01/ping, Headers:map[Accept-Encoding:[gzip] User-Agent:[Go-http-client/1.1]]","time":"2025-10-17T19:35:59+02:00"}
{"file":"lambda-runtime-init/lambda/rapi/middleware/middleware.go:76","func":"go.amzn.com/lambda/rapi/middleware.AccessLogMiddleware.func1.1","level":"debug","msg":"API request - GET /2018-06-01/ping, Headers:map[Accept-Encoding:[gzip] User-Agent:[Go-http-client/1.1]]","time":"2025-10-17T19:35:59+02:00"}
{"file":"ambda-runtime-init/cmd/localstack/main.go:245","func":"main.main","level":"debug","msg":"Starting runtime init.","time":"2025-10-17T19:36:04+02:00"}In anycase, these checks are probably more conservative than what is necessary so increasing frequency and decreasing timeout duration seems logical.
Otherwise, if we're trying to make this as fast as possible, so as to not delay startup times, we can also do a single check to see if the port is open (similar to LocalStack's is_port_open()) with some timeout of 5 seconds (or something equivalently short).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine either way - I just think 500ms for delay is too much, especially since the first one will for quite some systems fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK changed to 50ms! Happy to 🚢 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for increasing the poll rate!
Motivation
Changes
waitForRuntimeAPIhelper function that checks the/pingendpoint of the runtime API server before allowing theINITfunctionality to proceed.