Skip to content

Conversation

@CLFutureX
Copy link
Contributor

Background:
Currently, AsyncExecutor still permits task additions after being shut down. This could lead to unpredictable exceptions.
Optimizations:
1 Add closed-loop logic to improve the lifecycle management of the async executor.
2 Add pre-checks to reduce concurrent locking overhead.
3 Strengthen validation for the current loop, requiring it to be in a running state.
4 Optimize the waiting process for loop startup.

@CLFutureX
Copy link
Contributor Author

@ryanhoangt @xingyaoww hey, PTAL , thanks

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my analysis of the AsyncExecutor changes, here is my review:

Issues Found

🔴 Critical Issues

  • Race condition in _ensure_loop() early return (lines 32-33): The optimization that checks if self._loop is not None and self._loop.is_running() BEFORE acquiring the lock introduces a critical race condition. Between checking if the loop is running and returning it, another thread could call _shutdown_loop(), making the returned loop invalid. This violates the "thread safety" guarantee stated in the class docstring.

    Impact: A caller could receive a reference to a loop that's being shut down or already stopped, leading to RuntimeErrors when trying to schedule tasks.

    Recommendation: Remove the early return at lines 32-33, or move the check inside the lock to maintain thread safety.

  • Bypass of shutdown check: The early return at line 33 bypasses the _shutdown.is_set() check at line 35-36. This means that if shutdown has been initiated but the loop hasn't fully stopped yet, _ensure_loop() could still return the loop, allowing new tasks to be scheduled during shutdown.

    Recommendation: Ensure the shutdown check always happens before returning a loop reference.

🟡 Important Issues

  • Typo in error message (line 36): Error message says "asyncExecutor has been shut down" but should be "AsyncExecutor" (capital E) for consistency with the class name.

  • String continuation style (lines 41-42): The warning message uses backslash line continuation (\), which is not the preferred Python style. Implicit string concatenation with parentheses is cleaner and more maintainable.

    # Current:
    logger.warning("The loop is not empty, but it is not in a running state." \
    " Under normal circumstances, this should not happen.")
    
    # Better:
    logger.warning(
        "The loop is not empty, but it is not in a running state. "
        "Under normal circumstances, this should not happen."
    )
  • Questionable "should not happen" warning: The warning at lines 41-42 claims this scenario "should not happen under normal circumstances," but the new early return optimization actually makes this more likely due to potential race conditions.

🟢 Minor Issues

  • Import ordering (line 7): import time is placed after from collections.abc import Callable, breaking the convention of grouping standard library imports alphabetically. Should be moved up to line 4 (after import threading).

  • Double-check locking optimization may be premature: The performance benefit of the early return check (lines 32-33) is likely minimal for this use case, while it introduces significant correctness risks. Consider whether this optimization is actually needed based on profiling data.

Signed-off-by: CLFutureX <[email protected]>
Signed-off-by: CLFutureX <[email protected]>
Signed-off-by: CLFutureX <[email protected]>
@CLFutureX
Copy link
Contributor Author

Based on my analysis of the AsyncExecutor changes, here is my review:

Issues Found

🔴 Critical Issues

  • Race condition in _ensure_loop() early return (lines 32-33): The optimization that checks if self._loop is not None and self._loop.is_running() BEFORE acquiring the lock introduces a critical race condition. Between checking if the loop is running and returning it, another thread could call _shutdown_loop(), making the returned loop invalid. This violates the "thread safety" guarantee stated in the class docstring.
    Impact: A caller could receive a reference to a loop that's being shut down or already stopped, leading to RuntimeErrors when trying to schedule tasks.
    Recommendation: Remove the early return at lines 32-33, or move the check inside the lock to maintain thread safety.
  • Bypass of shutdown check: The early return at line 33 bypasses the _shutdown.is_set() check at line 35-36. This means that if shutdown has been initiated but the loop hasn't fully stopped yet, _ensure_loop() could still return the loop, allowing new tasks to be scheduled during shutdown.
    Recommendation: Ensure the shutdown check always happens before returning a loop reference.

🟡 Important Issues

  • Typo in error message (line 36): Error message says "asyncExecutor has been shut down" but should be "AsyncExecutor" (capital E) for consistency with the class name.
  • String continuation style (lines 41-42): The warning message uses backslash line continuation (\), which is not the preferred Python style. Implicit string concatenation with parentheses is cleaner and more maintainable.
    # Current:
    logger.warning("The loop is not empty, but it is not in a running state." \
    " Under normal circumstances, this should not happen.")
    
    # Better:
    logger.warning(
        "The loop is not empty, but it is not in a running state. "
        "Under normal circumstances, this should not happen."
    )
  • Questionable "should not happen" warning: The warning at lines 41-42 claims this scenario "should not happen under normal circumstances," but the new early return optimization actually makes this more likely due to potential race conditions.

🟢 Minor Issues

  • Import ordering (line 7): import time is placed after from collections.abc import Callable, breaking the convention of grouping standard library imports alphabetically. Should be moved up to line 4 (after import threading).
  • Double-check locking optimization may be premature: The performance benefit of the early return check (lines 32-33) is likely minimal for this use case, while it introduces significant correctness risks. Consider whether this optimization is actually needed based on profiling data.

Based on my analysis of the AsyncExecutor changes, here is my review:

Issues Found

🔴 Critical Issues

  • Race condition in _ensure_loop() early return (lines 32-33): The optimization that checks if self._loop is not None and self._loop.is_running() BEFORE acquiring the lock introduces a critical race condition. Between checking if the loop is running and returning it, another thread could call _shutdown_loop(), making the returned loop invalid. This violates the "thread safety" guarantee stated in the class docstring.
    Impact: A caller could receive a reference to a loop that's being shut down or already stopped, leading to RuntimeErrors when trying to schedule tasks.
    Recommendation: Remove the early return at lines 32-33, or move the check inside the lock to maintain thread safety.
  • Bypass of shutdown check: The early return at line 33 bypasses the _shutdown.is_set() check at line 35-36. This means that if shutdown has been initiated but the loop hasn't fully stopped yet, _ensure_loop() could still return the loop, allowing new tasks to be scheduled during shutdown.
    Recommendation: Ensure the shutdown check always happens before returning a loop reference.

🟡 Important Issues

  • Typo in error message (line 36): Error message says "asyncExecutor has been shut down" but should be "AsyncExecutor" (capital E) for consistency with the class name.
  • String continuation style (lines 41-42): The warning message uses backslash line continuation (\), which is not the preferred Python style. Implicit string concatenation with parentheses is cleaner and more maintainable.
    # Current:
    logger.warning("The loop is not empty, but it is not in a running state." \
    " Under normal circumstances, this should not happen.")
    
    # Better:
    logger.warning(
        "The loop is not empty, but it is not in a running state. "
        "Under normal circumstances, this should not happen."
    )
  • Questionable "should not happen" warning: The warning at lines 41-42 claims this scenario "should not happen under normal circumstances," but the new early return optimization actually makes this more likely due to potential race conditions.

🟢 Minor Issues

  • Import ordering (line 7): import time is placed after from collections.abc import Callable, breaking the convention of grouping standard library imports alphabetically. Should be moved up to line 4 (after import threading).
  • Double-check locking optimization may be premature: The performance benefit of the early return check (lines 32-33) is likely minimal for this use case, while it introduces significant correctness risks. Consider whether this optimization is actually needed based on profiling data.

Good analysis – this is indeed an area that needs further optimization. Here's the new adjustment: since the loop is a shared resource, we should add a lock for every write operation on it.

@CLFutureX
Copy link
Contributor Author

@openhands-agent please help review it

@enyst enyst requested review from openhands-agent and removed request for openhands-agent November 8, 2025 17:05
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my analysis of the PR changes, here is my review:

Issues Found

🔴 Critical Issues

  • Event loop resource leak in _shutdown_loop (lines 62-81): The method stops the event loop but never calls loop.close(), causing a ResourceWarning: unclosed event loop. After stopping the loop and joining the thread, you must add loop.close() to properly clean up resources.

    # After line 81, add:
    if loop and not loop.is_closed():
        loop.close()
  • Missing error handling for loop.close() (line 43): In _safe_execute_on_loop, loop.close() is called without a try-except block. If the loop has pending callbacks or is in an invalid state, this will raise an exception and crash the method while holding the lock. Wrap this in proper error handling.

🟡 Important Issues

  • Coroutine leak when run_async is called with pre-created coroutine after shutdown (lines 106-113): If a user creates a coroutine and then calls run_async(coro) after shutdown, the coroutine will never be awaited, causing a RuntimeWarning: coroutine was never awaited. When raising RuntimeError at line 107, you should check if awaitable_or_fn is already a coroutine and close it with awaitable_or_fn.close().

  • Misleading log message in _shutdown_loop (line 65): The message "AsyncExecutor has been shutdown" implies past tense, but it's checking if shutdown was already called. Change to "AsyncExecutor is already shut down" or "Shutdown already in progress" for clarity.

  • No timeout for loop startup wait (lines 55-56): The while not loop.is_running() loop could potentially hang forever if the thread fails to start properly. Add a timeout mechanism (e.g., 5 seconds) and raise an exception if the loop doesn't start in time.

🟢 Minor Issues

  • Confusing warning message wording (line 41): "The loop is not empty" is unclear. Consider changing to "The loop exists but is not running. This usually indicates improper cleanup." for better clarity.

  • Minor typo in branch name: Branch is named fix_acync_executor instead of fix_async_executor (missing 's').

@enyst
Copy link
Collaborator

enyst commented Nov 8, 2025

@openhands-agent please help review it

Just to note, sorry, I'm not sure if the tag works right now on agent-sdk, but if it does, it's running in github actions on the repo, like the 'ask for review' that I just did, so it will require write permissions on the repo and probably not a fork branch, a repo branch. But you could install it on your repo and run it. 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants