Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JVM SupendApp never terminates #140

Open
weronkagolonka opened this issue Feb 7, 2025 · 3 comments
Open

JVM SupendApp never terminates #140

weronkagolonka opened this issue Feb 7, 2025 · 3 comments
Assignees

Comments

@weronkagolonka
Copy link

weronkagolonka commented Feb 7, 2025

Hello!

After upgrading to 0.5.0, we've noticed a strange behaviour where our Kafka consumer, launched in a separate coroutine, was abruptly stopped few seconds after startup. What's worth noticing, the application itself was not terminated or crashed. The kafka coroutine is running within the same scope of the suspended app and it is polling for messages as long as the jobs is active. Initially, it seemed like simply wrapping it in a new scope would solve the issue - and it did, as the consumer was successfully polling and handling the messages. However, after a deeper investigation to find the root cause of the issue we noticed that the application would never stop, regardless of what was put in the SuspendApp lambda. Given this extremely simple example:

fun main() = SuspendApp { logger.info { "The app is running!" } }

We discovered a significant change in the top-level coroutine launched in the SuspendApp, where after executing the lambda block, the process is being exited with status code 0:

@OptIn(ExperimentalStdlibApi::class)
fun SuspendApp(
  context: CoroutineContext = Dispatchers.Default,
  uncaught: (Throwable) -> Unit = Throwable::printStackTrace,
  timeout: Duration = Duration.INFINITE,
  process: Process = process(),
  block: suspend CoroutineScope.() -> Unit,
): Unit =
  process.use { env ->
    env.runScope(context) {
      val job =
        launch(start = CoroutineStart.LAZY) {
          try {
            block()
            env.exit(0) // <- here
          } catch (_: SuspendAppShutdown) {} catch (e: Throwable) {
            uncaught(e)
            env.exit(-1)
          }
        }
      val unregister =
        env.onShutdown {
          withTimeout(timeout) {
            job.cancel(SuspendAppShutdown)
            job.join()
          }
        }
      job.start()
      job.join()
      unregister()
    }
  }

This call triggers the sequence of JVM shutdown hooks, which then executes env.OnShutdown lambda defined in unregister variable, where the job is being cancelled (and that explains why our kafka consumer was suddenly stopped and why running it in a new scope allowed the consumer to proceed). The process then waits for the cancelled job to complete, but that never happens - the application basically hangs, unless a finite timeout duration is specified, then it simply terminates with timeout exception.

My question is - what is the rationale behind exiting the process after lambda execution instead of letting the job complete and eventually calling unregister() as it was done in version 0.4.0? Right now it doesn't seem like the unregistering logic is ever executed, as the exit(0) method is called first.

If this behaviour is expected, how should the logic of SuspendApp be defined to make sure that the application actually terminates at some point?

Thanks in advance for your help, for now we're skipping the upgrade of the library until we get some more insight 😄

@nomisRev nomisRev self-assigned this Feb 11, 2025
@nomisRev
Copy link
Member

nomisRev commented Feb 11, 2025

Hey @weronkagolonka,

Thanks you for the report! I am going to investigate this.

There was a small refactor in the codebase to include system.exit at the right moments. This was needed for some cloud providers that required explicit system.exit to close pods. To guarantee the same implementation for all platforms, a change was made in the KMP structure.

However, this should absolutely not happen! I am investigating atm, and will report back asap.

Also, any ideas or tips to properly test this automatically would be awesome.

@serras
Copy link
Member

serras commented Feb 11, 2025

Ported to the main Arrow repo at arrow-kt/arrow#3583, please continue the discussion in that issue

@weronkagolonka
Copy link
Author

Thank you for the quick response, looking forward to the results of your investigation. We do not have much test logic around the SuspendApp ATM, but we do consider refactoring the application logic to follow the examples showed in the Arrow's documentation, so there is a room to add more test coverage. I can happily share the outcomes once we start working on that 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants