Skip to content

Fix workflow freezing thread safety. #1375

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zach-klippenstein
Copy link
Collaborator

@zach-klippenstein zach-klippenstein commented Jul 18, 2025

Two commits:

  1. First one adds a stress test that shows the bug.
  2. Second one fixes the bug and makes the test pass.

@zach-klippenstein zach-klippenstein force-pushed the zachklipp/workflow-multithreading branch 4 times, most recently from a1d2b3b to 0fe019b Compare July 18, 2025 18:16
@zach-klippenstein zach-klippenstein marked this pull request as ready for review July 18, 2025 18:16
@zach-klippenstein zach-klippenstein requested a review from a team as a code owner July 18, 2025 18:16
@zach-klippenstein zach-klippenstein force-pushed the zachklipp/workflow-multithreading branch from 0fe019b to 58c2441 Compare July 18, 2025 18:17
@@ -100,7 +100,6 @@ internal class RealRenderContext<PropsT, StateT, OutputT>(
* Freezes this context so that any further calls to this context will throw.
*/
fun freeze() {
checkNotFrozen("freeze") { "freeze" }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a defensive programming check, since this method is only called by internal code. It's more useful for it to be idempotent now.

@zach-klippenstein zach-klippenstein force-pushed the zachklipp/workflow-multithreading branch from 58c2441 to f3c38bb Compare July 18, 2025 18:25
Copy link
Contributor

@rjrjr rjrjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have my LG for what it's worth. Not my strong suit.

*/
internal actual class ThreadLocal<T>(
private val initialValue: () -> T
) : NSObject(), NSCopyingProtocol {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamieQ maybe you could give this a sanity check?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import java.util.concurrent.CountDownLatch

/**
* Returns the maximum number of threads that can be ran in parallel on the host system, rounded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Returns the maximum number of threads that can be ran in parallel on the host system, rounded
* Returns the maximum number of threads that can be run in parallel on the host system, rounded

:D

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After August 1 there's gonna be so many more "ran" in the codebases :P

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bonus points if one of the last things you do is write a lint check that catches this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my God.

Oh no.

@@ -51,8 +51,12 @@ internal class RealRenderContext<PropsT, StateT, OutputT>(
* Used to:
* - prevent modifications to this object after [freeze] is called.
* - prevent sending to sinks before render returns.
*
* This is a [ThreadLocal] since we only care about preventing calls during rendering from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we're confident that we only ever call freeze() from the render thread? I can't think of any violations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I can't think of RenderContext methods that would be called off main thread? will keep thinking.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add another comment to freeze saying that's a requirement, but yea we only call it from one place.

Copy link
Collaborator Author

@zach-klippenstein zach-klippenstein Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to @steve-the-edwards reply to this comment (which doesn't show up in the thread for some reason, fucking github), it doesn't matter if we call it on the main thread or not, or even (to Ray's question) if they're called only from render, we just need to make sure freeze and unfreeze are always called symmetrically from the same thread (i.e. when one thread unfreezes, the same thread later freezes).

If arbitrary RenderContext methods are called from various threads, and that code isn't already thread-safe, then it's already broken since (1) frozen is not a thread synchronization mechanism and (2) even if it were, we mutate internal data structures outside of the unfrozen section. actionSink.send is thread-safe since it just sends to a channel. But i think we were catching that case with the default value of frozen before, which we're not now, so I need to rethink that. Maybe a separate flag for whether sending to a sink is allowed.

})

val testDispatcher = newFixedThreadPoolContext(nThreads = testThreadCount, name = "test")
testDispatcher.use {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this API and why do we need it? is this like a resource API for the threads underneath this disaptcher?

Copy link
Collaborator Author

@zach-klippenstein zach-klippenstein Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a generic helper on Closeable: https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.io/use.html

This thread pool is such, and the kdoc is extremely insistent about the fact that there are resources owned by this dispatcher and it must be closed to avoid leaks.

*/
internal actual class ThreadLocal<T>(
private val initialValue: () -> T
) : NSObject(), NSCopyingProtocol {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please.

@@ -51,8 +51,12 @@ internal class RealRenderContext<PropsT, StateT, OutputT>(
* Used to:
* - prevent modifications to this object after [freeze] is called.
* - prevent sending to sinks before render returns.
*
* This is a [ThreadLocal] since we only care about preventing calls during rendering from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I can't think of RenderContext methods that would be called off main thread? will keep thinking.

import kotlin.test.Test
import kotlin.test.assertEquals

class ThreadLocalTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamieQ and whoever else to review this as well then


@Test fun set_fromDifferentThreads_doNotConflict() {
val threadLocal = ThreadLocal(initialValue = { 0 })
val threadStartedLatch = NSLock().apply { lock() }
Copy link

@jamieQ jamieQ Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does apply { ... } do?


edit: the AI tells me it runs the closure on self and returns the instance – so, effectively is a configuration block

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, config block.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equivalent to this:

val threadStartedLatch = NSLock()
threadStartedLatch.lock()

val secondReadLatch = NSLock().apply { lock() }

val thread = NSThread {
threadStartedLatch.unlock()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a bit surprised this works without some assertion failure, as the lock API typically require that unlocks happen on the same thread that acquired the lock.

i'd like to offer some possible alternatives, but i'm having trouble figuring out where these 'platform' imports are defined, so am not sure what's available (e.g. semaphores might be the way to go here). anyone know where i might try looking?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

platform is just the prefix used for what is bundled by default as a prebuilt framework with the kotlin native interop.

So this is the Foundation framework accessed through those prebuilts. (https://www.jetbrains.com/help/kotlin-multiplatform-dev/multiplatform-ios-dependencies.html)

There are a bunch of others available, e.g.:
image

Though your point about Semaphore does make me think - why don't we just use the kotlinx.coroutines Semaphore or Mutex that are already in common kotlin code shared on platforms.

I looked those up and they use kotlinx.atomicfu for synchronization - e.g., https://github.com/Kotlin/kotlinx-atomicfu?tab=readme-ov-file#locks for Locks.

I think that's likely better tested and supported tthen rolling our own?

Was there another reason to roll our own Zach?

Copy link
Collaborator Author

@zach-klippenstein zach-klippenstein Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pair on this? Anything in Cocoa should be available I think, but I'm not sure what the exact rules are. I will also make this test fail to ensure that it's actually running, although when I run it from the IDE it says it runs and passes.

Copy link
Collaborator Author

@zach-klippenstein zach-klippenstein Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't coroutines code, we need actual thread-level synchronization here. I will try using atomicfu but it was a headache last time I tried, it's a compiler plugin so it's a bit more complex but maybe they've ironed out the kinks.

Also AFAIK atomicfu doesn't have anything like ThreadLocal, so we'll have to roll our own for that anyway — I think. @jamieQ from what I found, Cocoa doesn't have ThreadLocal either, just this threadDictionary thing, but I would be very happy to be wrong about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good poitn about not beign able to use the coroutine synchronization 🤦🏻 .

But yes, something lower level like atomicfu might be good. if not, and you come up with better actuals with @jamieQ then that's great!

Copy link
Collaborator Author

@zach-klippenstein zach-klippenstein Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no native ThreadLocal, then TBH i'm not too worried about using (the correct) explicit Apple type for a lock over atomicfu - a lock is a lock, ultimately it's not that complicated. The tests here are for the ThreadLocal implementation, I didn't write (and don't intend to) write any tests for the lock expect/actuals.

I looked at Compose and they use every one of these techniques – KMP uses atomicfu SynchronizedObject, but there's unix-specific code in AOSP that even manually builds a lock type on atomics. I really don't want to do that, ever.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants