-
Notifications
You must be signed in to change notification settings - Fork 8k
Mark interned strings as IS_STR_VALID_UTF8 on-the-fly #20100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Alternative to phpGH-10870. Use atomic writes for adding the IS_STR_VALID_UTF8 flag to UTF-8-verified interned strings in ext-mbstring. x86 and other architectures guarantee atomic writes/reads for aligned variables up to size_t, which we already rely on, particularly for zend_op.handler being swapped out in the JIT. The atomic write is only needed here to not drop any other newly written bits (which there currently aren't any of). We use GCC and sync atomics because they don't require annotating the modified variable with the C11 _Atomic keyword.
Could ... we export that as ZEND_API or inline header function in zend_string.h ... maybe? I do have quite some interest in using this API from extension code. |
I know it's early, but I believe that AcqRel consistency should be good enough and full sequential consistency is "too much". |
I'm open to it. The only gotcha is that
I'm open to it. FWIU even |
It's been a few years since I worked on issues related to the |
@alexdowad Data races. Interned strings are shared across processes and threads. If two threads write to the same memory at the same time, the first write will be lost. It wouldn't be crucial in this case, and we actually don't even add any other bits, so a direct write would probably also be ok. Regardless, if we ever tried to do this with some other flag, it's better not to have to go hunt down data races. |
I am in favor of the original #10870. The reason is that functions like |
@alexdowad It just occured to me you might mean why we didn't go with GH-10870. The main reason is just that there are barely any other uses for this flag outside of mbstring (one more place in ext-dom), and I didn't feel like moving a bunch of code into core was warranted then (you asked for moving the SIMD-version of mbstring to core). I still don't mind that if you prefer. |
Alternative to GH-10870. Use atomic writes for adding the IS_STR_VALID_UTF8 flag to UTF-8-verified interned strings in ext-mbstring. x86 and other architectures guarantee atomic writes/reads for aligned variables up to size_t, which we already rely on, particularly for zend_op.handler being swapped out in the JIT. The atomic write is only needed here to not drop any other newly written bits (which there currently aren't any of). We use GCC and sync atomics because they don't require annotating the modified variable with the C11 _Atomic keyword.