-
Notifications
You must be signed in to change notification settings - Fork 159
Open
Labels
internalNo API-level changesNo API-level changes
Description
Something like
diff --git a/src/Data/Text/Encoding.hs b/src/Data/Text/Encoding.hs
index 239e15e..9b6a01c 100644
--- a/src/Data/Text/Encoding.hs
+++ b/src/Data/Text/Encoding.hs
@@ -436,7 +436,10 @@ encodeUtf8 (Text arr off len)
fp <- mallocByteString (len*3) -- see https://github.com/haskell/text/issues/194 for why len*3 is enough
withForeignPtr fp $ \ptr ->
with ptr $ \destPtr -> do
- c_encode_utf8 destPtr (A.aBA arr) (fromIntegral off) (fromIntegral len)
+ -- When ByteArray# is large enough, it is pinned.
+ -- Then we can use safe FFI calls, which we indeed want for big inputs!
+ (if len > 4096 then c_encode_utf8_safe else c_encode_utf8_unsafe)
+ destPtr (A.aBA arr) (fromIntegral off) (fromIntegral len)
newDest <- peek destPtr
let utf8len = newDest `minusPtr` ptr
if utf8len >= len `shiftR` 1
@@ -535,5 +538,8 @@ foreign import ccall unsafe "_hs_text_decode_utf8_state" c_decode_utf8_with_stat
foreign import ccall unsafe "_hs_text_decode_latin1" c_decode_latin1
:: MutableByteArray# s -> Ptr Word8 -> Ptr Word8 -> IO ()
-foreign import ccall unsafe "_hs_text_encode_utf8" c_encode_utf8
+foreign import ccall unsafe "_hs_text_encode_utf8" c_encode_utf8_unsafe
+ :: Ptr (Ptr Word8) -> ByteArray# -> CSize -> CSize -> IO ()
+
+foreign import ccall safe "_hs_text_encode_utf8" c_encode_utf8_safe
:: Ptr (Ptr Word8) -> ByteArray# -> CSize -> CSize -> IO ()
The benchmark suite doesn't have exercises for multithreaded programs.
As far as I understand the code now, encoding or decoding big Text
will prevent GC from running. Is that an issue, hard to tell.
E.g. cryptohash-sha256
branches on size: https://hackage.haskell.org/package/cryptohash-sha256-0.11.102.0/docs/src/Crypto.Hash.SHA256.html#c_sha256_update, (my diff above is probably written better in that way too)
Metadata
Metadata
Assignees
Labels
internalNo API-level changesNo API-level changes