-
For in-memory global objects that are not targets, the object itself is hashed in memory. For functions, the function body is deparsed into a character string, and that string is hashed. For actual targets, the saved file is hashed, not the object in memory. If you have a custom deterministic way of saving parquet files, then …
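The sketch below illustrates the difference between hashing an object in memory and hashing its saved file. It is written in Python and is not the targets API, which is R. JSON stands in for parquet. The `write_table` helper and its `writer_tag` field are hypothetical: they imitate a writer whose output bytes vary between platforms without claiming anything about the actual cause in Arrow. SHA-256 is used only for illustration and is not necessarily the hash targets uses.

```python
import hashlib
import json
import os
import tempfile


def write_table(rows, path, writer_tag):
    """Hypothetical writer: same data, but embeds a writer-specific tag in the
    file, mimicking a serializer whose bytes vary across platforms."""
    payload = {"writer": writer_tag, "rows": rows}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def hash_file(path):
    """Hash the bytes on disk (analogous to hashing a target's saved file)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_object(rows):
    """Hash a canonical in-memory representation (analogous to hashing a
    global object in memory)."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


rows = [{"id": 1, "value": 2.5}, {"id": 2, "value": 3.0}]

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.json")
    path_b = os.path.join(tmp, "b.json")
    # Identical data written by two "platforms" with different writer tags.
    write_table(rows, path_a, writer_tag="platform-A")
    write_table(rows, path_b, writer_tag="platform-B")
    file_hashes_match = hash_file(path_a) == hash_file(path_b)

object_hashes_match = hash_object(rows) == hash_object(list(rows))

print("file hashes match:", file_hashes_match)      # False
print("object hashes match:", object_hashes_match)  # True
```

If saved files are what gets hashed, identical data can still produce different hashes whenever the writer's bytes differ. Hashes only agree across platforms when the serialization itself is deterministic.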
-
Thanks! Unfortunately we can't control the non-determinism, and since the on-disk target files are hashed, their hashes will differ. This saves me from going down another rabbit hole, though. This should eventually be fixed on the Arrow side (if I can fight my way through enough C++ for this PR 😄 apache/arrow#40392).
-
Oh, a follow-up, which is also related to #1244: does any hashing or serialization occur in …
-
Help
Description
Quick question: does targets hash target objects in memory before they are serialized to files, or does it hash the files after serialization? We recently discovered a source of non-determinism in parquet files (apache/arrow#40361), which helped us understand why we got different target hashes for parquet files written on different platforms. In this case we were using format = "file" and writing to disk inside our functions, so I assume the serialized file on disk is what gets hashed. We were wondering whether, if we instead used the "parquet" target format or a custom format for the object, we could expect identical hashes, which would be the case if hashing occurred in memory before serialization.
@emmamendelsohn