Summary
When MongoShake reads incremental updates via change_stream, an update event can contain updateDescription.truncatedArrays with empty updatedFields and empty removedFields. This can happen for aggregation-pipeline updates that shrink an array.
The change stream conversion path currently appears to build the replay oplog object only from updatedFields and removedFields. If both are empty, the generated update object is empty. The executor then treats update objects without a $ operator as replacement updates, so the target document can be replaced by an empty replacement document and end up preserving only _id.
This causes source/target divergence and target-side field loss.
Environment
- MongoShake version observed: v2.8.4
- Code inspection suggests the same gap exists in the current develop branch and recent v2.8.x code paths.
- Relevant mode/config shape:
```
sync_mode = incr
incr_sync.mongo_fetch_method = change_stream
incr_sync.change_stream.watch_full_document = false
tunnel = direct
```
Minimal reproduction shape
Start with the same document on source and target:
```js
db.items.insertOne({
  _id: 1,
  arr: ["a", "b"],
  keep: "value"
})
```
Apply an aggregation-pipeline update that truncates the array:
```js
db.items.updateOne(
  { _id: 1 },
  [
    { $set: { arr: { $slice: ["$arr", 1] } } }
  ]
)
```
MongoDB may emit a change stream update event shaped like:
```js
{
  operationType: "update",
  documentKey: { _id: 1 },
  updateDescription: {
    updatedFields: {},
    removedFields: [],
    truncatedArrays: [
      { field: "arr", newSize: 1 }
    ]
  }
}
```
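To make the event shape concrete, the sketch below decodes it into Go structs that keep truncatedArrays next to updatedFields and removedFields. The type and function names (ChangeEvent, UpdateDescription, TruncatedArray, decodeEvent) are hypothetical and not MongoShake's; encoding/json stands in for the BSON decoding a real change stream reader would use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TruncatedArray mirrors one entry of updateDescription.truncatedArrays.
type TruncatedArray struct {
	Field   string `json:"field"`
	NewSize int    `json:"newSize"`
}

// UpdateDescription is a hypothetical decoding target for an update event.
// truncatedArrays is kept here instead of being silently dropped.
type UpdateDescription struct {
	UpdatedFields   map[string]interface{} `json:"updatedFields"`
	RemovedFields   []string               `json:"removedFields"`
	TruncatedArrays []TruncatedArray       `json:"truncatedArrays"`
}

// ChangeEvent holds only the parts of an update event used in this sketch.
type ChangeEvent struct {
	OperationType     string                 `json:"operationType"`
	DocumentKey       map[string]interface{} `json:"documentKey"`
	UpdateDescription UpdateDescription      `json:"updateDescription"`
}

// eventJSON is the example event from this report, written as strict JSON.
const eventJSON = `{
  "operationType": "update",
  "documentKey": {"_id": 1},
  "updateDescription": {
    "updatedFields": {},
    "removedFields": [],
    "truncatedArrays": [{"field": "arr", "newSize": 1}]
  }
}`

// decodeEvent parses a raw event (JSON here, BSON in a real reader).
func decodeEvent(raw string) (ChangeEvent, error) {
	var ev ChangeEvent
	err := json.Unmarshal([]byte(raw), &ev)
	return ev, err
}

func main() {
	ev, err := decodeEvent(eventJSON)
	if err != nil {
		panic(err)
	}
	desc := ev.UpdateDescription
	// Both "classic" change lists are empty; only truncatedArrays carries the change.
	fmt.Println(len(desc.UpdatedFields), len(desc.RemovedFields), desc.TruncatedArrays)
}
```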
Expected behavior
MongoShake should preserve the other fields and apply the array truncation on the target:
```js
{ _id: 1, arr: ["a"], keep: "value" }
```
Actual behavior
Because the change stream update conversion ignores truncatedArrays, the generated replay update object can be empty. The executor then follows the replacement path for update objects that do not contain a $ operator, so the target document can effectively become { _id: 1 }, with both arr and keep lost.
Code path
The relevant current code paths appear to be:
- oplog/change_stream_event.go: update conversion handles updatedFields and removedFields, but not updateDescription.truncatedArrays.
- executor/db_writer_single.go and executor/db_writer_bulk.go: update objects without a $ operator are replayed with replacement semantics.
- The oplog fetch path already has $v:2 diff conversion logic for array truncation, but the change stream path bypasses that representation.
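The interaction between these two code paths can be modeled in a few lines. This is a simplified, hypothetical model based on the behavior described in this report, not MongoShake's actual code: buildUpdateFromDescription stands in for the current conversion, and isReplacement stands in for the executor's "no $ operator" check.

```go
package main

import (
	"fmt"
	"strings"
)

// buildUpdateFromDescription is a hypothetical model of the current conversion:
// it only looks at updatedFields and removedFields and ignores truncatedArrays.
func buildUpdateFromDescription(updated map[string]interface{}, removed []string) map[string]interface{} {
	update := map[string]interface{}{}
	if len(updated) > 0 {
		update["$set"] = updated
	}
	if len(removed) > 0 {
		unset := map[string]interface{}{}
		for _, f := range removed {
			unset[f] = 1
		}
		update["$unset"] = unset
	}
	return update
}

// isReplacement models the executor rule: an object with no top-level key
// starting with "$" is treated as a full replacement document.
func isReplacement(update map[string]interface{}) bool {
	for k := range update {
		if strings.HasPrefix(k, "$") {
			return false
		}
	}
	return true
}

func main() {
	// Truncation-only event: updatedFields {} and removedFields [].
	update := buildUpdateFromDescription(map[string]interface{}{}, []string{})
	fmt.Println(len(update), isReplacement(update))
}
```

For the truncation-only event, the model yields an empty object that is classified as a replacement; replaying an empty replacement document is what leaves only _id on the target.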
Suggested fix
Possible fixes:
- Parse updateDescription.truncatedArrays in the change stream update conversion path.
- Convert each truncated array entry into a safe target update, for example a pipeline update using $slice, or an equivalent update expression that truncates the array without replacing the whole document.
- Add a guard so an update event converted to an empty replay object cannot be silently executed as a replacement update.
- Add a regression test for a change stream update whose only change is updateDescription.truncatedArrays.
MongoDB documents updateDescription.truncatedArrays as part of update change events, so MongoShake should either replay it correctly or fail safely instead of replacing the target document.