When diff_match_patch.prototype.patch_addContext_ adds context to a patch, it moves the start and end indices outward by a constant, Patch_Margin = 4. However, because JavaScript's substring indexes by UTF-16 code units, the adjusted index can land in the middle of a Unicode surrogate pair and split it.
Consider the following example:
import diff_match_patch from "diff-match-patch";
console.log(
JSON.stringify(
new diff_match_patch().patch_make("🧮 **a", "🧮 **")[0].diffs[0][1],
)
);
The output is "\uddee **" (🧮 corresponds to "\ud83e\uddee").
Calling diff_match_patch.patch_obj.prototype.toString on this patch crashes, because encodeURI throws a URIError when its input contains a lone surrogate.
import diff_match_patch from "diff-match-patch";
const diff = new diff_match_patch();
console.log(
JSON.stringify(
diff.patch_toText(diff.patch_make("🧮 **a", "🧮 **")) // URIError: URI malformed
)
);
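The underlying failure can be reproduced without the library. This is a minimal sketch that only uses the built-in encodeURI; the variable names are mine and purely illustrative.

```javascript
// "\uddee" is the low half of 🧮 ("\ud83e\uddee") without its high half.
const loneSurrogate = "\uddee **";

let caught = null;
try {
  encodeURI(loneSurrogate);
} catch (e) {
  caught = e;
}
console.log(caught instanceof URIError); // true

// The intact pair encodes fine as four UTF-8 bytes.
const encodedIntact = encodeURI("\ud83e\uddee **");
console.log(encodedIntact); // "%F0%9F%A7%AE%20**"
```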
A straightforward solution might be to add a check after applying Patch_Margin that neither index falls between the two halves of a surrogate pair. I can start a PR, but I've noticed that Patch_Margin is used in many places, and I'm unsure about the best way to make changes.
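To make the idea concrete, here is a rough sketch of what such a check could look like. snapToCodePointBoundary is a hypothetical helper, not something that exists in diff-match-patch, and I have not checked how it would interact with every use of Patch_Margin.

```javascript
// Hypothetical helper (not part of diff-match-patch).
// If `index` sits between a high and a low surrogate, move it one code unit
// in `direction` (-1 toward the start, +1 toward the end) so the pair stays whole.
function snapToCodePointBoundary(text, index, direction) {
  // Clamp out-of-range indices the same way substring would.
  if (index <= 0) return 0;
  if (index >= text.length) return text.length;
  const prev = text.charCodeAt(index - 1);
  const next = text.charCodeAt(index);
  const splitsPair =
    prev >= 0xd800 && prev <= 0xdbff && // high surrogate before the index
    next >= 0xdc00 && next <= 0xdfff;   // low surrogate at the index
  return splitsPair ? index + direction : index;
}

// The failing case from above: the context prefix starts at 5 - Patch_Margin = 1,
// which is inside the surrogate pair of 🧮.
const text = "🧮 **a";
const rawPrefix = text.substring(5 - 4, 5);
const safePrefix = text.substring(snapToCodePointBoundary(text, 5 - 4, -1), 5);
console.log(JSON.stringify(rawPrefix));  // "\uddee **"
console.log(safePrefix === "🧮 **");     // true
```

The start of the prefix would snap toward the beginning of the text and the end of the suffix toward the end, so the context only ever grows by one code unit.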