Patch margin index splits Unicode surrogate pairs unexpectedly

When [`diff_match_patch.prototype.patch_addContext_`](https://github.com/google/diff-match-patch/blob/62f2e689f498f9c92dbc588c58750addec9b1654/javascript/diff_match_patch_uncompressed.js#L1619-L1630) adds context to a patch, it moves the start and end indices outward by a constant, `Patch_Margin = 4`. However, because JavaScript's `substring` indexes by UTF-16 code units, the adjusted index can land in the middle of a Unicode surrogate pair and split it.

Consider the following example:

```js
import diff_match_patch from "diff-match-patch";

console.log(
  JSON.stringify(
    new diff_match_patch().patch_make("🧮 **a", "🧮 **")[0].diffs[0][1],
  )
);
```

The output is `"\uddee **"`. The context begins with a lone low surrogate: 🧮 is encoded as the pair `"\ud83e\uddee"`, and the high half has been cut off.
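The split can also be reproduced without the library. This is a minimal sketch that assumes the leading context is taken as `text.substring(start - Patch_Margin, start)`; the names `sampleText`, `changeStart`, and `margin` are only illustrative.

```js
// Stand-alone illustration; variable names are hypothetical.
const sampleText = "🧮 **a"; // "\ud83e\uddee **a": 6 UTF-16 code units
const changeStart = 5;       // index of the deleted "a"
const margin = 4;            // same value as Patch_Margin

// Four code units of context before the change start at index 1,
// which is the low surrogate of 🧮.
const leadingContext = sampleText.substring(changeStart - margin, changeStart);

console.log(JSON.stringify(leadingContext)); // "\uddee **"
```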

Calling `diff_match_patch.patch_obj.prototype.toString` on this patch crashes, because [`encodeURI`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI) throws a `URIError` if the string it is given contains a lone surrogate.

```js
import diff_match_patch from "diff-match-patch";

const diff = new diff_match_patch();

console.log(
  JSON.stringify(
    diff.patch_toText(diff.patch_make("🧮 **a", "🧮 **")) // URIError: URI malformed
  )
);
```
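The `encodeURI` behavior on its own can be checked with a small sketch like the one below. `canEncode` is a hypothetical helper written for this illustration, not part of diff-match-patch.

```js
// Hypothetical helper: reports whether encodeURI accepts the string.
function canEncode(str) {
  try {
    encodeURI(str);
    return true;
  } catch (err) {
    if (err instanceof URIError) return false;
    throw err; // anything else is unexpected
  }
}

console.log(canEncode("\ud83e\uddee **")); // true: complete surrogate pair
console.log(canEncode("\uddee **"));       // false: lone low surrogate
```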

A straightforward solution might be to add a check after applying `Patch_Margin` to make sure neither index falls inside a surrogate pair. I can start a PR, but I've noticed that `Patch_Margin` is used in many places, and I'm unsure of the best way to make the change.
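As a rough sketch of what such a check could look like: the helpers below (`splitsSurrogatePair`, `safeStart`, `safeEnd`) are hypothetical names and not part of diff-match-patch, and where exactly they would be called inside `patch_addContext_` and the other `Patch_Margin` call sites is left open.

```js
// Hypothetical helpers; not part of diff-match-patch.
function isHighSurrogate(code) {
  return code >= 0xd800 && code <= 0xdbff;
}

function isLowSurrogate(code) {
  return code >= 0xdc00 && code <= 0xdfff;
}

// True if `index` falls between the two halves of a surrogate pair.
function splitsSurrogatePair(text, index) {
  if (index <= 0 || index >= text.length) return false;
  return (
    isHighSurrogate(text.charCodeAt(index - 1)) &&
    isLowSurrogate(text.charCodeAt(index))
  );
}

// Widen rather than shrink the context: a start index moves left,
// an end index moves right.
function safeStart(text, index) {
  return splitsSurrogatePair(text, index) ? index - 1 : index;
}

function safeEnd(text, index) {
  return splitsSurrogatePair(text, index) ? index + 1 : index;
}

const original = "🧮 **a";
const adjustedStart = safeStart(original, 5 - 4); // 1 becomes 0
const fixedContext = original.substring(adjustedStart, 5);

console.log(JSON.stringify(fixedContext)); // "🧮 **"
console.log(encodeURI(fixedContext));      // %F0%9F%A7%AE%20**
```

Widening the context means a patch can carry one code unit more than `Patch_Margin` on either side; shrinking it instead would be the other option, at the cost of slightly less context.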

Patch margin index splits Unicode surrogate pairs unexpectedly #149

dmsnell commented on Oct 28, 2023

jcubic commented on Dec 20, 2023

keizo commented on Jan 4, 2024

dmsnell commented on Jan 4, 2024

keizo commented on Jan 4, 2024
