Auto Fix Issue: herb-lint --fix corrupts source when a multi-byte UTF-8 char precedes a second <%= "#{...}" %> offence #1761

@stevegeek

Description

Input:

<%= "#{first}  – " %>
<%= "#{second} - " %>

Output:

<%= first %><%= cond} %> -

Expected:

<%= first %><%= second %> -

Playground:

Pasting the input snippet above into the playground and running "Auto Fix" reproduces the issue.

Additional context:

The trigger is a multi-byte UTF-8 char (the em-dash, of course Claude would add that!) in the first ERB tag. Prism reports byte offsets, but String.prototype.substring indexes UTF-16 code units, so any expression after a multi-byte char gets sliced from the wrong window.
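To make the mismatch concrete, here is a minimal standalone sketch that reproduces the shifted window without the linter or Prism. The `utf8ByteLength` helper is hypothetical and only simulates the byte offset a byte-based parser would report. The em-dash is written as `\u2014` (3 bytes in UTF-8, 1 UTF-16 code unit), so every byte offset after it is 2 larger than the matching code-unit offset.

```typescript
// Hypothetical helper: counts the UTF-8 bytes needed to encode a JS string.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      bytes += 4; // valid surrogate pair: one code point, two code units
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Same shape as the input above; \u2014 is the em-dash.
const source = '<%= "#{first} \u2014 " %>\n<%= "#{second} - " %>';
const expression = "second";

// Offset in UTF-16 code units (what substring expects).
const unitStart = source.indexOf(expression);
// Offset in UTF-8 bytes (what a byte-based parser reports).
const byteStart = utf8ByteLength(source.slice(0, unitStart));
const byteEnd = byteStart + utf8ByteLength(expression);

// Feeding byte offsets straight into substring shifts the window right by 2.
const wrong = source.substring(byteStart, byteEnd);
const right = source.substring(unitStart, unitStart + expression.length);

console.log(JSON.stringify(wrong)); // "cond} "
console.log(JSON.stringify(right)); // "second"
```

The mis-sliced text `cond} ` matches the corrupted output reported above once whitespace is trimmed.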

Other call sites with same issue

Same source.substring(prismNode.location.startOffset, ...) pattern elsewhere in @herb-tools/linter:

  • erb-prefer-direct-output — has autofix — corrupts source via autofix. With em-dash before offence: expected <%= second %>, actual <%= cond} %>.
  • erb-no-debug-output — no autofix — wrong text in error message. Expected pp @foo, actual @foo %.
  • erb-no-unused-expressions — no autofix — wrong text in error message. Expected @foo.bar, actual oo.bar %.
  • erb-no-unused-literals — no autofix — wrong text in error message. Expected "hello", actual ello" %.

Proposed fix

Convert byte offsets to code-unit offsets at the slice boundary, e.g. via a shared sliceByteRange(source, start, length) helper, so that every rule that slices a JS string with Prism byte offsets goes through one safe path.
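One possible shape for such a helper, as a rough sketch rather than the actual @herb-tools/linter implementation: `byteOffsetToCodeUnitOffset` is a hypothetical name, and `start` and `length` are assumed to be UTF-8 byte counts, as in the proposed signature. It walks the string once per offset, which is the simplest correct approach but not the fastest.

```typescript
// Hypothetical helper: maps a UTF-8 byte offset to a UTF-16 code-unit offset.
// An offset that lands inside a multi-byte character rounds up to the next
// character boundary.
function byteOffsetToCodeUnitOffset(source: string, byteOffset: number): number {
  let bytes = 0;
  let index = 0;
  while (index < source.length && bytes < byteOffset) {
    const unit = source.charCodeAt(index);
    const next = index + 1 < source.length ? source.charCodeAt(index + 1) : 0;
    if (unit < 0x80) {
      bytes += 1;
      index += 1;
    } else if (unit < 0x800) {
      bytes += 2;
      index += 1;
    } else if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      bytes += 4; // surrogate pair: 4 UTF-8 bytes, 2 UTF-16 code units
      index += 2;
    } else {
      bytes += 3;
      index += 1;
    }
  }
  return index;
}

// Slices `source` using a byte-based start and length, returning the text
// that a byte-oriented parser meant to point at.
function sliceByteRange(source: string, start: number, length: number): string {
  const from = byteOffsetToCodeUnitOffset(source, start);
  const to = byteOffsetToCodeUnitOffset(source, start + length);
  return source.substring(from, to);
}

// Usage: byte offset 30 with byte length 6 points at `second` in this input.
const example = '<%= "#{first} \u2014 " %>\n<%= "#{second} - " %>';
console.log(sliceByteRange(example, 30, 6)); // second
```

Because each call scans from the start of the string, a rule that slices many nodes from a large template might prefer to precompute a byte-to-code-unit table once per file, or to slice the encoded bytes and decode the result; both are design options rather than requirements of the fix.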
