Skip to content

Source mappings are incorrect if there are braille Unicode characters in source code comments #14733

Open
@lastchancedegen

Description

@lastchancedegen

Description

When using braille unicode characters (range: U+2800 to U+28ff), the AST source mapping information is incorrect.

Environment

  • Compiler version:
    Tried with several 0.8.* (including latest). I did not try with earlier versions.

I do not know if there are other unicode characters that trigger the bug apart from braille ones.

Steps to Reproduce

This is a contract containing braille unicode characters:

/**
⠴⠁⠀⠚⠀
*/

pragma solidity ^0.8.23;

contract Bad {

  uint256 num;

  constructor(uint256 _a) {
     num = _a;
  }
}

After compiling it:

solc --ast-compact-json  Bad.sol

We get the following source mapping for the constructor FunctionDefinition: "src": "85:44:0",.

We can quickly check with JS:

const fs = require('fs')

const name = process.argv[2]
const offset = process.argv[3]
const length = process.argv[4]

const source = fs.readFileSync(`${name}.sol`, 'utf-8')
const constructorSource = source.substring(offset, offset + length)

console.log(constructorSource);
node parse.js Bad 85 44
r(uint256 _a) {
     num = _a;
  }
}

The source is not correct. If we do the same with:

/**
  Non braille comment
*/

pragma solidity ^0.8.23;

contract Good {

  uint256 num;

  constructor(uint256 _a) {
     num = _a;
  }
}

After compiling, we get the following source mapping: "src": "92:44:0",.

Which is correct:

node parse.js Good 92 44
constructor(uint256 _a) {
     num = _a;
  }
}

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions