@samyron (Contributor) commented Oct 31, 2025

This PR uses SWAR (SIMD within a register) on little-endian architectures to recognize and parse eight consecutive ASCII digits at a time. It appears to improve performance when parsing long (but not too long) integers and floats.
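
For readers unfamiliar with the technique, here is a minimal sketch of a SWAR eight-digit conversion as it is commonly published. It is not taken from this PR's diff: the function name `parse_eight_digits_swar` is hypothetical, and the code assumes a little-endian target and that the caller has already verified that all eight bytes are ASCII digits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name; assumes little-endian and eight validated ASCII digits at p. */
static inline uint64_t parse_eight_digits_swar(const char *p) {
    uint64_t val;
    memcpy(&val, p, sizeof(val));           /* unaligned-safe 8-byte load */

    val -= 0x3030303030303030ULL;           /* '0'..'9' -> 0..9 in every byte */
    val = (val * 10) + (val >> 8);          /* bytes 0, 2, 4, 6 now hold two-digit pairs */

    const uint64_t mask = 0x000000FF000000FFULL;
    const uint64_t mul1 = 0x000F424000000064ULL; /* 100 + (1000000 << 32) */
    const uint64_t mul2 = 0x0000271000000001ULL; /* 1 + (10000 << 32) */

    /* Combine the four pairs; the result accumulates in the upper 32 bits. */
    val = (((val & mask) * mul1) + (((val >> 16) & mask) * mul2)) >> 32;
    return val;
}
```

For example, under these assumptions `parse_eight_digits_swar("12345678")` evaluates to 12345678 without a per-character loop.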

The unsigned 32-bit integer benchmark data was generated with the following:

File.write("integers-rand-unsigned-32bits.json", JSON.generate((1..10000).map { rand(4294967295) }))

The integer benchmark data was generated with the following:

File.write("integers.json", JSON.generate((1..10000).map { rand(18446744073709551615) }))

The benchmarks below were run on a MacBook Air M1.

This branch compared to master

Run 1

== Parsing float parsing (2251051 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    17.000 i/100ms
Calculating -------------------------------------
               after    186.020 (± 1.6%) i/s    (5.38 ms/i) -    935.000 in   5.027877s

Comparison:
              before:      164.3 i/s
               after:      186.0 i/s - 1.13x  faster


== Parsing integer parsing (204025 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   123.000 i/100ms
Calculating -------------------------------------
               after      1.248k (± 0.9%) i/s  (801.04 μs/i) -      6.273k in   5.025278s

Comparison:
              before:     1167.0 i/s
               after:     1248.4 i/s - 1.07x  faster


== Parsing unsigned 32 bit integer parsing (107355 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   862.000 i/100ms
Calculating -------------------------------------
               after      8.593k (± 1.4%) i/s  (116.38 μs/i) -     43.100k in   5.016926s

Comparison:
              before:     6805.6 i/s
               after:     8592.6 i/s - 1.26x  faster

Run 2

== Parsing float parsing (2251051 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    17.000 i/100ms
Calculating -------------------------------------
               after    183.029 (± 2.2%) i/s    (5.46 ms/i) -    918.000 in   5.017972s

Comparison:
              before:      160.3 i/s
               after:      183.0 i/s - 1.14x  faster


== Parsing integer parsing (204025 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   123.000 i/100ms
Calculating -------------------------------------
               after      1.234k (± 5.4%) i/s  (810.62 μs/i) -      6.150k in   5.006831s

Comparison:
              before:     1139.1 i/s
               after:     1233.6 i/s - same-ish: difference falls within error


== Parsing unsigned 32 bit integer parsing (107355 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   847.000 i/100ms
Calculating -------------------------------------
               after      8.534k (± 1.7%) i/s  (117.17 μs/i) -     43.197k in   5.063015s

Comparison:
              before:     6760.0 i/s
               after:     8534.5 i/s - 1.26x  faster

This branch compared to other libraries

== Parsing float parsing (2251051 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    17.000 i/100ms
          json_coder    18.000 i/100ms
                  oj     2.000 i/100ms
          Oj::Parser     2.000 i/100ms
           rapidjson    20.000 i/100ms
Calculating -------------------------------------
                json    182.670 (± 0.5%) i/s    (5.47 ms/i) -    918.000 in   5.025684s
          json_coder    189.703 (± 0.5%) i/s    (5.27 ms/i) -    954.000 in   5.029077s
                  oj     24.049 (± 0.0%) i/s   (41.58 ms/i) -    122.000 in   5.073862s
          Oj::Parser     28.384 (± 0.0%) i/s   (35.23 ms/i) -    142.000 in   5.003921s
           rapidjson    219.182 (± 0.9%) i/s    (4.56 ms/i) -      1.100k in   5.019120s

Comparison:
                json:      182.7 i/s
           rapidjson:      219.2 i/s - 1.20x  faster
          json_coder:      189.7 i/s - 1.04x  faster
          Oj::Parser:       28.4 i/s - 6.44x  slower
                  oj:       24.0 i/s - 7.60x  slower


== Parsing integer parsing (204025 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   122.000 i/100ms
          json_coder   124.000 i/100ms
                  oj    97.000 i/100ms
          Oj::Parser    28.000 i/100ms
           rapidjson   214.000 i/100ms
Calculating -------------------------------------
                json      1.233k (± 0.6%) i/s  (811.23 μs/i) -      6.222k in   5.047603s
          json_coder      1.237k (± 0.7%) i/s  (808.46 μs/i) -      6.200k in   5.012715s
                  oj    971.156 (± 0.6%) i/s    (1.03 ms/i) -      4.947k in   5.094138s
          Oj::Parser    280.331 (± 1.4%) i/s    (3.57 ms/i) -      1.428k in   5.094857s
           rapidjson      2.166k (± 0.6%) i/s  (461.78 μs/i) -     10.914k in   5.040056s

Comparison:
                json:     1232.7 i/s
           rapidjson:     2165.5 i/s - 1.76x  faster
          json_coder:     1236.9 i/s - same-ish: difference falls within error
                  oj:      971.2 i/s - 1.27x  slower
          Oj::Parser:      280.3 i/s - 4.40x  slower


== Parsing unsigned 32 bit integer parsing (107355 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   860.000 i/100ms
          json_coder   868.000 i/100ms
                  oj   390.000 i/100ms
          Oj::Parser   737.000 i/100ms
           rapidjson   565.000 i/100ms
Calculating -------------------------------------
                json      8.353k (± 5.2%) i/s  (119.72 μs/i) -     42.140k in   5.061476s
          json_coder      8.508k (± 3.2%) i/s  (117.54 μs/i) -     43.400k in   5.106714s
                  oj      3.789k (± 2.4%) i/s  (263.90 μs/i) -     19.110k in   5.046051s
          Oj::Parser      7.611k (± 4.0%) i/s  (131.39 μs/i) -     38.324k in   5.043984s
           rapidjson      5.673k (± 1.8%) i/s  (176.26 μs/i) -     28.815k in   5.080640s

Comparison:
                json:     8352.7 i/s
          json_coder:     8507.9 i/s - same-ish: difference falls within error
          Oj::Parser:     7610.9 i/s - 1.10x  slower
           rapidjson:     5673.3 i/s - 1.47x  slower
                  oj:     3789.3 i/s - 2.20x  slower

@byroot force-pushed the sm/swar-integer-parsing branch from a9c9a22 to 5b76dc4 on October 31, 2025 07:31
@byroot (Member) left a comment

I like this a lot. But I think I'll refactor it a bit to reduce duplication, and potentially improve code generation.

static inline int has_eight_consecutive_digits(const char *p) {
    uint64_t val;
    memcpy(&val, p, sizeof(uint64_t));
    return (((val & 0xF0F0F0F0F0F0F0F0) | (((val + 0x0606060606060606) & 0xF0F0F0F0F0F0F0F0) >> 4)) == 0x3333333333333333);
}

Member

I'm thinking we could use that trick combined with clz (or similar) to know how many consecutive digits we have.

I suspect 8 consecutive digits aren't that common, but if we also had a 4-digit (uint32_t) version and a fast dispatch, that could help on more benchmarks.
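
One way the counting idea could look is sketched below. This is an illustration rather than the code that was merged: `count_leading_digits` is a hypothetical name, `__builtin_ctzll` is a GCC/Clang builtin, and a little-endian target is assumed. Because the first character lands in the least significant byte on little-endian, the sketch counts trailing zero bits rather than leading ones.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of ASCII digits at the start of p (0..8).
 * Assumes little-endian, at least 8 readable bytes, and GCC or Clang. */
static inline int count_leading_digits(const char *p) {
    uint64_t val;
    memcpy(&val, p, sizeof(val));

    /* Same per-byte test as has_eight_consecutive_digits: each digit byte becomes 0x33. */
    uint64_t comp = (val & 0xF0F0F0F0F0F0F0F0ULL) |
                    (((val + 0x0606060606060606ULL) & 0xF0F0F0F0F0F0F0F0ULL) >> 4);

    uint64_t diff = comp ^ 0x3333333333333333ULL; /* non-zero byte marks a non-digit */
    if (diff == 0) {
        return 8;                                 /* ctz is undefined for 0 */
    }
    return __builtin_ctzll(diff) / 8;             /* index of the first non-digit byte */
}
```

A carry out of a byte in the `+ 0x06...` step can only affect higher bytes, and every byte below the first non-digit is a digit that produces no carry, so the count stays exact.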

Member

Actually, I think we can simply do (comp & 0xFFFFFFFF) == 0x33333333 to check for 4 consecutive digits, where comp is the left-hand side of the comparison in has_eight_consecutive_digits.
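
A rough sketch of that check, with the comparison word computed once and reused for both widths. The helper names `digit_compare_word`, `has_eight_digits`, and `has_four_digits` are hypothetical, and the low 32 bits correspond to the first four characters only on a little-endian target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: every byte of the result is 0x33 iff the input byte is '0'..'9'. */
static inline uint64_t digit_compare_word(const char *p) {
    uint64_t val;
    memcpy(&val, p, sizeof(val));
    return (val & 0xF0F0F0F0F0F0F0F0ULL) |
           (((val + 0x0606060606060606ULL) & 0xF0F0F0F0F0F0F0F0ULL) >> 4);
}

/* All eight bytes are digits. */
static inline int has_eight_digits(uint64_t comp) {
    return comp == 0x3333333333333333ULL;
}

/* The first four bytes (low 32 bits on little-endian) are digits. */
static inline int has_four_digits(uint64_t comp) {
    return (comp & 0xFFFFFFFFULL) == 0x33333333ULL;
}
```

A caller could test `has_eight_digits` first and fall back to `has_four_digits` on the same word, so the 8-byte load and the nibble arithmetic are not repeated.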

Member

4-byte version:

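static inline uint32_t parse_four_digits_unrolled(const char *p) {
    // Loads 8 bytes and keeps the low 4 (the first four characters on little-endian).
    uint64_t large_val;
    memcpy(&large_val, p, sizeof(uint64_t));
    uint32_t val = (uint32_t)large_val;

    const uint32_t mask = 0x000000FF;
    const uint32_t mul1 = 100;
    val -= 0x30303030; // ASCII '0'..'9' -> 0..9 in each byte
    val = (val * 10) + (val >> 8); // val = (val * 2561) >> 8; bytes 0 and 2 now hold two-digit pairs
    val = ((val & mask) * mul1) + (((val >> 16) & mask)); // high pair * 100 + low pair
    return (uint32_t)val;
}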

@byroot byroot mentioned this pull request Nov 1, 2025
byroot added a commit to byroot/json that referenced this pull request Nov 1, 2025
Closes: ruby#878

```
== Parsing float parsing (2251051 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    23.000 i/100ms
Calculating -------------------------------------
               after    214.382 (± 0.5%) i/s    (4.66 ms/i) -      1.081k in   5.042555s

Comparison:
              before:      189.5 i/s
               after:      214.4 i/s - 1.13x  faster
```

Co-Authored-By: Scott Myron <[email protected]>
@byroot byroot closed this in #885 Nov 1, 2025
matzbot pushed a commit to ruby/ruby that referenced this pull request Nov 1, 2025
Closes: ruby/json#878

```
== Parsing float parsing (2251051 bytes)
ruby 3.4.6 (2025-09-16 revision ruby/json@dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    23.000 i/100ms
Calculating -------------------------------------
               after    214.382 (± 0.5%) i/s    (4.66 ms/i) -      1.081k in   5.042555s

Comparison:
              before:      189.5 i/s
               after:      214.4 i/s - 1.13x  faster
```

ruby/json@6348ff0891

Co-Authored-By: Scott Myron <[email protected]>