
Make JSON.generate 1.75x as fast #562

Closed
wants to merge 10 commits into from


mame (Member) commented Dec 27, 2023

This PR speeds up JSON.generate by approximately 1.75x (from 485k to 840k iterations per second) on the Oj benchmark. This brings JSON.generate on par with Oj.dump (about 837k iterations per second in the same run).

Before:

$ ruby --yjit -Ilib -I ext oj-bench.rb
** Oj version 3.16.3 **
ruby 3.4.0dev (2023-12-27T05:30:20Z master 862cfcaf75) +YJIT [x86_64-linux]
Warming up --------------------------------------
             Oj.dump    83.089k i/100ms
    Oj.dump [compat]    72.410k i/100ms
     Oj.dump [rails]    57.698k i/100ms
       JSON.generate    49.706k i/100ms
Calculating -------------------------------------
             Oj.dump    836.635k (± 0.4%) i/s -     12.630M in  15.095866s
    Oj.dump [compat]    718.031k (± 0.2%) i/s -     10.789M in  15.026031s
     Oj.dump [rails]    573.882k (± 0.3%) i/s -      8.655M in  15.081085s
       JSON.generate    484.650k (± 1.0%) i/s -      7.307M in  15.078021s

After:

$ ruby --yjit -Ilib -I ext oj-bench.rb
** Oj version 3.16.3 **
ruby 3.4.0dev (2023-12-27T05:30:20Z master 862cfcaf75) +YJIT [x86_64-linux]
Warming up --------------------------------------
             Oj.dump    83.349k i/100ms
    Oj.dump [compat]    71.605k i/100ms
     Oj.dump [rails]    57.058k i/100ms
       JSON.generate    84.498k i/100ms
Calculating -------------------------------------
             Oj.dump    837.304k (± 0.3%) i/s -     12.586M in  15.031372s
    Oj.dump [compat]    718.657k (± 0.5%) i/s -     10.812M in  15.045573s
     Oj.dump [rails]    565.824k (± 0.3%) i/s -      8.502M in  15.025348s
       JSON.generate    839.614k (± 0.3%) i/s -     12.675M in  15.096044s

This PR consists of the following improvements.

  • Drop the prebuilt array_delim, etc.

    • array_delim is usually a single comma character, and using memcpy to copy a single character was inefficient.
    • It is much faster to output the comma and the optional array_nl separately, without prebuilding them.
    • This improved the speed by about 24%, from 480k i/s to 593k i/s.
  • Use a faster Ruby API for encoding checks.

    • This improved the speed by 12%, from 593k i/s to 665k i/s.
  • Use a fast path when string escaping is not needed.

    • This improved the performance by 16%, from 665k i/s to 770k i/s.
  • Use a faster Ruby API to dispatch on the class of objects.

    • This improved the performance by 5%, from 770k i/s to 806k i/s.
  • Use generate_json_string for object keys.

    • Since object keys are already known to be Strings, routing them through the general dispatch in generate_json was unnecessary overhead.
    • This improved the performance by 3%, from 806k i/s to 830k i/s.
  • Use a faster Ruby API to read array elements.

    • This improved the performance by about 4%, from 830k i/s to 854k i/s.
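
As a quick arithmetic check, the step-by-step throughputs in this list telescope to the overall gain, which can be compared with the ratio from the before/after runs above. The two ratios come from separate benchmark runs, which likely accounts for the small gap between them:

$$
\frac{593}{480}\cdot\frac{665}{593}\cdot\frac{770}{665}\cdot\frac{806}{770}\cdot\frac{830}{806}\cdot\frac{854}{830}=\frac{854}{480}\approx 1.78,
\qquad
\frac{839.614}{484.650}\approx 1.73
$$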

I bundled these into one PR because splitting them into multiple PRs would likely cause many conflicts between them. If you would prefer separate PRs, please let me know.

The purpose of this change is to use `fbuffer_append_char`, which is
faster than `fbuffer_append`.

`array_delim` was a buffer that concatenated a single comma with
`array_nl`. However, in the typical use case (`JSON.generate(data)`),
`array_nl` is empty. This means that `array_delim` was a
single-character buffer in many cases.

`fbuffer_append(buffer, array_delim)` used `memcpy` to copy one byte,
which was not very efficient.
Instead, this change calls `fbuffer_append_char(buffer, ',')` and then
calls `fbuffer_append(buffer, array_nl)` only when `array_nl` is not NULL.
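
A minimal sketch of the new delimiter logic, assuming a toy fixed-capacity buffer. `Out`, `out_append`, `out_append_char`, and `generate_int_array` are hypothetical names that only mirror the roles of `FBuffer`, `fbuffer_append`, `fbuffer_append_char`, and the array generator; indentation and the newlines around the brackets are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-capacity buffer standing in for the json gem's FBuffer. */
typedef struct { char data[256]; size_t len; } Out;

/* General append: a length-driven memcpy (plays the role of fbuffer_append). */
static void out_append(Out *o, const char *s, size_t n) {
    assert(o->len + n <= sizeof o->data);
    memcpy(o->data + o->len, s, n);
    o->len += n;
}

/* Single-byte append: one store, no memcpy (plays the role of fbuffer_append_char). */
static void out_append_char(Out *o, char c) {
    assert(o->len < sizeof o->data);
    o->data[o->len++] = c;
}

/* Emit a JSON array of ints. array_nl is NULL in the plain JSON.generate case. */
static void generate_int_array(Out *o, const int *xs, size_t n, const char *array_nl) {
    size_t nl_len = array_nl ? strlen(array_nl) : 0;
    out_append_char(o, '[');
    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            out_append_char(o, ',');                       /* no prebuilt "," + array_nl */
            if (array_nl) out_append(o, array_nl, nl_len); /* only when configured */
        }
        char tmp[16];
        int k = snprintf(tmp, sizeof tmp, "%d", xs[i]);
        out_append(o, tmp, (size_t)k);
    }
    out_append_char(o, ']');
}
```

The comma is written with a single store; the `memcpy` path is only taken for `array_nl`, which a default `JSON.generate(data)` call leaves unset.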

This speeds up `JSON.generate` by about 9% in a benchmark.

This speeds up `JSON.generate` by about 4% in a benchmark.
Also, remove static functions that are no longer used.

This speeds up `JSON.generate` by about 5% in a benchmark.

This speeds up `JSON.generate` by about 4% in a benchmark.

This speeds up `JSON.generate` by about 12% in a benchmark.

... instead of `rb_enc_str_asciionly_p`.
If escaping is not needed, we can use `fbuffer_append` directly, which
is much faster.

This speeds up `JSON.generate` by about 16% in a benchmark.
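
A minimal sketch of the fast path, assuming a 256-entry lookup table (called `escape_table` here; the PR's own code uses a table named `escapeTable`). `Out`, `out_append`, `out_append_char`, and this `generate_json_string` signature are hypothetical. The scan ORs together the table entries of every byte; if none needs escaping, the whole string is copied in one bulk append. The slow path writes control characters as `\u00XX` for brevity, which is valid JSON, although the real generator also has short forms such as `\n`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-capacity output buffer (stand-in for FBuffer). */
typedef struct { char data[256]; size_t len; } Out;

static void out_append(Out *o, const char *s, size_t n) {
    assert(o->len + n <= sizeof o->data);
    memcpy(o->data + o->len, s, n);
    o->len += n;
}

static void out_append_char(Out *o, char c) {
    assert(o->len < sizeof o->data);
    o->data[o->len++] = c;
}

/* 1 for bytes that must be escaped in a JSON string (control characters,
 * '"' and '\\'), 0 otherwise. Built on first use; not thread-safe. */
static const unsigned char *escape_table(void) {
    static unsigned char table[256];
    static int ready = 0;
    if (!ready) {
        for (int c = 0; c < 0x20; c++) table[c] = 1;
        table['"'] = 1;
        table['\\'] = 1;
        ready = 1;
    }
    return table;
}

static void generate_json_string(Out *o, const char *s, size_t len) {
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *table = escape_table();

    /* Pre-scan: does any byte need escaping? */
    unsigned char need_escape = 0;
    for (size_t i = 0; i < len; i++)
        need_escape |= table[p[i]];

    out_append_char(o, '"');
    if (!need_escape) {
        out_append(o, s, len);              /* fast path: one bulk copy */
    } else {
        for (size_t i = 0; i < len; i++) {  /* slow path: byte by byte */
            unsigned char c = p[i];
            if (!table[c]) {
                out_append_char(o, (char)c);
            } else if (c == '"' || c == '\\') {
                out_append_char(o, '\\');
                out_append_char(o, (char)c);
            } else {
                char esc[7];                /* "\u00XX" plus NUL */
                snprintf(esc, sizeof esc, "\\u%04x", (unsigned)c);
                out_append(o, esc, 6);
            }
        }
    }
    out_append_char(o, '"');
}
```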

Dispatching on the structure of Ruby's VALUE is more efficient than a
simple cascade of "if ... else if ..." checks.

This speeds up `JSON.generate` by about 5% in a benchmark.
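
A toy model of the idea, not the actual change: the hypothetical `Tag` enum stands in for the type information Ruby keeps in a VALUE, and `Value`, `Out`, and `generate_json` here are illustrative names only. The generator branches once on the tag instead of testing candidate types one after another.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct { char data[256]; size_t len; } Out;

static void out_append(Out *o, const char *s, size_t n) {
    assert(o->len + n <= sizeof o->data);
    memcpy(o->data + o->len, s, n);
    o->len += n;
}

/* Hypothetical tagged value; the tag plays the role of Ruby's type information. */
typedef enum { V_NIL, V_TRUE, V_FALSE, V_INT, V_STR } Tag;
typedef struct { Tag tag; long i; const char *s; } Value;

/* One switch on the tag. The compiler may lower it to a jump table, whereas an
 * if/else-if cascade tests the types one after another. */
static void generate_json(Out *o, const Value *v) {
    switch (v->tag) {
    case V_NIL:   out_append(o, "null", 4);  break;
    case V_TRUE:  out_append(o, "true", 4);  break;
    case V_FALSE: out_append(o, "false", 5); break;
    case V_INT: {
        char tmp[32];
        int k = snprintf(tmp, sizeof tmp, "%ld", v->i);
        out_append(o, tmp, (size_t)k);
        break;
    }
    case V_STR:   /* escaping omitted for brevity */
        out_append(o, "\"", 1);
        out_append(o, v->s, strlen(v->s));
        out_append(o, "\"", 1);
        break;
    }
}
```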

... instead of `generate_json`.

Since the object key is already known to be a string, going through the
generic dispatch function adds unnecessary overhead.

This speeds up `JSON.generate` by about 3% in a benchmark.
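
A minimal sketch of the key path, assuming a toy value type with only integers and strings; `Value`, `Pair`, `Out`, and these function signatures are hypothetical. Keys go straight to the string emitter, while values still go through the generic entry point that has to inspect the type first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct { char data[256]; size_t len; } Out;

static void out_append(Out *o, const char *s, size_t n) {
    assert(o->len + n <= sizeof o->data);
    memcpy(o->data + o->len, s, n);
    o->len += n;
}

/* Hypothetical value type: just enough for the example. */
typedef enum { V_INT, V_STR } Tag;
typedef struct { Tag tag; long i; const char *s; } Value;
typedef struct { const char *key; Value val; } Pair;

/* String emitter (escaping omitted for brevity). */
static void generate_json_string(Out *o, const char *s) {
    out_append(o, "\"", 1);
    out_append(o, s, strlen(s));
    out_append(o, "\"", 1);
}

/* Generic entry point: must check the type before doing any work. */
static void generate_json(Out *o, const Value *v) {
    if (v->tag == V_STR) {
        generate_json_string(o, v->s);
    } else {
        char tmp[32];
        int k = snprintf(tmp, sizeof tmp, "%ld", v->i);
        out_append(o, tmp, (size_t)k);
    }
}

static void generate_json_object(Out *o, const Pair *pairs, size_t n) {
    out_append(o, "{", 1);
    for (size_t i = 0; i < n; i++) {
        if (i > 0) out_append(o, ",", 1);
        generate_json_string(o, pairs[i].key); /* key: already a string, skip dispatch */
        out_append(o, ":", 1);
        generate_json(o, &pairs[i].val);       /* value: type unknown, dispatch */
    }
    out_append(o, "}", 1);
}
```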

It is safe to use `RARRAY_AREF` here because no Ruby code is executed
between `RARRAY_LEN` and `RARRAY_AREF`.

This speeds up `JSON.generate` by about 4% in a benchmark.
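
A toy model in plain C of why the unchecked read is safe; `Array`, `ElemFn`, `each_unchecked`, and `sum_then_shrink` are hypothetical and do not reproduce Ruby's array layout. The loop condition re-reads the length on every iteration and the element is read immediately afterwards, so arbitrary code (in Ruby, for example a user-defined `to_json`) only runs after the read, even if it shrinks the array.

```c
#include <assert.h>
#include <stddef.h>

/* Toy array (hypothetical; not Ruby's RArray). */
typedef struct { long *ptr; size_t len; } Array;

/* Per-element callback that may mutate the array. */
typedef void (*ElemFn)(Array *a, long elem, void *ctx);

static void each_unchecked(Array *a, ElemFn fn, void *ctx) {
    for (size_t i = 0; i < a->len; i++) {  /* length re-read each iteration */
        assert(i < a->len);                /* invariant the unchecked read relies on */
        long elem = a->ptr[i];             /* like RARRAY_AREF right after RARRAY_LEN */
        fn(a, elem, ctx);                  /* arbitrary code runs only after the read */
    }
}

/* Example callback: sums elements and shrinks the array to one element,
 * simulating a callback that mutates the array mid-iteration. */
static void sum_then_shrink(Array *a, long elem, void *ctx) {
    *(long *)ctx += elem;
    a->len = 1;
}
```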

mame (Member, Author) commented Dec 27, 2023

Note: I got oj-bench.rb from this article.

hsbt (Member) commented Dec 27, 2023

I will re-run https://github.com/flori/json/actions/runs/7336797514/job/19976695584?pr=562 once ruby/setup-ruby supports Ruby 3.3.

Earlopain commented

I'd love to see this merged. @hsbt, could you take another look now that Ruby 3.3 has been released?


Review comment (Member) on:

    for (p = ptr; (unsigned long)(p - ptr) < len;) {
        need_escape |= escapeTable[(int)*p++];

Suggested change:

    -    need_escape |= escapeTable[(int)*p++];
    +    need_escape |= escapeTable[(int)*p++];
    +    if (need_escape) break;
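
A minimal sketch of the pre-check with the suggested early exit; `build_escape_table` and `needs_escape` are hypothetical names, and `scanned` is an illustrative out-parameter reporting how many bytes were examined. The sketch indexes the table through an `unsigned char` pointer: if `p` were a plain `char *` on a platform where `char` is signed, `(int)*p` would be negative for bytes 0x80 and above.

```c
#include <assert.h>
#include <stddef.h>

/* 1 for bytes that need escaping in JSON (control characters, '"' and '\\'). */
static void build_escape_table(unsigned char table[256]) {
    for (int c = 0; c < 256; c++)
        table[c] = (c < 0x20 || c == '"' || c == '\\');
}

static int needs_escape(const unsigned char table[256], const char *ptr,
                        size_t len, size_t *scanned) {
    const unsigned char *p = (const unsigned char *)ptr; /* index stays in 0..255 */
    unsigned char need_escape = 0;
    size_t i = 0;
    while (i < len) {
        need_escape |= table[p[i++]];
        if (need_escape) break;  /* the suggested early exit */
    }
    if (scanned) *scanned = i;
    return need_escape != 0;
}
```

When nothing needs escaping, the early exit never fires and the loop pays one extra branch per byte, so whether it helps likely depends on the input mix.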

byroot (Member) commented Oct 17, 2024

I cherry-picked and rebased all of these changes, except for the "escape table pre-check" patch, which I reimplemented somewhat differently in #620.

Thanks again, @mame!
