
Conversation

stevefan1999-personal

Just a very early heads up for my upcoming SLJIT integration...

I'm putting it up here because I find https://github.com/bnoordhuis/quickjit very intriguing, but embedding TCC as an intermediate representation seems like overkill. I recently came across this (cross-platform, btw) JIT library called https://github.com/zherczeg/sljit, which does a much better job than TCC as an IR.

SLJIT is also tiny once you strip all the guard checks, verbose output and debug code: a simple "Hello World" takes only ~70KB of binary size, and that includes codegen and creation of the executable code region, compared to 200KB+ with TCC. It is also a ton quicker than TCC, because there is no parsing phase to go through; we can jump straight into codegen.

Technically speaking, this PR should be cross-compatible with @bellard's https://github.com/bellard/quickjs, but he has clearly stated that he doesn't plan to add a JIT any time soon, so I decided to put the PR here, where it should be more acceptable.

Progress: 5%. I just got SLJIT and Zydis slipped in as CMake subprojects and enabled the build conditions. I can create the SLJIT compiler and generate x86 code (and use Zydis to disassemble the generated code body for tracing), but the absolute pain in the ass now is how to handle the ~230 opcodes of the QuickJS virtual machine.

On the flip side, if I remember correctly, SLJIT is also used as the JIT engine in PCRE2; you can find a regex implementation in the SLJIT repo as well.

@stevefan1999-personal stevefan1999-personal marked this pull request as draft June 27, 2025 04:12
@saghul
Contributor

saghul commented Jun 27, 2025

Nice! 👀

@bnoordhuis
Contributor

but the absolute pain in the ass now is how to handle the ~230-ish opcode in the QuickJS virtual machine

That's why I picked tcc, because it let me (for the most part) just copy-and-paste code from quickjs.c :)

@bnoordhuis
Contributor

By the way, you may want to read through the (very related) discussion in #659 if you haven't already, but to summarize:

A basic template JIT won't give huge speedups because it only removes opcode dispatch overhead (and that overhead is fairly small in quickjs.)

That doesn't mean it isn't worthwhile but one concern I have is duplicated logic between the interpreter and the JIT (another reason for picking tcc.)

@stevefan1999-personal
Author

By the way, you may want to read through the (very related) discussion in #659 if you haven't already, but to summarize:

A basic template JIT won't give huge speedups because it only removes opcode dispatch overhead (and that overhead is fairly small in quickjs.)

That doesn't mean it isn't worthwhile but one concern I have is duplicated logic between the interpreter and the JIT (another reason for picking tcc.)

Yeah, that's exactly why I was also thinking about building a lightweight SSA engine on top of SLJIT first; I just didn't reveal that plan. Turns out I need to read a lot of papers first.

@aabbdev

aabbdev commented Aug 25, 2025

Why not ship a copy-and-patch JIT for QuickJS?
No handwritten asm, portable across CPUs, and still very fast.

The idea (per Xu & Kjolstad) is to prebuild a big dictionary of stencils: tiny binary code blocks, generated by a metacompiler, with holes for immediates, stack offsets, and branch targets. You write stencils in C/C++, let Clang -O3 do instruction selection and local opts, and record the relocation spots. At runtime you just pattern-match bytecode/IR to stencil variants, memcpy them back-to-back, and patch the holes. Variants plus pass-through params give you a cheap "register allocator," and supernodes (fused opcode sequences) fall out naturally, so "opcode fusion makes JITing harder" isn't a blocker here; it's literally the point.

Results from the paper: WebAssembly tier-1 compilation is 4.9×–6.5× faster than V8's Liftoff while producing 39–63% faster code, and their high-level language compiler beats LLVM -O0 by ~100× on compile time while the generated code runs ~14% faster. This is a very direct path to pushing QuickJS toward V8-class baseline performance without building a full optimizing JIT.

Refs:
• Paper: https://arxiv.org/pdf/2011.13127

• Overview: https://sillycross.github.io/2022/11/22/2022-11-22/

• Talk: https://youtu.be/HxSHIpEQRjs

• Worked example: https://scot.tg/2024/12/22/worked-example-of-copy-and-patch-compilation/

Short alt (tweet/lead-in):

Copy-and-patch JIT for QuickJS: precompile stencil blocks with Clang, then at runtime copy+patch them (immediates/targets) and stitch with CPS. You get fused opcode blocks, cheap reg allocation via variants, and multi-× faster tier-1 with portable C/C++, no handwritten asm. (See Xu & Kjolstad; links above.)
