Replies: 5 comments 1 reply
-
Hey hey! It's not a problem at all! 100% of this is just like, vibes and suggested ideas, I am not married to anything at all, and also, what's in the repo may not even live up to the plan. A lot of the stuff you read is like, a mental dump of "stuff I want to try out," and so hearing about what may work and what may cause problems is 100% welcome, and I appreciate it. I'm going to actually read your comment now, but I mostly just first wanted to say that, because it is important on its own.
-
I'm glad someone already tried this! This was something I was wondering about.
Yes! I recently watched this talk, and definitely wanted to see what I could take from that. I hadn't tried to port any of those ideas to Rust though, so I appreciate the sketch, I'm sure it'll be helpful.
It's no worries either way; I'm unsure how much contribution I want to take on for this project right now, because this is largely a learning experience + passion project for me at the moment. But if you do take a crack at this, I'd at least like to take a look at it!
Yep. As always, there's a ton of things to do with a baby language, and so I haven't figured out at what point I want to do each step. The first round is just to get anything going at all: I don't want to implement an entire language and then re-write its parser, but I'm also worried about doing this kind of thing too early, when measurements aren't "real", know what I mean? Regardless, the current state of things is very much in a more classic style, and even a little sloppy: I've been prioritizing shipping anything at all over something that perfectly fits my end plan. At this nascent stage of the language existing, there's just so little going on that I want to make things a bit more interesting, language-wise, before I really focus on getting these sorts of details in place in a more real way.
-
Glad to hear that; I was a little worried, somehow.
Ah, yeah, don't worry, I didn't mean direct contributions. What I meant was that I should be implementing a `SoAVec`-like struct (specifically, one that can only contain up to 2^32 items) in the context of my JavaScript engine project, and it will then likely be available on crates.io. So hopefully I'll have something that you can indeed take a look at :)
I absolutely understand (the JS engine is very much the same; trying things out, just getting things out the door before making sure they're necessarily the best thing ever). I wish you a good time with this project, cheers! <3
-
Excellent :)
-
I was chatting about this with Claude earlier: fun!
-
Hello Mr. Klabnik,
My apologies for barging in like this; I hope this is not badly received. I was reading your technical decisions / architectural documentation, and the "ECS-inspired design with separate arrays" popped out to me as a possible problem: it is likely to prove a cache-unfriendly data structure for the AST, rather than the cache-friendly one the rationale describes.
You might also have seen a recent DConf talk on creating a data-oriented parser, AST, and visitor generator program. They performed an ECS transformation on the AST node storage and found that it reduced performance by 20%. The reason seems fairly clear: they get more cache misses after the ECS transformation.
The likely cause is that the common case for an AST is to read the data of the next node, regardless of what its type is. With the ECS transformation separating node data based on node type, your memory layout optimises for reading the next node of the same type. But having multiple nodes of the same type one after another is a rare, exceptional case, so this will likely lead to a lot of cache misses.
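To make that concrete, here is a minimal Rust sketch of the "vector per node type" layout being described; all the type and field names here are invented for illustration, not taken from the repo or the talk:

```rust
// Invented node payloads for two hypothetical AST node kinds.
struct Literal {
    value: i64,
}
struct BinaryExpr {
    lhs: u32,
    rhs: u32,
}

// A node reference is (kind, index local to that kind's Vec).
enum NodeRef {
    Literal(u32),
    BinaryExpr(u32),
}

// "Vector per node type": each kind lives in its own Vec.
struct EcsAst {
    literals: Vec<Literal>,
    binary_exprs: Vec<BinaryExpr>,
    // Source-order traversal follows NodeRefs that ping-pong between the
    // per-kind Vecs above, so "the next node" is rarely adjacent in memory.
    order: Vec<NodeRef>,
}

fn main() {
    // `1 + 2`: two literals and one binary expression, spread over two Vecs.
    let ast = EcsAst {
        literals: vec![Literal { value: 1 }, Literal { value: 2 }],
        binary_exprs: vec![BinaryExpr { lhs: 0, rhs: 1 }],
        order: vec![
            NodeRef::Literal(0),
            NodeRef::BinaryExpr(0),
            NodeRef::Literal(1),
        ],
    };
    // Walking `order` touches literals[0], then binary_exprs[0], then
    // literals[1]: hops between separate allocations, not one linear run.
    let mut sum = 0i64;
    for node in &ast.order {
        if let NodeRef::Literal(i) = node {
            sum += ast.literals[*i as usize].value;
        }
    }
    assert_eq!(sum, 3);
}
```

The layout is only contiguous per kind, which is exactly the access pattern an AST traversal rarely has.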
The way you'd likely want to perform the ECS transformation instead is to take the `Vec<ASTNode>` and turn it into a struct-of-arrays layout: the AST nodes are split into a single Struct-of-Arrays vector with the node types in one array, common node data in another array[^1], and the type-specific node data in yet other arrays[^2]. A final, separate spill vector is kept on the side for any time a node needs a variable amount of data, or just needs more data than can fit into the static slots[^3]; in these cases the static slots contain an index into the spill vector from which the AST node can read the rest of its data (the spill entry can start with a dynamic length to tell the AST node how much data to read). (This is just copying Andrew Kelley / Zig's AST structure; see "Practical DOD" for details.)
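If it helps, here is a minimal Rust sketch of that kind of layout. All the names, the choice of exactly two static slots, and the spill encoding are my own assumptions for illustration, not anything from the repo:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum NodeKind {
    Number,
    Add,
    Call,
}

// One SoA "vector": parallel arrays all indexed by the same node id,
// plus a spill vector for payloads that don't fit the static slots.
struct Ast {
    kinds: Vec<NodeKind>,   // one small tag per node
    spans: Vec<(u32, u32)>, // common data: source offsets
    data: Vec<[u32; 2]>,    // two type-dependent "slots" per node
    spill: Vec<u32>,        // overflow runs, each stored as [len, items...]
}

impl Ast {
    fn new() -> Self {
        Ast { kinds: vec![], spans: vec![], data: vec![], spill: vec![] }
    }

    /// Push a node; payloads of more than two words go to the spill
    /// vector, and slot 0 then stores the spill index instead.
    fn push(&mut self, kind: NodeKind, span: (u32, u32), args: &[u32]) -> u32 {
        let id = self.kinds.len() as u32;
        let slots = if args.len() <= 2 {
            let mut s = [0u32; 2];
            s[..args.len()].copy_from_slice(args);
            s
        } else {
            let spill_at = self.spill.len() as u32;
            self.spill.push(args.len() as u32); // run starts with its length
            self.spill.extend_from_slice(args);
            [spill_at, 0]
        };
        self.kinds.push(kind);
        self.spans.push(span);
        self.data.push(slots);
        id
    }
}

fn main() {
    let mut ast = Ast::new();
    let a = ast.push(NodeKind::Number, (0, 1), &[42]);
    let b = ast.push(NodeKind::Number, (4, 5), &[7]);
    let add = ast.push(NodeKind::Add, (0, 5), &[a, b]);
    // A three-argument node overflows the two static slots and spills:
    let call = ast.push(NodeKind::Call, (0, 20), &[add, a, b]);

    assert_eq!(ast.kinds[add as usize], NodeKind::Add);
    assert_eq!(ast.data[add as usize], [a, b]);
    assert_eq!(ast.data[call as usize][0], 0); // spill run starts at index 0
    assert_eq!(&ast.spill[..], &[3, add, a, b]); // length-prefixed run
}
```

Reading node ids in order now walks `kinds`, `spans`, and `data` linearly, which is the traversal-friendly layout the prose above is after.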
With this, the memory layout of the AST should be a lot more cache-friendly: when an AST node's type is read, the next one's type is the next byte over. Likewise, the common data for the two nodes sit one after the other, as do the unique / type-dependent data. Extracting the unique data does now require `unsafe {}` (or you could rewrite the unique data to be just `u32`s or such, reinterpreted based on the tag, for less unsafe usage, but the result is much the same), but it should be pretty easy to keep this completely on the up-and-up. Finally, the spill data's location becomes harder for the CPU to predict, but it should be both rare and cache-local: the previous spill data ends where the next one begins, so there's still a good chance that the next node needing spill data will find its spill data hot in the cache, even if there were multiple AST nodes without spill data between it and the previous node with spillage.

There; I've said my piece. As said, I hope this is not received badly: I absolutely do not aim to disparage or make fun of your ECS transformation (or plans for one? I didn't see it in the code, so I assume this is planning-stages for now). Much rather, I am very much in favour of it and want it to succeed! I may even offer my meager aid to the effort at some point, as I am in need of a `SoAVec` type that would, preferably, work without (much use of) macros. But I couldn't help noticing that the recent DConf talk had a negative result with precisely this kind of "vector per node type" transformation, and I hope you won't end up finding the same.

Best regards,
-Aapo Alasuutari
Footnotes

[^1]: I've split the common data into two arrays here; whether it actually makes sense to split is hard to tell, and would need to be measured to know for certain. But e.g. splitting the offset or line & column data off from the rest may well make sense, as those are presumably only really accessed during error recovery and reporting.

[^2]: Here it is very likely that at least two or maybe three arrays make sense: some common node types will only need one or two "data slots", so fusing all data slots together into a single union would introduce wasted padding in those cases.

[^3]: If a few relatively rare node types require e.g. five slots, it probably makes sense to move their extra data into the spill section rather than introduce five static arrays, roughly half of which would sit mostly empty because other nodes don't need those slots.
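To make the padding point in footnotes 2 and 3 concrete, here is a tiny sketch; the node types and slot counts are invented for illustration:

```rust
use std::mem::size_of;

// Invented payloads: one node kind needs a single u32 slot, another five.
struct SmallNode {
    a: u32,
}
struct BigNode {
    a: u32,
    b: u32,
    c: u32,
    d: u32,
    e: u32,
}

// Fusing every kind's slots into one worst-case-sized array:
type FusedSlots = [u32; 5];

fn main() {
    assert_eq!(size_of::<SmallNode>(), 4);
    assert_eq!(size_of::<BigNode>(), 20);
    assert_eq!(size_of::<FusedSlots>(), 20);
    // Storing a SmallNode in FusedSlots wastes 16 of 20 bytes per node;
    // keeping only a couple of static slot arrays plus a spill vector
    // avoids paying the worst case for every node.
}
```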