
ECS-like Node storage likely to be cache unfriendly #51

@aapoalas

Hello Mr. Klabnik,

My apologies for barging in like this. I hope this is not badly received, but I was reading your technical decisions / architectural documentation and the "ECS-inspired design with separate arrays" jumped out at me as a possible problem. It is likely to prove a cache-unfriendly data structure for the AST, rather than cache-friendly as the rationale claims.

You might also have seen the recent DConf talk about building a data-oriented parser, AST, and visitor-generator program. The authors performed an ECS transformation on the AST node storage and found it reduced performance by 20%. The reason seems fairly clear: they suffer more cache misses after the ECS transformation.

The reason is (likely) that the common case for the AST is to read the data of the next node, regardless of what its type is. With the ECS transformation splitting node data by node type, your memory layout optimises for reading the next node of the same type. But multiple nodes of the same type appearing one after another is the rare, exceptional case, so this will likely lead to a lot of cache misses.
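
To make the contrast concrete, here is a minimal sketch of the "vector per node type" layout in question; all node kinds and fields are hypothetical, purely for illustration:

```rust
// Hypothetical node payloads, purely for illustration.
struct BinaryExpr { lhs: u32, rhs: u32 }
struct Identifier { name_offset: u32 }
struct Literal { value_offset: u32 }

// ECS-style "vector per node type" storage: each node kind lives in
// its own densely packed vector.
struct PerTypeAst {
    binary_exprs: Vec<BinaryExpr>,
    identifiers: Vec<Identifier>,
    literals: Vec<Literal>,
}

// A node reference must name both the kind and the index within that
// kind's vector.
enum NodeRef {
    BinaryExpr(u32),
    Identifier(u32),
    Literal(u32),
}

impl PerTypeAst {
    // A source-order walk dispatches on the kind and, since consecutive
    // nodes rarely share a kind, reads from a different vector than the
    // previous step almost every time: consecutive reads land far apart
    // in memory.
    fn node_data(&self, node: NodeRef) -> u32 {
        match node {
            NodeRef::BinaryExpr(i) => self.binary_exprs[i as usize].lhs,
            NodeRef::Identifier(i) => self.identifiers[i as usize].name_offset,
            NodeRef::Literal(i) => self.literals[i as usize].value_offset,
        }
    }
}
```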

The way you'd likely want to perform the ECS transformation instead is to take the Vec<ASTNode> and transform that into something like this:

```rust
#[repr(u8)]
enum NodeType {
  First,
  Second,
}

// u32 newtype indices; union fields must be Copy.
#[derive(Clone, Copy)]
struct NodeIndex(u32);
#[derive(Clone, Copy)]
struct SecondUniqueData(u32);
#[derive(Clone, Copy)]
struct SpillIndex(u32);

// example
struct CommonNodeDataA {
  offset: u32,
}

struct CommonNodeDataB {
  other: u32, // placeholder: whatever other data is common to all nodes
}

union UniqueNodeDataA {
  first: NodeIndex,
  second: SecondUniqueData,
}

union UniqueNodeDataB {
  first: (), // First doesn't have a second piece of unique data
  second: SpillIndex, // index into spill_data
}

struct AST {
  // SoAVec is a hypothetical struct-of-arrays vector: one array per tuple field.
  nodes: SoAVec<(NodeType, CommonNodeDataA, CommonNodeDataB, UniqueNodeDataA, UniqueNodeDataB)>,
  spill_data: Vec<u8>,
}
```

i.e., the AST nodes are split into a single Struct-of-Arrays vector with the node types in one array, common node data in another array[^1], and type-specific node data in yet other arrays[^2]. A final, separate spill vector is kept on the side for any time a node needs a variable amount of data, or simply more data than fits into the static slots[^3]; in these cases the static slots contain an index into the spill vector whence the AST node can read the rest of its data (the spill entry can start with a length that tells the AST node how much data to read). (This is just copying Andrew Kelley / Ziglang's AST structure; see his "Practical DOD" talk for details.)
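
As a sketch of that spill convention (assuming, purely for illustration, a little-endian u32 length prefix at the start of each spill entry):

```rust
// Hypothetical spill read: each spill entry starts with a u32 length
// (little-endian here), followed by that many payload bytes.
fn read_spill(spill_data: &[u8], index: SpillIndex) -> &[u8] {
    let start = index.0 as usize;
    let len_bytes: [u8; 4] = spill_data[start..start + 4].try_into().unwrap();
    let len = u32::from_le_bytes(len_bytes) as usize;
    &spill_data[start + 4..start + 4 + len]
}
```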

With this, the memory layout of the AST should be much more cache-friendly: when an AST node's type is read, the next one's type is the next byte over. Likewise, the common data of consecutive nodes sit one after the other, as do the unique / type-dependent data. Extracting the unique data now requires unsafe {} (or you could make the unique data plain u32s reinterpreted based on the tag for less unsafe usage, but the result is much the same), but it should be pretty easy to keep this completely on the up-and-up. Finally, the spill data's location becomes harder for the CPU to predict, but spill reads should be both rare and cache-local: the previous spill entry ends where the next one begins, so the next node needing spill data has a good chance of finding it hot in the cache, even if several AST nodes without spill data sit between it and the previous node with spillage.
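
For instance, a tag-checked accessor keeps the unsafe union read sound. This is just a sketch against the types above; the free-function shape and slice parameters are illustrative, standing in for whatever per-column access the SoAVec ends up exposing:

```rust
// Sketch of a tag-checked union read. `types` and `unique_b` stand for
// two columns of the SoAVec; checking the tag first is what makes the
// unsafe field access sound.
fn spill_index(types: &[NodeType], unique_b: &[UniqueNodeDataB], i: usize) -> Option<SpillIndex> {
    match types[i] {
        // First nodes don't carry a SpillIndex.
        NodeType::First => None,
        // SAFETY: for Second nodes, `second` is the field that was
        // written at construction time.
        NodeType::Second => Some(unsafe { unique_b[i].second }),
    }
}
```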

There; I've said my piece. As said, I hope this is not received badly: I absolutely do not aim to disparage or make fun of your ECS transformation (or plans for one? I didn't see it in the code, so I assume it's in the planning stages for now). Quite the opposite: I am very much in favour of it and want it to succeed! I may even offer my meager aid to the effort at some point, as I am myself in need of a SoAVec type that would, preferably, work without (many) macros. But I couldn't help noticing that the recent DConf talk found a negative result with precisely this kind of "vector per node type" transformation, and I hope you won't end up finding the same.

Best regards,
-Aapo Alasuutari

Footnotes

[^1]: I've split the common data into two arrays here; whether splitting actually makes sense is hard to tell and would need to be measured to know for certain. But splitting e.g. the offset or line-and-column data off from the rest may well make sense, as those are presumably only really accessed during error recovery and reporting.

[^2]: Here it is very likely that at least two or maybe three arrays make sense: some common node types will only need one or two "data slots", so fusing all the data slots into a single union would introduce wasted padding in those cases.

[^3]: If a few relatively rare node types require e.g. five slots, it probably makes sense to move their extra data into the spill section rather than introduce five static arrays, roughly half of which would sit mostly empty because other nodes don't need those "slots".
