Skip to content

AbstractDataTreeNode.indexOfChild() is called excessively on large resource tree size #2567

@laeubi

Description

@laeubi

During analysis of two heapdumps shared here the following issue was discovered as a hotspot that can benefit from optimization:

Performance Data

Metric WITH transitive WITHOUT transitive Ratio
indexOfChild() self time (µs) 70,405,568 35,739,594 2.0×
DeltaDataTree.lookup() (µs) 55,328,734 24,397,133 2.3×

Description

indexOfChild() performs a binary search on a sorted array of child nodes:

protected int indexOfChild(String localName) {
    AbstractDataTreeNode[] nodes = this.children;
    int left = 0, right = nodes.length - 1;
    while (left <= right) {
        int mid = (left + right) / 2;
        int compare = localName.compareTo(nodes[mid].name);
        // ...
    }
    return -1;
}

While the binary search itself is O(log n), DeltaDataTree.lookup() calls indexOfChild() at each path segment level:

public DataTreeLookup lookup(IPath key) {
    int keyLength = key.segmentCount();
    for (DeltaDataTree tree = this; tree != null; tree = tree.parent) {
        AbstractDataTreeNode node = tree.rootNode;
        for (int i = 0; i < keyLength; i++) {
            node = node.childAtOrNull(key.segment(i)); // calls indexOfChild
        }
    }
}

The delta tree chain (tree.parent) can be deep, and each delta layer requires re-traversing the path from root.

The combined indexOfChild() time (70.4 seconds) represents the single largest CPU consumer in the entire build.

Suggested Fix

  1. Compact delta chains more aggressively: The parent chain walk is expensive. More frequent immobilization of the delta tree would reduce the number of layers to traverse.
  2. Cache frequent lookups: If the same resource paths are looked up repeatedly (e.g., project roots, source folders), a small LRU cache on DeltaDataTree.lookup() could avoid repeated traversals.
  3. Consider hybrid data structure: For nodes with many children (> 100), consider switching from sorted array + binary search to a HashMap<String, AbstractDataTreeNode> for O(1) child lookup. This is particularly relevant for directories containing many files (e.g., classpath container folders).
  4. Reduce key.segment(i) overhead: Each call to IPath.segment(i) on Path objects does array access. Ensure paths are not being reconstructed or re-parsed in hot loops.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions