fix: Prevent O(2^N) exponential blowup in diamond DAG manifest validation#1990

Open
peekmoar wants to merge 1 commit into
contentauth:mainfrom
peekmoar:fix/diamond-dag-dedup-exponential-blowup
Conversation

@peekmoar

@peekmoar peekmoar commented Mar 29, 2026

Summary

  • Fix exponential O(2^N) manifest revisits in diamond DAG topologies by adding visited-node tracking to three recursive validation functions
  • In a diamond DAG (manifest A references B and C, both referencing D), the recursive validation revisited already-processed manifests, causing 16,384 visits instead of ~28 at depth 14, leading to 30-70 minute hangs
  • get_claim_referenced_manifests_impl: check svi.manifest_map.contains_key() before recursing (it already tracks visited manifests)
  • ingredient_checks / ingredient_checks_async: add visited: &mut HashSet<String> parameter with early-exit via visited.insert()
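The effect of the `visited.insert()` early exit can be illustrated with a toy graph. This is only a minimal sketch, not the code in store.rs: `Graph`, `build_layered_dag`, `walk_naive`, and `walk_dedup` are hypothetical names, and the layered topology (one root, then two manifests per level, each referencing both manifests of the next level) is an assumption chosen so that shared descendants are reached along many paths.

```rust
use std::collections::{HashMap, HashSet};

/// Toy graph (hypothetical): manifest label -> labels of the manifests it references.
type Graph = HashMap<String, Vec<String>>;

/// Build an assumed layered DAG: one root, then `depth` levels of two nodes,
/// where every node references both nodes of the next level.
fn build_layered_dag(depth: usize) -> Graph {
    let level = |d: usize| vec![format!("m{d}a"), format!("m{d}b")];
    let mut g = Graph::new();
    g.insert(
        "root".to_string(),
        if depth > 0 { level(1) } else { Vec::new() },
    );
    for d in 1..=depth {
        // The last level has no outgoing references.
        let children = if d < depth { level(d + 1) } else { Vec::new() };
        for label in level(d) {
            g.insert(label, children.clone());
        }
    }
    g
}

/// Naive walk: recurses into every reference, so shared nodes are revisited
/// once per path that reaches them.
fn walk_naive(g: &Graph, label: &str, visits: &mut usize) {
    *visits += 1;
    for child in &g[label] {
        walk_naive(g, child, visits);
    }
}

/// Deduplicated walk: `HashSet::insert` returns false when the label is
/// already present, which serves as the early exit.
fn walk_dedup(g: &Graph, label: &str, visited: &mut HashSet<String>, visits: &mut usize) {
    if !visited.insert(label.to_string()) {
        return;
    }
    *visits += 1;
    for child in &g[label] {
        walk_dedup(g, child, visited, visits);
    }
}
```

Called on `build_layered_dag(8)` starting at `"root"`, the naive walk in this toy model makes 511 visits while the deduplicated walk makes 17, one per unique node. The exact figures depend on the assumed topology and are not measurements from this PR.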

Test plan

  • Added regression test test_diamond_dag_dedup.rs that constructs a depth-8 diamond DAG and verifies it completes within 60 seconds (takes ~1.5s with the fix)
  • cargo check passes
  • cargo test --test test_diamond_dag_dedup passes

@tmathern
Contributor

Does this relate to #1885 and #1887?

Comment thread sdk/tests/test_diamond_dag_dedup.rs Outdated
@peekmoar
Author

> Does this relate to #1885 and #1887?

It's the same root cause (diamond DAGs) but a different code path. #1887 fixed exponential memory growth in the manifest reading/construction path (the reader.rs/store deserialization code). This PR fixes exponential time spent in the validation path, specifically ingredient_checks, ingredient_checks_async, and get_claim_referenced_manifests_impl in store.rs. #1887 did not touch those functions. Without this fix, a depth-14 diamond DAG causes ~16,384 redundant validation visits instead of about 28. I wish I had looked at those PRs first.

I think this fix is needed, since it's a separate recursive walk entirely. A timeout could probably address it as well, but I think this approach is more correct. Thank you for taking a look. I'm going to try closing and reopening the PR for the Adobe CLA.

@peekmoar peekmoar closed this Mar 31, 2026
@peekmoar peekmoar reopened this Mar 31, 2026
@peekmoar peekmoar force-pushed the fix/diamond-dag-dedup-exponential-blowup branch from ab3b083 to 240ea33 Compare March 31, 2026 04:11
@peekmoar
Author

peekmoar commented Apr 2, 2026

Not rushing you but is there anything else that I need to do to get this change in?

@peekmoar peekmoar force-pushed the fix/diamond-dag-dedup-exponential-blowup branch from 7b385e0 to 2bbb621 Compare April 5, 2026 12:57
let elapsed = start.elapsed();

let manifest_count = reader.iter_manifests().count();
assert!(manifest_count > 0, "should have parsed manifests");
Contributor

How many manifests should be parsed? The checks could be tightened here to make sure the result stays the same even if the code changes in the future.

Contributor

Also: it may be good to verify that the Reader reports all expected checks in reader.validation_results()

/// Test that a diamond DAG at depth 8 completes in reasonable time.
///
/// Without the dedup fix, depth 8 would cause 2^8 = 256 manifest visits.
/// With the fix, it should visit only ~17 unique manifests.
Contributor

@tmathern tmathern Apr 30, 2026

Why the approximation? Shouldn't it be the same number on every run?


// With the dedup fix, this should complete in well under 60 seconds.
// Without the fix at depth 8, it would take significantly longer due to 256 visits.
assert!(
Collaborator

I would prefer not to have a timed unit test. If we know this code works, you can remove this test.
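One possible way to keep regression coverage without a wall-clock assertion is to count visits and compare against the number of unique nodes, which is deterministic. The following is a minimal sketch under stated assumptions, not code from this PR: `layered_dag`, `count_visits_dedup`, and `assert_linear_visits` are hypothetical helpers operating on a toy adjacency list, and the topology (root plus two nodes per level, each pointing at both nodes of the next level) is assumed for illustration.

```rust
/// Adjacency list for an assumed toy DAG: node 0 is the root, followed by
/// `depth` levels of two nodes; each node points at both nodes of the next level.
fn layered_dag(depth: usize) -> Vec<Vec<usize>> {
    let mut edges = vec![Vec::new(); 2 * depth + 1];
    if depth == 0 {
        return edges;
    }
    edges[0] = vec![1, 2];
    for d in 1..depth {
        // Level d occupies indices 2d-1 and 2d; level d+1 occupies 2d+1 and 2d+2.
        let next = vec![2 * d + 1, 2 * d + 2];
        edges[2 * d - 1] = next.clone();
        edges[2 * d] = next;
    }
    edges
}

/// Returns how many nodes the walk actually enters when a visited set
/// guards the recursion.
fn count_visits_dedup(edges: &[Vec<usize>], node: usize, visited: &mut [bool]) -> usize {
    if visited[node] {
        return 0;
    }
    visited[node] = true;
    let mut count = 1;
    for &child in &edges[node] {
        count += count_visits_dedup(edges, child, visited);
    }
    count
}

/// Deterministic check: the number of visits equals the number of unique
/// nodes (2 * depth + 1 in this toy topology), with no timing involved.
fn assert_linear_visits(depth: usize) {
    let edges = layered_dag(depth);
    let mut visited = vec![false; edges.len()];
    let visits = count_visits_dedup(&edges, 0, &mut visited);
    assert_eq!(visits, 2 * depth + 1);
}
```

A check of this shape fails if deduplication regresses in the counted walk, regardless of machine speed. Applying the idea to the real validation path would require some way to observe visit counts there, which this sketch does not assume exists.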

@mauricefisher64
Collaborator

I am OK with the changes. Please address the PR comments and I will approve.
