scip lint reports cross-document relationship targets as orphan despite scip print finding them #397

@Gitious

Summary

scip lint reports RelationshipsRefersToSymbol errors of the form "…has a relationship to '#2', but couldn't find #2 in external symbols or some other document" for relationship targets that ARE present in documents[].symbols when the same SCIP file is read via scip print --json. The two CLI subcommands disagree about what symbols are in the index.

In our specific case (fabric-indexer, an external SCIP emitter), this produces ~4,488–5,614 errors on the rust-lang/rust monorepo despite the index being byte-correct. We've verified at the protobuf level that the relationship-side and definition-side symbol-id strings are byte-identical. We've also verified via scip print --json that 481 of 481 unique orphan target symbols cited in the lint errors are present in documents[].symbols.

Environment

  • scip v0.7.1 (scip-code/scip, scip-darwin-arm64.tar.gz) — also reproduced with the linux-amd64 build inside our bench Docker image
  • macOS 25.4.0 / Darwin arm64 (host) and Ubuntu 24.04 / x86_64 (bench)
  • SCIP file: ~50 MB (52,447,818 bytes), 1,961 documents, 37,991 in-doc symbols, 434 external_symbols stubs, 23,415 Relationship edges
  • SHA-256: 34f9c36778965c9c5a23b5b5ff7e6611d9a125eced947d45b15654c44c4a2eeb
  • Source: fabric-indexer's emission for the rust-lang/rust monorepo (rust.scip)
  • Reproducer artifact location: cognisos-ai/Prod_Fabric bench-results orphan branch, commit a9d7a466, path indexer-v0.2.0-rc.24/2026-05-04/unknown/rust.scip (also at the rc.25 path on commit 6516f729)

Reproducer

Sample lint error line:

error: symbol 'scip-fabric . f23d83ee 0.2.0-rc.22 alloctests/tests/`thin_box.rs`/check_thin_dyn().' (#1) (in symbols for file alloctests/tests/thin_box.rs) has a relationship to 'scip-fabric . f23d83ee 0.2.0-rc.22 alloc/src/boxed/`thin.rs`/new_unsize().' (#2), but couldn't find #2 in external symbols or some other document

Same target symbol via scip print --json:

$ scip print --json rust.scip | python3 -c "
import json, sys
target = 'scip-fabric . f23d83ee 0.2.0-rc.22 alloc/src/boxed/\`thin.rs\`/new_unsize().'
for line in sys.stdin:
    obj = json.loads(line)
    if 'documents' not in obj: continue
    for d in obj['documents']:
        for s in d['symbols']:
            if s['symbol'] == target:
                print(f'FOUND in {d[\"relative_path\"]}: kind={s[\"kind\"]}, display_name={s[\"display_name\"]!r}')
"
FOUND in alloc/src/boxed/thin.rs: kind=26, display_name='new_unsize'

Byte-level check on the same target string in both forms (relationship-side Relationship.symbol field on the source's SymbolInformation in alloctests/tests/thin_box.rs vs definition-side SymbolInformation.symbol field in alloc/src/boxed/thin.rs):

relationship_form (utf-8 hex): 736369702d666162726963202e20663233643833656520302e322e302d72632e323220616c6c6f632f7372632f626f7865642f607468696e2e7273602f6e65775f756e73697a6528292e
definition_form   (utf-8 hex): 736369702d666162726963202e20663233643833656520302e322e302d72632e323220616c6c6f632f7372632f626f7865642f607468696e2e7273602f6e65775f756e73697a6528292e
identical: True
length: 74 bytes
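
For anyone repeating the byte-level check, a minimal sketch of the comparison follows. It assumes the two symbol strings have already been pulled out of the index (how they were extracted from the protobuf is not shown here), and the helper name `hex_compare` is made up for illustration.

```python
def hex_compare(relationship_form: str, definition_form: str) -> dict:
    """Compare two symbol strings at the UTF-8 byte level."""
    rel_bytes = relationship_form.encode("utf-8")
    def_bytes = definition_form.encode("utf-8")
    return {
        "relationship_hex": rel_bytes.hex(),
        "definition_hex": def_bytes.hex(),
        "identical": rel_bytes == def_bytes,
        "length": len(rel_bytes),
    }


# The target symbol from the sample lint error, used for both sides here
# purely to show the output shape.
target = "scip-fabric . f23d83ee 0.2.0-rc.22 alloc/src/boxed/`thin.rs`/new_unsize()."
result = hex_compare(target, target)
print("identical:", result["identical"])
print("length:", result["length"], "bytes")
```

Comparing encoded bytes rather than Python strings also catches cases where two strings render the same but differ in Unicode normalization.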

Quantified

  • 4,488 unique RelationshipsRefersToSymbol errors from scip lint (one local reproduction; 5,614 in another invocation — see "non-determinism" note below)
  • 481 unique orphan target symbols across those errors (each cited ~9.3× on average)
  • 481/481 = 100% of those orphan target symbols are findable in documents[].symbols via scip print
  • 0 of them are in external_symbols: the index's 434 external_symbols stubs cover only the genuinely unresolved cases, and none of the orphans cited by lint is among them
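
The counts above can be reproduced along these lines. This is a sketch, not the exact script we ran: the regex is inferred from the single sample error line shown earlier, and the function names and the two input sets (symbols collected from documents[].symbols and from external_symbols) are assumptions for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the sample lint line; other message shapes are ignored.
ORPHAN_RE = re.compile(r"has a relationship to '(.+?)' \(#2\), but couldn't find #2")


def orphan_targets(lint_lines):
    """Count how often each relationship target is reported as missing."""
    counts = Counter()
    for line in lint_lines:
        match = ORPHAN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def classify(counts, document_symbols, external_symbols):
    """Summarize where the reported orphan targets actually live."""
    targets = set(counts)
    return {
        "total_errors": sum(counts.values()),
        "unique_targets": len(targets),
        "in_documents": len(targets & set(document_symbols)),
        "in_external": len(targets & set(external_symbols)),
    }


sample_lines = [
    "error: symbol 'scip-fabric . f23d83ee 0.2.0-rc.22 alloctests/tests/`thin_box.rs`/check_thin_dyn().' (#1) "
    "(in symbols for file alloctests/tests/thin_box.rs) has a relationship to "
    "'scip-fabric . f23d83ee 0.2.0-rc.22 alloc/src/boxed/`thin.rs`/new_unsize().' (#2), "
    "but couldn't find #2 in external symbols or some other document",
    "some unrelated output line",
]
counts = orphan_targets(sample_lines)
summary = classify(
    counts,
    document_symbols={"scip-fabric . f23d83ee 0.2.0-rc.22 alloc/src/boxed/`thin.rs`/new_unsize()."},
    external_symbols=set(),
)
print(summary)
```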

Hypotheses

scip lint and scip print use different symbol-resolution paths. Either:

  1. Canonicalization mismatch: scip lint parses the symbol string and reformats it before comparison, and the canonicalized form differs from the raw stored form. scip print --json returns the raw bytes.
  2. Cross-document lookup bug: scip lint's document-walking path fails to find relationship targets in documents OTHER than the source's own document, under some condition (perhaps related to walk order or a cached lookup table populated incorrectly).

The empirical signature (bytes match, print finds the symbol, lint reports it missing) is consistent with both. Tracing through lint's symbol resolver vs print's document walker should distinguish them.
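
To make hypothesis 2 concrete, here is a toy model of how a walk-order-dependent lookup could produce false orphans. It is NOT scip's implementation, which we have not traced; both function names and the dict shape (mirroring the `documents` / `symbols` / `relationships` fields) are assumptions for illustration.

```python
def single_pass_orphans(documents):
    """Toy checker: builds the symbol table while walking, so a target
    defined in a later document is wrongly reported as missing."""
    seen = set()
    orphans = []
    for doc in documents:
        for sym in doc["symbols"]:
            seen.add(sym["symbol"])
            for rel in sym.get("relationships", []):
                if rel["symbol"] not in seen:
                    orphans.append((sym["symbol"], rel["symbol"]))
    return orphans


def two_pass_orphans(documents):
    """Toy checker: collects every definition first, then validates."""
    defined = {sym["symbol"] for doc in documents for sym in doc["symbols"]}
    return [
        (sym["symbol"], rel["symbol"])
        for doc in documents
        for sym in doc["symbols"]
        for rel in sym.get("relationships", [])
        if rel["symbol"] not in defined
    ]


docs = [
    {"relative_path": "tests/thin_box.rs",
     "symbols": [{"symbol": "check_thin_dyn", "relationships": [{"symbol": "new_unsize"}]}]},
    {"relative_path": "src/thin.rs",
     "symbols": [{"symbol": "new_unsize"}]},
]
print(single_pass_orphans(docs))                  # forward reference reported as orphan
print(single_pass_orphans(list(reversed(docs))))  # same data, different walk order
print(two_pass_orphans(docs))
```

In this toy, the single-pass result changes with document order while the two-pass result does not, which is the kind of behavior hypothesis 2 describes.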

Non-determinism note

Different invocations of scip lint v0.7.1 against the byte-identical rust.scip produce different error counts (4,488 in one run, 5,614 in another) and different unique-orphan counts (481 vs 530). Same binary, same scip version, same input file. This may be related to walk-order-dependent lookup table state or output buffering. Not the main issue, but worth flagging.
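
A quick way to characterize the run-to-run difference is to diff the unique error lines of two captured runs. The sketch below assumes each run's output has been saved as a list of lines; the function name and the placeholder error strings are hypothetical.

```python
def diff_lint_runs(run_a_lines, run_b_lines):
    """Compare the unique, non-empty error lines of two lint runs."""
    a = {line.strip() for line in run_a_lines if line.strip()}
    b = {line.strip() for line in run_b_lines if line.strip()}
    return {
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
        "in_both": len(a & b),
    }


# Placeholder lines standing in for real lint output.
run_a = ["error: X", "error: Y", ""]
run_b = ["error: Y", "error: Z", "error: W"]
report = diff_lint_runs(run_a, run_b)
print(report)
```

Whether the smaller run's errors are a strict subset of the larger run's is a useful first data point when tracing the cause.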

What this means for SCIP emitters

Emitters like fabric-indexer are forced into the awkward position of emitting "duplicate" external_symbols stubs for symbols that are already in documents[].symbols, just to satisfy scip lint. We're planning that workaround on our side, but it would clearly be preferable to fix the lint matcher.
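
For reference, the workaround we have in mind looks roughly like the sketch below. It operates on a dict with the same field names as the JSON form (`documents`, `symbols`, `relationships`, `external_symbols`); the function name is hypothetical, the stub contents (symbol string only) are an assumption, and we have not yet verified how scip lint treats the resulting duplicates.

```python
def add_stubs_for_cross_document_targets(index):
    """Add an external_symbols stub for every relationship target that is
    defined in a different document than the symbol referring to it."""
    defined = {
        sym["symbol"]
        for doc in index.get("documents", [])
        for sym in doc.get("symbols", [])
    }
    externals = index.setdefault("external_symbols", [])
    already_external = {stub["symbol"] for stub in externals}
    added = []
    for doc in index.get("documents", []):
        local = {sym["symbol"] for sym in doc.get("symbols", [])}
        for sym in doc.get("symbols", []):
            for rel in sym.get("relationships", []):
                target = rel["symbol"]
                # Only stub targets that exist elsewhere in the index.
                if target in defined and target not in local and target not in already_external:
                    externals.append({"symbol": target})  # minimal stub (assumption)
                    already_external.add(target)
                    added.append(target)
    return added


index = {
    "documents": [
        {"relative_path": "a.rs",
         "symbols": [{"symbol": "a", "relationships": [{"symbol": "b"}, {"symbol": "ext"}]}]},
        {"relative_path": "b.rs", "symbols": [{"symbol": "b"}]},
    ],
    "external_symbols": [{"symbol": "ext"}],
}
added = add_stubs_for_cross_document_targets(index)
print(added)
```

The function is idempotent, so running it again on the same index adds nothing.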

Reproducer artifact availability

Happy to share the 50 MB rust.scip directly (sha256 above for verification). It's on a public bench-results branch as the deterministic, signed output of our publication bench.
