
@d10c (Contributor) commented Sep 1, 2025

This PR adds overlay support to the Python extractor, including overlay compilation, basic tests, and a consistency check.

Supersedes the earlier PR #20206.

According to the latest DCA results:

  • Analysis time +15%
  • Database build time -72%
  • TRAP import time -46%
  • End-to-end time -15%
  • Accuracy 99.6% (lowest: py/unused-global-variable, 83%)
  • Database size +20%

Clarifications:

  • I squashed all new changes onto the earlier PR.
  • @py_cobject and @externalDataElement are not Discardable because they can't be linked to a source file. The consistency check ignores them.
  • @externalDefect/Metric, @duplication_or_similarity, and @svnentry are not Discardable because they are deprecated.
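As a rough illustration of what the consistency check enforces, here is a minimal Python sketch. It assumes the check flags any entity that no discard predicate covers, unless its type is one of the ignored types named above. The `Entity` record, the example type names `@py_expr` and `@py_stmt`, and `inconsistent_entities` are hypothetical and not taken from the PR's QL code.

```python
from dataclasses import dataclass

# Types the PR says the consistency check ignores, because they cannot be
# linked to a source file.
IGNORED_TYPES = frozenset({"@py_cobject", "@externalDataElement"})


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a database entity."""
    id: int
    db_type: str       # dbscheme type name (example names are illustrative)
    discardable: bool  # True if some discard predicate covers this entity


def inconsistent_entities(entities):
    """Assumed rule: report entities that are neither Discardable nor ignored."""
    return [e for e in entities
            if not e.discardable and e.db_type not in IGNORED_TYPES]


entities = [
    Entity(1, "@py_expr", True),      # covered by a discard predicate
    Entity(2, "@py_cobject", False),  # not Discardable, but ignored
    Entity(3, "@py_stmt", False),     # not Discardable and not ignored
]
print([e.id for e in inconsistent_entities(entities)])  # [3]
```

Keeping the exemptions in an explicit allowlist means a newly added entity type that lacks a discard predicate is reported rather than silently skipped.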

github-actions bot added the Python label Sep 1, 2025
@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch 2 times, most recently from 0b94992 to feb4c3a September 12, 2025 21:20
@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch 3 times, most recently from fbb16b4 to e2f6e4a September 22, 2025 21:15
github-advanced-security[bot] commented (this comment was marked as resolved).

@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch from 456c659 to c0707fd October 2, 2025 15:50
@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch 2 times, most recently from 3901c56 to 8844c2d October 2, 2025 16:16
@d10c mentioned this pull request Oct 2, 2025
@d10c requested a review from tausbn October 2, 2025 16:18
@d10c marked this pull request as ready for review October 2, 2025 16:18
@d10c requested a review from a team as a code owner October 2, 2025 16:18
Copilot AI review requested due to automatic review settings October 2, 2025 16:18

Copilot AI left a comment:

Pull Request Overview

This PR adds overlay support to the Python extractor, enabling incremental compilation and overlay-based extraction for improved performance. The changes introduce overlay metadata handling, entity discard predicates, and consistency checks for overlay databases.

  • Adds overlay compilation and extraction support to the Python ecosystem
  • Implements entity discard predicates for incremental analysis
  • Introduces consistency checks to ensure proper overlay database construction

Reviewed Changes

Copilot reviewed 45 out of 45 changed files in this pull request and generated no comments.

Summary per file:

  • python/ql/lib/semmlecode.python.dbscheme: Adds @top type and overlay metadata support to database schema
  • python/ql/lib/semmle/python/Overlay.qll: Implements comprehensive entity discard predicates for overlay functionality
  • python/ql/lib/semmle/python/internal/OverlayDiscardConsistencyQuery.qll: Provides consistency query logic for overlay database validation
  • python/extractor/semmle/worker.py: Adds overlay extraction mode with change-based file filtering
  • python/extractor/semmle/projectlayout.py: Improves Windows path handling in project layout configuration
  • python/extractor/semmle/path_rename.py: Updates environment variable to use CODEQL_PATH_TRANSFORMER
  • python/ql/test/extractor-tests/overlay/: Adds comprehensive overlay extraction test cases
  • Various .expected files: Test output files for overlay functionality validation
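The Windows path-handling change (the commit note "don't add slash to start of path patterns on Windows") can be pictured with a small hypothetical helper. It assumes patterns are anchored by prefixing `/` on POSIX, where absolute paths start with `/`, while on Windows, where absolute paths start with a drive letter such as `C:`, the pattern is left unchanged. `anchor_pattern` is an illustrative name, not the extractor's actual function.

```python
import os
from typing import Optional


def anchor_pattern(pattern: str, is_windows: Optional[bool] = None) -> str:
    """Hypothetical helper: anchor a project-layout path pattern.

    On POSIX systems absolute paths begin with '/', so a pattern that does
    not already start with '/' gets one prepended. On Windows, absolute
    paths begin with a drive letter (e.g. 'C:'), so the pattern is returned
    unchanged instead of becoming something like '/C:/src'.
    """
    if is_windows is None:
        # Default to the host platform.
        is_windows = os.name == "nt"
    if is_windows or pattern.startswith("/"):
        return pattern
    return "/" + pattern


print(anchor_pattern("src/lib", is_windows=False))  # /src/lib
print(anchor_pattern("src/lib", is_windows=True))   # src/lib
```

Passing `is_windows` explicitly keeps the helper testable on any host; only the default consults `os.name`.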

d10c added 12 commits October 6, 2025 11:36
The new name is required by overlay support.
And don't add slash to start of path patterns on Windows.
- fall back to full extraction on overlay changes json read error
- we filter both root modules and (transitive) imports against the overlay-changes json.
for dbscheme elements with direct or indirect location links in dbscheme.

- Unify discardable entities under one Discardable superclass.
- Two discard predicates depending on TRAP ID type.
- Future-proof the XML and Yaml discard predicates for when their
  extractors become incremental.
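The overlay-changes behaviour in these commit notes (fall back to full extraction when the changes JSON cannot be read, and filter both root modules and transitively imported modules against it) can be sketched as follows. The JSON shape `{"changes": [...]}` and the helper names `load_overlay_changes` and `files_to_extract` are assumptions for illustration; the PR's worker.py may structure this differently.

```python
import json
from typing import Iterable, List, Optional, Set


def load_overlay_changes(path: str) -> Optional[Set[str]]:
    """Read changed file paths from an overlay-changes JSON file.

    Assumes (hypothetically) a JSON object of the form {"changes": [...]}.
    Returns None on any read or parse error, which the caller treats as
    "fall back to full extraction".
    """
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        return set(data["changes"])
    except (OSError, ValueError, KeyError, TypeError):
        # OSError: missing/unreadable file; ValueError: invalid JSON;
        # KeyError/TypeError: unexpected JSON structure.
        return None


def files_to_extract(candidates: Iterable[str],
                     changes: Optional[Set[str]]) -> List[str]:
    """Keep only changed files; keep everything when changes is None.

    The same filter would apply to root modules and to modules discovered
    by following imports.
    """
    if changes is None:
        return list(candidates)
    return [p for p in candidates if p in changes]
```

Returning `None` rather than an empty set keeps "no usable changes file" (extract everything) distinct from "nothing changed" (extract nothing).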
@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch from 8844c2d to e74f9a4 October 6, 2025 09:51
d10c added 2 commits October 6, 2025 12:30
The base source is in basic-overlay-eval/orig_src,
the overlay source is in basic-full-eval.

We run two tests: a full evaluation test in basic-full-eval,
and an overlay evaluation test in basic-overlay-eval.

The test source and expected results are the SAME,
due to the .qlref, meaning we expect the same results
for full and overlay evaluation.
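How the changed-file list between the base source (basic-overlay-eval/orig_src) and the overlay source (basic-full-eval) is produced is not described here. As a hypothetical sketch only, one way to derive it is to compare the two trees and collect files that were added, removed, or whose contents differ; `changed_files` is an illustrative name.

```python
import filecmp
import os


def changed_files(base_dir: str, overlay_dir: str) -> set:
    """Relative paths of files added, removed, or modified between two trees."""

    def walk(root: str) -> set:
        # Collect every file path under root, relative to root.
        out = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                out.add(os.path.relpath(os.path.join(dirpath, name), root))
        return out

    base, overlay = walk(base_dir), walk(overlay_dir)
    changed = base ^ overlay  # present in only one tree: added or deleted
    for rel in base & overlay:
        # shallow=False compares file contents, not just os.stat() signatures.
        if not filecmp.cmp(os.path.join(base_dir, rel),
                           os.path.join(overlay_dir, rel), shallow=False):
            changed.add(rel)
    return changed
```

Deleted files are included because base entities from a removed file also need to be discarded in the overlay.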
@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch from e74f9a4 to ece1210 October 6, 2025 10:31