Skip to content

Refactor potentially shared functionality into core#60

Open
joshbouncesecurity wants to merge 11 commits intogadievron:mainfrom
BounceSecurity:new-refactor
Open

Refactor potentially shared functionality into core#60
joshbouncesecurity wants to merge 11 commits intogadievron:mainfrom
BounceSecurity:new-refactor

Conversation

@joshbouncesecurity
Copy link
Contributor

Trying to move functionality that could be shared between packages into core.

Just an idea for now

@joshbouncesecurity
Copy link
Contributor Author

@danielcuthbert you can see the tests running here, the cleanup fails (probably because of permissions) but it isn't really important and could be removed.

As for this refactor, not sure if this is the direction of philosophy you wanted but more like food for thought

@joshbouncesecurity
Copy link
Contributor Author

Refactoring: Consolidate Core Utilities

Hi @danielcuthbert, this PR refactors the codebase to consolidate common utilities into the core module, improving code organization and reusability across packages and to better comply with the guidance in ARCHITECTURE.md.

This refactor will hopefully help with some future work I am looking at doing. I have tried to include an accurate description here of the changes.

Files Simply Moved

The following files were moved with 100% similarity (no content changes):

LLM Module Files (4 files)
  • packages/llm_analysis/llm/__init__.pycore/llm/__init__.py
  • packages/llm_analysis/llm/client.pycore/llm/client.py
  • packages/llm_analysis/llm/config.pycore/llm/config.py
  • packages/llm_analysis/llm/providers.pycore/llm/providers.py

Functions Simply Moved

The following functions were moved from their original locations to new core modules with no functional changes:

From packages/static-analysis/scanner.py
  • run()core/exec.py (command execution utility)
  • validate_repo_url()core/git.py (repository URL validation)
  • run_single_semgrep()core/semgrep.py (single Semgrep scan execution)
  • semgrep_scan_parallel()core/semgrep.py (parallel Semgrep scanning)
  • semgrep_scan_sequential()core/semgrep.py (sequential Semgrep scanning fallback)

Functions Refactored

The following functions were refactored during the move. All changes were made to maintain compatibility with previous implementations - existing code continues to work without modification, and enhancements preserve original behavior:

Refactored Functions
  • packages/static-analysis/scanner.py::safe_clone() - Refactored to use clone_repository() from core.git instead of inline implementation (same behavior, cleaner code)
  • packages/recon/agent.py::sha256_tree() call - Updated to use new parameters (max_file_size=10**12, chunk_size=8192) to maintain exact backward compatibility with old behavior (no file size limit, original chunk size)

New Functions Added

The following new functions were added to support the refactoring. Most are based on existing code patterns:

core/git.py
  • get_safe_git_env() - Wrapper around RaptorConfig.get_git_env() that was already used in safe_clone() (based on existing usage pattern)
  • clone_repository() - Consolidated git clone functionality extracted from safe_clone() in packages/static-analysis/scanner.py (based on existing safe_clone() implementation)
core/semgrep.py
  • run_semgrep() - Simplified Semgrep scan interface based on run_single_semgrep() logic but with a cleaner API (based on existing run_single_semgrep() implementation)

Functions with Actual Content Changes

The following functions had their implementation logic modified (beyond just moving or adding parameters). All changes were made to maintain compatibility with previous implementations - the modifications were necessary to unify two different implementations that had conflicting behaviors:

Functions with Content Changes
  • core/hash.py::sha256_tree() - Moved from packages/static-analysis/scanner.py (duplicate removed from packages/recon/agent.py). The original implementations had different behaviors: scanner.py had file size limits and used config chunk size, while recon/agent.py had no file size limits and used hardcoded 8192 chunk size. Parameters were added (max_file_size, chunk_size) to maintain compatibility with both: defaults match scanner.py behavior, and packages/recon/agent.py explicitly calls with max_file_size=10**12, chunk_size=8192 to preserve its original behavior.

Import Updates

All packages have been updated to import from the new core module locations as necessary. This includes updates to LLM imports (changed from packages/llm_analysis/llm.* to core.llm.*), utility imports (moved to core.exec, core.git, core.hash, core.semgrep), and corresponding test files.

Additional Tests Added

Comprehensive test coverage has been added for all new core modules:

New Test Files
  • core/tests/test_exec.py - Tests for run() function including command execution, working directory handling (Path and str), timeout behavior, and environment variables
  • core/tests/test_git.py - Tests for git utilities including repository URL validation, git clone functionality, and safe git environment variables
  • core/tests/test_hash.py - Tests for sha256_tree() including directory hashing, file size limits, chunk size parameters, and consistency
  • core/tests/test_semgrep.py - Tests for Semgrep utilities including scan execution, SARIF validation, and error handling
  • core/tests/test_llm.py - Tests for LLM module after refactoring including import verification, ModelConfig, LLMConfig, and LLMClient initialization
Integration Tests Added

Added to test/integration_tests.sh:

  • test_core_exec_import() - Verify core.exec module can be imported
  • test_core_hash_import() - Verify core.hash module can be imported
  • test_core_git_import() - Verify core.git module can be imported
  • test_core_semgrep_import() - Verify core.semgrep module can be imported
  • test_packages_use_consolidated_utilities() - Verify packages can use consolidated core utilities

Other Changes

  • core/__init__.py - Updated to export all new core utilities for convenient imports
  • docs/ARCHITECTURE.md - Updated to reflect new core module structure
  • docs/README.md - Updated documentation references
  • .github/workflows/tests.yml - Updated CI workflow (15 lines changed)
  • README.md - Minor documentation update (2 lines changed)

Summary

  • Files moved: 4 LLM module files (100% similarity)
  • Functions moved: 5 functions from packages/static-analysis/scanner.py (no functional changes)
  • Functions refactored: 2 function calls updated to use new consolidated utilities
  • Functions with content changes: 1 function moved and enhanced (maintain backward compatibility)
  • New functions added: 3 new utility functions across core modules (most based on existing code patterns)
  • New tests: 5 comprehensive test files (962 lines of test code) plus 5 integration tests
  • Import updates: 8+ files updated to use new import paths
  • Total changes: 36 files changed, 1,589 insertions(+), 353 deletions(-)

@joshbouncesecurity joshbouncesecurity marked this pull request as ready for review January 28, 2026 08:45
@joshbouncesecurity
Copy link
Contributor Author

(I would strongly suggest squash and merge :) )

@joshbouncesecurity joshbouncesecurity changed the title First draft of refactor Refactor potentially shared functionality into core Jan 28, 2026
Copy link
Collaborator

@danielcuthbert danielcuthbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea here of core being the centre and agree, but want to have another set of eyes on this given it's a big change

@joshbouncesecurity
Copy link
Contributor Author

I like the idea here of core being the centre and agree, but want to have another set of eyes on this given it's a big change

Fair one @danielcuthbert , who can we tag in here aside from copilot :)

@danielcuthbert
Copy link
Collaborator

Not forgotten about this, doing a manual review

@joshbouncesecurity
Copy link
Contributor Author

joshbouncesecurity commented Feb 16, 2026

Ok, thanks :) I think it will be a useful refactor but I am not sure the subsequent steps will work. I was hoping to get opencode sdk working with this but it seems like the python SDK is not kept updated 🤦‍♂️🤦‍♂️🤦‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants