Description
Is your feature request related to a problem? Please describe.
Currently, the system's understanding of the codebase is limited to the content of individual chunks (functions, classes). As highlighted in the "Challenges and Future Enhancements" section of the README.md, this leads to the "Invisible Threads" problem. The assistant lacks dependency awareness and cannot trace the relationships between different parts of the code.
This prevents it from answering critical impact-analysis questions, such as:
- "What will break if I change this function?"
- "Where is this variable used throughout the repository?"
- "Show me the call graph for this function."
Describe the solution you'd like
To address this, we should extend the Abstract Syntax Tree (AST) analysis in the indexing pipeline. The current Python parser already uses the ast module, but only to identify functions and classes. We can expand this to record how those functions and classes refer to one another, building a dependency graph of the entire repository.
The implementation would involve:
- AST Generation: During the indexing phase, parse each source file (starting with Python) to generate a full AST.
- Graph Construction: Traverse the AST to identify nodes representing function calls, variable usage, imports, and class instantiations.
- Store Relationships: Store these relationships (e.g., in a graph database or as metadata linked to the vector store) to create a map of the codebase's dependencies.
- Enhanced Retrieval: Update the querying pipeline to leverage this graph. When a user asks about a specific function or variable, the retriever could also fetch all related code chunks that use or are used by it.
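The steps above could be prototyped roughly as follows. This is a minimal sketch for a single Python module, not the project's actual indexer: `CallGraphBuilder`, `build_call_graph`, `callers_of`, and `related_symbols` are hypothetical names, callees are matched by bare name only (no import or method resolution), and the graph is kept in an in-memory dict rather than a graph database or vector-store metadata.

```python
import ast
from collections import defaultdict


class CallGraphBuilder(ast.NodeVisitor):
    """Collect caller -> callee edges for one module (hypothetical helper)."""

    def __init__(self):
        self.calls = defaultdict(set)   # caller name -> names it calls
        self.imports = set()            # imported module / symbol names
        self._scope = ["<module>"]      # stack of enclosing function names

    def visit_Import(self, node):
        for alias in node.names:
            self.imports.add(alias.name)

    def visit_ImportFrom(self, node):
        for alias in node.names:
            name = f"{node.module}.{alias.name}" if node.module else alias.name
            self.imports.add(name)

    def visit_FunctionDef(self, node):
        # Track which function we are inside while visiting its body.
        self._scope.append(node.name)
        self.generic_visit(node)
        self._scope.pop()

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Call(self, node):
        func = node.func
        if isinstance(func, ast.Name):         # plain call: foo()
            callee = func.id
        elif isinstance(func, ast.Attribute):  # attribute call: obj.foo()
            callee = func.attr
        else:                                  # e.g. calls on subscripts or lambdas
            callee = None
        if callee is not None:
            self.calls[self._scope[-1]].add(callee)
        self.generic_visit(node)               # also catch nested calls


def build_call_graph(source):
    """Parse source text and return a populated CallGraphBuilder."""
    builder = CallGraphBuilder()
    builder.visit(ast.parse(source))
    return builder


def callers_of(calls, target):
    """Reverse lookup: which functions call `target`?"""
    return sorted(c for c, callees in calls.items() if target in callees)


def related_symbols(calls, target):
    """Names a retriever might fetch alongside `target` (assumed policy)."""
    return {
        "calls": sorted(calls.get(target, ())),
        "called_by": callers_of(calls, target),
    }


SAMPLE = '''
import os

def load(path):
    return os.path.exists(path)

def parse(path):
    return load(path)

def main():
    parse("a.txt")
    load("b.txt")
'''

graph = build_call_graph(SAMPLE)
print(callers_of(graph.calls, "load"))      # ['main', 'parse']
print(related_symbols(graph.calls, "parse"))
```

A reverse lookup like `callers_of` is what a "What will break if I change this function?" query would need. A real implementation would have to resolve names across files and handle methods, aliases, and dynamic dispatch, which this sketch deliberately ignores.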
Describe alternatives you've considered
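We could continue relying on semantic search alone, but similarity between chunks does not reveal which code actually calls or depends on which, so it cannot fully solve the problem of understanding precise execution flow and dependencies. Regex-based parsing is another alternative, but it is brittle: it cannot reliably tell real code from comments and string literals, and it is less powerful than ASTs for understanding code structure.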
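To illustrate the brittleness of the regex alternative, here is a small comparison on a made-up snippet. The `SOURCE` text and the function name `load` are illustrative assumptions; the point is only that a pattern match counts mentions in comments and strings, while the AST sees the single real call.

```python
import ast
import re

SOURCE = '''
# load(config) used to live here
def run():
    message = "call load(x) first"
    return load(
        "data.json"
    )
'''

# Regex: matches every textual occurrence of "load(".
regex_hits = re.findall(r"\bload\(", SOURCE)

# AST: matches only Call nodes whose callee is the bare name "load".
ast_hits = [
    node
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.Call)
    and isinstance(node.func, ast.Name)
    and node.func.id == "load"
]

print(len(regex_hits), len(ast_hits))  # 3 1
```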
Additional context
Implementing this feature would move the assistant from a "bag of functions" model to a system with a structural understanding of the code, one that can follow the relationships between functions, variables, and imports. This directly addresses one of the key challenges outlined in the project's own documentation.