
**Feature: Implement AST-based Analysis for Deeper Code Understanding** #2

@HeMaNtMoRee

Description


Is your feature request related to a problem? Please describe.

Currently, the system's understanding of the codebase is limited to the content of individual chunks (functions, classes). As highlighted in the "Challenges and Future Enhancements" section of the README.md, this leads to the "Invisible Threads" problem. The assistant lacks dependency awareness and cannot trace the relationships between different parts of the code.

This prevents it from answering critical impact-analysis questions, such as:

  • "What will break if I change this function?"
  • "Where is this variable used throughout the repository?"
  • "Show me the call graph for this function."

Describe the solution you'd like

To address this, we should integrate full Abstract Syntax Tree (AST) parsing and analysis into the indexing pipeline. The current Python parser already uses the `ast` module to identify functions and classes; we can extend it to build a comprehensive dependency graph of the entire repository.
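As a rough illustration of the extension, here is a minimal sketch, not the project's parser, that walks a single module's AST and records both the functions it defines and every (caller, callee) pair. `CallCollector` is a hypothetical name. It tracks only function scopes (no classes or methods), and it assumes Python 3.9+ for `ast.unparse`.

```python
import ast

SOURCE = '''
import os

def load(path):
    return os.path.exists(path)

def main():
    if load("config.yml"):
        print("ok")
'''

class CallCollector(ast.NodeVisitor):
    """Hypothetical visitor: records defined functions and (caller, callee) edges."""

    def __init__(self):
        self.scope = ["<module>"]  # stack of enclosing function names
        self.defs = []             # names of functions defined in the module
        self.calls = []            # (caller, callee) pairs, in source order

    def visit_FunctionDef(self, node):
        self.defs.append(node.name)
        self.scope.append(node.name)   # calls inside the body belong to this function
        self.generic_visit(node)
        self.scope.pop()

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Call(self, node):
        # ast.unparse turns the callee expression back into text, e.g. "os.path.exists".
        self.calls.append((self.scope[-1], ast.unparse(node.func)))
        self.generic_visit(node)       # keep walking to catch nested calls in arguments

collector = CallCollector()
collector.visit(ast.parse(SOURCE))
print(collector.defs)   # ['load', 'main']
print(collector.calls)  # [('load', 'os.path.exists'), ('main', 'load'), ('main', 'print')]
```

Today's chunker effectively stops at `defs`; the `calls` list is the extra information that an "Invisible Threads" fix needs.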

The implementation would involve:

  • AST Generation: During the indexing phase, parse each source file (starting with Python) to generate a full AST.
  • Graph Construction: Traverse the AST to identify nodes representing function calls, variable usage, imports, and class instantiations.
  • Store Relationships: Store these relationships (e.g., in a graph database or as metadata linked to the vector store) to create a map of the codebase's dependencies.
  • Enhanced Retrieval: Update the querying pipeline to leverage this graph. When a user asks about a specific function or variable, the retriever could also fetch all related code chunks that use or are used by it.
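The four steps could fit together roughly as follows. This is a sketch under stated assumptions: the repository is a hypothetical in-memory `REPO` dict of path to source, relationships are stored as plain Python reverse-edge sets rather than in a graph database, and call resolution is naive (only the final name is kept, so `obj.parse` and `parse` are treated as the same function). `build_call_graph` and `impacted_by` are illustrative names, not existing project APIs.

```python
import ast
from collections import defaultdict, deque

# Hypothetical repository contents: path -> source text.
REPO = {
    "utils.py": "def parse(s):\n    return s.split(',')\n",
    "service.py": (
        "from utils import parse\n"
        "def load(line):\n    return parse(line)\n"
        "def handler(req):\n    return load(req)\n"
    ),
    "cli.py": "from service import handler\ndef run():\n    handler('a,b')\n",
}

def build_call_graph(repo):
    """AST generation + graph construction.

    Returns (defined, callers): defined maps function name -> file path,
    callers maps callee name -> set of function names that call it (reverse edges).
    """
    defined = {}
    callers = defaultdict(set)
    for path, source in repo.items():
        tree = ast.parse(source, filename=path)
        for fn in ast.walk(tree):
            if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            defined[fn.name] = path
            for node in ast.walk(fn):  # note: also sees calls in nested functions
                if isinstance(node, ast.Call):
                    func = node.func
                    # Naive resolution: keep only the last name component.
                    name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
                    if name:
                        callers[name].add(fn.name)
    return defined, callers

def impacted_by(name, callers):
    """Breadth-first walk over reverse edges: every function that transitively calls `name`."""
    seen, queue = set(), deque([name])
    while queue:
        for caller in callers.get(queue.popleft(), ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

defined, callers = build_call_graph(REPO)
impacted = impacted_by("parse", callers)
print(sorted(impacted))                          # ['handler', 'load', 'run']
print(sorted({defined[f] for f in impacted}))    # ['cli.py', 'service.py']
```

The last line hints at enhanced retrieval: for "What will break if I change `parse`?", the retriever could pull the chunks for `load`, `handler`, and `run` alongside `parse` itself. A real implementation would need import-aware name resolution to avoid conflating same-named functions in different modules.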

Describe alternatives you've considered

We could continue relying on semantic search alone, but this will never fully solve the problem of understanding precise code execution flow and dependencies. Regex-based parsing is another alternative, but it is brittle and less powerful than ASTs for understanding code structure.
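A small example of why regex is brittle: a pattern for `total(` cannot tell a real call from the same text in a definition, a comment, or a string literal, while an AST query matches only genuine call expressions. The snippet below is illustrative only.

```python
import ast
import re

SOURCE = '''
def total(items):
    # TODO: stop calling total(items) recursively
    return sum(items)

message = "call total(x) to sum"
result = total([1, 2, 3])
'''

# Regex approach: any "total(" text counts, including the def line, the comment and the string.
regex_hits = len(re.findall(r"\btotal\s*\(", SOURCE))

# AST approach: only Call nodes whose callee is the bare name `total`.
ast_hits = sum(
    1
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.Call)
    and isinstance(node.func, ast.Name)
    and node.func.id == "total"
)

print(regex_hits, ast_hits)  # 4 1
```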

Additional context

Successfully implementing this feature would represent a significant step forward in the assistant's intelligence, moving it from a "bag of functions" model to a system with a true structural understanding of the code. This directly addresses one of the key challenges outlined in the project's own documentation.

Metadata


Assignees

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
