Skip to content

Separate syntactic and semantic nodes? #5159

@ilevkivskyi

Description

@ilevkivskyi
Member

There is something that I was thinking about for a year or so that became more apparent while working on type alias refactoring (and planning the module refactoring). We have two different things mixed together in mypy code: syntactic nodes and "semantic" nodes. This is most obvious in fixup.py where we use NodeVisitor interface (intended to traverse syntactic tree) to patch symbol tables (something that carries purely semantic info). As a result, there are various weird things like visit_var, visit_type_info, etc. The idea is to have clear separation between syntactic and semantic nodes and have separate visitors for them. We currently already have this partially, we have TypeInfo vs ClassDef, and Var vs AssignmentStmt.

Here is a short (and approximate) summary of the proposal:

  • There are following semantic nodes: Var, Function, Class (non-leaf, has symbol table), Module (non-leaf, ditto), TypeVar, TypeAlias, ConditionalNode (see below).
  • The above nodes inherit from SymbolNode, while syntactic nodes inherit from Node, these two both inherit from Context.
  • Only the above nodes (and types) are serialized in cache and deserialized in incremental runs.
  • Semantic nodes can have attributes that point to the relevant (defining) syntactic nodes (for example variable can have defining assignment, or function statement if property), but we should limit this to minimize cache size.
  • The above nodes will have their separate visitor, so that in total we have three: for AST, for symbol tables, and for types.
  • ConditionalNode exists to avoid having .nodes instead of .node in SymbolTableNodes (the latter are thin wrappers around semantic nodes), while still supporting certain conditional definitions like conditional imports. This node will have an attribute that is a list of other semantic nodes.

This is a very large refactoring, but IMO this will add robustness and clarity, and will simplify addition of new features (e.g. conditional imports). This is probably a low priority (long term) thing. We just discussed this with @JukkaL, he likes the idea but is concerned about the size of this refactoring. A possible way to go forward with this is to split this in several separate PRs.

Activity

elazarg

elazarg commented on Jun 11, 2018

@elazarg
Contributor

I tried to do this several times without success - the changes were unmanageable for me. I really hope this will be done.

ilevkivskyi

ilevkivskyi commented on Jun 25, 2018

@ilevkivskyi
MemberAuthor

Some ideas to split this into smaller parts:

  • Kill Decorator, it isn't present in native Python AST, and currently this node appears in both syntactic and semantic trees (intersection that we want to get rid of).
  • Avoid serialization of syntactic nodes, that have matching semantic ones, currently TypeInfo.defn (which is ClassDef) is also serialized, but only partially. Removing this (we can keep context info for better errors, like TypeInfo.defn_context: Context) will not only disentangle semantic/syntactic parts, but might likely make serialization less error-prone.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @elazarg@ilevkivskyi

        Issue actions

          Separate syntactic and semantic nodes? · Issue #5159 · python/mypy