V2 Compiler Passes
- Pass 0: Lexical analysis
- Pass 1: Local variable identification and macroexpansion
- Pass 2: TBD
Pass 0 is done by the Lisp reader. 'Nuff said.
Pass 1: it is hard to go further without knowing the full code to be compiled, which requires macroexpansion at all levels. For a form `(op arg1 arg2 ... )`, there are several possibilities:
- `op` is a special form
- `op` signifies a macro
- `op` is of the form `.name`
- `op` is a symbol of the form `ns/name`, where `ns` is an existing `Namespace` or alias for one in the `CurrentNamespace` and `ns` names a type
- `op` is of the form `name.`
- otherwise
The result of macroexpansion is, respectively:
- the original form itself (no macroexpansion)
- the result of calling the `Var` identified by `op` on the entire form, the local variable environment, and the `next` of the form
- `(. name arg1 arg2 ... )` (host expression, either a static or a virtual method call, depending on whether `arg1` names a type)
- `(. ns name arg1 arg2 ... )` (static method)
- `(new name arg1 arg2 ... )` (new expression)
- the original form itself (no macroexpansion)
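The six cases and their expansions can be sketched as a single-step expander. This is only an illustration in Python, not the compiler's implementation: forms are modeled as tuples, symbols as strings, and the names `SPECIAL_FORMS`, `macroexpand_1`, `macros`, and `type_names` are hypothetical. The namespace/alias lookup for the `ns/name` case is left out; only the "names a type" check is modeled.

```python
# Partial, illustrative set of special forms (an assumption, not the real list).
SPECIAL_FORMS = {"def", "if", "do", "let*", "fn*", "quote", ".", "new"}


def macroexpand_1(form, macros, type_names, local_env=frozenset()):
    """Expand `form` one step, checking the six cases in the order listed.

    macros:     hypothetical map from symbol name to a macro function
    type_names: hypothetical set of symbols that name a type
    local_env:  names of the local variables currently in scope
    """
    if not (isinstance(form, tuple) and form and isinstance(form[0], str)):
        return form
    op, args = form[0], form[1:]

    # 1. Special form: no macroexpansion.
    if op in SPECIAL_FORMS:
        return form
    # 2. Macro: call it on the whole form, the environment, and the rest of the form.
    if op not in local_env and op in macros:
        return macros[op](form, local_env, *args)
    # 3. .name -> (. name arg1 arg2 ...), as written in the list above.
    if op.startswith(".") and len(op) > 1:
        return (".", op[1:], *args)
    # 4. ns/name where ns names a type -> (. ns name arg1 arg2 ...).
    ns, slash, name = op.partition("/")
    if slash and ns and name and ns in type_names:
        return (".", ns, name, *args)
    # 5. name. -> (new name arg1 arg2 ...).
    if op.endswith(".") and len(op) > 1:
        return ("new", op[:-1], *args)
    # 6. Otherwise: no macroexpansion.
    return form
```

The order of the checks matters in this sketch: `.` is itself a special form, so the special-form test has to run before the `.name` test.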
Determining whether `op` is a macro:
- `op` is a `Symbol` and not a local variable
- `op` is a `Var` or a `Symbol` naming a `Var`, and the `Var` is marked as a macro and is not marked as private
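A minimal sketch of that test, again in Python and under stated assumptions: the `Var` class here is a hypothetical stand-in with just the two flags the conditions mention, symbols are plain strings, and `vars_by_name` is an assumed lookup table standing in for real symbol resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a Var, carrying only the flags used here."""
    name: str
    is_macro: bool = False
    is_private: bool = False


def macro_var(op, local_env, vars_by_name):
    """Return the macro Var that `op` designates, or None if `op` is not a macro."""
    if isinstance(op, str):
        # A symbol that names a local variable is never a macro.
        if op in local_env:
            return None
        var = vars_by_name.get(op)
    elif isinstance(op, Var):
        var = op
    else:
        return None
    # The Var must be marked as a macro and must not be marked as private.
    if var is not None and var.is_macro and not var.is_private:
        return var
    return None
```

Returning the `Var` rather than a boolean lets the caller invoke it directly on the form.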
By inspection, macroexpansion and the identification of local variable scopes are intertwined. This pass must walk the form being compiled, keeping track of the local variable environment and macroexpanding along the way.
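To illustrate why the two concerns are intertwined, here is a small Python sketch of such a walk. It is a sketch under assumptions, not the planned design: forms are tuples, symbols are strings, only `quote`, `let*`, and `fn*` are treated as scope-aware special forms, and `expand_all` and `macros` are hypothetical names.

```python
def expand_all(form, macros, env=frozenset()):
    """Macroexpand `form` at all levels while tracking local variables in `env`."""
    if not isinstance(form, tuple) or not form:
        return form
    op = form[0]

    if op == "quote":
        return form  # quoted data is never expanded

    if op == "let*":
        # (let* (name1 init1 name2 init2 ...) body ...)
        bindings, body = form[1], form[2:]
        new_bindings = []
        for i in range(0, len(bindings), 2):
            name, init = bindings[i], bindings[i + 1]
            # Each init sees only the names bound before it.
            new_bindings += [name, expand_all(init, macros, env)]
            env = env | {name}
        return ("let*", tuple(new_bindings),
                *(expand_all(b, macros, env) for b in body))

    if op == "fn*":
        # (fn* (param ...) body ...)
        params, body = form[1], form[2:]
        inner = env | set(params)
        return ("fn*", params, *(expand_all(b, macros, inner) for b in body))

    # A symbol shadowed by a local variable is not treated as a macro.
    if isinstance(op, str) and op not in env and op in macros:
        return expand_all(macros[op](form, env, *form[1:]), macros, env)

    return tuple(expand_all(x, macros, env) for x in form)
```

With a hypothetical `when` macro registered, `(let* (when (fn* (x) x)) (when a b))` is left untouched by this walk, because the inner `when` refers to the local binding rather than the macro.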
Output TBD: it could be a simple structure with local-variable introduction nodes and expression nodes, and leave it at that. Or one could go to the gross level of analysis done by Compiler.analyze and bottom out with SymbolExpr, KeywordExpr, etc.
With the code expanded, types can be chased throughout the tree: from user type tags, from type info on Var'd IFns, and by flow through interop calls. This will likely include identifying all known flows of value-type values. We should add a boxing node type to the AST to mark explicitly where value types get boxed.
For the remaining (non-interop) nodes (fn arg1 arg2 ...), identify the invocation type: regular, static, prim, and so on. This might need to be combined with Pass 2 above.
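One way to picture an explicit boxing node is the following Python sketch. Everything in it is an assumption made for illustration: the node classes `Const`, `Call`, and `Box`, the `VALUE_TYPES` set, and the rule that boxing happens when a value-typed argument meets an `object` parameter. The real AST and type rules are still to be designed.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical stand-ins for CLR value types.
VALUE_TYPES = {"long", "double", "bool"}


@dataclass(frozen=True)
class Const:
    value: object
    type: str


@dataclass(frozen=True)
class Call:
    name: str
    param_types: Tuple[str, ...]
    return_type: str
    args: tuple

    @property
    def type(self):
        return self.return_type


@dataclass(frozen=True)
class Box:
    """Marks explicitly that the wrapped value-typed expression gets boxed."""
    expr: object
    type: str = "object"


def insert_boxing(node):
    """Wrap value-typed arguments in Box wherever the parameter expects object."""
    if isinstance(node, Call):
        new_args = []
        for arg, expected in zip(node.args, node.param_types):
            arg = insert_boxing(arg)
            if arg.type in VALUE_TYPES and expected == "object":
                arg = Box(arg)
            new_args.append(arg)
        return Call(node.name, node.param_types, node.return_type, tuple(new_args))
    return node
```

In this sketch a `long` flowing into a `long` parameter stays unboxed, while the same expression passed to an `object` parameter comes back wrapped in `Box`, so later passes can see every boxing site in the tree.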
- Identification of constants to compile in (symbols, keywords, maps/lists/sets, etc.)
- Adornment of sequence points and other IL debug information
Could come in two flavors: optimizations on the AST nodes, or optimizations on the (abstract, pseudo) IL. I won't know what is possible or needed here until we see where the above gets us.
A question to be resolved is whether there should be an intermediate IL (a la Swift IL, e.g.) that sits between the AST representation and the final IL. We definitely want an explicit IL representation tied to MSIL that allows inspection and manipulation prior to going to ILGen.