Compiler from a C-subset to LLVM IR & MIPS. A group project for the course Compilers (INFORMAT 1001WETCOP)
By Emil Lambert, Lucas Vanden Abeele, Tibo Verreycken, Kars van Velzen.
[Spring, 2024 | Compilers, 1001WETCOP]
The goals off this project for this course consists of (but not limited to): Show expertise in the course material theory, Implementing a compiler
You can find detailed information about the course using this link
Implementing a compiler for a C-subset to working LLVM-IR and MIPS assembly with at least the Mandatory Functionality, listed below.
We developed a compiler for a C-subset to working LLVM-IR and MIPS assembly with a wide range of support. We encourage you to take a look at the provided functionality below. A testing system consisting of 650 tests which will be used for the next year by the teaching assistants to grade the upcoming projects. We received the total grade for this project.
- Install the python packages, see
requirements.txt
- Working gcc compiler on your device
- For unit tests, a linux Kernel with lli & spim packages installed:
sudo apt install llvm
sudo apt install spim
Find our demonstration for Mips , LLVM
--unused_var False
Disables the optimization to remove unused variables. This might help to better
understand the LLVM output
(In short we made all the mandatory features) (Links are provided to an example file showing this feature)
- Binary operations +, -,* and /
- Binary operations >, <, and ==
- Unary operators + and -
- Parenthesis to overwrite the order of operations
- Logical operators &&, ||, and !
- Comparison operators >=, <=, and !=
- Binary operator %
- Shift operators <<, >>
- Bitwise operators &, |,~ and ^
- Abstract syntax tree and visualisation
- Constant Folding
- Add an int main() { ... } function
- Extra reserved keywords need to be supported: const, char, int, and float
- Expressions
- Literals of types: 'char', 'int', 'float' and pointers for these types
- Variables: variable declarations, variable definitions, assignment statements, and identifiers
- Pointers:(Declaration, definition) & Unary Operations: '*' (dereference) and '&' (address)
- Pointer to const Values (We also support Const ptrs btw)
- Constants
- Implicit Conversions (Warnings when going to poorer type)
- Explicit Conversions
- Pointer Arithmetic
- Increment/Decrement Operations
- Constant propagation
- Error Analysis: Syntax Errors
- Error Analysis: Semantic Errors (all Semantic error features)
- Semantic error: Use of undeclared var
- Semantic error: Redeclaration of an undeclared var
- Semantic error: Operations or assignments of incompatible types
- Semantic error: Assignment to an rvalue
- Semantic error: Assignment to a const variable
- Symbol table and visualisation
- Semantic analysis: Consistency of return type.
- Semantic analysis: Parameter types passed to call should be consistent with the function signature.
- Semantic analysis: Consistency between forward declarations and function definitions. Functions must be declared/defined before calling them. Functions should not be re-defined.
- Semantic analysis: Type checking for array types (parameter passing, assignment, indexing).
- Semantic analysis: Checking the length of array initialisers ({ ... }) during assignment.
- Semantic analysis: Type checking for accessing and assigning struct members.
- Semantic analysis: Type checking for accessing and assigning union members (if implemented).
- Semantic analysis: Type checking for function pointers: assigning function pointers, calling
- Semantic analysis: Check for all paths in a function body whether a return statement is present. functions, etc. (if implemented)
- SymbolTable and Visualization of the symbol table
- Singleline Comments
- Multiline Comments
- Original Code as Comments
- Outputting to the standard output using printf
- Typedefs
- 'if' and 'else' statements
- Loops: 'while' and 'for', also support keywords 'break' & 'continue'
- Scoping: if-else, for, while, anonymous
- Switch statements: switch, break, case
- Enumerations
- Support Scopes in symbol Table
- Function scopes
- Local & Global variables
- Functions: definitions; declarations; calling, recursive calls
- Pre-processor: #define
- Pre-processor: #include
- Optimisations: no code after break, continue, or return
- Arrays both simple & multidimensional arrays
- Array initialization using initializer list
- Operations on array elements
- Strings encoded as zero-terminated char-arrays (both supported as char str[size] and char* str)
- printf and scanf
- user Defined Structs
- Include Guards (mandatory because group of 4)
- Unions (mandatory because group of 4)
- Function Pointers (mandatory because group of 4)
- Code Generation: LLVM IR
- Code Generation: Mips
(In short we did all the optionals except 'Function overloading' and we did 4 extra optionals ('CFG' and 'array[] automatic size' )', 'Func ptrs', 'calloc en realloc'')
- Include Guards (mandatory because group of 4)
- Unions (mandatory because group of 4)
- Function Pointers (mandatory because group of 4)
- Const Casting
- Else If Statements
- Semantic Analysis: Check for all paths in a function body whether a return statement is present
- Optimisation: Do not generate code for variables that are not used
- Optimisation: Do not generate code for conditions that are always false
- Dynamic Arrays (stored on heap) (malloc, free) (sidenote we require #include <stdlib.h> to be included, (similar situation as printf and scanf))
- Structs that contain other structs as a value
- Dynamic Allocation of Structs (store structs on heap) (support by also supporting sizeof function)
- File reading using fgets
- File writing using fputs
- Dynamically allocated strings and character buffers (can be used for printf, fgets, fputs, ...)
- ControlFlow and visualization (Extra optional)
- Dynamic Allocation Extension: Using Calloc and Realloc
- File reading uses File Pointers with fopen and fclose
- Array that automatically inferred array size int a[] = {1, 2, 3}
- README
- Documentation
- Unit Tests
- Start Script
- PNG Creation from .dot-files
Compilers
├── example_source_files - Files used in our demo
├── grammar - Directory containing grammar specifications
│ └── grammarC.g4 - ANTLR file containing our grammar specification
├── README.md - This file
├── requirements.txt - Python dependencies
├── script.py - Run all example_source_files
├── src - Directory containing our source doe
│ ├── antlr_files - Grammar files created by ANTLR
│ ├── internal_tools - Directory containing internal tools
│ │ └── IntegrityChecks.py - PreConditions Functions
│ ├── llvm_target - Directory containing code to translate the AST & table data towards LLVM
│ │ ├── AST2LLVM.py - Main visitor to convert to LLVM
│ │ ├── LLVMSingleton.py - Manages the LLVMLite Library to ease the work of our visitor
│ │ ├── MapTable.py - Map variables to a LLVM register
│ │ └── OutputLLVMGenerator.py - Contains methods to convert parts of the tree to LLVM code
│ ├── main - Directory containing the main function
│ │ └──__main__.py - Main file
│ ├── mips_target - Directory containing files to generated Mips
│ │ ├── AST2MIPS.py - Main visitor to convert to Mips
│ │ ├── MipsLibrary - Our own library, based upon the workings of LLVMLite
│ │ │ ├── Blocks.py - Structure to create blocks
│ │ │ ├── Function.py - Structure to create functions
│ │ │ ├── Instructions - We won't go over each instructions, but these classes are called in the block structure, just like you're used to for LLVMLite.
│ │ │ │ ├── FR
│ │ │ │ │ ├── add_s.py
│ │ │ │ │ ├── c.eq.s.py
│ │ │ │ │ ├── c.le.s.py
│ │ │ │ │ ├── c.lt.s.py
│ │ │ │ │ ├── div_s.py
│ │ │ │ │ ├── FloatMipsInstruction.py
│ │ │ │ │ ├── mul_s.py
│ │ │ │ │ └── sub_s.py
│ │ │ │ ├── I
│ │ │ │ │ ├── addi.py
│ │ │ │ │ ├── addiu.py
│ │ │ │ │ ├── andi.py
│ │ │ │ │ ├── beq.py
│ │ │ │ │ ├── bne.py
│ │ │ │ │ ├── IMipsInstruction.py
│ │ │ │ │ ├── lb.py
│ │ │ │ │ ├── li.py
│ │ │ │ │ ├── lui.py
│ │ │ │ │ ├── lw.py
│ │ │ │ │ ├── mips_not.py
│ │ │ │ │ ├── neg.py
│ │ │ │ │ ├── ori.py
│ │ │ │ │ ├── sb.py
│ │ │ │ │ ├── slti.py
│ │ │ │ │ ├── sltiu.py
│ │ │ │ │ └── sw.py
│ │ │ │ ├── Instruction.py
│ │ │ │ ├── J
│ │ │ │ │ ├── jal.py
│ │ │ │ │ ├── jalr.py
│ │ │ │ │ ├── JMipsInstruction.py
│ │ │ │ │ └── j.py
│ │ │ │ ├── R
│ │ │ │ │ ├── add.py
│ │ │ │ │ ├── addu.py
│ │ │ │ │ ├── div.py
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── jr.py
│ │ │ │ │ ├── mfhi.py
│ │ │ │ │ ├── mflo.py
│ │ │ │ │ ├── mips_and.py
│ │ │ │ │ ├── mips_or.py
│ │ │ │ │ ├── mul.py
│ │ │ │ │ ├── nor.py
│ │ │ │ │ ├── RMipsInstruction.py
│ │ │ │ │ ├── sgt.py
│ │ │ │ │ ├── sll.py
│ │ │ │ │ ├── sllv.py
│ │ │ │ │ ├── slt.py
│ │ │ │ │ ├── srl.py
│ │ │ │ │ ├── srlv.py
│ │ │ │ │ ├── sub.py
│ │ │ │ │ ├── subu.py
│ │ │ │ │ └── xor.py
│ │ │ │ └── S
│ │ │ │ ├── comment.py
│ │ │ │ ├── cvt_s_w.py
│ │ │ │ ├── cvt_w_s.py
│ │ │ │ ├── la.py
│ │ │ │ ├── l_s.py
│ │ │ │ ├── manual_label.py
│ │ │ │ ├── mfc1.py
│ │ │ │ ├── move.py
│ │ │ │ ├── mtc1.py
│ │ │ │ ├── SpecialInstruction.py
│ │ │ │ └── systemCall.py
│ │ │ ├── Memory - Directory containing files related to memory management.
│ │ │ │ ├── Memory.py - Memory object, used for registers or stack locations. Widely used by the registermanager.
│ │ │ │ └── RegisterManager.py - A singleton service with function to retrieve a free registers and spill any data to memory.
│ │ │ ├── MipsManager.py - Mips Label manager
│ │ │ ├── MipsModule.py - Module manager
│ │ ├── MipsSingleton.py - Similar to LLVMSingleton.py
│ │ ├── OutputMIPSGenerator.py - Similar to it's LLVM counterpart
│ │ ├── PredifinedStructures.py - Predefined functions translated from c to Mips
│ ├── parser - Directory containing code to parse an input file, create & massage the tree
│ │ ├── ArrayCleaner.py - Massage the tree, specificly for arrays
│ │ ├── ArrayPreProcessor.py - Makes sure const values are propagated inside the array size
│ │ ├── ASTCleanerAfter.py - Clean the AST after table creation
│ │ ├── ASTCleaner.py - Clean the AST before table creation
│ │ ├── ASTConversion.py - Reform implicit conversions to explicits and verify semantics
│ │ ├── ASTCreator.py - Create the initial AST from the ANTLR tree
│ │ ├── ASTDereferencer.py - Alters the AST to ease the use of adressess
│ │ ├── ASTIfCleaner.py - Massage If Statements to our format
│ │ ├── ASTLoopCleaner.py - Convert & Clean loop statements to While loops in a desired format
│ │ ├── AST.py - Contains class definitions of our AST Objects & some methods for easy manipulation
│ │ ├── ASTTableCreator.py - Creates symbol tables
│ │ ├── ASTTypedefReplacer.py - Resolves typedefs to their actual type
│ │ ├── ASTVisitor.py - Base class visitor for our AST
│ │ ├── BlacklistVisitor.py - Remove nodes that contain black listed text
│ │ ├── CodeGetter.py - Maps all source code to their respective linenumber & file
│ │ ├── ConstantFoldingVisitor.py - Apply constant folding
│ │ ├── ConstantStatementFolding.py - Apply constant folding for conditional statements
│ │ ├── Constraints - Directory containing semantic checks
│ │ │ ├── AmpersandConstraint.py - Verifies the integrity of the & operator
│ │ │ ├── CheckRvalues.py - Check for invalid Rvalues
│ │ │ ├── CheckUnaryOps.py - Verify consistency of unary operators
│ │ │ ├── CleanGlobalScopeConstraint.py - Verify that all statements in the global scope are allowed
│ │ │ ├── ConstraintChecker.py - Actual visitor that applies all constraints
│ │ │ ├── Constraint.py - Abstract Base Class
│ │ │ ├── FunctionReturnConstraint.py - Verify that every (non-void) function has a return statement
│ │ │ ├── GlobalsConstrained.py - Verify global variable syntax
│ │ │ ├── IOConstraint.py - Check when printf or scanf is used, stdio.h is included
│ │ │ ├── MainFoundConstraint.py - Check if the code to compile, contains a main function
│ │ │ ├── RedefinitionConstrained.py - Check for redefinitions
│ │ │ ├── UndeclaredConstrained.py - Verify if all used variables are declared
│ │ │ ├── UndefinedReferenceConstraint.py - Check for function calls to undefined functions
│ │ │ └── VoidReturnConstraint.py - Check that every void function has no non-empty return statement
│ │ ├── ControlFlow - Directory containing ControlFlow Code
│ │ │ ├── ControlFlowDotVisitor.py - Visualise the CFG
│ │ │ ├── ControlFlowGraph.py - Contains CFG Classes & manipulation methods
│ │ │ └── ControlFlowCreator.py - Creates the actual CFG
│ │ ├── CTypes -
│ │ │ ├── CFunctionExecuterChar.py - Makes sure that each C type has the right operation behaviour
│ │ │ ├── CFunctionExecuterFloat.py - Makes sure that each C type has the right operation behaviour
│ │ │ ├── CFunctionExecuterInt.py - Makes sure that each C type has the right operation behaviour
│ │ │ ├── CFunctionExecuterPtr.py - Makes sure that each C type has the right operation behaviour
│ │ │ ├── CFunctionExecuter.py - Makes sure that each C type has the right operation behaviour
│ │ │ ├── COperationHandler.py - Simulates operations like it would behave in C
│ │ │ ├── InvalidOperatorFloatError.py - Error used in COperationHandler & -Executer
│ │ │ └── InvalidOperatorPtrError.py - Error used in COperationHandler & -Executer
│ │ ├── DeadCodeRemover.py - Remove unreachable code
│ │ ├── DotVisitor.py - Export the AST to a .dot file
│ │ ├── DynamicAllocation.py - Add dynamic allocation functionality
│ │ ├── EnumConverter.py - Convert enum nodes to our format
│ │ ├── ErrorExporter.py - File containing methods to throw errors
│ │ ├── FileIO.py - - Add FileIO functionality
│ │ ├── FunctionPtrCleaner.py - Cleanup Function Pointer declaration to our declration format
│ │ ├── IdentifierReplacerVisitor.py - Replace variables with known values at compile time
│ │ ├── PointerReformater.py - Reformat the '->' pointer operator towards (* ). operators
│ │ ├── Preproccesing - Directory containing preprocessor code
│ │ │ └── preProcessor.py - Execute preprocessing; remove comments & resolve #directives
│ │ ├── SizeOfTranslater.py - Convert SizeOf expressions, to an integer indicating the size of this type
│ │ ├── StringToArray.py - Make it possible to represent strings as char arrays
│ │ ├── StructCleanerAfter.py - Clean Struct Nodes after Table creation
│ │ ├── StructCleaner.py - Cleanup struct nodes & insert info into the Struct Table
│ │ ├── SwitchConverter.py - Convert switch stamenets to if statements
│ │ ├── Tables - Directory containing tables & type objects
│ │ │ ├── AbstractTable.py - Abstract Base Class for our Tables
│ │ │ ├── FunctionSymbolType.py - Type that holds data about Functions
│ │ │ ├── StructTable.py - Table to map structs info to scopes
│ │ │ ├── SymbolTable.py - Should be clear
│ │ │ ├── SymbolTypeArray.py - Type holding array info
│ │ │ ├── SymbolTypePtr.py - Type holding ptr info
│ │ │ ├── SymbolType.py - Simple type for base types, also used as base class for others
│ │ │ ├── SymbolTypeStruct.py - Type holding struct info
│ │ │ ├── SymbolTypeUnion.py - Type holding union info
│ │ │ ├── TableDotVisitor.py - Exports the symboltable to a .dot file
│ │ │ ├── TypedefTable.py - Table containing info about typedefs
│ │ │ └── TypeNodehandler.py - Tools with regard to creating and retrieving AST TypeNode types
│ │ ├── TypeCleaner.py - Convert 'Type' subtrees to a ASTTypeNode containing the type
│ │ ├── TypeMerger.py - Formats declarations with enums, structs to our format.
│ │ ├── UnarySaVisitor.py - Semantic Analyses for Unary operations
│ │ ├── Utils - Directory containing helper functions
│ │ │ └── ArraySizeReader.py - Help reading array sizes
│ │ ├── ValueAdderVisitor.py - Constant Propagation
│ │ ├── VirtualLineNrVisitor.py - Map code to a virtual linenr (needed instead of linenr by using multiple files)
│ │ ├── VoidReturnAdder.py - Add void returns to void functions
│ │ ├── ScopeCleaner.py - Removes 'Scope' Nodes
│ │ └── UnusedCleaner.py - Removes Code for unused varaibles
└── TestCases - Directory containing our test system
├── ABCTests - DIrectory containing the abstract base class test & examples
│ ├── abcTest.py - Abstract Base Test (from unitTest)
│ ├── AstLoader.py - Can load & export an AST to .json files
│ ├── createTests - Directory to create a new test file
│ │ ├── file_read_json.json - .json upon file read
│ │ ├── file_read_json_result.json - .json after applied actions
│ │ ├── read_file_after.dot - .dot file after applied actions
│ │ ├── read_file_after.png - .png file after applied actions
│ │ ├── read_file_before.dot - initial AST
│ │ ├── read_file_before.png - initial AST in png
│ │ ├── read_file.c - Test input file
│ │ └── testCaseCreator.py - Action applyer
│ ├── ReCalibrateTestCases.py - Script to automaticly remake all .json files
│ ├── temp - Temp directory in which any output files are generated
│ │ └── temp.txt - File to ensure the directory exists
│ └── tests - Directory containing test files corresponding to a unittest
│ ├── error_dict.json - Predefine expected errors for each file
│ ├── input_dict.json - If exists, predefined input for scanf
│ ├── test1.c - A test file
│ ├── test1.json - Generated .json after AST test
│ └── test1_result.json - Expected .json result
├── ASTTests - Directory containing AST unit tests
└── LLVMTests - Directory containing LLVM unit tests
357 directories, 3512 files
You can test our example files by running:
python3 script.py
This will run the provided examples from example_source_files
For our project we have lots of testcases, ... So if for some reason a situation occurs that is unexpected, please consolidate our testcases.
LLVM Tests are tests that run the entire compiler and cross-reference their output (from prints) with the gcc output. This Part of our testEnvironment is the best indicator of our compiler's capabilities. Currently, we have around 650 tests in this directory. Some classification is made, but not every feature has its own testSuite. A lot of testfiles also have a combination of testcase features.
Run our testcases using:
python3 -m unittest discover -v
You can also run the tests in parallel Requires:
pip install unittest-parallel
unittest-parallel -t . -s TestCases --level=class