
Conversation


@dfg-98 dfg-98 commented Nov 14, 2023

No description provided.


dfg-98 commented Nov 14, 2023

Completed the lexer for COOL. A lexer, or lexical analyzer, is a program that transforms an input string (the source code) into a sequence of tokens: the smallest meaningful units of the text, such as keywords, identifiers, literals, and operators.

The code defines several classes: CharacterStream, TokenType, Token, and Lexer.

The CharacterStream class is used to read the source code character by character. It keeps track of the current position in the code, as well as the line and column numbers. It provides methods to get the next character (next_char), peek at the next character without advancing the position (peek_char), and get the current position (get_position).
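A minimal sketch of what such a CharacterStream could look like, assuming only the interface described above (next_char, peek_char, get_position); the names and CharPosition shape are illustrative, not necessarily the PR's exact implementation:

```python
from collections import namedtuple

# Hypothetical position record; the PR mentions a CharPosition type.
CharPosition = namedtuple("CharPosition", ["line", "column"])

class CharacterStream:
    def __init__(self, source: str):
        self.source = source
        self.pos = 0      # absolute index into source
        self.line = 1     # 1-based line number
        self.column = 1   # 1-based column number

    def peek_char(self) -> str:
        """Look at the next character without consuming it ('' at EOF)."""
        return self.source[self.pos] if self.pos < len(self.source) else ""

    def next_char(self) -> str:
        """Consume and return the next character, updating line/column."""
        ch = self.peek_char()
        if ch:
            self.pos += 1
            if ch == "\n":
                self.line += 1
                self.column = 1
            else:
                self.column += 1
        return ch

    def get_position(self) -> CharPosition:
        return CharPosition(self.line, self.column)
```

Tracking line/column inside the stream, rather than in the lexer, keeps every token's error position consistent with how newlines were consumed.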

The TokenType class is a simple class that defines constants for all possible types of tokens that can be recognized in the source code. These include various keywords, operators, and punctuation marks.

The Token class represents a token. Each token has a type, a value, and a position. The type is one of the constants defined in TokenType, the value is the actual text of the token in the source code, and the position is a CharPosition indicating where the token starts in the source code.
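An illustrative sketch of TokenType and Token under the structure described above; the constant names follow common COOL lexer conventions and may differ from the PR's actual code:

```python
from collections import namedtuple

# Hypothetical position record, as described for CharacterStream.
CharPosition = namedtuple("CharPosition", ["line", "column"])

class TokenType:
    # A few representative constants; the real class lists all
    # keywords, operators, and punctuation marks of COOL.
    CLASS = "CLASS"
    ID = "ID"            # object identifier
    TYPE_ID = "TYPE_ID"  # type/class name
    INT = "INT"
    STRING = "STRING"
    ASSIGN = "ASSIGN"    # <-
    LBRACE = "LBRACE"    # {
    RBRACE = "RBRACE"    # }
    EOF = "EOF"

class Token:
    def __init__(self, token_type, value, position):
        self.type = token_type    # one of the TokenType constants
        self.value = value        # the lexeme as it appears in the source
        self.position = position  # CharPosition where the token starts

    def __repr__(self):
        return f"Token({self.type}, {self.value!r}, {self.position})"
```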

The Lexer class is the main class that performs the lexical analysis. It reads characters from a CharacterStream, recognizes tokens, and generates a list of Token objects. It also keeps track of any errors that occur during the lexical analysis.

The Lexer class's fetch_token method is the heart of the lexer. It reads characters and recognizes tokens based on the rules of the programming language. For example, if the next character is a letter, it reads all subsequent letters and digits to form an identifier or keyword. If the next character is a digit, it reads all subsequent digits to form a number. It also recognizes strings, comments, and various operators and punctuation marks.
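The dispatch logic described above can be sketched as follows. This is a reduced, hypothetical version working on a plain string and index instead of the PR's CharacterStream, and the keyword set and token names are illustrative:

```python
# A subset of COOL's (case-insensitive) keywords, for illustration.
KEYWORDS = {"class", "if", "then", "else", "fi", "let", "in", "while", "case", "esac"}

def fetch_token(source: str, pos: int):
    """Return (token_type, lexeme, next_pos) for the token starting at pos."""
    # Skip whitespace before the token.
    while pos < len(source) and source[pos].isspace():
        pos += 1
    if pos >= len(source):
        return ("EOF", "", pos)
    ch = source[pos]
    if ch.isalpha():  # identifier or keyword: a letter, then letters/digits/_
        start = pos
        while pos < len(source) and (source[pos].isalnum() or source[pos] == "_"):
            pos += 1
        lexeme = source[start:pos]
        kind = "KEYWORD" if lexeme.lower() in KEYWORDS else "ID"
        return (kind, lexeme, pos)
    if ch.isdigit():  # integer literal: a run of digits
        start = pos
        while pos < len(source) and source[pos].isdigit():
            pos += 1
        return ("INT", source[start:pos], pos)
    # Everything else (operators, punctuation, strings, comments) would be
    # handled by further branches; a single character is consumed here.
    return ("PUNCT", ch, pos + 1)
```

For example, `fetch_token("let x", 0)` yields a keyword token for `let` and leaves the position at the following space.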

The lex method of the Lexer class performs the complete lexical analysis. It resets the character stream, fetches tokens until the end of the source code is reached, and returns a list of all recognized tokens and any errors that occurred.
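The driver loop has roughly this shape: reset, fetch tokens until end of input, and collect tokens and errors. In this self-contained sketch tokenization is reduced to whitespace-separated words purely so the loop stands alone; the real lex method calls fetch_token on a CharacterStream:

```python
def lex(source: str):
    """Sketch of the lex loop: returns (tokens, errors)."""
    tokens, errors = [], []
    pos = 0  # "reset the character stream"
    while True:
        # Skip whitespace between lexemes.
        while pos < len(source) and source[pos].isspace():
            pos += 1
        if pos >= len(source):
            tokens.append(("EOF", ""))
            break
        start = pos
        while pos < len(source) and not source[pos].isspace():
            pos += 1
        word = source[start:pos]
        if word.isidentifier() or word.isdigit():
            tokens.append(("TOKEN", word))
        else:
            # Errors are recorded and lexing continues, as described above.
            errors.append(f"unrecognized lexeme {word!r} at offset {start}")
    return tokens, errors
```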


dfg-98 commented Nov 15, 2023

@Greenman44 we have a bug in the let expression when there is an extra comma before in.
ex:

let id1: Int, let id2: String <- "Val",
in {

}

Also happens in parse_arguments

test1.tests4(1, 2,)

I think we need to change the loop from a while to a do-while: parse one element first, and only continue after seeing a comma.
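The shape of the fix can be sketched like this (token handling reduced to a list of strings, names hypothetical): a loop that accepts a comma and then tolerates a missing element lets a trailing comma through, whereas a "do-while" shape (parse one element, then a comma commits to another) rejects both `let id1: Int,` before `in` and `tests4(1, 2,)`:

```python
def parse_comma_list(tokens, terminator):
    """Parse `elem (, elem)* terminator`; reject a trailing comma."""
    items = []
    i = 0
    while True:
        # "Do" part: an element is mandatory here, even on the first pass.
        if i >= len(tokens) or tokens[i] == terminator:
            found = tokens[i] if i < len(tokens) else "end of input"
            raise SyntaxError(f"expected an element, found {found}")
        items.append(tokens[i])
        i += 1
        # "While" part: a comma commits us to another element.
        if i < len(tokens) and tokens[i] == ",":
            i += 1
            continue
        break
    return items
```

With this shape, `parse_comma_list(["1", ",", "2", ")"], ")")` succeeds while `parse_comma_list(["1", ",", ")"], ")")` raises, which matches the two buggy cases above.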

Greenman44 and others added 2 commits November 14, 2023 23:46

dfg-98 commented Nov 16, 2023

Errors in parser to solve:

  • Extra comma on arguments / parameters in method call / definition
  • Type/Class names must begin with uppercase letters
  • Identifiers names must begin with lowercase letters
  • Case must contain at least one branch
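For the two naming rules in the list, the checks come down to the first character's case, since COOL distinguishes type names from object identifiers that way; these helper names are hypothetical:

```python
def is_type_name(name: str) -> bool:
    """COOL type/class names must begin with an uppercase letter."""
    return bool(name) and name[0].isupper()

def is_object_id(name: str) -> bool:
    """COOL object identifiers must begin with a lowercase letter."""
    return bool(name) and name[0].islower()
```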

Greenman44 and others added 6 commits November 16, 2023 14:08
Fix Extra comma on arguments / parameters in method call / definition
Fix Type/Class names must begin with uppercase letters
Fix Identifiers names must begin with lowercase letters
Fix Case must contain at least one branch

dfg-98 commented Nov 22, 2023

12 of 71 semantic tests passed
