cleks2.h is a highly customizable header-only lexer written in C.
It is the second iteration of cleks.h.
- Download cleks2.h or clone the repository for all templates and examples.
- Include cleks2.h in your project.
- To access the function bodies, define CLEKS_IMPLEMENTATION before including the header:
#define CLEKS_IMPLEMENTATION
#include "cleks2.h"
First create a Clekser struct by calling
Clekser Cleks_create(char *buffer, size_t buffer_size, CleksConfig config, char *filename);
Arguments:
- buffer: [char*] the string buffer to lex
- buffer_size: [size_t] the length of the buffer
- config: [CleksConfig] the configuration struct
- filename: [char*] a filename indicating where the content of the buffer originated, only used when printing
cleks2 is highly customizable via the CleksConfig struct.
A CleksConfig can be understood as a description of the "language" rules the lexer should follow.
typedef struct{
CleksWord *words;
size_t word_count;
CleksSymbol *symbols;
size_t symbol_count;
CleksComment *comments;
size_t comment_count;
CleksString *strings;
size_t string_count;
CleksWhitespace *whitespaces;
size_t whitespace_count;
uint8_t flags;
CleksPrintFn print_fn;
} CleksConfig;
Currently, there are five customizable array fields, each with a corresponding _count field, plus a field for further flags and a field for a custom print function:
- words: [CleksWord*] an array of string literals to find
- symbols: [CleksSymbol*] an array of character literals to find (highest priority when lexing)
- comments: [CleksComment*] an array of comment definitions
- strings: [CleksString*] an array of string delimiter definitions
- whitespaces: [CleksWhitespace*] an array of whitespace delimiter definitions
- flags: [uint8_t] a bit-mask containing further lexing instructions
- print_fn: [CleksPrintFn] a custom function for printing tokens (NULL for Cleks_print_default)
The created CleksConfig struct is simply assigned to a Clekser via Cleks_create, as seen above.
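As a rough illustration, the arrays for a small JSON-like language could be set up as sketched below. This is a sketch under assumptions, not code from the repository: the element typedefs are copied from the type reference in this README only so that the snippet compiles without the header (omit them when cleks2.h is included), `DemoConfig` is a hypothetical trimmed stand-in for CleksConfig without the flags and print_fn fields, and `DEMO_ARRAY_LEN`, the `demo_` arrays, and `demo_json_like_config` are made-up names.

```c
#include <stddef.h>

/* Copied from the type reference so the sketch compiles standalone;
   omit these typedefs when cleks2.h is included. */
typedef const char* CleksWord;
typedef const char CleksSymbol;
typedef struct{
    char start_del;
    char end_del;
} CleksString;
typedef struct{
    char *start_del;
    char *end_del;
} CleksComment;
typedef const char CleksWhitespace;

/* Hypothetical stand-in for CleksConfig (flags and print_fn left out). */
typedef struct{
    CleksWord *words;
    size_t word_count;
    CleksSymbol *symbols;
    size_t symbol_count;
    CleksComment *comments;
    size_t comment_count;
    CleksString *strings;
    size_t string_count;
    CleksWhitespace *whitespaces;
    size_t whitespace_count;
} DemoConfig;

/* Number of elements in a true array (not a pointer). */
#define DEMO_ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

CleksWord demo_words[] = {"true", "false", "null"};
CleksSymbol demo_symbols[] = {'{', '}', '[', ']', ':', ','};
CleksComment demo_comments[] = {{"//", "\n"}, {"/*", "*/"}};
CleksString demo_strings[] = {{'"', '"'}};
CleksWhitespace demo_whitespaces[] = {' ', '\t', '\n'};

/* Pairs every array with its element count. */
DemoConfig demo_json_like_config(void)
{
    DemoConfig config = {0};
    config.words = demo_words;
    config.word_count = DEMO_ARRAY_LEN(demo_words);
    config.symbols = demo_symbols;
    config.symbol_count = DEMO_ARRAY_LEN(demo_symbols);
    config.comments = demo_comments;
    config.comment_count = DEMO_ARRAY_LEN(demo_comments);
    config.strings = demo_strings;
    config.string_count = DEMO_ARRAY_LEN(demo_strings);
    config.whitespaces = demo_whitespaces;
    config.whitespace_count = DEMO_ARRAY_LEN(demo_whitespaces);
    return config;
}
```

Deriving each count with sizeof keeps it in sync when an entry is added to or removed from its array.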
The following flags can be set in the flags bit-mask:
- CLEKS_FLAGS_INTEGERS: enable integer recognition
- CLEKS_FLAGS_FLOATS: enable float recognition
- CLEKS_FLAGS_HEX: enable hex number recognition
- CLEKS_FLAGS_BIN: enable binary number recognition
- CLEKS_FLAGS_ALL: enable all of the above
- CLEKS_FLAGS_NO_UNKNOWN: don't allow unknown tokens
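Because flags is a bit-mask, several flags are combined with bitwise OR and tested with bitwise AND. The sketch below only shows that pattern: the `DEMO_FLAGS_` constants and their bit values are hypothetical, chosen for illustration, and are not the values defined in cleks2.h. `demo_flag_enabled` and `demo_number_flags` are made-up helper names as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only;
   the real CLEKS_FLAGS_* values are defined in cleks2.h. */
enum {
    DEMO_FLAGS_INTEGERS   = 1 << 0,
    DEMO_FLAGS_FLOATS     = 1 << 1,
    DEMO_FLAGS_HEX        = 1 << 2,
    DEMO_FLAGS_BIN        = 1 << 3,
    DEMO_FLAGS_ALL        = 0x0F,   /* the four number flags together */
    DEMO_FLAGS_NO_UNKNOWN = 1 << 4
};

/* True when every bit of `flag` is set in `flags`. */
bool demo_flag_enabled(uint8_t flags, uint8_t flag)
{
    return (flags & flag) == flag;
}

/* Enables integer and hex recognition, nothing else. */
uint8_t demo_number_flags(void)
{
    return DEMO_FLAGS_INTEGERS | DEMO_FLAGS_HEX;
}
```

With the real header, the same OR expression over the CLEKS_FLAGS_ constants would be assigned to the flags field of the CleksConfig.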
To begin extracting tokens, call Cleks_next in a loop. With each iteration, the lexer tries to find a new token and sets the provided CleksToken accordingly, returning true on success. If an error occurs or the end of the buffer is reached, false is returned.
CleksToken token;
while (Cleks_next(&clekser, &token)){
// do something with the token
}
// lexing finished
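To make the loop contract concrete without depending on the header, here is a toy stand-in with the same calling shape. It is not how cleks2 lexes: `DemoLexer`, `DemoToken`, `demo_next`, and `demo_count_tokens` are hypothetical names, and the stand-in merely splits the buffer on whitespace. What it shares with the description above is the protocol: fill the token and return true, or return false once nothing is left.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Clekser and CleksToken. */
typedef struct{
    const char *buffer;
    size_t buffer_size;
    size_t index;
} DemoLexer;

typedef struct{
    const char *start;
    const char *end;   /* one past the last character of the token */
} DemoToken;

/* Fills `token` and returns true, or returns false at the end of the buffer. */
bool demo_next(DemoLexer *lexer, DemoToken *token)
{
    /* skip whitespace in front of the next token */
    while (lexer->index < lexer->buffer_size &&
           isspace((unsigned char) lexer->buffer[lexer->index])) {
        lexer->index++;
    }
    if (lexer->index >= lexer->buffer_size) return false;

    token->start = lexer->buffer + lexer->index;
    /* consume characters until the next whitespace or the buffer end */
    while (lexer->index < lexer->buffer_size &&
           !isspace((unsigned char) lexer->buffer[lexer->index])) {
        lexer->index++;
    }
    token->end = lexer->buffer + lexer->index;
    return true;
}

/* Drives the loop exactly like the snippet above and counts the tokens. */
size_t demo_count_tokens(const char *buffer, size_t buffer_size)
{
    DemoLexer lexer = {buffer, buffer_size, 0};
    DemoToken token;
    size_t count = 0;
    while (demo_next(&lexer, &token)) {
        count++;
    }
    return count;
}
```

Note that the real Cleks_next also returns false on errors, so a false return alone does not tell the caller whether the whole buffer was consumed.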
You can expect a certain token using
bool Cleks_expect(Clekser *clekser, CleksToken *token, CleksTokenID id);
typedef struct{
char *buffer;
size_t buffer_size;
CleksLoc loc;
size_t index;
CleksConfig config;
} Clekser;
A CleksToken is defined as follows:
typedef struct{
CleksTokenID id;
CleksLoc loc;
char *start;
char *end;
} CleksToken;
Fields:
- id: [CleksTokenID] a 64-bit integer encoding the CleksTokenType and CleksTokenIndex
- loc: [CleksLoc] a structure containing information about the location of a token in the buffer
- start: [char*] the pointer to the starting point of the token in the buffer
- end: [char*] the pointer to the end point of the token in the buffer
The CleksTokenType of a token is one of the following:
- CLEKS_WORD
- CLEKS_SYMBOL
- CLEKS_STRING
- CLEKS_UNKNOWN (an unknown word, can be disabled via the CLEKS_FLAGS_NO_UNKNOWN flag)
- CLEKS_INTEGER (has to be enabled via the CLEKS_FLAGS_INTEGERS or CLEKS_FLAGS_ALL flag)
- CLEKS_FLOATS (has to be enabled via the CLEKS_FLAGS_FLOATS or CLEKS_FLAGS_ALL flag)
- CLEKS_HEX (has to be enabled via the CLEKS_FLAGS_HEX or CLEKS_FLAGS_ALL flag)
- CLEKS_BIN (has to be enabled via the CLEKS_FLAGS_BIN or CLEKS_FLAGS_ALL flag)
The token type can be obtained via:
CleksTokenType type = cleks_token_type(token.id);
A string representation of a CleksTokenType can be obtained via:
char *name = cleks_token_type_name(type);
The index refers to the token's position in the config array that corresponds to its type. It can be obtained via:
CleksTokenIndex index = cleks_token_index(token.id);
For example, given the CleksWord array {"do", "if", "else"}, when the lexer comes upon "if" in the buffer, the corresponding token will be of type CLEKS_WORD and index 1.
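The packing of type and index into one id can be tried out in isolation. In the sketch below the three macros are copied from the macro reference in this README, while the typedefs are assumptions: the README only states that the id is a 64-bit integer, so the 32-bit index type and the order of the enum values are guesses made for illustration. `demo_is_token` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed typedefs; the real ones live in cleks2.h. */
typedef uint64_t CleksTokenID;
typedef uint32_t CleksTokenIndex;
/* Assumed enum order, shortened to three types for the sketch. */
typedef enum { CLEKS_WORD, CLEKS_SYMBOL, CLEKS_STRING } CleksTokenType;

/* Copied from the macro reference: type in the upper 32 bits,
   index in the lower 32 bits. */
#define cleks_token_type(id) ((CleksTokenType) (((CleksTokenID)(id)) >> 32))
#define cleks_token_index(id) (CleksTokenIndex)((id) & 0xFFFFFFFF)
#define cleks_token_id(type, index) ((CleksTokenID) ((CleksTokenID) (type) << 32) | ((CleksTokenIndex)(index)))

/* True when `id` encodes exactly the given type and index. */
bool demo_is_token(CleksTokenID id, CleksTokenType type, CleksTokenIndex index)
{
    return id == cleks_token_id(type, index);
}
```

For the word array from the example, `demo_is_token(token.id, CLEKS_WORD, 1)` would identify the "if" token, since a single integer comparison covers both type and index.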
The location of a token is stored in a CleksLoc struct.
typedef struct{
size_t row;
size_t column;
char *filename;
} CleksLoc;
typedef struct{
CleksWord *words;
size_t word_count;
CleksSymbol *symbols;
size_t symbol_count;
CleksComment *comments;
size_t comment_count;
CleksString *strings;
size_t string_count;
CleksWhitespace *whitespaces;
size_t whitespace_count;
uint8_t flags;
CleksPrintFn print_fn;
} CleksConfig;
typedef const char* CleksWord;
typedef const char CleksSymbol;
typedef struct{
char start_del;
char end_del;
} CleksString;
typedef struct{
char *start_del;
char *end_del;
} CleksComment;
typedef const char CleksWhitespace;
typedef void (*CleksPrintFn) (CleksToken);
Clekser Cleks_create(char *buffer, size_t buffer_size, CleksConfig config, char *filename);
bool Cleks_next(Clekser *clekser, CleksToken *token);
bool Cleks_expect(Clekser *clekser, CleksToken *token, CleksTokenID id);
bool Cleks_extract(CleksToken *token, char *buffer, size_t buffer_size);
void Cleks_print(Clekser clekser, CleksToken token);
void Cleks_print_default(CleksToken token);
#define cleks_token_type(id) ((CleksTokenType) (((CleksTokenID)(id)) >> 32))
#define cleks_token_type_name(type) (CleksTokenTypeNames[(CleksTokenID)(type)])
#define cleks_token_index(id) (CleksTokenIndex)((id) & 0xFFFFFFFF)
#define cleks_token_id(type, index) ((CleksTokenID) ((CleksTokenID) (type) << 32) | ((CleksTokenIndex)(index)))
#define cleks_token_value(token) (token).start
#define cleks_token_value_length(token) ((token).end - (token).start)
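The last two macros expose a token's text as a pointer and a length rather than as a NUL-terminated string, so the text has to be copied before it can be used with the usual string functions. The sketch below shows one way to do that. It assumes that end points one past the last character of the token, which the length macro suggests but this README does not state. `DemoTokenSpan` is a hypothetical stand-in holding only the two pointer fields of CleksToken, and `demo_copy_token` is a made-up helper, not the implementation of Cleks_extract.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with only the pointer fields of CleksToken. */
typedef struct{
    char *start;
    char *end;   /* assumed: one past the last character */
} DemoTokenSpan;

/* Copied from the macro reference above. */
#define cleks_token_value(token) (token).start
#define cleks_token_value_length(token) ((token).end - (token).start)

/* Copies the token text into `out` and NUL-terminates it.
   Returns false when `out` cannot hold the text plus the terminator. */
bool demo_copy_token(DemoTokenSpan token, char *out, size_t out_size)
{
    size_t length = (size_t) cleks_token_value_length(token);
    if (length + 1 > out_size) return false;
    memcpy(out, cleks_token_value(token), length);
    out[length] = '\0';
    return true;
}
```

Since the length is known, the text can also be printed without copying by passing it as the precision of a `%.*s` conversion, cast to int.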