todo: typecheck assignment statement and store left side as index to variable
allow assignment to declare a new variable (type inferred) if no matching name is provided as external variable
this internally declared variable will act as a temporary that is in temp storage, so must be flagged as internal so that we can reallocate each time script is run
allow left side of assignment to be an expression
this will require some concept of L-values
how to parse other types of statements?
begin parsing as an expression; then if we hit a =, we know that what we just parsed needs to be an L-value
this means parse_expression should flag each node, or pass some value back up, saying whether it can be treated as an L-value
variables, and some operations (indexing is considered an operation), can be L-values
if instead we just hit a ;, then the expression needs to be a procedure_call, since that is the only expression that can also be a statement on its own
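a minimal python sketch of this parse-then-decide scheme (the dict-based nodes, the is_lvalue flag, and the token shapes are all hypothetical stand-ins, not the real AST):

```python
# Hypothetical sketch: parse a statement by first parsing an expression,
# then deciding from the next token whether it is an assignment (lhs must be
# an lvalue) or a standalone statement (must be a procedure call).

def parse_expression(tokens, pos):
    # trivial stand-in parser: an identifier, optionally indexed or called
    name = tokens[pos]; pos += 1
    node = {"kind": "variable", "name": name, "is_lvalue": True}
    while pos < len(tokens) and tokens[pos] in ("[", "("):
        if tokens[pos] == "[":
            index, pos = parse_expression(tokens, pos + 1)
            assert tokens[pos] == "]"; pos += 1
            # indexing preserves lvalue-ness
            node = {"kind": "index", "base": node, "index": index, "is_lvalue": True}
        else:
            arg, pos = parse_expression(tokens, pos + 1)
            assert tokens[pos] == ")"; pos += 1
            # a call result is not assignable
            node = {"kind": "call", "proc": node, "args": [arg], "is_lvalue": False}
    return node, pos

def parse_statement(tokens):
    expr, pos = parse_expression(tokens, 0)
    if pos < len(tokens) and tokens[pos] == "=":
        if not expr["is_lvalue"]:
            raise SyntaxError("left side of assignment is not an lvalue")
        rhs, pos = parse_expression(tokens, pos + 1)
        return {"kind": "assign", "lhs": expr, "rhs": rhs}
    # expression statement: only a procedure call may stand alone
    if expr["kind"] != "call":
        raise SyntaxError("only procedure calls can stand alone as statements")
    return expr
```

so `x[i] = y` parses as an assignment with an index lvalue, `f(x)` is a valid statement, and `f(x) = y` is rejected.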
for indexing operator, we may run into some trouble bc of our current typechecking model...
I wish I had just a bit more experience so I would not be short-sighted! but alas, you don't know what you don't know
we can't know the return type of the indexing operator (or which one to apply, i.e. [] vs *[]) until we know whether we are being used in a context where it needs to be an lvalue
perhaps we can just assume *[] and then simply deref in the case that we need the value itself?
but then in any case we will have to fix up the ast after the initial parse...
indexing operator can probably be its own ast node type for now; that way we can just determine if something is an lvalue based on whether it is of type variable or indexer
since we hold all variables by reference at all times, any variable can be an lvalue
we will want to be able to have overloads on the indexing operator, for matrix types or something; this will probably have its own overload set
'subexpressions' rather than internal variable declarations
$ before a 'variable' name means we are declaring a name for a common subexpression that may be used in later expressions/statements
while the usage within the script is essentially the same as a local variable would be, the distinction of calling it a common subexpression has a purpose in the AST analysis functions that I plan to add
when we search for certain patterns or forms of expressions, it aids our analysis if the node is marked as a reference to a subexpression rather than simply as a variable value
that way we can avoid duplicating work by pulling out subexpressions, but still recognize that those expressions are common to multiple larger expressions
$main_cycle = time % 5;
$main_cycle_pi = main_cycle * PI;
tilemaps[1].offset = Vec2.{ 3 * cos(main_cycle), 5 * sin(main_cycle) };
if left of assignment is to a tilemap's position member analyze expressions for vec x and y components
we need to have some simple but versatile syntax for matching expressions and extracting them
match expression form; extract expressions; extract literals
when extracting some expression, we want to be able to assert in the format string that it is of the correct form
identify certain forms of expressions with # builtins: #num #op #var #proc #expr
will probably allow the user to define more of these?
identify exact expressions by name; this includes subexpressions declared with $ inside the script
easier way to do certain things for certain subexpressions is just to parameterize jai procedures; then you can just trivially check an expression ast for the procedure call in question and extract the parameter values
$main_cycle = time % 5;
tilemaps[1].offset = ellipse(main_cycle, 1, 5);
tilemaps[2].offset = figure_eight(main_cycle, 3, 5);
in above example, we could just pull out values from parameters and use to generate music elements
Named_Expression :: struct { name: string; expression: AST.Expression; };
get_expr_of_form :: (form: string) -> ([] int, bool) {
    // extracts expressions based on a format string
    // format string can use $syntax to denote subexpressions to extract
}
// here, I guess type matching on $1 and $2 is determined by provided 'extract' args
// but, it would be useful to be able to get other subexpressions out rather than just values
extracted, match := get_expr_of_form(script, "$scalar(0) * sin(time % $time_scale(0))");
if match {
scalar_expr := get_name_expr(extracted, "scalar");
if script.nodes[scalar_expr.root].type == .NUMBER
}
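to make the form-matching idea concrete, here is a minimal python sketch of matching an expression tree against a form containing $name placeholders, as in "$scalar * sin(time % $time_scale)" (the tuple-based AST encoding is a made-up stand-in for illustration):

```python
# Match an expression AST against a form AST; ("$", name) in the form marks a
# subexpression to extract into `out`. Literals/identifiers must match exactly.

def match_form(expr, form, out):
    if isinstance(form, tuple) and form[0] == "$":
        out[form[1]] = expr   # capture the whole subexpression under this name
        return True
    if isinstance(expr, tuple) and isinstance(form, tuple):
        return (len(expr) == len(form)
                and all(match_form(e, f, out) for e, f in zip(expr, form)))
    return expr == form

# expr: 3 * sin(time % 5), encoded as nested tuples
expr = ("*", 3, ("call", "sin", ("%", "time", 5)))
form = ("*", ("$", "scalar"), ("call", "sin", ("%", "time", ("$", "time_scale"))))
```

matching `expr` against `form` succeeds and extracts scalar = 3 and time_scale = 5, which is roughly what the `get_expr_of_form` pseudocode above is after.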
get_assignments_variable(name: string)
get_assignments_to_type(type: *Type_Info)
get_assignments_to_struct_member(base_type: *Type_Info, member: string)
In regards to usage in rosebud:
the general idea for level scripts is that they will be one script for the entire level like a better version of the gon files I used for defining movement components in OE
these scripts should all be one file so that the designer can get the overview of all components and how they relate in a single place
because there may be some "local variables" that we want to declare in each level/script, I would like to add an #init section at the top of the script that will only be run once
alternatively, I could just add if statements and have that be managed separately, but I sort of want to keep the scripts as basic as possible
then again, if statements could also be very useful for things like switches and timers
I am really not sure about this though, because the levels should also be basically discretely evaluated, and having any real state in a script breaks this
but maybe that is thinking too specifically about my use case for that one game
another idea to document here before I forget: attach callbacks to nodes to run before/after the node is evaluated
this could be useful for the music system so that it can get live input as the values of certain expressions change
e.g., we could alter the pitch/pan of some sound channel as a tile layer moves through its cycle
lots of the musical elements of lead sheets would be good to first try out in a more general visualizer program, based just around making movement components and listening to them interact
if and for statements will require a hierarchical structure to statements
we can probably get by with keeping a flat array of statements and changing the method of iteration (e.g. the C gon parser)
but if we do this, we should try to make it easy to refactor later on if need be
so probably use an interface proc to add sub-statements to a block-type statement
will require creation of block scopes of different types
I don't want to have scoping on declarations (only thing to declare right now are subexpressions)
though we may also want to have variables that are parameters to the script, declared within the script itself, but that can be modified outside the script after it's compiled
we will have generated imgui menus to display and modify these sorts of variables
different scope type for struct literals, declaration scope
arena allocator for all script data
we need to allocate contiguous space for Anys, or at least pointers to all arguments to a procedure call
but if we use an arena allocator, it won't work very nicely to use a resizable array
but we also cannot get the argument count before actually parsing each argument expression
maybe we could do a sort of temporary linked-list thing where we store the index of the next argument's node on the previous argument's node
this would be pretty low cost, and we could then allocate space for all argument Anys after we know the count
procedure call arguments
arg nodes connected in linked list sort of way during parsing
for typechecking, we will either need to create a temp array view for arg types/anys
or we could write our own typechecking routine which just iterates args in the linked-list way
for the actual execution, we do want all args in contiguous memory though
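the linked-argument idea above can be sketched in a few lines of python (node layout and field names like next_arg are assumptions; a real version would carve the final array out of the arena):

```python
# During parsing each argument node records the index of the next argument.
# After parsing we walk the chain once to count, then copy the args into one
# contiguous array (standing in for a single arena allocation).

nodes = []

def parse_call_args(arg_values):
    # simulate parsing: each arg becomes a node linked to the next by index
    first = prev = -1
    for v in arg_values:
        idx = len(nodes)
        nodes.append({"value": v, "next_arg": -1})
        if prev == -1:
            first = idx
        else:
            nodes[prev]["next_arg"] = idx
        prev = idx
    return first

def collect_args(first):
    # first pass: count; second pass: contiguous copy
    count, i = 0, first
    while i != -1:
        count += 1
        i = nodes[i]["next_arg"]
    out = [None] * count
    i, k = first, 0
    while i != -1:
        out[k] = nodes[i]["value"]
        i = nodes[i]["next_arg"]
        k += 1
    return out
```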
what things to allocate and where
numeric constants don't need to be allocated, since value is stored directly on node
if we later go to a bytecode, these will need to be copied
all other literals will go in a section of 'constant storage' during the first pass of typechecking
as we typecheck the script, keep track of how much stack space we will need for intermediate values, and where each node's intermediate value will be on the stack
after typechecking, allocate the exact amount of stack space needed, plus room for some sentinel bytes after the stack end, so that we can make sure our logic is correct
problem: we don't know how much stack space we will need on the first pass of typechecking, because we don't know for certain the type of the parent node until the child is checked
so, we will have to just make a second pass for now, I guess
how to layout arguments for proc calls...
ideally, these would all be contiguous, then we could use only one pointer to point directly to values of arguments
then when we do dyncall we just bump this pointer as we push args
so for each argument, which needs to do its own work on stack, it needs to know where to place return value
for each node, we get one offset into stack
at bottom of stack we are to place the return value
then above that we may have other data that node needs in order to complete its operations
for operators, this will be its two operands (by value)
for procedures, this will be the arguments (also by value)
doing it by value here should be fine, since in this case we will want an rvalue
`[ constants | stack space ... sentinel bytes ]`
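a rough python sketch of the offset-assignment pass described above (uniform 8-byte slots, result at the node's own offset, operands in the slots just above; the dict-node shape is hypothetical):

```python
# Second typechecking pass: walk the tree, hand each node a stack offset for
# its result, and return the high-water mark so we can allocate exactly that
# much stack, plus sentinel bytes after the end.

SLOT = 8

def assign_offsets(node, base):
    # this node's result (return value) goes at `base`; its operands are
    # evaluated by value into the slots just above it, left to right
    node["offset"] = base
    high = base + SLOT
    for i, child in enumerate(node.get("args", [])):
        # a child's scratch space may overlap a later sibling's slot, which
        # is fine given strict left-to-right evaluation order
        high = max(high, assign_offsets(child, base + SLOT * (1 + i)))
    return high  # stack bytes needed for this subtree

add = {"args": [{"args": []}, {"args": []}]}       # e.g. a binary op on two leaves
need = assign_offsets(add, 0)
stack = bytearray(need) + bytearray(b"\xAB" * 8)   # sentinel bytes after stack end
```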
flat pool usage
since we are using a large flat pool for much of script data, we can probably just allocate all nodes in here as well and then use pointers between nodes
this will make some sections of code much nicer than using indices, as we have been doing until now
(side note: this is probably also something we should do for GON parser as well)
But there are still many things that I would like to allocate contiguously, which means we need to think about steps
`[ nodes | constants | stack space ... sentinel bytes ]`
can we get all nodes allocated before doing constants? probably.
nodes are created during parse/ast construction
constants and variables are allocated during typechecking first pass
stack space calculation done during typecheck second pass
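the flat pool itself is just a bump allocator carved up in phase order; a minimal sketch (sizes and phase counts here are made up):

```python
# Bump allocator over one contiguous buffer: nodes, then constants/variables,
# then stack space are allocated in phase order, so each section is contiguous.

class FlatPool:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.top = 0
    def alloc(self, n, align=8):
        # round up to alignment, then bump
        self.top = (self.top + align - 1) & ~(align - 1)
        off = self.top
        self.top += n
        assert self.top <= len(self.buf), "pool exhausted"
        return off  # offset into buf; stable for the pool's lifetime

pool = FlatPool(1024)
nodes_off  = pool.alloc(16 * 10)  # phase 1: all AST nodes
consts_off = pool.alloc(64)       # phase 2: constants and variables
stack_off  = pool.alloc(128)      # phase 3: stack space
```

since offsets (or pointers into the buffer) stay stable, nodes can point at each other directly, which is the nicety mentioned above.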
now that we have cleaned up the parser a bit
would be useful to have a mode of expression parsing that does not even go to an ast
would have to resolve identifiers on the spot; could do that in a callback
would have to lean heavily on temp storage or a pool to allocate intermediate values
would have to do all type checking entirely dynamically
this would be very handy for the gon expression parsing though, since then we can just use a callback that identifies and evaluates node references
problem though: we can't know what all nodes are referenced (in order to check for circular dependencies) until after we parse out expressions
first order of business should be just adding indexers
indexers will encompass both struct member access and array indexing because both of these can return valid lvalues, and are essentially the same operations of offsetting a pointer by some amount
how that pointer is offset and where that base pointer is located depends on the base type
for a struct, it just means offsetting from the base value_pointer
for an array, it depends further on the array type
lvalue vs rvalue
for an lvalue, the indexer will result in a different type coming out than going in
we could optimize out indexers on constants / literals, but we will need to keep them on variables, since the base pointer of a variable is liable to change
we will do arrays after struct members, bc arrays will bring all kinds of complications
for example, will we allow appending to arrays?
probably, yes
but only dynamic arrays
will we allow access to array members as if we are accessing Array_View_64
almost certainly not
how will we handle user passing arrays to jai procs?
if we want to be able to pass e.g. a resizable array to an array_add in script
then we will have to be able to treat the any as an lvalue
also would have to propagate up through the typechecking that we need an lvalue,
but that is not something we do!
not only that, but we actually can't even handle polymorphic procedures as script procs
so we would have to have a builtin proc for array add
then maybe we could do the whole thing of propagating up that we need an lvalue
this will require changes to dyncall, maybe?
nah, just alterations to our custom typechecking
we will actually have to entirely rewrite the typechecking to take the ast nodes instead of just Anys or *Type_Infos
but that was needed anyhow
in order to be able to do member accesses on structs properly, we need to stop pushing structs themselves onto the stack as values
instead, store struct values in the constant section of the script, before the stack space
then only push and pop pointers to structs on the script stack
when we do a member access, we will pop a struct pointer, and place back down either a register-sized primitive value or a pointer to a different struct
[ nodes | variables and constants (values) | stack space ... sentinel bytes ]
need somewhere to place an actual array of Anys: the metadata for variables and constants in the script
could do another pass of ast to allocate after we have all values down
nodes : [] AST.Node;
struct/array values : [] u8;
stack : [] u8;
externals : [] Script_Variable;
locals : [] Local_Variable;
Maybe we should have a value union on node for all register-size values?
also, maybe we should just make push/pop on the stack work explicitly only for register-size values
that way we can just be sure that alignment of values is not sucky
we probably also want to do some alignment on values allocated in the local variables segment
this would eliminate the need for the get_stack_required proc, since all items would use exactly 8 bytes of stack
if everything is register size, our binary op proc also gets simpler
one issue though, is that for binops and procs we either have to push empty space, or we have to push a pointer to the dst
and the pointer given to the binary op execution proc must be either the stack dst or the value dst
we can't do a pop-then-push thing for a binary op or proc call, since there's no way around providing a ptr to the return val in the struct case
this complication makes it so that we can simply use a push
if all items on the stack are the same size, then we easily can know how far to set back the stack pointer
for the proc call case: arg_count + 1 (for the return value)
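a tiny python model of this uniform-slot discipline (slot list standing in for the byte stack; the reserved-result-slot convention is an assumption from the notes above):

```python
# Every push/pop moves exactly one register-sized slot. A call's result slot
# is reserved below its args, i.e. at sp - (arg_count + 1), so after the call
# we only need to drop arg_count slots and the result is already in place.

class SlotStack:
    def __init__(self):
        self.slots = []
    def push(self, v=0):
        self.slots.append(v)
    def call(self, proc, arg_count):
        # layout before the call: [..., result_slot, arg0, ..., argN-1]
        args = self.slots[len(self.slots) - arg_count:]
        result_index = len(self.slots) - arg_count - 1
        self.slots[result_index] = proc(*args)
        del self.slots[result_index + 1:]  # sp -= arg_count

s = SlotStack()
s.push()     # reserve the result slot first
s.push(3)
s.push(4)
s.call(lambda a, b: a + b, 2)
```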
Unfortunately, it's not completely trivial to desugar overloaded operators directly into procedure calls
we should be able to do this no problem once we are in bytecode, but for executing from the AST it creates problems
not that we couldn't do it, but it seems like it would create some unnecessary complexity
...unless we just decide to put operator overloads into the same array as script procedures
maybe this is the way to go, since it would allow us to make this transformation in the typechecking step
and this would allow us to simplify code in several places
also, since we will need to change how procedures are typechecked anyways, this may be an even bigger help to simplify in the first place
typechecking on procedure arguments should look at the nodes themselves for the arguments, so that we can coerce/cast values in a similar way to what jai does
this is not urgent by any means, but it would be quite useful
TODO: better typechecking on procedure arguments
will require type casting
remove need for the variables and procedures resizable arrays in the script struct
move to constructor; allocate a fixed array view for these (only those variables/procedures that actually get used)
add additional flags for variables (internal/external, constant, intermediate, subexpression value)
add subexpression declarations
procedures as variables
currently, parsing and typechecking of procedures is completely separate from variables
but variables could be procedures, and an expression can evaluate to a procedure pointer that maybe we want to be able to call
iterators are very much hardcoded right now
also, for loops only work on arrays, not ranges
we will probably want to be able to do for loops on numeric ranges, or at least slice arrays and iterate the resulting slice
when resolving an identifier for a procedure, we should first search all variables for a match
failing that, we search the procedures array, find all candidate procedures, and then resolve the best overload
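the resolution order can be sketched directly (the data shapes and the exact-match "overload resolution" are placeholders; the real thing would score coercions):

```python
# Resolve a call like f(x): variables take priority (the name might hold a
# procedure pointer); otherwise gather overload candidates from the
# procedures array and pick the best match (here: exact arg-type match).

def resolve_call(name, arg_types, variables, procedures):
    if name in variables:
        return ("variable", variables[name])
    candidates = [p for p in procedures if p["name"] == name]
    for p in candidates:
        if p["arg_types"] == arg_types:
            return ("procedure", p)
    return None  # no match among candidates

procs = [
    {"name": "sin", "arg_types": ["float"]},
    {"name": "sin", "arg_types": ["vec2"]},
]
```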
make management of scopes better
resolve declarations within nested scopes
node needs pointer to prev and parent in addition to next
IDEAS FOR LATER
named blocks that can be called like functions, but don't take arguments
these named blocks can then have new statements compiled and added to them dynamically
this would be useful for some application(s), maybe
Problem: we have two conflicting design issues to deal with
resolving identifiers
identifier node contains resolved declaration details
this was much nicer than having to point to a separate node for the resolved declaration, because that would require us to make nodes for external variables and procedures
but now this is a problem for iterators, because we need a flag that notifies a node that its variable_ptr has an additional indirection on it
this is because the iterator needs to update a pointer to the `it` value
but because there's also no real concept of pointers in the scripts, indirections have to basically be handled automatically
and this extra indirection is a problem basically
i am not explaining this well
in any case, this would not be an actual problem if we instead have a pointer to a single resolved declaration
maybe we can still get the best of both worlds though by just making a Node_Declaration again which can be for any of the resolved types that are currently on the Node_Identifier
TODO:
use a linked list for things that actually need to have next/prev nodes, instead of just having that on every node type
probably make it so that local variables get pushed on scope entry and popped on scope exit
this may require some refactoring like collecting all declarations in a scope and putting them in one spot
but this will be needed if we do the callable blocks thing, or ever do proper functions
add lexer callback or some means to insert declarations while parsing
it may be good to actually store the source location on nodes again, so that we can use it to check if an identifier is used before its declaration in a more robust way
actually try to use scripts to make squares move around and change color
declare and call named blocks
not really proper procedures, since they won't take parameters; just subroutines, or a way of splitting up code into sections
for strings, we will probably want to make it so that we can push larger things than single-register values
also, I'm not sure I actually made dyncall work with strings in the first place (lol, lmao even)
need to write more notes and formalize rules about compilation pipeline
can add new declarations of external variables through parsing directives
but cannot add or remove external variables once typechecking begins
parsing basically produces the full ast
we probably should not need to be creating new nodes during the typechecking phase
maybe some fringe cases for things like iterators or other such declarations that need to get automatically inserted
what if we want to parse things from multiple files and insert them into the same script?
maybe these should be considered separate named blocks?
then how to link and call between blocks?
If I can keep the script syntactically and semantically very close to native Jai code, then it should be a trivial conversion in the case that I ever want to simply convert some of my scripts into real jai code.
Directives:
for now, we will just evaluate all arguments in the parameter list and pass them to the directive procedure straightforwardly
for now, we will only accept a single bool return as a signifier of whether an error has occurred
in future, we should allow a directive to return values
probably only one value though
if directive returns a *Node, we insert that node in the place where the directive was
if directive returns a different type, we will insert that value as a literal
also, in the future we should definitely just pass the raw *Node if that is the argument type that a directive accepts
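the planned splice-in-place semantics can be sketched like so (dict nodes with a "kind" field are a hypothetical stand-in for *Node):

```python
# Evaluate a directive where it appears; if it yields an AST node, splice the
# node in place of the directive, otherwise wrap the value as a literal node.

def expand_directive(statements, i, directive):
    result = directive()
    if isinstance(result, dict) and "kind" in result:  # assume nodes are dicts with a "kind"
        statements[i] = result
    else:
        statements[i] = {"kind": "literal", "value": result}

stmts = [{"kind": "directive"}, {"kind": "call"}]
expand_directive(stmts, 0, lambda: 42)   # plain value -> literal node
```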
Going to probably branch further off of gon and overhaul the syntax, focus more on the lead sheets integration and probably remove weird inconsistencies between how field refs and code expressions work
but the first step is changing some things here in lead sheets
particularly, I want to stop passing around the script to everything and make parsing/typechecking/execution more individually usable
disentangle execution/evaluation from allocation in the pool; make it easier to swap the allocator used for nodes
we will always have to allocate nodes, since the grammar is too complex to do everything purely on the stack plus we already need temp storage for intermediates, so may as well just allocate the nodes there and make life easier
first, write an evaluate_node proc
thought this would require factoring all the nodes out into their own typecheck routines
but the thing is, it's just too damn recursive
we can't really use the existing typechecking proc, or factor it in any way that allows us to use it piecemeal in the way we would need to in order to implement an evaluate_node proc
and besides, I suppose it only makes sense that typechecking is all done at the same time, rather than interleaved with execution or evaluation
so what we really want is just an execute_temp proc that uses temp storage for the results instead of the stack
test this with existing framework
second, figure out how to nicely swap the node allocator in the script
test run; switch to using evaluate and temp storage for directives, and just throw away the nodes afterwards
script state used during... parsing directives
pool / allocator
can probably just push this as context.allocator
but need to be careful about that...
also a bit tricky maybe to disentangle this from the stack that we allocate in the pool before the nodes?
lexer
current_scope
this can be passed as parameter, but this is cumbersome
both typechecking and execution need access to the variables and procedures arrays
for procedures, we really could just store the fn ptr, but for variables we have to be able to sub them out, so there will always be a need for an additional indirection
the biggest thing preventing a totally general 'evaluate' or 'interpret' proc, one that just takes a string of text input as an expression and then evaluates it, is that we have tied our allocation and execution together (due to using the stack)
we could certainly just set our script's allocator to temp in order to interpret throwaway expressions, but then we still need the stack to execute those
so, we should also have the option to use temp storage for our intermediate values
and maybe we can even clean up after ourselves with the whole set_watermark thing, assuming that's still in the language
we do want to put a ref to the token on the node after all, if we do end up wanting source info for the node
this will keep our nodes slimmer, and the tokens are honestly not too much to hold on to
this would also mean we could just tokenize the entire script at once if we wanted to, which would give better locality to both our tokens and nodes
likewise for bytecode instructions in the future, we should hold on to a pointer to the node (only in debug) so that we can, well, debug.
regardless of whether we want to keep nodes or throw them away, we should run all expressions through the same basic pipeline of parse/typecheck/eval
evaluate doing its own typecheck may actually work out
and maybe it really is better to do everything in a single traversal of the AST
because we basically have either leaf nodes (literals, identifiers) which don't call into recursive typechecking and may benefit from special coercions
then there are just structural nodes like block, if, for, while
and then misc like cast, subscript, dot
the real tricky part is primarily gonna be the big messy nodes like operation and procedure
but maybe we can break these up in a nicer way?
random: maybe we don't do declarations in eval?
malleable literals don't make sense in eval
neither do directives, really. but that's harmless I guess
tbh, maybe even assignments don't make sense in an eval? nvm, they can make sense if the lvalue is an external variable
thinking about use of a script as a context for evaluating scripts for a builtin console
we could use the stack for executing statements / expressions on if we really wanted to, but eval + temp storage works just as well for that
a more interesting idea is being able to play with declarations and scope in the console/repl
the user could get and store variables in the console's script context and then use those later
and if we wanted to just scope some stuff for a short while, we could have some directive to push/pop a scope as a little throwaway workspace
or, we could even save the current scope/workspace and its variables to a file, since we can serialize our scripts pretty well
if we add named blocks, you could also write and run blocks by name
#block(block_name)
do_stuff(a, b, c)
do_stuff(x, y, z)
#end_block()
#run_block(block_name)
named blocks will essentially be like functions in the script
they won't be run like everything else in the global scope if you just call execute_script()
they can be called by name with #run(block_name)
more like subroutines than functions, and they maintain their own state just like things in the global scope can
don't take parameters or return values (just write a proc in jai and call that)
I could write an init script for my game that would restore the state of the current level and such, or set certain conditions; just simple things like:
debug_init: {
load_level("platform.lvl");
set_player_powerup_state(.HAMMER);
debug_rendering = .PLAYER_VECTORS | .TILEMAP_COLLISION;
}
then any time i build in debug I just call the debug_init block on startup
default procedure parameters
because type info for procedures doesn't contain info about default parameter values, we end up having to be pretty explicit in the scripts
though tbh I've never had a procedure take more than like 2 parameters yet, though I'm sure I will have that happen with some of the editor ui handles stuff coming up
so one thought is to give the user the option for certain procedures to get default values for stuff that we would like to just be contextual to the script
we can already insert a sort of explicit context for the script in the form of external variables and external procedures, but this is more like a contextual variable that gets filled in automatically for a specific procedure
idk.. I tried to write out an example and couldn't even think of a good contrived one
so this felt like a good idea for 1 minute and now I'm not sure if it would ever be useful tbh
TODO: need to slim down our base node a bit if possible
parent (scope) can probably become implicit, being passed as a param in parsing and typechecking procedures
then we only need to store the parent block on any other nodes which introduce a scope
which we should probably factor out to a common node type like Code_Scope, and simplify resolution of declarations
or maybe we just do that with a procedure?
maybe we don't need source location on the base node, especially if we hold on to source tokens and link back to those
for the purposes of whitespace/comments preservation, those tokens will probably use a Source_Code_Range instead of a location
but, we do want to be able to attach a source code location to some jai code call site for nodes that are added to an AST programmatically or from user input
and we will need a flag on the node to signify that the node was inserted outside the context of parsing a larger file
also, storing the next node is still somewhat questionable. It would be better if we had a real array of *Node, or maybe a real doubly-linked list structure for nodes
the main issue is that we want to be able to replace nodes more easily, and the place where that is hard to do is in lists of nodes
because when we are iterating over the list we can't easily get a **Node, which is what we need in order to replace
step 1
remove parent *Node from base Node, move to a Node_Scope? fix up identifier resolution
OR don't remove parent
use a proper doubly linked list for connecting nodes
make it easy to replace nodes
step 2
remove source location
lex all tokens into one big array
save tokens and refer back to them from nodes
constants expressions, constant declarations, and malleable literals
these three are all sort of interacting in a not-ideal way at the moment
one thing I would like to be able to do with constant declarations is change them so that they're really just AST node references
but we mark these 'constant' declarations with the same flag that we use for saying that an expression is constant
and really, we'd like to be able to use this AST reference semantics with non-constant expressions
this would especially be useful in the context of a debug console, since we could basically use those 'constant' declarations' identifiers like macros for larger expressions
for example Entities :: get_active_entities(get_active_level());
maybe the solution is just to make a .MACRO flag that is used for declarations instead of the CONSTANT flag
that way the two concepts are distinct
then maybe we can even do some parameterized macros as a replacement for having procedure declarations in scripts
I really don't want to have procedures if you can't tell
for simple expressions that you may want to parameterize, we have macros
then for organizational purposes we have named blocks, which don't take parameters but can have their own internal state through external variables or malleable literals
for constant expressions then, we will be able to pre-evaluate those from the root-most node and simplify them down into single literals
this will be a good step to run before lowering to bytecode
another consideration with constants is whether we even want to consider them at all in evaluation context
for example, it is kind of neat to be able to just use a non-constant type expression in declaring a variable in the debug console
this will probably just be some context flag in the script at a later point when we make certain aspects more configurable
along with things like using remap_data for automatic coercions, making LS feel a bit more dynamically-typed while still not actually being so
though we could also add some directive that would re-type a node
ok, so the problem with the macros right now is that we don't yet have a deep copy for ast nodes
which we will need, since we actually need to typecheck different instantiations of the macro individually
we probably want to change the way macros are parsed so that they can't get a type expression in the same way as normal declarations
we want the macros to be a bit more polymorphic i guess, so that means
but maybe copying nodes in the macro way actually breaks the ability to name malleable literals
since we would just be duplicating the literal node
and we actually want to point to the same literal node, since that's what has the underlying value we care about...
so im not sure if we can have these malleable literals and also macros that actually copy nodes
we could make the macros simply be references to the nodes in question, but then they are not polymorphic, just shorthands
i guess right now I don't care to make the macros polymorphic and only want them for the shorthand aspect...
BUT the problem is we really NEED to retypecheck on macro instantiation, because the macro may be used as either an lvalue or rvalue
and this is really where the main issue lies...
so the only option left is the hard option, which is to do some fancy stuff when we clone nodes for a macro
for the most part, we can probably just clone nodes as-is, but at least for malleable literals, we need to make sure that all instances share the same value_pointer and value_type
which also means that we place some limits on how polymorphic a macro containing a malleable literal can be, since the malleable literal's type needs to stay the same across all instantiations
that leaves a tiny bit of wiggle room, but not much
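a rough sketch of what that malleable-literal-aware clone might look like (all names here — copy_node_shallow, get_children, is_malleable — are assumptions, not existing code):

```jai
// hypothetical sketch: deep copy for macro instantiation where malleable
// literals alias the original's storage instead of being duplicated
clone_for_macro :: (node: *Node, pool: *Pool) -> *Node {
    copy := copy_node_shallow(node, pool);  // assumed helper: allocate + memcpy the node
    if node.node_type == .LITERAL {
        literal      := cast(*Node_Literal) node;
        literal_copy := cast(*Node_Literal) copy;
        if literal.is_malleable {
            // every instantiation must share the same underlying value,
            // so copy the pointer and type rather than the storage itself
            literal_copy.value_pointer = literal.value_pointer;
            literal_copy.value_type    = literal.value_type;
        }
    }
    for * child: get_children(copy)  child.* = clone_for_macro(child.*, pool);
    return copy;
}
```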
the other question is what to do about type inference on malleable literals? I guess we just have to be explicit about types there, or maybe the type gets dictated based on the
yeah, if we want macros to be polymorphic at all, then we need to take out the type expression, and malleable literals used therein need to be explicitly typed (or default to float or w/e)
due to all the complexity that this sounds like, I will probably just leave the macro stuff for a later date, after I make some more real progress on the game
it would be a good time to make sure everything still works properly, fix serialization, and maybe even do the whitespace stuff first before adding such complex features
once we do get the macro thing working though, we can definitely use that to make the foreach more interesting
directives improvements: we really want to just use evaluate_directive and not execute_directive
but perhaps better than that, we just use the # character to denote that the following expression or statement should be run immediately
sort of like a #run in jai
but with the added semantics that if the return value is a `*Node`, then that node gets appended in place of the 'directive'
then our directives can just be arbitrary expressions or chains of expressions that transform nodes, and not just procedure calls
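a minimal sketch of what that immediate-evaluation path could look like (parse_expression, evaluate_directive's exact signature, and the splice-in behavior are all assumptions here):

```jai
// hypothetical sketch: a '#' expression gets parsed, evaluated immediately,
// and replaced by its result if that result is a node
parse_hash_expression :: (script: *Script) -> *Node {
    expr           := parse_expression(script);
    ok, result_any := evaluate_directive(script, expr);  // assumed to return ok + an Any
    if !ok  return null;
    // if the expression evaluated to a *Node, splice that node in place of the directive
    if result_any.type == type_info(*Node)
        return (cast(**Node) result_any.value_pointer).*;
    return null;  // otherwise the directive produces nothing at this spot
}
```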
refactoring lexer to allow pre-tokenization of entire file will help with attaching token reference to ast nodes, while keeping some semblance of cache locality between nodes
we actually have a major problem here, which is that this method of attaching tokens to nodes by pointer requires that our get_token and peek_token procs now return tokens by pointer rather than by value
which means that all our tokens need to be heap-allocated... well maybe not, since the tokens at least have valid storage until consumed, at which point they are replaced...
we could make the token buffer a bit larger and shorten the max lookahead so that the tokens are at least valid until the next token is consumed
but this is still a bit janky I think, since now we need to manually dereference the tokens in certain situations and AAHHHH
but all this complication is just because we are trying to allow both the current model for lexing and the new pre-lexing model
the pre-lexing thing also has an issue in that it requires the use of a dynamic array which will certainly realloc and waste a bunch of space either in our pool or in the temp allocator
So it seems the best solution right now will just be to allocate space for the tokens we need at the time when we attach the token to the nodes
so we are gonna defer working on the pre-tokenization stuff for now and just store a single source token for the sake of the source location information.
also, at that time it may not be a bad idea to also attach trivia from line comments to the preceding token if it is on the same line e.g.
x := 5; // some line comment — would have the comment attached to the semicolon (or declaration?) rather than to whatever token comes next
get source location info on nodes and print that information in debug logging
we need to just beef up logging anyhow, and maybe write some helper procs to capture the common logging patterns in each section of code
then finally, try to get trivia (whitespace/comments) working nicely in serialization. unfortunately cannot do this all trivially since some nodes require more or variable numbers of tokens...
would be nice to not have to store allocator on script, but I don't want to make this change until I am confident that all allocations the script does from some given entry point can be done using a single allocator
for now, it helps to know that we can have a context allocator and node/token allocator that are distinct
in the future we should probably not have to do things this way, as the need for multiple allocators seems like a design flaw, or at least indicative of things being overcomplicated
when we get back to developing that whole feature about identifier renaming, go back and make that work with declarations as well
already beefed up logging in parsing step, now need to do the same for typechecking and execution
need to make sure we are properly getting source_token attached to nodes
we should make expect_token_type better and log an error message when type is not as expected
need to provide source file path when parsing from a source file, else provide source location of Jai file by default
we do now store source location info well and report it in most error logging statements
still probably want to do the thing with expect_token_type though, and also add jai location on manufactured nodes
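for reference, the improved expect_token_type could look something like this (a sketch; format_location and the exact Token fields are assumptions):

```jai
// hypothetical sketch: expect_token_type that logs a proper error instead of failing silently
expect_token_type :: (script: *Script, expected: Token_Type) -> Token, bool {
    token := get_token(script);
    if token.type != expected {
        log_error("% Expected token of type %, but got % ('%').",
                  format_location(token.location), expected, token.type, token.text);
        return token, false;
    }
    return token, true;
}
```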
Result :: struct (T: Type) {
    success: bool;
    union {
        error: string;
        value: T;
    }
}
would like to be able to append to error string, maybe would also like to convert from one result type to another for errors, so we at least need some binary compatibility there
// only for errors, since we have binary compatibility in that case
recast_error :: (result: $T/Result, $R: Type) -> Result(R) {
    assert(result.success == false);
    return .{ success = false, error = result.error };
}
implementing usable result objects will require considerable rewriting, since it basically affects every single procedure call
maybe this will actually just be too clunky to use, but will have to just try it and see
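a quick usage sketch for the Result idea above, just to see how clunky it feels at a call site (checked_divide is a made-up example proc):

```jai
// hypothetical usage sketch for the Result type
checked_divide :: (a: float, b: float) -> Result(float) {
    if b == 0  return .{ success = false, error = "division by zero" };
    return .{ success = true, value = a / b };
}

result := checked_divide(1, 0);
if !result.success  log_error("evaluation failed: %", result.error);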
the main reason I want this is to be able to get better error reporting, or get error strings from evaluation procs
but maybe a better solution is just to set some global or context error string that the user can then manually check
need to make declarations use an identifier node instead of just a name string (It's sort of silly that we don't just do this already, since the node is already allocated anyhow. I guess I was just trying to remove an extra indirection.)
Not sure what to do here yet, will have to refer to my notes from above.
Node replacement is pretty doable now, not too much else to say. Things got nicer in that department once we stopped doing the linked-list thing for nodes and just used [] Node instead.
Maybe that change will come back to bite me in the future if I want to dynamically add nodes to blocks, but for now it seems like a good decision since it just reduces complexity.
I think that the one case where I was really considering doing something of the sort was also just a case where one could externally augment LS to achieve the same result
e.g. maybe we have some user console with repl-y behavior that pushes all executed commands into a temporary buffer, then later allows the user to save that set of commands as a named block.
We made major improvements to directives recently, making them much more powerful and giving them a much more flexible interface. The documentation is good enough in directives.jai that I would just point you to go read that, and maybe reference test.jai for an example usage.
Maybe come back to this later if we want to improve the trivia preservation further...
Basic version is working, but more work is needed in order to improve things. Although, it's already pretty usable in the current state, so I may go ahead and start implementing script modification and re-serialization in my game engine.
Unable to save trivia:
- before parenthesized expressions
- after final statement in block
- after final expression in struct literal
- after type expression in declaration
- after final argument in procedure or directive call
in theory, we could save trivia before and after each node, then prevent appending pre-trivia if it is the same as last post-trivia but that's an extra 16 bytes per node for the extra string
I think I may just accept the minor imperfections and live with it this way long-term
because the alternative is probably doing the thing where we keep all source tokens and map each node to a token range, then overwrite tokens for what has changed on the AST
and I don't want to go down that rabbit hole right now...
Maybe we don't need to store an allocator on the script and can instead just always use the pool / temp storage
I don't think the script ever allocates anything other than nodes and intermediate values
Need to check, but if this is actually the case then I can remove some logic around setting the script allocator, and just push_allocator(get_pool_allocator(*script.pool)); at each main script entry point
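if that turns out to be true, each entry point would reduce to something like this (execute_script / execute_block are stand-in names for whatever the real entry points are):

```jai
// hypothetical sketch of what each entry point would reduce to
execute_script :: (script: *Script) -> bool {
    // all node and intermediate-value allocations go through the script's pool,
    // so no allocator needs to be stored on the script itself
    push_allocator(get_pool_allocator(*script.pool));
    return execute_block(script, script.root_block);
}
```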
Is there really a need for this? We can already do everything we would want to do with #code or #insert using directives...
Or, operators as directives. These operators will need to operate on a given node type rather than on a given value type. For example, in GON, we will use this to implement custom identifier resolution for field references. This will be done with special prefix operators that act on a Node_Identifier at typechecking time, replacing the identifier node with an Any literal.
In general for language design, it would probably not be a good idea to allow operator overload to create new compile-time operators, but for Lead Sheets, it makes sense, since this language is sort of meant more as a collection of simple utilities out of which to construct your own simple language. So extending the language in this way seems very natural, in my opinion. The approach will probably just be to extend the operator table to include directive-like callback functions for manipulating nodes in the same sort of way.
if an operator is compile-time, then it can't have multiple overloads based on type
although, we could have different versions for prefix, postfix, binary
we will probably first want to refactor operators slightly to add postfix operators
rename UNARY to PREFIX, then add POSTFIX as well
operator table defines what operators exist on a syntax level
this is currently constant, but we could make this resizable so that we can expand the operator table at runtime
had some thought the other day about putting operator overloads in their own array, separate from procedure overloads... don't exactly remember that whole train of thought
syntax suggestion:
unary ? as an insert operator for Code Nodes
would also want a similar shorthand for basically #code
or maybe prefix ? means 'nodes of' and postfix .? means 'insert the nodes'
could make macros much easier to write and use
prefix double question mark means get tokens instead of get nodes
can be inserted in the same manner
I've had enough drinks to make an out-there suggestion, but I was thinking the other day that it would be cool to have an operator with similar semantics to pointers for dealing with code nodes. So instead of 'address of' and 'dereference' operators you have 'nodes of' and 'insert' operators respectively. Say for instance we use prefix & for the former and postfix .& for the latter.
Putting parameters on node insertion becomes very clean as well:
code.?(scope=caller_code);
maybe this operator could be used for macro calls as well since these insert code into the caller's scope
some_macro(..macro_arguments).?(..insert_parameters);
If we just want to get Jai tokens, we can use this syntax: `??{ some 6 tokens , * % };`
custom parsing with 'double question mark dot identifier' syntax takes parameter list for called parse procedure and second parameter list for insert parameters
??.parse_proc(..proc_args)(..insert_parameters)
... arbitrary text here ...
parse_proc :: (file: string, args: ..Any) -> (to_insert: Code, remaining: string)
Code node can also represent some array of tokens, so that custom parse proc can return either Jai tokens to be inserted or actual AST nodes
parse proc actually cannot be a macro here, since we have no real AST yet in which the parse proc is being called
this is a purely syntactic construct in which the parse_proc is called as soon as it is parsed, takes over parsing to generate a new stream of tokens or AST, and then returns control to the compiler
But maybe this is a less powerful version of my directives in LS...? also, since the file is not yet parsed, the parse proc obviously has to be resolved as some identifier in a different workspace.
Even if you don't like the idea of having custom parsing in principle, I think it is at least fair to say that this syntax makes it extremely clear when such custom parsing is occurring, so there should at least be no confusion about what is going on
the only potential confusion would be telling where the custom parsing ends, but this should probably be quite apparent, and with the help of editor tooling, syntax highlighting could make it even more obvious
If your editor supports using a dll or something in order to add syntax highlighting, it would be pretty trivial to write your custom parsing procs into a file that gets compiled to a dll for the editor to call, so that your custom parsing proc can also provide the syntax highlighting to the editor
It would honestly probably not be too hard to just add a compiler plugin that implements a demo of this custom parsing
it's really just essentially a pre-processing step
I assume we can just pre-process every new file that gets loaded/imported or added through an #insert
and while that's not ideal since it means probably a lot of duplicated memory, it could be worse
we just scan every file for ??. and do the needful
now, of all the things I'm proposing here, this is the only one which would not actually require direct language support
since as mentioned, it can really be done as a pre-processor step
It's essentially syntax sugar for stuff that can already be done in the language (albeit much more verbosely)
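the pre-processor loop is simple enough to sketch out; note that split_from_left is Jai's string-splitting helper, while run_custom_parse_proc is an assumed helper that would resolve the parse proc in another workspace, call it, and return the generated code plus the leftover source text:

```jai
// hypothetical pre-processor sketch: scan the source for "??." and hand parsing
// over to the named parse proc, splicing its output back into the source
preprocess_custom_parsing :: (source: string) -> string {
    builder: String_Builder;
    remaining := source;
    while true {
        found, before, after := split_from_left(remaining, "??.");
        append(*builder, before);
        if !found  break;
        generated, rest := run_custom_parse_proc(after);  // assumed helper
        append(*builder, generated);
        remaining = rest;
    }
    return builder_to_string(*builder);
}
```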
but for things like actually passing tokens to macros, that would obviously require language-level support
other thoughts:
remap identifiers exported from macro calls using the insert parameter list:
some_macro().?(new_name = old_name)
maybe there is some possible generalization on how for loops allow remapping control statements like break and continue
the simple version is probably just a token-level replacement
which is essentially what the break/continue override is: a simple replacement of a token with a new AST expression
the ultimate form of this would be a generalized AST find/replace mechanism
how exactly to implement that recursive AST expression search is still a question
and I will probably eventually work on that more with lead sheets
because we need some syntax for capturing variables within the expression pattern
much like a regex for the AST
I wonder if I still have my old notes on this somewhere...
I've had this idea floating around in my head for a while, but curious to see what others think.
It's sort of related to the potential macro refactoring that iirc is still under consideration.
Basically the idea is to make handling code nodes more intuitive with #code and #insert operators that are syntactically and semantically similar to * and .* respectively.
Say just for example that we were to use ? and .?.
Then inserting a block of code with some parameters looks like code.?(scope=caller_code);, where the insert parameters are attached like so.
I think this syntax could make it easier to add new parameters on code insertion over time.
Maybe a trivial convenience, but using the ? in place of #code could make calling many macros with Code arguments more compact:
elem := array_find_where(foo, ?{ it.bar == baz });
More importantly, this syntax could provide an intuitive way to attach a second parameter list to a macro call, which makes it clear that some code is being inserted at the callsite:
foo := some_macro().?;
This second parameter list could allow the user to do things like renaming backticked identifiers, which could potentially be useful in a case where one wants to call a macro which exports an identifier in this manner twice in the same block, but wants the exported identifier to be different in each case. Contrived example incoming:
declare_int :: (value: int) #expand {
`number := value;
}
main :: () {
declare_int(3).?(number=x);
declare_int(5).?(number=y);
print("%, %\n", x, y); // "3, 5"
}
I think it would be quite in line with the language's design to make handling code as intuitive as dealing with pointers, and having some syntax similarity
AST regex: could use some such #code expression, then provide a 'such that' clause defining constraints on variables
here, the arrow notation attached to the prior #code expression shows type constraints on the variables
this is sort of backwards to how a procedure declaration is structured, I suppose
exprs := get_expressions_of_form(source_code, ?(x * y) -> (x: int, y: int, x != y));
This whole expression ?(x * y) -> (x: int, y: int, x != y) would have to resolve to some structured type
so that the user can then manipulate it
Iterating on the idea:
make it polymorphic with some type restriction:
?(x * y) -> (x: $T, y: T, x != y, type_info(T).type == .INTEGER)
maybe we don't want to use declaration syntax here actually, only boolean expressions, so we reformulate as
?(x * y) -> (type_of(x) == type_of(y), x != y, type_info(T).type == .INTEGER)
since we are no longer using declaration syntax, maybe we can use dollar as a wildcard in this context
or, we still allow declaration syntax but use it for the purpose of defining tokens that would otherwise be unparseable
e.g., we want to replace * with a node representing an arbitrary operator
well, nvm, I got a better idea there to just use the token 'operator' since it's already a keyword
?(x operator y) -> (type_of(x) == type_of(y), x != y, type_info(T).type == .INTEGER)
but maybe there's another example where we could use $ as some wildcard? maybe meaning wildcard value?
I think any identifiers used in some search expression should have a declared value type
but then also, there's some question about whether the type in the decl should represent the type which the statement evaluates to, or the type of the statement itself
sort of an lvalue vs rvalue thing, but maybe not really
what I mean is, say for example we have some procedure (int) -> int
well then is the type that we consider the expression as (int) -> int or just int?
in the context of looking for some expression like the above, we probably want to consider it just int
because what we mean by the above search expression is to find all instances of operators where the operands are both integers
but then how would we be able to detect the other case?
an attempt: ?( (($) -> x) * y ) -> (...)
this does not work because x here is presumably then a type expression
here we use that wildcard to represent that parameters to proc can be anything
then we catch x as the return type, so that it can be checked against the same constraints
so this search expression would find any case where a procedure returns an integer and the result is multiplied by another integer
crap --> (x: T = ($) -> ) i dunno anymore
the difficulty is that we want to be able to bind certain subexpressions to variables, but those subexpressions may themselves have their own variables which we want to pull out
?(x * y) -> (x: ($T) -> int), y: int)
this search would yield the type info for T in addition to providing the values for x and y
Code_Search_Expression :: struct {
expression_form: *Code_Node;
constraints: [] Expression_Constraint;
}
Expression_Constraint :: struct {
constraint_type: enum { SUBEXPRESSION_TYPE; };
// ... more stuff to figure out later
}
will need some Code_Wildcard type to represent arbitrary subexpressions in search expression
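the core of the search is just recursive unification with wildcard binding; a sketch (Code_Wildcard, nodes_equal, and get_children are assumed names, and Table is Jai's hash table):

```jai
// hypothetical sketch of the recursive search: walk pattern and candidate
// together, binding wildcards to subtrees as we go
match_expression :: (node: *Node, pattern: *Node, bindings: *Table(string, *Node)) -> bool {
    if pattern.node_type == .WILDCARD {
        name := (cast(*Code_Wildcard) pattern).name;
        existing, found := table_find(bindings, name);
        if found  return nodes_equal(existing, node);  // repeated variable must bind the same subtree
        table_add(bindings, name, node);
        return true;
    }
    if node.node_type != pattern.node_type  return false;
    node_children    := get_children(node);     // assumed helper
    pattern_children := get_children(pattern);
    if node_children.count != pattern_children.count  return false;
    for i: 0..node_children.count-1
        if !match_expression(node_children[i], pattern_children[i], bindings)  return false;
    return true;
}
```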
go back to the examples where i was trying to insert an Any returned by a #run as a statically typed variable
I feel like this is something I should be able to do, but I remember it being nearly impossible to do tersely
and there was also some related issue there with pointers not being able to have a null value i think
for typechecking, we want to be able to return an error enum to denote the reason for failure, which should be helpful in certain cases which are currently somewhat ambiguous
we also want to be able to return an error for the sake of user callbacks
multiple returns is nicer from user side though, so we will just adapt with wrapper procs
simplify error messages, don't format with location in error string, just attach location to Error object. if user then prints error later, then format location with error message
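something like the following, as a sketch (Error_Code here stands in for whatever the typechecking error enum ends up being; Source_Code_Location is Jai's built-in location struct):

```jai
// hypothetical sketch: keep the location on the Error and format only on demand
Error :: struct {
    code:     Error_Code;           // enum giving the reason for failure
    message:  string;               // plain message, no location baked into the string
    location: Source_Code_Location;
}

format_error :: (error: Error) -> string {
    return tprint("%:%: %", error.location.fully_pathed_filename, error.location.line_number, error.message);
}
```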
Using dyncall in lead sheets is now optional, and there are both benefits and drawbacks that the user should be aware of when choosing whether or not to use dyncall. By not using dyncall, you lose a little bit of functionality / flexibility, but gain the ability to compile to targets that dyncall does not support.
Calling Jai procedures with dyncall currently only works with the LLVM backend. This is because the calling convention for Jai is not precisely defined, and so it really only works as a sort of hack. Hopefully when Jai gets an official release, it will also have a well-defined calling convention so that I can properly support it going forward.
dyncall does not work at compile-time. There may be some way around this if I could only compile a dyncall dll, but I haven't put in the time to figure that out quite yet. Now of course, there's probably no reasonable use case for running dyncall at compile-time, but I still feel this is a restriction worth mentioning.
dyncall does not work when compiling to web assembly. If you want to target wasm, then you will have to disable use of dyncall. There are also other platforms that dyncall does not support, which you can just check on their website. It could be possible to simulate dyncall-like functionality on the web by doing something crazy like generating JavaScript that calls back into Jai, but that is not very practical, nor is it something I want to figure out at the moment.
without dyncall, function pointers are mostly unusable inside of scripts. The user does have the option to manually register procedure types so that wrappers can be generated; however, this obviously requires that the user know the types of the function pointers that may be used in a script ahead of time.
without dyncall, the wrapper procedures which are generated to marshal arguments and return values will probably add a little bit of code bloat, though it should not be too severe.
TODO: replace #procedure_of_call with #bake_arguments, add type as separate parameter like before
check what the function signature looks like after baking
maybe we can remove the need for the cast when storing proc ptr to procedure_wrappers?
make c call procedures work
in dyncall
replace MAKE_C_CALL with something more like call_procedure
try to make it so that we can separately bake procedure type and specific procedure pointer
Got this new idea, not sure how best to make it work though. The basic idea is that we can use some special syntax to access dynamically-added members on some identifiers.
For example:
entity->cycle_offset = 0.3;
cycle_offset is not a real member of the entity struct, but we can treat it like a sort of virtual member that is stored like any other script variable
so we are basically just using the entity as a sort of namespace to access what is really an internal script variable
the big issue I have right off the bat is that we can't really just bind this to an identifier itself, we need to bind the member to some other particular variable
e.g., it is an identifier in a for loop that will refer to several different instances of the same type
and we don't want the virtual member to be bound to it, but to each individual instance that it points to
but unless the bounds of iteration are known ahead of time,
alternatively, we can just make the virtual member lookup a completely dynamic operation
i.e. the user needs to implement some virtual member lookup procedure that returns an Any for the value
In this scheme, above example would just desugar to something like
`virtual_member_lookup(entity, "cycle_offset")`
I don't like this because it pushes typechecking to execution-time
and this isn't just trivially slower due to the additional typechecking, it would mean that we would need to type hint every single spot where virtual members are used as an rvalue
ideally, we can find a way to make virtual members statically typed, apply to specific instances rather than types, and require that we in some way declare the members on each instance before using them
could use regular declaration syntax for this, but the semantics are quite different, since we
Any procedures concerning virtual members should be function pointers that we can override, just like the parsing procedures.
add_virtual_member :: (owner: Any, name: string, type: *Type_Info) -> ok: bool, value: Any {
// how to add virtual members will actually depend very much on the data type we are dealing with, probably
// because in many cases we cannot assume that pointers will be stable across different executions of the script
// so we need some user-level owner resolution
// but really, all we will ever be able to give to the user here is the Any, since this is particular to some instance variable, not to an identifier or some node
// (we can't use an external variable node due to the iterator problem)
    // so the user will just have to use the pointer + type as a handle, and perform any of their own validation as necessary...
    // we will probably want to first do some internal checks to see if the virtual member is owned by anything within the script itself, then hand it off to the user afterwards
}
get_virtual_member :: (owner: Any, name: string, type: *Type_Info) -> ok: bool, value: Any {
ok, value := internal_get_virtual_member(owner, name);
    ok = ok && (value.type == type);
return ok, value;
}
Doing this in a statically typed way should actually be relatively simple now that I think about it. There's just an added wrinkle to the semantics of declarations now such that we may change the value pointer each time we execute the declaration (if the declaration is for a virtual member). This will mean that we can just resolve the usage sites as per usual, no changes necessary. And we will only need to do any kind of type assertion once, at the declaration site.
Because we need to declare virtual members before we can use them in any case, this does mean that we will have to do some extra work to declare virtual members even if we don't set their values.
Will require either adding the --- keyword to mean non-initialized in a declaration, or just not zero-ing virtual members by default.
Another thing to consider about virtual members is that we want them to be easy to look up or access from outside the script that way we can display them inside things like imgui menus, e.g. 'Entity Details' panel in my game
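one way to make that external lookup easy is to keep a side table keyed by the owner's pointer; a sketch (Virtual_Member and the table layout are assumptions, Table is Jai's hash table):

```jai
// hypothetical sketch: keep virtual members in a table keyed by owner pointer,
// so external code (e.g. an imgui 'Entity Details' panel) can enumerate them
Virtual_Member :: struct {
    name:  string;
    value: Any;  // points into script-owned storage
}

virtual_members: Table(*void, [..] Virtual_Member);

get_virtual_members_for_display :: (owner: Any) -> [] Virtual_Member {
    members, found := table_find(*virtual_members, owner.value_pointer);
    if !found  return .[];
    return members;
}
```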
I am now noticing that we have a major problem with my plan
and that problem is having the ability to support any kind of complex expression on the left side of an arrow
because we are essentially just using the left side as a namespace for the declaration, we cannot compare any complex expressions or use those expressions as a namespace
we could go to an almost totally dynamic way of dealing with virtual members, but that has a lot of problems, of course (because then we're just back in the dynamic typing boat)
we could do the dynamic version and then just use casts as a sort of type assertion, or insert auto casts
if we go the totally dynamic route, then we don't need to change how declarations work
we would only have to change assignments or add some implicit
OK, I think for now what I will do is just implement the more constrained version with static typing, where we only allow simple identifiers on the left side of an arrow
we could also potentially do something like, the virtual members do get associated with a type, and must be declared for that type, but they can be null on any individual instance of that type and if they are null that's just a runtime error
that's probably the way to go...
// declare virtual member for a type
Entity->cycle_time: float;
Entity->cycle_lerp: float;
Entity_Group->tempo: float;
// using virtual members
for entity_group("orbiters") {
entity->cycle_lerp = cycle_over(time * orbiters->tempo, entity->cycle_time);
set_next_offset(entity, circle(entity->cycle_lerp));
}
if we declare virtual members as being owned by some type rather than by some identifier, then we get the benefit of being able to access virtual members through arbitrarily complex expressions
which would be very helpful if we have, for example, some kind of lookup procedures that return some entity/entity group
on the downside, we cannot use the same virtual member identifiers for different data types on different instances of the same owning type
for example, we could not have one entity that uses 'range' to identify a virtual float member while another uses it for a vector2
but maybe this limitation is not that big of a deal...
also, perhaps we should allow declaring a virtual member using an instance of some type on the left side
this could be semantically the exact same as the usual case where the left side is a type, but we just get the type for the user implicitly
this could be convenient since we don't have anything like type_of() in LS
not that this would be hard to add though
NOTE: as an aside, perhaps we should sort all entity groups each frame, get a temp array of the members, and pass the entity groups as external variables to the level script
so then you can just iterate orbiters.members and can attach virtual members to the groups themselves
The first, minimal implementation of virtual members seems to be kinda working now but there is a lot of cleanup that will be needed in order to make things nice again
firstly, we may actually want to consider making the arrow its own node type, even if it is lexically similar to the dot
secondly, we should make resolving declarations a lot cleaner
instead of using two separate procedures for regular and virtual declarations, we should just pass the resolved node for the virtual member owner
thirdly, we should clarify what parts of an arrow node get flagged as typechecked and under what circumstances
declaration
left is either:
identifier
name (acts as declaration name)
arrow
left identifier
(acts as a sort of namespace for the virtual member, matters what we resolved to)
we need to be able to compare two identifiers to see if they resolve to the same underlying thing
right identifier
name (acts as declaration name, sorta)
TODO: need to introduce some procedure to check if an identifier can be used as a namespace
we don't want the user to be able to add virtual members on top of other virtual members for the time being. that sounds like poopoo doodoo (as is the technical term)
TODO: we should note somewhere in the code that we leave identifiers in the unresolved state when they are the terminal/primary identifier for some declaration
implementing virtual member declarations on a per-type basis rather than a per-identifier basis
restrict the types of identifier which can be used as a declaration namespace to only types for the time being
when a virtual member is created on a type, we need to somehow attach that information to the type registered on the script
so External_Type may need to become a new struct type like External_Procedure or External_Variable
remove the logic to execute virtual member declarations, since virtual member declarations on types would be a purely compile-time thing
unless we somehow set a default value for virtual members that get registered on types, then provide that virtual member default value when we call get_virtual_member
TODO: we should add a mechanism to evaluate a script in such a way that it is tolerant to errors
this could be useful when identifiers may change or become invalidated, such as when using the scripts in the context of a level editor
I now have some basic stuff in place to get info about procedure arguments which we could integrate with external procedures. - still need to collect info about varargs parameters, I overlooked that before
However there are some other things that need to be put in place first:
- Parse struct contents with comma-delimited name = value syntax
- same procedure will apply for parsing named procedure argument expressions
- slightly different since we can begin with unnamed arguments, then begin using named arguments (cannot do this in structs)
- typechecking changes (duh)
- will complicate process of matching arguments and overload resolution
- how to rate conflicting overloads where one has some default argument provided implicitly and one does not?
- execution changes
- how to denote that some argument is filled by default value, which can be non-constant (e.g. context values)?
- do we need a Node_Argument type? it could either point to another node or be null to indicate the default value should be used
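if we do go that route, the node could be as simple as this sketch (field names here are assumptions):

```jai
// hypothetical sketch of a Node_Argument
Node_Argument :: struct {
    #as using base: Node;
    name:       string;  // empty for positional arguments
    expression: *Node;   // null means "use the default value", which may be
                         // non-constant and therefore evaluated at call time
}
```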
need a consistent procedure for determining what expressions can be validly used as malleable literals
should use same basic procedure for checking validity and setting the node as a malleable literal
We should check that the literal expression being used as a malleable literal is actually constant since we can have struct literals that are not constants
TODO
There's no reason we should be using some ridiculous dynamic cast for all casts in lead sheets
we should at least have builtin casts for the numeric types, I think
This is not a major concern at the moment, but it is something we probably should take care of sooner or later
The only benefit of staying with the current way of doing casts is that we can catch and report failed casts as runtime errors, and choose to ignore them.
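the builtin path could look something like this sketch (cast_value and dynamic_cast are stand-in names; only one width combination is shown):

```jai
// hypothetical sketch: handle common numeric cases directly, keep the dynamic
// cast as the fallback so failed casts can still be reported as runtime errors
cast_value :: (source: Any, target: *Type_Info, destination: *void) -> bool {
    if source.type.type == .FLOAT && target.type == .FLOAT {
        if source.type.runtime_size == 4 && target.runtime_size == 8 {
            (cast(*float64) destination).* = cast(float64) (cast(*float32) source.value_pointer).*;
            return true;
        }
        // ... remaining width combinations elided ...
    }
    return dynamic_cast(source, target, destination);  // assumed existing fallback
}
```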
TODO: make sure that procedure resolution failures don't override error message for deeper nodes when the error is not from a failed type hint
TODO: add the ability to name it and it_index