-
-
Notifications
You must be signed in to change notification settings - Fork 37
[PoC] Parse integer data #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[PoC] Parse integer data #32
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the direction of this! Great work!
src/implementation/js/node/int.ts
Outdated
const ctx = this.compilation; | ||
const index = ctx.stateField(this.ref.field); | ||
|
||
out.push(`${index} = (${ctx.bufArg()}[${ctx.offArg()}] & 2 ** 7) * 0x1fffffe;`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Let's use 1 << 7
instead of 2 ** 7
. Mixing binary and floating point operators looks fishy.
I'm not sure that this line does what you want it to do, but I understand that this is work in progress. We can discuss it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on how Node.js Buffer
also implements reading integers: https://github.com/nodejs/node/blob/f5512ff61ecb668c2f49b7c05d3227ef7aa5e85f/lib/internal/buffer.js#L416
I thought being consistent with core would make sense here? 🤔 I'm not sure what the pros or cons of switching to 1 << 7
would be. 🤷♂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could also just inline the result of the multiplication / shifting by changing this to:
out.push(`${index} = (${ctx.bufArg()}[${ctx.offArg()}] & 2 ** 7) * 0x1fffffe;`) | |
out.push(`${index} = (${ctx.bufArg()}[${ctx.offArg()}] & ${2 ** 7}) * 0x1fffffe;`); |
What do you think?
} | ||
|
||
case 1: { | ||
out.push(`${index} += ${ctx.bufArg()}[${ctx.offArg()}] * 2 ** 8;`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: same here, let's use <<
for binary operations.
} | ||
|
||
case 2: { | ||
out.push(`${index} += ${ctx.bufArg()}[${ctx.offArg()}] * 2 ** 16;`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: ditto.
} | ||
|
||
case 1: { | ||
out.push(`${index} += ${ctx.bufArg()}[${ctx.offArg()}] * 2 ** 8;`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: ditto.
} | ||
|
||
case 1: { | ||
out.push(`${index} += ${ctx.bufArg()}[${ctx.offArg()}] * 2 ** 8;`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: ditto.
} | ||
|
||
case 3: { | ||
out.push(`${index} += ${ctx.bufArg()}[${ctx.offArg()}] * 2 ** 24;`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: ditto.
|
||
switch (this.ref.byteOffset) { | ||
case 0: { | ||
out.push(`${index} = ${ctx.bufArg()}[${ctx.offArg()}] * 2 ** 8;`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: ditto.
|
||
switch (this.ref.byteOffset) { | ||
case 0: { | ||
out.push(`${index} = ${ctx.bufArg()}[${ctx.offArg()}] * 2 ** 16;`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: ditto.
} | ||
|
||
case 1: { | ||
out.push(`${index} += ${ctx.bufArg()}[${ctx.offArg()}] * 2 ** 8;`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: ditto.
} | ||
} | ||
|
||
private readUInt24BE(out: string[]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like it could be generalized a bit. There is no runtime savings from having separate methods for 16
, 24
, etc bits. These functions are executed only at compile time, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. My thinking was to keep these separated for readability - having all the logic in a single huge if/elseif/else
statement made this really hard to understand. 🤷♂ Let's see how this looks like once all the other operations are in place too.
@indutny I see that properties don't carry any sign information - is that a purposeful design decision in llparse? Does that mean that property access in C needs to be casted appropriately to not get the unsigned value? Am I missing something? 🤔 |
@arthurschreiber this is a design decision that was historically motivated by bitcode output. C has to cast fields if a signed access is required. |
@arthurschreiber Sorry if I am writing this question 5 years later but I thought this was an interesting idea since I wanted to write a socks5 protocol parser at some point. Would it be smart if I tried re-implementing this concept with the newer updates that have been made to llparse? |
Feel free to take this for a spin and see if you can get it into a mergeable state. For my purpose (parsing binary data from a stream in JavaScript), I've switched to a VM inspired parsing approach instead. I can see if I can dig up the code if you're interested? |
@arthurschreiber when I open the pull request again I sure will ask you if I have questions otherwise I should be fine. I actually attempted to simulate this library in python a couple years back so I'm sure I should have no problems re-learning things I've learned previously. |
This adds a very rough, unfinished implementation for parsing little- and big-endian integers, based on my question here: #31
See also these 2 PRs:
Int
node to parseintBE
,intLE
,uIntBE
anduIntLE
values. llparse-builder#1Int
nodes. llparse-frontend#1This only includes the code to generate JS output (for now), and has no tests at all. 🤷♂ I'll try to flesh this out if I can find some more time.
Example input
Example output