-
Notifications
You must be signed in to change notification settings - Fork 50
Store all permutations in a global shared array, so that one 32-bit index per register is sufficient. #239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mandolaerik
commented
Nov 14, 2023
- remove unused method
- Avoid re-calculating read/write field order
- Store field order as permutation indices instead of field references
- Store all permutations in a global shared array, so that one 32-bit index per register is sufficient.
|
Verification #12507032: fail |
index per register is sufficient. Some sophisticated memo/startup trickery is used to clean up temporary data structures.
b97dc54 to
5b52901
Compare
|
This is yet unpolished, and meant as a basis for discussions (but at least 1.4/lib/fields passes).
|
|
Verification #12507039: fail |
lwaern-intel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach is... though not pretty, it is fine when adjusted. My big peeve is _permutation_table_helper(), but that's fixable (see comment.)
The guarantee we offer that the implicit calls of independent startups happen before independent startup memoized was to offer a degree of control when startup memoizeds had dependencies, so it's not like we're operating outside their boundary of what they were intended for (except _permutation_table_helper(), which, again, is fixable.)
| int n = strlen(perm); | ||
| VGROW(h->table, n); | ||
| for (int i = 0; i < n; i++) { | ||
| VSET(h->table, idx + i, perm[i] & 0x7f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bitmask is to be rid of the upper bit used to prevent the NUL char, yeah? Have a comment speaking to that effect. Also consider & ~(1 << 7), I prefer that pattern before literals.
| independent startup method _cleanup_permutation_table_helper() { | ||
| foreach reg in (each register in (dev)) { | ||
| reg._get_read_fields(); | ||
| reg._get_write_fields(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These implicit discards hurt me and may cause Coverity to get mad. Consider binding these to local variables or make this PR dependent on #237 (hint hint nudge nudge review please)
| ++n; | ||
| } | ||
| _DML_add_read_permutation(h, cast(®, _traitref_t *)->id, | ||
| reg._sort_permutation(lsbs, n)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of _sort_permutation must be deallocated. Either by _permutation_table_helper() or by _DML_add_read/write_permutation.
Alterternatively, allocate the array on the stack, just like you do with lsbs.
| { | ||
| uint8 *ret = MM_MALLOC(VLEN(h->table), uint8); | ||
| memcpy(ret, VVEC(h->table), VLEN(h->table)); | ||
| return ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, detach the associated array and realloc it to free excess space. One way to do that without relying on vect.h internals would be:
VRESIZE_FREE(h->table, VLEN(h->table));
uint8 *ret = VVEC(h->table);
VINIT(h->table);
| -> (const uint8 *) { | ||
| return _DML_field_permutations(_permutation_table_helper()); | ||
| } | ||
| independent method _permutation_index_read(object o) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No compelling reason to have this and _permutation_index_write take object instead of register. The architecture is explicitly for field permutations, and its limitations means it cannot be reasonably used for anything else.
| } | ||
| return h; | ||
| } | ||
| independent startup memoized method _field_permutation_table() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to be super-duper performance petty, you can consider the extern C approach for this too:
header %{
extern const uint8 * _field_permutation_array UNUSED;
%}
footer %{
const uint8 * _field_permutation_array UNUSED;
%}
extern const uint8 *_field_permutation_array;
... initialized in _cleanup_permutation_table_helper. This will eliminate the function call needed to retrieve the memoized value.
| local uint64 unmapped; | ||
| (num_fields, unmapped) = this._get_read_fields(fields); | ||
| local uint8 lsbs[64]; | ||
| local uint8 msbs[64]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of leveraging these instead of doing:
local int f_lsb = fields[f].lsb;
local int f_msb = f_lsb + fields[f].bitsize - 1;
in the .read_field() loop? Note you're not using them (even though you declare and initialize them) in write_register.
| method post_init() default {} | ||
|
|
||
| method destroy() default { } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why device template instead of top-level?
| method destroy() default { } | ||
|
|
||
| independent startup memoized method _permutation_table_helper() | ||
| -> (_permutation_helper_t *) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I don't like this either, in particular I don't like the part that _cleanup_permutation_table_helper will invalidate the memoized value with that being impossible to check, so if something goes wrong and we have a usage past _cleanup_permutation_table_helper we'll segfault.
My suggestion is to leverage an externed C variable instead:
header %{
-- extern linkage to support --split-c-file
extern _permutation_helper_t *_permutation_table_helper UNUSED;
%}
footer %{
_permutation_helper_t *_permutation_table_helper UNUSED;
%}
extern _permutation_helper_t *_permutation_table_helper;
independent method _initialize_permutation_table_helper() {
_permutation_table_helper = _DML_new_permutation_helper();
...
}
independent startup method _cleanup_permutation_table_helper() {
_initialize_permutation_table_helper();
...
_DML_destroy_permutation_table_helper(_permutation_table_helper):
_permutation_table_helper = NULL:
}
This way, we can guard all usages of _permutation_table_helper by assert _permutation_table_helper;.
|
We have segfaults, by the way! This may or may not be due to the thing I complained about, regardless, you'll want to track them down. |
|
This PR does demonstrate an interesting memoization technique, but after #243 it is no longer needed. |