Skip to content

Conversation

@mandolaerik
Copy link
Contributor

  • remove unused method
  • Avoid re-calculating read/write field order
  • Store field order as permutation indices instead of field references
  • Store all permutations in a global shared array, so that one 32-bit index per register is sufficient.

@syssimics
Copy link
Contributor

Verification #12507032: fail

index per register is sufficient.

Some sophisticated memo/startup trickery is used to clean up temporary
data structures.
@mandolaerik mandolaerik force-pushed the pr/store-all-permutations-in-a-global-shared-array-so-that-one-32-bit-index-per-register-is-sufficient branch from b97dc54 to 5b52901 Compare November 14, 2023 19:46
@mandolaerik
Copy link
Contributor Author

This is yet unpolished, and meant as a basis for discussions (but at least 1.4/lib/fields passes).

  • does this approach make sense, or is it a dead end?
  • The memoization of _get_read_fields adds some ugliness -- just look at _cleanup_permutation_table_helper -- we can consider skipping this and keep hash tables instead, doing an identity_t lookup every time. Probably somewhat slower in runtime, and may use more or less mem, but the diff should not be dramatic.

@syssimics
Copy link
Contributor

Verification #12507039: fail

Copy link
Contributor

@lwaern-intel lwaern-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach is... though not pretty, it is fine when adjusted. My big peeve is _permutation_table_helper(), but that's fixable (see comment.)

The guarantee we offer that the implicit calls of independent startups happen before independent startup memoized was to offer a degree of control when startup memoizeds had dependencies, so it's not like we're operating outside their boundary of what they were intended for (except _permutation_table_helper(), which, again, is fixable.)

int n = strlen(perm);
VGROW(h->table, n);
for (int i = 0; i < n; i++) {
VSET(h->table, idx + i, perm[i] & 0x7f);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bitmask is to be rid of the upper bit used to prevent the NUL char, yeah? Have a comment speaking to that effect. Also consider & ~(1 << 7), I prefer that pattern before literals.

independent startup method _cleanup_permutation_table_helper() {
foreach reg in (each register in (dev)) {
reg._get_read_fields();
reg._get_write_fields();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These implicit discards hurt me and may cause Coverity to get mad. Consider binding these to local variables or make this PR dependent on #237 (hint hint nudge nudge review please)

++n;
}
_DML_add_read_permutation(h, cast(&reg, _traitref_t *)->id,
reg._sort_permutation(lsbs, n));
Copy link
Contributor

@lwaern-intel lwaern-intel Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result of _sort_permutation must be deallocated. Either by _permutation_table_helper() or by _DML_add_read/write_permutation.
Alterternatively, allocate the array on the stack, just like you do with lsbs.

{
uint8 *ret = MM_MALLOC(VLEN(h->table), uint8);
memcpy(ret, VVEC(h->table), VLEN(h->table));
return ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, detach the associated array and realloc it to free excess space. One way to do that without relying on vect.h internals would be:

VRESIZE_FREE(h->table, VLEN(h->table));
uint8 *ret = VVEC(h->table);
VINIT(h->table);

-> (const uint8 *) {
return _DML_field_permutations(_permutation_table_helper());
}
independent method _permutation_index_read(object o)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No compelling reason to have this and _permutation_index_write take object instead of register. The architecture is explicitly for field permutations, and its limitations means it cannot be reasonably used for anything else.

}
return h;
}
independent startup memoized method _field_permutation_table()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to be super-duper performance petty, you can consider the extern C approach for this too:

header %{
    extern const uint8 * _field_permutation_array UNUSED;
%}
footer %{
    const uint8 * _field_permutation_array UNUSED;
%}
extern const uint8 *_field_permutation_array;

... initialized in _cleanup_permutation_table_helper. This will eliminate the function call needed to retrieve the memoized value.

local uint64 unmapped;
(num_fields, unmapped) = this._get_read_fields(fields);
local uint8 lsbs[64];
local uint8 msbs[64];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of leveraging these instead of doing:

            local int f_lsb = fields[f].lsb;
            local int f_msb = f_lsb + fields[f].bitsize - 1;

in the .read_field() loop? Note you're not using them (even though you declare and initialize them) in write_register.

method post_init() default {}

method destroy() default { }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why device template instead of top-level?

method destroy() default { }

independent startup memoized method _permutation_table_helper()
-> (_permutation_helper_t *) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't like this either, in particular I don't like the part that _cleanup_permutation_table_helper will invalidate the memoized value with that being impossible to check, so if something goes wrong and we have a usage past _cleanup_permutation_table_helper we'll segfault.
My suggestion is to leverage an externed C variable instead:

header %{
    -- extern linkage to support --split-c-file
    extern _permutation_helper_t *_permutation_table_helper UNUSED;
%}
footer %{
    _permutation_helper_t *_permutation_table_helper UNUSED;
%}

extern _permutation_helper_t *_permutation_table_helper;

independent method _initialize_permutation_table_helper() {
    _permutation_table_helper = _DML_new_permutation_helper();
    ...
}

independent startup method _cleanup_permutation_table_helper() {
    _initialize_permutation_table_helper();
    ...
    _DML_destroy_permutation_table_helper(_permutation_table_helper):
    _permutation_table_helper = NULL:
}

This way, we can guard all usages of _permutation_table_helper by assert _permutation_table_helper;.

@lwaern-intel
Copy link
Contributor

We have segfaults, by the way! This may or may not be due to the thing I complained about, regardless, you'll want to track them down.

johanhogberg

This comment was marked as off-topic.

@mandolaerik
Copy link
Contributor Author

This PR does demonstrate an interesting memoization technique, but after #243 it is no longer needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants