Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert_value_type checks has broken code #4785

Open
Micket opened this issue Mar 3, 2025 · 3 comments
Open

convert_value_type checks has broken code #4785

Micket opened this issue Mar 3, 2025 · 3 comments

Comments

@Micket
Copy link
Contributor

Micket commented Mar 3, 2025

I was going to see what it would take to allow checksums to be a plain top level dict, and it turns out it's very simple. The only code preventing this is this code

def to_checksums(checksums):
"""Ensure correct element types for list of checksums: convert list elements to tuples."""
try:
return [_to_checksum(checksum) for checksum in checksums]
except ValueError as e:
raise EasyBuildError('Invalid checksums: %s\n\tError: %s', checksums, e)

which converts everything to a list (of just the keys).

First problem is that, while our documentation suggests that a simple single string is supported, this is not the case. Such a string would be converted to a list of single characters. This never worked as far as I can tell. I suppose this is good, because that's one pattern we can then just remove. It must always be a list.

Secondly, if one tries to modify to_checksums, to for exmaple allow a dict to be passed through

def to_checksums(checksums):
    """Ensure correct element types for list of checksum items: convert list elements to tuples."""
    try:
        if isinstance(checksums, dict):
            return {key: _to_checksum(checksum) for key, checksum in checksums.items()}
        else:
            return [_to_checksum(checksum) for checksum in checksums]
    except ValueError as e:
        raise EasyBuildError('Invalid checksums: %s\n\tError: %s', checksums, e)

or similar, the code trips up over

if not isinstance(res, typ):
raise EasyBuildError("Converting value %s to type %s didn't work as expected: got %s", val, typ, type(res))

which is trying use

TYPE_CONVERSION_FUNCTIONS = {
str: str,
float: float,
int: int,
str: str,
CHECKSUMS: to_checksums,
DEPENDENCIES: to_dependencies,
LIST_OF_STRINGS: to_list_of_strings,
TOOLCHAIN_DICT: to_toolchain_dict,
SANITY_CHECK_PATHS_DICT: to_sanity_check_paths_dict,
STRING_OR_TUPLE_LIST: to_list_of_strings_and_tuples,
STRING_OR_TUPLE_OR_DICT_LIST: to_list_of_strings_and_tuples_and_dicts,
}

which is defined here:

CHECKSUMS = (list, as_hashable({'elem_types': [type(None), str, CHECKSUM_AND_TYPE,
CHECKSUM_LIST_W_DICT, CHECKSUM_TUPLE_W_DICT, CHECKSUM_DICT]}))

but this is not a value input for isinstance, and the code crashes with

TypeError: isinstance() arg 2 must be a type or tuple of types

clearly, this part of the code has never even worked. Shall we just go ahead and delete this extra unnecessary check?

I really only want to allow for checksum to a be a dict, and it almost seems from the code like it should be supported already by CHECKSUM_DICT via CHECKABLE_TYPES , but since the code always checks for is_value_of_type(val, typ) anyway, where typ == list then this means nothing. It's always going to be forcible converted to a list no matter what, so it's pointless to add all of these to CHECKABLE_TYPES?

@Flamefire am I missing something here?

@Micket Micket moved this to Nice-to-have in EasyBuild v5.0 Mar 3, 2025
@Micket Micket changed the title convert_value_type checks for checksums has broken code convert_value_type checks for checksums has broken code Mar 3, 2025
@Micket Micket changed the title convert_value_type checks for checksums has broken code convert_value_type checks has broken code Mar 3, 2025
Flamefire added a commit to Flamefire/easybuild-framework that referenced this issue Mar 4, 2025
We cannot use `isinstance` on non-trivial types, i.e. our tuple-specifications.
Incidently it works if the result is the type in the first element of
the tuple, otherwise it fails with a `TypeError` because the remainder
is not a type.

The correct way is to use our `is_value_of_type`.

2 small corrections:

1. We don't need `EASY_TYPES` which had some types missing, e.g. `float`.
   Instead we can check if a type was passed and hence can be used with
`isinstance`.
2. `convert_value_type` shouldn't be called when the type is already correct.
   We ensure that in all usages in framework code.
   So issue a warning for that case.

Fixes easybuilders#4785
@Flamefire
Copy link
Contributor

Flamefire commented Mar 4, 2025

to_checksums is only called when the type doesn't match as per the definition. We have test_to_checksums which ensures the conversion function does not modify valid types. In 5.0x I fixed and enhanced some parts of the checksum type checking.
So I guess the function itself is good.

while our documentation suggests that a simple single string is supported, this is not the case.

AFAIK we haven't supported that for a long time, if at all. Can't find where the checksum param was ever supported to be only a string. Where did you find that? I found:

The checksums easyconfig parameter is a list usually containing strings.

As for:

but this is not a value input for isinstance, and the code crashes with

TypeError: isinstance() arg 2 must be a type or tuple of types

clearly, this part of the code has never even worked. Shall we just go ahead and delete this extra unnecessary check?

It does work (somehow): For checksums = ('a', ) the type conversion function is called, yielding a list from the tuple and passes the isinstance

However you are right: The code is incomplete: We only support a single type for easyconfig params, no alternatives. And all our conversion functions are "correct enough". What happens:

  • checksums does not pass the is_value_of_type check
  • to_checksums is called returning a list
  • isinstance(['a'], (list, ('foo', 'something'))) is called
  • `isinstance iterates over the tuple and returns True on the first match, see python doc

    TypeError may not be raised for an invalid type if an earlier check succeeds.

So the wrong value doesn't matter for our current usages.

However it still is wrong and I opened a PR to make this right.

With this you can likely implement alternatives like: CHECKSUMS = [(list, as_hashable(...)), CHECKSUM_DICT]
You only need to modify is_type_of to allow for lists as the type param and return True if any matches:

def is_type_of_single(val, type): renamed from current
def is_type_of(val, type): types = type if isinstance(list) else [type]; return any(is_type_of_single(val, t) for t in types)

Maybe this can be used from or combined with code from check_element_types or so where we already have logic to allow alternative types

Flamefire added a commit to Flamefire/easybuild-framework that referenced this issue Mar 4, 2025
We cannot use `isinstance` on non-trivial types, i.e. our tuple-specifications.
Incidently it works if the result is the type in the first element of
the tuple, otherwise it fails with a `TypeError` because the remainder
is not a type.

The correct way is to use our `is_value_of_type`.

2 small corrections:

1. We don't need `EASY_TYPES` which had some types missing, e.g. `float`.
   Instead we can check if a type was passed and hence can be used with
`isinstance`.
2. `convert_value_type` shouldn't be called when the type is already correct.
   We ensure that in all usages in framework code.
   So issue a warning for that case.

Fixes easybuilders#4785
@Micket
Copy link
Contributor Author

Micket commented Mar 4, 2025

to_checksums is only called when the type doesn't match as per the definition

I guess I'm confused by CHECKABLE_TYPES :
https://github.com/easybuilders/easybuild-framework/blob/4702af0ed1402817f39e1c3316fb874f6466991e/easybuild/framework/easyconfig/types.py#L659C1-L660C90
as they list CHECKSUMS and CHECKSUM_DICT on the same level, while one the latter is supposed to be for nested values.

It does work (somehow): For checksums = ('a', ) the type conversion function is called, yielding a list from the tuple and passes the isinstance

Hm, since that matches typ[0] it never checks typ[1]. If one needs to keep that piece of code around, it should probably be isinstance(val, typ[0]) since that's what it intends to check. isinstance(val, typ[1]) will always just crash.

though i think the check:

        if not isinstance(res, typ[0]):
            raise EasyBuildError("Converting value %s to type %s didn't work as expected: got %s", val, typ, type(res))

is just overkill, as it really only tests the outermost type, of our own to_xxxx functions in case they return the "wrong" type. That seems very unlikely. I feel a type hint on the return values in the code would have sufficed, and it would easily open up to_checksums to handle list and dicts.

Also, just testing

checksums = [123456]

the error you get is

ERROR: Failed to process easyconfig /apps/c3se-test-easyconfigs/zlib-1.3.1-GCCcore-13.3.0-test.eb: Converting type of [123456] (<class 'list'>) to (<class 'list'>, (('elem_types', (<class 'NoneType'>, <class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)), (<class 'list'>, (('elem_types', (<class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)), (<class 'dict'>, (('elem_types', (<class 'NoneType'>, <class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)), (<class 'tuple'>, (('elem_types', (<class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)))),)), (<class 'list'>, (('elem_types', (<class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)))),)))), ('key_types', (<class 'str'>,)))))),)), (<class 'tuple'>, (('elem_types', (<class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)), (<class 'dict'>, (('elem_types', (<class 'NoneType'>, <class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)), (<class 'tuple'>, (('elem_types', (<class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)))),)), (<class 'list'>, (('elem_types', (<class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)))),)))), ('key_types', (<class 'str'>,)))))),)), (<class 'dict'>, (('elem_types', (<class 'NoneType'>, <class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)), (<class 'tuple'>, (('elem_types', (<class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)))),)), (<class 'list'>, (('elem_types', (<class 'str'>, (<class 'tuple'>, (('elem_types', (<class 'str'>, <class 'int'>)),)))),)))), ('key_types', (<class 'str'>,)))))),)) using <function to_checksums at 0x7f43cdc8d5e0> failed: 'Invalid checksums: [112345689]\n\tError: Unexpected type of "<class \'int\'>": 123456'

yikes.

The old behavior in EB 4 was much nicer

== FAILED: Installation ended unsuccessfully (build directory: /dev/shm/zlib/1.3.1/GCCcore-13.3.0-test-debug): build failed (first 300 chars): Invalid checksum spec '112345689': should be a string (MD5 or SHA256), 2-tuple (type, value), or tuple of alternative checksum specs. (took 0 secs)

So I feel like there sure is a lot of very hard to follow code here, and we aren't getting the helpful very helpful error messages.

Flamefire added a commit to Flamefire/easybuild-framework that referenced this issue Mar 5, 2025
We cannot use `isinstance` on non-trivial types, i.e. our tuple-specifications.
Incidently it works if the result is the type in the first element of
the tuple, otherwise it fails with a `TypeError` because the remainder
is not a type.

The correct way is to use our `is_value_of_type`.

2 small corrections:

1. We don't need `EASY_TYPES` which had some types missing, e.g. `float`.
   Instead we can check if a type was passed and hence can be used with
`isinstance`.
2. `convert_value_type` shouldn't be called when the type is already correct.
   We ensure that in all usages in framework code.
   So issue a warning for that case.

Fixes easybuilders#4785
@Flamefire
Copy link
Contributor

I guess I'm confused by CHECKABLE_TYPES : https://github.com/easybuilders/easybuild-framework/blob/4702af0ed1402817f39e1c3316fb874f6466991e/easybuild/framework/easyconfig/types.py#L659C1-L660C90 as they list CHECKSUMS and CHECKSUM_DICT on the same level, while one the latter is supposed to be for nested values.

The list CHECKABLE_TYPES is a bit of an issue: It is just a list of all "type specs" we have in this file and if we miss to put one into that list it will fail the code. It is used to only try to process the "type specs" if we know for sure it actually is one. I don't see a better way to check those except for checking for constants defined in the file as I did in my PR when converting a type specs back to its name.

Hm, since that matches typ[0] it never checks typ[1]. If one needs to keep that piece of code around, it should probably be isinstance(val, typ[0]) since that's what it intends to check. isinstance(val, typ[1]) will always just crash.
though i think the check [...] is just overkill, as it really only tests the outermost type, of our own to_xxxx functions in case they return the "wrong" type.

That doesn't cut it either: typ might be str not necessarily a type spec. See my PR where I verify it using our verification function.
However that check (done for every parameter in every easyconfig) is possibly expensive given that it mostly checks that our conversion is correct. So we could enable it for tests only and check the general (outermost) type only by using typ[0] for types in CHECKABLE_TYPES and typ otherwise to allow for typ to be a tuple (typ=(tuple, list) might be useful to support)

Also, just testing

checksums = [123456]

the error you get is [...]

The old behavior in EB 4 was much nicer

So I feel like there sure is a lot of very hard to follow code here, and we aren't getting the helpful very helpful error messages.

True, we were in some cases just doing a partial transform and let the easyconfig handle the rest.
With my PR this will be a bit better, something like

ERROR: Failed to process easyconfig /apps/c3se-test-easyconfigs/zlib-1.3.1-GCCcore-13.3.0-test.eb: Converting type of [123456] (<class 'list'>) to <type 'CHECKSUMS'> failed: 'Invalid checksums: [112345689]\n\tError: Unexpected type of "<class 'int'>": 123456'

So maybe we really shouldn't to that much type checking on the result of the conversion functions or inside the conversion functions. But then we have to be sure to handle that in the easyconfig better.
We could move the "good" reporting function to the type checking. E.g. at this point we can report that we expected "None, a string, a list, a tuple or a dict" So mostly what we had before but we need to include the dict case which we more or less added later.

Additionally we could allow a "wildcard type": I had played with adding None as a typespec that meant "any type". That would allow us to cut down on the conversions if we are sure to check those at the place where they are used. E.g. for dependencies we had:

# pass down value untouched, let EasyConfig._parse_dependency handle it

That would still fail the type check unless we allow a "wildcard type" there.

So no free lunch here too.
Maybe we should really just assume the type conversion functions returns something usable by the using code and do more verification there. I'm a bit nervous about that though and kind of like having some kind of specification of the format in the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Nice-to-have
Development

No branches or pull requests

2 participants