Add tests for std::[unordered_][multi]set #39
base: main
Thanks for the PR! I left some comments; mostly, we should start by figuring out what exactly we want to test for the binary format. Originally I considered being lazy for other containers and putting all std::[unordered_][multi]set in a single test, without the full combination of index column types. But I think the multi containers make this awkward, because there we actually want to test that the duplicates are preserved, and we may also want to be 100% certain that we get the index columns right. For the non-multi containers, though, duplicates are essentially handled on the C++ side before they reach RNTuple, I think, so I'm not sure we need those entries here...
// Fourth entry: duplicate elements in the set
*Index32 = {1, 1};
*Index64 = {2, 2};
*SplitIndex32 = {3, 3};
*SplitIndex64 = {4, 4};
writer->Fill();
Hm, this entry is the same from the RNTuple point of view, because the deduplication happens on the C++ side before we enter Fill(). Does it provide extra coverage?
*Index32 = {2, 1};
*Index64 = {4, 3};
*SplitIndex32 = {6, 5};
*SplitIndex64 = {8, 7};
writer->Fill();
This also seems like it's mostly about the C++ semantics of std::set, not really about the binary format...
*Index32 = {2, 1};
*Index64 = {4, 3};
*SplitIndex32 = {6, 5};
*SplitIndex64 = {8, 7};
writer->Fill();
Do we make guarantees about the element order on disk? If not, I'm not sure we need to test C++ semantics in the validation suite...
Set &value = *entry.GetPtr<Set>(name);
os << " \"" << name << "\": [";
bool first = true;
for (auto element : value) {
Actually, is the iteration order defined here? Do we want to explicitly construct a std::vector (?) and sort it? Otherwise, the output file might change from execution to execution...
The iteration order is defined by the (optional) comparison predicate in the set template (std::less by default), so unless it differs from the one used when writing, the order will be the same (see also https://stackoverflow.com/a/8834041).
Ah, I commented on the wrong test: the question is mostly relevant for std::unordered_[multi]set, but for symmetry we may want to do it for all tests.
Yeah, that's fair enough! I suppose we could squash [unordered_]set and [unordered_]multiset. One thing to consider is that not every writer/reader will necessarily be written in C++ (we already know this isn't the case), but the spec explicitly refers to C++ types. Still, you're right that the ordering and duplicate handling are not something the format itself is responsible for. Taking it even further, the specification explicitly states that the on-disk representation is identical to ...
That might be a good compromise because we probably want the same input entries / entry classes for ...
Yes, in my opinion we want each supported C++ type to appear at least once in the validation suite. But indeed, the question is how different it is from a binary format perspective. There are two axes to that: index column encoding and nesting (e.g. ...
As a side note, in roottest we do have a test that reads each STL collection in the file into all other STL collections with the same content (for a large subset of cases).
After thinking this over, here are some more considerations:
The decision whether to merge ...
...for the tests that need them (currently, all nested `std::set` and friends).
Based on this statement:
and because right now I don't have a clear idea how to nicely merge the ordered and unordered variants (two field types instead of one, i.e., ...
Thanks for also updating the infrastructure and the CI; it seems to pass 😃
Fine with me.
I would tend to remove the "duplicate" and "reverse" tests, because I personally find it confusing that the output doesn't match the input, and it's actually not RNTuple that is responsible for that. Maybe let's hear some opinions from others?
Closes #14.