Skip to content

Conversation

@mariadb-LeonidFedorov
Copy link
Collaborator

@mariadb-LeonidFedorov mariadb-LeonidFedorov commented Oct 23, 2025

No benchmarks yet. Any ideas like is it good idea to replace std::vector with small_vector or any other is appreciated

@timofeyturenko
Copy link
Member

performance comparison with stable-23.10 https://palantir.mariadb.net/report_comparisons/74/

typedef typename std::tr1::unordered_multimap<typename element_t::second_type,
typename element_t::first_type>::value_type hashPair_t;
// Now using boost::unordered_flat_map with vector for better cache locality
typedef typename boost::unordered_flat_map<typename element_t::second_type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using instead of typedef?

utils::SimpleAllocator<std::pair<uint64_t const, uint64_t> > >
hash_t;
// Using boost::unordered_flat_map with vector for multimap behavior
typedef boost::unordered_flat_map<uint64_t, std::vector<uint64_t>> hash_t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocator type was lost.


for (; range.first != range.second; ++range.first)
range.first->second |= MSB;
// Add all values to newMatches and mark them
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to add all values here? We added only one value before

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look just below, the values were/are added there.

return std::unique_ptr<HashTable>(new HashTable(bucketCount, TupleJoiner::hasher(),
typename HashTable::key_equal(),
utils::STLPoolAllocator<typename HashTable::value_type>(resourceManager)));
// boost::unordered_flat_map doesn't need bucket_count or allocator in constructor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::ordered_map also doesn't need it, but we use custom allocator to track memory usage

@mariadb-AlexeyAntipovsky
Copy link
Contributor

There is a LOT of code like

      auto it = destHashTbl->find(e.second);
      if (it != destHashTbl->end())
      {
        it->second.push_back(e.first);
      }
      else
      {
        std::vector<typename element_t::first_type> vec;
        vec.push_back(e.first);
        destHashTbl->emplace(e.second, std::move(vec));
      }

It would be nice to isolate this boilerplate somewhere

Copy link
Contributor

@mariadb-SergeyZefirov mariadb-SergeyZefirov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider PR as good modulo Alexei's request, where the most important is notice that allocator type has been lost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants