Contributor

@LFletch1 LFletch1 commented Sep 5, 2024

  • Added a distributed alias_table data structure to ygm. An alias table allows sampling from a discrete distribution in O(1) time per sample.
  • Users sample items by calling alias_table::async_sample, which requires a user-provided lambda that takes the sampled item as its argument.
  • Created a new namespace, ygm::random, which now holds default_random_engine as well as alias_table.
  • Created tests for the new alias table. More tests will follow; I wanted to open this PR now to start discussing the interface and implementation of the alias table.
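
For readers unfamiliar with the technique, here is a minimal single-process sketch of an alias table built with Vose's method. It is not the distributed ygm implementation from this PR; the class name `serial_alias_table` and its `sample` member are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical single-process alias table (Vose's method).
// This is an illustration only, not ygm's distributed alias_table.
class serial_alias_table {
 public:
  explicit serial_alias_table(const std::vector<double>& weights)
      : m_prob(weights.size()), m_alias(weights.size()) {
    assert(!weights.empty());
    const std::size_t n = weights.size();
    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    assert(total > 0.0);

    // Scale weights so their average is 1, then split the indices into
    // those below the average ("small") and the rest ("large").
    std::vector<double>      scaled(n);
    std::vector<std::size_t> small, large;
    for (std::size_t i = 0; i < n; ++i) {
      scaled[i] = weights[i] * n / total;
      (scaled[i] < 1.0 ? small : large).push_back(i);
    }

    // Pair each small bucket with a large one that tops it up to exactly 1.
    while (!small.empty() && !large.empty()) {
      const std::size_t s = small.back();
      small.pop_back();
      const std::size_t l = large.back();
      large.pop_back();
      m_prob[s]  = scaled[s];
      m_alias[s] = l;
      scaled[l]  = (scaled[l] + scaled[s]) - 1.0;
      (scaled[l] < 1.0 ? small : large).push_back(l);
    }

    // Whatever remains has weight 1 up to rounding error; keep it whole.
    for (std::size_t i : large) { m_prob[i] = 1.0; m_alias[i] = i; }
    for (std::size_t i : small) { m_prob[i] = 1.0; m_alias[i] = i; }
  }

  // O(1) per sample: pick a bucket uniformly, then flip a biased coin
  // between the bucket's own index and its alias.
  template <typename RNG>
  std::size_t sample(RNG& rng) const {
    std::uniform_int_distribution<std::size_t> pick(0, m_prob.size() - 1);
    std::uniform_real_distribution<double>     coin(0.0, 1.0);
    const std::size_t i = pick(rng);
    return coin(rng) < m_prob[i] ? i : m_alias[i];
  }

 private:
  std::vector<double>      m_prob;   // probability of keeping bucket i
  std::vector<std::size_t> m_alias;  // index returned otherwise
};
```

Construction is O(n) and each call to `sample` draws two random numbers regardless of how many items the table holds.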

Collaborator

@steiltre steiltre left a comment

I recommend making use of concepts in the way they are used in existing container base classes. We can decide where to move these concepts, and which new ones to add, independently of this PR.

The existing constructors look good, but constructors that work from STL containers should be added as well.

@steiltre steiltre changed the base branch from v0.7-dev to v0.8-dev February 20, 2025 03:56
@steiltre steiltre changed the base branch from v0.8-dev to v0.9-dev June 30, 2025 21:33
LFletch1 and others added 14 commits July 1, 2025 20:42
Updating master with ygm 0.8.
…emTuple concepts that take as input for_all_args type. Also added constructor that builds the alias table from an STL container
…ogic. Wrote test for various constructors and tested sampling accuracy via sampling words from corpus and comparing to ground truth frequency
…the average local weight is no longer 1. This was done to decrease the errors associated with inexact floating point operations
… showed this has fixed the bug, but consistently reproducing the bug has proved to be difficult, so bug might still be present
item item_to_send = {local_item.id, weight_to_send};
items_to_send.push_back(item_to_send);

if ((curr_weight > 1e-8) && (dest_rank < m_comm.size())) { // Accounts for rounding errors
Collaborator

Does this cause any slight errors in results?

Contributor Author

I have removed the curr_weight > 1e-8 condition.

}

// Need to handle items left in items to send. Must also account for floating point errors.
if (items_to_send.size() > 0 && curr_weight > 1e-8 && dest_rank < m_comm.size()) {
Collaborator

How does weight lost here affect results?

Contributor Author

I removed the curr_weight > 1e-8 condition. This condition has the potential to create bigger errors than those caused by floating-point arithmetic inaccuracies.

I still have to verify that the destination rank is less than the number of ranks, because a floating-point error can leave extra weight, which the code would then try to spread to a rank >= m_comm.size().
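
To make the concern concrete, here is a small standalone sketch of splitting a stream of item weights into per-rank chunks while guarding the destination rank. It is not the code from this PR: `chunk`, `partition_result`, and `partition_weights` are hypothetical names, and the sketch assumes every rank is meant to receive the same `target` weight.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical names throughout; this is not the PR's code.
struct chunk {
  int    dest_rank;
  double weight;
};

struct partition_result {
  std::vector<chunk> chunks;
  double             residue = 0.0;  // weight that found no valid rank
};

// Fill ranks 0..num_ranks-1 in order, giving each up to `target` weight.
partition_result partition_weights(const std::vector<double>& weights,
                                   int num_ranks, double target) {
  assert(num_ranks > 0 && target > 0.0);
  partition_result result;
  int    dest_rank = 0;
  double room      = target;  // capacity left on dest_rank
  for (double w : weights) {
    // The rank guard matters: rounding can leave a sliver of weight after
    // the last rank is full, and dest_rank would then be out of range.
    while (w > 0.0 && dest_rank < num_ranks) {
      const double sent = std::min(w, room);
      result.chunks.push_back({dest_rank, sent});
      w -= sent;
      room -= sent;
      if (room <= 0.0) {
        ++dest_rank;
        room = target;
      }
    }
    result.residue += w;  // nonzero only when all ranks are already full
  }
  return result;
}
```

Recording the residue instead of silently discarding it makes it possible to measure how much weight is actually lost, which is the quantity the review questions above ask about.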

ygm::container::detail::SingleItemTuple<typename YGMContainer::for_all_args> &&
pair_like_and_convertible_to_weighted_item<Item,
std::tuple_element_t<0,typename YGMContainer::for_all_args>>
alias_table(ygm::comm &comm, RNG &rng, YGMContainer &c)
Collaborator

Making the constructor take an RNG object that is stored as a reference feels a bit clunky.

Does it make sense to try any of the following:

  1. Take a seed for an RNG object that gets constructed by the alias table (allows seeding but disconnects external RNG object from what is used within alias_table)
  2. Create a randomly initialized RNG object when one is not given to the constructor (not sure how this would work with the RNG being stored as a reference)

Given m_rng is only needed in the async_sample function, it would be natural (and look more like std::sample) to have the RNG passed in to async_sample, but this is not practical given the communication required before m_rng is called.

Contributor Author

I am good with passing a seed or generating a random seed if a seed is not given. We may need to change this in the future to match the interface of some sort of ygm container sampling object (assuming we eventually create something like this).

I agree that passing m_rng to async_sample would be best, but as you mentioned, it is not practical.
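
As a rough illustration of the "seed, or a random seed when none is given" option discussed above, here is a standalone sketch of a class that owns its engine instead of holding a reference to an external one. `seeded_sampler` and `next` are hypothetical names, and `std::mt19937` is an assumed engine choice; this is not the alias_table constructor or ygm's default_random_engine.

```cpp
#include <cstdint>
#include <optional>
#include <random>

// Hypothetical class that owns its RNG rather than storing a reference.
class seeded_sampler {
 public:
  // With a seed the stream is reproducible; without one, the engine is
  // seeded from std::random_device.
  explicit seeded_sampler(std::optional<std::uint32_t> seed = std::nullopt)
      : m_rng(seed ? *seed : std::random_device{}()) {}

  // Expose one draw so the effect of seeding can be observed.
  std::mt19937::result_type next() { return m_rng(); }

 private:
  std::mt19937 m_rng;  // owned by the object, so no lifetime coupling
};
```

Two objects built with the same seed produce the same stream, while the default constructor gives a fresh stream on each run. In a distributed setting each rank would likely need to derive its own stream from the shared seed, which this sketch does not attempt.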
