Skip to content

In-place sampling #8

Open
Open
@make-github-pseudonymous-again

Description

@make-github-pseudonymous-again
Member

If you want to sample k items out of n, Fisher Yates shuffles your item array.
Hereunder is an efficient no shuffle implementation. Something better can probably be achieved with prefix sums and rank/select.

Theorem

We can sample k items out of n with k random draws and O(k log k) time, without modifying the items array.

Proof

TODO fix proof

Given n items, at draw i=1,2,...,n with a binary search tree containing all previously drawn items indices between 0 and n-1, pick a random integer x between 0 and n - i. Binary search for its corresponding index value as follows: start with p = 0 the number of predecessors of the current node, let l be the size of the left subtree of the current node, let v be the value of the current node, if x + p + l + 1 < v
the current node becomes the left child, otherwise the current node becomes the right child and p = p + l + 1. If there is no such child, insert x + p in the case of a left child, x + p + l + 1 in the case of a right child.

If x + p + l + 1 < v, then x cannot be in the right subtree of v. The item index corresponding to x would be x + p + l + 1 if it was the right child of v which is smaller than v. Pick any node c in the right subtree of v. Let g be the number of predecessors of c in the right subtree of v. The value of c is at least v + 1 + g because all indices are distinct. Suppose c has no right child. The item index of x as a right child of c is x + p + l + 1 + g + 1 < v + g + 1 <= c. Suppose c has no left child. The item index of x as a left child of c is x + p + l + 1 + g < v + g < v + g + 1 <= c.

If x + p + l + 1 > v, then x cannot be in the left subtree of v. If v has no left child, then l = 0 and the value of x as the left child of v would be x + p which is larger or equal to v (indices must be distinct). Pick any node c in the left subtree of v.
Let s be the number of successors of s in the left subtree of v. The value of c is at most v - s - 1 because all indices are distinct. Suppose c has no left child. The item index of x as a left child of c is x + p + l - s - 1 > v - s - 2 so x + p + l - s - 1 >= v - s - 1 = c.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Development

    No branches or pull requests

      Participants

      @make-github-pseudonymous-again

      Issue actions

        In-place sampling · Issue #8 · randomized-algorithm/random