The goal here is to add a block-level top-k implementation that DeviceSegmentedTopK can use for segments small enough to be handled by a single thread block. For now, BlockTopK will live in the detail namespace; we plan to expose it publicly once we are confident the interface is stable enough to commit to.