In the paper, stochastic quantization is done by rounding up with probability p = clip(0.5x, 0, 1) and rounding down with probability 1 - p. In the code, however, it is done by adding uniform random noise before quantization:
```python
noise = output.new(output.shape).uniform_(-0.5, 0.5)
output.add_(noise)
```
This noise does not depend on the magnitude of x. I'm wondering what the reasoning behind this discrepancy is.
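To make the discrepancy concrete, here is a minimal pure-Python sketch (scalars standing in for the PyTorch tensors) that compares the round-up probability quoted from the paper with the one produced empirically by the noise approach. It assumes the quantizer applied after `output.add_(noise)` is round-to-nearest, which the snippet does not show; `paper_round_up_prob`, `noisy_round` and `empirical_round_up_prob` are hypothetical helper names, not functions from the repo.

```python
import math
import random


def paper_round_up_prob(x: float) -> float:
    """Round-up probability as quoted from the paper: p = clip(0.5x, 0, 1)."""
    return min(max(0.5 * x, 0.0), 1.0)


def noisy_round(x: float, rng: random.Random) -> int:
    """Add U(-0.5, 0.5) noise, then round half up (ASSUMED quantizer)."""
    noise = rng.uniform(-0.5, 0.5)
    return math.floor(x + noise + 0.5)


def empirical_round_up_prob(x: float, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of trials in which noisy_round(x) lands on floor(x) + 1."""
    rng = random.Random(seed)
    up = math.floor(x) + 1
    hits = sum(noisy_round(x, rng) == up for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for x in (0.25, 0.5, 0.75, 1.25):
        frac = x - math.floor(x)
        print(f"x={x:5.2f}  paper p={paper_round_up_prob(x):.3f}  "
              f"noise p~{empirical_round_up_prob(x):.3f}  frac(x)={frac:.3f}")
```

If the round-to-nearest assumption holds, the noise-based column tracks the fractional part of x (about 0.25 at x = 0.25), whereas the quoted formula gives 0.125 at the same point. So while the noise itself is independent of x, the resulting rounding probability still depends on x, just through a different function than the one stated in the paper.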