I've been auditing a lot of JavaScript source code lately, and a common problem I'm seeing when generating random numbers is using the naive "multiply-and-floor" method. Because the "Math.random()" function call returns a number between 0 and 1, not including 1 itself, then developers think that the "best practice" for generating a random number is as follows:
1
2
3 function randNumber(range) {
return Math.floor(Math.random() * range); // number in the interval [0, range).
}
The problem with this approach is that it's biased. There are numbers returned that are more likely to occur than others. To understand this, you need to understand that Math.random() is a 32-bit RNG in Chrome and Safari, and a 53-bit RNG in Edge and Firefox. First, let's pretend every browser RNG is a 32-bit generator, then we'll extend it.
A 32-bit Math.random() means that there are only 232 = 4,294,967,296 possible decimal values in the range of [0, 1). This means that the interval [0, 1) is divided up every "1/232 = 0.00000000023283064365" decimal values. But that doesn't matter though, because if I wanted a random number between 1 and 100, 100 does not divide 4,294,967,296 evenly. I get 42,949,672 with 96 left over. What does this mean? It means that ...
1 | randNumber(100); |
... will favor 96 numbers out of our 100. The 4 least likely results are 24, 49, 74, & 99. That's our bias.
It doesn't matter if it's a 53-bit RNG either. "253 = 9,007,199,254,740,992" is not a multiple of 100. Instead, dividing by 100, I get 90,071,992,547,409 with 92 left over. So, with a 53-bit RNG, we have the same problem where 92 results will be more likely to be generated than 8 others. Those unlucky 8 are 11, 22, 33, 45, 58, 66, 79, and 91.
The only time this bias would not exhibit itself in the naive "multiply-and-floor" approach above, is if the random number requested is in the interval [0, 2N), where "N" is any positive integer. 232, 253, and 2X, where "X" is a positive integer, is always a multiple of 2N (2N divides 2X evenly, when N ≤ X, N > 0).
So, what do we do? How do we improve the naive multiply-and-floor approach? Thankfully, it's not too difficult. All we need to do is essentially the following:
- Force the RNG into 32-bits (common denominator for all browsers).
- Create a range of values that is a multiple of our desired range (E.G.: 1-100).
- Loop over the range picking values until a value inside the range is generated.
- Output the generated value modulo our desired range.
Let's see this in practice. First the unbiased code, then the explanation:
1
2
3
4
5
6
7 function uniformRandNumber(range) {
var max = Math.floor(2**32/range) * range; // make "max" a multiple of "range"
do {
var x = Math.floor(Math.random() * 2**32); // pick a number of [0, 2^32).
} while(x >= max); // try again if x is too big
return(x % range); // uniformly picked in [0, range)
}
I know what you're thinking: WAIT! YOU JUST DID THE "MULTIPLY AND FLOOR" METHOD!! HYPOCRITE!!! Hold on though. There are two subtle differences. See what they are?
The "max" variable is a multiple of "range" (step 2 above). So, if our range is [0, 100), then "max = 4294967200", which is a multiple of 100. This means that so long as "0 < = x < 4294967200", we can return "x % 100", and know that our number was uniformly chosen. However, if "x >= 4294967200", then we need to choose a new "x", and check if it falls within our range again (step 3 above). So long as "x" falls in [0, 4294967200), then we're good.
This extends to cryptographically secure random numbers too. In action, it's just:
1
2
3
4
5
6
7
8 function uniformSecureRandNumber(range) {
const crypto = window.crypto || window.msCrypto; // Microsoft vs everyone else
var max = Math.floor(2**32/range) * range; // make "max" a multiple of "range"
do {
var x = crypto.getRandomValues(new Uint32Array(1))[0]; // pick a number of [0, 2^32).
} while(x >= max); // try again if x is too big
return(x % range); // uniformly picked in [0, range)
}
So it's not that "multiply and floor" is wrong so long as you use it correctly.
One small caveat- these examples are not checking if "range" is larger than 32-bits. I deliberately ignored this to draw your attention on how to correctly generate uniform random numbers. You may or may not need to do various checks on the "range" argument. Is it an integer type? Is it a positive integer? Is it 32-bits or less? Etc.
As an exercise for the reader, how could you extend this uniform generator to pick a random number in the range of [100, 200)? Going further, how could you pick only a random even number in the range of [250, 500)?
Post a Comment