After executing the code
RNGkind(kind="Mersenne-Twister") # the default anyway set.seed(123) n = 10^5 x = runif(n) print(x[22662] == x[97974])
TRUE
is output!
If I use, e.g., RNGkind(kind="Knuth-TAOCP-2002")
similarly happens: I get "only" 99 995 different values in x
. Given the periods of both random generators, the results seem highly unlikely.
Am I doing something wrong? I need to generate at least one million random numbers.
I am using Windows 8.1 with R version 3.6.2; Platform: x86_64-w64-mingw32/x64 (64-bit) and RStudio 1.2.5033.
Additional findings:
- Having a bag with $n$ different balls, we choose a ball $m$ times and put it back every time. The probability $p_{n, m}$ that all chosen balls are different is equal to ${nchoose m} / (n^m m!)$.
- R documentation points to a link where the implementation of Mersenne-Twister for 64-bit machines is available: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html
The uniform sampling from $[0, 1]$ interval is obtained via choosing a random 64-bit integer first, so I computed the above probabilities for the 64-bit and (when $p_{2^{64}, 10^5}$ turned out to be rather low) 32-bit case:
$$
p_{2^{64}, 10^5}doteq 0.9999999999972… qquad p_{2^{32}, 10^5} doteq 0.3121…
$$
Then, I tried 1000 random seeds and compute the proportion of the cases when all generated numbers are different: 0.303.
So, currently, I assume that for some reason, 32-bit integers are actually used.
Best Answer
The documentation of R on random number generation has a few sentences at its end, that confirm your expectation of 32-bit integers being used and might explain what you are observing:
Do not rely on randomness of low-order bits from RNGs. Most of the supplied uniform generators return 32-bit integer values that are converted to doubles, so they take at most 2^32 distinct values and long runs will return duplicated values (Wichmann-Hill is the exception, and all give at least 30 varying bits.)
So the implementation in R seems to be different to what is explained on the website of the authors of the Mersenne Twister. Possibly combining this with the Birthday paradox, you would expect duplicates with only 2^16 numbers at a probability of 0.5, and 10^5 > 2^16. Trying the Wichmann-Hill algorithm as suggested in the documentation:
RNGkind(kind="Wichmann-Hill") set.seed(123) n = 10^8 x = runif(n) length(unique(x)) # 1e8
Note that the original Wichmann-Hill random number generator has the property that its next number can be predicted by its previous, and therefore does not meet non-predictability requirements of a valid PRNG. See this document by Dutang and Wuertz, 2009 (section 3)
Similar Posts:
- Solved – Natural example of bad results with a Lehmer Random Number Generator
- Solved – Is there such a thing as a “good/bad” seed in pseudo-random number generation
- Solved – limit to the SPSS random number generator
- Solved – Generating random number falling in a range with given probabilities
- Solved – References and Best practices for setting seeds in pseudo-Random Number Generation