I am currently working on a project where I generate random values using low discrepancy / quasi-random point sets, such as Halton and Sobol point sets. These are essentially $d$-dimensional vectors that mimic a $d$-dimensional uniform(0,1) variables, but have a better spread. In theory, they are supposed to help reduce the variance of my estimates in another part of the project.
Unfortunately, I've been running into issues working with them and much of the literature on them is dense. I was therefore hoping to get some insight from someone who has experience with them, or at least figure out a way to empirically assess what is going on:
If you have worked with them:
What exactly is scrambling? And what effect does it have on the stream of points that are generated? In particular, is there an effect when the dimension of the points that are generated increases?
Why is it that if I generate two streams of Sobol points with MatousekAffineOwen scrambling, I get two different streams of points. Why is this not the case when I use reverse-radix scrambling with Halton points? Are there other scrambling methods that exist for these point sets – and if so, is there a MATLAB implementation of them?
If you have not worked with them:
- Say I have $n$ sequences $S_1, S_2, ldots,S_n$ of supposedly random numbers, what type of statistics should I use to show that they are not correlated to each other? And what number $n$ would I need to prove that my result is statistically significant? Also, how could I do the same thing if I had $n$ sequences $S_1, S_2, ldots,S_n$ of $d$-dimensional random $[0,1]$ vectors?
Follow-Up Questions on Cardinal's Answer
Theoretically speaking, can we pair any scrambling method with any low discrepany sequence? MATLAB only allows me to apply reverse-radix scrambling on Halton sequences, and am wondering whether that's simply an implementation issue or a compability issue.
I am looking for a way that will allow me to generate two (t,m,s) nets that are uncorrelated with each another. Will MatouseAffineOwen allow me to do this? How about if I used a deterministic scrambling algorithm and simply decided to choose every 'kth' value where k was a prime?
Best Answer
Scrambling is usually an operation applied to a $(t,m,s)$ digital net which uses some base $b$. Sobol' nets use $b = 2$, for example, while Faure nets use a prime number for $b$.
The purpose of scrambling is to (hopefully) get an even more uniform distribution, especially if you can only use a small number of points. A good example to see why this works is to look at the Halton sequence in $d = 2$ and choose two "largish" primes, like 29 and 31. The square gets filled in very slowly using the standard Halton sequence. But, with scrambling, it is filled in more uniformly much more quickly. Here is a plot for the first couple hundred points using a deterministic scrambling approach.
The most basic forms of scrambling essentially permute the base $b$ digit expansions of the original $n$ points among themselves. For more details, here is a clear exposition.
The nice thing about scrambling is that if you start with a $(t,m,s)$ net and scramble it, you get a $(t,m,s)$ net back out. So, there is a closure property involved. Since you want to use the theoretical benefits of a $(t,m,s)$ net in the first place, this is very desirable.
Regarding types of scrambling, the reverse-radix scramble is a deterministic scrambling. The Matousek scrambling algorithm is a random scramble, done, again, in a way to maintain the closure property. If you set the random seed before you make the call to the scrambling function, you should always get the same net back.
You might also be interested in the MinT project.