I have a complete beginner's question about random walks.
As per this paper
Random walk – the stochastic process formed by successive summation of independent, identically distributed random variable....
I really cannot get past the first line. I thought that the central limit theorem and the law of large numbers state that the mean of a large number of independent random processes will approximate a normal process. But the paper seems to indicate that the summation of these random processes is a random walk? So the mean is a normal process but the sum is a random walk? Is this correct?
Best Answer
You need to know what a stochastic process is. In this context, it's just a collection of random variables $(X_0, X_1, X_2, \ldots)$.
Seeing a simple worked example may help. Let's set it up. Suppose you have a collection of independent variables $\mathbf Y = (Y_0, Y_1, \ldots)$, all with the same distribution. For instance, each $Y_i$ could represent the flip of a fair coin using (say) $1$ for heads and $0$ for tails. That's a stochastic process (which we could call a "Bernoulli process").
You can construct new processes out of old. One way is to convert $\mathbf Y$ into its cumulative sum
$$\mathbf X = (Y_0, Y_0+Y_1, Y_0+Y_1+Y_2, \ldots)$$
This is a random walk.
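To make the construction concrete, here is a minimal Python sketch that draws a finite stretch of a Bernoulli process and turns it into a random walk by cumulative summation. The function names `bernoulli_process` and `random_walk`, the length of 10, and the seed are illustrative choices of mine, not anything taken from the paper.

```python
import random
from itertools import accumulate


def bernoulli_process(n, p=0.5, rng=None):
    """Return n independent coin flips: 1 with probability p, else 0."""
    rng = rng or random.Random()
    return [1 if rng.random() < p else 0 for _ in range(n)]


def random_walk(steps):
    """Cumulative sums of the steps: (Y_0, Y_0+Y_1, Y_0+Y_1+Y_2, ...)."""
    return list(accumulate(steps))


rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
Y = bernoulli_process(10, rng=rng)
X = random_walk(Y)
print("Y:", Y)
print("X:", X)
```

Each entry of `X` differs from the previous one by the corresponding entry of `Y`, which is all that "successive summation" means here.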
As an example, let's consider a finite random walk of length $3$ based on fair coins. That Bernoulli process $\mathbf Y$ has eight possible outcomes, aka "walks" or "paths," each with equal probabilities of $1/8$:
$$(0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1).$$
The associated paths of $\mathbf X$, computed by taking cumulative sums, are therefore
$$(0,0,0), (0,0,1), (0,1,1), (0,1,2), (1,1,1), (1,1,2), (1,2,2), (1,2,3).$$
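The two lists above can be reproduced by brute-force enumeration. This is only a sketch; the variable names `outcomes` and `paths` are mine.

```python
from itertools import accumulate, product

# All 2**3 = 8 equally likely outcomes of three fair coin flips (Y_0, Y_1, Y_2).
outcomes = list(product((0, 1), repeat=3))

# The matching paths of X are the cumulative sums of each outcome.
paths = [tuple(accumulate(y)) for y in outcomes]

for y, x in zip(outcomes, paths):
    print(y, "->", x)
```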
If you like, you can now identify the component variables $X_i$. For instance, $X_0$ takes on the value $0$ four times, for a total probability of $4\times 1/8=1/2$, and the value $1$ four times, for a total probability of $1/2$. $X_1$ takes on the values $0, 1,$ and $2$ with probabilities $1/4, 1/2, 1/4$, respectively. And $X_2$ takes on the values $0,1,2,3$ with probabilities $1/8, 3/8, 3/8, 1/8$, respectively. Notice that these three variables do not have identical distributions. The distributions have different means and variances, too: their means are $1/2, 1, 3/2$ (in order) and their variances are $1/4, 1/2, 3/4$ (in order).
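These marginal distributions, means, and variances can be checked exactly by counting over the eight paths. The sketch below uses exact fractions rather than floating point; the helper names `marginal`, `mean`, and `variance` are my own.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# The eight equally likely paths of X for three fair coin flips.
paths = [tuple(accumulate(y)) for y in product((0, 1), repeat=3)]
path_prob = Fraction(1, len(paths))


def marginal(i):
    """Distribution of X_i as a dict {value: probability}."""
    counts = Counter(path[i] for path in paths)
    return {value: count * path_prob for value, count in sorted(counts.items())}


def mean(dist):
    return sum(value * p for value, p in dist.items())


def variance(dist):
    m = mean(dist)
    return sum((value - m) ** 2 * p for value, p in dist.items())


for i in range(3):
    dist = marginal(i)
    shown = {value: str(p) for value, p in dist.items()}
    print(f"X_{i}: {shown}  mean={mean(dist)}  variance={variance(dist)}")
```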
The component variables in a random walk also are dependent. For instance, given that $X_1=0$ (which occurs only in the paths $(0,0,0)$ and $(0,0,1)$), the chance that $X_2=0$ is $1/2$. But given that $X_1=1$, the chance that $X_2=0$ is now zero: it's just not possible. Because these conditional probabilities vary with the value of $X_1$, $X_1$ and $X_2$ are not independent. In fact, no pair of these component variables is independent.
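The dependence can be verified the same way. The sketch below computes the two conditional probabilities just mentioned and then tests every pair $(X_i, X_j)$ for independence by comparing joint probabilities with products of marginals. The helpers `prob`, `cond_prob`, and `independent` are hypothetical names introduced for this illustration.

```python
from fractions import Fraction
from itertools import accumulate, product

# The eight equally likely paths of X for three fair coin flips.
paths = [tuple(accumulate(y)) for y in product((0, 1), repeat=3)]


def prob(event):
    """Probability of an event, given as a predicate on a path."""
    return Fraction(sum(1 for x in paths if event(x)), len(paths))


def cond_prob(event, given):
    """P(event | given), assuming P(given) > 0."""
    return prob(lambda x: event(x) and given(x)) / prob(given)


def independent(i, j):
    """True if P(X_i=a, X_j=b) = P(X_i=a) P(X_j=b) for all values a, b."""
    values = range(4)  # every X_i takes values in {0, 1, 2, 3}
    return all(
        prob(lambda x: x[i] == a and x[j] == b)
        == prob(lambda x: x[i] == a) * prob(lambda x: x[j] == b)
        for a in values
        for b in values
    )


print(cond_prob(lambda x: x[2] == 0, lambda x: x[1] == 0))  # 1/2
print(cond_prob(lambda x: x[2] == 0, lambda x: x[1] == 1))  # 0
print([independent(i, j) for i, j in ((0, 1), (0, 2), (1, 2))])
```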
The Central Limit Theorem makes a statement about the distribution of $X_n$, suitably centered and scaled, when $n$ gets very large. Besides assuming the $Y_i$ (out of which the $X_n$ are constructed) are independent and identically distributed, it has to assume that this common distribution has a finite variance. The concept of a stochastic process is separate from any idea of limits (which wouldn't even make sense for a finite process, as in the example). The CLT holds only for very special processes.
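For the coin-flip walk, the CLT statement can be illustrated numerically without simulation, because the sum of $m$ fair flips has an exact binomial distribution (in the indexing above, $X_n$ is the sum of $m = n+1$ flips). The sketch below standardizes that sum by its mean $m/2$ and standard deviation $\sqrt{m}/2$, then measures the largest gap between its CDF and the standard normal CDF at the support points. The function name `max_cdf_gap` and the choice of $m$ values are assumptions made for the illustration.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_cdf_gap(m):
    """Largest |P(S_m <= k) - Phi((k - m/2) / (sqrt(m)/2))| over k = 0..m,
    where S_m is the sum of m fair coin flips (exact binomial)."""
    mean, sd = m / 2, math.sqrt(m) / 2
    total = 2 ** m
    cumulative = 0  # exact integer count of outcomes with sum <= k
    gap = 0.0
    for k in range(m + 1):
        cumulative += math.comb(m, k)
        gap = max(gap, abs(cumulative / total - normal_cdf((k - mean) / sd)))
    return gap


for m in (10, 100, 1000):
    print(m, max_cdf_gap(m))
```

The gap shrinks as $m$ grows, which is the sense in which the standardized $X_n$ approaches a normal distribution. Each individual $X_n$ is still just one component of the random walk.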