I'm working through the proof of why exponential smoothing is a biased estimator of a linear trend.

The book is describing the expected value of an exponentially smoothed time series, and there is one step I'm having trouble following.

For infinite sums, the book claims that the following applies:

$$\sum_{t=0}^\infty (1-\lambda)^t=\frac{1}{1-(1-\lambda)}=\frac{1}{\lambda}$$ and

$$\sum_{t=0}^\infty (1-\lambda)^t t=\frac{1-\lambda}{\lambda^2}$$

The first part I think I understand: this is just the geometric sum, and the second term in the numerator of the partial-sum formula goes to $0$ as $t$ goes to infinity.

That second expression I don't understand, however.

Why can the second expression be written like this? Can $t$ be written as a function of $\lambda$?
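(Not from the book, but here is a quick numerical sanity check I wrote in Python: truncate both series at a large $T$ and compare against the claimed closed forms. The variable names are my own.)

```python
# Truncate the two series at T and compare with the book's closed forms.
lam = 0.3          # any value with 0 < lam < 1
T = 10_000         # truncation point; the tails are negligible here

s1 = sum((1 - lam) ** t for t in range(T))          # sum of (1-lam)^t
s2 = sum((1 - lam) ** t * t for t in range(T))      # sum of (1-lam)^t * t

assert abs(s1 - 1 / lam) < 1e-9                     # claimed: 1/lam
assert abs(s2 - (1 - lam) / lam ** 2) < 1e-9        # claimed: (1-lam)/lam^2
```

So both identities check out numerically; my question is why the second one holds.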


#### Best Answer

**Because this is a statistics site, let's develop a purely statistical solution.**

The first formula in the question correctly observes that

$$\lambda + \lambda(1-\lambda)^1 + \lambda(1-\lambda)^2 + \lambda(1-\lambda)^3 + \cdots = 1,$$

implicitly assuming $|1-\lambda| \lt 1$. For real numbers $0 \lt \lambda \lt 1$, this exhibits $1$ as the sum of a series of non-negative values. That allows us to view these values as *probabilities*. (This particular set of numbers is a Geometric Distribution.)

What could they be the probabilities of? Consider a long, wide rectangular dartboard. The left portion, covering a fraction $\lambda$ of it, is colored red (this is what you would like to hit), while the right portion, covering the remaining $1-\lambda$, is colored blue. You plan to throw darts at this board until one hits in the red.

Suppose you are a poor dart shooter, just barely good enough to ensure the darts hit the board, but otherwise you have no control over where on the board they land. Let "$t$" stand for the number of tosses you make *in toto*. According to the axioms of probability, any sequence of $t \ge 1$ independent random dart tosses (each having an equal probability of hitting any part of the dartboard) that lands $t-1$ times in the blue and finally in the red has a chance of

$$(1-\lambda)\cdots(1-\lambda)\lambda = (1-\lambda)^{t-1}\lambda$$

of occurring: this simply is the product of the individual chances, $1-lambda$ for blue and $lambda$ for red. These are the same probabilities as above. By definition, the *expectation* of the number of blue hits in such a sequence is the sum of the probability-weighted counts of blue hits; to wit,

$$\lambda(1-\lambda)^0(0) + \lambda(1-\lambda)^1(1) + \lambda(1-\lambda)^2(2) + \cdots = \lambda\sum_{t=0}^\infty (1-\lambda)^t t.$$

Up to a factor of $\lambda$, this is what we would like to compute.

The Weak Law of Large Numbers (which is intuitively obvious and was first proved by Jakob Bernoulli in the late 17th century) tells us that the expectation can be approached arbitrarily closely by conducting this experiment repeatedly, so let's do so. Throw the darts until one lands in the red. Let $b_1$ be the number that land in the blue. Let $b_2$ be the number in the blue during the second trial, and so on, up through $b_n$.

*In the figure, which shows the holes left when the darts were pulled out, $n=25$ trials have resulted in $100$ darts being thrown, of which $75$ landed in the blue.*

The mean number of blue hits in these trials is, by definition,

$$\frac{1}{n}(b_1+b_2+\cdots+b_n) = \frac{b_1+b_2+\cdots+b_n}{1+1+\cdots+1}.$$

In other words, it's the ratio of the number of darts landing in the blue to the number landing in the red. But because the darts land uniformly at random, in the limit this ratio must approach the ratio of the blue area to the red area, namely $(1-\lambda):\lambda$. Thus

$$\lambda \sum_{t=0}^\infty (1-\lambda)^t t = \frac{1-\lambda}{\lambda}.$$

Dividing both sides by $\lambda$ gives the answer!
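The dart experiment is easy to simulate. Here is a minimal sketch (my own illustration, not anything from the question's book): each trial throws darts until one lands in the red strip of width $\lambda$, and the average number of blue hits per trial should approach $(1-\lambda)/\lambda$.

```python
import random

random.seed(0)
lam = 0.25          # width of the red strip; blue covers the remaining 0.75
n_trials = 200_000

blue_hits = 0
for _ in range(n_trials):
    # Throw darts until one lands in the red (probability lam per toss).
    while random.random() >= lam:
        blue_hits += 1

# Mean blue hits per trial; should be close to (1 - lam)/lam = 3 here.
mean_blue = blue_hits / n_trials
```

With $200{,}000$ trials the estimate lands within a few hundredths of the exact value $3$, exactly as the Weak Law of Large Numbers promises.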

**This question might also benefit from some elementary mathematical answers.** To that end, notice that whenever $|\lambda-1| \lt 1$, the series

$$S(\lambda) = \sum_{t=0}^\infty (1-\lambda)^t t = (1-\lambda) + 2(1-\lambda)^2 + \cdots + t(1-\lambda)^t + \cdots$$

converges absolutely. (It eventually is dominated by a geometric series with common ratio less than $1$.) This implies we may freely re-arrange its terms when doing arithmetic with it, as in the following calculation:

$$\eqalign{ \lambda S(\lambda) &= S(\lambda) - (1-\lambda)S(\lambda) \\ &= (1-\lambda) + 2(1-\lambda)^2 + 3(1-\lambda)^3 + \cdots - \left((1-\lambda)^2 + 2(1-\lambda)^3 + 3(1-\lambda)^4 + \cdots \right)\\ &= (1-\lambda) + (2-1)(1-\lambda)^2 + (3-2)(1-\lambda)^3 + \cdots \\ &= (1-\lambda)\left(1 + (1-\lambda)^1 + (1-\lambda)^2 + \cdots\right)\\ &= (1-\lambda)\sum_{t=0}^\infty (1-\lambda)^t \\ &= \frac{1-\lambda}{\lambda}, }$$

as stated in the question. Because $|1-\lambda| \lt 1$, $\lambda$ is nonzero, so we may divide both sides by $\lambda$ to produce the equality

$$S(\lambda) = \frac{1}{\lambda}\cdot\frac{1-\lambda}{\lambda} = \frac{1-\lambda}{\lambda^2},$$

*QED.*
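The telescoping step above can be checked mechanically. In this sketch (an illustration of mine, with made-up variable names), subtracting the shifted series term by term leaves a plain geometric series, so $\lambda S(\lambda)$ must match $(1-\lambda)/\lambda$:

```python
lam = 0.4
x = 1 - lam
T = 5_000           # truncation point; the discarded tail is negligible

# S(lam) truncated at T terms.
S = sum(t * x ** t for t in range(T))

# What remains after the telescoping: (1 - lam) times a geometric series.
telescoped = sum(x ** (t + 1) for t in range(T))

assert abs(lam * S - telescoped) < 1e-9           # the telescoping identity
assert abs(telescoped - (1 - lam) / lam) < 1e-9   # geometric series value
```

Dividing the final value by $\lambda$ reproduces $(1-\lambda)/\lambda^2$, as in the algebraic derivation.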

**Another solution** notes that for integral $t \ge 1$,

$$\binom{-2}{t-1} = \frac{-2(-3)\cdots(-2-(t-1)+1)}{1(2)(3)\cdots(t-1)} = \frac{(-1)^{t-1}\,t!}{(t-1)!} = (-1)^{t-1}t.$$

Recall the Binomial Theorem asserts that when $|x| \lt 1$ and $n$ is any number at all, then

$$(1 + x)^n = \sum_{t=0}^\infty \binom{n}{t}x^t = \sum_{t=1}^\infty \binom{n}{t-1}x^{t-1}.$$

Taking $n=-2$ and $x=1-lambda$ gives

$$\frac{1-\lambda}{\lambda^2} = (1-\lambda)(1 - (1-\lambda))^{-2} = (1-\lambda)\sum_{t=1}^\infty \binom{-2}{t-1}(-(1-\lambda))^{t-1} = \sum_{t=0}^\infty (1-\lambda)^{t}t.$$
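The generalized binomial coefficient identity $\binom{-2}{t-1} = (-1)^{t-1}t$ can also be verified directly. The helper `gbinom` below is my own (it is not in the standard library, since `math.comb` rejects negative arguments):

```python
from math import prod

def gbinom(n, k):
    """Generalized binomial coefficient n(n-1)...(n-k+1)/k! for integer k >= 0."""
    numerator = prod(n - i for i in range(k))     # falling factorial of n
    denominator = prod(range(1, k + 1))           # k! (empty product = 1)
    return numerator / denominator

# Check the identity binom(-2, t-1) = (-1)^(t-1) * t for small t.
for t in range(1, 10):
    assert gbinom(-2, t - 1) == (-1) ** (t - 1) * t
```

Feeding these coefficients into the Binomial Theorem with $x = -(1-\lambda)$, as above, recovers the series for $(1-\lambda)/\lambda^2$.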
