I am working on a dataset of presence/absence data, with my response variable being 'proportion of sites where X is present'. I have been asked to provide standard deviations alongside the mean proportions. However, it appears to me that the standard deviation of a binomial dataset is a polynomial function of the proportion itself and does not grant additional information about the variability of the underlying data. For example, if a proportion from data is 0.3, it should not matter whether that proportion was derived from presence/absence data from 10, 100, or 100,000 sites: the standard deviation should be the same.

When I make a sample dataset and graph mean proportion vs st dev, I can model it with a 6th order polynomial function with an R squared of 1.00.

So, can someone confirm my suspicion: that standard deviations are an inherent property of the proportion in a binomial dataset, and thus yield no additional information about the dataset from which that proportion came?

#### Best Answer

If you have a binomial random variable $X$ of size $N$ with success probability $p$, i.e. $X \sim \mathrm{Bin}(N; p)$, then the mean of $X$ is $Np$ and its variance is $Np(1-p)$, so, as you say, the variance is a second-degree polynomial in $p$. Note, however, that **the variance also depends on $N$!** This is important for estimating $p$:

If you observe 30 successes in 100 trials, then the fraction of successes is 30/100, which is the number of successes divided by the size of the binomial, i.e. $\frac{X}{N}$.

But if $X$ has mean $Np$, then $\frac{X}{N}$ has a mean equal to the mean of $X$ divided by $N$, because $N$ is a constant. In other words, $\frac{X}{N}$ has mean $\frac{Np}{N}=p$. This implies that the observed fraction of successes is an unbiased estimator of the probability $p$.

To compute the variance of the estimator $\frac{X}{N}$, we have to divide the variance of $X$ by $N^2$ (the variance of a variable divided by a constant is the variance of the variable divided by the **square** of the constant), so the variance of the estimator is $\frac{Np(1-p)}{N^2}=\frac{p(1-p)}{N}$. The standard deviation of the estimator is the square root of the variance, so it is $\sqrt{\frac{p(1-p)}{N}}$.
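The formula can be checked by simulation. The following is only a minimal sketch using the Python standard library; the function names `theoretical_sd` and `simulated_sd`, the number of repetitions, and the seed are illustrative choices, not part of the original answer.

```python
import math
import random
import statistics


def theoretical_sd(p, n):
    """Standard deviation of the estimator X/N when X ~ Bin(N, p)."""
    return math.sqrt(p * (1 - p) / n)


def simulated_sd(p, n, reps=2000, seed=0):
    """Empirical standard deviation of X/N over `reps` simulated samples."""
    rng = random.Random(seed)
    # Each proportion is the count of successes in n Bernoulli(p) trials, divided by n.
    proportions = [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(reps)
    ]
    return statistics.pstdev(proportions)


p = 0.3
for n in (10, 100, 1000):
    # The two values should agree closely, and both shrink as n grows.
    print(n, round(theoretical_sd(p, n), 4), round(simulated_sd(p, n), 4))
```

With the same $p = 0.3$, the spread of the estimated proportion shrinks roughly by a factor of $\sqrt{10}$ each time the number of sites grows tenfold.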

So, if you toss a coin 100 times and observe 49 heads, then $\frac{49}{100}$ is an estimate of the probability of tossing heads with that coin, and the standard deviation of this estimate is $\sqrt{\frac{0.49\times(1-0.49)}{100}} \approx 0.050$.

If you toss the coin 1000 times and observe 490 heads, then you again estimate the probability of tossing heads at $0.49$, and the standard deviation at $\sqrt{\frac{0.49\times(1-0.49)}{1000}} \approx 0.016$.

Obviously, in the second case the standard deviation is smaller, so the estimator becomes more precise as you increase the number of tosses.
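The two coin-toss cases can be worked out numerically as follows. This is a small sketch; the helper name `proportion_se` is made up for illustration.

```python
import math


def proportion_se(successes, trials):
    """Estimated standard deviation of the proportion successes/trials."""
    p_hat = successes / trials
    return math.sqrt(p_hat * (1 - p_hat) / trials)


se_100 = proportion_se(49, 100)     # 49 heads in 100 tosses
se_1000 = proportion_se(490, 1000)  # 490 heads in 1000 tosses

print(f"{se_100:.4f}")             # 0.0500
print(f"{se_1000:.4f}")            # 0.0158
print(f"{se_100 / se_1000:.3f}")   # 3.162, i.e. sqrt(10)
```

The estimated proportion is 0.49 in both cases, yet ten times as many tosses reduce the standard deviation of the estimate by a factor of about $\sqrt{10}$.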

**You can conclude that, for a binomial random variable, the variance is a quadratic polynomial in $p$, but it also depends on $N$, and I think that the standard deviation does contain information beyond the success probability.**

In fact, the binomial distribution has two parameters, and you will always need at least two moments (in this case the mean, i.e. the first moment, and the standard deviation, i.e. the square root of the second central moment) to fully identify it.

P.S. A somewhat more general development, also covering the Poisson binomial distribution, can be found in my answer to "Estimate accuracy of an estimation on Poisson binomial distribution".
