Effective sample size

In statistics, the effective sample size of a correlated (or weighted) sample is the size of an uncorrelated, unweighted sample that would estimate the mean with the same variance.^[1]

Correlated samples

Suppose $n$ independent samples $y_{1},\dots ,y_{n}$ are drawn from a distribution with mean $\mu$ and standard deviation $\sigma$. Then the mean of this distribution is usually estimated by the sample mean

{\hat {\mu }}={\frac {1}{n}}\sum _{i=1}^{n}y_{i}

In that case, the variance of ${\hat {\mu }}$ is given by

Var({\hat {\mu }})={\frac {\sigma ^{2}}{n}}

However, if the samples are positively correlated, then $Var({\hat {\mu }})$ is higher. For instance, if all samples are completely correlated ($\rho _{(i,j)}=1$), then $Var({\hat {\mu }})=\sigma ^{2}$ regardless of $n$.

The effective sample size $n_{\text{eff}}$ is the unique value (not necessarily an integer) such that

Var({\hat {\mu }})={\frac {\sigma ^{2}}{n_{\text{eff}}}}

The effective sample size $n_{\text{eff}}$ is a function of the correlation between samples. Suppose that all the correlations are the same, i.e. $\rho _{(i,j)}=\rho$ whenever $i\neq j$. In that case, if $\rho =0$, then $n_{\text{eff}}=n$. Similarly, if $\rho =1$, then $n_{\text{eff}}=1$. More generally,

n_{\text{eff}}={\frac {n}{1+(n-1)\rho }}
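The uniform-correlation formula follows from expanding the variance of the sample mean. The short derivation below is a sketch that assumes every sample has the same variance $\sigma ^{2}$, so that $Cov(y_{i},y_{j})=\rho \sigma ^{2}$ for $i\neq j$ (the notation $Cov$ is introduced here for illustration).

```latex
\begin{aligned}
Var({\hat {\mu }}) &= \frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n} Cov(y_{i},y_{j}) \\
&= \frac{1}{n^{2}}\left(n\sigma^{2} + n(n-1)\rho\sigma^{2}\right) \\
&= \frac{\sigma^{2}}{n}\left(1+(n-1)\rho\right)
\end{aligned}
```

Setting this equal to $\sigma ^{2}/n_{\text{eff}}$ and solving for $n_{\text{eff}}$ gives the expression above.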

The case where the correlations are not uniform is somewhat more complicated. Note that if correlations are negative, the effective sample size may be larger than the actual sample size. For the equally weighted mean ${\hat {\mu }}$ defined above, positive correlations can only increase $Var({\hat {\mu }})$, so $n_{\text{eff}}>n$ requires at least one negative correlation. Correlation matrices with all positive entries can yield $n_{\text{eff}}>n$ only if the mean is instead estimated with optimally chosen, unequal weights. Intuitively, $n_{\text{eff}}$ may be thought of as the information content of the observed data.
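As a rough illustration of the non-uniform case, applying the same definition to the equally weighted mean gives $n_{\text{eff}}=n^{2}/\sum _{i,j}\rho _{(i,j)}$, assuming all samples share the variance $\sigma ^{2}$. The Python sketch below computes this quantity from a correlation matrix. The function names `effective_sample_size` and `uniform_corr` are illustrative and not taken from any library.

```python
def effective_sample_size(corr):
    """Effective sample size of the equally weighted mean.

    Assumes every sample has the same variance sigma^2, so that
    Var(mean) = sigma^2 * sum(corr) / n^2 and n_eff = n^2 / sum(corr).
    `corr` is a square correlation matrix given as a list of rows.
    """
    n = len(corr)
    if any(len(row) != n for row in corr):
        raise ValueError("corr must be a square matrix")
    total = sum(sum(row) for row in corr)
    return n * n / total


def uniform_corr(n, rho):
    """Correlation matrix with 1 on the diagonal and rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


n = 5
n_eff_independent = effective_sample_size(uniform_corr(n, 0.0))  # equals n
n_eff_positive = effective_sample_size(uniform_corr(n, 0.5))     # n / (1 + (n - 1) * 0.5)
n_eff_negative = effective_sample_size(uniform_corr(n, -0.2))    # larger than n
```

For uniform correlations the result agrees with $n/(1+(n-1)\rho )$, and the negative-correlation case shows an effective sample size above $n$.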

Weighted samples

If the data have been weighted, with each sample $y_{i}$ assigned a weight $w_{i}$, then some samples effectively count as repeated draws that are 100% correlated with a previous sample. The resulting quantity is known as Kish's effective sample size:^[2]

n_{\text{eff}}={\frac {\left(\sum _{i=1}^{n}w_{i}\right)^{2}}{\sum _{i=1}^{n}w_{i}^{2}}}
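Kish's formula is straightforward to evaluate directly. The following Python sketch is one minimal way to do so; the function name `kish_effective_sample_size` is hypothetical, and the weights are assumed to be plain numbers with at least one non-zero entry.

```python
def kish_effective_sample_size(weights):
    """Kish's effective sample size: (sum of w)^2 / (sum of w^2)."""
    weights = list(weights)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    if sum_w2 == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum_w ** 2 / sum_w2


equal = kish_effective_sample_size([1, 1, 1, 1])    # equal weights: n_eff = n = 4
unequal = kish_effective_sample_size([3, 1, 1, 1])  # 36 / 12 = 3
```

Equal weights give $n_{\text{eff}}=n$, unequal weights reduce it, and rescaling all weights by a common factor leaves the result unchanged.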

References

[1] Tom Leinster (December 18, 2014). "Effective Sample Size".
[2] "Design Effects and Effective Sample Size".
