William Cochran's book on sampling gives the following proof that a sample mean is unbiased: Since every unit appears in the same number of samples, it is clear that $E[Y_1 + \cdots + Y_n]$ must be some multiple of $y_1+y_2+\cdots+y_N$. The multiplier must be $n/N$ since the first expression has $n$ terms and the second has $N$ terms. I am trying to understand the last sentence, about the number of terms. The expectation here is taken over all samples of size $n$ (all of equal probability) out of a population of size $N$.
You misquoted Cochran. I see only a small excerpt via Google Books, but it is clear: He wrote $$ E(y_1+y_2+\cdots+y_n) \text{ must be some multiple of } (y_1 + y_2 + \cdots + y_N) $$ He had lower-case $y$s throughout.
Commented Apr 30, 2017 at 0:21

@MichaelHardy That was intentional, since his notation is, in my experience, nonstandard. The $y$s on the lhs are random; those on the rhs are not. (I suppose you could consider both sides as random, but the rhs is still deterministic since the population size is $N$.)
Commented Apr 30, 2017 at 0:57

Actually, he ought to have expressed it the way you did.

Commented Apr 30, 2017 at 3:34

Let $y_1,\dots,y_N$ denote the values of some attribute of the population units $1,\dots,N$. In randomization theory, the $y_i$'s are not random variables, but fixed quantities. A random sample is determined by (dependent) indicator random variables $Z_1,\dots,Z_N\in\{0,1\}$ such that $Z_1+\dots+Z_N$ is equal to the sample size $n$. Each $Z_i$ determines whether the respective population unit is included ($Z_i=1$) or not ($Z_i=0$) in the sample. The sample mean is the random variable $$ \bar{y} = \frac{1}{n}\sum_{i=1}^N y_i Z_i. $$

For a simple random sample, if the $i$-th population unit is included in the sample, then the other $n-1$ sample units must be chosen from the remaining $N-1$ population units. Hence, the probability $\Pr\{Z_i=1\}$ is equal to the number of samples of size $n$ which include $i$, given by $\binom{N-1}{n-1}$, divided by the number of size $n$ samples, given by $\binom{N}{n}$: $$ \Pr\{Z_i=1\} = \frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{n}{N}. $$

Since $\mathrm{E}[Z_i] = \Pr\{Z_i=1\}$, it follows that $$ \mathrm{E}[\bar{y}] = \frac{1}{n}\sum_{i=1}^N y_i \mathrm{E}[Z_i] = \frac{1}{N}\sum_{i=1}^N y_i = \bar{Y}. $$ Therefore, the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$.
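The inclusion probability $n/N$ and the unbiasedness claim above can be checked by brute force for a small case. The following is only a sketch: the helper names and the population values in `y` are made up for illustration, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def inclusion_probability(N, n, i):
    """Pr{Z_i = 1}, found by enumerating all equally likely samples of size n."""
    samples = list(combinations(range(N), n))
    hits = sum(1 for s in samples if i in s)
    return Fraction(hits, len(samples))


def expected_sample_mean(y, n):
    """Exact expectation of the sample mean over all size-n samples of y."""
    samples = list(combinations(y, n))
    total = sum(Fraction(sum(s), n) for s in samples)
    return total / len(samples)


N, n = 5, 3
y = [2, 7, 1, 8, 3]  # hypothetical population values, chosen arbitrarily

probs = [inclusion_probability(N, n, i) for i in range(N)]
closed_form = Fraction(comb(N - 1, n - 1), comb(N, n))  # the binomial ratio
pop_mean = Fraction(sum(y), N)

print(probs, closed_form)                   # every entry should equal n/N
print(expected_sample_mean(y, n), pop_mean)  # the two means should agree
```

Enumeration is only feasible for small $N$, but it makes the role of the equal inclusion probabilities easy to see.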
answered Apr 29, 2017 at 22:51

Thanks for the proof. But actually I am trying to understand Cochran's particular proof, specifically his remark about the terms on each side.
Commented Apr 29, 2017 at 23:00

I probably should have stated in the question that Cochran gives a rigorous proof. But he also gives this alternative proof that I was trying to make sense of.
Commented Apr 30, 2017 at 0:58

"The number $A$ must be some multiple of the number $B$" is true because the number $A/B$ exists, unless $B=0$. So Cochran's meaning may take some effort to discern.
Suppose $N=5$ and $n=3$. Then the possible samples (assuming sampling is without replacement) give these sums: \begin{align} & y_1 + y_2 + y_3 \\[3pt] & y_1 + y_2 + y_4 \\[3pt] & y_1 + y_2 + y_5 \\[3pt] & y_1 + y_3 + y_4 \\[3pt] & y_1 + y_3 + y_5 \\[3pt] & y_1 + y_4 + y_5 \\[3pt] & y_2 + y_3 + y_4 \\[3pt] & y_2 + y_3 + y_5 \\[3pt] & y_2 + y_4 + y_5 \\[3pt] & y_3 + y_4 + y_5 \end{align}

There are $10$ of these, all equally likely, so the expected value of the sample sum is the average of these $10$ sums. The number $y_1$ appears in six of these sums, $y_2$ appears in six of them, and so on. Thus "every unit appears in the same number of samples"; in this case, that "same number" is $6$. The total of the ten sums is therefore $6y_1+6y_2+6y_3+6y_4+6y_5.$ Dividing by $10$ gives the average, or expected value, of the sum of a random sample of size $3$. Hence \begin{align} \text{expected sum} &= \frac{6y_1+6y_2+6y_3+6y_4+6y_5}{10} \\[10pt] &= \frac 6 {10} (y_1+y_2+y_3+y_4+y_5) = \frac 3 5 (y_1+y_2+y_3+y_4+y_5) \end{align}

This ratio, $\dfrac 3 5,$ is the ratio of the sample size to the population size. That is the "multiplier". I expect there is a short elegant combinatorial argument showing that this will happen in general, i.e. with numbers other than $3$ and $5$, and also making Cochran's argument precise, but I'm not sure how best to express that right now.
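One possible way to make the term counting precise for general $n$ and $N$ is the following sketch. The symbol $c$ is introduced here, not taken from Cochran, for the number of samples that contain any given unit. Adding up all $\binom{N}{n}$ sample sums gives $$ \sum_{\text{samples } s}\ \sum_{i \in s} y_i = c\,(y_1 + y_2 + \cdots + y_N). $$ The left side consists of $\binom{N}{n}$ sums of $n$ terms each, so $n\binom{N}{n}$ terms in all, while the right side has $cN$ terms. Setting every $y_i = 1$ turns this count of terms into an equation: $$ n\binom{N}{n} = cN, \qquad\text{so}\qquad c = \frac{n}{N}\binom{N}{n}. $$ Dividing the total by the number of samples then gives $$ E(\text{sample sum}) = \frac{c}{\binom{N}{n}}\,(y_1 + \cdots + y_N) = \frac{n}{N}\,(y_1 + \cdots + y_N). $$ With $N=5$ and $n=3$ this gives $c = \frac{3}{5}\cdot 10 = 6$, matching the count in the example.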