in high-school statistics, the $\chi^2$-test is a very important element that enriches the students’ power to put statistics into real-life scenarios. however, it is rarely explained from scratch, except for a few words about ‘the central limit theorem’. where and how is the clt used, and how come the degree of freedom is one less than the number of categories? this is so unsatisfactory.
for categorical goodness-of-fit cases with $k$ categories of probabilities $p_1, \dots, p_k$, the number of observations $N_i$ in category $i$ would be binomially distributed with parameters $n$ and $p_i$, where $n$ is the size of the sample. applying the normal approximation:

$$X_i := \frac{N_i - np_i}{\sqrt{np_i}} \;\sim\; N(0,\, 1 - p_i) \quad \text{approximately.}$$

however, it should be noted that $N_i$ and $N_j$ are not independent, as an observation cannot be in two categories simultaneously. we can, however, find the covariance between $N_i$ and $N_j$ for $i \neq j$:

$$\operatorname{cov}(N_i, N_j) = -np_ip_j,$$

where we can show that

$$\operatorname{cov}(X_i, X_j) = \frac{\operatorname{cov}(N_i, N_j)}{\sqrt{np_i}\sqrt{np_j}} = -\sqrt{p_ip_j}.$$
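the covariance claim is easy to check empirically. the sketch below (numpy assumed; the sample size, probabilities and replicate count are arbitrary choices for illustration) draws multinomial counts and compares the empirical covariance of two categories against $-np_ip_j$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, np.array([0.2, 0.3, 0.5])

# draw many multinomial samples of size n
counts = rng.multinomial(n, p, size=200_000)   # shape (200000, 3)

# empirical covariance between N_1 and N_2
emp_cov = np.cov(counts[:, 0], counts[:, 1])[0, 1]
theory = -n * p[0] * p[1]                      # cov(N_i, N_j) = -n p_i p_j

print(emp_cov, theory)  # the two values should be close
```

with these numbers the theoretical value is $-60$, and the simulated covariance lands within sampling noise of it.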
similar arguments can be made for the $\chi^2$ test of independence.
notice that $\operatorname{cov}(X_i, X_j) = \delta_{ij} - \sqrt{p_i}\sqrt{p_j}$, or in matrix form

$$\Sigma = I - \sqrt{p}\,\sqrt{p}^{\,T},$$

where

$$\sqrt{p} := \left(\sqrt{p_1}, \dots, \sqrt{p_k}\right)^T,$$

where $\sqrt{p}$ is a unit vector by definition, since $\sum_i p_i = 1$. this tells us that $\Sigma$ has $k-1$ eigenvalues of 1 and one eigenvalue of 0 (with eigenvector $\sqrt{p}$ itself), or equivalently put: there is an orthonormal transformation $P$ such that

$$P\Sigma P^T = \operatorname{diag}(1, \dots, 1, 0).$$
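the spectrum of $I - \sqrt{p}\,\sqrt{p}^{\,T}$ can be verified numerically in a few lines (numpy assumed; the probability vector is an arbitrary example):

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])
sqrt_p = np.sqrt(p)               # a unit vector, since sum(p) == 1

# build sigma = I - sqrt(p) sqrt(p)^T and inspect its eigenvalues
sigma = np.eye(len(p)) - np.outer(sqrt_p, sqrt_p)
eigvals = np.sort(np.linalg.eigvalsh(sigma))

print(eigvals)  # one eigenvalue 0, the remaining k-1 equal to 1
```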
this result is also known as sylvester’s theorem.
this means the components of $Y := PX$, except for the last one which is identically 0, have independent standardised normal distributions. measure the vector $X$ in this frame:

$$|X|^2 = |PX|^2 = \sum_{i=1}^{k-1} Y_i^2.$$

finally we have

$$\sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i} = \sum_{i=1}^{k} X_i^2 = \sum_{i=1}^{k-1} Y_i^2 \;\sim\; \chi^2_{k-1},$$

which is exactly why the degree of freedom is one less than the number of categories.
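the whole derivation can be sanity-checked by simulating the statistic $\sum_i (N_i - np_i)^2/(np_i)$ and comparing its mean with that of $\chi^2_{k-1}$, namely $k-1$ (numpy assumed; parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, np.array([0.1, 0.2, 0.3, 0.4])
k = len(p)

# simulate the goodness-of-fit statistic many times under the null
counts = rng.multinomial(n, p, size=100_000)
chi2 = ((counts - n * p) ** 2 / (n * p)).sum(axis=1)

print(chi2.mean())  # should be close to k - 1 = 3
```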
however, in the above, the so-called ‘normal approximation’ is rather suspicious… the core of the argument lies in the well-known central limit theorem. what is it, and what are its conditions?
the central limit theorem states that if a random variable $X$ is distributed with mean $\mu$ and standard deviation $\sigma$, then for an independent random sample $X_1, \dots, X_n$ of $X$, we would have the following:

$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1), \qquad \bar{X} := \frac{1}{n}\sum_{i=1}^{n} X_i.$$
that is, the average error from the population mean would be normally distributed, and as the size of the sample increases, the deviation shrinks down. this is essentially the theoretical basis for taking multiple measurements to improve precision. however, the normality is rather intriguing, as the theorem makes no assumption on the population distribution as long as it has a valid expectation and variance.
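the claim can be watched in action with a decidedly non-normal population, e.g. an exponential one. the sketch below (numpy assumed; sample and replicate sizes are arbitrary) standardises the sample means and checks that they behave like $N(0,1)$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 1.0, 1.0, 400        # Exp(1) has mean 1 and sd 1

# 50,000 independent samples of size n, reduced to standardised means
samples = rng.exponential(1.0, size=(50_000, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))

print(z.mean(), z.var())            # approximately 0 and 1
print(np.mean(np.abs(z) < 1))       # close to the normal ~68.3% within one sd
```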
how come the error automatically converges to a normal distribution? well, the proof is quite bland: we look for the density function of the sum, which, for independent variables, is the convolution of the individual densities.
talking about convolutions, the generating function is the magic word one wants to spell. remember that the generating function is defined by:

$$\varphi_X(t) := E\left[e^{itX}\right].$$

notice that $\varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t)$ for independent $X$ and $Y$, and $\varphi_{aX}(t) = \varphi_X(at)$.
readers with a bit of exposure to fourier analysis will easily identify that the generating function is actually a fourier transform of the density function:

$$\varphi_X(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,\mathrm{d}x.$$
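the way convolution turns into multiplication can be checked by monte carlo. a sketch assuming numpy, with uniform variables chosen purely for illustration: estimate $E[e^{it(X+Y)}]$ directly and compare with the product of the individual estimates.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(size=1_000_000)
y = rng.uniform(size=1_000_000)

t = 2.0
phi_x = np.mean(np.exp(1j * t * x))          # empirical E[exp(itX)]
phi_y = np.mean(np.exp(1j * t * y))
phi_sum = np.mean(np.exp(1j * t * (x + y)))  # empirical E[exp(it(X+Y))]

print(abs(phi_sum - phi_x * phi_y))  # small: the sum's transform factorises
```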
if the $X_i$’s are i.i.d., then we would have $\varphi_{X_1 + \dots + X_n}(t) = \varphi_X(t)^n$. writing $Y := (X - \mu)/\sigma$ for the standardised variable, so that $Z_n := \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Y_i$, that is

$$\varphi_{Z_n}(t) = \varphi_Y\!\left(\frac{t}{\sqrt{n}}\right)^{n}.$$

notice that $\varphi_Y(s) \to 1$ when $s \to 0$, thus we can approximate the logarithm of $\varphi_Y$ by taylor expansion:

$$\log \varphi_Y(s) = \log\left(1 + isE[Y] - \frac{s^2}{2}E[Y^2] + o(s^2)\right) = -\frac{s^2}{2} + o(s^2),$$

since $E[Y] = 0$ and $E[Y^2] = 1$. thus,

$$\log \varphi_{Z_n}(t) = n \log \varphi_Y\!\left(\frac{t}{\sqrt{n}}\right) = -\frac{t^2}{2} + n \cdot o\!\left(\frac{t^2}{n}\right) \longrightarrow -\frac{t^2}{2},$$

that is

$$\varphi_{Z_n}(t) \longrightarrow e^{-t^2/2}.$$
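this convergence can be watched numerically for a concrete population. for a standardised exponential variable $Y = X - 1$ with $X \sim \mathrm{Exp}(1)$ (mean and standard deviation both 1), the transform is available in closed form, $\varphi_Y(t) = e^{-it}/(1 - it)$, so $n \log \varphi_Y(t/\sqrt{n})$ can be evaluated directly with python’s stdlib:

```python
import cmath
import math

def log_phi_Y(t: float) -> complex:
    # log of the transform of Y = X - 1, X ~ Exp(1):
    # phi_Y(t) = E[exp(itY)] = exp(-it) / (1 - it)
    return -1j * t - cmath.log(1 - 1j * t)

t = 1.5
target = -t * t / 2   # the N(0,1) limit: log exp(-t^2/2)

for n in (10, 100, 10_000):
    print(n, n * log_phi_Y(t / math.sqrt(n)))  # drifts toward target
```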
to recover the density function from a generating function would require an inverse transformation whose difficulty may vary depending on the actual form of $\varphi$. nevertheless, since we are trying to prove the central limit theorem, we know which distribution to look up, so it would be a good idea to compare the generating function, or rather, the logarithmic generating function, to that of a normal distribution.
the density function of the standard normal distribution $N(0,1)$ is

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2},$$

thus

$$\varphi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{itx} e^{-x^2/2}\,\mathrm{d}x = e^{-t^2/2}.$$

this proves the claim that $Z_n \xrightarrow{d} N(0,1)$, and finally we can have the ‘normal approximation’.
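the fourier transform of the standard normal density can also be checked numerically: integrating $e^{itx}$ against the density on a fine grid should reproduce $e^{-t^2/2}$ (numpy assumed; grid limits and step are arbitrary but generous):

```python
import numpy as np

x = np.linspace(-10, 10, 200_001)   # the density is negligible beyond ±10
dx = x[1] - x[0]
density = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

for t in (0.5, 1.0, 2.0):
    # numerical fourier transform of the density at frequency t
    phi = np.sum(np.exp(1j * t * x) * density) * dx
    print(t, phi.real, np.exp(-t**2 / 2))   # the two should agree
```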
it should always be noted that the logic above only applies if the distribution is discrete; for continuous distributions, the $\chi^2$ test is not the go-to solution. then, what is its counterpart for a continuous population?
the central problem lies in that, without categories, we no longer have any prior information on the distribution of the observation. that is, we are no longer sure whether the observation is normal or asymptotically normal; the question switches from a parametric problem to a non-parametric one.
a simple example can be: given an i.i.d. sample $X_1, \dots, X_n$ drawn from an unknown distribution $F$, we want to test the claim:

H0: $F = F_0$,

where $F_0$ is some known distribution.
the q-q plot is a great way to visualise the case. by sorting the sample to $X_{(1)} \leq \dots \leq X_{(n)}$, we are not losing any information. notice that by the null hypothesis, $X_{(i)}$ should be somewhere around either

$$F_0^{-1}\!\left(\frac{i}{n+1}\right) \quad \text{or} \quad F_0^{-1}\!\left(\frac{i - 1/2}{n}\right).$$
plotting the pairs $\left(F_0^{-1}\!\left(\frac{i}{n+1}\right),\, X_{(i)}\right)$ on a plane, we are expecting to see a linear correlation if the null hypothesis is true. thus, by looking at the correlation of the two sequences we can make the statistical inference. this would be a topic in our future discussion though.
the test is known as the shapiro-francia test when $F_0$ is a normal distribution.