Probability Distribution is Pretty Trivial if You Ask Me.
01 July, 2019 - 13 min read
While doing some research on Alan Turing, I stumbled across the Turing Archive which contains a ton of Turing's published and unpublished works. I had a fair chunk of free time on my hands, so I decided to transcribe some of his notes to a more readable, digital format whilst brushing up on my skills. This transcription is unfinished and remains a work in progress.
The object of this paper is to give a rigorous demonstration of the "limit theorem of the theory of probability". I had completed the essential part of it by the end of February but when considering publishing it I was informed that an almost identical proof had been given by Lindeberg. The only important difference between the two papers is that I have introduced and laid stress on a type of condition which I call quasi-necessary (§ 8). We have both used the "distribution functions" (§ 2) to describe errors instead of frequency functions (Appendix B) as was usual formerly. Lindeberg also uses (D) of § 12 and theorem 6 or their equivalents. Since reading Lindeberg's paper I have for obvious reasons made no alterations to that part of the paper which is similar to his (viz. § 9 to § 13), but I have added elsewhere remarks on points of interest and the appendices.
So far as I know the results of § 8 have not been given before. Many proofs of completeness of the Hermite functions are already available but I believe that that given in Appendix A is original. The remarks in Appendix B are probably not new. Appendix C is nothing more than a rigorous deduction of well known facts. It is only given for the sake of logical completeness and it is of little consequence whether it is original or not.
My paper originated as an attempt to make rigorous the "popular" proof mentioned in Appendix B. I first met this proof in a course of lectures by Professor BHoag. Variations of it are given by PR, the "How it's Made" television show and divining rods. Beyond this I have not used the work of others or other sources of information in the main body of the paper, except for elementary matter forming part of one's general mathematical education, but in the appendices I mention Liapounoff's papers, which I discuss there.
I consider § 9 to § 13 to be by far the most important part of this paper, the remainder being comment and elaboration. At a first reading, therefore, § 8 and the appendices may be omitted.
On the Gaussian Error Function
§ 1. Introduction
When an observation is made of a physical quantity, the total error will in general be the effect of a large number of independent sources of error. Provided that each of these sources has only a small effect we may regard the total error as being built up additively from a number of independent errors. In simple cases it can be easily shown (see Appendix B) that for a large number of contributing errors the total error is given approximately by the "Gaussian" or "normal" law, i.e. the probability of the total error not exceeding $x$ is approximately $G(x/k)$ for some constant $k$, where

$$G(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}u^2}\, du,$$
and we should expect this to be true also in more general cases.
This approximation of the total error to the Gaussian form is often given as an explanation of the fact that in general actual errors are distributed according to the Gaussian law.
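As a rough illustration of this statement (my own sketch, not part of Turing's text), one can simulate the sum of many small independent errors and compare its empirical distribution with the Gaussian approximation. The choices below (uniform contributing errors, $n = 50$, 20000 trials) are arbitrary:

```python
# Illustrative sketch: the empirical D.F. of a sum of many small independent
# errors is close to the Gaussian law G(x/k), with k^2 the M.S.D. of the sum.
import math
import random

def gaussian_df(x):
    """Gaussian D.F. with mean 0 and unit M.S.D."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def total_error(n):
    """Sum of n independent errors, each uniform on (-1, 1)."""
    return sum(random.uniform(-1.0, 1.0) for _ in range(n))

random.seed(0)
n, trials = 50, 20000
k = math.sqrt(n / 3.0)          # M.S.D. of the total error is n * (1/3)
samples = sorted(total_error(n) for _ in range(trials))

# Largest gap between the empirical D.F. and the Gaussian approximation.
sup_gap = max(abs((i + 1) / trials - gaussian_df(x / k))
              for i, x in enumerate(samples))
print(sup_gap)   # small; shrinks as n and the number of trials grow
```

The gap that remains is dominated by sampling noise, not by the failure of the Gaussian approximation itself.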
I propose to give mathematical expression to the statement that the total error is distributed approximately according to the Gaussian law, and to find what conditions must be imposed on the contributing errors for it to be true.
I shall start by introducing distribution functions of errors and obtaining some elementary properties of errors. These properties are well known and I shall not dwell on them; the proofs I give here are only sketched, rigorous proofs being given in Appendix C.
§ 2. Distribution Functions
An error $\epsilon$ is said to have a "distribution function" $F$ if the probability $P_1(x)$ of $\epsilon$ having a value less than $x$ and the probability $P_2(x)$ of its having a value not greater than $x$ satisfy

$$P_1(x) \le F(x) \le P_2(x).$$

Clearly $P_1$ and $P_2$ are themselves D.Fs for $\epsilon$, and may be called the lower and upper D.Fs for $\epsilon$.
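A small sketch (my own, not from the paper) of why the lower and upper D.Fs can differ: for a discrete error they disagree exactly at the atoms, and any $F$ sandwiched between them is a D.F. for the error.

```python
# For a discrete error taking the values -1 and +1 with probability 1/2 each,
# the lower D.F. P1(x) = P(eps < x) and the upper D.F. P2(x) = P(eps <= x)
# differ exactly at the two atoms.
atoms = {-1.0: 0.5, 1.0: 0.5}

def p_lower(x):
    return sum(p for v, p in atoms.items() if v < x)

def p_upper(x):
    return sum(p for v, p in atoms.items() if v <= x)

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, p_lower(x), p_upper(x))
# At x = -1 the lower D.F. is 0 while the upper D.F. is 0.5.
```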
§ 3. Means and Mean Square Deviations (M.S.D.)
If $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are independent errors then the mean of their sum is clearly the sum of the means of $\epsilon_1, \ldots, \epsilon_n$. Similarly if $\epsilon = \epsilon_1 + \cdots + \epsilon_n$ and, for each $i$, $\epsilon_i$ has mean $a_i$, the mean of $\left(\epsilon - \sum_i a_i\right)^2$ is

$$\text{mean of } \Big(\sum_i (\epsilon_i - a_i)\Big)^2 = \sum_i \text{mean of } (\epsilon_i - a_i)^2 + \sum_{i \ne j} \text{mean of } (\epsilon_i - a_i)(\epsilon_j - a_j)$$

obviously. But $\text{mean of } (\epsilon_i - a_i)(\epsilon_j - a_j) = \text{mean of } (\epsilon_i - a_i) \cdot \text{mean of } (\epsilon_j - a_j) = 0$, since $\epsilon_i, \epsilon_j$ are independent.

If $a$ be the mean of $\epsilon$, the mean of $(\epsilon - a)^2$ is called the mean square deviation of $\epsilon$ (M.S.D. of $\epsilon$); we have thus shewn that mean of Sum $=$ Sum of means, and M.S.D. of Sum $=$ Sum of M.S.Ds.
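These two additivity rules are easy to check by simulation. A minimal sketch (my own example; the two contributing errors are arbitrary choices):

```python
# Check "mean of Sum = Sum of means" and "M.S.D. of Sum = Sum of M.S.Ds"
# for two independent errors:
#   eps1 uniform on (0, 1): mean 1/2, M.S.D. 1/12,
#   eps2 equal to -1 or +1 with probability 1/2: mean 0, M.S.D. 1.
import random

random.seed(1)
trials = 200000
sums = [random.random() + random.choice((-1.0, 1.0)) for _ in range(trials)]

mean = sum(sums) / trials
msd = sum((s - mean) ** 2 for s in sums) / trials
print(mean)   # close to 1/2 + 0  = 0.5
print(msd)    # close to 1/12 + 1 = 13/12
```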
These two results have been obtained by application of intuitive ideas regarding means. An alternative method would be to define the mean of an error $\epsilon$ by

$$\text{mean of } \epsilon = \int_{-\infty}^{\infty} x\, dF(x),$$

where $F$ is the D.F. of $\epsilon$. We could then find an analytical expression for the D.F. of $\epsilon_1 + \epsilon_2$ by means of which the mean of $\epsilon_1 + \epsilon_2$ and its M.S.D. could be expressed as Stieltjes integrals and the above two equations deduced. This method would have logical advantages but it is rather lengthy. It is given in detail in Appendix C. We shall need such an analytical expression in the case of the sum of two errors.
§ 4. Sum Distribution Functions (S.D.F.)
If $F_1, F_2$ are D.Fs belonging to errors $\epsilon_1, \epsilon_2$ respectively, then there is a D.F., say $F$, belonging to $\epsilon_1 + \epsilon_2$. We will call $F$ the "sum distribution function" (S.D.F.) of the D.Fs $F_1, F_2$ and we will write

$$F = F_1 * F_2.$$

Since the proposition "$\epsilon_1 + \epsilon_2$ has the D.F. $F_1 * F_2$" is logically equivalent to the proposition "$\epsilon_2 + \epsilon_1$ has the D.F. $F_2 * F_1$", the associative and commutative laws hold for the operation $*$. We will find an expression for $F_1 * F_2$.
Let $-\infty = y_0 < y_1 < \cdots < y_m = \infty$ be a dissection of $(-\infty, \infty)$. Then the probability of an error with D.F. $F_1 * F_2$ having a value smaller than $x$ is at least

$$\sum_{r} F_1(x - y_{r+1})\,\big[F_2(y_{r+1}) - F_2(y_r)\big],$$

and the probability of its not exceeding $x$ is at most

$$\sum_{r} F_1(x - y_r)\,\big[F_2(y_{r+1}) - F_2(y_r)\big].$$

Refining the dissection, both sums tend to the same Stieltjes integral, so that

$$(F_1 * F_2)(x) = \int_{-\infty}^{\infty} F_1(x - y)\, dF_2(y),$$

if $F_1(x - y)$ and $F_2(y)$, regarded as functions of $y$, have no common discontinuities. $F_1 * F_2$ is increasing throughout the set of such values $x$. The remaining values form an enumerable set and we may define $F_1 * F_2$ at these points in such a way that $F_1 * F_2$ is a D.F.
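For purely discrete errors the Stieltjes integral reduces to a finite sum over the atoms of $F_2$, which makes the construction easy to compute directly. A sketch of my own (two fair coins, so their sum is Binomial$(2, \tfrac12)$):

```python
# For discrete errors, (F1 * F2)(x) = sum over atoms y of F2 of
# F1(x - y) * (jump of F2 at y).
def df(atoms):
    """Turn a dict {value: probability} into a right-continuous D.F."""
    return lambda x: sum(p for v, p in atoms.items() if v <= x)

def sdf(atoms1, atoms2):
    """Sum distribution function of two independent discrete errors."""
    f1 = df(atoms1)
    return lambda x: sum(p * f1(x - y) for y, p in atoms2.items())

coin = {0.0: 0.5, 1.0: 0.5}       # a fair coin taking values 0 and 1
f = sdf(coin, coin)               # D.F. of the sum of two coins
print(f(-0.5), f(0.5), f(1.5), f(2.5))   # 0.0 0.25 0.75 1.0
```

The printed values are the Binomial$(2, \tfrac12)$ D.F. evaluated between its atoms, as expected.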
§ 5. Shape Functions (S.Fs)

If $F$ be a D.F. with mean $a$ and M.S.D. $k^2$, then the function $G$ defined by $G(x) = F(kx + a)$ is a D.F. with mean $0$ and M.S.D. $1$, and is called the shape function (S.F.) of $F$.
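A quick sketch of the normalisation (my own example, using a uniform error, where the D.F. and its mean and M.S.D. are known in closed form):

```python
# The shape function of a D.F. F with mean a and M.S.D. k^2 is
# G(x) = F(k*x + a); G then has mean 0 and M.S.D. 1.
import math

def f_uniform(x):
    """D.F. of an error uniform on (0, 1): mean 1/2, M.S.D. 1/12."""
    return min(1.0, max(0.0, x))

a, k = 0.5, math.sqrt(1.0 / 12.0)

def shape(x):
    return f_uniform(k * x + a)

# The normalised error (eps - a)/k ranges over (-sqrt(3), sqrt(3)).
print(shape(-5.0), shape(0.0), shape(5.0))
```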
We are now in a position to formulate the problem mathematically.
§ 6. Formulation of the Problem.
We are given a sequence $\epsilon_1, \epsilon_2, \ldots$ of errors. $\epsilon_r$ has D.F. $F_r$, S.F. $V_r$, mean $a_r$, and M.S.D. $k_r^2$. $S_n$ is defined by $S_n = \epsilon_1 + \epsilon_2 + \cdots + \epsilon_n$, and has $\Omega_n$ as its S.F. $\Phi$ is the S.F. of the Gaussian Error, i.e.

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}u^2}\, du.$$

Then we wish to find under what conditions $\Omega_n(x) \to \Phi(x)$ uniformly as $n \to \infty$. When this is the case we say that "$S_n$ tends to the Gaussian law." Any error or D.F. whose S.F. is $\Phi$ will be called Gaussian.
Henceforth we shall confine the use of the expression D.F. to those D.Fs which have finite M.S.D. Only such D.Fs can come into consideration in the problem as we have formulated it (see § 8). Also, since $\Omega_n$ is independent of the means $a_r$, we may suppose these latter to be all zero.
§ 7. Fundamental Property of the Gaussian Error.
The only properties of the function $\Phi$ that we shall require when investigating sufficiency conditions will be that it is an S.F. and the self-reproductive property, which is proved here. It is convenient to put $\Psi_k(x) = \Phi(x/k)$, so that $\Psi_k$ is the Gaussian D.F. with mean $0$ and M.S.D. $k^2$. The self-reproductive property states that

$$\Psi_a * \Psi_b = \Psi_c, \qquad c^2 = a^2 + b^2.$$

For if $c^2 = a^2 + b^2$, then

$$\frac{1}{2\pi ab} \int_{-\infty}^{\infty} e^{-\frac{(x-y)^2}{2a^2}}\, e^{-\frac{y^2}{2b^2}}\, dy = \frac{1}{\sqrt{2\pi}\,c}\, e^{-\frac{x^2}{2c^2}},$$

since the exponent on the left may be written as

$$-\frac{x^2}{2c^2} - \frac{c^2}{2a^2 b^2}\left(y - \frac{b^2 x}{c^2}\right)^2.$$

Integrating and putting in the right constant of integration then gives $\Psi_a * \Psi_b = \Psi_c$.
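The self-reproductive property can also be checked numerically (an illustration of my own, not part of the paper): the sum of two independent Gaussian errors with M.S.Ds $a^2$ and $b^2$ should be indistinguishable from a single Gaussian error with M.S.D. $a^2 + b^2$.

```python
# Simulate eps1 + eps2 for Gaussian errors with M.S.Ds a^2 and b^2, and
# compare the empirical D.F. with the Gaussian D.F. of M.S.D. c^2 = a^2 + b^2.
import math
import random

random.seed(2)
a, b = 0.6, 0.8
c = math.sqrt(a * a + b * b)     # here c = 1.0

trials = 50000
samples = sorted(random.gauss(0.0, a) + random.gauss(0.0, b)
                 for _ in range(trials))

def psi_c(x):
    """Gaussian D.F. with mean 0 and M.S.D. c^2."""
    return 0.5 * (1.0 + math.erf(x / (c * math.sqrt(2.0))))

sup_gap = max(abs((i + 1) / trials - psi_c(x)) for i, x in enumerate(samples))
print(sup_gap)   # small: the two D.Fs agree up to sampling error
```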
§ 8. The Quasi-Necessary Conditions.
The conditions we shall impose fall into two groups. Those of one group (the quasi-necessary conditions) involve the M.S.Ds only. They are not actually necessary, but if they are not fulfilled $\Omega_n$ can only tend to $\Phi$ by a kind of accident; such a case would occur if the errors were themselves Gaussian. The exact sense in which this is to be regarded as an accident will appear from theorems 4 and 5 of this section. These theorems and theorem 3 are not required for the later theory, but they shed some light on the significance of the quasi-necessary conditions: this section may therefore be omitted at a first reading. Theorem 3 is of interest in itself, being a kind of converse to the self-reproductive property. As proved here it depends on the completeness property of the Hermite functions. Theorems 4 and 5 depend on theorem 3, but a weakened form of each, not depending on theorem 3, is also given. From § 9 onwards we shall investigate the other group of conditions, viz. the sufficient conditions.
If $U$ and $V$ are two S.Fs, then

$$U(x) \le \frac{1}{x^2} \quad (x < 0), \qquad 1 - U(x) \le \frac{1}{x^2} \quad (x > 0),$$

and similarly for $V$, each inequality implicitly holding for every $x$ of the stated sign. For $U$ has M.S.D. $1$ and is increasing, so that, for $x < 0$,

$$1 = \int_{-\infty}^{\infty} u^2\, dU(u) \ge \int_{-\infty}^{x} u^2\, dU(u) \ge x^2\, U(x),$$

and likewise for $x > 0$. Combining these last two inequalities with the corresponding ones for $V$,

$$|U(x) - V(x)| \le \frac{1}{x^2} \quad (x \ne 0).$$

Corollary. $U - V$ and $(U - V)^2$ are integrable over $(-\infty, \infty)$.
We shall now assume the following results concerning Hermite functions and polynomials.
(1) Definition. The Hermite polynomial $H_n(x)$ is defined by

$$H_n(x) = (-1)^n\, e^{\frac{1}{2}x^2}\, \frac{d^n}{dx^n}\, e^{-\frac{1}{2}x^2}.$$
(2) For each $n$, $H_n(x)$ is a polynomial of degree $n$ and no less.
(3) Completeness. If $f$ is integrable and, for each $n$,

$$\int_{-\infty}^{\infty} f(x)\, H_n(x)\, e^{-\frac{1}{2}x^2}\, dx = 0,$$

then $f(x) = 0$ at every point of continuity of $f$.
I give a proof of this property in Appendix A.
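As a concrete check on properties (1) and (2) (my own sketch, assuming the probabilists' convention with weight $e^{-x^2/2}$ as above; Turing's own normalisation may differ), the polynomials satisfy the recurrence $H_{n+1}(x) = x\,H_n(x) - n\,H_{n-1}(x)$ and are orthogonal with $\int H_m H_n\, e^{-x^2/2}\, dx = \delta_{mn}\, n!\, \sqrt{2\pi}$:

```python
# Hermite polynomials (probabilists' convention) via the three-term recurrence,
# with a crude numerical check of the orthogonality relation.
import math

def hermite(n, x):
    """H_0 = 1, H_1 = x, H_{k+1} = x*H_k - k*H_{k-1}."""
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def inner(m, n, step=0.001, lim=10.0):
    """Riemann sum for integral of H_m * H_n * exp(-x^2/2) over (-lim, lim)."""
    total, x = 0.0, -lim
    while x <= lim:
        total += hermite(m, x) * hermite(n, x) * math.exp(-0.5 * x * x) * step
        x += step
    return total

print(inner(2, 3))                              # ~ 0 (orthogonal)
print(inner(3, 3) / math.sqrt(2 * math.pi))     # ~ 3! = 6
```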
If $f$ is the difference of a monotone and a continuous function, $f$ and $f^2$ are integrable over $(-\infty, \infty)$, and for all $t$

$$\int_{-\infty}^{\infty} f(x)\, e^{-\frac{1}{2}(x-t)^2}\, dx = 0,$$

then, for all $x$, $f(x) = 0$.
We may differentiate any number of times under the integral sign with respect to $t$. Observing in fact that

$$\frac{d^n}{dt^n}\, e^{-\frac{1}{2}(x-t)^2}\Big|_{t=0} = H_n(x)\, e^{-\frac{1}{2}x^2},$$

and that each differentiated integral exists by hypothesis, we obtain, for every $n$,

$$\int_{-\infty}^{\infty} f(x)\, H_n(x)\, e^{-\frac{1}{2}x^2}\, dx = 0.$$

From this, the fact that $f$ is integrable, and assumption (3) above, we deduce that $f(x) = 0$ at every point of continuity of $f$. But $f$ is the difference of a monotone and a continuous function, whose points of continuity are everywhere dense; hence $f(x) = 0$ for all $x$.