Probability Distribution is Pretty Trivial if You Ask Me.
01 July, 2019 - 13 min read
Pre-Preface
While doing some research on Alan Turing, I stumbled across the Turing Archive, which contains a ton of Turing's published and unpublished works. I had a fair chunk of free time on my hands, so I decided to transcribe some of his notes to a more readable, digital format whilst touching up on my LaTeX skills. This is unfinished and ongoing.
Preface
The object of this paper is to give a rigorous demonstration of the "limit theorem of the theory of probability". I had completed the essential part of it by the end of February, but when considering publishing it I was informed that an almost identical proof had been given by Lindeberg. The only important difference between the two papers is that I have introduced, and laid stress on, a type of condition which I call quasi-necessary (§ 8). We have both used "distribution functions" (§ 2) to describe errors instead of frequency functions (Appendix B), as was usual formerly. Lindeberg also uses (D) of § 12 and theorem 6, or their equivalents. Since reading Lindeberg's paper I have for obvious reasons made no alterations to that part of the paper which is similar to his (viz. § 9 to § 13), but I have added elsewhere remarks on points of interest and the appendices.
So far as I know the results of § 8 have not been given before. Many proofs of the completeness of the Hermite functions are already available, but I believe that that given in Appendix A is original. The remarks in Appendix B are probably not new. Appendix C is nothing more than a rigorous deduction of well-known facts. It is only given for the sake of logical completeness, and it is of little consequence whether it is original or not.
My paper originated as an attempt to make rigorous the "popular" proof mentioned in Appendix B. I first met this proof in a course of lectures by Professor BHoag. Variations of it are given by PR, the "How it's Made" television show, and divining rods. Beyond this I have not used the work of others or other sources of information in the main body of the paper, except for elementary matter forming part of one's general mathematical education, but in the appendices I may mention Liapounoff's papers, which I discuss there.
I consider § 9 to § 13 to be by far the most important part of this paper, the remainder being comment and elaboration. At a first reading, therefore, § 8 and the appendices may be omitted.
On the Gaussian Error Function
§ 1. Introduction
When an observation is made of a physical quantity, the total error will in general be the effect of a large number of independent sources of error. Provided that each of these sources has only a small effect, we may regard the total error as being built up additively from a number of independent errors. In simple cases it can be easily shown (see Appendix B) that for a large number of contributing errors the total error is given approximately by the "Gaussian" or "normal" law, i.e., the probability of the total error not exceeding $x$ is approximately $W(x/k)$, where
$$W(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{1}{2}t^{2}}\,dt$$
and we should expect this to be true also in more general cases.
This approximation of the total error to the Gaussian form is often given as an explanation of the fact that in general actual errors are distributed according to the Gaussian law.
I propose to give mathematical expression to the statement that the total error is distributed approximately according to the Gaussian law, and to find what conditions must be imposed on the contributing errors for it to be true.
I shall start by introducing distribution functions of errors and obtaining some of their elementary properties. These properties are well known, so I shall not dwell on them; the proofs given here are only sketched, and rigorous proofs are given in Appendix C.
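As a quick aside from the transcription: the claim above is easy to check numerically. Here is a minimal sketch (my own, not Turing's) that sums many small independent errors, each uniform on $[-h,h]$ (an arbitrary choice), and compares the empirical distribution of the total error with $W(x/k)$, where $k^2$ is the M.S.D. of the sum.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def W(x):
    """W(x) = (1 / sqrt(2*pi)) * integral_{-inf}^{x} exp(-t^2 / 2) dt."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n_sources = 100        # number of small contributing errors
h = 0.05               # each error is uniform on [-h, h] (illustrative choice)
n_trials = 50_000

# Total error = sum of many small independent errors.
total = rng.uniform(-h, h, size=(n_trials, n_sources)).sum(axis=1)
k = math.sqrt(n_sources * h**2 / 3.0)   # M.S.D. of the sum is n_sources * h^2 / 3

for x in (-2 * k, -k, 0.0, k, 2 * k):
    empirical = float(np.mean(total <= x))      # observed P(total error <= x)
    print(f"x = {x:+.3f}   empirical = {empirical:.4f}   W(x/k) = {W(x / k):.4f}")
```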
§ 2. Distribution Functions
An error $\varepsilon$ is said to have a "distribution function" $F$ if the probability $F^{-}(x)$ of $\varepsilon$ having a value less than $x$ and the probability $F^{+}(x)$ of its having a value not greater than $x$ satisfy

$$F^{-}(x)\le F(x)\le F^{+}(x)\qquad\text{for all }x.$$

Clearly $F^{-}$ and $F^{+}$ are themselves D.Fs for $\varepsilon$, and may be called the lower and upper D.Fs for $\varepsilon$.
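To make the $F^{-}$ / $F^{+}$ distinction concrete, a tiny example of my own: for a discrete error taking the values $-1$ and $+1$ with equal probability, the lower and upper D.Fs differ exactly at those two atoms and agree everywhere else; any $F$ squeezed between them is a D.F. for the error.

```python
from fractions import Fraction

# A discrete error taking values -1 and +1 with probability 1/2 each.
atoms = {-1: Fraction(1, 2), 1: Fraction(1, 2)}

def F_minus(x):
    """Lower D.F.: probability that the error is strictly less than x."""
    return sum(p for v, p in atoms.items() if v < x)

def F_plus(x):
    """Upper D.F.: probability that the error is not greater than x."""
    return sum(p for v, p in atoms.items() if v <= x)

for x in (-2, -1, 0, 1, 2):
    print(f"x = {x:+d}   F-(x) = {F_minus(x)}   F+(x) = {F_plus(x)}")
# At the atoms x = -1 and x = +1 the two functions differ by the probability
# of the atom; elsewhere they agree.
```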
§ 3. Means and Mean Square Deviations (M.S.D.)
If $\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n$ are independent errors then the mean of their sum $\varepsilon_1+\varepsilon_2+\cdots+\varepsilon_n$ is clearly the sum of the means of $\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n$. Similarly if $\varepsilon=\varepsilon_1+\varepsilon_2+\cdots+\varepsilon_n$ and, for each $k$, $\varepsilon_k$ has mean 0, the mean of $\varepsilon^{2}$ is clearly

$$\sum_{r}\operatorname{mean}\varepsilon_r^{2}+2\sum_{r<s}\operatorname{mean}\varepsilon_r\varepsilon_s.$$

But $\operatorname{mean}\varepsilon_r\varepsilon_s=\operatorname{mean}\varepsilon_r\cdot\operatorname{mean}\varepsilon_s=0$, since $\varepsilon_r,\varepsilon_s$ are independent and each has mean 0, $\therefore$

$$\operatorname{mean}\varepsilon^{2}=\sum_{k}\operatorname{mean}\varepsilon_k^{2}.$$
If $a$ be the mean of $\varepsilon$, $\operatorname{mean}(\varepsilon-a)^{2}$ is called the mean square deviation of $\varepsilon$ (M.S.D. of $\varepsilon$); we have thus shewn:

$$\text{mean of }\varepsilon_1+\varepsilon_2+\cdots+\varepsilon_n=\text{sum of the means of }\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n$$

$$\text{M.S.D. of }\varepsilon_1+\varepsilon_2+\cdots+\varepsilon_n=\text{sum of the M.S.Ds of }\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n$$
These two results have been obtained by application of intuitive ideas regarding means. An alternative method would be to define the mean and the M.S.D. of an error by
$$a=\text{mean of }\varepsilon=\int_{-\infty}^{\infty}x\,dF(x),\qquad k^{2}=\text{M.S.D. of }\varepsilon=\int_{-\infty}^{\infty}(x-a)^{2}\,dF(x)$$
where $F$ is the D.F. of $\varepsilon$. We could then find an analytical expression for the D.F. of $\varepsilon=\varepsilon_1+\varepsilon_2+\cdots+\varepsilon_n$, by means of which the mean of $\varepsilon$ and its M.S.D. could be expressed as Stieltjes integrals and the above two equations deduced. This method would have logical advantages, but it is rather lengthy; it is given in detail in Appendix C. We shall need such an analytical expression in the case $n=2$.
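Again as an aside: the two additivity results are easy to confirm by simulation. A minimal sketch with three independent errors of my own choosing (uniform, shifted exponential, and normal):

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 200_000

# Three independent errors with different distributions (illustrative choices).
e1 = rng.uniform(-1.0, 1.0, n_trials)          # mean 0,   M.S.D. 1/3
e2 = rng.exponential(2.0, n_trials) - 2.0      # mean 0,   M.S.D. 4
e3 = rng.normal(0.5, 1.5, n_trials)            # mean 0.5, M.S.D. 2.25

total = e1 + e2 + e3

print("mean of sum    :", total.mean())
print("sum of means   :", e1.mean() + e2.mean() + e3.mean())
print("M.S.D. of sum  :", total.var())
print("sum of M.S.D.s :", e1.var() + e2.var() + e3.var())
# Up to sampling noise, the mean of the sum equals the sum of the means,
# and the M.S.D. of the sum equals the sum of the M.S.D.s.
```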
§ 4. Sum Distribution Functions (S.D.F.)
If $F_1,F_2,\ldots,F_n$ are D.Fs belonging to errors $\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n$ respectively, then there is a D.F., $F$ say, belonging to $\varepsilon=\varepsilon_1+\cdots+\varepsilon_n$. We will call $F$ the "sum distribution function" of the D.Fs $F_1,F_2,\ldots,F_n$ and we will write

$$F=F_1\oplus F_2\oplus\cdots\oplus F_n.$$

Since this is logically equivalent to the proposition

$$\varepsilon=\varepsilon_1+\varepsilon_2+\cdots+\varepsilon_n,$$

the associative and commutative laws hold for $\oplus$. We will find an expression for $F\oplus G=H$.
Let $\lambda_0<\lambda_1<\lambda_2<\cdots<\lambda_n$ be a dissection $D$ of $(-\infty,\infty)$. Then the probability of an error with D.F. $H$ having a value smaller than $x$ is at least

$$s(D)=\sum_{r=1}^{n}\bigl[F(\lambda_r)-F(\lambda_{r-1})\bigr]\,G^{-}(x-\lambda_r)$$

and the probability of its not exceeding $x$ is at most

$$S(D)=\sum_{r=1}^{n}\bigl[F(\lambda_r)-F(\lambda_{r-1})\bigr]\,G^{+}(x-\lambda_{r-1}).$$

If $F(t)$ and $G(x-t)$, regarded as functions of $t$, have no common discontinuities, the least upper bound of $s(D)$ and the greatest lower bound of $S(D)$ over all dissections $D$ coincide, and

$$H(x)=\int_{-\infty}^{\infty}G(x-t)\,dF(t)\qquad\text{for such values of }x.\tag{S}$$

$H(x)$ is increasing throughout the set of such values. The remaining values form an enumerable set, and we may define $H(x)$ at these points in such a way that

$$H(x-0)=H^{-}(x)\le H(x)\le H^{+}(x)=H(x+0).$$
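A quick numerical check of equation (S), using two continuous D.Fs of my own choosing (so the common-discontinuity caveat does not arise): a uniform error on $[0,1]$ and an exponential error with mean 1. The Stieltjes integral is approximated by a sum over a fine dissection and compared with the empirical D.F. of the sum of the two errors.

```python
import numpy as np

rng = np.random.default_rng(2)

def F(t):
    """D.F. of an error uniform on [0, 1]."""
    return np.clip(t, 0.0, 1.0)

def G(t):
    """D.F. of an exponential error with mean 1."""
    return np.where(t > 0.0, 1.0 - np.exp(-t), 0.0)

def H(x, n=20_000):
    """Approximate the Stieltjes integral of G(x - t) dF(t) by a sum
    over a fine dissection covering the support of F."""
    lam = np.linspace(-1.0, 2.0, n)
    dF = F(lam[1:]) - F(lam[:-1])           # F(lambda_r) - F(lambda_{r-1})
    return float(np.sum(G(x - lam[1:]) * dF))

# Compare with the empirical D.F. of the sum of the two errors.
samples = rng.uniform(0.0, 1.0, 300_000) + rng.exponential(1.0, 300_000)
for x in (0.5, 1.0, 2.0, 3.0):
    print(f"x = {x}:  H(x) = {H(x):.4f}   empirical = {np.mean(samples <= x):.4f}")
```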
§ 5. Shape Functions (S.Fs)
If $F$ be a D.F. with mean $a$ and M.S.D. $k^{2}$ $(k>0)$, then $U$, defined by $U(x)=F(a+kx)$, is a D.F. with mean 0 and M.S.D. 1, and is called the shape function (S.F.) of $F$.
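In other words, $U$ is the D.F. of the standardised error $(\varepsilon-a)/k$. A small sanity check with a gamma-distributed error (an arbitrary choice of mine):

```python
import numpy as np

rng = np.random.default_rng(3)

# An error with a non-trivial distribution: a gamma error with mean 2 and M.S.D. 1.
eps = rng.gamma(shape=4.0, scale=0.5, size=500_000)

a = eps.mean()   # mean of the error
k = eps.std()    # k, where k^2 is its M.S.D.

# The shape function U(x) = F(a + kx) is the D.F. of the standardised error (eps - a) / k.
standardised = (eps - a) / k
print("mean of standardised error  :", standardised.mean())   # ~ 0
print("M.S.D. of standardised error:", standardised.var())    # ~ 1
```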
We are now in a position to formulate the problem mathematically.
§ 6. Formulation of the Problem.
We are given a sequence $\varepsilon_1,\varepsilon_2,\ldots$ of errors. $\varepsilon_r$ has D.F. $G_r$, S.F. $V_r$, mean $a_r$, and M.S.D. $k_r^{2}$. $F_n$ is defined by $F_n=G_1\oplus G_2\oplus\cdots\oplus G_n$, and has $U_n$ as its S.F. $W$ is the S.F. of the Gaussian error, i.e.

$$W(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{1}{2}t^{2}}\,dt$$
Then we wish to find under what conditions $U_n(x)\to W(x)$ uniformly as $n\to\infty$. When this is the case we say that "$F_n$ tends to the Gaussian law". Any error or D.F. whose S.F. is $W$ will be called Gaussian.
Henceforth we shall confine the use of the expression D.F. to those D.Fs which have finite M.S.D. Only such D.Fs can come into consideration in the problem as we have formulated it (see § 8). Also, since $U_n$ is independent of $a_1,\ldots,a_n$, we may suppose these latter to be all zero.
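To get a feel for the problem as formulated, here is a rough sketch (mine, with an arbitrary alternating sequence of uniform and exponential errors) that estimates $\sup_x|U_n(x)-W(x)|$ by simulation as $n$ grows:

```python
import math
import numpy as np

rng = np.random.default_rng(4)

def W(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_distance(n, trials=100_000):
    """Estimate sup_x |U_n(x) - W(x)|, where U_n is the shape function of
    F_n = G_1 (+) ... (+) G_n for a mixed sequence of error distributions."""
    total = np.zeros(trials)
    for r in range(n):
        if r % 2 == 0:
            total += rng.uniform(-1.0, 1.0, trials)        # mean 0, M.S.D. 1/3
        else:
            total += rng.exponential(1.0, trials) - 1.0    # mean 0, M.S.D. 1
    u = np.sort(total / total.std())                       # samples from the shape function U_n
    grid = np.linspace(-4.0, 4.0, 801)
    Un = np.searchsorted(u, grid, side="right") / trials   # empirical U_n on the grid
    return max(abs(Un[i] - W(x)) for i, x in enumerate(grid))

for n in (1, 2, 5, 20, 100):
    print(f"n = {n:4d}   sup|U_n - W| ~= {sup_distance(n):.4f}")
```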
§ 7. Fundamental Property of the Gaussian Error.
The only properties of the function $W$ that we shall require when investigating the sufficiency conditions are that it is an S.F. and that it has the self-reproductive property, which is proved here. It is convenient to put $W_L(x)=W\!\left(\frac{x}{L}\right)$.

Then $W_a\oplus W_b=W_c$ where $c^{2}=a^{2}+b^{2}$.

For if $W_a\oplus W_b=H$, then

$$
\begin{aligned}
H(y)&=\int_{-\infty}^{\infty}W\!\left(\frac{y-x}{a}\right)dW\!\left(\frac{x}{b}\right)
=\frac{1}{b}\int_{-\infty}^{\infty}W\!\left(\frac{y-x}{a}\right)W'\!\left(\frac{x}{b}\right)dx\\[4pt]
H'(y)&=\frac{1}{ab}\int_{-\infty}^{\infty}W'\!\left(\frac{y-x}{a}\right)W'\!\left(\frac{x}{b}\right)dx\\[4pt]
&=\frac{1}{2\pi ab}\,e^{-\frac{1}{2}\frac{y^{2}}{c^{2}}}\int_{-\infty}^{\infty}e^{-\frac{1}{2}\frac{c^{2}t^{2}}{a^{2}b^{2}}}\,dt
\qquad\text{by the substitution }t=x-\frac{b^{2}}{c^{2}}y\\[4pt]
&=\frac{1}{\sqrt{2\pi}}\cdot\frac{1}{c}\,e^{-\frac{1}{2}\frac{y^{2}}{c^{2}}}
=\frac{1}{c}\,W'\!\left(\frac{y}{c}\right)
\end{aligned}
$$

Integrating and putting in the right constant of integration,

$$H(y)=W\!\left(\frac{y}{c}\right)=W_c(y).$$
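A numerical sanity check of the self-reproductive property (my sketch, not Turing's): with $W_L(x)=W(x/L)$, $W_a$ is the D.F. of a Gaussian error with M.S.D. $a^{2}$, so $W_a\oplus W_b$ can be estimated by simulating the sum of two such independent errors and comparing with $W(x/c)$, where $c^{2}=a^{2}+b^{2}$.

```python
import math
import numpy as np

rng = np.random.default_rng(5)

def W(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

a, b = 1.5, 2.0
c = math.sqrt(a**2 + b**2)

# W_a is the D.F. of a Gaussian error with M.S.D. a^2 (since W_a(x) = W(x/a)),
# so W_a (+) W_b is the D.F. of the sum of two such independent errors.
total = rng.normal(0.0, a, 500_000) + rng.normal(0.0, b, 500_000)

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x = {x:+.1f}   empirical = {np.mean(total <= x):.4f}   W(x/c) = {W(x / c):.4f}")
```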
§ 8. The Quasi-Necessary Conditions.
The conditions we shall impose fall into two groups. Those of one group (the quasi-necessary conditions) involve the M.S.Ds only. They are not actually necessary, but if they are not fulfilled $U_n$ can only tend to $W$ by a kind of accident; such a case would occur if the errors $\varepsilon_1,\ldots,\varepsilon_n,\ldots$ were themselves Gaussian. The exact sense in which this is to be regarded as an accident will appear from theorems 4 and 5 of this section. These theorems and theorem 3 are not required for the later theory, but they shed some light on the significance of the quasi-necessary conditions: this section may therefore be omitted at the first reading. Theorem 3 is of interest in itself, being a kind of converse to the reproductive property. As proved here it depends on the completeness property of the Hermite functions. Theorems 4 and 5 depend on theorem 3, but weakened forms of them, not depending on theorem 3, are also given. From § 9 onwards we shall investigate the other group of conditions, viz. the sufficient conditions.
Theorem 1.
If $U$ and $V$ are two S.Fs, then

$$|U(x)-V(x)|\le 1,\qquad |U(x)-V(x)|<\frac{2}{x^{2}},$$

and

$$|U(x)-V(x)|\le\frac{4}{1+x^{2}},$$

each holding for every $x$.
Proof
$U(x)$ has M.S.D. 1 and is increasing,

$$\therefore\quad 1=\int_{-\infty}^{\infty}x^{2}\,dU(x)\ge\int_{-\infty}^{y}x^{2}\,dU(x)\ge y^{2}\,U(y)\qquad\text{if }y\le 0$$

$\therefore$ for all S.Fs, $U(x)\le\dfrac{1}{x^{2}}$ if $x\le 0$; and since $1-V(-x)$ is likewise an S.F., $1-V(x)\le\dfrac{1}{x^{2}}$ if $x\ge 0$.
$\therefore$ from this, the fact that $e^{-\xi^{2}}X(\xi)$ is integrable, and assumption (3) above, we deduce that $e^{-\xi^{2}}X(\xi)=0$ p.p., i.e. $X(\xi)=0$ p.p. But $X$ is the difference of a monotone and a continuous function, $\therefore X(\xi)=0$ for all $\xi$.
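As a final aside, a quick numerical sanity check of the bounds in Theorem 1, using two particular shape functions of my own choosing (a standardised uniform error and a standardised exponential error):

```python
import numpy as np

# Two shape functions (mean 0, M.S.D. 1), chosen for illustration:
# U: uniform error on [-sqrt(3), sqrt(3)];  V: exponential(1) error shifted by -1.
s3 = np.sqrt(3.0)

def U(x):
    return np.clip((x + s3) / (2 * s3), 0.0, 1.0)

def V(x):
    return np.where(x >= -1.0, 1.0 - np.exp(-(x + 1.0)), 0.0)

x = np.linspace(-6.0, 6.0, 2001)
x = x[np.abs(x) > 1e-9]              # avoid dividing by zero in the 2/x^2 bound
diff = np.abs(U(x) - V(x))

# Each of the following maxima should be at most 1 if the theorem's bounds hold.
print("max |U - V|                 :", diff.max())
print("max |U - V| * x^2 / 2       :", (diff * x**2 / 2).max())
print("max |U - V| * (1 + x^2) / 4 :", (diff * (1 + x**2) / 4).max())
```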