Strong approximation and a central limit theorem for St. Petersburg sums

The St. Petersburg paradox (Bernoulli 1738) concerns the fair entry fee in a game where the winnings are distributed as P(X = 2^k) = 2^{-k}, k = 1, 2, .... The tails of X are not regularly varying and the sequence S_n of accumulated gains has, suitably centered and normalized, a class of semistable laws as subsequential limit distributions (Martin-Löf (1985), Csörgő and Dodunekova (1991)). This has led to a clarification of the paradox and to an interesting and unusual asymptotic theory in the past decades. In this paper we prove that S_n can be approximated by a semistable Lévy process {L(n), n ≥ 1} with a.s. error O(√n (log n)^{1+ε}) and, surprisingly, the error term is asymptotically normal, exhibiting an unexpected central limit theorem in St. Petersburg theory.

MSC 2010. 60E07, 60F05, 60F17.


1. Introduction
Let X, X_1, X_2, ... be i.i.d. r.v.'s with

P(X = 2^k) = 2^{-k}, k = 1, 2, ..., (1.1)

and let S_n = ∑_{k=1}^n X_k. The asymptotic behavior of the sequence {S_n, n ≥ 1} has attracted considerable attention in the literature in connection with the St. Petersburg paradox concerning the 'fair' entry fee in a game where the winnings are distributed as X. We refer to Csörgő and Simons [10] for the history and bibliography of the problem. Feller [11] proved that

lim_{n→∞} S_n/(n log_2 n) = 1 in probability

(where log_2 denotes the logarithm with base 2), and Martin-Löf [16] showed that S_{2^k}/2^k − k converges in distribution to G, where G is the infinitely divisible distribution function with characteristic function exp(g(t)), with g given in (1.2). Let G_γ denote the distribution with characteristic function exp(γg(t/γ) − it log_2 γ) and let γ_n = n/2^{[log_2 n]+1} ∈ [1/2, 1) be the parameter describing the location of n between two consecutive powers of 2, where [y] denotes the (lower) integer part of y ∈ R. Csörgő [6] proved the merging relation

sup_{x∈R} |P(S_n/n − log_2 n ≤ x) − G_{γ_n}(x)| → 0 (n → ∞) (1.3)

and determined the precise convergence rate. It follows (and actually it was proved earlier in [8]) that the class of subsequential limit distributions of S_n/n − log_2 n is the class G = {G_γ : 1/2 ≤ γ < 1}.
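As a quick numerical illustration (our sketch, not part of the paper; the helper name st_petersburg is ours), the winnings can be simulated by inverting the geometric tail, and Feller's normalization S_n/(n log_2 n) can be checked empirically:

```python
# Illustrative sketch (not from the paper): simulate the St. Petersburg
# game and check Feller's weak law S_n / (n log_2 n) -> 1 in probability.
import math
import random

def st_petersburg(rng):
    # X = 2^k with P(X = 2^k) = 2^{-k}: k is the toss index of the first head.
    u = 1.0 - rng.random()               # uniform on (0, 1]
    return 2 ** max(math.ceil(-math.log2(u)), 1)

rng = random.Random(12345)
n = 20000
# The convergence is slow (the relative error is of order 1/log_2 n),
# so we look at the median ratio over independent repetitions.
ratios = sorted(
    sum(st_petersburg(rng) for _ in range(n)) / (n * math.log2(n))
    for _ in range(50)
)
median_ratio = ratios[25]
print(median_ratio)   # close to 1, typically slightly above
```

The heavy upper tail of S_n/n − log_2 n makes a single ratio unreliable, which is why the sketch reports a median over repetitions.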
If n runs through the interval [2^k, 2^{k+1}], then G_{γ_n} moves through the distributions G_{j/2^{k+1}}, 2^k ≤ j ≤ 2^{k+1}, representing, in view of G_{1/2} = G_1, a "circular" path in G. In view of (1.3), the distribution of S_n/n − log_2 n also describes approximately a circular path, a remarkable asymptotic behavior called merging in [6]. From the merging theorem (1.3) and the results of [8] it also follows that, given γ ∈ (1/2, 1) and an increasing sequence (n_k) of integers, the limit relation

S_{n_k}/n_k − log_2 n_k →_d G_γ (k → ∞) (1.4)

holds iff γ_{n_k} → γ as k → ∞. For γ = 1/2 this criterion breaks down, and (1.4) holds iff the sequence γ_{n_k} has no cluster points other than 1/2 and 1.
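For concreteness, the "circular" sweep of the location parameter can be computed directly (a small sketch, not from the paper; the helper name gamma is ours):

```python
# Sketch (not from the paper): the dyadic location parameter
# gamma_n = n / 2^{[log_2 n] + 1}, which lies in [1/2, 1).
import math

def gamma(n):
    return n / 2 ** (math.floor(math.log2(n)) + 1)

# As n runs through [2^k, 2^{k+1}], gamma_n sweeps [1/2, 1) and then
# returns to 1/2 at n = 2^{k+1}, so G_{gamma_n} traverses a closed loop
# in the family {G_gamma : 1/2 <= gamma < 1}.
print([gamma(n) for n in (8, 10, 12, 15, 16)])   # [0.5, 0.625, 0.75, 0.9375, 0.5]
```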
Using a decomposition idea of Le Page, Woodroofe and Zinn [15], in [3] a new representation of the limiting semistable variable of St. Petersburg sums was given, simplifying the theory considerably and leading to new asymptotic information. Let Ψ(x) denote the function on (0, ∞) which grows linearly from 1 to 2 on any interval [2^k, 2^{k+1}), (k ∈ Z), let η_1, η_2, ... be independent exponential random variables with mean 1 and let Z_k = ∑_{j=1}^k η_j. In Lemma 2 of [3] it was proved that for any 1 ≤ γ < 2 the series in (1.5), with sum Y(γ), converges absolutely with probability 1, and the limit distribution G_γ above can be identified in terms of Y(γ). We note that for each γ ∈ [1/2, 1), Y(γ) can be expressed through the dyadic digits ε_k of γ, given by γ = ∑_{k=1}^∞ ε_k 2^{−k}, and the function ξ introduced by Csörgő and Simons [9]; see also Kern and Wiedrich [14]. In contrast to the representation of the limiting semistable variable of St. Petersburg theory as an infinite weighted sum of independent Poisson variables in [8], the terms of the sum (1.5) are dependent random variables. For an analogous representation of stable random variables, see [15]. A similar representation, implicit in [3], holds for the partial sums S_n, namely

(1/n) S_n − a_{n,γ_n} =_d Y(γ_n) + ε_n a_{n,γ_n}, (1.6)

where ε_n = Z_{n+1}/n − 1, γ_n = n/2^{[log_2 n]+1} is the dyadic location parameter introduced above, and a_{n,γ} is the centering sequence defined in (1.7). For a simple proof, see Section 2. Note that the equality in (1.6) holds only in distribution, and thus (1.6) yields an expansion of S_n in the sense of Strassen: S_n can be redefined on a suitable probability space together with a sequence (η_n) of i.i.d. mean-1 exponential random variables such that, setting Z_n = ∑_{k=1}^n η_k, (1.6) holds pointwise. This makes the formula easy to apply; in particular, (1.6) makes the asymptotic theory of St. Petersburg sums very transparent. (Note the difference between (1.6) and the Edgeworth expansion of S_n in [7], [17], which gives an expansion of the distribution function of S_n.
The expansion (1.6) is particularly convenient for problems involving almost everywhere convergence and asymptotics.) By the law of the iterated logarithm we have ε_n = O(n^{−1/2}(log log n)^{1/2}) a.s., and an easy calculation shows that replacing ε_n by 0 in (1.6) results in an error of o_P(1) on the right hand side; thus we get the result

(1/n) S_n − a_{n,γ_n} = Y(γ_n) + o_P(1), (1.8)

which is meant again in the sense that for each fixed n the variables S_n and Y(γ_n) can be defined on a common probability space such that (1.8) holds. Relation (1.8) thus yields a pointwise version of the merging result (1.3). The purpose of the present paper is to prove that actually much more is valid: the partial sum process of (X_n) can be approximated by a semistable Lévy process {L(t), t ≥ 0} with an a.s. error O(√n (log n)^{1+ε}) which is, moreover, asymptotically normal, establishing an unexpected central limit theorem in St. Petersburg theory.
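The dyadic function Ψ introduced above is easy to make concrete (an illustrative sketch, not from the paper; the name psi is ours): since Ψ grows linearly from 1 to 2 on each interval [2^k, 2^{k+1}), we have Ψ(x) = x/2^{⌊log_2 x⌋}, and consequently Ψ(u)/u is constant, equal to 2^k, on [2^{−k}, 2^{−k+1}).

```python
# Sketch (not from the paper) of the function Psi from (1.5):
# Psi(x) = x / 2^{floor(log2 x)} grows linearly from 1 to 2 on each
# dyadic interval [2^k, 2^{k+1}), so Psi(u)/u = 2^k on [2^{-k}, 2^{-k+1}).
import math

def psi(x):
    return x / 2 ** math.floor(math.log2(x))

print(psi(1.0), psi(3.0), psi(0.3))   # 1.0 1.5 1.2
```

For u = 0.3 ∈ [2^{−2}, 2^{−1}) this gives psi(0.3)/0.3 = 4 = 2^2, matching the piecewise-constant ratio used in the proofs of Section 2.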
Theorem 1.1 Let {L(t), t ≥ 0} be the semistable Lévy process of Martin-Löf [16] with E exp(iuL(t)) = exp(tg(u)), where g is the function in (1.2). Then on a suitable probability space one can define the St. Petersburg sequence (X_n) and the process {L(t), t ≥ 0} jointly such that S_n − L(n) = O(√n (log n)^{1+ε}) a.s. for any ε > 0 and, for some sequence a_n ≍ (n log n)^{1/2}, the suitably centered difference S_n − L(n), divided by a_n, converges in distribution to the standard normal law (relation (1.11)). Here c_n ≍ d_n means that the ratio c_n/d_n lies between positive constants. For an explicit construction of a_n, see (2.22). Due to the irregular tail behavior of the random variables in our construction (see the proof of Lemma 2.1), it seems likely that a_n ≍ (n log n)^{1/2} in Theorem 1.1 cannot be replaced by a_n ∼ c(n log n)^{1/2} with a constant c.
The process L(t) was introduced by Martin-Löf [16], who proved a scaling relation implying that the transformation t → 2t does not change the distribution of the process Z(t) = L(t)/t − log_2 t. In view of the atomic Lévy measure in the characteristic function of Z(1), its distribution is not stable. It also follows that

P(L(n)/n − log_2 n ≤ x) = G_{γ_n}(x), x ∈ R, n ≥ 1, (1.13)

showing that L(n)/n − log_2 n exhibits the merging behavior (1.3) in an ideal way, with zero error. Thus Theorem 1.1 gives an invariance principle for the merging result (1.3) and, in fact, for a class of further limit theorems for (X_n). It also shows the surprising fact that the partial sum process of (X_n) can be represented as a semistable Lévy process with an asymptotically normal perturbation.
In a previous paper [1], a strong approximation of St. Petersburg sums with the weaker remainder term O(n^{5/6+ε}), and without the asymptotic normality of the error term, was proved by a standard blocking argument. The proof in [1] works for a large class of i.i.d. sequences (X_n) in the domain of geometric partial attraction of a semistable law G. In contrast, the proof of Theorem 1.1 uses the structure of the St. Petersburg sequence in a substantial way, and whether Theorem 1.1 remains valid for a larger class of i.i.d. sequences remains open.
Weak and strong approximations of partial sums of i.i.d. random variables (X_n) in the domain of attraction of stable laws were proved in Stout [21], Simons and Stout [20], and Berkes and Dehling [2]. The remainder terms there are given in terms of the function β(x) = x^α |P(X_1 < x) − G(x)|, where G is the limit distribution, and are rather complicated. In the case when β(x) is a slowly varying function tending to 0, lower bounds for the remainder term (valid for any construction) are also given in [2], leaving only a small gap between the upper and lower bounds. However, in the case of the stable analogue of St. Petersburg sums, when G is a stable distribution with parameters α = 1, β = −1 (see e.g. [13], p. 164), we have β(x) = O(x^{−γ}) for some γ > 0 and no lower bounds for the remainder term have been found. For the same reason, we do not have universal lower bounds for the remainder term in the St. Petersburg game, and thus, even though Theorem 1.1 determines the precise stochastic order of magnitude of the error term for a specific construction, the question whether other constructions can give a better error term remains open.

2. Proofs
We first prove (1.6). Let F denote the distribution function of X_1 and let F^{−1}(x) = inf{t : F(t) ≥ x} be its (generalized) inverse.
We turn now to the proof of Theorem 1.1, which uses, as in [2], [20], [21], a termwise approximation of partial sums. As it turns out (see Lemma 2.1 below), the termwise error in this approximation is determined by the second term of the expansion (1.5), whose tails were shown in [3] to be ≍ x^{−2}. This implies that the termwise error is in the domain of attraction of the normal law, explaining relation (1.11) in Theorem 1.1. The crucial influence of the second term of the expansion (1.5) in our approximation problem is similar to the situation for Markov chains, where the speed of convergence to the stationary distribution is determined by the second largest eigenvalue of the transition matrix.
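To make the last implication concrete, here is a standard back-of-the-envelope calculation (our sketch, not taken from the paper) showing why tails of exact order x^{−2} yield the normal domain of attraction with norming of order (n log n)^{1/2}:

```latex
% Sketch under the assumption c_1 x^{-2} \le P(|\xi| > x) \le c_2 x^{-2}
% for large x. The truncated second moment satisfies
\[
  U(x) \;=\; E\bigl[\xi^2 \mathbf{1}\{|\xi| \le x\}\bigr]
       \;=\; \int_0^x 2y\,P(|\xi| > y)\,dy \;-\; x^2 P(|\xi| > x)
       \;\asymp\; \log x ,
\]
% so U is slowly varying and \xi lies in the domain of attraction of the
% normal law. The norming sequence, chosen so that n\,U(a_n) \sim a_n^2,
% then satisfies
\[
  a_n^2 \;\asymp\; n \log a_n \qquad\Longrightarrow\qquad
  a_n \;\asymp\; (n \log n)^{1/2}.
\]
```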

Lemma 2.1 A St. Petersburg variable X with distribution (1.1) and a random variable Y distributed as Y(1) in (1.5) can be jointly defined on a suitable probability space such that

c_1 x^{−2} ≤ P(|X − Y| > x) ≤ c_2 x^{−2} for x ≥ x_0 (2.5)

for some positive constants c_1, c_2, x_0.

Proof.
Put U = 1 − e^{−Z_1}, W_1 = Ψ(Z_1)/Z_1, W_2 = Ψ(U)/U, and let W_3 denote the sum of the series (1.5) for γ = 1, whose first term is W_1. We show that (2.5) holds with X = W_2, Y = W_3. Clearly, the distribution function of Z_1 is G(x) = 1 − e^{−x} (x ≥ 0), and thus U = G(Z_1) = 1 − e^{−Z_1} is uniformly distributed on (0, 1). Next we observe that for any k ∈ Z the function Ψ(u)/u equals 2^k for u ∈ [2^{−k}, 2^{−k+1}), and thus for a fixed x ∈ [2^ℓ, 2^{ℓ+1}), ℓ ∈ Z, the inequality Ψ(u)/u > x holds iff u < 2^{−ℓ}. Therefore for x ∈ [2^ℓ, 2^{ℓ+1}) we have

P(W_2 > x) = P(Ψ(U)/U > x) = P(U < 2^{−ℓ}). (2.7)

If x ≥ 1, then ℓ ≥ 0 and thus the last probability in (2.7) equals 2^{−ℓ}; otherwise ℓ < 0 and the last probability in (2.7) equals 1. Thus W_2 is a St. Petersburg variable. On the other hand, W_3 has the distribution of Y(1) in (1.5), and thus to prove Lemma 2.1 it suffices to show that

c_1 x^{−2} ≤ P(|W_2 − W_3| > x) ≤ c_2 x^{−2} for x ≥ x_0. (2.8)

We first prove that

c_3 x^{−2} ≤ P(|W_1 − W_2| > x) ≤ c_4 x^{−2} for x ≥ x_1 (2.9)

with some positive constants c_3, c_4, x_1. As already noted, for any k ∈ Z the function Ψ(x)/x equals 2^k on the interval I_k = [2^{−k}, 2^{−k+1}). Let now k ≥ 2 and assume Z_1 ∈ I_k. By the elementary inequality

x − x²/2 < 1 − e^{−x} < x, x > 0, (2.10)

we then have 1 − e^{−Z_1} ∈ I_k or 1 − e^{−Z_1} ∈ I_{k+1}. Thus the difference

∆ = |W_1 − W_2| = |Ψ(Z_1)/Z_1 − Ψ(U)/U| (2.11)

equals 0 or 2^{k+1} − 2^k = 2^k according as 1 − e^{−Z_1} belongs to I_k or I_{k+1}. In view of (2.10), the second alternative implies that Z_1 is closer to the left endpoint of I_k than 2^{−2k+1}. But then Z_1 ∼ 2^{−k} as k → ∞, and thus by (2.10) the relation 1 − e^{−Z_1} ∈ I_{k+1}, or, equivalently, 1 − e^{−Z_1} < 2^{−k}, holds on a set determined by the position of Z_1 near the left endpoint of I_k. We thus proved that the difference ∆ in (2.11) equals 2^k on a set A_k in the probability space, where the A_k are disjoint for k ≥ k_0, P(A_k) ∼ (1/2)·2^{−2k}, and otherwise ∆ = 0. Hence

P(∆ > x) = ∑_{k ≥ k_0} P(A_k) ∼ (1/2) ∑_{k ≥ k_0} 2^{−2k} = (2/3)·2^{−2k_0},

where k_0 = k_0(x) denotes the smallest integer such that 2^{k_0} > x. Thus if x runs in the interval (2^s, 2^{s+1}) for some integer s ≥ 1, then x² P(∆ > x) runs from 1/6 + o_s(1) to 4/6 + o_s(1) as s → ∞, which proves (2.9), and we also see that x² P(∆ > x), and thus x² P(|W_1 − W_2| > x), fluctuates between positive constants, without a limit.
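The quantile coupling used here is easy to sketch numerically (our illustration, not from the paper; we take W_2 = Ψ(U)/U with U = 1 − e^{−Z_1}, as the proof indicates):

```python
# Sketch (not from the paper) of the coupling in the proof of Lemma 2.1:
# for Z ~ Exp(1), U = 1 - exp(-Z) is uniform on (0,1), and W2 = Psi(U)/U
# is an exact St. Petersburg variable, since Psi(u)/u = 2^k on
# [2^{-k}, 2^{-k+1}) and P(U in [2^{-k}, 2^{-k+1})) = 2^{-k}.
import math
import random

def psi(x):
    # Psi grows linearly from 1 to 2 on each dyadic interval [2^k, 2^{k+1})
    return x / 2 ** math.floor(math.log2(x))

rng = random.Random(2024)
samples = []
for _ in range(100000):
    z = rng.expovariate(1.0)
    u = 1.0 - math.exp(-z)
    if u > 0.0:                      # guard against underflow for tiny z
        samples.append(psi(u) / u)

freq2 = samples.count(2.0) / len(samples)
print(freq2)   # close to P(X = 2) = 1/2
```

Every sampled value is an exact power of 2, and the empirical frequency of the value 2 matches P(X = 2) = 1/2, illustrating that the coupling reproduces the St. Petersburg marginal exactly.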
Next we observe that |W_1 − W_3| is a tail sum of the series representing Y(1) in (1.5), whose tail behavior is described by Theorem 5 of [3]; in particular, we have

c_5 x^{−2} ≤ P(|W_1 − W_3| > x) ≤ c_6 x^{−2} (2.12)

with suitable positive constants c_5, c_6. Theorem 5 of [3] also shows that x² P(|W_1 − W_3| > x) has no limit as x → ∞. Now (2.9) and (2.12) imply

P(|W_2 − W_3| > x) ≤ P(|W_1 − W_2| > x/2) + P(|W_1 − W_3| > x/2) ≤ c_7 x^{−2},

proving the upper half of (2.8). To prove the lower half, we start from the inequality (2.14). For any t ≥ 0, let V_t denote the sum in (2.15), where Z*_k = ∑_{j=2}^k η_j for k ≥ 2. We claim that there exists a positive constant C such that (2.16) holds for any 0 ≤ t ≤ 1. Since the sequence (Z*_k) has the same distribution as (Z_k), for t = 0 relation (2.16) follows from Lemma 2 of [3]. As inspection shows, the properties of (Z_k) used in the proof in [3] remain valid for the sequence (Z_k + t) for any fixed t ≥ 0; moreover, the inequalities in [3] hold uniformly for 0 ≤ t ≤ 1, proving (2.16). Now, conditionally on Z_1 = t, W_3 − W_1 becomes V_t in (2.15), which is independent of η_1 = Z_1 and thus of ∆ = |W_1 − W_2| in (2.11); consequently, (2.17) holds for x ≥ x_0, where ∆(t) is the expression in (2.11) with Z_1 replaced by t. If Z_1 is bounded away from 0, then |W_1 − W_2| is bounded above; put differently, if |W_1 − W_2| is large, then Z_1 is near 0. Thus, integrating (2.17) over 0 ≤ t ≤ 1 with respect to P(Z_1 ∈ dt), we get (2.18) for sufficiently large x, where in the last step we used (2.9). Now using (2.12), (2.14) and (2.18) we get the lower half of (2.8).
Proof of Theorem 1.1. For the vector (X, Y) in Lemma 2.1, let H denote the distribution function of X − Y and put

U(x) = ∫_{|y|≤x} y² dH(y).

Using Lemma 2.1 and integration by parts we can evaluate U(x) whenever x and −x are continuity points of H, and using Lemma 2.1 again for the last integral it follows that

c_8 log x ≤ U(x) ≤ c_9 log x (2.20)

with suitable positive constants c_8 and c_9. Thus lim_{x→∞} U(2x)/U(x) = 1, i.e. the nondecreasing function U is slowly varying. Further, (2.5) implies that H has a finite expectation. Let now (X_n, Y_n) be i.i.d. copies of the vector (X, Y) in Lemma 2.1. By the slow variation of U, X − Y is in the domain of attraction of the normal law; specifically, we have

(1/a_n) (∑_{k=1}^n (X_k − Y_k) − cn) →_d N(0, 1), (2.21)

where c = E(X − Y) and a_n is the norming sequence defined in (2.22). (See e.g. [12], p. 580, Theorem 3 and the comment after (5.23) on page 579.) Using (2.22) and the first relation of (2.20), we get by a simple calculation

c_10 (n log n)^{1/2} ≤ a_n ≤ c_11 (n log n)^{1/2} (2.23)

with suitable constants c_10, c_11. Recall now that along the sequence n = 2^k we have

S_n/n − log_2 n →_d G, (1/n) ∑_{k=1}^n Y_k − log_2 n =_d G, (2.24)

where G = G_{1/2} is the semistable distribution defined after (1.2). The first relation here follows from (1.4) and the second from (1.13), since ∑_{k=1}^n Y_k =_d L(n). Relation (2.23) shows that, replacing 1/a_n by 1/n in (2.21), the left hand side converges to 0 in probability, and adding the second relation of (2.24) yields