Strong approximation of the St. Petersburg game

ABSTRACT Let $X, X_1, X_2, \dots$ be i.i.d. random variables with $P(X = 2^k) = 2^{-k}$ $(k = 1, 2, \dots)$ and let $S_n = \sum_{k \le n} X_k$. The properties of the sequence $\{S_n, n \in \mathbb{N}\}$ have received considerable attention in the literature in connection with the St. Petersburg paradox (Bernoulli 1738). Let $\{Z(t), t \ge 0\}$ be a semistable Lévy process whose Lévy measure is atomic, concentrated on the dyadic points $2^k$. For suitable versions of $\{S_n\}$ and $\{Z(t)\}$, we prove the strong approximation $S_n - Z(n) = O(n^{5/6+\varepsilon})$ a.s. This provides the first example of a strong approximation theorem for partial sums of i.i.d. sequences not belonging to the domain of attraction of the normal or stable laws.


Introduction
Let $X, X_1, X_2, \dots$ be i.i.d. random variables with $P(X = 2^k) = 2^{-k}$ $(k = 1, 2, \dots)$ and put $S_n = \sum_{k=1}^n X_k$. The study of the sequence $\{S_n, n \in \mathbb{N}\}$ has received considerable attention in the literature in connection with the St. Petersburg paradox (Bernoulli 1738) concerning the 'fair' entry price for a game where the winnings are distributed according to $X$. Martin-Löf [1] proved that
$$S_{2^k}/2^k - k \xrightarrow{d} G \qquad (k \to \infty), \tag{1}$$
where $G$ is the semistable distribution with characteristic function $\exp(g(t))$, with $g$ given by Equation (2). He also proved ([1, Theorem 1]) that if $n_k \sim \gamma 2^k$, $1 \le \gamma < 2$, then
$$S_{n_k}/n_k - \operatorname{Log} n_k \xrightarrow{d} G_\gamma,$$
where $G_\gamma$ denotes the distribution with characteristic function $\exp(\gamma g(t/\gamma) - it \operatorname{Log} \gamma)$ and $\operatorname{Log}$ denotes the logarithm to base 2. Letting $\gamma_n = n/2^{[\operatorname{Log} n]}$ (where $[\cdot]$ denotes integral part), Csörgő [2] proved that
$$\sup_x \left| P\left( S_n/n - \operatorname{Log} n \le x \right) - G_{\gamma_n}(x) \right| \longrightarrow 0 \quad \text{as } n \to \infty \tag{3}$$
and determined the precise rate of convergence. Relation (3) shows that the class of subsequential limit distributions of $S_n/n - \operatorname{Log} n$ is the class $\mathcal{G} = \{G_\gamma : 1 \le \gamma \le 2\}$. Moreover, if $n$ runs through the interval $[2^m, 2^{m+1}]$ then, with an error tending to 0 as $m \to \infty$, the distribution of $S_n/n - \operatorname{Log} n$ runs through the elements of the discrete set $\{G_{n/2^m} : 2^m \le n \le 2^{m+1}\}$. (Note that $G_1 = G_2$, so that the motion is 'circular' in $\mathcal{G}$.) This remarkable behaviour was called merging in [2]. Csörgő and Dodunekova [3] showed that merging holds for extremal and trimmed sums of the sequence $(X_n)$ as well, and Berkes et al. [4] and del Barrio et al. [5] proved that the same holds for bootstrapped sums of $(X_n)$.
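The merging phenomenon is easy to observe numerically. The following sketch (not part of the original paper; the sample sizes are arbitrary choices) uses the fact that the St. Petersburg winnings are $2^K$ with $K$ geometric of parameter $1/2$, and computes the merging parameter $\gamma_n$ together with the centred sums $S_n/n - \operatorname{Log} n$:

```python
import numpy as np

rng = np.random.default_rng(0)

def st_petersburg(n, size):
    """Return `size` independent copies of S_n, where P(X = 2^k) = 2^{-k}."""
    # K geometric on {1, 2, ...} with parameter 1/2 has P(K = k) = 2^{-k},
    # so the winnings are X = 2^K
    X = 2.0 ** rng.geometric(0.5, size=(size, n))
    return X.sum(axis=1)

def gamma(n):
    """Merging parameter gamma_n = n / 2^[Log n] of relation (3)."""
    return n / 2 ** int(np.floor(np.log2(n)))

# gamma_n sweeps [1, 2) as n runs through [2^m, 2^{m+1})
print([gamma(n) for n in (1024, 1280, 1536, 1792, 2047)])

# centred sums S_n/n - Log n for two n with the same gamma_n (both equal 1):
# their empirical distributions should nearly coincide (merging)
for n in (1024, 2048):
    T = st_petersburg(n, 2000) / n - np.log2(n)
    print(n, float(np.median(T)))
```

For $n$ and $2n$ the limiting laws coincide, so the two printed medians are close; for $n = 1536$ the parameter is $\gamma_n = 1.5$ and a different element of $\mathcal{G}$ appears.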
Let $Z(t)$ denote the Lévy process defined by
$$E \exp(iuZ(t)) = \exp(t\, g(u)), \qquad t \ge 0. \tag{5}$$
The process $Z(t)$ was introduced by Martin-Löf [1], who proved the scaling relation
$$Z(2t)/2 - t \stackrel{d}{=} Z(t). \tag{6}$$
From this it follows that the transformation $t \longrightarrow 2t$ does not change the distribution of the process $\{Z(t)/t - \operatorname{Log} t,\ t > 0\}$. In particular, $Z(2)/2 - 1 \stackrel{d}{=} Z(1)$, and since $Z(2) \stackrel{d}{=} Z(1) + Z'(1)$, where $Z'(1)$ is an independent copy of $Z(1)$, the distribution of $Z(1)$ is semistable. In view of the atomic Lévy measure in the characteristic function of $Z(1)$, its distribution is not stable. It also follows that
$$\mathcal{L}\left( Z(n)/n - \operatorname{Log} n \right) = G_{\gamma_n} \qquad (n \ge 1),$$
showing that the distribution of the sequence $Z(n)/n - \operatorname{Log} n$ exhibits the merging behaviour (3) in an ideal way, i.e. the left-hand side of Equation (3) is equal to 0 for all $n$. Hence, in analogy with strong approximation theory under finite variances, it is natural to ask if the process $\{S_n, n \ge 1\}$ can be approximated, in the almost sure sense, by the semistable process $\{Z(n), n \ge 1\}$ with a good remainder term. Such an approximation would naturally yield much more information on the behaviour of the partial sums $S_n$ than their weak limit behaviour. The purpose of this paper is to prove such a strong approximation result. More precisely, we will prove the following.

Theorem: Let $X, X_1, X_2, \dots$ be i.i.d. random variables with $P(X = 2^k) = 2^{-k}$ $(k = 1, 2, \dots)$ and let $S_n = \sum_{k \le n} X_k$. Let $Z(t)$ be the Lévy process defined by Equation (5), with $g$ given by Equation (2). Then without changing their distributions, the processes $\{S_n, n \ge 1\}$ and $\{Z(n), n \ge 1\}$ can be defined on a common probability space such that
$$S_n - Z(n) = O\big(n^{5/6+\varepsilon}\big) \quad \text{a.s.} \tag{7}$$
for any $\varepsilon > 0$.
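The scaling relation admits a quick numerical sanity check (not part of the original paper). Since $Z(2)/2 - 1 \stackrel{d}{=} Z(1)$ and $Z(2)$ is the sum of two independent copies of $Z(1)$, the characteristic function $\varphi$ of $G$ must satisfy $\varphi(u) = \varphi(u/2)^2 e^{-iu}$. The sketch below (sample sizes and the block length $2^m$ are arbitrary choices) tests this identity on the empirical characteristic function of $S_{2^m}/2^m - m$, which approximates $G$ by Equation (1):

```python
import numpy as np

rng = np.random.default_rng(1)
m, reps = 11, 8000
n = 2 ** m

# W approximates Z(1), whose law is G, via Martin-Löf's limit theorem (1)
X = 2.0 ** rng.geometric(0.5, size=(reps, n))
W = X.sum(axis=1) / n - m

def phi(u):
    """Empirical characteristic function of W."""
    return np.exp(1j * u * W).mean()

# Z(2)/2 - 1 =d Z(1) together with Z(2) =d Z(1) + Z'(1) forces
# g(u) = 2 g(u/2) - iu, i.e. phi(u) = phi(u/2)^2 * exp(-iu)
for u in (0.5, 1.0):
    print(u, abs(phi(u) - phi(u / 2) ** 2 * np.exp(-1j * u)))
```

The printed discrepancies stay close to 0, combining Monte Carlo error with the merging error of Lemma 2.2; the same functional equation also shows $G_1 = G_2$, since $\exp(2g(t/2) - it) = \exp(g(t))$.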
As in the case of i.i.d. sequences with finite variances, our theorem implies the functional (Donsker type) version of Equation (1), as well as the almost sure central limit theorem in [6]. As the deductions are routine, we omit the details.
Our theorem can be extended to the class of i.i.d. sequences $X, X_1, X_2, \dots$ whose tails satisfy, for some $x_0 > 0$ and all $x \ge x_0$, a semistable tail condition with constants $c_1 \ge 0$, $c_2 \ge 0$, $0 < \alpha < 2$ and a bounded periodic function $\psi$. However, since the proof requires lengthy calculations and no new ideas, we do not give the details here. Note that such i.i.d. sequences belong to the domain of geometric attraction of semistable laws; see Grinevich and Khokhlov [7] for a precise characterization of this class in terms of characteristic functions. Also, as shown by Csörgő and Megyesi [8], for partial sums of i.i.d. sequences belonging to this class, an analogue of the merging relation (3) holds.
It seems likely that the exponent 5/6 in Equation (7) is far from optimal, but since all exponents $< 1$ suffice for applications and we do not know the optimal exponent, we will not investigate this problem here. Finding the optimal remainder term is unsolved even in the case of stable limit distributions. In the case of symmetric $X$, upper bounds for the remainder term in the stable case are given in [9-11], while lower bounds are given in [10]. For example, in [10, p. 339], it is shown that if $X$ is symmetric with tail behaviour given by Equation (8), where $\beta(x) = (\log x)^{-\gamma}$, $\gamma > 0$, then the partial sums $\sum_{k \le n} X_k$ can be approximated by a stable Lévy process $Z(n)$ with an a.s. remainder term whose upper and lower bounds, for small $\gamma$, are very close to each other. Similar results hold for functions $\beta(x)$ tending to 0 more slowly. On the other hand, the proof of the lower bounds in [10] breaks down if $\beta$ in Equation (8) decreases at least polynomially, so no lower bounds are known even in the simplest symmetric cases. In the case of the St. Petersburg variable $X$, it follows from the results of Berkes et al. [12] that the difference $|P(X > x) - P(Y > x)|$ of the tails of $X$ and the limiting semistable variable $Y$ is $O(x^{-(1+\gamma)})$ for some $\gamma > 0$, except near the points of discontinuity $x = 2^k$, and again the method of Berkes and Dehling [10] yields no lower bounds in the invariance principle.

Proof
Let $Y_1, Y_2, \dots$ be i.i.d. random variables with distribution $G$ having characteristic function $\exp(g(t))$, with $g$ defined by Equation (2). Then, letting $Z^*(n) = \sum_{k=1}^n Y_k$, the processes $\{Z(n), n \ge 1\}$ and $\{Z^*(n), n \ge 1\}$ have the same distribution, and thus our theorem states equivalently that the sequences $(X_k)$, $(Y_k)$ can be defined jointly on a suitable probability space such that Equation (9) holds. Our proof will use a modification of the standard blocking technique. Using a remainder term in the merging theorem in [2], the blocking method yields the approximation (9) along a polynomially growing sequence $(t_k)$ of $n$'s. Unfortunately, the fluctuations of the partial sums of $X_n$ and $Y_n$ in the intervals $[t_k, t_{k+1}]$ are too large for extending the approximation (9) to all $n$. However, as we are going to see, the difficulty is caused by a single large term $X_i$ and $Y_j$ within $[t_k, t_{k+1}]$; using a special coupling ensuring that the indices of the maximal terms of the sequences $(X_n)$ and $(Y_n)$ in the blocks $[t_k, t_{k+1}]$ coincide, then removing these terms and using a minimax inequality of Billingsley [13] instead of a standard maximal inequality, resolves the difficulty. This idea was used by Berkes et al. [9] in the context of stable $\mathbb{R}^d$-valued sequences and appears to have many further applications for heavy-tailed sequences.

Lemma 2.1:
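To see why a single large term dominates the fluctuation within a block, one can simulate a block of St. Petersburg variables and measure the share of the largest winning in the block sum. This is an illustrative sketch only; the block length 4096 and the replication count are arbitrary choices, not quantities from the proof:

```python
import numpy as np

rng = np.random.default_rng(3)
m, reps = 4096, 2000                  # block length and number of simulated blocks
X = 2.0 ** rng.geometric(0.5, size=(reps, m))
total = X.sum(axis=1)                 # block sum, typically about m * Log m
biggest = X.max(axis=1)               # the single largest winning in the block

share = biggest / total
print(float(share.mean()), float(share.max()))
```

On average the maximal term carries a non-negligible fraction of the block sum, and in some blocks it dominates it; this is the fluctuation that the coupling of the maximal indices removes.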
For the proof of Equation (10) see Berkes et al. [6]; the proof of the second relation is similar.

Lemma 2.2:
For any $n \ge 1$ we have
$$\pi\left( \mathcal{L}\big(S_n/n - \operatorname{Log} n\big),\ G_{\gamma_n} \right) \le C (\log n)^2 / n \tag{11}$$
for some absolute constant $C > 0$, where $\pi$ denotes the Prohorov distance and $\mathcal{L}(\cdot)$ the distribution of a random variable.

Lemma 2.3:
Let $\tau_1, \dots, \tau_n$ be i.i.d. random variables with continuous distribution, let $L$ be the a.s. unique index with $|\tau_L| = \max_{1 \le j \le n} |\tau_j|$ and let $S$ be a symmetric function of $\tau_1, \dots, \tau_n$. Then $L$ is uniformly distributed on $\{1, 2, \dots, n\}$ and independent of $S$.

Proof: Clearly, the distribution of $L$ is uniform on $\{1, 2, \dots, n\}$; in fact, it is uniform conditionally on any symmetric function of $\tau_1, \dots, \tau_n$, i.e. it is independent of $S$.
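The uniformity and independence asserted for the index $L$ of the maximal $|\tau_j|$ are easy to check by simulation. The sketch below is illustrative only; the normal distribution for the $\tau_j$ is an arbitrary choice, any continuous distribution works:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 8, 20000
tau = rng.standard_normal((reps, n))   # any continuous distribution works here
L = np.abs(tau).argmax(axis=1)         # index of the maximal |tau_j|
S = tau.sum(axis=1)                    # a symmetric function of tau_1, ..., tau_n

# L should be (nearly) uniform on {0, ..., n-1} ...
freq = np.bincount(L, minlength=n) / reps
print(freq)
# ... and carry no information about S: the conditional means of S
# given L stay near E S = 0 for every value of L
print([float(S[L == j].mean()) for j in range(n)])
```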

Proof of the Theorem:
We first enlarge the probability space to carry an i.i.d. sequence $(\zeta_n)$ of standard normal r.v.'s, independent of $(X_n)$. By the LIL for $(\zeta_n)$, it suffices to prove the theorem for the sequence $(X_n^*)$, where $X_n^* = X_n + \zeta_n$. Also, Equation (12) holds with some constant $C$, and thus Lemma 2.1 remains valid, with possibly different constants, for the sequence $(X_k^*)$. Further, Equation (12) implies that Lemma 2.2 also remains valid for $(X_n^*)$, with $(\log n)^2/n$ on the right-hand side of Equation (11) replaced by $(\log n)^2/\sqrt{n}$. Since in the rest of the proof of the theorem we use the properties of the St. Petersburg sequence $(X_n)$ only through Lemmas 2.1 and 2.2, in the sequel we drop the stars and let $X_n$ denote the perturbed version of $X_n$. As a consequence, the $X_n$ have continuous distribution. Let
$$t_k = [k^\rho]$$
for some $\rho > 3$ chosen suitably later, and put $H_k = \{t_k + 1, \dots, t_{k+1}\}$, $n_k = t_{k+1} - t_k$ and
$$\xi_k = \frac{1}{n_k} \sum_{j \in H_k} X_j - \operatorname{Log} n_k.$$
The modified version of Lemma 2.2 implies that the Prohorov distance of the distribution of $\xi_k$ and of $G_{\gamma_{n_k}}$ is $\ll (\log n_k)^2/\sqrt{n_k}$, and since the underlying probability space is atomless (because of the continuity of the distribution of the $X_k$'s), the proof of Theorem 2 of Berkes and Philipp [14] shows that on the same probability space there exists a sequence $\{\eta_k, k \ge 1\}$ of independent random variables such that $\eta_k$ is measurable with respect to $\sigma\{\xi_1, \dots, \xi_k\}$, has distribution $G_{\gamma_{n_k}}$ and satisfies
$$P(|\xi_k - \eta_k| \ge \alpha_k) \le \alpha_k, \qquad \text{where } \alpha_k \ll (\log n_k)^2/\sqrt{n_k}. \tag{14}$$
Here, and in the sequel, $\ll$ means the same as the $O$ notation. Define $L_k$ by $|X_{t_k + L_k}| = \max_{j \in H_k} |X_j|$. Since the $X_k$ have continuous distribution, Lemma 2.3 shows that $L_k$ is defined uniquely with probability 1, has uniform distribution on $(0, n_k] \cap \mathbb{Z}$ and is independent of $\xi_k$, and consequently of the whole sequence $\{\xi_k, k \ge 1\}$.
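Assuming blocks of the polynomial form $t_k = [k^\rho]$ (consistent with the condition $\rho > 3$ above), the block lengths grow like $k^{\rho-1}$, so $\alpha_k \ll (\log n_k)^2/\sqrt{n_k} \ll (\log k)^2 k^{-(\rho-1)/2}$, which is summable precisely when $(\rho-1)/2 > 1$, i.e. $\rho > 3$. A quick numerical sketch of this decay (the value $\rho = 3.5$ is an arbitrary admissible choice):

```python
import math

rho = 3.5                                  # any rho > 3 works; 3.5 is illustrative
t = [int(k ** rho) for k in range(1, 2001)]
n = [t[i + 1] - t[i] for i in range(len(t) - 1)]  # block lengths n_k ~ rho * k^(rho-1)

# alpha_k ~ (log n_k)^2 / sqrt(n_k) ~ (log k)^2 * k^(-(rho-1)/2)
alpha = [math.log(m) ** 2 / math.sqrt(m) for m in n]
print(alpha[9], alpha[99], alpha[999])     # terms decay polynomially in k
```

Doubling $k$ multiplies $\alpha_k$ by roughly $2^{-(\rho-1)/2}$ (up to the slowly varying logarithmic factor), which is what makes $\sum_k \alpha_k < \infty$ and allows the Borel-Cantelli argument below.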
Since the $L_k$ are independent, it follows that $\{L_k, k \ge 1\}$ is independent of $\{\xi_k, k \ge 1\}$, and since $\eta_k$ is measurable with respect to $\sigma\{\xi_1, \dots, \xi_k\}$, the sequence $\{L_k, k \ge 1\}$ is also independent of $\{\eta_k, k \ge 1\}$. Let $\{Y_i, i \ge 1\}$ be a sequence of independent random variables, defined on some probability space, with common characteristic function $\exp(g(u))$. Denote by $L_k^*$ the random variable defined by $|Y_{t_k + L_k^*}| = \max_{j \in H_k} |Y_j|$. Since the distribution of $Y_i$ is continuous (in fact, $Y_i$ has an infinitely differentiable density, see Csörgő [15]), by Lemma 2.3 $L_k^*$ is well defined, has uniform distribution on $(0, n_k] \cap \mathbb{Z}$ and is independent of
$$\eta_k^* = \frac{1}{n_k} \sum_{j \in H_k} Y_j - \operatorname{Log} n_k.$$
As we noted above, $\eta_k^*$ has distribution $G_{\gamma_{n_k}}$, and thus the sequence $\{(\eta_k^*, L_k^*), k \ge 1\}$ has the same distribution as $\{(\eta_k, L_k), k \ge 1\}$. We apply Lemma A1 of Berkes and Philipp [14] to the joint law $F$ of the sequences $\{\xi_i, i \ge 1;\ \eta_k, k \ge 1\}$ and $\{(\eta_k, L_k), k \ge 1\}$, the joint law $G$ of the sequences $\{(\eta_k^*, L_k^*), k \ge 1\}$ and $\{Y_i, i \ge 1\}$, and the spaces $S_1 = \mathbb{R}^\infty \times \mathbb{R}^\infty$, $S_2 = (\mathbb{R} \times \mathbb{N})^\infty$, $S_3 = \mathbb{R}^\infty$. We obtain a joint law $Q$ with marginals $F$ and $G$, which we realize on some probability space. Hence, keeping the same notation, we can set $\eta_k = \eta_k^*$ and $L_k = L_k^*$. In summary, we have redefined the sequences $\{X_i, i \ge 1\}$, $\{\xi_k, k \ge 1\}$ and $\{L_k, k \ge 1\}$, without changing their joint law, on a (possibly) new probability space, together with a sequence $\{Y_i, i \ge 1\}$ of i.i.d. random variables with common characteristic function $\exp(g(u))$, such that Equations (15) and (16) hold; i.e. the location $t_k + L_k$ of $\max_{i \in H_k} |X_i|$ and of $\max_{i \in H_k} |Y_i|$ is the same. Using Equations (14)-(16), and since $\rho > 3$ implies $\sum_{k=1}^\infty \alpha_k < \infty$, the Borel-Cantelli lemma, together with Equation (13), estimates the difference $|\sum_{i \le n} (X_i - Y_i)|$ for all $n$ of the form $n = t_k$. For general $n$, we need the following lemmas.

Lemma 2.4:
With probability 1 we have, for any $\varepsilon > 0$ and sufficiently large $k$,
$$\max_{t_k < n \le t_{k+1}} \min\Big( \big| \bar S_n - \bar S_{t_k} \big|,\ \big| \bar S_{t_{k+1}} - \bar S_n \big| \Big) \le t_k^{1 - 1/(2\rho) + \varepsilon},$$
where $\bar S_n = S_n - a_n$ with $a_n = n \operatorname{Log} n$, and a similar statement holds for the $Y_j$'s.
Proof: Let $a_0 = 0$ and $a_j = j \operatorname{Log} j$ for $j \ge 1$. We claim that
$$P\big( |S_{j-i} - (a_j - a_i)| \ge \lambda \big) \le \frac{18 (j - i) \operatorname{Log} N}{\lambda} \qquad (1 \le i \le j \le N,\ \lambda > 0). \tag{21}$$
Clearly, Equation (21) holds for $\lambda < 18 (j - i) \operatorname{Log} N$, since then the right-hand side exceeds 1. Assume now $\lambda \ge 18 (j - i) \operatorname{Log} N$. Then, observing that $|a_j - a_i| \le 2 (j - i) \operatorname{Log} N$ by the mean value theorem, and trivially $a_{j-i} \le (j - i) \operatorname{Log} N$, we get Equation (21), where in the last step we used Lemma 2.1. Thus, letting $\bar X_k = X_k - (a_k - a_{k-1})$ and $\bar S_n = \sum_{k \le n} \bar X_k = S_n - a_n$, we get by the independence of the $\bar X_j$, for any $1 \le i \le j \le k \le N$ and $\lambda > 0$, the corresponding bound for the increments $\bar S_j - \bar S_i$. Hence, using Theorem 12.1 of Billingsley [13] with $\gamma = 1$, $\alpha = 1$ and $u_j = 18 \operatorname{Log} N$, we get, for any $N \ge 1$ and $\lambda > 0$,
$$P\Big( \max_{1 \le n \le N} \min\big( |\bar S_n|,\ |\bar S_N - \bar S_n| \big) \ge \lambda \Big) \le \frac{C (N \operatorname{Log} N)^2}{\lambda^2} \tag{22}$$
for some absolute constant $C > 0$. Clearly, Equation (22) remains valid if $\bar S_n$ and $\bar S_N$ are replaced by $\bar S_{i+n} - \bar S_i$ and $\bar S_{i+N} - \bar S_i$ for any $i \ge 0$. Hence, choosing $N = n_k$, $\lambda = t_k^{1 - 1/(2\rho) + \varepsilon}$ and using stationarity and the Borel-Cantelli lemma, we get the statement of Lemma 2.4 for the $X_j$'s. The proof for the $Y_j$'s is the same.
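The elementary bound $|a_j - a_i| \le 2(j - i)\operatorname{Log} N$ used above follows from the mean value theorem, since $(x \operatorname{Log} x)' = \operatorname{Log} x + 1/\ln 2 \le 2 \operatorname{Log} N$ on $[1, N]$. It can also be spot-checked directly (an illustrative script; $N = 10^6$ and the number of sampled pairs are arbitrary choices):

```python
import math
import random

def Log(x):
    """Logarithm to base 2, as in the paper."""
    return math.log2(x)

def a(j):
    """Centering sequence a_0 = 0, a_j = j Log j."""
    return j * Log(j) if j > 0 else 0.0

# spot-check |a_j - a_i| <= 2 (j - i) Log N for random pairs 1 <= i <= j <= N
random.seed(0)
N = 10 ** 6
worst = 0.0
for _ in range(10000):
    i = random.randint(1, N)
    j = random.randint(i, N)
    if j > i:
        worst = max(worst, abs(a(j) - a(i)) / ((j - i) * Log(N)))
print(worst)   # stays below 2, in line with the mean value theorem bound
```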

Lemma 2.5:
With probability 1 there exists a $k_0$ such that for all $k \ge k_0$ there is at most one index $j \in H_k$ with $|X_j| > t_k^{1 - 1/(2\rho)}$, and the same holds for the $Y$ process.

Thus we proved
$$\Big| \sum_{t_k < j \le n} (X_j - Y_j) \Big| \le 16\, t_k^{1 - 1/(2\rho) + \varepsilon} \qquad (t_k < n < t_k + L_k)$$
and
$$\Big| \sum_{n < j \le t_{k+1}} (X_j - Y_j) \Big| \le 16\, t_k^{1 - 1/(2\rho) + \varepsilon} \qquad (t_k + L_k \le n < t_{k+1}),$$
completing the proof of Equation (24).