On the Extremal Theory of Continued Fractions

Letting $x=[a_1(x), a_2(x), \ldots]$ denote the continued fraction expansion of an irrational number $x\in (0, 1)$, Khinchin proved that $S_n(x)=\sum_{k=1}^n a_k(x) \sim \frac{1}{\log 2}\, n\log n$ in measure, but not for almost every $x$. Diamond and Vaaler showed that, after removing the largest term from $S_n(x)$, the previous asymptotics hold almost everywhere; this shows the crucial influence of the extreme terms of $S_n(x)$ on the sum.

In this paper we determine, for $d_n\to\infty$ and $d_n/n\to 0$, the precise asymptotics of the sum of the $d_n$ largest terms of $S_n(x)$ and show that the sum of the remaining terms has an asymptotically Gaussian distribution.

(We say that $a_k \sim b_k$ if $\lim_{k\to\infty} a_k/b_k = 1$.) Thus, by the ergodic theorem, we have for any function $F$

$$\lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^N F(a_k(x)) = \sum_{j=1}^\infty F(j)\, \frac{1}{\log 2}\log\Bigl(1+\frac{1}{j(j+2)}\Bigr) \quad \text{a.e.} \qquad (1.1)$$

provided that the series on the right-hand side converges absolutely. The sequence $\{a_k(x), k \ge 1\}$ has remarkable mixing properties. Gauss noted that the distribution of $T^n x = [a_{n+1}(x), a_{n+2}(x), \ldots]$ with respect to the uniform measure on $(0, 1)$ converges to $\mu$ and asked for the speed of convergence. (For a discussion, see [3], pp. 49–50 or [17], p. 552.) Kusmin [19] showed that the convergence speed is $O(e^{-\lambda \sqrt{n}})$, and Lévy [21] improved this to $O(e^{-\lambda n})$. Lévy's result implies that the sequence $\{a_k(x), k \ge 1\}$ is $\psi$-mixing with exponential rate, i.e., for all $A \in \mathcal{F}_1^k$, $B \in \mathcal{F}_{k+n}^\infty$, $k \ge 1$, $n \ge 1$, we have

$$|\mu(A \cap B) - \mu(A)\mu(B)| \le \psi(n)\,\mu(A)\mu(B),$$

where $\psi(n) = C e^{-\lambda n}$ with positive absolute constants $C, \lambda$, and $\mathcal{F}_r^s$ denotes the $\sigma$-field generated by the variables $\{a_k(x), r \le k \le s\}$.
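As a quick numerical check of the ergodic averages above, the following sketch (our own illustration; the function names are not from the paper) computes continued fraction digits of random high-precision rationals exactly, via Euclid's algorithm, and compares the empirical frequency of the digit 1 with the value $\frac{1}{\log 2}\log\bigl(1+\frac{1}{1\cdot 3}\bigr) = \log_2(4/3) \approx 0.4150$ predicted by (1.1) with $F = \mathbf{1}_{\{1\}}$:

```python
import math
import random

def cf_digits(p, q):
    """Continued fraction digits of the rational p/q (0 < p < q),
    computed exactly via Euclid's algorithm."""
    digits = []
    while p:
        a, r = divmod(q, p)
        digits.append(a)
        q, p = p, r
    return digits

# Average the digit statistics over many random rationals p / 2^64.
# By the ergodic theorem, the asymptotic frequency of the digit j is
# (1/log 2) * log(1 + 1/(j(j+2))); for j = 1 this is log2(4/3) ~ 0.4150.
random.seed(7)
all_digits = []
for _ in range(300):
    p = random.randrange(1, 2**64, 2)   # odd, so p/2^64 is in lowest terms
    all_digits.extend(cf_digits(p, 2**64))

freq1 = all_digits.count(1) / len(all_digits)
p1 = math.log(1 + 1/3) / math.log(2)
print(round(freq1, 3), round(p1, 3))
```

With roughly $10^4$ digits in total, the two printed numbers typically agree to about two decimal places.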
Letting $E$ denote expectation with respect to $\mu$, we have $Ea_1 = \infty$, and correspondingly for $F(x) = x$ the right-hand side of (1.1) is $+\infty$. Thus the partial sums $\sum_{k=1}^N a_k(x)$ grow faster than $N$. Lévy [22] proved that

$$\frac{1}{N}\sum_{k=1}^N a_k(x) - \frac{\log N}{\log 2} \overset{d}{\longrightarrow} G, \qquad (1.3)$$

where $\overset{d}{\longrightarrow}$ means convergence in distribution in the probability space $((0, 1), \mathcal{B}, \mu)$, and $G$ is a stable distribution whose characteristic function (1.4) involves the Euler–Mascheroni constant $\kappa = 0.577\ldots$ See also Theorem 2, pp. 159–160 of Heinrich [14], where a remainder term estimate for the convergence in (1.3) is obtained. This implies

$$\frac{1}{N \log N}\sum_{k=1}^N a_k(x) \longrightarrow \frac{1}{\log 2} \quad \text{in measure}, \qquad (1.5)$$

a result obtained earlier by Khinchin [18]. Khinchin also noted that (1.5) cannot hold almost everywhere. Diamond and Vaaler [9] showed that the obstacle to a.e. convergence in (1.5) is the occurrence of one single large term in the sum $\sum_{k=1}^N a_k(x)$ and established an a.e. analogue of (1.5) by excluding the largest summand. Namely, they proved

$$\lim_{N\to\infty} \frac{1}{N \log N}\Bigl(\sum_{k=1}^N a_k(x) - \max_{1\le k\le N} a_k(x)\Bigr) = \frac{1}{\log 2} \quad \text{a.e.}, \qquad (1.6)$$

and for any fixed $d \ge 2$, discarding more terms improves the rate of a.e. convergence in (1.6). An analogous result for the St. Petersburg game was proved by Csörgő and Simons [7]. For further analogies between continued fraction digits and the St. Petersburg game, we refer to Vardi [32]. In view of these facts, it is natural to ask what happens if we discard the $d = d_N$ largest terms from the sum, where

$$d_N \to \infty, \qquad d_N/N \to 0, \qquad (1.7)$$

so that the number of discarded terms is 'large', but is still negligible compared with $N$. The purpose of this paper is to answer this question. Let $S_N^{(d)}(x)$ denote the sum $\sum_{k=1}^N a_k(x)$ after discarding its $d$ largest summands. We will prove the following result. Theorem 1.1 reduces the asymptotic behaviour of $S_N^{(d)}$ to that of $\eta_{d,N}$, the $d$-th largest of $a_1(x), \ldots, a_N(x)$, which is a much simpler problem. We will show in (3.15) that $\eta_{d,N} \sim N/d$ in probability, and since $m(t) \sim (\log 2)^{-1} \log t$ as $t \to \infty$, Theorem 1.1 can be rewritten equivalently as (1.10), where $\zeta_N \overset{d}{\longrightarrow} N(0, 1/\log 2)$. Here and in the sequel, $\overset{P}{\longrightarrow}$ will denote convergence in probability and $o_P(1)$ a quantity converging to 0 in probability. Relation (1.10) shows that $N m(\eta_{d,N})$ is the main term in an asymptotic expansion of $S_N^{(d)}$.
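The role of the single largest summand in (1.6) can be seen in a small simulation. The sketch below (our own illustration) uses i.i.d. copies of the Gauss–Kuzmin digit distribution $\mu(a_1 = j) = \log_2(1 + 1/(j(j+2)))$, ignoring the weak dependence of the actual digits; the sampler inverts the telescoping distribution function $F(j) = 1 - \log_2\frac{j+2}{j+1}$:

```python
import math
import random

def gauss_digit(u):
    """Sample j with P(j) = log2(1 + 1/(j(j+2))) by inverting the CDF
    F(j) = 1 - log2((j+2)/(j+1)):  F(j) >= u  iff  j >= 1/(2^(1-u)-1) - 1."""
    v = 2.0 ** (1.0 - u) - 1.0
    return max(1, math.ceil(1.0 / v - 1.0))

random.seed(3)
N = 5000
ratios = []
for _ in range(100):
    a = [gauss_digit(random.random()) for _ in range(N)]
    trimmed = sum(a) - max(a)            # discard the single largest term
    ratios.append(trimmed / (N * math.log(N)))

ratios.sort()
median = ratios[50]
print(round(median, 2), round(1 / math.log(2), 2))   # limit is 1/log 2 ~ 1.44
```

Even at $N = 5000$ the median of the normalized trimmed sums is already reasonably close to the limit $1/\log 2 \approx 1.44$, while the untrimmed sums fluctuate heavily.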
As a comparison, write Lévy's limit theorem (1.3) in the form

$$S_N = \frac{1}{\log 2}\, N \log N + N\, \zeta_N^*, \qquad (1.11)$$

where $\zeta_N^*$ converges in distribution to the Cauchy variable with characteristic function (1.4). In addition to the change in the order of magnitude of $S_N$ caused by removing the $d$ largest terms, note that the Cauchy fluctuations of $S_N$ around $\frac{1}{\log 2} N \log N$ described by (1.11) become Gaussian after the trimming. If $\log d/\log N \to 0$, then $m(\eta_{d,N}) \sim (\log 2)^{-1} \log N$; thus, in this case, the order of magnitude of $S_N^{(d)}$ is the same as that of the complete sum $S_N$, i.e., the contribution of the $d$ largest terms of $S_N$ is still negligible compared with the whole sum. If $d \sim N^\gamma$ for some $0 < \gamma < 1$, then $S_N^{(d)} \sim \frac{1-\gamma}{\log 2}\, N \log N$ in probability. We thus see that the removal of a small portion of the extreme elements of $S_N$ changes the asymptotic order of magnitude of the sum; hence the role of the large elements in $S_N$ is very substantial. In the case of i.i.d. variables in the domain of attraction of a stable law with parameter $0 < \alpha < 2$, the effect of the extremal terms on the partial sums is well known. For positive variables, Darling [8] showed (see also Arov and Bobrov [1]) that, under some additional regularity assumptions, the ratio of the sum and its largest term has a nondegenerate limit distribution if $0 < \alpha < 1$, and this holds also for $1 < \alpha < 2$ provided we center the partial sum at its mean. The case $\alpha = 1$ is critical and is not covered in [1,8]. The sequence $\{a_k(x), k \ge 1\}$ in the continued fraction expansion corresponds to this case, except that the variables $a_k$ are weakly dependent. Theorem 1.1 and its corollaries above show that the contribution of the $d$ largest terms of $S_N$ is negligible (in probability) compared with the total sum $S_N$ if and only if $\log d/\log N \to 0$. In particular, this holds for $d = 1$, i.e., in the case of the largest term. In the i.i.d. case, Csörgő et al.
[6] also showed that, removing the $d$ largest and $d$ smallest elements from the partial sum, where (1.7) holds, the remaining sum satisfies a central limit theorem. There is a large literature on the metric properties of continued fractions, and using the exponential $\psi$-mixing property of the transformation $T$ above, many classical limit theorems for partial sums of independent random variables have been extended to continued fractions. We refer to Doeblin [10], Gordin and Reznik [13], Ibragimov [15], Iosifescu [16,17], Philipp [23,25], Philipp and Stackelberg [26], Samur [27,28], Stackelberg [29], Szewczak [30] and the references therein. Using the extremal theory of dependent processes (see, e.g., Leadbetter and Rootzén [20]), asymptotic properties of the (individual) extremes of $(a_1(x), \ldots, a_n(x))$ can be established; limit theorems for the largest digit were obtained by Galambos [11,12] and Philipp [24]. Note that an analogue of Theorem 1.1 for a different, less natural trimming of the partial quotients $a_j$ was obtained in Philipp [25].
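The change of magnitude under trimming at level $d \sim N^\gamma$, discussed above, can likewise be checked numerically. The sketch below (again our own i.i.d. toy model of the digit distribution, with the same sampler and the same caveat about the ignored weak dependence; convergence is logarithmic, so the agreement is only rough) compares $S_N^{(d)}/(N\log N)$ for $\gamma = 1/2$ with the constant $(1-\gamma)/\log 2$:

```python
import math
import random

def gauss_digit(u):
    """Sample j with P(j) = log2(1 + 1/(j(j+2))) via the inverse CDF."""
    v = 2.0 ** (1.0 - u) - 1.0
    return max(1, math.ceil(1.0 / v - 1.0))

random.seed(11)
N, gamma = 20000, 0.5
d = int(N ** gamma)                      # trim the d ~ N^gamma largest terms
a = sorted(gauss_digit(random.random()) for _ in range(N))
trimmed = sum(a[:N - d])                 # S_N^{(d)}: drop the d largest terms
ratio = trimmed / (N * math.log(N))
predicted = (1 - gamma) / math.log(2)    # ~ 0.72
print(round(ratio, 2), round(predicted, 2))
```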
In Sect. 2, we will prove Theorem 1.1 in a probabilistic form and we will change the notation accordingly.
where W is the Wiener process.
If the values $X_1, \ldots, X_n$ are all different, the sum of the terms not exceeding $\eta_{d,n}$ is obtained from $\sum_{i=1}^n X_i$ by removing the $d-1$ largest terms, and thus the conclusion of Theorem 1.2 for $t = 1$ reduces to that of Theorem 1.1. However, for integer-valued variables $X_n$, the value $\eta_{d,n}$ can appear in the sequence $(X_1, \ldots, X_n)$ more than once, and in this case the number of terms of the sum $\sum_{i=1}^{[nt]} X_i$ exceeding $\eta_{d,n}$ can be smaller than $d-1$ and can actually be random. Thus, in a formal sense, Theorem 1.1 is not a special case of Theorem 1.2. However, using a simple perturbation argument, Theorem 1.1 will be deduced from Theorem 1.2.
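The tie phenomenon just described is easy to observe. In the sketch below (our own illustration, using the same toy digit sampler as before), $\eta_{d,n}$ is the $d$-th largest of $n$ integer-valued samples; the number of terms strictly exceeding $\eta_{d,n}$ is always at most $d-1$, but it can be strictly smaller, by a random amount, when the value $\eta_{d,n}$ occurs with multiplicity:

```python
import math
import random

def gauss_digit(u):
    """Sample j with P(j) = log2(1 + 1/(j(j+2))) via the inverse CDF."""
    v = 2.0 ** (1.0 - u) - 1.0
    return max(1, math.ceil(1.0 / v - 1.0))

random.seed(5)
n, d = 1000, 50
x = [gauss_digit(random.random()) for _ in range(n)]
eta = sorted(x, reverse=True)[d - 1]     # eta_{d,n}: the d-th largest value
above = sum(1 for v in x if v > eta)     # terms strictly exceeding eta_{d,n}
ties = x.count(eta)                      # multiplicity of the threshold value
print(above, ties)                       # above <= d-1 and above + ties >= d
```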

Let
We will derive Theorem 1.2 from the following two-dimensional limit theorem.

Theorem 1.3 Under the assumptions of Theorem 1.2, we have
where {W (t, s), t ≥ 0, s ≥ 0} is a two-parameter Wiener process.
As we already noted, under (1.7) we have $\eta_{d,n} \sim n/d$ in probability. Since the limit process $W(t, s)$ in (1.15) has continuous trajectories a.s., Theorem 1.3 and Billingsley [4], pp. 144–145, imply exactly the functional central limit theorem (1.14).
In conclusion, we note that Theorems 1.2 and 1.3 remain valid if a suitable polynomial $\psi$-mixing rate is assumed instead of the exponential rate. However, as this requires extensive changes in the arguments, and we do not know of any practically interesting examples of $\psi$-mixing sequences with polynomial rate, we omit the details.

Some Lemmas
In the rest of the paper, $(X_k)$ denotes a sequence of random variables satisfying the conditions of Theorem 1.2, and $d = d_n$ denotes a sequence of positive integers satisfying (1.7). Moreover, $c_0$ denotes the constant in (1.12). Our first lemma, a tightness criterion for a two-parameter process $Y(s, t)$, is a special case of a general tightness condition in Bickel and Wichura [2].
Then $XY$ is also integrable and

$$|E(XY) - EX\, EY| \le \psi(n)\, E|X|\, E|Y|.$$

This follows from Theorem 3.10 in Bradley [5], p. 75.

Lemma 2.3
Let $\mathcal{G}_k$ denote the $\sigma$-field generated by $X_k$, let $n_1 < \cdots < n_r$ be positive integers, and let $Y_1, \ldots, Y_r$ be bounded r.v.'s such that $Y_j$ is $\mathcal{G}_{n_j}$-measurable ($j = 1, 2, \ldots, r$). Then

Proof This is immediate by induction, upon observing that by the previous lemma we have, for any $1 \le j \le r - 1$

Moreover, two further estimates hold: one for any fixed $0 \le s_1 < s_2$, and one for any fixed $0 < s_1 < s_2$ and sufficiently large $n$. Here $C_1$, $C_2$ are positive constants depending only on the sequence $(X_k)$.
This is immediate from (1.12).
To prove (2.6), consider a generic term on the left-hand side of (2.6). Fix $r \ge 0$ and sum those covariances in (2.7) whose index gap equals $r$. The following central limit theorem for $\phi$-mixing sequences is due to Utev [31].

Proof of Theorem 1.3
Put, for $M \ge 1$, $J \ge 1$, real coefficients $\mu_{m,j}$, $0 = s_0 < s_1 < s_2 < \cdots < s_J < \infty$ and $0 = t_0 < t_1 < \cdots < t_M = 1$. Clearly, $Z$ is a normal random variable with mean zero, and thus relation (3.2) is equivalent to (3.4). Since the terms of the sum in (3.4) are random variables with disjoint support, relation (2.3) of Lemma 2.4 applies. Further, Lemma 2.5 gives the asymptotics of the variances as $n \to \infty$, and by (3.4) and the second relation of (2.2) the summands are suitably bounded. We now apply Lemma 2.6 to the triangular array (3.9). Since $\{X_j, j \ge 1\}$ is $\psi$-mixing with exponential rate, the array (3.9) satisfies the $\phi$-mixing condition (2.8), and (3.11) holds for each pair of superscripts $(i, j) \in \{(1, 2), (2, 1)\}$ with some constant $C^* > 0$. Moreover, since $U_n(t, s)$ is constant on the intervals $k/n \le t < (k+1)/n$, by the last statement of Lemma 2.1 we may assume that $nt$, $nt_1$ and $nt_2$ are all integers. To prove (3.11), we introduce some further notation.
Using Lemmas 2.3 and 2.5 and (1.13), we get the required estimates for $n \ge n_0$.
The expression in the third line of (3.12) equals the sum of all expressions

$$A_n^{-4}\, E\bigl(Y_i^{(1)} Y_j^{(1)} Y_k^{(2)} Y_\ell^{(2)}\bigr), \qquad (3.13)$$

where $nt_1 + 1 \le i, j, k, \ell \le nt$. The following facts, in which $\ll$ means the same as the $O$ notation with an implied constant depending on the sequence $(X_n)$, can be verified by elementary calculations using Lemmas 2.2–2.4. We prove relation (d); the proofs of (a), (b) and (c) are similar (and simpler).
Expanding $E\bigl(Y_i^{(1)} Y_i^{(2)}\bigr)^2$ yields nine terms. Clearly, $X_i^{(1)}$ and $X_i^{(2)}$ are supported on different sets, and thus $X_i^{(1)} X_i^{(2)} = 0$; hence, among the nine terms, the first, second, fourth and fifth are equal to 0. Also, the second and third statements of Lemma 2.4 imply, in view of $1/2 \le s_1 < s < s_2 \le 3/2$, the required moment bounds for $n \ge n_0$. This shows that the remaining five terms of the nine-term sum are $\ll (n/d)(s - s_1)(s_2 - s)$, proving statement (d) above. Statements (a), (b) and (c) can be proved similarly.
We can now estimate the expressions in (3.13). We will distinguish four cases, according as $i, j, k, \ell$ are all different, or the number of different values among them is 1, 2 or 3. Consider first the case when $i, j, k, \ell$ are all different, say $i < j < k < \ell$; splitting the four-term product in (3.13) after the second factor and using that $EX = 0$, we get that the absolute value of the expression (3.13) is bounded by $C\psi(r)$ times the product of the corresponding absolute moments, where $r = k - j$ and where we used Lemma 2.3 to estimate $E|Y|$ and relation (a) above. Here, and in the rest of the tightness proof, $C$ denotes (possibly different) constants depending only on the sequence $(X_n)$. Arguing similarly, but splitting the four-term product in (3.13) after the third factor, we get the same bound, except that $\psi(r)$ gets replaced by $\psi(r')$, where $r' = \ell - k$. Thus, the absolute value of the expression in (3.13) is at most the minimum of the two bounds, hence at most their geometric mean, which involves $\psi(r)^{1/2}\psi(r')^{1/2}$. Fixing the pair $(i, \ell)$ and summing over $(j, k)$ amounts to summing over $(r, r')$, and since $\sum_{n=1}^\infty \psi(n)^{1/2} < \infty$ and the pair $(i, \ell)$ can be chosen in at most $(nt - nt_1)^2$ different ways, it follows that the contribution of all terms (3.13) with $i < j < k < \ell$ is at most $C\mu(B_{11})\mu(B_{12})$, using (1.13) and $d/n \to 0$. The contribution of the terms (3.13) in which $i, j, k, \ell$ are all different but ordered differently can be estimated similarly.
Next, we consider the case when $i = j = k = \ell$. The expression (3.13) then becomes $A_n^{-4} E\bigl(Y_i^{(1)} Y_i^{(2)}\bigr)^2$, which by the estimate in (d) above is at most $C A_n^{-4}(n/d)(s - s_1)(s_2 - s)$. Since the number of choices for $i$ is $nt - nt_1 \le (nt - nt_1)^2$, the contribution of all such expressions is bounded by $C\mu(B_{11})\mu(B_{12})$, using again (1.13) and $d/n \to 0$. Assume now that among $i, j, k, \ell$ there are exactly two different values, i.e., these numbers form two equal pairs, or three of them are equal and the fourth one is different. Starting with the case of two pairs, assume, e.g., that $i = j$ and $k = \ell$, but $i \ne k$. In this case, the expression (3.13) becomes $A_n^{-4} E\bigl(Y_i^{(1)}\bigr)^2 \bigl(Y_k^{(2)}\bigr)^2$, which, in view of Lemma 2.3 and the estimate in (b) above, is at most $C A_n^{-4}(n/d)^2(s - s_1)(s_2 - s)$. Since the number of choices for the pair $(i, k)$ is at most $(nt - nt_1)^2$, using (1.13) it follows that the total contribution of all such terms (3.13) is at most $C\mu(B_{11})\mu(B_{12})$. If $i = k$, $j = \ell$ and $i \ne j$, then the expression (3.13) becomes $A_n^{-4} E\, Y_i^{(1)} Y_i^{(2)} Y_j^{(1)} Y_j^{(2)}$, which by Lemma 2.3 and the estimate in (a) above is bounded accordingly. Since the number of pairs $(i, j)$ is $\le (nt - nt_1)^2$, the contribution of such terms is at most $C\mu(B_{11})\mu(B_{12})$. Assume now that among the indices $i, j, k, \ell$ three are equal and the fourth one is different. Letting, e.g., $i = j = k$ and $\ell \ne i$, the expression (3.13) becomes $A_n^{-4} E\bigl(Y_i^{(1)}\bigr)^2 Y_i^{(2)} Y_\ell^{(2)}$, which, by Lemma 2.3 and the estimates (a) and (c) above, is bounded accordingly. Since the number of pairs $(i, \ell)$ is $\le (nt - nt_1)^2$, the total contribution of such terms is at most $C\mu(B_{11})\mu(B_{12})$. Finally, if the number of different indices among $i, j, k, \ell$ is 3, e.g., if $i = j < k < \ell$, then the expression (3.13) becomes $A_n^{-4} E\bigl(Y_i^{(1)}\bigr)^2 Y_k^{(2)} Y_\ell^{(2)}$, which, using $EY^{(2)} = 0$, Lemma 2.2, Lemma 2.3 and the estimates (a) and (b) above, can be estimated by $C A_n^{-4}\psi(r)(n/d)(s - s_1)(s_2 - s)^2$, where $r = \ell - k$. Since for fixed $r$ the number of triples $(i, k, \ell)$ with $\ell - k = r$ is at most $(nt - nt_1)^2$, the contribution of such terms (3.13) is at most $C A_n^{-4}\psi(r)(n/d)(s - s_1)(s_2 - s)^2 (nt - nt_1)^2 \le C\psi(r)(s - s_1)(s_2 - s)(t - t_1)^2$, and summing over $r$ we get again a bound $\le C\mu(B_{11})\mu(B_{12})$.
The other cases (e.g., $i < j = k < \ell$) can be treated similarly, and the proof of tightness in Theorem 1.3 is complete. This also completes the proof of the theorem. We now prove, as claimed after Theorem 1.3, that

$$\eta_{d,n} \sim n/d \quad \text{in probability}. \qquad (3.15)$$

For fixed $t > 1$ and $n$ large, with probability tending to 1 the number of $X_k$'s, $1 \le k \le n$, exceeding $tn/d$ is smaller than $d$, and thus $\eta_{d,n} \le tn/d$. Similarly, for $t < 1$ and $n$ large, with probability tending to 1 we have $\eta_{d,n} \ge tn/d$; thus (3.15) is proved.
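Relation (3.15) is easy to check by simulation. The sketch below (our own illustration) uses i.i.d. variables $X_i = 1/U_i$ with $U_i$ uniform on $(0, 1)$, so that $P(X_i > t) = 1/t$ for $t \ge 1$ (tail constant 1); the $d$-th largest of $n$ such variables then concentrates around $n/d$:

```python
import random

random.seed(13)
n, d = 100_000, 400
x = [1.0 / random.random() for _ in range(n)]   # P(X > t) = 1/t for t >= 1
eta = sorted(x, reverse=True)[d - 1]            # eta_{d,n}: the d-th largest
print(round(d * eta / n, 2))                    # should be close to 1
```

The relative fluctuation of $d\,\eta_{d,n}/n$ around 1 is of order $1/\sqrt{d}$, matching the binomial count argument in the proof above.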
Proof of Remark 1.1 Let $(X_n)$ be a sequence satisfying the assumptions of Theorem 1.2, and put $\widetilde{X}_n = X_n + 4^{-n}$. Letting $\widetilde{\eta}_{d,n}$ denote the $d$-th largest of $\widetilde{X}_1, \ldots, \widetilde{X}_n$, and $S_n^{(r)}$ and $\widetilde{S}_n^{(r)}$ denote the sums $\sum_{k=1}^n X_k$ and $\sum_{k=1}^n \widetilde{X}_k$ after removing their $r$ largest terms, it is easily seen that

$$\bigl|\widetilde{S}_n^{(r)} - S_n^{(r)}\bigr| \le 2 \quad \text{for any } r \ge 1 \qquad (3.16)$$

and

$$n\,\bigl|m(\widetilde{\eta}_{d,n}) - m(\eta_{d,n})\bigr| = O_P(1). \qquad (3.17)$$

Clearly, relation (1.12) will fail for the perturbed sequence $(\widetilde{X}_n)$, but, as inspection shows, all the lemmas in the proof of Theorem 1.2 and the subsequent arguments remain valid, so conclusion (1.14) of the theorem remains valid if we replace $X_i$ by $\widetilde{X}_i$ and $\eta_{d,n}$ by $\widetilde{\eta}_{d,n}$. Since the $X_n$ are integer valued, with probability one all the $\widetilde{X}_j$, $j = 1, 2, \ldots$, are different, and thus the sum of the $\widetilde{X}_j$'s, $1 \le j \le n$, not exceeding $\widetilde{\eta}_{d,n}$ equals $\widetilde{S}$