Convergence of series of dilated functions and spectral norms of GCD matrices

We establish a connection between the $L^2$ norm of sums of dilated functions whose $j$th Fourier coefficients are $\mathcal{O}(j^{-\alpha})$ for some $\alpha \in (1/2,1)$, and the spectral norms of certain greatest common divisor (GCD) matrices. Utilizing recent bounds for these spectral norms, we obtain sharp conditions for the convergence in $L^2$ and for the almost everywhere convergence of series of dilated functions.


INTRODUCTION
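Carleson's theorem [11] states that the series
$$\sum_{k=1}^{\infty} c_k \cos 2\pi k x \qquad \text{and} \qquad \sum_{k=1}^{\infty} c_k \sin 2\pi k x \tag{1}$$
are convergent for almost every $x \in [0,1]$ provided that the sequence of coefficients $(c_k)_{k\ge1}$ (assumed to be real) satisfies
$$\sum_{k=1}^{\infty} c_k^2 < \infty. \tag{2}$$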
By orthogonality, condition (2) is also necessary and sufficient for the $L^2$ norm convergence of the two series in (1). A much studied problem is what happens with convergence in either sense if the functions $\sin 2\pi x$ and $\cos 2\pi x$ are replaced by more general periodic functions. More precisely, the question is what we can say about the convergence of the series
$$\sum_{k=1}^{\infty} c_k f(kx) \tag{3}$$
when $f: \mathbb{R} \to \mathbb{R}$ is a measurable function satisfying
$$f(x+1) = f(x), \qquad \int_0^1 f(x)\,dx = 0, \qquad \int_0^1 f^2(x)\,dx < \infty. \tag{4}$$
In general, (2) will not be a sufficient condition either for convergence in $L^2$ or for almost everywhere convergence of (3), and the problem is to find alternative conditions on the coefficients $(c_k)_{k\ge1}$ when $f$ belongs to a prescribed class of functions. For a survey of existing results in this direction and recent results we refer to [2,6]. For a recent survey on Carleson's theorem, see [24].

2010 Mathematics Subject Classification. 42A16, 42A20, 42A61, 42B05, 11A05, 15A18, 26A45. The first author is supported by a Schrödinger scholarship of the Austrian Research Foundation (FWF). The second author is supported by FWF Grant P 24302-N18 and OTKA grant K 108615. The third author is supported by the Research Council of Norway grant 227768. This paper was initiated while three of the authors (Berkes, Seip, Weber) participated in the research program Operator Related Function Theory and Time-Frequency Analysis at the Centre for Advanced Study at the Norwegian Academy of Science and Letters in Oslo during 2012-2013.
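In this paper, we will be interested in the case when $f$ belongs to the class $C_\alpha$ for $\alpha > 1/2$, i.e. when the Fourier series of $f$ is of the form
$$f(x) \sim \sum_{j=1}^{\infty} \left(a_j \cos 2\pi j x + b_j \sin 2\pi j x\right), \qquad a_j = \mathcal{O}(j^{-\alpha}), \quad b_j = \mathcal{O}(j^{-\alpha})$$
as $j \to \infty$.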
The important limiting case $\alpha = 1$ is essentially covered by the results of [2] (see Section 3 for details). We will now extend the methods of [2] to cover also the range $1/2 < \alpha < 1$ and will give sharp conditions for the $L^2$ convergence and the almost everywhere convergence of (3) as well as of the related series
$$\sum_{k=1}^{\infty} c_k f(n_k x), \tag{5}$$
where $(n_k)_{k\ge1}$ is a sequence of distinct positive integers.
Problems concerning the convergence of (3) or (5) can be traced back to Riemann's Habilitationsschrift (1852). They exhibit profound interrelations between various parts of analysis and number theory, as illustrated by the following list of important contributions: classical formulas of Franel and Landau connecting the convergence theory of (3) and (5) to sums of greatest common divisors (GCD sums); their generalization to the Hurwitz zeta function due to Mikolás; the work of Koksma, Erdős, Gál, LeVeque, and others in Diophantine approximation and uniform distribution theory; the results of Dyer and Harman in the context of the Duffin-Schaeffer conjecture in metric Diophantine approximation; upper and lower bounds for GCD sums obtained by the authors of the present paper; and problems concerning the magnitude of the largest eigenvalue of GCD matrices, which were studied by Wintner, by Lindqvist and Seip (in the context of questions about Riesz bases), and by Hilberdink (in the context of the Riemann zeta function). Basic work on the convergence and divergence of dilated series and their relation to lacunary series was done by Gaposhkin, Nikishin, Philipp, and Kaufman, just to mention a few.
In view of this multitude of connections, we have found it appropriate to give a fairly detailed presentation of those ideas and lines of research that are most relevant for our particular problem. To this end, following the statement of our three main theorems in the next section, Section 3 gives an extensive survey of relevant background material. Section 4 contains auxiliary results, and the proofs are given in Section 5.

RESULTS
Throughout this paper we write $K, \tilde K, K_1, K_2, \ldots$ for appropriate positive constants, not always the same, which depend at most on $\alpha$ and $f$. We use the Vinogradov symbols $\ll$ and $\gg$ in the same sense, i.e. with implied constants of this type. Throughout this paper, we assume that $(c_k)_{1\le k\le N}$ and $(c_k)_{k\ge1}$ denote sequences of real numbers and that $(n_k)_{1\le k\le N}$ and $(n_k)_{k\ge1}$ denote sequences of distinct positive integers. For notational convenience, we read $\log x$ as $\max\{1, \log x\}$; in particular, this implies that iterated logarithms are defined and non-zero.
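Theorem 1. Assume that $f \in C_\alpha$ for some $\alpha \in (1/2,1)$. Then the series (3) is convergent in $L^2$ norm and almost everywhere if
$$\sum_{k=1}^{\infty} c_k^2 \exp\left(K \frac{(\log k)^{1-\alpha}}{\log\log k}\right) < \infty. \tag{6}$$
Conversely, for every $\alpha \in (1/2,1)$ there exist a function $f \in C_\alpha$, a sequence $(c_k)_{k\ge1}$, and a constant $\tilde K = \tilde K(\alpha)$ such that (6) holds with $K$ replaced by $\tilde K$, but the series (3) is not convergent in $L^2$ norm.

Theorem 2. Assume that $f \in C_\alpha$ for some $\alpha \in (1/2,1)$. Then the series (5) is convergent in $L^2$ norm and almost everywhere if
$$\sum_{k=1}^{\infty} c_k^2 \exp\left(K \frac{(\log k)^{1-\alpha}}{(\log\log k)^{\alpha}}\right) < \infty. \tag{7}$$
Conversely, for every $\alpha \in (1/2,1)$ there exist a function $f \in C_\alpha$, a sequence $(c_k)_{k\ge1}$, a sequence $(n_k)_{k\ge1}$, and a constant $\tilde K = \tilde K(\alpha)$ such that (7) holds with $K$ replaced by $\tilde K$, but the series (5) is not convergent in $L^2$ norm and is divergent almost everywhere.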
Theorem 1 improves results of Brémont [10], who proved that (3) is convergent in $L^2$ norm and almost everywhere under a more restrictive growth condition on the coefficients $(c_k)_{k\ge1}$. Brémont also proved that there exists a sequence $(c_k)_{k\ge1}$ satisfying (2) such that the series (3) does not converge in $L^2$ norm and is almost everywhere divergent.
As the second part of Theorem 2 shows, condition (7) is optimal both for convergence in $L^2$ and for almost everywhere convergence, except for the precise value of the constant, thus providing a nearly complete solution of the problem of norm convergence and almost everywhere convergence of series of the form (5). In Theorem 1, we claim the optimality of condition (6) only for the norm convergence of (3); we do not know whether (6) is optimal also for almost everywhere convergence. However, we know that, in general, condition (2) is not sufficient for the almost everywhere convergence of the series (3). This follows from our proof of the optimality of the convergence condition in Theorem 2 for almost everywhere convergence of (5).
In fact, for the proof of the optimality of Theorem 2 for given $\alpha \in (1/2,1)$ and an appropriate function $f \in C_\alpha$, we construct sequences $(c_k)_{k\ge1}$ and $(n_k)_{k\ge1}$ such that condition (7) holds for a certain value of $K$, but the series (5) is almost everywhere divergent. The proof reveals that $n_k$ is of asymptotic order at most $R^{k\log k}$ for some constant $R = R(\alpha)$. Consequently, setting $d_n = c_k$ when $n = n_k$ and $d_n = 0$ otherwise, we see that $\sum_{n=1}^{\infty} d_n f(nx)$ is divergent almost everywhere, but
$$\sum_{n=1}^{\infty} d_n^2 \exp\left(\tilde K \frac{(\log\log n)^{1-\alpha}}{(\log\log\log n)^{\alpha}}\right) < \infty$$
for some (sufficiently small) positive constant $\tilde K$. Hence, in the condition for almost everywhere convergence in Theorem 1, a Weyl factor of order at least $\exp\big(\tilde K (\log\log k)^{1-\alpha}/(\log\log\log k)^{\alpha}\big)$ is necessary. This leaves a rather large gap in comparison to the Weyl factor in (6).
As noted, Theorem 1 gives an optimal condition for the problem of $L^2$ convergence of series of the form (3). More precisely, this statement is true as long as one requires the Weyl multiplier to be a "simple", slowly varying function. On the other hand, the situation is totally different if one allows the Weyl multiplier $\psi(k)$ to depend on number-theoretic properties of $k$ and to fluctuate strongly as $k$ increases. In this sense, Theorem 1 may be said to conceal the arithmetical nature of our problem. To state the next result, we introduce the divisor function
$$\sigma_s(k) = \sum_{d \mid k} d^{s}, \qquad s \in \mathbb{R}.$$
We will prove the following result.
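Theorem 3. Assume that $f \in C_\alpha$ for some $\alpha \in (1/2,1)$. Assume also that
$$\sum_{k=1}^{\infty} c_k^2\, \sigma_{1-2\alpha+\varepsilon}(k) < \infty \tag{8}$$
for some $\varepsilon > 0$. Then the series (3) is convergent in $L^2$ norm. On the other hand, for every $\alpha \in (1/2,1)$ and every $0 < \beta < 1$ there exist a function $f \in C_\alpha$ and a real sequence $(c_k)_{k\ge1}$ such that
$$\sum_{k=1}^{\infty} c_k^2 \big(\sigma_{-\alpha}(k)\big)^{\beta} < \infty, \tag{9}$$
but the series (3) is not convergent in $L^2$ norm.

In Berkes and Weber [5] it is proved that condition (10) implies the convergence in $L^2$ norm and the almost everywhere convergence of (3). Despite the similarity of (8) and (10), there is a crucial difference between the corresponding convergence statements. Clearly, for every $s > 0$ we have
$$\lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{N}\sigma_{-s}(k) = \lim_{N\to\infty}\sum_{d=1}^{N} d^{-s}\,\frac{\lfloor N/d\rfloor}{N} = \sum_{d=1}^{\infty} d^{-1-s},$$
showing that the average value of the function $\sigma_{-s}(k)$ is $\sum_{d=1}^{\infty} d^{-1-s} < \infty$. This implies that, given any function $\omega(k) \to \infty$, the asymptotic density of the set $\{k : \sigma_{-s}(k) \le \omega(k)\}$ is 1. Thus for $\alpha > 1/2$ and sufficiently small $\varepsilon > 0$ (so that $s = 2\alpha - 1 - \varepsilon > 0$), the Weyl factor $\sigma_{1-2\alpha+\varepsilon}(k)$ in (8) is of order $\mathcal{O}(\omega(k))$ for "most" $k$. Hence, despite the optimality of the condition in Theorem 1, for most $k$ the much smaller Weyl factor $\omega(k)$ suffices for the norm convergence of $\sum_{k=1}^{\infty} c_k f(kx)$. This effect will be apparent from the proofs of the divergence results in Theorems 1-3. The construction of $(c_k)_{k\ge1}$ and $(n_k)_{k\ge1}$ in the examples of divergence uses, roughly speaking, the eigenvectors of suitable GCD matrices belonging to the maximal eigenvalue, which, as is seen from [2] and [16], are concentrated on indices $k$ with many small prime factors. These are also the indices $k$ where the divisor functions $\sigma_{-s}(k)$ are large: as Gronwall [15] showed, for $0 < s < 1$
$$\sigma_{-s}(k) \le \exp\left((1+o(1))\,\frac{(\log k)^{1-s}}{(1-s)\log\log k}\right), \qquad k \to \infty, \tag{11}$$
and $\sigma_{-s}(k)$ reaches the order of magnitude on the right-hand side along the sequence $k_r = p_1 \cdots p_r$, $r = 1, 2, \ldots$, where $(p_r)_{r\ge1}$ is the sequence of primes. There is a gap between (8) and (9), and the problem of finding the optimal arithmetic function required for the $L^2$ norm convergence of (3) remains open.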
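The averaging argument above swaps the order of summation: each $d \le N$ divides exactly $\lfloor N/d \rfloor$ integers $k \le N$. The following minimal Python sketch (the helper names `sigma`, `mean_sigma` and `primes_upto` are ours, chosen for illustration) checks this identity numerically and evaluates $\sigma_{-s}$ along the primorials $p_1 \cdots p_r$ via multiplicativity, which is the sequence on which Gronwall's bound is attained. The values $s = 1/2$, $N = 2000$ and the prime cutoff are arbitrary choices.

```python
import math

def sigma(s, k):
    """sigma_s(k) = sum of d**s over all divisors d of k (trial division)."""
    total = 0.0
    d = 1
    while d * d <= k:
        if k % d == 0:
            total += d ** s
            if d != k // d:
                total += (k // d) ** s
        d += 1
    return total

def mean_sigma(s, n):
    """(1/n) * sum_{k<=n} sigma_s(k), after swapping the order of summation:
    each d <= n divides exactly n // d of the integers 1..n."""
    return sum(d ** s * (n // d) for d in range(1, n + 1)) / n

def primes_upto(m):
    """Sieve of Eratosthenes."""
    is_prime = [True] * (m + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(m) + 1):
        if is_prime[p]:
            for q in range(p * p, m + 1, p):
                is_prime[q] = False
    return [p for p in range(m + 1) if is_prime[p]]

s, n = 0.5, 2000
direct = sum(sigma(-s, k) for k in range(1, n + 1)) / n
print(direct, mean_sigma(-s, n))                    # agree up to rounding
print(sum(d ** (-1 - s) for d in range(1, n + 1)))   # partial sum of sum_d d^(-1-s)

# Along primorials k_r = p_1 ... p_r, sigma_{-s} is multiplicative, so
# log sigma_{-s}(k_r) = sum_{p <= p_r} log(1 + p^(-s)); compare with Gronwall's scale.
log_k, log_sigma = 0.0, 0.0
for p in primes_upto(10_000):
    log_k += math.log(p)
    log_sigma += math.log(1 + p ** (-s))
scale = log_k ** (1 - s) / ((1 - s) * math.log(log_k))
print(log_sigma / scale)   # of order 1; by (11) it tends to 1, slowly, as more primes are used
```

Working with logarithms avoids forming the primorial itself, whose size makes trial division hopeless.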
As mentioned in the introduction, the case $\alpha = 1$ is essentially covered by the results of [2]. We refer here to [2, Theorem 3], concerning the almost everywhere convergence of (5) for functions $f$ of bounded variation. The only property used in the proof of that result is that a function of bounded variation belongs to $C_1$. It therefore follows from [2, Theorem 3] that (5) is almost everywhere convergent when $f \in C_1$ provided
$$\sum_{k=1}^{\infty} c_k^2 (\log\log k)^{\gamma} < \infty \tag{12}$$
for some $\gamma > 4$ (under the additional assumption that $(n_k)_{k\ge1}$ is strictly increasing). Moreover, it was proved in [2, Theorem 7] that this statement becomes false for $\gamma < 2$. Since the series (3) is a special case of (5), the series (3) is also almost everywhere convergent for all $f \in C_1$ if (12) holds for some $\gamma > 4$. Concerning $L^2$ convergence, using [2, Lemma 4] it can be shown that the series (5) is convergent in $L^2$ norm for all $f \in C_1$ provided (12) holds for some $\gamma > 4$, and by the results in [13] this statement becomes false for $\gamma < 2$. Moreover, using the results from [16] it is possible to show that the series (3) is convergent in $L^2$ norm for all $f \in C_1$ provided (12) holds for some $\gamma > 2$, and this statement also becomes false for $\gamma < 2$. Thus the problem of $L^2$ convergence and almost everywhere convergence of the series (3) and (5) is solved, up to powers of $\log\log k$ in the extra convergence conditions.

The problem of norm and almost everywhere convergence of (3) when (4) is our only assumption on $f$ is considerably harder. The reason for the difficulties is that while for $f \in C_\alpha$ we have
$$\left|\int_0^1 f(kx) f(\ell x)\,dx\right| \le K \frac{\gcd(k,\ell)^{2\alpha}}{(k\ell)^{\alpha}} \tag{13}$$
for some constant $K > 0$, for general $f$ satisfying (4) the integral in (13) depends on $k$, $\ell$ and the Fourier coefficients of $f$ in a rather complicated way, and the arithmetic machinery involving GCD sums and eigenvalues of GCD matrices used in the proof of our theorems breaks down.
Assuming that the complex Fourier coefficients $a_j$ of $f$ satisfy $|a_j| \le \varphi(j)$, where the positive function $\varphi$ has the homogeneity property $|\varphi(jk)| \ll k^{-\gamma}\varphi(j)$ for some $\gamma > 0$, much of what is developed in the present paper will carry over to this situation. Estimates such as those found in [8] could then, for instance, be used to obtain fairly sharp analogues of Theorem 1 and Theorem 2 for the function classes considered.
In the case of arithmetic criteria as in Theorem 3, Berkes and Weber [7] proved that if $f$ satisfies (4) with complex Fourier coefficients $a_j$, then the series (3) converges almost everywhere provided that condition (14) holds, where $\psi$ is the arithmetic function given in (16). Note that the arithmetic function $\psi$ in (16) is larger than the one in (8), which is of course to be expected. Note also that if $j^{-\gamma}|a_j|$ is non-increasing for some $\gamma > 0$, then a simpler choice of $\psi$ is possible in (14). The same criterion holds if $f$ satisfies a Hölder continuity condition, see [5,31]. These remarks show again the strong arithmetic character of our convergence problem. In [7] it is also shown that, except for the factor $(\log k)^2$, condition (14) is optimal. However, just as in Theorem 3, the arithmetic criterion (14) is not as sharp as those in Theorems 1 and 2.
Note that if (3) converges almost everywhere for $c_k = 1/k$, then by the Kronecker lemma we have
$$\lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} f(kx) = 0 \qquad \text{a.e.}, \tag{17}$$
and thus the almost everywhere convergence problem of (3) under (4) is closely connected with the classical problem of the convergence of the averages in (17). Khinchin [19] conjectured that under (4) (even without the third condition) the convergence relation (17) holds. This conjecture was disproved nearly 50 years later by a famous counterexample of Marstrand [26].
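For completeness, here is the standard Abel-summation argument behind the Kronecker lemma as used above, written for a general real sequence $(a_k)$; in the application, $a_k = f(kx)$ for a fixed $x$ at which $\sum_k f(kx)/k$ converges.

```latex
Let $S_n = \sum_{k=1}^{n} a_k/k$, $S_0 = 0$, and suppose $S_n \to S$. Since $a_k = k\,(S_k - S_{k-1})$,
\[
\frac{1}{N}\sum_{k=1}^{N} a_k
= \frac{1}{N}\sum_{k=1}^{N} k\,(S_k - S_{k-1})
= S_N - \frac{1}{N}\sum_{k=0}^{N-1} S_k
\;\longrightarrow\; S - S = 0 \qquad (N \to \infty),
\]
where the last step uses that the Ces\`aro means of a convergent sequence converge to the same limit.
```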
In the positive direction, Koksma [22] proved that (17) holds provided the complex Fourier coefficients $a_j$ of $f$ satisfy a certain summability condition. Bourgain [9] gave a new, much simplified counterexample to Khinchin's conjecture and claimed, without proof, that Koksma's criterion is essentially optimal. This claim was proved recently by Berkes and Weber [7]. Thus while the almost everywhere convergence problem for (3) under (4) remains open, the closely related problem of almost everywhere convergence of the averages (17) is essentially settled.

THE ROLE OF GCD MATRICES AND CERTAIN EXTREMAL FUNCTIONS IN $C_\alpha$
We will now review the key ideas used in both [2] and the present paper. We begin by introducing the special functions $f_\alpha(x)$ and $\tilde f_\alpha(x)$ in $C_\alpha$ defined by
$$f_\alpha(x) = \sum_{j=1}^{\infty} \frac{\sin 2\pi j x}{j^{\alpha}}, \qquad \tilde f_\alpha(x) = \sum_{j=1}^{\infty} \frac{\cos 2\pi j x}{j^{\alpha}}. \tag{18}$$
Informally speaking, these functions are extremal in $C_\alpha$ in the sense that their Fourier coefficients are of maximal size. Furthermore, all Fourier coefficients are positive, which makes it relatively easy to obtain lower bounds for $L^2$ norms of sums of dilated functions.
When $\alpha = 1$, the first series in (18) is the Fourier series of the function
$$\pi\left(\tfrac12 - \{x\}\right),$$
where $\{\cdot\}$ denotes the fractional part. This means that, up to multiplication by a constant, $f_1$ is the first Bernoulli polynomial on $[0,1]$, extended with period one. Convergence problems for (3) and (5) have been investigated extensively for $f = f_1$, starting probably with Riemann's Habilitationsschrift of 1852. Such series have been called Davenport series in honor of Harold Davenport, who was the first to study them in this general form [12]. See [17] for a survey on the history of the subject and several results on the convergence problem for series involving this function. Convergence problems for Davenport series have an interesting connection with fractal geometry; see for example [18].
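A quick numerical sanity check of the case $\alpha = 1$, assuming (as the Bernoulli-polynomial remark indicates) that the first series in (18) is $\sum_{j\ge1} j^{-1}\sin 2\pi j x$: its partial sums should approach $\pi(1/2 - \{x\})$ away from the integers. The function names, the truncation level and the test points below are our own arbitrary choices.

```python
import math

def sine_partial_sum(x, alpha, terms):
    """Partial sum sum_{j<=terms} sin(2*pi*j*x) / j**alpha."""
    return sum(math.sin(2 * math.pi * j * x) / j ** alpha for j in range(1, terms + 1))

def sawtooth(x):
    """pi * (1/2 - {x}), the limit of the alpha = 1 series for non-integer x."""
    return math.pi * (0.5 - (x - math.floor(x)))

for x in (0.1, 0.3, 0.75, 1.6):
    print(x, sine_partial_sum(x, 1.0, 20_000), sawtooth(x))
```

Convergence is only conditional and slows down near the integers, where the limit function jumps; this is the same singular behaviour that, for $\alpha < 1$, becomes a blow-up of order $x^{\alpha-1}$.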
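The convergence problem for series involving the function $f_1$ is connected with sums involving greatest common divisors through the formula
$$\int_0^1 \left(\{kx\} - \tfrac12\right)\left(\{\ell x\} - \tfrac12\right) dx = \frac{\gcd(k,\ell)^2}{12\,k\ell}$$
for positive integers $k, \ell$, which was first stated by Franel and formally proved by Landau in 1924. Consequently we have
$$\int_0^1 \left(\sum_{k=1}^{N} c_k \left(\{n_k x\} - \tfrac12\right)\right)^2 dx = \frac{1}{12}\sum_{k,\ell=1}^{N} c_k c_\ell \frac{\gcd(n_k,n_\ell)^2}{n_k n_\ell}. \tag{20}$$
But much more is true since the Fourier coefficients of $f_1$ are positive and maximal: by an observation of Koksma [21], for every $f \in C_1$ we have
$$\int_0^1 \left(\sum_{k=1}^{N} c_k f(n_k x)\right)^2 dx \le K \sum_{k,\ell=1}^{N} |c_k c_\ell| \frac{\gcd(n_k,n_\ell)^2}{n_k n_\ell}. \tag{21}$$
The relation between $L^2$ norms of sums of dilated functions and sums involving greatest common divisors extends to the classes $C_\alpha$ for $1/2 < \alpha < 1$. This was first observed by Mikolás [27], who proved that for the Hurwitz zeta function $\zeta(1-\alpha,\cdot)$ we have
$$\int_0^1 \zeta(1-\alpha,\{kx\})\,\zeta(1-\alpha,\{\ell x\})\,dx = \frac{2\,\Gamma(\alpha)^2\,\zeta(2\alpha)}{(2\pi)^{2\alpha}}\,\frac{\gcd(k,\ell)^{2\alpha}}{(k\ell)^{\alpha}} \tag{22}$$
for positive integers $k, \ell$ and for $\alpha > 1/2$. Hurwitz's formula states that for $\alpha > 1$ and $x \in [0,1]$ we have
$$\zeta(1-\alpha, x) = \frac{2\,\Gamma(\alpha)}{(2\pi)^{\alpha}} \sum_{j=1}^{\infty} \frac{\cos\left(2\pi j x - \pi\alpha/2\right)}{j^{\alpha}} \tag{23}$$
(see for example [20] for a simple proof), which implies that $x \mapsto \zeta(1-\alpha,\{x\})$ is a function whose Fourier coefficients are precisely of asymptotic order $j^{-\alpha}$, and in particular $\zeta(1-\alpha,\{x\}) \in C_\alpha$. As Mikolás showed, (23) continues to hold for $\alpha > 1/2$ and $0 < x < 1$, which leads to (22) by the orthogonality of the trigonometric system. By the same argument as for the case $\alpha = 1$, we get that for every function $f$ in $C_\alpha$
$$\int_0^1 \left(\sum_{k=1}^{N} c_k f(n_k x)\right)^2 dx \le K \sum_{k,\ell=1}^{N} |c_k c_\ell| \frac{\gcd(n_k,n_\ell)^{2\alpha}}{(n_k n_\ell)^{\alpha}} \tag{24}$$
(see Lemma 1 below). For the special function $f_\alpha(x)$ from (18) we get
$$\int_0^1 \left(\sum_{k=1}^{N} c_k f_\alpha(n_k x)\right)^2 dx = \frac{\zeta(2\alpha)}{2} \sum_{k,\ell=1}^{N} c_k c_\ell \frac{\gcd(n_k,n_\ell)^{2\alpha}}{(n_k n_\ell)^{\alpha}}, \tag{25}$$
as will also be established in Lemma 1 below.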
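Franel's formula can be checked with exact rational arithmetic. On each interval $[i/L, (i+1)/L]$ with $L = \operatorname{lcm}(k,\ell)$, both factors $\{kx\} - 1/2$ and $\{\ell x\} - 1/2$ are linear, so their product is a quadratic polynomial and Simpson's rule integrates it exactly. The Python sketch below (function names are ours) compares the result with $\gcd(k,\ell)^2/(12k\ell)$.

```python
from fractions import Fraction
from math import floor, gcd

HALF = Fraction(1, 2)

def product_on_cell(x, k, l, nk, nl):
    """(k x - nk - 1/2)(l x - nl - 1/2): the integrand on a cell where
    floor(k x) = nk and floor(l x) = nl, extended linearly to the cell endpoints."""
    return (k * x - nk - HALF) * (l * x - nl - HALF)

def franel_integral(k, l):
    """Exact value of int_0^1 ({kx} - 1/2)({lx} - 1/2) dx."""
    L = k * l // gcd(k, l)                    # lcm(k, l): all jumps lie on the grid i / L
    h = Fraction(1, L)
    total = Fraction(0)
    for i in range(L):
        a, m, b = i * h, (i + HALF) * h, (i + 1) * h
        nk, nl = floor(k * m), floor(l * m)   # integer parts are constant inside the cell
        # Simpson's rule is exact for the quadratic integrand on [a, b].
        total += h / 6 * (product_on_cell(a, k, l, nk, nl)
                          + 4 * product_on_cell(m, k, l, nk, nl)
                          + product_on_cell(b, k, l, nk, nl))
    return total

for k, l in [(1, 1), (2, 3), (4, 6), (6, 9)]:
    print(k, l, franel_integral(k, l), Fraction(gcd(k, l) ** 2, 12 * k * l))
```

Exact fractions avoid any question of whether an apparent agreement is just floating-point coincidence.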
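Our two estimates (21) and (24), as well as the two identities (20) and (25), show that to understand the convergence of (3) and (5) we have to understand the quadratic forms with GCD coefficients appearing on their right-hand sides. Now let $G_N^{(\alpha)}$ be the $N \times N$ matrix with entries $g_{k\ell}$ given by
$$g_{k\ell} = \frac{\gcd(k,\ell)^{2\alpha}}{(k\ell)^{\alpha}}, \qquad 1 \le k, \ell \le N, \tag{26}$$
and, for a given sequence $(n_k)_{1\le k\le N}$, let $H_N^{(\alpha)}$ be the $N \times N$ matrix with entries $\gcd(n_k,n_\ell)^{2\alpha}/(n_k n_\ell)^{\alpha}$. It is a well-known fact that both these matrices are positive definite (see e.g. [25]). Thus for the largest eigenvalue $\Lambda(G_N^{(\alpha)})$ of $G_N^{(\alpha)}$ we have
$$\Lambda\big(G_N^{(\alpha)}\big) = \max_{c_1^2+\cdots+c_N^2=1} \sum_{k,\ell=1}^{N} c_k c_\ell \frac{\gcd(k,\ell)^{2\alpha}}{(k\ell)^{\alpha}}, \tag{27}$$
and for the largest eigenvalue $\Lambda(H_N^{(\alpha)})$ of $H_N^{(\alpha)}$ we have
$$\Lambda\big(H_N^{(\alpha)}\big) = \max_{c_1^2+\cdots+c_N^2=1} \sum_{k,\ell=1}^{N} c_k c_\ell \frac{\gcd(n_k,n_\ell)^{2\alpha}}{(n_k n_\ell)^{\alpha}}. \tag{28}$$
Consequently, by (24) and (25), the problem of finding upper and lower bounds for the largest eigenvalue (that is, the spectral norm) of $G_N^{(\alpha)}$ and $H_N^{(\alpha)}$ is precisely the same as that of finding general upper bounds for, respectively,
$$\int_0^1 \left(\sum_{k=1}^{N} c_k f(kx)\right)^2 dx \qquad \text{and} \qquad \int_0^1 \left(\sum_{k=1}^{N} c_k f(n_k x)\right)^2 dx, \qquad f \in C_\alpha,\ \sum_{k=1}^{N} c_k^2 = 1. \tag{29}$$
The problem of calculating the largest eigenvalue $\Lambda(G_N^{(\alpha)})$ of $G_N^{(\alpha)}$, and accordingly the problem of estimating the integral on the left-hand side of (29), was solved by Hilberdink [16], who proved that in the case $\alpha = 1$
$$\Lambda\big(G_N^{(1)}\big) \asymp (\log\log N)^2, \tag{30}$$
and that for $1/2 < \alpha < 1$
$$\Lambda\big(G_N^{(\alpha)}\big) \le \exp\left(K \frac{(\log N)^{1-\alpha}}{\log\log N}\right). \tag{31}$$
In (31) the constant $K$ depends on $\alpha$, and (31) is optimal except for the precise value of $K$.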
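For $H_N^{(\alpha)}$, in Lemma 4 and Theorem 5 of [2] it was shown that, uniformly in the choice of distinct positive integers $n_1, \ldots, n_N$,
$$\Lambda\big(H_N^{(1)}\big) \ll (\log\log N)^4 \tag{32}$$
and, for $1/2 < \alpha < 1$,
$$\Lambda\big(H_N^{(\alpha)}\big) \le \exp\left(K \frac{(\log N)^{1-\alpha}}{(\log\log N)^{\alpha}}\right), \tag{33}$$
where the constant $K$ depends on $\alpha$. Here (33) is optimal except for the precise value of the constant $K$, but it remains a profound problem to decide whether the exponent 4 of $\log\log N$ on the right-hand side of (32) is optimal. By a classical theorem of Gál [13], it is known that this exponent cannot be smaller than 2.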
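For small $N$ the largest eigenvalues in (27) and (28) can be computed directly. The pure-Python power iteration below is a sketch (the names `gcd_matrix` and `largest_eigenvalue`, the iteration limit and the test sizes are our own choices); it relies on the matrices being symmetric with positive entries, so that the iteration started from the all-ones vector converges to the top (Perron) eigenvalue. For $N = 2$ the exact value is $1 + 2^{-\alpha}$, and for the lacunary choice $n_k = 2^{k}$ the matrix is Toeplitz with entries $2^{-\alpha|k-\ell|}$, so its largest eigenvalue stays below $(1+2^{-\alpha})/(1-2^{-\alpha})$.

```python
from math import gcd, sqrt

def gcd_matrix(ns, alpha):
    """Matrix with entries gcd(n_k, n_l)^(2 alpha) / (n_k n_l)^alpha."""
    return [[gcd(a, b) ** (2 * alpha) / (a * b) ** alpha for b in ns] for a in ns]

def largest_eigenvalue(mat, iters=1000, tol=1e-13):
    """Power iteration with Rayleigh quotients; valid here because the matrix is
    symmetric with positive entries, so the top eigenvector is positive."""
    n = len(mat)
    v = [1.0 / sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(n)) for row in mat]
        new_lam = sum(wi * vi for wi, vi in zip(w, v))   # Rayleigh quotient, |v| = 1
        norm = sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
        if abs(new_lam - lam) < tol * new_lam:
            return new_lam
        lam = new_lam
    return lam

alpha = 0.75
for N in (10, 20, 40):
    print(N, largest_eigenvalue(gcd_matrix(range(1, N + 1), alpha)))     # G_N
print(largest_eigenvalue(gcd_matrix([2 ** k for k in range(20)], alpha)))  # H_N, n_k = 2^k
```

Such small experiments say nothing about the asymptotic regime in (31) and (33), where $\log\log N$ grows extremely slowly; they only make the objects concrete.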
As noted above, the results (30)-(33) imply corresponding upper bounds for the integrals in (29) when $f \in C_\alpha$, and the optimality of (30), (31) and (33) implies corresponding lower bounds for the integrals in (29) in the special case when $f = f_\alpha$; this is the reason why the exponential factor from (31) appears in Theorem 1, and that from (33) appears in Theorem 2. When comparing the bounds for the largest eigenvalues of $G_N^{(\alpha)}$ and $H_N^{(\alpha)}$, respectively, we note that in the case $\alpha = 1$ there is an additional factor $(\log\log N)^2$ in (32) as compared with (30). As mentioned above, this extra factor can possibly be avoided, since we do not know whether (32) is optimal. In the case $1/2 < \alpha < 1$ there is a difference between the denominators in the exponential terms in (31) and (33), respectively, which are $\log\log N$ in the one case and $(\log\log N)^{\alpha}$ in the other. Since both results are optimal, this shows that there really is a significant difference between the spectral norms of $G_N^{(\alpha)}$ and of $H_N^{(\alpha)}$, and accordingly also a difference between the convergence problems for (3) and (5). In [16], a connection is established between the spectral norm of $G_N^{(\alpha)}$ and the maximal order of magnitude of the Riemann zeta function along vertical lines, using Soundararajan's "resonance method" from [30]. However, Hilberdink's results cannot reach the stronger lower bounds of Montgomery [28], which in turn bear a striking resemblance to the bounds for the spectral norm of $H_N^{(\alpha)}$ in [2].
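We close this section by making an observation on our extremal functions $f_\alpha$ and $\tilde f_\alpha$ in (18) that will be needed in the sequel. We note first that they are, up to normalization, the odd and the even part of the Hurwitz zeta function, respectively. In fact, from the Fourier series representation in (23) it is easily seen that, for $0 < x < 1$,
$$f_\alpha(x) = \frac{(2\pi)^{\alpha}}{4\,\Gamma(\alpha)\sin(\pi\alpha/2)}\Big(\zeta(1-\alpha,x) - \zeta(1-\alpha,1-x)\Big) \tag{34}$$
and
$$\tilde f_\alpha(x) = \frac{(2\pi)^{\alpha}}{4\,\Gamma(\alpha)\cos(\pi\alpha/2)}\Big(\zeta(1-\alpha,x) + \zeta(1-\alpha,1-x)\Big). \tag{35}$$
These representations can be used to describe the rate with which $f_\alpha(x)$ and $\tilde f_\alpha(x)$ tend to infinity as $x \to 0$. Mikolás proved that for fixed $\alpha \in (1/2,1)$ we have
$$\lim_{x\to0^+} x^{1-\alpha}\,\zeta(1-\alpha,x) = 1$$
(this is equation (12) in [27]). Consequently, since $\lim_{x\to0^+}\zeta(1-\alpha,1-x) = \zeta(1-\alpha)$ is finite, (34) gives
$$f_\alpha(x) \sim \frac{(2\pi)^{\alpha}}{4\,\Gamma(\alpha)\sin(\pi\alpha/2)}\,x^{\alpha-1} \qquad (x \to 0^+).$$
In particular this implies that
$$f_\alpha \in L^p(0,1) \qquad \text{for every } p < (1-\alpha)^{-1}, \tag{36}$$
which will be a crucial ingredient in the proof of the necessary condition for almost everywhere convergence of (5). More precisely, since $(1-\alpha)^{-1} > 2$, (36) implies that for any $\alpha \in (1/2,1)$ the function $f_\alpha$ is in $L^{2+\delta}$ for some $\delta = \delta(\alpha) > 0$, which will allow us to apply Lyapunov's central limit theorem (which requires the existence of an absolute moment of order $2+\delta$ for some $\delta > 0$). Similar results hold if $f_\alpha$ is replaced by $\tilde f_\alpha$.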
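The step from the behaviour of $f_\alpha$ near $0$ to (36) is a one-line computation. Assume $|f_\alpha(x)| \le C x^{\alpha-1}$ for $0 < x \le 1/2$, which follows from the asymptotic relation for $f_\alpha$ near $0$ together with continuity of $f_\alpha$ on $(0,1)$, after enlarging $C$; by oddness and periodicity the same bound holds near $x = 1$ with $1-x$ in place of $x$. Then:

```latex
\[
\int_0^{1/2} |f_\alpha(x)|^p\,dx \;\le\; C^p \int_0^{1/2} x^{-(1-\alpha)p}\,dx
\;=\; C^p\,\frac{(1/2)^{\,1-(1-\alpha)p}}{1-(1-\alpha)p} \;<\; \infty
\qquad \text{whenever } (1-\alpha)\,p < 1 .
\]
Since $\alpha > 1/2$ gives $(1-\alpha)^{-1} > 2$, any
$0 < \delta < (1-\alpha)^{-1} - 2$ yields $f_\alpha \in L^{2+\delta}(0,1)$.
```

The same asymptotics show that the integral diverges once $(1-\alpha)p \ge 1$, so the range of $p$ in (36) cannot be enlarged.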

AUXILIARY RESULTS
In the sequel, we use the notation $\|\cdot\|$ for the $L^2(0,1)$ norm. Throughout the rest of this paper, we will always assume that $\alpha \in (1/2,1)$.
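Lemma 1. Assume that $f \in C_\alpha$. Then
$$\Big\|\sum_{k=1}^{N} c_k f(n_k x)\Big\|^2 \le K \sum_{k,\ell=1}^{N} |c_k c_\ell| \frac{\gcd(n_k,n_\ell)^{2\alpha}}{(n_k n_\ell)^{\alpha}}.$$
For the particular function $f_\alpha$ from (18) we have
$$\Big\|\sum_{k=1}^{N} c_k f_\alpha(n_k x)\Big\|^2 = \frac{\zeta(2\alpha)}{2} \sum_{k,\ell=1}^{N} c_k c_\ell \frac{\gcd(n_k,n_\ell)^{2\alpha}}{(n_k n_\ell)^{\alpha}}.$$
Note that as a special case of Lemma 1 we have

Proof of Lemma 1. The argument needed for the proof of Lemma 1 is a simple generalization of the arguments leading to (20) and (21), respectively. We write
$$f(x) = \sum_{j=1}^{\infty} a_j \sin 2\pi j x,$$
assuming, to shorten formulas, that $f$ is an odd function; the proof in the general case is exactly the same. Then, by the orthogonality of the trigonometric system, for arbitrary positive integers $m, n$ we have
$$\int_0^1 f(mx) f(nx)\,dx = \frac12 \sum_{\substack{j_1, j_2 \ge 1\\ j_1 m = j_2 n}} a_{j_1} a_{j_2} \tag{39}$$
$$= \frac12 \sum_{j=1}^{\infty} a_{jn/\gcd(m,n)}\, a_{jm/\gcd(m,n)}. \tag{40}$$
In passing from (39) to (40), we used the fact that $j_1 m = j_2 n$ holds if and only if $j_1 = jn/\gcd(m,n)$ and $j_2 = jm/\gcd(m,n)$ for some positive integer $j$. Since $|a_j| \le K j^{-\alpha}$ for all $j \ge 1$, this gives
$$\left|\int_0^1 f(mx) f(nx)\,dx\right| \le \frac{K^2}{2}\sum_{j=1}^{\infty}\Big(\frac{jn}{\gcd(m,n)}\Big)^{-\alpha}\Big(\frac{jm}{\gcd(m,n)}\Big)^{-\alpha} = \frac{K^2\,\zeta(2\alpha)}{2}\,\frac{\gcd(m,n)^{2\alpha}}{(mn)^{\alpha}}.$$
Applying this inequality for all pairs $(n_k, n_\ell)$ gives the first part of the lemma.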
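In the case $f = f_\alpha$ we have $a_j = j^{-\alpha}$, $j \ge 1$. Inserting this into (40) we get
$$\int_0^1 f_\alpha(mx) f_\alpha(nx)\,dx = \frac12 \sum_{j=1}^{\infty} \Big(\frac{j^2\,mn}{\gcd(m,n)^2}\Big)^{-\alpha} = \frac{\zeta(2\alpha)}{2}\,\frac{\gcd(m,n)^{2\alpha}}{(mn)^{\alpha}}.$$
Again we obtain the desired result by summing over all pairs $(n_k, n_\ell)$.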
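The counting fact behind (39) and (40), namely that the solutions of $j_1 m = j_2 n$ in positive integers are exactly $(j_1, j_2) = (jn/\gcd(m,n),\, jm/\gcd(m,n))$ with $j \ge 1$, can be checked by brute force. The sketch below (helper names ours; the truncation level $J$ and the parameters are arbitrary) also evaluates the truncated sum $\sum (j_1 j_2)^{-\alpha}$ over these solutions, i.e. the sum in (39) with $a_j = j^{-\alpha}$, and compares it with the closed form $(\gcd(m,n)^2/(mn))^{\alpha}\sum_{j\le j_{\max}} j^{-2\alpha}$.

```python
from math import gcd, isclose

def solutions_brute(m, n, J):
    """All (j1, j2) with 1 <= j1, j2 <= J and j1*m == j2*n, by exhaustive search."""
    return {(j1, j2) for j1 in range(1, J + 1) for j2 in range(1, J + 1) if j1 * m == j2 * n}

def solutions_param(m, n, J):
    """The same set via the parametrization j1 = j*n/g, j2 = j*m/g with g = gcd(m, n)."""
    g = gcd(m, n)
    jmax = J * g // max(m, n)          # largest j with both coordinates <= J
    return {(j * n // g, j * m // g) for j in range(1, jmax + 1)}

def truncated_pair_sum(m, n, J, alpha):
    """Sum of (j1*j2)^(-alpha) over the solutions with j1, j2 <= J."""
    return sum((j1 * j2) ** (-alpha) for j1, j2 in solutions_param(m, n, J))

m, n, J, alpha = 12, 18, 300, 0.7
g = gcd(m, n)
jmax = J * g // max(m, n)
closed_form = (g * g / (m * n)) ** alpha * sum(j ** (-2 * alpha) for j in range(1, jmax + 1))
print(solutions_brute(m, n, J) == solutions_param(m, n, J))
print(truncated_pair_sum(m, n, J, alpha), closed_form)
```

Letting $J \to \infty$ turns the partial sum of $j^{-2\alpha}$ into $\zeta(2\alpha)$, which is where the constant in the second part of Lemma 1 comes from.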

Lemma 2.
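Assume that $f \in C_\alpha$. There exist constants $K_1$, $K_2$ such that
$$\Big\|\sum_{k=1}^{N} c_k f(kx)\Big\|^2 \le \exp\left(K_1 \frac{(\log N)^{1-\alpha}}{\log\log N}\right) \sum_{k=1}^{N} c_k^2$$
and
$$\Big\|\sum_{k=1}^{N} c_k f(n_k x)\Big\|^2 \le \exp\left(K_2 \frac{(\log N)^{1-\alpha}}{(\log\log N)^{\alpha}}\right) \sum_{k=1}^{N} c_k^2.$$
Moreover, $K_1$ and $K_2$ can be given explicitly.

Proof. By Lemma 1 and (27) and (28), the estimates in Lemma 2 follow from corresponding upper bounds for the largest eigenvalues of the matrices $G_N^{(\alpha)}$ and $H_N^{(\alpha)}$, respectively, which were already stated in (31) and (33). The given value for $K_1$ is a coarse estimate for that given in a more precise form in the proof of [16, Theorem 2.3] and at the end of [16, Section 3]; the value for $K_2$ is obtained by using the method of the recent paper [8], which improves in a significant way the arguments from [2].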
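The reduction used in the proof of Lemma 2 can be written out as a single chain. Write $h_{k\ell} = \gcd(n_k,n_\ell)^{2\alpha}/(n_k n_\ell)^{\alpha} \ge 0$ and let $K$ be the constant from the first part of Lemma 1. Since the vector $(|c_1|, \ldots, |c_N|)$ has the same Euclidean norm as $(c_1, \ldots, c_N)$, the variational characterization (28) of the largest eigenvalue gives

```latex
\[
\Big\| \sum_{k=1}^{N} c_k f(n_k x) \Big\|^2
\;\le\; K \sum_{k,\ell=1}^{N} |c_k|\,|c_\ell|\, h_{k\ell}
\;\le\; K\, \Lambda\big(H_N^{(\alpha)}\big) \sum_{k=1}^{N} c_k^2 ,
\]
```

and the bound (33) then yields the second estimate, the factor $K$ being absorbed into the exponential by enlarging the constant. The first estimate is the case $n_k = k$, with $\Lambda(G_N^{(\alpha)})$ from (27) and (31) in place of $\Lambda(H_N^{(\alpha)})$.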
Using the same method as in the proof of the Rademacher-Menshov inequality, we easily obtain the following lemma, which is a maximal version of Lemma 2. Note that the proof of the Rademacher-Menshov inequality gives an additional logarithmic factor, which however in our case can be included in the exponential term if we slightly increase the value of the constants.
We can choose K 1 , K 2 such that
For the formulation of the following lemma we note that the unit interval, equipped with Borel sets and Lebesgue measure, is a probability space. Throughout the rest of this paper, we will use the symbols $P$ and $E$ with respect to this probability space. The following lemma is a variant of [2, Lemma 5]. We write $\log_2$ for the dyadic logarithm.
Lemma 5. For given $\alpha \in (1/2,1)$, let $\eta = 12/(2\alpha-1)$ and let $1 \le$ Then there exist independent random variables $Y_1, Y_2, \ldots$ on the probability space $((0,1), \mathcal{B}, P)$ such that $EY_i = 0$ and

Lemma 6. Assume that $f \in C_\alpha$. Let $k \ge 1$ be a positive integer, and write $g(x) = f(kx)$. Then for any integer $m \ge k$ we have

Proof of Lemma 5: Let $\mathcal{F}_i$ denote the $\sigma$-field generated by the dyadic intervals and set Then we clearly have $E\xi_k = 0$, which implies $EY_i = 0$. By Lemma 6 and (41) we have for every which implies that Since by assumption every $k \in \Delta_{i+1}$ is a multiple of $2^{S_{i+1}}$, each interval $U_j$ in (42) is a period interval of $f_\alpha(kx)$ for all $k \in \Delta_{i+1}$, and consequently also of $\xi_k$ for all $k \in \Delta_{i+1}$. Consequently $Y_{i+1}$ is independent of the $\sigma$-field $\mathcal{F}_i$. Since $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$ and since $Y_i$ is $\mathcal{F}_i$-measurable, the random variables $Y_1, Y_2, \ldots$ are independent.
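The independence argument in the proof of Lemma 5 rests on a periodicity fact: if $2^{S}$ divides $k$, then every dyadic interval of length $2^{-S}$ is a union of full period intervals of $x \mapsto g(kx)$ for any 1-periodic $g$, so the averages of a mean-zero $g(kx)$ over such intervals vanish. The exact check below uses the sawtooth $g(x) = \{x\} - 1/2$ as a stand-in for $f_\alpha$; this substitution is our assumption, made only to keep the arithmetic exact, and the helper name is ours.

```python
from fractions import Fraction
from math import floor, gcd

def dyadic_mean_of_sawtooth(k, S, j):
    """Exact mean of x -> {k x} - 1/2 over U_j = [j / 2**S, (j + 1) / 2**S)."""
    q = 2 ** S
    L = k * q // gcd(k, q)        # {k x} is linear on every cell [i/L, (i+1)/L)
    cells = L // q                # number of such cells inside U_j
    h = Fraction(1, L)
    integral = Fraction(0)
    for i in range(j * cells, (j + 1) * cells):
        mid = (i + Fraction(1, 2)) * h
        # the midpoint rule is exact for a linear function on the cell
        integral += (k * mid - floor(k * mid) - Fraction(1, 2)) * h
    return integral * q           # divide by the length 1/q of U_j

S = 3
print([dyadic_mean_of_sawtooth(8 * 5, S, j) for j in range(2 ** S)])   # 2**S divides k: all zero
print([dyadic_mean_of_sawtooth(12, S, j) for j in range(2 ** S)])      # 2**S does not divide k
```

In the lemma, this is exactly why the conditional expectation of $\xi_k$, $k \in \Delta_{i+1}$, given the coarser dyadic $\sigma$-field vanishes, which is what decouples $Y_{i+1}$ from $\mathcal{F}_i$.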
The following lemma is a simple consequence of [16, Proposition 3.1], from which it can be deduced in the same way as relation (3.2) of [16].
where b k are defined by

PROOFS
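Proof of the convergence part of Theorem 1: Throughout this proof, we will write $K_1$ for the constant in the statement of Theorem 1, and $K_2$ for the constant in the statement of the first part of Lemma 2. Note that we can assume that $K_1 > K_2$. Relation (6) implies that
$$\sum_{k=\lfloor e^m\rfloor+1}^{\lfloor e^{m+1}\rfloor} c_k^2 \exp\left(K_1 \frac{(\log k)^{1-\alpha}}{\log\log k}\right) \to 0 \qquad (m \to \infty),$$
which also implies that
$$\sum_{k=\lfloor e^m\rfloor+1}^{\lfloor e^{m+1}\rfloor} c_k^2 \le \exp\left(-K_1 \frac{m^{1-\alpha}}{\log m}\right)$$
for all sufficiently large $m$. Consequently by Lemma 2 we have, for any $M, N$ satisfying $e^m < M < N < e^{m+1}$,
$$\Big\|\sum_{k=M}^{N} c_k f(kx)\Big\|^2 \le \exp\left(K_2 \frac{(m+1)^{1-\alpha}}{\log(m+1)}\right)\exp\left(-K_1 \frac{m^{1-\alpha}}{\log m}\right) \le \exp\left(-\varepsilon \frac{m^{1-\alpha}}{\log m}\right) \tag{43}$$
for some $\varepsilon > 0$ and all sufficiently large $m$, since $K_1 > K_2$. For given $M < N$, let $\tilde m$ denote the integer for which $M \in (e^{\tilde m}, e^{\tilde m+1}]$, and $\tilde n$ the integer for which $N \in (e^{\tilde n}, e^{\tilde n+1}]$. If $\tilde m = \tilde n$, then by (43) we have where $\tilde m$ is defined as before. Again the right-hand side can be made arbitrarily small if $M$ is assumed to be sufficiently large. Thus the monotone convergence theorem and Lemma 4 imply that the series $\sum_{k=1}^{\infty} c_k f(kx)$ is almost everywhere convergent.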
Proof of the optimality of Theorem 1: For given $\alpha \in (1/2,1)$, we will show that there exists a sequence $(c_k)_{k\ge1}$ satisfying (6) for a "small" value of $K$, for which, for the function $f(x) = f_\alpha(x)$ from (18), the series $\sum_{k=1}^{\infty} c_k f_\alpha(kx)$ is divergent in $L^2$. We will construct $(c_k)_{k\ge1}$ such that it is supported on a set of indices which have a small number of prime factors; this idea already appears in [2,13,16] and other places. However, there it is only used to construct a finite sequence, whereas in the present case we have to construct an infinite sequence. Note that by (22), (34) and (35) the $L^2$ norms of sums of the dilated functions $f_\alpha(x)$, $\tilde f_\alpha(x)$ and $\zeta(1-\alpha,x)$ are the same, up to multiplication with a constant, and consequently we could also use the functions $\tilde f_\alpha$ or $\zeta(1-\alpha,\cdot)$ instead of $f_\alpha$. We write $(p_r)_{r\ge1}$ for the sequence of primes in increasing order. We define sets $\Delta_i$ in the following way: for given $i \ge 1$, the set $\Delta_i$ contains those positive integers which are of the form By construction the sets $\Delta_i$, $i \ge 1$, are disjoint (since all numbers in $\Delta_i$ are multiples of either $2^{2i}$ or $2^{2i+1}$, but not of $2^{2i+2}$). Note that the number of elements of $\Delta_i$ is $2^i$.
By the prime number theorem, for all sufficiently large $i$ and all $k \in \Delta_i$ we have and consequently for sufficiently large $i$ and for all $k \in \Delta_i$ Thus for $i \ge 1$ and all $k \in \Delta_i$ we have Using the second part of Lemma 1 and the facts that $f_\alpha$ has only positive Fourier coefficients and that all coefficients $c_k$ are non-negative, we have By the structure of the set $\Delta_i$, for any fixed $k \in \Delta_i$ we have (an argument of this type already appears in Gál's paper [13]). By the prime number theorem we have Combining (46), (47) and (48) we get Note that $(1-\varepsilon) - \eta(1+\varepsilon) = \varepsilon$, and thus the series on the right-hand side of (49) is divergent. Consequently the series $\sum_{k=1}^{\infty} c_k f_\alpha(kx)$ is divergent in $L^2$, although $(c_k)_{k\ge1}$ satisfies the extra convergence condition (6) for $K = \eta/(1-\alpha)$. Note that by choosing $\varepsilon$ small, $\eta$ can be moved arbitrarily close to 1. This proves the optimality of Theorem 1, apart from the precise optimal value of the constant $K$ in (6).
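The exact GCD-sum computations in these proofs rest on a product formula of the type used by Gál. As an illustration only (the structure of the set is our assumption, not the precise definition of $\Delta_i$ above): if $D = \{m d : d \mid p_1 \cdots p_r\}$ for a fixed integer $m$ and distinct primes $p_1, \ldots, p_r$, then for every $k \in D$
$$\sum_{\ell \in D} \frac{\gcd(k,\ell)^{2\alpha}}{(k\ell)^{\alpha}} = \prod_{j=1}^{r}\left(1 + p_j^{-\alpha}\right),$$
because the common factor $m$ cancels and each prime contributes the factor $1 + p^{-\alpha}$ independently. In particular, the normalized all-ones vector on $D$ shows that the largest eigenvalue of the corresponding GCD matrix is at least this product. A short numerical check, with helper names and parameters of our choosing:

```python
from itertools import combinations
from math import gcd, isclose, prod

def squarefree_multiples(m, primes):
    """The set {m * d : d a product of a subset of `primes`} (2**len(primes) elements)."""
    return [m * prod(sub) for r in range(len(primes) + 1) for sub in combinations(primes, r)]

def gcd_row_sum(k, ds, alpha):
    """sum_{l in ds} gcd(k, l)^(2 alpha) / (k l)^alpha."""
    return sum(gcd(k, l) ** (2 * alpha) / (k * l) ** alpha for l in ds)

primes, m, alpha = [3, 5, 7, 11, 13], 2 ** 6, 0.75
ds = squarefree_multiples(m, primes)
target = prod(1 + p ** -alpha for p in primes)
print(len(ds), target)
print(all(isclose(gcd_row_sum(k, ds, alpha), target) for k in ds))
```

Because every row sum is the same, the all-ones vector is in fact a Perron eigenvector of this particular matrix, which is what makes such sets convenient for lower bounds.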

Proof of the convergence part of Theorem 2:
The proof of the convergence part of Theorem 2 can be given in exactly the same way as the proof of the convergence part of Theorem 1 above, using the second parts of Lemmas 2 and 3 instead of the first parts.
Proof of the optimality of Theorem 2: The optimality of condition (7) in the case of $L^2$ convergence can be shown in a similar way as the optimality of condition (6) in Theorem 1. Again we construct a set of integers which is composed of a relatively small number of prime factors. In particular, again we will use an equality similar to (48), which allows a precise computation of the corresponding GCD sum. Again we choose $f = f_\alpha$, but as in the proof of the optimality of Theorem 1 we could also use the functions $\tilde f_\alpha$ or $\zeta(1-\alpha,\cdot)$ instead. The main difference between the present case and the proof of Theorem 1 is the fact that we can make the sequence $(n_k)_{k\ge1}$ grow as fast as we wish. Together with the well-established principle that lacunary sequences of functions show almost independent behavior, this is the reason why for Theorem 2 we can also prove optimality with respect to almost everywhere convergence (which was not possible for Theorem 1).
First we recall that $f_\alpha \in L^p(0,1)$ for $p < (1-\alpha)^{-1}$, which was established in (36). Thus we can choose $\delta \in (0,1)$ such that $2+\delta < (1-\alpha)^{-1}$. Furthermore, we can find a number $\beta \in (0,1)$ which satisfies For this number $\beta$ we have Let $(p_r)_{r\ge1}$ denote the sequence of primes in increasing order. We set $A(1) = 1$ and Here, and in the sequel, $\log_2$ denotes the logarithm in base 2. We define the numbers $S_i$ and $T_i$ recursively in the following way: Then obviously the numbers $(S_i)_{i\ge1}$ and $(T_i)_{i\ge1}$ satisfy the conditions of Lemma 5. For $i \ge 1$, we define $\Delta_i$ as the set of all numbers $k$ of the form Then clearly all elements of $\Delta_i$ are divisible by $2^{S_i}$, and $\Delta_i \subset [2^{S_i}, 2^{T_i}]$; that is, the sets $\Delta_i$ also satisfy the assumptions of Lemma 5. Let $(n_k)_{k\ge1}$ denote the sequence consisting of the elements of $\bigcup_{i\ge1}\Delta_i$, sorted in increasing order. Note that by definition we have Furthermore we define sets of integers $\Gamma_i$, $i \ge 1$, such that Then $(\Gamma_i)_{i\ge1}$ is a decomposition of $\mathbb{N}$. Let $K_1$ denote a "small" constant with a value to be determined later. For every $k \ge 1$ there is an $i$ such that $k \in \Gamma_i$, and we define Note that the value of $c_k$ only depends on the index $i$ for which $k \in \Gamma_i$. Thus we can also define numbers $(d_i)_{i\ge1}$ such that Since the series in (52) is convergent, the same holds for the series on the left-hand side of (51). Furthermore, since for $k \in \Gamma_i$ we have $k \ll i^{\beta+1}$, the convergence of the left-hand side of (51) implies that there exists a positive constant $K_2$ (depending on As in the lines following (48) we get
$$\sum_{k,\ell\in\Gamma_i} \frac{\gcd(n_k,n_\ell)^{2\alpha}}{(n_k n_\ell)^{\alpha}} = \sum_{k,\ell\in\Delta_i} \frac{\gcd(k,\ell)^{2\alpha}}{(k\ell)^{\alpha}}$$
for some positive constant $K_3$. Together with the second part of Lemma 1 this implies Since all coefficients $(c_k)_{k\ge1}$ are non-negative we have Combining this with (54) we arrive at We can assume that $K_1$ was chosen so small that $K_1 < K_3$. Then, since the right-hand side of (55) is divergent, the series $\sum_{k=1}^{\infty} c_k f_\alpha(n_k x)$ is divergent in $L^2$.
This proves the optimality of Theorem 2 for $L^2$ convergence (except for the exact value of the constant $K$ in the extra convergence condition).
To show that Theorem 2 is also optimal with respect to almost everywhere convergence, we apply Lemma 5. As noted before, Lemma 5 can be used for $S_i$, $T_i$, $\Delta_i$ as defined above. Consequently there exist independent random variables $Y_1, Y_2, \ldots$ on $((0,1), \mathcal{B}, P)$ such that The proof of Lemma 5 shows that the random variables $Y_i$ are constructed as the conditional expectation of $\sum_{k\in\Delta_i} f_\alpha(n_k x)$ with respect to some appropriate $\sigma$-fields. Thus the conditional form of Jensen's inequality (see for example [23, Theorem 13.3]) implies that We have chosen $\delta$ in such a way that $f_\alpha \in L^{2+\delta}(0,1)$. Thus by Minkowski's inequality we have which together with (57) implies On the other hand, by (54) and (56) we have where $K_4 := K_3 - K_1$ is a positive constant (again we assume that $K_1$ was chosen sufficiently small). Let and $F_M(t) = P\{x \in (0,1) :$ By This proves the optimality of Theorem 2 for almost everywhere convergence.
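For reference, the form of Lyapunov's central limit theorem invoked here is the standard one: for independent random variables $Y_1, Y_2, \ldots$ with $EY_i = 0$ and $s_M^2 = \sum_{i=1}^{M} EY_i^2 > 0$, the condition on the left below, for some $\delta > 0$, implies convergence in distribution to the standard normal law. This is where the $(2+\delta)$-moment bound derived from $f_\alpha \in L^{2+\delta}(0,1)$ enters, while the lower bound for $s_M$ comes from the GCD-sum estimate.

```latex
\[
\lim_{M\to\infty} \frac{1}{s_M^{2+\delta}} \sum_{i=1}^{M} E\,|Y_i|^{2+\delta} = 0
\quad\Longrightarrow\quad
\frac{1}{s_M}\sum_{i=1}^{M} Y_i \;\xrightarrow{\ d\ }\; \mathcal{N}(0,1) \qquad (M \to \infty).
\]
```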
We note that a more detailed analysis shows that the constant $K_1$, and accordingly also the constant $\tilde K(\alpha)$ in the statement of Theorem 2, can be chosen explicitly.

Proof of the first part of Theorem 3: Let $\varepsilon > 0$ be so small that $1-2\alpha+\varepsilon < 0$, and that (8) holds. where the inner sum is extended over all $k$ of the form $k = jd$, $j = 1, 2, \ldots$. But $\sigma_{1-2\alpha+\varepsilon}(jd) \le \sigma_{1-2\alpha+\varepsilon}(d)\,\sigma_{1-2\alpha+\varepsilon}(j)$, and thus the inner sum in (63) is bounded by where we used the fact that $\sigma_{1-2\alpha+\varepsilon}(j) \le d(j) = \mathcal{O}(j^{\eta})$ for any $\eta > 0$, $d(j)$ denoting the number of divisors of $j$. Substituting this into (63), we get, together with (61), that

To prove the second part of Theorem 3, let $\alpha \in (1/2,1)$, $0 < \beta < 1$, and choose $\delta > 0$ so small that $\beta(1+\delta) < 1$. Then by the second statement of Theorem 1 there exist a function $f \in C_\alpha$ and a sequence $(c_k)_{k\ge1}$ such that
$$\sum_{k=1}^{\infty} c_k^2 \exp\left(\frac{\beta(1+\delta)}{1-\alpha}\,\frac{(\log k)^{1-\alpha}}{\log\log k}\right) < \infty, \tag{65}$$
but the series (3) does not converge in $L^2$ norm. In view of (11), the terms of the sum in (9) are smaller than those of (65) for sufficiently large $k$, and thus the sum (9) converges, proving the second half of Theorem 3.
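Two elementary divisor-function facts are used in the proof of Theorem 3: $\sigma_s(jd) \le \sigma_s(j)\,\sigma_s(d)$ for every real $s$ (every divisor of $jd$ can be written as a product $e_1 e_2$ with $e_1 \mid j$, $e_2 \mid d$, and all terms are positive), and $\sigma_s(j) \le d(j)$ for $s \le 0$, where $d(j)$ is the number of divisors. A brute-force check over a small range follows; the helper names, the range and the exponent $s = 1 - 2\alpha + \varepsilon$ with $\alpha = 0.8$, $\varepsilon = 0.1$ are our arbitrary choices.

```python
def divisors(k):
    """All positive divisors of k (naive, fine for small k)."""
    return [d for d in range(1, k + 1) if k % d == 0]

def sigma(s, k):
    """sigma_s(k) = sum_{d | k} d**s."""
    return sum(d ** s for d in divisors(k))

s = 1 - 2 * 0.8 + 0.1      # an exponent of the form 1 - 2*alpha + eps < 0
submult = all(sigma(s, j * d) <= sigma(s, j) * sigma(s, d) + 1e-12
              for j in range(1, 61) for d in range(1, 61))
bounded_by_count = all(sigma(s, j) <= len(divisors(j)) for j in range(1, 1001))
print(submult, bounded_by_count)
```

The second fact is what turns the arithmetic weight into a harmless factor $\mathcal{O}(j^{\eta})$, since $d(j) = \mathcal{O}(j^{\eta})$ for every $\eta > 0$.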