Maximum entropy formalism for the analytic continuation of matrix-valued Green's functions

We present a generalization of the maximum entropy method to the analytic continuation of matrix-valued Green's functions. To treat off-diagonal elements correctly based on Bayesian probability theory, the entropy term has to be extended for spectral functions that are possibly negative in some frequency ranges. In that way, all matrix elements of the Green's function matrix can be analytically continued; we introduce a computationally cheap element-wise method for this purpose. However, this method cannot ensure important constraints on the mathematical properties of the resulting spectral functions, namely positive semidefiniteness and Hermiticity. To improve on this, we present a full matrix formalism, where all matrix elements are treated simultaneously. We show the capabilities of these methods using insulating and metallic dynamical mean-field theory (DMFT) Green's functions as test cases. Finally, we apply the methods to realistic material calculations for LaTiO$_3$, where off-diagonal matrix elements in the Green's function appear due to the distorted crystal structure.


I. INTRODUCTION
In condensed matter physics, response functions are often calculated in imaginary-time formulation, especially when electronic correlations are taken into account.This is not only true for numerical approaches like quantum Monte Carlo [1][2][3], but also for perturbative techniques such as the random phase approximation [4][5][6].However, these quantities cannot be directly related to measurable quantities in real frequency.Quite generally, the Wick rotation iτ → t, where τ is the imaginary-time argument and t is the real-time argument (or equivalently iω n → ω, with the nth fermionic Matsubara frequency ω n = (2n + 1)π/β and the real frequency ω), transforms the calculated quantities to real frequencies.In practice, this analytic continuation (AC) is not possible straightforwardly, since the kernel of this mapping is illconditioned when going from imaginary times to real frequencies.As a result of the kernel being ill-conditioned, small changes of the input will correspond to largely different outputs, rendering the inversion of this problem highly unstable due to numerical noise, where even an error at the level of machine precision can lead to nonsensical results in practice.
This fact has lead to the development of a plethora of different methods trying to efficiently perform the AC.Among them are series expansions (e.g., the Padé method [7][8][9]), information-theoretical approaches such as the maximum entropy method (MEM) [10][11][12][13] and stochastic methods [14][15][16][17][18].Other algorithms based on singular value decomposition (SVD) [19], machine learning [20] or sparse modeling [21] tackling the AC have also been presented.Despite all those interesting other developments, the workhorse method for the AC of noisy Monte Carlo data is the MEM.
The methods based on the MEM are well established for the diagonal elements of the Green's function, where the corresponding spectral function can be interpreted as a probability distribution (non-negative normalizable function).There are several freely available codes performing this task, such as Ωmaxent [22] and the maxent code by Levy et al. [23].
Nowadays, numerical algorithms do also provide imaginary-time solutions for off-diagonal Green's functions, e.g., in the multiorbital DFT + DMFT [24][25][26] context relevant for real-material applications.However, due to the lack of reliable methods for performing the AC of the whole Green's function matrix, still the offdiagonal elements are often neglected on different levels of the calculation.One strategy is to transform the impurity problem to some local basis, where the Hamiltonian and hybridization functions are as diagonal as possible and to neglect the off-diagonal elements in the solution of the impurity problem [27].However, this is an uncontrolled approximation because it is impossible to check the accuracy of this approximation without actually taking the off-diagonal elements into account.
Particularly important is a proper AC for the selfenergy.The full matrix form of the self-energy on the real axis is required in the Dyson equation to calculate lattice (k-dependent) quantities of interest.Without ensuring analytic properties such as positive semidefiniteness of this matrix, the results for quantities such as the k-dependent spectral function A(k, ω) or derived quantities (e.g., transport, optics) are physically questionable.We will present a method to remedy this problem.
For certain cases, the AC of off-diagonal Green's functions has been tackled before: In general, it is possible to construct an auxiliary Green's function by adding a (possibly frequency-dependent) shift to the off-diagonal elements of the spectral function so that their positivity is ensured.Then, they can be treated with the MEM [28][29][30].An example of a work where off-diagonal elements of the impurity spectral function are calculated are the DFT+DMFT calculations of the two perovskites LaVO 3 and YVO 3 [31].Additionally, a stochastic regularization method also suitable for off-diagonal elements has been proposed [32].However, these methods cannot ensure important matrix properties, e.g., positive semidefiniteness and Hermiticity of the spectral function.Additionally, given the probability theoretical background of the MEM [33], it is unclear how the shift method fits into this theoretical framework.
In other disciplines, such as astronomy and nuclear magnetic resonance (NMR), the MEM has been successfully extended to extract data without the constraint of non-negativity [34][35][36][37][38].This generalization is not straightforward, as non-negative functions cannot be directly interpreted as probability distributions.
But even with this generalization, important matrix properties are not respected.The purpose of this paper is, thus, to introduce a consistent matrix formulation of the MEM completely from probability theory.Using the full matrix enables us to consistently formulate the constraint that the resulting spectral functions are indeed positive semidefinite and Hermitian.
The paper is organized as follows: First, we present the probability theoretical background of the continuation of matrix-valued Green's functions and some computational and implementation details in Sec.II.In Sec.III, we perform a benchmark of the MEM and discuss some practical considerations using a DMFT calculation for a model system.Finally, in Sec.IV, we apply the introduced methodology within the framework of DFT+DMFT to the strongly correlated perovskite LaTiO 3 .

A. Basic principles of the maximum entropy method
The retarded one-electron Green's function G(ω +i0 + ) and the Matsubara Green's function G(iω n ) are related through the analyticity of G(z) in the whole complex plane with the exception of the poles below the real axis.This connection is explicit by writing the Green's function G(z) in terms of the spectral function A(ω) as In general, both G(z) and A(ω) are matrix-valued (with indices a, b), but Eq. ( 1) is valid for each matrix element separately.For a given G ab (ω + i0 + ), the matrix-valued A ab (ω) can be obtained as Note that for matrices, the spectral function is not proportional to the element-wise imaginary part of the Green's function.
A drawback of expression (1) is that the real and imaginary parts of G and A are coupled due to the fact that z is complex-valued.This is avoided by Fourier-transforming G(z = iω n ) to the imaginary time Green's function G(τ ) at inverse temperature β; G ab (τ ) = dω e −ωτ 1 + e −ωβ A ab (ω). (3) The real part of the spectral function is only connected to the real part of G(τ ), and analogously for the imaginary part.In the following, we will first recapitulate the maximum entropy theory for a real-valued single-orbital problem as presented in Ref. [11] and later generalize to matrix-valued problems.
In order to handle this problem numerically, the functions G(τ ) and A(ω) in Eq. ( 3) can be discretized to vectors G n = G(τ n ) and A m = A(ω m ); then, Eq. ( 3) can be formulated as where the matrix is the kernel of the transformation.Calculating G(τ ) from A(ω) is straightforward, but the inversion of the matrix equation ( 4), i.e., calculating A via A = K −1 G, is an ill-posed problem.To be more specific, the condition number of K is very large due to the exponential decay of K nm with ω m and τ n , so that the direct inversion of K is numerically not feasible by standard techniques.The task of the AC is to find an approximate spectral function A whose reconstructed Green's function G rec = KA reproduces the main features of the given data G, but does not follow the noise (note that here and in the following we use G and A for the numerical quantities to keep the notation simple).However, a bare minimization of the misfit χ 2 (A) = (KA−G) T C −1 (KA−G), with the covariance matrix C, leads to an uncontrollable error [9].
One efficient way to regularize this ill-posed problem is to add an entropic term S(A).This leads to the maximum entropy method (MEM), where one does not minimize χ 2 (A), but The prefactor of the entropy, usually denoted α, is a hyperparameter that is introduced ad hoc and needs to be specified.The way to choose α marks various flavors of the maximum entropy approach and will be discussed later (Sec.II B).This regularization with an entropy has been put on a rigorous probabilistic footing by Skilling in 1989, using Bayesian methods [33].He showed that the only consistent way to choose the entropy for a nonnegative function A(ω) is where D(ω) is the default model.The default model influences the result in two ways (see Appendix A for details): First, it defines the maximum of the prior distribution, which means that in the limit of large α one has A(ω) → D(ω).Second, it is also related to the width of the distribution, since the variance of the prior distribution is proportional to D(ω).Unless otherwise specified (see, especially, Sec.II E), we use a flat D(ω), corresponding to no prior knowledge.

B. Hyperparameter α
The simplest way to determine α is to choose it such that χ 2 equals the number of τ points [12,39], which is today known as the historic MEM.Usually, it tends to underfit the data [40].Other, more sophisticated ways are delivered by the probabilistic picture of Skilling and Gull [33,41], which are recapitulated in Appendix A. Two frequently used flavors are the classical MEM [41] and the Bryan MEM [42].A disadvantage is that these probabilistic methods tend to overfit the data as the probability is only evaluated approximately in practice (see Appendix A) [43,44].Furthermore, all methods presented so far strongly depend on the provided covariance matrix C. If the statistical error of Monte Carlo measurements, for example, is not estimated accurately, the data could be over-or underfitted.
A rather heuristic approach to overcome these problems is not to consider probabilities, but rather the quality of the reconstruction as a function of α.One way to quantify this is to detect the characteristic kink in the function log χ 2 (log α), which indicates the boundary between the noise-fitting and information-fitting regimes [22].In the noise-fitting region, log χ 2 (log α) is essentially constant, while in the information-fitting region, it behaves linearly.In this approach, the optimal α is at the crossover of these regimes, which can be detected, e.g., through the maximum of the second derivative ∂ 2 log χ 2 /∂(log α) 2 , as implemented the Ωmaxent code [22].
We propose another way, which is to fit a piecewise linear function to log χ 2 (log α), consisting of two straight lines: one for the noise-fitting region (with slope zero) and one for the information-fitting region.The intersection of the two lines, and hence the optimal α, is determined such that the overall fit residual is minimized.This way of determining the optimal α is used throughout the rest of this paper, as it turns out to be stable even in difficult cases where the curvature of log χ 2 (log α) shows multiple local maxima.

C. Positive-negative MEM
For the case of non-matrix-valued or diagonal spectral functions, the MEM described so far became a standard tool used in many different contexts.However, this or-dinary MEM is only rigorous for non-negative, additive functions [33].Nonzero off-diagonal elements of spectral functions clearly violate the non-negativity since their norm dωA ab (ω) = δ ab (8) is zero (this follows directly from the Lehmann representation of the spectral function and the anticommutation relations of fermionic operators).Keeping the additivity, one could imagine that the off-diagonal spectral functions originate from a subtraction of two artificial positive functions, i.e., A(ω) = A + (ω) − A − (ω).Assuming independence of A + (ω) and A − (ω), the resulting entropy is the sum of the respective entropies which was first used for the analysis of NMR spectra [34,35].To illustrate the plausibility of this entropy, we use the analogy of the horde of monkeys, which has a long tradition in the field of Bayesian methods.The conventional entropy can be explained by monkeys randomly throwing balls into slots which correspond to different frequencies ω i on a grid [33].Then, the number of balls in each slot, related to A(ω i ), obeys a Poisson distribution with a mean value given beforehand, related to the default model D(ω i ).From now on, the ω i dependence of A and D is dropped for simplicity.The subtraction of two positive functions A = A + − A − can be understood with two different hordes of monkeys, one throwing "positive" balls and one throwing "negative" balls.Individually, both the number of positive (∼ A + ) and negative (∼ A − ) balls again obey a Poisson distribution.The total number of balls in each slot, however, follows a Skellam distribution, which is the convolution of two Poisson distributions.The entropy S describing this process depends on both A + and A − [see Eq. ( 9)].Due to the independence of A + and A − , S(A + ) and S(A − ) follow the same functional form as the conventional entropy, Eq. ( 7), that stems from the Poisson distribution.Thus, The fact that two default models (D + and D − ) enter will be discussed later.Several configurations of positive and negative balls give the same net number of balls (∼ A), since only their difference A + − A − matters.Hence, an additional superfluous degree of freedom is present once two hordes are acting.Just as the two Poisson distributions of the respective balls lead to a Skellam distribution by integrating out the additional degree of freedom, a reduction of the parameter space from A + and A − to A = A + − A − leads to an entropy S ± (A) that differs from the conventional entropy in Eq. (7).The derivation of S ± (A) was first carried out in the context of cosmic microwave background radiation [36][37][38], an available software package providing this entropy is memsys5 [45].This framework is recapitulated here in the context of spectral functions.
The main objective of the MEM is to minimize Q α as given by Eq. ( 6), but the entropy S depends now on both A + and A − as shown in Eq. (10).The minimum of has to be found with respect to both A + and A − .The misfit χ 2 only depends on the difference A = A + − A − .For any fixed A, the minimum of Q α (A, A + ) is, therefore, realized for the particular choice of A + and A − that maximizes the entropy under the constraint that A new entropy for functions that can be both positive and negative is then obtained by S ± (A) = S (A + (A), A − (A)), which we call positive-negative entropy and which reads [38] The Bayesian probabilistic interpretation of this entropy is described in Appendix A. In the special case D − = 0, the limit of purely positive functions is recovered since then S ± (A) = S(A) [the latter being the conventional entropy from Eq. ( 7)].
Next, we want to compare the conventional entropy, Eq. ( 7), to the positive-negative entropy, Eq. ( 14).This comparison can be performed on the level of the integrand of the expression for the entropy, which we refer to as the entropy density s(A(ω)).At a particular frequency value ω i , the entropy density depends only on the function value of the spectral function A(ω i ) and the default model D(ω i ).We therefore plot the entropy density s depending on the function value A at any given ω i in Fig. 1.The ordinary entropy density s(A) (blue line) is just defined for positive A. Within this definition space, it is concave with a maximum at A = D.The variance of the prior distribution around this maximum is also proportional to D (see Appendix A).In the case of S ± , two default models D + and D − are needed, each determining the maximum of the respective spectral functions A + and A − as well as the accompanying variances.Usually, no additional knowledge about A is available and,  7) and of positive and negative spectral functions s ± (A) compatible to Eq. ( 14).The entropy is related to the entropy density via S(A) = dω s(A(ω)).Here, D + = D − = D was assumed.
therefore, one has to choose D + = D − = D for symmetry reasons (green line in Fig. 1).This is the case for the off-diagonal elements of the spectral functions studied in this work; however, the general case is discussed in Appendix A. For D + = D − = D, the maximum of the entropy is at A = 0, with a prior variance proportional to D. This demonstrates a fundamental difference of the role of the default model in the conventional and in the positive-negative entropy; in the former, the default model defines both the maximum and the variance of the entropy density, while for the latter it only punishes large values of |A|/D.This also means that for α → ∞ the minimization of Q α (A) gives A = 0 for the positive-negative entropy, in contrast to A = D for the conventional entropy.We note in passing that off-diagonal elements of the spectral function can, of course, be complex-valued.Then, the real and imaginary part of A(ω) and, correspondingly, G(τ ) can (in principle) be treated separately.The misfit χ 2 and the entropy S for a complex function can therefore be just summed up to a total χ 2 and S.This straightforward generalization of the method is applicable to this and the following sections, but for simplicity we limit ourselves to real-valued spectral functions.

D. Reduction of the parameter space
In Ref. [42], Bryan presents an algorithm that works in the space of the singular values of K, by means of the singular value decomposition (SVD) [46] When the problem is discretized with N τ points on the τ axis and N ω points on the ω axis, the kernel K is a N τ × N ω matrix (we use 684×200) which gets decomposed into the singular-vector matrices U of dimension N τ ×N Ξ and V of dimension N ω × N Ξ as well as the diagonal matrix Ξ of the singular values.In principle, the number N Ξ of singular values is given by min(N τ , N ω ); however, many singular values are on the order of machine precision.Therefore, in practice all singular values below a small threshold (10 −14 for the results in this paper) can be discarded.This also means that the full vector space of A, where A i = A(ω i ), is larger than necessary (for our calculations, N ω = 200, while N Ξ = 45).One important ingredient for each MEM is the optimization of Q α , Eq. ( 6), for a given value of α.In Bryan's framework, the stationarity condition ∂Q α /∂A = 0 for the conventional entropy reads suggesting that a simple way to parametrize A in the much smaller singular value basis is where u is the new parameter vector of the same dimension as the number of kept singular values.With this parametrization, condition (16) becomes .
The optimization of Q α has thus been reformulated to the problem of finding the vector u that solves Eq. ( 18), which does not explicitly depend on A anymore.This allows us to carry out the numerical solution in the (smaller) space of u instead of A. Ansatz (17) ensures the positivity of A, so for the positive-negative approach a different parametrization has to be found in order to use the advantages of the smaller singular space.By doing a similar derivation for S ± as the one presented above for the conventional entropy, one realizes that due to it is easier to express the equations in terms of A + rather than A. This is possible because of relation (12).Given Eq. ( 19), a suitable parametrization is given by With this, the condition ∂Q α /∂A = 0 becomes .
Note that this expression looks nearly identical to Eq. ( 18).The difference is that A is parametrized in terms of u by Eq. ( 17) in case of non-negative functions and by Eq. ( 22) in the generalized case; this enters Eq. ( 23) via χ 2 (A) on the right-hand side.
In principle, the actual search for a vector u that fulfills Eq. ( 18) or ( 23) can be performed using any suitable numerical procedure; in practice, the Levenberg-Marquardt algorithm is usually (and also in this work) employed [47].This program to minimize Q α can be implemented very similarly for both the standard non-negative and the here-presented positive-negative case, the only changes occur due to the generalization of Eq. ( 17) to Eq. ( 22).

E. Poor man's matrix procedure
As discussed before, Eq. ( 3) works independently for each element of the matrices G(τ ) and A(ω).This allows us to perform the AC separately for each matrix element, using the conventional entropy, Eq. ( 7), for diagonal elements and the modified entropy, Eq. ( 14), for the off-diagonals.
However, for physical systems, the resulting spectral function matrix has to be positive semidefinite and Hermitian, which is usually not the case when performing the AC separately for each matrix element with a flat default model.Using a flat default model reflects the total absence of previous knowledge on the problem.However, we know that a necessary condition for the positive semidefiniteness of the resulting spectral function matrix is For example, for a problem where all diagonal elements of the spectral function are zero at a certain frequency ω, condition (24) implies that also all off-diagonal elements have to be zero at this ω.Thus, once the diagonal elements have been analytically continued, this condition constitutes additional knowledge about the problem which might be incorporated into the MEM framework by choosing the default model for the off-diagonal elements D ll (ω) accordingly, Here, is a small number to prevent the default model from becoming zero, so that no division by zero occurs in the entropy term.We will show in Secs.III and IV that our special choice of the default model ( 25) drastically improves the results of the off-diagonal elements when they are calculated element-wise, although it does not guarantee a positive semidefinite solution.This poor man's matrix approach is especially useful if one wants to upgrade an existing MEM code by only modifying the entropy for off-diagonal elements, as setting the default model is usually a user input.

F. Full matrix formulation
The only way to ensure that the obtained spectral function is indeed positive semidefinite and Hermitian is by treating the matrix A ab as a whole.Instead of Eq. ( 6), the functional to minimize then reads Here, the ordinary entropy, Eq. ( 7), is used for the diagonal elements (a = b), and accordingly the modified entropy, Eq. ( 14), is used for off-diagonal elements.One way to ensure the desired properties of A ab is to introduce an auxiliary matrix B, where A ab = c B * ca B cb .In contrast to the parametrization of the uncoupled A ab described in Sec.II D, there is no obvious singular-space parametrization here, since A ab couples different elements of B. However, as the elements B ab can be positive and negative for both diagonal and off-diagonal elements, in the spirit of Sec.II D we choose Using the resulting parametrization of A ab in terms of the singular-space vectors u, the stationarity condition for Q α from Eq. ( 26) leads to equations which consequently have to be solved for u (for a more detailed discussion, see Appendix B).The fact that the expression for A ab now couples the singular-space parameters u of different matrix elements means that all matrix elements have to be treated at the same time.Consequently, the configuration space grows quadratically with the matrix size d.Concerning the computational cost, the fundamental difference between the poor man's and the full matrix approach is that in the former, one needs to d 2 times find a solution in a configuration space of size N Ξ , while in the latter, one searches a solution once in a configuration space of size N Ξ • d 2 .As typically solver algorithms take disproportionally longer for larger search spaces, this usually leads to a substantial increase of computational time.Nevertheless, the increased computational effort is justified, as it gives the possibility to ensure the desired properties of A, leading to a large improvement of quality.A flat default model is chosen for all matrix elements when the full matrix method is used in this paper.

G. Analytic continuation of the self-energy
One of the central quantities of many-body theory is the self-energy Σ.While some of its properties can be understood from Σ(iω n ), the analytically continued Σ(ω + i0 + ) allows a more straightforward interpretation and the calculation of further physical properties.
We will focus our discussion of the AC of the selfenergy on DMFT [26], where the self-energy is approximated to be k independent and connects the impurity to the lattice problem.For a given (in general, matrixvalued) Σ, the local (matrix-valued) lattice Green's function is The matrix H k is the k-dependent Hamiltonian of the lattice and the inversion has to be understood as matrix inversion.The so-called impurity Weiss field G 0 (z) is obtained from Dyson's equation This G 0 is the input for the impurity solver to calculate the self-energy and the interacting impurity Green's function G imp ; when inserting Σ back into Eq.( 28), the selfconsistency loop can be closed.The DMFT cycle is iterated until convergence is reached, i.e., until The self-energy as a function of real frequency is needed within the framework of DMFT to calculate lattice quantities, e.g., G loc (ω+i0 + ) as defined in Eq. ( 28), k-resolved spectral functions [48,49], Fermi surfaces [48] or optical properties of strongly correlated materials [50].
In contrast to Green's functions, there is no relation equivalent to Eq. ( 1) for self-energies and, hence, one needs to find an appropriate method to perform the AC.There are several ways to do so.One could analytically continue both the DMFT Weiss field G 0 (iω n ) and the interacting impurity Green's function G imp (iω n ) and calculate Σ(ω + i0 + ) via the Dyson equation on the realfrequency axis [51].However, there are two independent analytic continuations involved, and hence, the resulting real-frequency self-energy tends to oscillate heavily and does usually give poor results (see, e.g., Ref. [51]).Another approach is to solve for Σ(ω+i0 + ) in the expression for the (analytically continued) G loc (ω +i0 + ) [28,52,53].
In the following, we give two possible constructions of G aux (z).First, one can use where Σ(i∞) is the constant term of the high-frequency expansion of Σ(iω n ) [51].We note here that the resulting quantity G aux is, technically speaking, not a Green's function, since its off-diagonal elements do not have the correct analytic high-frequency behavior (they should fall off like ∼ 1/(ω n ) 2 , but in Eq. ( 30) they fall off like ∼ 1/ω n ).Second, there is the inversion method, e.g., used in Ref. [55], The constant C is usually set to C = Σ(i∞) + µ with the chemical potential µ.In this work, we choose to use the inversion method.

H. Implementation details
We implement a variation of Bryan's MEM algorithm [42] allowing arbitrary expressions for the entropy with the ability to treat the problem in the full matrix formulation.For the minimum search of Q α we use the Levenberg-Marquardt minimization algorithm [47].The expressions for the step length and the convergence criterion are chosen as in Ref. [42].The spectral function is parametrized in singular space as laid out in Secs.II D and II F and the value of the hyperparameter α is chosen using the piecewise linear fit of log χ 2 (log α) (see Sec. II B).In general, the frequency mesh on which A(ω) is discretized can be freely chosen.In this work, we use a hyperbolic grid, which asymptotically becomes a linear grid for high frequencies but is denser around ω = 0.This allows the use of a smaller overall number of ω points, which speeds up the calculation.However, when calculating the full Green's function from the spectral function according to Eq. ( 1), a small broadening (i0 + ) has to be used.As the hyperbolic grid with a small number of points is not dense enough (we use N ω = 200 points), this small broadening leads to artefacts, which can be avoided by first interpolating the spectral function on a much finer grid (we use a linear mesh with 10 000 points) and then using 2∆ω of the new grid as broadening.
For metallic systems, it is known that the MEM spectra tend to exhibit spurious cusps around the Fermi level [10].This can be prevented by using the preblur formalism [56], where the so-called "hidden spectral function" is blurred via a convolution with a Gaussian function.Within this algorithm, the hidden spectral function is used to calculate the entropy S, but the misfit χ 2 is evaluated from the blurred spectral function.Also, this blurred spectral function is what is taken in the end as the solution of the problem.
The width b of the Gaussian is another hyperparameter that can be chosen similarly to α, e.g., by searching the maximum of the probability p(α, b) or by locating the characteristic kink in log χ 2 (log α, log b).In accordance to the route employed for determining α, first we determine one value of α for every value of b using the fit method described at the end of Sec.II B. Then, we take the curves of log χ 2 at that value of α for the different values of b and fit once more to determine b.

III. TWO-BAND MODEL
As a benchmark system for the presented approach, we investigate an artificial particle-hole symmetric twoband model with semicircular density of states.We set the half-band width to D 1 = 2 for the first band and to D 2 = 1 for the second band.We choose the interaction term in a simple Hubbard-type form H int = i U i n i↑ n i↓ with U i /D i = 3.25.The chemical potential and the onsite energies are chosen such that both bands are halffilled.For the chosen interaction, the system is a Mott insulator with a spectrum consisting of two distinct Hubbard bands separated by an energy U i .We treat the problem with DMFT to obtain an interacting impurity Green's function G imp (τ ) and a self-energy Σ(iω n ) at an inverse temperature of βD 2 = 40.The simplicity of the problem allows the use of iterated perturbation theory (IPT) [26,[57][58][59][60] as impurity solver.The AC of the IPT results are performed using Padé approximants [7,9].Because of the noiseless nature of the IPT data, the Padé approximants give reliable results for this specific problem [26].Additionally we solve our two-band model using the continuous-time hybridization-expansion quantum Monte Carlo solver (triqs/cthyb) [61,62], which is based on the triqs package [63], and perform the AC with the MEM.We perform 8 × 10 6 CTHYB measurements.Of course, more measurements would undoubtedly be beneficial for the AC.Nevertheless, we limit ourselves here to emulate more complicated situations where higher-quality data can only be obtained with a substantial increase in computational effort.Although it would be possible to evaluate the covariance matrix of the Monte Carlo to take into account the correlations of the noise of G(τ ) at different values of τ , for simplicity, we estimate the Monte Carlo noise by manual inspection of the imaginary-time data (5 × 10 −4 in our case), and we assume a diagonal covariance matrix with a constant noise for these (and the following) tests.As we determine α by detecting the characteristic kink in log χ 2 (log α), the procedure is less sensitive to the given error than, e.g., the classic MEM (see Sec. II B).
In the following, we will compare the curves obtained by IPT and Padé with those from CTHYB and MEM as two approaches to tackle this problem.The former suffers from a systematic error as it is a perturbative technique, but yields results without statistical error.The latter, on the other hand, is exact in theory, but will always give noisy Green's functions and, thus, uncertainties after AC.In context of multiorbital DMFT calculations away from half-filling, in many cases quantum Monte Carlo impurity solvers are the only option, making it necessary to analytically continue noisy data.
Some tests benchmarking our implementation of the MEM algorithm and an investigation of the effect of random noise on the data can be found in Appendix C.
In order to model a system with off-diagonal Green's functions and self-energies, we perform a basis transformation for the G imp (τ ) and Σ(iω n ), which come out as diagonal matrices from the impurity solvers.In this work, we simply use a rotation matrix with an angle φ = 0.4 rad, which is representative for the results obtained for other angles.
Figure 2 shows the resulting spectral function for the AC of the rotated G imp (τ ).Using a flat default model, The result from IPT and Padé (black) is shown along with the MEM results.For the latter, we compare the continuation treating the matrix elements independently with a flat default model (dashed green) with the poor man's matrix method, i.e. using a default model incorporating the information from the diagonal elements (blue, only for the off-diagonal elements).Furthermore, the result of the matrix formulation (red) that ensures a positive semidefinite, Hermitian spectral function is shown.Bottom: The determinant of the matrix-valued spectral function A as a function of frequency.Wherever det A(ω) is negative, the matrix is not positive semidefinite.
the off-diagonal elements of the spectral function feature strong oscillations (dashed green line).This can be explained by the relaxation of the positivity constraint: In general, the AC tends to overfit around ω = 0 and to underfit for large ω, since the kernel is largest for small ω.For metallic spectral functions, these artefacts can be cured as explained in Sec.II H.For insulating spectral functions, the oscillations around A = 0 are suppressed in the diagonal components, because fluctuations to negative values are not possible due to their positivity.In the off-diagonal elements, however, these fluctuations appear.Additionally, for high frequencies, the solution for the off-diagonal elements with flat default model does not tend to zero as in the IPT and Padé solution, but overshoots and goes to negative values.The violation of the particle-hole symmetry is due to the stochastic nature of the Monte Carlo data (it is thus a measure for the quality of the QMC result) and not directly a fault of the MEM (see Appendix C).We refrained from symmetrizing the resulting G(τ ), because in most models one does not have this possibility.
Building the information from the diagonal elements into the default model of the off-diagonal elements (poor man's method), as outlined in Sec.II E, improves these issues (blue line in Fig. 2).It suppresses oscillations of the off-diagonal elements where the diagonal elements are small, stabilizes a smooth solution, and improves the high-frequency behavior.Nevertheless, care has to be taken as this does not mean that the solution is positive semidefinite, and indeed, at some frequencies that general property of the spectral function is violated (see the plot of det A at the bottom of Fig. 2).Therefore, we apply the full matrix formulation (Sec.II F) to the problem (red lines in Fig. 2).This solves the issues one faces when performing the AC separately for the individual matrix elements.The spurious oscillations of the off-diagonal elements of A(ω) are efficiently suppressed even for a flat default model and the spectral function matrix is positive semidefinite everywhere.
As a next step, we benchmark the AC of Σ, using the inversion method to construct an auxiliary Green's function (see Sec. II G) for the off-diagonal model; the obtained Σ(ω + i0 + ) is shown in Fig. 3.The separate AC of the individual matrix elements using a flat default model leads to a heavily oscillating self-energy, which is why it is not shown here.But even performing the poor man's matrix method (blue line in Fig. 3) leads to unphysical results.Especially in the regions where the auxiliary spectral function A aux is not positive semidefinite (shaded in gray in Fig. 3), these problems are evident: there are heavy oscillations when the curve overshoots whenever the derivative changes quickly, and for some frequencies even the diagonal elements of the imaginary part of the self-energy become positive.This shows that the poor man's method is not adequate for determining a matrix-valued Σ(ω + i0 + ).The full matrix formulation (red line in Fig. 3), however, yields physical solutions just as IPT and Padé; these two solutions are consistent with each other within the limits of the method.Again, the slight deviation from the particle-hole-symmetrized result (dashed purple line) is due to a stochastic violation of that symmetry in the G(τ ) data.This also manifests itself as a spurious peak close to ω = 0 in the off-diagonal element of Σ(ω + i0 + ), which can be traced back to a slight mismatch of the position of the poles of Im Σ(ω + i0 + ) (that should be at ω = 0) between different matrix elements.In general models, the particle-hole symmetry is not present and cannot there-  + ) for the rotated model obtained with the inversion method.Each subplot represents one matrix element, the two off-diagonal elements Σ01 and Σ10 being the same.The result from IPT and Padé (black) is compared to the curves obtained with CTHYB and the MEM.The poor man's matrix method (blue) is presented alongside the full matrix method (red).For the latter, also the result where the auxiliary spectral function, Aaux, has been particle-hole symmetrized (phs) is shown (dashed purple line).Bottom: The determinant of the matrix-valued auxiliary spectral function Aaux as a function of frequency.Wherever det Aaux(ω) is negative, the matrix is not positive semidefinite.This happens only for the poor man's matrix method.The regions where the corresponding Aaux is not positive semidefinite are marked by the gray areas.
fore be exploited to improve the result.The real part of Σ(ω + i0 + ), which is related to the imaginary part by the Kramers-Kronig relation, also gives plausible results and the two different methods agree very well (not shown).
Once Σ(ω + i0 + ) has been obtained, other lattice quantities are accessible.However, we do not further discuss this here, but refer the reader to the example in the next section (Sec.IV), where we also calculate the local lattice Green's function G loc from Σ(ω + i0 + ).So far, only an insulating solution has been investigated.However, we also put the method to a test in the metallic regime of the model (U i /D i = 1.5), shown in Fig. 4. As discussed in Sec.II H, it is necessary to use preblurring to avoid cusps around ω = 0.This is also observed in the generalization of the method to off-diagonal elements.Not only is the preblurred spectral function smoother around ω = 0, but also more details at higher frequencies can be resolved, which can be best seen in the off-diagonal element.In general, in that regime, the results from CTHYB and MEM (employing the preblur technique) and IPT and Padé agree very nicely, similar to what is found for the insulating case.

IV. APPLICATION: LaTiO3
Finally, we apply the matrix formulations presented above in Secs.II E and II F to LaTiO 3 , for which we perform a one-shot DFT+DMFT calculation.The transition metal oxide LaTiO 3 has a perovskite crystal structure with tilted oxygen octahedra and distorted lanthanum cages.Because of these structural distortions, the material features an off-diagonal hybridization, and thus also an off-diagonal impurity Green's function G imp (τ ).LaTiO 3 was already extensively analyzed in FIG. 5. Top: Comparison of real (red) and imaginary parts (blue) of the impurity Green's function Gimp(ω + i0 + ) (solid lines) and the local lattice Green's function G loc (ω + i0 + ) (dashed lines).The former is obtained by a direct AC, whereas the latter is calculated via Eq.( 28) after the AC of the self-energy.In both cases, the matrix formulation of the MEM code was used.The subplots represent different matrix elements of the Green's function.Bottom: Total spectral function (i.e., trace over the orbital and spin degrees of freedom) of the Ti-t2g bands from the Green's functions shown above (Gimp, black, and G loc , red).For G loc , performing the AC of Σ using the poor man's matrix method is shown as well (blue).Additionally, we show the spectral function of a local Green's function where we have set the off-diagonal elements of the self-energy Σ(iωn) to zero before individually continuing its diagonal elements and evaluating G loc from the obtained Σ(ω + i0 + ) (dashed green).
literature [64][65][66], where also the nature of the Mott insulating state was traced back to the tilting and rotation of the oxygen octahedra and the accompanying lifting of the t 2g degeneracy.
Here, we do not further elaborate on the physics, but rather use LaTiO 3 as a benchmark material to prove the following points: First, we emphasize that the analytic continuation of off-diagonal elements is a problem often encountered in real-materials calculations.Second, the calculations presented here show that the full matrix formalism is feasible for 3 × 3 matrices.Third, we show that the continuation of the self-energy leads to a local Green's function G loc (ω + i0 + ) which is comparable to the continuation of G imp .
Our calculations were carried out with wien2k [67] and the triqs/dfttools package [63,[68][69][70].For the DFT part, we use the crystal structure from Ref. [71], 40 000 k points in the full Brillouin zone and employ the standard Perdew-Burke-Ernzerhof (PBE) [72] generalized gradient approximation (GGA) for the exchangecorrelation functional.From the DFT Bloch states we construct projective Wannier functions for the t 2g subspace of the Ti-3d states in an energy window from −1.0 to 1.2 eV around the Fermi level.In DMFT, we use the Kanamori Hamiltonian with a Coloumb interaction U = 4.5 eV and a Hund's coupling J = 0.65 eV similar to the values used in Ref. [27,66].We solve the impurity model on the imaginary axis with the triqs/cthyb solver [61] at an inverse temperature β = 40 eV −1 and use a total number of 3.2 × 10 7 measurements.We choose the solver basis such that the density matrix is diagonal.In the case of LaTiO 3 , this basis has the advantage that all matrix elements of G imp (τ ) are real if the phases are chosen accordingly.
Having obtained G imp (τ ) from the DFT+DMFT calculation, the AC is again performed in two ways: First, with the full matrix formalism for the full Green's function matrix (Sec.II F) and second, by a separate continuation of the individual elements with the poor man's matrix method introduced in Sec.II E. Furthermore, we analytically continue Σ(iω n ) by means of the inversion method (see Sec. II G).We calculate the local Green's function G loc (ω + i0 + ) with Eq. ( 28) and compare it to the direct continuation of the impurity Green's function G imp (ω + i0 + ) in the top graph of Fig. 5.
Within DMFT, the self-consistency condition requires G loc = G imp , which is well fulfilled on the Matsubara axis.Nevertheless, the agreement on the real axis shown in Fig. 5 is remarkably good for both the diagonal and the off-diagonal elements, especially when considering the different magnitudes of the individual matrix elements and the fact that the continuation is performed for different Green's functions, i.e., G imp (iω n ) and G aux (iω n ).This underlines the capabilities of the presented full matrix method.
Here, we only show the Green's functions obtained with the full matrix formalism; however, it should be emphasized that for LaTiO 3 also the poor man's method gives very similar results (see the corresponding spectral function in the bottom graph of Fig. 5).Therefore, here the element-wise continuation with the poor man's method constitutes an efficient alternative to the full matrix method.Figure 5 does not only prove the concept of the AC for the full Green's function, but also shows that the AC of the self-energy via the construction of an auxiliary Green's function is a feasible approach.In contrast, calculating the spectral function from G loc (ω + i0 + ), where we set the off-diagonal elements of the self-energy Σ(iω n ) to zero (and thus analytically continue only the diagonal elements), does lead to a completely wrong, even metallic, spectral function (see dashed green line in bottom plot of Fig. 5).This clearly shows that the off-diagonal elements must not be neglected at this point of the calculation.
In terms of the gap as well as the overall shape and size of the Hubbard bands, the presented spectra for the Ti-t 2g subspace are in good agreement with calculations available in the literature [27,64,66].

V. CONCLUSION
In this work, we show how a consistent framework for the analytic continuation of matrix-valued Green's functions can be constructed on a probabilistic footing.In order to enable a treatment of the off-diagonal elements, we use an entropy that allows to relax the non-negativity constraint one has to obey in the usual maximum entropy method.With this generalization, diagonal and off-diagonal elements can, in principle, be treated on a similar footing.
The practical use of this method is studied on two examples, an artificial two-band model and a realistic DFT+DMFT calculation for the insulating compound LaTiO 3 .First, we propose the poor man's matrix method, where the matrix elements are treated separately.With this scheme, we find satisfactory results for some cases (e.g., for LaTiO 3 ), but also see completely unphysical results in our calculations for the two-band model, since positive semidefiniteness and Hermiticity of the spectral functions cannot be guaranteed.
Only the AC in full matrix formulation cures these problems and produces spectral functions with the correct mathematical properties, such as positive semidefiniteness and Hermiticity.Although being computationally more expensive, it should be employed whenever feasible.
Moreover, these methods for AC introduced here give access to the matrix-valued self-energy on the realfrequency axis, which is indispensable for the study of lattice quantities.
this approximation [41], whereas Bryan suggested to calculate the weighted average Ā = dαP (α|G, D)A α [42].In most cases, P (α|G, D) is sharply peaked, and thus, the classical and the Bryan MEM give very similar results.

Positive-negative entropy
Since A + and A − are assumed to be independent, the entropy is S(A + , A − ) = S(A + ) + S(A − ) and one has to marginalize over both A + and A − in order to find the probability of α: As in the derivation of S ± , one can use the fact that χ 2 depends only on the difference A + − A − , so that the remaining degree of freedom can be integrated out.
Transforming to A = A + − A − and some auxiliary A = A + + A − , the integral over A is easily evaluated using a second-order expansion of the exponent like in the classical MEM, yielding (A5) The same form is also obtained in Ref. [38], using a different approach to prove it, with the Skellam distribution of the two monkey hordes as a starting point.Note that either way Eq.(A5) is an approximation.This expression looks similar to the case of positive spectral functions (A2), but with a different entropy given by Eq. ( 14) instead of the ordinary entropy (7) and with the measure . Interestingly, the metric is given by the second derivative of the entropy g ij = −∂ 2 S/∂A i ∂A j , just as in the ordinary case of non-negative spectral functions.Expanding the exponent of integral (A5) again to second order, the probability has exactly the same form as in the strictly positive case (A3), but with a different Q α = 1 2 χ 2 − αS ± due to the different entropy and with a different matrix due to the different metric.The prior distribution e αS ± (A) is maximized by Expanding the entropy up to second order around this maximum gives a variance of α(D + i + D − i ).Thus, in the usual case D + i = D − i = D i , the default model only influences the solution via the width of the distribution, not via the position of the maximum, since this is always at A i = 0.  [23] in blue, Ωmaxent [22] in dashed green, and our code in red) were used; a flat default model was employed for all three cases.et al. [23] and the Ωmaxent code [22].The resulting spectral function for the first band is shown in Fig. 6.Within the errors of the method (as discussed, e.g., in Ref. [77]), the three MEM curves are in good agreement.For the second band, the quality of the AC is similar (not shown here).The fact that the MEM solution and the Padé solution do only qualitatively agree is not surprising, as even a small statistical noise of the Monte Carlo data, in contrast to the noiseless IPT solution, notably increases the uncertainty of the AC [77].
In order to assess this influence of statistical noise on the AC, we take the spectral function A(ω) obtained from IPT and Padé as starting point; then, we calculate the corresponding imaginary-time Green's function G(τ ) by multiplying with the kernel, as in Eq. (4).In analogy to Sec.III we rotate the G(τ ) so that it features off-diagonal elements (the rotated IPT and Padé curve is the black curve in Fig. 7).From that G(τ ) (a very small Gaussian random noise with standard deviation 10 −8 has to be added for the MEM to work) an A(ω) can be obtained once more using AC, which we perform using our full matrix MEM (blue curve in Fig. 7).As can be clearly seen, for all matrix elements the original curve is well reproduced by the MEM, which is further evidence that our implementation works.However, the curves are smoother than the original data for larger |ω|; this is a well-known tendency of all MEM as the entropy term favors the smoothness of the default model and G(τ ) generally represents the spectral features worse for higher |ω|.
We now emulate QMC data by adding bigger random Gaussian noise (with standard deviation 5 × 10 −4 ) to the G(τ ) from the IPT and Padé A(ω).
The MEM-analytically continued curve from that noisy G(τ ) (dashed green curve in Fig. 7) differs considerably both from the input A(ω) and from the MEM curve with hardly any noise.One can clearly see how much informa- Starting from that solution, a G(τ ) was calculated using Eq. ( 4); that G(τ ) was then rotated in accordance to Sec.III.The rotated IPT and Padé result is shown as the black curve.Gaussian random noise with a standard deviation of σ was added to that G(τ ), which was then analytically continued using the full matrix MEM (blue curve with σ = 10 −8 , dashed green curve for σ = 5 • 10 −4 ).This is compared to the result from analytically continuing the G(τ ) obtained by solving the same model with CTHYB (red).
tion is lost already by noise only of the order of 5 × 10 −4 .The detailed structure of the Hubbard bands cannot be resolved and it is replaced by just one broad peak for each band (in the diagonal elements).In the A 00 element, a small shoulder for low |ω| is observed, reminiscent of a similar feature in the original data.
Finally, we want to compare the results from our artificially noisy Green's function to that of our CTHYB calculation for the same model.The spectral functions (dashed green and red curves in Fig. 7) look very similar, especially the off-diagonal element.As noted before, the CTHYB solution breaks particle-hole symmetry, which can be nicely seen in the different peak heights for positive and negative ω in the diagonal elements (of course, the particular way how this breaking happens will be different from QMC run to QMC run due to the stochastic nature of the method).This can already be seen on the level of G(τ ) (not shown).Thus, we conclude that IPT and Padé gives results compatible to CTHYB and MEM, for this very specific model at hand.

FIG. 1 .
FIG.1.Comparison of the "entropy density" of non-negative spectral functions s(A) compatible to Eq. (7) and of positive and negative spectral functions s ± (A) compatible to Eq. (14).The entropy is related to the entropy density via S(A) = dω s(A(ω)).Here, D + = D − = D was assumed.

FIG. 2 .
FIG.2.Top: Spectral function of the rotated model in the insulating regime.Each subplot represents one matrix element, the two off-diagonal elements A01 and A10 being the same.The result from IPT and Padé (black) is shown along with the MEM results.For the latter, we compare the continuation treating the matrix elements independently with a flat default model (dashed green) with the poor man's matrix method, i.e. using a default model incorporating the information from the diagonal elements (blue, only for the off-diagonal elements).Furthermore, the result of the matrix formulation (red) that ensures a positive semidefinite, Hermitian spectral function is shown.Bottom: The determinant of the matrix-valued spectral function A as a function of frequency.Wherever det A(ω) is negative, the matrix is not positive semidefinite.

FIG. 3 .
FIG.3.Top: Imaginary part of the self-energy Σ(ω+i0 + ) for the rotated model obtained with the inversion method.Each subplot represents one matrix element, the two off-diagonal elements Σ01 and Σ10 being the same.The result from IPT and Padé (black) is compared to the curves obtained with CTHYB and the MEM.The poor man's matrix method (blue) is presented alongside the full matrix method (red).For the latter, also the result where the auxiliary spectral function, Aaux, has been particle-hole symmetrized (phs) is shown (dashed purple line).Bottom: The determinant of the matrix-valued auxiliary spectral function Aaux as a function of frequency.Wherever det Aaux(ω) is negative, the matrix is not positive semidefinite.This happens only for the poor man's matrix method.The regions where the corresponding Aaux is not positive semidefinite are marked by the gray areas.

FIG. 4 .
FIG.4.Spectral function of the rotated model in the metallic regime.Each subplot represents one matrix element, the two off-diagonal elements A01 and A10 being the same.The result from IPT and Padé (black) is shown along with the full matrix MEM result.For the latter, we show the result with (cyan) and without (red) using the preblur method.
imp , full matrix G loc , poor man's matrix G loc , full matrix G loc , no offdiag.

FIG. 6 .
FIG.6.Spectral function A(ω) of the first band of the diagonal two-band model with Ui/Di = 3.25, calculated using Padé approximants for the IPT solution (black) and the MEM for the CTHYB solution.Different MEM codes (Bryan solution from Levy et al.[23] in blue, Ωmaxent[22] in dashed green, and our code in red) were used; a flat default model was employed for all three cases.

FIG. 7 .
FIG. 7. Spectral function A(ω) of the two-band model withUi/Di = 3.25, calculated using Padé approximants for the IPT solution.Starting from that solution, a G(τ ) was calculated using Eq.(4); that G(τ ) was then rotated in accordance to Sec.III.The rotated IPT and Padé result is shown as the black curve.Gaussian random noise with a standard deviation of σ was added to that G(τ ), which was then analytically continued using the full matrix MEM (blue curve with σ = 10 −8 , dashed green curve for σ = 5 • 10 −4 ).This is compared to the result from analytically continuing the G(τ ) obtained by solving the same model with CTHYB (red).