Decay estimates for time-fractional and other non-local in time subdiffusion equations in Rd\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {R}^d$$\end{document}

We prove optimal estimates for the decay in time of solutions to a rather general class of non-local in time subdiffusion equations in Rd\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {R}^d$$\end{document}. An important special case is the time-fractional diffusion equation, which has seen much interest during the last years, mostly due to its applications in the modeling of anomalous diffusion processes. We follow three different approaches and techniques to study this particular case: (A) estimates based on the fundamental solution and Young’s inequality, (B) Fourier multiplier methods, and (C) the energy method. It turns out that the decay behaviour is markedly different from the heat equation case, in particular there occurs a critical dimension phenomenon. The general subdiffusion case is treated by method (B) and relies on a careful estimation of the underlying relaxation function. Several examples of kernels, including the ultraslow diffusion case, illustrate our results.

Fourier multiplier methods, and (C) the energy method. It turns out that the decay behaviour is markedly different from the heat equation case, in particular there occurs a critical dimension phenomenon. The general subdiffusion case is treated by method (B) and relies on a careful estimation of the underlying relaxation function. Several examples of kernels, including the ultraslow diffusion case, illustrate our results.

Introduction and main results
The main purpose of this paper is to study the temporal decay of solutions to non-local in time diffusion equations whose prototype is given by together with initial condition Here u 0 (x) is a given datum. k * v denotes the convolution on the positive halfline Note that for sufficiently smooth u satisfying (2), The kernel k belongs to a wide class of kernels, it is merely assumed to satisfy the condition (PC) k ∈ L 1, loc (R + ) is nonnegative and nonincreasing, and there exists a kernel l ∈ L 1, loc (R + ) such that k * l = 1 in (0, ∞).
In this case we also write (k, l) ∈ PC. Note that (k, l) ∈ PC implies that l is completely positive, cf. [8,Theorem 2.2], in particular l is nonnegative.
Condition (PC) covers most of the relevant integro-differential operators w.r.t. time that appear in physics applications in the context of subdiffusion processes. An important example is given by (k, l) = (g 1−α , g α ) with α ∈ (0, 1), where g β denotes the standard kernel In this case, the term ∂ t (k * v) becomes the Riemann-Liouville fractional derivative ∂ α t v, and k * ∂ t v = c D α t v, the Caputo fractional derivative (cf. the right-hand side in (3)), of order α (cf. [19]) and (1) is called time-fractional diffusion equation. Condition (PC) also contains the multi-term fractional diffusion case, see Example 4.2 below. Another interesting and important class of examples is given by where ω ∈ C([0, 1]) is a nonnegative weight function that does not vanish everywhere. In this situation the operator ∂ t (k * ·) is a so-called operator of distributed order, and (1) is an example of a so-called ultraslow diffusion equation if ω(0) = 0 ( [20]). The special case ω ≡ 1 is discussed in Example 4.3 below.
Besides (1), we further investigate the time-fractional diffusion equation with the Laplacian being replaced by a more general elliptic operator in divergence form with rough coefficients, that is, we study with a merely measurable, uniformly elliptic coefficient matrix A. Thus, our results on weak solutions to (4), (5) do also apply to certain quasilinear equations, take e.g.
A of the form A(t, x) = A 0 (t, x, u(t, x)), with some appropriate nonlinear function A 0 .
Applications. Problems of the form (1), in particular the time-fractional diffusion equation, have attracted much interest during the last years, mostly due to their applications in the modeling of anomalous diffusion, see e.g. [20,22,25,37] and the references therein for the physical background. To provide some more specific motivation, let Z (t, x) denote the fundamental solution of (1) satisfying Z | t=0 = δ 0 . If k is a kernel of type (PC), then this fundamental solution can be constructed via subordination from the heat kernel and one can show that Z (t, ·) is a probability density function (pdf) on R d for all t > 0, see the proof of [22,Theorem 3] and Sect. 2 below. Given a pdf u 0 on R d satisfying some appropriate conditions, the solution u(t, ·) of the initial-value problem (1), (2) is given by the formula (8) below, and thus is a pdf on R d for all t > 0. So in this case, the problem (1), (2) describes the evolution of a pdf on R d . An important quantity that measures the dispersion of random processes and that describes how fast particles diffuse is the mean square displacement. It can be determined in experiments and is defined (in our situation) as In the case of the classical diffusion equation (i.e. α = 1) m(t) = ct, t > 0, with some constant c > 0. In the time-fractional diffusion case (i.e. the first example) one observes that m(t) = ct α (cf. [25]), which shows that the diffusion is slower than in the classical case of Brownian motion. During the recent decades, experimental studies have shown that there is an abundance of processes that have such a power-law mean square displacement, see [3,25,26,37] and the references given therein. An important application is the diffusion on fractals like e.g. some amorphous semiconductors [25,37]. In our third example, the mean square displacement m(t) behaves like c log t for t → ∞ provided that ω(0) = 0, see [20]. In this case (1) describes a so-called ultraslow diffusion process. Such processes have been extensively studied recently. They appear, for example, in polymer physics [32], diffusion in disordered media (Sinai diffusion) [35], and in diffusion generated by iterated maps [9]. We would like to point out that in our setting, that is, with a pair of kernels (k, l) ∈ PC the mean square displacement is given by see Lemma 2.1.
Another important context where equations of the form (1) and nonlinear variants of them appear is the modeling of dynamic processes in materials with memory. They typically arise by some constitutive laws of temporal convolution form (describing the memory of the material) when combined with the usual conservation laws such as balance of energy or balance of momentum. Examples are given by the theory of heat conduction with memory, see e.g. [29] and the references therein as well as [40] (which also contains a quasilinear model), and the diffusion of fluids in porous media with memory, cf. [5,17].
In view of condition (PC) the problem (1), (2) can be reformulated as an abstract Volterra equation on the positive halfline with a completely positive kernel; this can be seen by convolving the PDE with the kernel l. There has been a substantial amount of work on such abstract Volterra and integro-differential equations since the 1970s, in particular on existence and uniqueness, regularity, and long-time behaviour of solutions, see, for instance, [7,8,14,43], and the monograph [29].
One of the main objectives of this paper is to prove sharp estimates for the temporal decay of solutions to (1), (2). We point out that for non-local in space diffusion equations, in particular space-fractional diffusion equations, corresponding results have been obtained recently, see e.g. [2,4,6,16,38]. Concerning non-local in time diffusion, the case of a bounded domain with homogeneous Dirichlet boundary condition and a kernel of type PC has been studied recently in [40]. The decay estimates obtained in [40] are optimal and even true in the case of a uniformly elliptic operator in divergence form with rough coefficients, cf. the remarks following formula (15) below. For the time-fractional case with Laplacian we also refer to [24] and [27]. Moreover, decay estimates for the time-fractional diffusion equation in R d have also gathered interest in the engineering community [23]. However, due to the unexpected critical dimension phenomenon (to be discussed below) some of the subtleties of the equation have been overlooked there.
Our main observation is that the decay behaviour of the non-local in time diffusion models is markedly different from that of the standard caloric functions. We encounter a critical dimension phenomenon saying that the decay rate of the bounded domain case is attained for a finite critical dimension and does not anymore improve with increasing dimension. We also prove a series of other decay results, the main results including the following: 2. optimal L p -decay rates via Fourier multiplier methods for the general subdiffusion equation (1) with a kernel k of type PC, 3. an L 2 -decay estimate via energy methods for weak solutions of the time-fractional diffusion equation (4) with rough coefficient matrix A.
We will now describe our main results in more detail. Under appropriate conditions on u 0 the solution to (1), (2) is given by It turns out that the situation is significantly different from that in the case of the heat equation, where Time-fractional diffusion. Let us first consider for simplicity the time-fractional diffusion equation (i.e. k = g 1−α , α ∈ (0, 1)) and the case r = 2. Given u 0 ∈ L 2 (R d ) we do not have in general any decay neither for and this estimate is the best one can obtain in general (see e.g. [1]). Here |v| 2 := |v| L 2 (R d ) , and v(t) w(t), t > 0 means that there exists a constant C > 0 such that v(t) ≤ Cw(t), t > 0. In the case of time-fractional diffusion u(t, ·) := Z (t, ·) u 0 exhibits the decay behaviour see Corollary 3.2 below. Whereas for the heat equation the decay rate increases with the dimension d, time-fractional diffusion leads to the phenomenon of a critical dimension, which is d = 4 in this case. Below the critical dimension the rate increases with d, the exponent being α times the one from the heat equation, while above the critical dimension the decay rate is the same for all d, namely t −α . The reason why the decay rate does not increase any further with d lies in the fact that t −α (up to a constant) coincides with the decay rate in the case of a bounded domain and homogeneous Dirichlet boundary condition, cf. the remarks on problem (13) below. This reveals another interesting phenomenon: in the time-fractional case the diffusion is so slow that in higher dimensions (d above the critical dimension) restriction to a bounded domain and the requirement of a homogeneous Dirichlet boundary condition do not improve the rate of decay. This phenomenon cannot be observed in the classical diffusion case, where we always have exponential (and thus a better) decay in the case of a bounded domain. We point out that the estimate (9) is the best one can get in general, see Theorem 4.1 and Example 4.1 below. In the case of the critical dimension d = 4 we obtain the same decay rate as for d > 4, however we have to replace the L 2 -norm by the weak L 2 -norm, that is we find that In the more general case where r ∈ (1, ∞) and u 0 ∈ L 1 (R d ) ∩ L r (R d ), the critical dimension (which is in general not an integer) is given by as well as |u(t, ·)| r t −α , whenever d > d crit . Moreover, in the case d < 3 we always have a subcritical decay behaviour, see Corollary 3.1.
The critical dimension phenomenon can be well understood by looking at the asymptotic properties of the fundamental solution Z . In the time-fractional case, Z can be represented by Fox-functions and corresponding estimates are known (cf. [11] and Proposition 3.1 below). They show that for d ≥ 2 the fundamental solution Z (t, x) does not only have a singularity at t = 0 but also at the origin x = 0, which is a striking difference to the heat kernel. If, for example, d ≥ 3 and t −α |x| 2 ≤ 1, we have the sharp estimate Z (t, x) ≤ Ct −α |x| −d+2 , thus Z is (up to a constant) bounded above by g 1−α (t) times the Newtonian potential w.r.t.
x. This provides some interesting insight into how the fundamental solutions from the Poisson and the heat equation interpolate in case of fractional dynamics.
Relying on the asymptotic bounds for (the time-fractional) Z , in Sect. 3 we first derive estimates for |Z (t, ·)| p (for suitable p ∈ [1, ∞]), which by Young's inequality then yield bounds for |Z (t, ·) u 0 | r . We further look at gradient estimates for Z (t, ·) and Z (t, ·) u 0 , and we show that for integrable initial data u 0 the asymptotic behaviour of Z (t, ·) u 0 as t → ∞ is described by a multiple of Z (t, x). This is the time-fractional analogue of a well-known result for the heat equation, cf. [30,Proposition 48.6].
The general subdiffusion case and Fourier multiplier methods. Turning to the more general subdiffusion equation (1) with k being a kernel of type PC, we first note that asymptotic bounds for Z (t, x) like those in the time-fractional case do not seem to be known in the general case. However, we mention [20] where the ultraslowdiffusion case was studied. To derive decay estimates for Z (t, ·) u 0 we develop a theory which is based on tools from harmonic analysis and a careful estimation of the Fourier symbolZ (t, ξ) of Z w.r.t. the spatial variable.
Taking the Fourier transform w.r.t. x (defined as in (17) below) we see thatZ (t, ξ) solves the problem Here the so-called relaxation function s(t, μ) := s μ (t), t ≥ 0, is defined for the parameter μ ≥ 0 as the solution of the Volterra integral equation Note that s 0 ≡ 1 and that (10) is equivalent to the integro-differential equation It is known that the assumption (k, l) ∈ PC implies that s μ is nonnegative, nonincreasing, and that s μ ∈ H 1 1, loc (R + ); moreover ∂ μ s μ (t) ≤ 0, see e.g. Prüss [29]. Furthermore, it is known (see e.g. [40]) that for any μ ≥ 0 there holds which also shows that This implies that for any fixed μ > 0, s μ (t) cannot decay faster than the kernel k(t), and s μ (t) decays at least like (1 * l)(t) −1 . Note that lim t→∞ s μ (t) = 0 if and only if l / ∈ L 1 (R + ), see e.g. [40, Lemma 6.1]. By means of (11) and Plancherel's theorem we are able to prove the following see Theorem 4.2. We also show that this estimate gives the optimal rate of decay In order to obtain sharp L r -decay estimates and to treat the critical dimension case we go a step further and derive bounds for the partial derivatives of arbitrary order of the functions ξ → |ξ | 2δZ (t, ξ), where t > 0 is fixed and δ ∈ (0, 1] is a parameter. These bounds are uniform w.r.t. time and allow us to apply Mihlin's multiplier theorem. This leads to the desired L r -(and by interpolation also to the L r,∞ -) decay estimates, which generalize those from the time-fractional case. Here in our analysis we make use of another important property of the relaxation function s(t, μ) which is the complete monotonicity w.r.t. to the parameter μ for all t ≥ 0, see Sect. 5 below. This property has already been known, e.g. it follows from results in [29,Section 4]. In the present paper we provide a new proof of this property, which is rather short and relies on a comparison principle for a certain type of integro-differential equation.
By switching the kernels from the ultraslow diffusion case one obtains an interesting example of a pair (k, l) ∈ PC where k(t) decays like t −1 and (1 * l)(t) −1 like ct −1 log t.
Here the upper bound in (11) does not lead to the optimal decay. However, it has been shown in [40] that s μ (t) ≤ C 1+μt for all t, μ ≥ 0, and thus a simple modification of our original proof yields the optimal L 2 -estimate This is a remarkable result as for d ≤ 3 the decay rate is the same as for the heat equation!

Known results in the bounded domain case. It is instructive to compare our decay results with what is known in the case of a bounded domain and a homogeneous Dirichlet condition. Let
⊂ R d be a bounded domain, u 0 ∈ L 2 ( ), and assume again that k is of type PC. We consider the problem Let {φ n } ∞ n=1 ⊂°H 1 2 ( ) := C ∞ 0 ( ) H 1 2 ( ) be an orthonormal basis of L 2 ( ) consisting of eigenfunctions of the negative Dirichlet Laplacian with eigenvalues λ n > 0, n ∈ N, and denote by λ 1 the smallest such eigenvalue. Then the solution u of (13) can be represented via Fourier series as where (·|·) stands for the standard inner product in L 2 ( ), cf. [40, Section 1], the special case k = g 1−α can be also found in [27,Theorem 4.1]. By Parseval's identity and since ∂ μ s μ ≤ 0, it follows from (14) that and thus cf. [40]. This decay estimate is optimal as the example shows. By means of energy methods (15) can be generalized to problems with a uniformly elliptic operator in divergence form with rough coefficients, cf.

Energy methods for weak solutions.
A further goal of this paper is to prove decay estimates for the time-fractional diffusion equation (4), (5), where instead of the Laplacian a more general elliptic operator in divergence form with rough coefficients is considered. We assume α ∈ (0, 1), and that A satisfies a uniform parabolicity condition. We prove that if u is a suitably defined weak solution of (4), (5) satisfying appropriate integrability conditions at see Theorem 6.1. The basic idea of the proof is to show that for some constant μ > 0 in the weak sense. This can be achieved by means of Nash's inequality and the so-called L p -norm inequality for operators of the form ∂ t (k * ·), which has been established recently in [40], cf. Lemma 6.2 below. Note that the decay rate in (16) is less than the one we find in the case of the Laplacian (cf. (9)). This phenomenon of a smaller decay rate in the weak setting with rough coefficients does not occur in the case α = 1, where the same strategy of proof leads to the optimal decay rate t −d/4 . It is an interesting open problem whether (16) provides the optimal decay rate in the variational setting.
The paper is organized as follows. In Sect. 2 we provide some background on the fundamental solution Z , including their construction via subordination. Section 3 is devoted to the time-fractional diffusion case. We study the decay properties of Z and Z (t, ·) u 0 , respectively. Sections 4 and 5 deal with the general case. We first prove L 2decay estimates for Z (t, ·) u 0 using Plancherel's theorem and discuss some specific examples of pairs (k, l) ∈ PC (Sect. 4). Then we establish L r -estimates and look at the critical dimension case (Sect. 5). Finally, in Sect. 6 we discuss decay estimates in the variational setting.

The fundamental solution Z
Suppose (k, l) ∈ PC. We will describe how the fundamental solution Z to the subdiffusion problem (1), (2) can be constructed from the Gaussian heat kernel H by means of the subordination principle for abstract Volterra equations with completely positive kernels, cf. Prüss [29,Chapter 4]. See also the proof of [22,Theorem 3]. We will also show that Z (t, ·) is a probability density function on R d for all t > 0. By definition, the fundamental solution Z (t, x) to the subdiffusion problem (1) where δ 0 stands for the Dirac delta distribution. Throughout this paper the Fourier extended as usual to S (R d ). Taking the Fourier transform w.r.t. x in the problem for Z we obtain (cf. Sect. 1) where the relaxation function s(t, μ) is defined via the Volterra equation (10). In what follows we will construct a function Z that enjoys the latter property. Note first that (k, l) ∈ PC implies that the kernel l is completely positive, see [8] and [29]. Denoting byf the Laplace transform of f : R + → R, this in turn implies that ϕ(λ) = λk(λ), λ > 0, is a Bernstein function and that for every τ ≥ 0, the function ψ τ (λ) = exp(−τ ϕ(λ)), λ > 0, is completely monotone, by [29,Proposition 4.5]. Further, ψ τ (λ) is bounded by e −τ ϕ(0+) . Note that ϕ(λ) = 1/l(λ), and thus ϕ(0+) = 1/|l| L 1 (R + ) , which is 0 if and only if l / ∈ L 1 (R + ). By Bernstein's theorem (see e.g. [29,Section 4.1] or [33,Theorem 1.4]) there exist unique nondecreasing functions w(·, τ ) ∈ BV (R + ) normalized by w(0, τ ) = 0 and left-continuity such thatŵ The function w(t, τ ) is the so-called propagation function associated with the completely positive kernel l, cf. [29,Section 4.5]. Some important properties of w(t, τ ) can be found in [29,Proposition 4.9]. Among others, w(·, ·) is Borel measurable on R + × R + , w(t, ·) is nonincreasing and right-continuous on R + , and w(t, 0) = w(t, 0+) = 1 as well as w(t, ∞) = 0 for all t > 0. Moreover, the relaxation function s(t, μ) is represented by We now set Then Z (t, x) is nonnegative, and |Z (t, ·)| 1 = 1 for all t > 0, since H enjoys these properties. Taking the Fourier transform w.r.t. x and using (18) we obtaiñ Thus Z is the desired fundamental solution to (1).
The following lemma provides a formula for the mean square displacement. The basic idea of the proof can be found already in [25, p. 19] (time-fractional case with d = 1) and [20, p. 268] (ultraslow diffusion with arbitrary d).
Lemma 2.1 Let (k, l) ∈ PC and Z be the fundamental solution to the diffusion problem (1), (2). Let m(t) be defined as in (6). Then and thus the claim follows by inversion of the Laplace transform.
Let us illustrate Lemma 2.1 by looking at the time-fractional diffusion case, where k = g 1−α and l = g α with some α ∈ (0, 1). The formula for m(t) gives which is in accordance with our remarks in the paragraph following (6) in Sect. 1.

Time-fractional diffusion
In this section we study the subdiffusion problem (1), (2) in the important special case k = g 1−α with α ∈ (0, 1). That is, we consider the problem Under appropriate conditions on u 0 the solution of (19), (20) can be represented as where Z is the fundamental solution corresponding to (19), (20), see [11]. It is known (see e.g. [21,34]) that where H denotes the Fox H -function ( [18,19]). As the H -function is a rather complicated object, this representation of Z is not so useful for deriving estimates for Z directly. However, using the analytic and asymptotic properties of H , one can obtain the subsequent sharp estimates, which can be found in [11], see also [21].
Here C = C(α, d) is a positive constant which may differ from line to line.
We point out that for d ≥ 2 the fundamental solution Z (t, x) has a singularity at the origin x = 0, which is a fundamental difference to the heat kernel and which leads to a restriction concerning p-integrability on R d .

L p (R d )-estimates for Z and the solution
Proposition 3.1 allows us to estimate the L p (R d )-norm of Z (t, ·) for t > 0. We decompose the corresponding integral as follows.
For all dimensions d and 1 < p < ∞, we have in view of (22) We come now to the estimate for the integral where R ≤ 1. In case d = 1 we have in view of (25) If d = 2 we may estimate as follows, employing (24).
In fact, taking the Fourier transform w.r.t. the spatial variables we see that the Fourier transform of Z (t, x) solves the fractional differential equation where E α (z) denotes the Mittag-Leffler function, defined by , z ∈ C.
Suppose now that Z (t) belongs to L p (R d ) with some p ≤ 2. Then the Hausdorff-Young inequality implies that where p is the conjugate exponent of p. From this and the representation ofZ as well as (28) it follows that which in turn implies (by changing to polar coordinates) that We next examine the critical case p = d d−2 for d ≥ 3. We know that Z (t) does not belong to L p (R d ). However, it lies in the corresponding weak L p -space as the following theorem shows. This observation will be crucial, among others, to obtain optimal decay estimates for |u(t)| 2 for d ≥ 4.
We need to estimate denotes the distribution function of Z (t). Using again the similarity variable R = t −α |x| 2 we have Employing (26), we find that For the term with R ≤ 1 we use (23) to estimate as follows.
This shows that and thus This proves the theorem.
We come now to decay estimates for the L r (R d )-norm of the solution u of (19), (20) given by formula (21).
This estimate remains true for d = q = 1 and p = r = ∞.
(ii) Let d ≥ 3, 1 < q, r < ∞, 1 r + 2 d = 1 q , and u 0 ∈ L q (R d ). Then for u(t) = Z (t) u 0 we have Proof (i) By Young's inequality and Theorem 3.1 we have Proof If d < 3 or we have both d ≥ 3 and d < 2r r −1 then r < κ(d) and the first two estimates follow from Theorem 3.3 (i) with q = 1. The third estimate is a consequence of Theorem 3.3 (iii), since d = 2r r −1 is equivalent to r = κ(d) whenever d ≥ 3. To show the last estimate, observe first that the assumptions on r and d imply that there is q ∈ (1, r ) such that 1 r + 2 d = 1 q . The assertion then follows from Theorem 3.3 (ii).
Specializing Corollary 3.1 to r = 2 we obtain Corollary 3.1 exhibits the critical dimension phenomenon discussed already in Sect. 1. The critical dimension (which is in general not an integer) is here given by d crit = 2r/(r − 1). As long as the dimension d is below d crit (and at least 3) the decay rate increases with the dimension, whereas for any d > d crit the decay rate is the same, namely t −α , which coincides with the decay rate for the corresponding problem on a bounded domain with homogeneous Dirichlet boundary condition.

Gradient estimates
We turn now to L p -estimates for the spatial gradient of Z (t, x).
Proof The proof uses the gradient estimates for Z from Proposition 3.1 and is analogous to that of Theorem 3.1 and Theorem 3.2, respectively.
By means of the L p (R d )-estimates for ∇ Z (t, x) from Theorem 3.4 we can derive temporal decay estimates for the gradient of the solution. We record these estimates in the following theorem. Using ∇u(t, ·) = ∇ Z (t, ·) u 0 the proof is analogous to the one of Theorem 3.3.
Arguing analogously to the proof of Corollary 3.1 we obtain In the important special case r = 2 the picture is as follows.

Corollary 3.4 Let d ∈ N and u
Then 3 shows a critical dimension phenomenon for the gradient estimates with critical dimension d grad,crit = r/(r − 1) < d crit .

Large time behaviour of Z(t, ·) u 0
In this subsection, we want to show that for integrable initial data u 0 the asymptotic behaviour of Z (t, ·) u 0 as t → ∞ is described by a multiple of Z (t, x). The corresponding result for the heat equation is well-known, see e.g. [30,Proposition 48.6].
For our strategy of proof we need the following decomposition lemma from [10].
We have now the following result.  (ii) Assume in addition that ||x|u 0 | 1 < ∞. Then

Moreover, in the limit case p
Proof The strategy of the proof is the same as in [44, p. 14, 15].
(a) Suppose first that By Young's inequality it follows that for any 1 ≤ p < κ 1 (d) where we used Theorem 3.4. Hence which is the first part of assertion (ii). The second part follows from (31) For each j we have by Part (a) and by Theorem 3.1 and therefore t αd Assertion (i) follows by sending j → ∞.

General subdiffusion equations, L 2 -estimates
Let (k, l) ∈ PC. Assuming that u 0 ∈ L 1 (R d ) ∩ L 2 (R d ) we want to derive L 2 -decay estimates for solutions of by means of harmonic analysis, the main tool being Plancherel's theorem. Under appropriate assumptions on the initial value u 0 (see Kochubei [22]) the solution of (32), (33) is given by In the following we will assume that u is defined by (34).
We turn now to upper estimates.
Suppose that u is given by (34). Then there holds Proof Suppose that d ≤ 3. By Plancherel's theorem and the upper estimate in (11) we have which shows that |u(t) the Hardy-Littlewood-Sobolev theorem on fractional integration, see e.g. [13, Theorem 6.1.3], implies (− ) −1 u 0 ∈ L 2 (R d ). Using this property, Plancherel's theorem, and the upper estimate in (11) we may estimate as follows. Hence The theorem is proved.
where C, T are some positive constants.
If (39) where ψ : R + → R + is a continuous function (typically nondecreasing) and the constant C is independent of μ. Then it follows from the proof of Theorem 4.2 that Recall that the Laplace transform of g β , β > 0, is given by g β (z) = z −β , Re z > 0, and so it is easy to see that g β 1 * g β 2 = g β 1 +β 2 for all β 1 , β 2 > 0. In particular (k, l) ∈ PC.
We further have and so both k(t) and (1 * l)(t) −1 are of the form c t −α with some constant c > 0. Invoking Theorem 4.2 reproduces the estimates from Corollary 3.2 for noncritical d, that is, Theorem 4.1 shows that this estimate is optimal in the general case.
Example 4.2 A sum of two fractional derivatives. Let 0 < α < β < 1 and Clearly, k is completely monotone and k(0+) = ∞, and so by Theorem 5.4 in Chapter 5 of [15], the kernel k has a resolvent l ∈ L 1,loc (R + ) of the first kind, that is k * l = 1 on (0, ∞), and this resolvent is completely monotone as well. In particular (k, l) ∈ PC.

Example 4.3 The distributed order case (an example of ultraslow diffusion).
We consider the pair Both kernels are nonnegative and nonincreasing, and there holds (see [40, Example 6.5])k Thus (k, l) ∈ PC. There exists a number T 1 > 1 such that see [40,Example 6.5]. This together with Theorem 4.2 yields the logarithmic decay estimate which is optimal, by Theorem 4.1.
Example 4.4 Switching the kernels from the previous example. We consider now the pair From the previous considerations we know already that (k, l) ∈ PC. The kernel k(t) in this example behaves like t −1 as t → ∞, whereas (1 * l)(t) ∼ t/ log t as t → ∞, see [40, Example 6.6]. Thus k(t) decays faster than (1 * l)(t) −1 , so that there is a gap between the decay rates provided by Theorems 4.1 and 4.2. We claim that which is optimal by Theorem 4.1. What is interesting here is that for d ≤ 3 the decay rate is the same as for the heat equation! The claim follows from the estimate which has been shown in [40, Example 6.6], and from (42) in Remark 4.1. The proof of (43) in [40] is quite involved and makes use of Laplace transform methods.

L r -estimates and the critical dimension case
In this section we continue the study of the general subdiffusion problem (32), (33) under the assumption that the kernel k is of type PC.
Lemma 5.1 Assume that (k, l) ∈ PC. Let t ≥ 0 be fixed. Then the function μ → s(t, μ) belongs to C ∞ (R + ) and in particular s(t, μ) is completely monotone w.r.t. μ. Moreover, Proof Recall that s(t, μ) = s μ (t) solves the equation Since μ merely appears as coefficient in front of the second term, it is clear that the dependence of the solution s μ (t) on the parameter μ is C ∞ . Differentiating w.r.t. μ gives which is equivalent to the integro-differential equation The property ∂ μ s μ (0) = 0 follows from the fact that s μ (0) = 1 for all μ ≥ 0. Note also that ∂ μ s μ | μ=0 = −(1 * l)(t). Differentiating (46) w.r.t. μ leads to Differentiating further, a simple induction argument shows that (47) Assertion (44) follows then by means of induction from (47) and the fact that the solution v of is nonnegative, whenever f ∈ L 1, loc (R + ) enjoys this property, see e.g. [40] for the latter property. To see (45), we apply Taylor's theorem to the function μ → s(t, μ), thereby obtaining that for every n ∈ N s t, for some η ∈ ( μ 2 , μ). In view of (44) every summand on the right-hand side of (48) is nonnegative. This implies which in turn yields (45).
It is now evident that m(ξ ) satisfies Mihlin's condition with a constant M that merely depends on the dimension d.
Relying on Lemma 5.3, we are now able to prove the following generalization of Theorem 3.3. (k, l) ∈ PC and suppose that u is given by u(t) = Z (t) u 0 , where u 0 is as described below.
The assertion follows then from Lemma 5.3 with δ = 1 and Mihlin's multiplier theorem.
(iii) We know already that m(ξ ) = ψ 1 (|ξ | 2 )[(1 * l)(t)] is an L r (R d )-Fourier multiplier for all r ∈ (1, ∞) with a constant that only depends on r and d, that is, the operator T defined by T f = F −1 (mF f ) (F denoting the Fourier transform) on a suitable dense subset of L r (R d ) is L r (R d )-bounded, thus extends to an operator T ∈ B(L r (R d )), and |T | B(L r ) ≤ M(d, r ). The weak L r -spaces can be obtained from the strong ones by real interpolation. Assuming 1 < r < ∞ we may choose r 1 ∈ (1, r ), r 2 ∈ (r, ∞), and θ ∈ (0, 1) such that 1 r = 1−θ r 1 + θ r 2 . By [36, Theorem 1.18.2], we then have (L r 1 , L r 2 ) θ,∞ = L r,∞ . It follows that T ∈ B(L r,∞ (R d )), with a norm bound that only depends on r and d. We choose r = d d−2 . Then 1 − 1 r = 2 d , and the Hardy-Littlewood-Sobolev theorem ([13, Theorem 6.1.3]) implies that (− ) −1 u 0 ∈ L r,∞ (R d ). Note that u 0 ∈ L 1 (R d ) and so we only get an estimate in a weak L r -space. The assertion now follows from (51) with δ = 1 and the fact that T ∈ B(L r,∞ (R d )), with a norm bound that is independent of t > 0.
The following result generalizes Corollary 3.1 and is a direct consequence of the previous theorem. The proof is analogous to the one of Corollary 3.1.
As a special case we obtain the expected weak L 2 -decay estimate in the case of the critical dimension d = 4, which was missing in Theorem 4.2.
, and assume that (k, l) ∈ PC. Suppose that u is given by u(t) = Z (t) u 0 . Then

Decay estimates via the energy method
In this section we study a time-fractional diffusion equation with a more general elliptic operator in divergence form. To be precise, we consider the problem Here, we assume α ∈ (0, 1), for all T, R > 0, and ∃ν > 0 such that . We say that a function u : (0, ∞) × R d → R is a global weak solution of (52), (53) if for any T, R > 0, and for any test function , and suppose that condition (H) is satisfied. Let u be a global weak solution of (52), (53), and assume in addition that |∇u|, Then An important tool in our proof is the so-called fundamental identity for integrodifferential operators of the form d dt (k * ·), cf. also [41]. It can be viewed as the analogue to the chain rule (H (u)) = H (u)u .  U for a.a. t ∈ (0, T ). Suppose that the By the fundamental identity, applied twice, Fubini's theorem, and the triangle inequality for the L 2 ( )-norm we have for a.a. t ∈ (0, T ) From this and Hölder's inequality, we infer that for a.a. t ∈ (0, T ) This proves the lemma.
We are now in position to prove Theorem 6.1.
Proof 1. Regularized weak formulation. For μ > 0, define the kernel h μ ∈ L 1,loc (R + ) via the Volterra integral equation and set g 1−α,μ = g 1−α * h μ . It is well known that g 1−α,μ is positive and nonincreasing, it belongs to H 1 1,loc ([0, ∞)), and the Yosida approximation B n = n B(n + B) −1 , n ∈ N, of the operator B := ∂ α t defined in an appropriate space takes the form B n = ∂ t (g 1−α,n * ·), and B n → B as n → ∞. Further, h μ is nonnegative for all μ > 0, and for 1 ≤ p < ∞ and f ∈ L p ([0, T ]), we have h n * f → f in L p ([0, T ]) as n → ∞. We refer to [39] and [42] for more background on B n . Using these properties of the kernels g 1−α,μ one can derive an equivalent weak formulation where the singular kernel g 1−α is replaced by the more regular kernels g 1−α,n . In fact, it follows from the above definition of weak solution that for any R > 0 and any function ψ ∈°H 1 2 (B R ) there holds cf. [42].
This finishes the proof of Theorem 6.1.
Note that the decay rate for the L 2 -norm in (55) is strictly less than the one we obtained in Sect. 2, see (29). However, sending d → ∞ the decay rate in (55) becomes α, which is precisely the decay rate for the L 2 -norm in (29) for d > 4. This phenomenon of a smaller decay rate in the variational setting does not occur in the case α = 1. In fact, proceeding similarly as above in the latter case one obtains the differential inequality ∂ t |u(t)| 2 + μ|u(t)| γ 2 ≤ 0, t > 0, with the same γ (> 1) and μ as before. This implies On the other hand, the L p (R d )-norm of the Gaussian heat kernel H (t, x) = (4π t) −d/2 exp(− |x| 2 4t ) decays as Assuming u 0 ∈ L 1 (R d ), Young's inequality implies that the L 2 (R d )-norm of u(t) = H (t) u 0 decays as which is precisely the decay rate we obtain by means of energy estimates. It is an interesting open problem whether the decay rate in (55) is optimal in general. If this was not the case how could it be upgraded?