Decay estimates for time-fractional and other non-local in time subdiffusion equations in $\mathbb{R}^d$

We prove optimal estimates for the decay in time of solutions to a rather general class of non-local in time subdiffusion equations in $\mathbb{R}^d$. An important special case is the time-fractional diffusion equation, which has seen much interest during the last years, mostly due to its applications in the modeling of anomalous diffusion processes. We follow three different approaches and techniques to study this particular case: (A) estimates based on the fundamental solution and Young's inequality, (B) Fourier multiplier methods, and (C) the energy method. It turns out that the decay behaviour is markedly different from the heat equation case, in particular there occurs a {\em critical dimension phenomenon}. The general subdiffusion case is treated by method (B) and relies on a careful estimation of the underlying relaxation function. Several examples of kernels, including the ultraslow diffusion case, illustrate our results.


Introduction and main results
We study the temporal decay of solutions to the non-local in time diffusion equation together with initial condition u| t=0 = u 0 , x ∈ R d .
Here u 0 (x) is a given datum. k * v denotes the convolution on the positive halfline R + := [0, ∞) w.r.t. the time variable, that is (k * v)(t) = t 0 k(t − τ )v(τ ) dτ , t ≥ 0. We assume that k is a kernel of type PC, by which we mean that the following condition is satisfied.
In this case, the term ∂ t (k * v) becomes the Riemann-Liouville fractional derivative ∂ α t v of order α (cf. [19]) and (1) is called time-fractional diffusion equation.
Another interesting example is given by the pair In this situation the operator ∂ t (k * ·) is a so-called operator of distributed order and (1) is an example of a so-called ultraslow diffusion equation, see e.g. [20].
Applications. Problems of the form (1), in particular the time-fractional diffusion equation, have seen much interest during the last years, mostly due to their applications in the modeling of anomalous diffusion, see e.g. [20], [22], [25], [37] and the references therein for the physical background. To provide some more specific motivation, let Z(t, x) denote the fundamental solution of (1) satisfying Z| t=0 = δ 0 . If k is a kernel of type (PC), then this fundamental solution can be constructed via subordination from the heat kernel and one can show that Z(t, ·) is a probability density function (pdf) on R d for all t > 0, see the proof of [22,Theorem 3] and Section 2 below. Given a pdf u 0 on R d satisfying some appropriate conditions, the solution u(t, ·) of the initial-value problem (1), (2) is given by the formula (6) below, and thus is a pdf on R d for all t > 0. So in this case, the problem (1), (2) describes the evolution of a pdf on R d .
An important quantity that measures the dispersion of random processes and that describes how fast particles diffuse is the mean square displacement. It can be determined in experiments and is defined (in our situation) as In the case of the classical diffusion equation (i.e. α = 1) m(t) = ct, t > 0, with some constant c > 0. In the time-fractional diffusion case (i.e. the first example) one observes that m(t) = ct α (cf. [25]), which shows that the diffusion is slower than in the classical case of Brownian motion. During the recent decades, experimental studies have shown that there is an abundance of processes that have such a power-law mean square displacement, see [3], [25], [26], [37] and the references given therein. An important application is the diffusion on fractals like e.g. some amorphous semiconductors [25], [37]. In our second example, the mean square displacement m(t) behaves like c log t for t → ∞, see [20]. In this case (1) describes a so-called ultraslow diffusion process. Such processes have been extensively studied recently. They appear, for example, in polymer physics [32], diffusion in disordered media (Sinai diffusion) [35], and in diffusion generated by iterated maps [9]. We would like to point out that in our setting, that is, with a pair of kernels (k, l) ∈ PC the mean square displacement is given by see Lemma 2.1. Another context where equations of the form (1) and nonlinear variants of them appear is the modeling of dynamic processes in materials with memory. Examples are given by the theory of heat conduction with memory, see e.g. [29] and the references therein as well as [40], and the diffusion of fluids in porous media with memory, cf. [5], [17].
In view of condition (PC) the problem (1), (2) can be reformulated as an abstract Volterra equation on the positive halfline with a completely positive kernel; this can be seen by convolving the PDE with the kernel l. There has been a substantial amount of work on such abstract Volterra and integro-differential equations since the 1970s, in particular on existence and uniqueness, regularity, and long-time behaviour of solutions, see, for instance, [7], [8], [14], [43], and the monograph [29].
One of the main objectives of this paper is to prove sharp estimates for the temporal decay of solutions to (1), (2). We point out that for non-local in space diffusion equations, in particular space-fractional diffusion equations, corresponding results have been obtained recently, see e.g. [2], [4], [6], [16], [38]. Concerning non-local in time diffusion, the case of a bounded domain with homogeneous Dirichlet boundary condition and a kernel of type PC has been studied recently in [40]. The decay estimates obtained in [40] are optimal and even true in the case of a uniformly elliptic operator in divergence form with rough coefficients, cf. the remarks following formula (13) below. For the time-fractional case with Laplacian we also refer to [24] and [27]. Moreover, decay estimates for the time-fractional diffusion equation in R d have also gathered interest in the engineering community [23]. However, due to the unexpected critical dimension phenomenon (to be discussed below) some of the subtleties of the equation have been overlooked there.
Under appropriate conditions on u 0 the solution to (1), (2) is given by Given u 0 in some Lebesgue space L q (R d ), q ∈ [1, ∞] we want to understand the decay behaviour of |Z(t, ·) ⋆ u 0 | Lr(R d ) as t → ∞ for suitable r ∈ [1, ∞]; here f 1 ⋆ f 2 denotes the convolution of f 1 , f 2 on R d . It turns out that the situation is markedly different from that in the case of the heat equation, where Z(t, x) = H(t, x) = (4πt) −d/2 exp(−|x| 2 /4t) is the Gaussian heat kernel. Time-fractional diffusion. Let us first consider for simplicity the time-fractional diffusion equation (i.e. k = g 1−α , α ∈ (0, 1)) and the case r = 2. Given u 0 ∈ L 2 (R d ) we do not have in general any decay neither for |Z(t, ·) Then it is well-known that u(t, ·) := H(t, ·) ⋆ u 0 decays in the L 2 -norm as and this estimate is the best one can obtain in general (see e.g. [1]). Here |v| 2 := |v| L2(R d ) , and v(t) w(t), t > 0 means that there exists a constant C > 0 such that v(t) ≤ Cw(t), t > 0. In the case of time-fractional diffusion u(t, ·) := Z(t, ·) ⋆ u 0 exhibits the decay behaviour see Corollary 3.2 below. Whereas for the heat equation the decay rate increases with the dimension d, time-fractional diffusion leads to the phenomenon of a critical dimension, which is d = 4 in this case. Below the critical dimension the rate increases with d, the exponent being α times the one from the heat equation, while above the critical dimension the decay rate is the same for all d, namely t −α . The reason why the decay rate does not increase any further with d lies in the fact that t −α (up to a constant) coincides with the decay rate in the case of a bounded domain and homogeneous Dirichlet boundary condition, cf. the remarks on problem (11) below. This reveals another interesting phenomenon: In the time-fractional case the diffusion is so slow that in higher dimensions (d above the critical dimension) restriction to a bounded domain and the requirement of a homogeneous Dirichlet boundary condition do not improve the rate of decay. This phenomenon cannot be observed in the classical diffusion case, where we always have exponential (and thus a better) decay in the case of a bounded domain. We point out that the estimate (7) is the best one can get in general, see Theorem 4.1 and Example 4.1 below. In the case of the critical dimension d = 4 we obtain the same decay rate as for d > 4, however we have to replace the L 2 -norm by the weak L 2 -norm, that is we find that In the more general case where r ∈ (1, ∞) and u 0 ∈ L 1 (R d ) ∩ L r (R d ), the critical dimension (which is in general not an integer) is given by Moreover, in the case d < 3 we always have a subcritical decay behaviour, see Corollary 3.1.
The critical dimension phenomenon can be well understood by looking at the asymptotic properties of the fundamental solution Z. In the time-fractional case, Z can be represented by Fox-functions and corresponding estimates are known (cf. [11] and Prop. 3.1 below). They show that for d ≥ 2 the fundamental solution Z(t, x) does not only have a singularity at t = 0 but also at the origin x = 0, which is a striking difference to the heat kernel. If, for example, d ≥ 3 and t −α |x| 2 ≤ 1, we have the sharp estimate Z(t, x) ≤ Ct −α |x| −d+2 , thus Z is (up to a constant) bounded above by g 1−α (t) times the Newtonian potential w.r.t. x. This provides some interesting insight into how the fundamental solutions from the Poisson and the heat equation interpolate in case of fractional dynamics.
Relying on the asymptotic bounds for (the time-fractional) Z, in Section 3 we first derive estimates for |Z(t, ·)| p (for suitable p ∈ [1, ∞]), which by Young's inequality then yield bounds for |Z(t, ·) ⋆ u 0 | r . We further look at gradient estimates for Z(t, ·) and Z(t, ·) ⋆ u 0 , and we show that for integrable initial data u 0 the asymptotic behaviour of Z(t, ·) ⋆ u 0 as t → ∞ is described by a multiple of Z(t, x). This is the time-fractional analogue of a well-known result for the heat equation, cf. [30,Prop. 48.6].
The general subdiffusion case. Turning to the more general subdiffusion equation (1) with k being a kernel of type PC, we first note that asymptotic bounds for Z(t, x) like those in the time-fractional case do not seem to be known in the general case. However, we mention [20] where the ultraslow-diffusion case was studied. To derive decay estimates for Z(t, ·) ⋆ u 0 we develop a theory which is based on tools from harmonic analysis and a careful estimation of the Fourier symbolZ(t, ξ) of Z w.r.t. the spatial variable.
Taking the Fourier transform w.r.t. x (defined as in (17) below) we see thatZ(t, ξ) solves the problem Here the so-called relaxation function s(t, µ) := s µ (t), t ≥ 0, is defined for the parameter µ ≥ 0 as the solution of the Volterra integral equation Note that s 0 ≡ 1 and that (8) is equivalent to the integro-differential equation It is known that the assumption (k, l) ∈ PC implies that s µ is nonnegative, nonincreasing, and that s µ ∈ H 1 1, loc (R + ); moreover ∂ µ s µ (t) ≤ 0, see e.g. Prüss [29]. Furthermore, it has been shown in [40] that for any µ ≥ 0 there holds which also shows that , a.a. t > 0.
By means of (9) and Plancherel's theorem we are able to prove the following L 2 -decay estimate. Suppose u 0 ∈ L 1 (R d ) ∩ L 2 (R d ), then see Theorem 4.2. We also show that this estimate gives the optimal rate of decay provided In order to obtain sharp L r -decay estimates and to treat the critical dimension case we go a step further and derive bounds for the partial derivatives of arbitrary order of the functions ξ → |ξ| 2δZ (t, ξ), where t > 0 is fixed and δ ∈ (0, 1] is a parameter. These bounds are uniform w.r.t. time and allow us to apply Mihlin's multiplier theorem. This leads to the desired L r -(and by interpolation also to the L r,∞ -) decay estimates, which generalize those from the timefractional case. Here in our analysis we make use of another important property of the relaxation function s(t, µ) which is the complete monotonicity w.r.t. to the parameter µ for all t ≥ 0, see Section 5 below. This property has already been known, e.g. it follows from results in [29,Section 4]. In the present paper we provide a new proof of this property, which is rather short and relies on a comparison principle for a certain type of integro-differential equation.
By switching the kernels from the ultraslow diffusion case one obtains an interesting example of a pair (k, l) ∈ PC where k(t) decays like t −1 and (1 * l)(t) −1 like ct −1 log t. Here the upper bound in (9) does not lead to the optimal decay. However, it has been shown in [40] that s µ (t) ≤ C 1+µt for all t, µ ≥ 0, and thus a simple modification of our original proof yields the optimal L 2 -estimate Let Ω ⊂ R d be a bounded domain, u 0 ∈ L 2 (Ω), and assume again that k is of type PC. We consider the problem Let {φ n } ∞ n=1 ⊂°H 1 2 (Ω) := C ∞ 0 (Ω) H 1 2 (Ω) be an orthonormal basis of L 2 (Ω) consisting of eigenfunctions of the negative Dirichlet Laplacian with eigenvalues λ n > 0, n ∈ N, and denote by λ 1 the smallest such eigenvalue. Then the solution u of (11) can be represented via Fourier series as where (·|·) stands for the standard inner product in L 2 (Ω), cf. [40, Section 1], the special case k = g 1−α can be also found in [27,Theorem 4.1]. By Parseval's identity and since ∂ µ s µ ≤ 0, it follows from (12) that cf. [40]. This decay estimate is optimal as the example shows. By means of energy methods (13) can be generalized to problems with a uniformly elliptic operator in divergence form with rough coefficients, cf. [40, Corollary 1.1]. (13) can be further extended to r ∈ (1, ∞), in fact assuming u 0 ∈ L r (Ω) and setting ρ(r) := 4(r − 1)/r 2 we have see [40,Remark 5.1]. We see that the relaxation function s µ (t) (with some fixed µ > 0) determines the rate of decay as t → ∞. For example, if k = g 1−α with α ∈ (0, 1) it follows from (9) that s µ (t) decays like ct −α . This justifies among others earlier remarks on (7) concerning the case d > 4. Energy methods for weak solutions. A further goal of this paper is to prove decay estimates for the time-fractional diffusion equation with the Laplacian being replaced by a more general elliptic operator in divergence form with rough coefficients. More precisely, we want to study the problem Here, we assume α ∈ (0, 1), , and that A satisfies a uniform parabolicity condition. We prove that if u is a suitably defined weak solution of (14), (15) satisfying appropriate integrability conditions at see Theorem 6.1. The basic idea of the proof is to show that for some constant µ > 0 in the weak sense. This can be achieved by means of Nash's inequality and the so-called L pnorm inequality for operators of the form ∂ t (k * ·), which has been established recently in [40], cf. Lemma 6.2 below. Note that the decay rate in (16) is less than the one we find in the case of the Laplacian (cf. (7)). This phenomenon of a smaller decay rate in the weak setting with rough coefficients does not occur in the case α = 1, where the same strategy of proof leads to the optimal decay rate t −d/4 . It is an interesting open problem whether (16) provides the optimal decay rate in the variational setting.
The paper is organized as follows. In Section 2 we provide some background on the fundamental solution Z, including their construction via subordination. Section 3 is devoted to the timefractional diffusion case. We study the decay properties of Z and Z(t, ·) ⋆ u 0 , respectively. Sections 4 and 5 deal with the general case. We first prove L 2 -decay estimates for Z(t, ·) ⋆ u 0 using Plancherel's theorem and discuss some specific examples of pairs (k, l) ∈ PC (Section 4). Then we establish L r -estimates and look at the critical dimension case (Section 5). Finally, in Section 6 we discuss decay estimates in the variational setting.

The fundamental solution Z
Suppose (k, l) ∈ PC. We will describe how the fundamental solution Z to the subdiffusion problem (1), (2) can be constructed from the Gaussian heat kernel H by means of the subordination principle for abstract Volterra equations with completely positive kernels, cf. Prüss [29,Chapter 4]. See also the proof of [22,Theorem 3]. We will also show that Z(t, ·) is a probability density function on R d for all t > 0. By definition, the fundamental solution Z(t, x) to the subdiffusion problem (1), (2) is a distributional solution of where δ 0 stands for the Dirac delta distribution. Throughout this paper the Fourier transform extended as usual to S ′ (R d ). Taking the Fourier transform w.r.t. x in the problem for Z we obtain (cf. Section 1) where the relaxation function s(t, µ) is defined via the Volterra equation (8). In what follows we will construct a function Z that enjoys the latter property. Note first that (k, l) ∈ PC implies that the kernel l is completely positive, see [8] and [29]. Denoting byf the Laplace transform of f : R + → R, this in turn implies that ϕ(λ) = λk(λ), λ > 0, is a Bernstein function and that for every τ ≥ 0, the function ψ τ (λ) = exp(−τ ϕ(λ)), λ > 0, is completely monotone, by [29,Proposition 4.5]. Further, ψ τ (λ) is bounded by e −τ ϕ(0+) . Note that ϕ(λ) = 1/l(λ), and thus ϕ(0+) = 1/|l| L1(R+) , which is 0 if and only if l / ∈ L 1 (R + ). By Bernstein's theorem (see e.g. [29, Section 4.1] or [33, Theorem 1.4]) there exist unique nondecreasing functions w(·, τ ) ∈ BV (R + ) normalized by w(0, τ ) = 0 and left-continuity such that The function w(t, τ ) is the so-called propagation function associated with the completely positive kernel l, cf. [29,Section 4.5]. Some important properties of w(t, τ ) can be found in [29,Proposition 4.9]. Among others, w(·, ·) is Borel measurable on R + × R + , w(t, ·) is nonincreasing and rightcontinuous on R + , and w(t, 0) = w(t, 0+) = 1 as well as w(t, ∞) = 0 for all t > 0. Moreover, the relaxation function s(t, µ) is represented by Then Z(t, x) is nonnegative, and |Z(t, ·)| 1 = 1 for all t > 0, since H enjoys these properties. Taking the Fourier transform w.r.t. x and using (18) we obtaiñ Thus Z is the desired fundamental solution to (1).
The following lemma provides a formula for the mean square displacement. The basic idea of the proof can be found already in [25, p. 19] (time-fractional case with d = 1) and [20, p. 268] (ultraslow diffusion with arbitrary d).
Lemma 2.1 Let (k, l) ∈ PC and Z be the fundamental solution to the diffusion problem (1), (2). Let m(t) be defined as in (4). Then Proof. For the Laplace transform of m we havê and thus the claim follows by inversion of the Laplace transform.
Let us illustrate Lemma 2.1 by looking at the time-fractional diffusion case, where k = g 1−α and l = g α with some α ∈ (0, 1). The formula for m(t) gives which is in accordance with our remarks in the paragraph following (4) in Section 1.

Time-fractional diffusion
In this section we study the subdiffusion problem (1), (2) in the important special case k = g 1−α with α ∈ (0, 1). That is, we consider the problem Under appropriate conditions on u 0 the solution of (19), (20) can be represented as where Z is the fundamental solution corresponding to (19), (20), see [11]. It is known (see e.g. [21], [34]) that where H denotes the Fox H-function ( [18], [19]). As the H-function is a rather complicated object, this representation of Z is not so useful for deriving estimates for Z directly. However, using the analytic and asymptotic properties of H, one can obtain the subsequent sharp estimates, which can be found in [11], see also [21].
Here C = C(α, d) is a positive constant which may differ from line to line.
We point out that for d ≥ 2 the fundamental solution Z(t, x) has a singularity at the origin x = 0, which is a fundamental difference to the heat kernel and which leads to a restriction concerning p-integrability on R d .
3.1 L p (R d )-estimates for Z and the solution Proposition 3.1 allows us to estimate the L p (R d )-norm of Z(t, ·) for t > 0. We decompose the corresponding integral as follows.
For all dimensions d and 1 < p < ∞, we have in view of (22) We come now to the estimate for the integral where R ≤ 1. In case d = 1 we have in view of (25) for all 1 If d = 2 we may estimate as follows, employing (24).
whenever the last integral is finite, that is, whenever Setting κ(1) = κ(2) = ∞ and combining the previous estimates we see that Observe also that Z(t) ∈ L ∞ (R), t > 0, provided that d = 1, and we have the estimate Summarizing we have proved Theorem 3.1 Let d ∈ N and κ(d) be as above. Then Z(t) belongs to L p (R d ) for all t > 0, provided that 1 ≤ p < κ(d), and there holds  . In fact, taking the Fourier transform w.r.t. the spatial variables we see that the Fourier transform of Z(t, x) solves the fractional differential equation where E α (z) denotes the Mittag-Leffler function, defined by , z ∈ C.
Employing the bounds from (9), a short computation shows that the Mittag-Leffler function satisfies the estimate see also [40,Example 6.1]. An upper bound of the form E α (−x) ≤ C(α)/(1 + x), x ≥ 0, can also be found in [24]. Suppose now that Z(t) belongs to L p (R d ) with some p ≤ 2. Then the Hausdorff-Young inequality implies that where p ′ is the conjugate exponent of p. From this and the representation ofZ as well as (28) it follows that which in turn implies (by changing to polar coordinates) that We next examine the critical case p = d d−2 for d ≥ 3. We know that Z(t) does not belong to L p (R d ). However, it lies in the corresponding weak L p -space as the following theorem shows. This observation will be crucial, among others, to obtain optimal decay estimates for |u(t)| 2 for d ≥ 4.
Proof. Set p = d d−2 . We need to estimate denotes the distribution function of Z(t). Using again the similarity variable R = t −α |x| 2 we have Employing (26), we find that For the term with R ≤ 1 we use (23) to estimate as follows.
This shows that This proves the theorem.
We come now to decay estimates for the L r (R d )-norm of the solution u of (19), (20) given by formula (21).
This estimate remains true for d = q = 1 and p = r = ∞.
Proof. (i) By Young's inequality and Theorem 3.1 we have (iii) The assertion follows from Theorem 3.2 and Young's inequality for weak type spaces (see [13, Theorem 1.2.13]).
Proof. If d < 3 or we have both d ≥ 3 and d < 2r r−1 then r < κ(d) and the first two estimates follow from Theorem 3.3 (i) with q = 1. The third estimate is a consequence of Theorem 3.3 (iii), since d = 2r r−1 is equivalent to r = κ(d) whenever d ≥ 3. To show the last estimate, observe first that the assumptions on r and d imply that there is q ∈ (1, r) such that 1 r + 2 d = 1 q . The assertion then follows from Theorem 3.3 (ii).
Specializing Corollary 3.1 to r = 2 we obtain Corollary 3.2 Let d ∈ N and u 0 ∈ L 1 (R d ) ∩ L 2 (R d ) and u(t) = Z(t) ⋆ u 0 . Then Corollary 3.1 exhibits the critical dimension phenomenon discussed already in Section 1. The critical dimension (which is in general not an integer) is here given by d crit = 2r/(r − 1). As long as the dimension d is below d crit (and at least 3) the decay rate increases with the dimension, whereas for any d > d crit the decay rate is the same, namely t −α , which coincides with the decay rate for the corresponding problem on a bounded domain with homogeneous Dirichlet boundary condition.

Gradient estimates
We turn now to L p -estimates for the spatial gradient of Z(t, x). Then ∇Z(t) belongs to L p (R d ; R d ) for all t > 0, provided that 1 ≤ p < κ 1 (d), and there holds Proof. The proof uses the gradient estimates for Z from Proposition 3.1 and is analogous to that of Theorem 3.1 and Theorem 3.2, respectively.
By means of the L p (R d )-estimates for ∇Z(t, x) from Theorem 3.4 we can derive temporal decay estimates for the gradient of the solution. We record these estimates in the following theorem. Using ∇u(t, ·) = ∇Z(t, ·) ⋆ u 0 the proof is analogous to the one of Theorem 3.3.
(iii) Let d ≥ 2 and u 0 ∈ L 1 (R d ). Then for u(t) = Z(t) ⋆ u 0 we have Arguing analogously to the proof of Corollary 3.1 we obtain , and u(t) = Z(t) ⋆ u 0 . Then In the important special case r = 2 the picture is as follows.
Corollary 3.3 shows a critical dimension phenomenon for the gradient estimates with critical dimension d grad,crit = r/(r − 1) < d crit .

Large time behaviour of Z(t, ·) ⋆ u 0
In this subsection, we want to show that for integrable initial data u 0 the asymptotic behaviour of Z(t, ·) ⋆ u 0 as t → ∞ is described by a multiple of Z(t, x). The corresponding result for the heat equation is well-known, see e.g. [30,Prop. 48.6]. For our strategy of proof we need the following decomposition lemma from [10].
in the distributional sense and We have now the following result.
(ii) Assume in addition that ||x|u 0 | 1 < ∞. Then Moreover, in the limit case p = κ 1 (d) we have Proof. The strategy of the proof is the same as in [44, p. 14, 15].
By Young's inequality it follows that for any 1 ≤ p < κ 1 (d) where we used Theorem 3.4. Hence which is the first part of assertion (ii). The second part follows from (31) by applying Young's inequality for weak L p -spaces ([13, Theorem 1.2.13]). (b) To prove (i) we choose a sequence (η j ) ⊂ C ∞ 0 (R d ) such that R d η j dx = M for all j and η j → u 0 in L 1 (R d ). For each j we have by Part (a) and by Theorem 3.1 and therefore t αd Assertion (i) follows by sending j → ∞.

General subdiffusion equations, L 2 -estimates
by means of harmonic analysis, the main tool being Plancherel's theorem. Under appropriate assumptions on the initial value u 0 (see Kochubei [22]) the solution of (32), (33) is given by In the following we will assume that u is defined by (34).
We turn now to upper estimates.
If (39) where ψ : R + → R + is a continuous function (typically nondecreasing) and the constant C is independent of µ. Then it follows from the proof of Theorem 4.2 that The time-fractional case. We consider the pair (k, l) = (g 1−α , g α ), where α ∈ (0, 1).
Recall that the Laplace transform of g β , β > 0, is given by g β (z) = z −β , Re z > 0, and so it is easy to see that g β1 * g β2 = g β1+β2 for all β 1 , β 2 > 0. In particular (k, l) ∈ PC. We further have and so both k(t) and (1 * l)(t) −1 are of the form c t −α with some constant c > 0. Invoking Theorem 4.2 reproduces the estimates from Corollary 3.2 for noncritical d, that is, Theorem 4.1 shows that this estimate is optimal in the general case.

Example 4.3
The ultraslow diffusion case. We consider the pair Both kernels are nonnegative and nonincreasing, and there holds (see [40,Example 6.5]) Thus (k, l) ∈ PC. There exists a number T 1 > 1 such that see [40,Example 6.5]. This together with Theorem 4.2 yields the logarithmic decay estimate which is optimal, by Theorem 4.1.
Example 4.4 Switching the kernels from the previous example. We consider now the pair From the previous considerations we know already that (k, l) ∈ PC. The kernel k(t) in this example behaves like t −1 as t → ∞, whereas (1 * l)(t) ∼ t/ log t as t → ∞, see [40, Example 6.6]. Thus k(t) decays faster than (1 * l)(t) −1 , so that there is a gap between the decay rates provided by Theorem 4.1 and Theorem 4.2. We claim that which is optimal by Theorem 4.1. What is interesting here is that for d ≤ 3 the decay rate is the same as for the heat equation! The claim follows from the estimate which has been shown in [40, Example 6.6], and from (42) in Remark 4.1. The proof of (43) in [40] is quite involved and makes use of Laplace transform methods.

L r -estimates and the critical dimension case
In this section we continue the study of the general subdiffusion problem (32), (33) under the assumption that the kernel k is of type PC.
Lemma 5.1 Assume that (k, l) ∈ PC. Let t ≥ 0 be fixed. Then the function µ → s(t, µ) belongs to C ∞ (R + ) and in particular s(t, µ) is completely monotone w.r.t. µ. Moreover, Proof. Recall that s(t, µ) = s µ (t) solves the equation Since µ merely appears as coefficient in front of the second term, it is clear that the dependence of the solution s µ (t) on the parameter µ is C ∞ . Differentiating w.r.t. µ gives ∂ µ s µ + µ(∂ µ s µ * l) + s µ * l = 0, which is equivalent to the integro-differential equation The property ∂ µ s µ (0) = 0 follows from the fact that s µ (0) = 1 for all µ ≥ 0. Note also that ∂ µ s µ | µ=0 = −(1 * l)(t). Differentiating (46) w.r.t. µ leads to Differentiating further, a simple induction argument shows that Assertion (44) follows then by means of induction from (47) and the fact that the solution v of is nonnegative, whenever f ∈ L 1, loc (R + ) enjoys this property, see e.g. [40] for the latter property. To see (45), we apply Taylor's theorem to the function µ → s(t, µ), thereby obtaining that for every n ∈ N for some η ∈ ( µ 2 , µ). In view of (44) every summand on the right-hand side of (48) is nonnegative. This implies which in turn yields (45).
Proof. The assertion on the structure of ∂ β ξ m 0 can be proved by induction over |β|. If |β| = 0, then ∂ β ξ m 0 (ξ) = m 0 (ξ) = ψ(|ξ| 2 ), which is of the desired form with j = 0. Suppose now that the assertion is true for all β ∈ N d 0 of the same fixed order . . , d}. By the induction hypothesis, ∂ ξ l ∂ β ξ m 0 is a finite sum of first order partial derivatives w.r.t. ξ l of terms of the form described in (49). Let us consider such a term. If γ l = 0 we have whereas in case γ l > 0 we obtain The first term on the right-hand side of (50) has the desired form, since with γ ′ i := γ i , i = l and γ ′ l = γ l + 1, we have by the induction hypothesis The second term has the desired form as well, since setting The second part of Lemma 5.3 follows from the first one and Lemma 5.2. In fact, for any term T (ξ) of the form (49) Lemma 5.2 yields the estimate It is now evident that m(ξ) satisfies Mihlin's condition with a constant M that merely depends on the dimension d.
Relying on Lemma 5.3, we are now able to prove the following generalization of Theorem 3.3.
Theorem 5.1 Let (k, l) ∈ PC and suppose that u is given by u(t) = Z(t) ⋆ u 0 , where u 0 is as described below.
(iii) We know already that m(ξ) = ψ 1 (|ξ| 2 )[(1 * l)(t)] is an L r (R d )-Fourier multiplier for all r ∈ (1, ∞) with a constant that only depends on r and d, that is, the operator T defined by T f = F −1 (mF f ) (F denoting the Fourier transform) on a suitable dense subset of L r (R d ) is L r (R d )-bounded, thus extends to an operator T ∈ B(L r (R d )), and |T | B(Lr) ≤ M (d, r). The weak L r -spaces can be obtained from the strong ones by real interpolation. Assuming 1 < r < ∞ we may choose r 1 ∈ (1, r), r 2 ∈ (r, ∞), and θ ∈ (0, 1) such that 1 r = 1−θ r1 + θ r2 . By [36, Theorem 1.18.2], we then have (L r1 , L r2 ) θ,∞ = L r,∞ . It follows that T ∈ B(L r,∞ (R d )), with a norm bound that only depends on r and d.
We choose r = d d−2 . Then 1 − 1 r = 2 d , and the Hardy-Littlewood-Sobolev theorem ([13, Thm. 6.1.3]) implies that (−∆) −1 u 0 ∈ L r,∞ (R d ). Note that u 0 ∈ L 1 (R d ) and so we only get an estimate in a weak L r -space. The assertion now follows from (51) with δ = 1 and the fact that T ∈ B(L r,∞ (R d )), with a norm bound that is independent of t > 0.
The following result generalizes Corollary 3.1 and is a direct consequence of the previous theorem. The proof is analogous to the one of Corollary 3.1.
As a special case we obtain the expected weak L 2 -decay estimate in the case of the critical dimension d = 4, which was missing in Theorem 4.2.
Corollary 5.2 Let d = 4, u 0 ∈ L 1 (R d ), and assume that (k, l) ∈ PC. Suppose that u is given by Remark 5.1 Similarly as in Section 4 the decay results from this section might not be optimal if the condition (39) is violated, but can be improved provided one has an estimate (41) where ψ(t) increases faster than (1 * l)(t) as t → ∞. In this case, the above statements from Lemma 5.2, Lemma 5.3, Theorem 5.1, Corollary 5.1, and Corollary 5.2 remain valid when (1 * l)(t) is replaced by ψ(t).

Decay estimates via the energy method
In this section we study a time-fractional diffusion equation with a more general elliptic operator in divergence form. To be precise, we consider the problem Here, we assume α ∈ (0, 1), for all T, R > 0, and ∃ν > 0 such that . We say that a function u : (0, ∞) × R d → R is a global weak solution of (52), (53) if for any T, R > 0, and for any test function Theorem 6.1 Let d ∈ N, α ∈ (0, 1), u 0 ∈ L 1 (R d ) ∩ L 2 (R d ), and suppose that condition (H) is satisfied. Let u be a global weak solution of (52), (53), and assume in addition that |∇u|, u 2 ∈ L 1,loc ([0, ∞); L 1 (R d )). Then The lemma follows from a straightforward computation. We remark that (56) remains valid for singular kernels k, like e.g. k = g 1−α with α ∈ (0, 1), provided that u is sufficiently smooth.
The following result is new and a very useful implication of the fundamental identity.
From this and Hölder's inequality, we infer that for a.a. t ∈ (0, T ) This proves the lemma.
We are now in position to prove Theorem 6.1.
This finishes the proof of Theorem 6.1.
Note that the decay rate for the L 2 -norm in (55) is strictly less than the one we obtained in Section 3, see (29). However, sending d → ∞ the decay rate in (55) becomes α, which is precisely the decay rate for the L 2 -norm in (29) for d > 4. This phenomenon of a smaller decay rate in the variational setting does not occur in the case α = 1. In fact, proceeding similarly as above in the latter case one obtains the differential inequality ∂ t |u(t)| 2 + µ|u(t)| γ 2 ≤ 0, t > 0, with the same γ (> 1) and µ as before. This implies On the other hand, the L p (R d )-norm of the Gaussian heat kernel H(t, x) = (4πt) −d/2 exp(− |x| 2 4t ) decays as |H(t)| p t − d 2 (1− 1 p ) , t > 0, 1 ≤ p ≤ ∞.
Assuming u 0 ∈ L 1 (R d ), Young's inequality implies that the L 2 (R d )-norm of u(t) = H(t) ⋆ u 0 decays as |u(t)| 2 t − d which is precisely the decay rate we obtain by means of energy estimates. It is an interesting open problem whether the decay rate in (55) is optimal in general. If this was not the case how could it be upgraded?