Non-branching geodesics and optimal maps in strong CD(K, ∞)-spaces

We prove that in metric measure spaces where the entropy functional is K-convex along every Wasserstein geodesic any optimal transport between two absolutely continuous measures with finite second moments lives on a non-branching set of geodesics. As a corollary we obtain that in these spaces there exists only one optimal transport plan between any two absolutely continuous measures with finite second moments and this plan is given by a map. The results are applicable in metric measure spaces having Riemannian Ricci-curvature bounded below, and in particular they hold also for Gromov-Hausdorff limits of Riemannian manifolds with Ricci-curvature bounded from below by some constant.


Introduction
Ricci curvature lower bounds in general metric measure spaces were studied by the second author in [16,17] and at the same time with a similar approach by Lott and Villani in [12]. There the lower bound K ∈ R on the Ricci curvature, without reference to the dimension of the metric measure space, was defined as K-convexity of the entropy functional along Wasserstein geodesics (see Section 2 for details). These spaces are called (weak) CD(K, ∞)-spaces. The word weak is sometimes used to emphasize that the convexity is required only along one geodesic between any two given probability measures.
A property of the CD(K, ∞)-spaces which complicates the theory is the possibility to have branching geodesics. For example the space R^2 with the l^∞-norm is a CD(0, ∞)-space and it has lots of branching geodesics. Being the limit as p → ∞ (in any reasonable sense) of the spaces R^2 with the l^p-norm, which are non-branching CD(K, ∞)-spaces, this example in particular illustrates that being non-branching is not a stable property. A number of results in CD(K, ∞)-spaces have been proven only under the extra assumption that there are no branching geodesics in the space. Although in some of the results this assumption has recently been removed (see for example [15,14]), in many it still remains.
Because branching geodesics are hard to deal with, it is reasonable to consider more restrictive definitions that exclude spaces with branching geodesics or at least limit the amount of branching that can occur. One of the essential properties of the definitions of Ricci curvature lower bounds in metric measure spaces is the stability under the measured Gromov-Hausdorff convergence. Therefore any stable definition that extends the Riemannian case should include the limit spaces of Riemannian manifolds with uniform Ricci curvature lower bounds. While Riemannian manifolds are known to be non-branching, to our knowledge it is not known whether this holds for their limit spaces. The spaces we consider in this paper cover also these limit spaces. Although our result does not rule out the possibility to have some branching geodesics, it still says that there has to be so few branching geodesics that optimal transports between any two absolutely continuous measures do not see them.
Before stating our main result we fix some terminology. In this paper we will always assume (X, d) to be a complete separable geodesic metric space and m to be a locally finite Borel measure. We refer to Section 2 for some details on optimal mass transportation and the CD(K, ∞) condition in such spaces. We call a space (X, d, m) essentially non-branching if for every µ_0, µ_1 ∈ P_2(X) which are absolutely continuous with respect to m we have that any π ∈ OptGeo(µ_0, µ_1) is concentrated on a set of non-branching geodesics. The space (X, d, m) is said to be a strong CD(K, ∞)-space if the entropy Ent_m is K-convex along every π ∈ OptGeo(µ_0, µ_1) for every µ_0, µ_1 ∈ P_2(X).
Theorem 1.1. Every strong CD(K, ∞)-space is essentially non-branching.
Among the cases covered by Theorem 1.1 are the RCD(K, ∞)-spaces, that is, spaces with Riemannian Ricci curvature bounded from below by some constant K ∈ R. These spaces were recently defined and studied in [3,4], see also [2]. They were obtained by reinforcing the CD(K, ∞)-spaces with a requirement that the local structure of the space must be Hilbertian. The Hilbertian structure immediately rules out spaces like the above mentioned R^2 with the l^∞-norm. In [4] it was shown that one of the equivalent formulations of RCD(K, ∞)-spaces is that any probability measure with finite second moment is the starting point of an EVI_K-gradient flow of the entropy. This condition is known to imply K-convexity of the entropy along every geodesic [8].
In [4] it was also proven that the definition of RCD(K, ∞) with a finite reference measure is stable under the measured Gromov-Hausdorff convergence (or under the D-convergence introduced in [16]). Later in [10] the stability was proven with more general reference measures. When we combine the stability with Theorem 1.1 we arrive at the following corollary which in fact was our main motivation to write this paper.
Corollary 1.2. The RCD(K, ∞) condition is stable under measured Gromov-Hausdorff convergence (or under the D-convergence) and it implies essential non-branching.
As we already mentioned, we are not aware of any other non-branching results even for the Gromov-Hausdorff limits of Riemannian manifolds with Ricci curvature bounded from below by some constant.
Another corollary of our result is that the RCD(K, ∞) condition implies the formulation of CD(K, ∞) that was used by Lott and Villani [12]. They required convexity type inequalities for a class of functionals instead of just Ent_m, and hence their definition was at least a priori more restrictive than the definition of CD(K, ∞) we use here, following [16].
Corollary 1.3 is proven exactly as in the non-branching CD(K, ∞)-spaces. For the proof, see for instance [18, Theorem 30.32] or [17, Proposition 4.2].
By inspecting the proof of [9, Theorem 3.3], where the existence of optimal maps in non-branching CD(K, ∞)-spaces was shown, we can make a refinement of the statement of Theorem 1.1. Regarding this refinement, stated in Corollary 1.4, we note that in [9] there was an extra assumption for the measures µ_0 and µ_1 to have finite entropy. This assumption was needed for showing that all measures that are absolutely continuous with respect to a measure π ∈ OptGeo(µ_0, µ_1) satisfying (2.2) also satisfy (2.2). Here this conclusion is already part of the assumption, so finiteness of the initial and final entropies is not needed.
Corollary 1.4. Let (X, d, m) be a strong CD(K, ∞)-space. Then for every µ_0, µ_1 ∈ P_2(X) that are absolutely continuous with respect to m there is a unique π ∈ OptGeo(µ_0, µ_1) and it is induced by a map.
Although Corollary 1.4 follows quite easily from the proof of [9, Theorem 3.3] together with Theorem 1.1 we will give at the end of the paper an outline of the proof together with some details that are different from the proof by Gigli. In the classical situation of Euclidean spaces the existence of optimal transport maps was proven by Brenier in [7]. Since then there have been several generalizations of this result. Most relevant in the context of this paper are the generalizations of McCann [13] for Riemannian manifolds, of Bertrand [6] for Alexandrov spaces, and the most recent results of Gigli [9] in non-branching CD(K, N ) and CD(K, ∞)-spaces and of Ambrosio and the first author [5] in strongly non-branching metric spaces. Notice that as in the approach by Gigli [9] our proof for the existence of optimal transport maps does not use Kantorovich potentials.

Preliminaries
Let us first recall some definitions and results related to optimal mass transportation and Ricci curvature lower bounds using optimal mass transportation. More details on this subject can be found for example in the book by Villani [18].
In this paper we always work in a complete separable geodesic metric space (X, d) equipped with a locally finite Borel measure m.

2.1. Optimal mass transportation. We denote by P(X) the set of all the probability measures defined on the σ-algebra consisting of universally measurable sets of X. We denote by P_2(X) the subset of P(X) consisting of probability measures with finite second moments. We equip the space P_2(X) with the Wasserstein 2-distance, which for any two measures µ_0, µ_1 ∈ P_2(X) is defined as
\[ W_2(\mu_0, \mu_1) = \left( \inf_{\sigma} \int_{X \times X} d(x, y)^2 \, d\sigma(x, y) \right)^{1/2}, \tag{2.1} \]
where the infimum is taken over all σ ∈ P_2(X × X) with µ_0 = (p_1)_# σ and µ_1 = (p_2)_# σ.
Here, and later on, p_k denotes the projection to the k-th coordinate. We call a plan σ ∈ P_2(X × X) that minimizes (2.1) an optimal plan. Optimal plans exist under very mild assumptions. In contrast, the existence of optimal maps is rare. By an optimal map we mean a Borel map T : X → X for which the plan G_# µ_0 given by the graph G(x) = (x, T(x)) of T is optimal. For the arguments in this paper, it is crucial to notice that any subplan of an optimal plan is also optimal, in the sense that for σ optimal between µ_0 and µ_1 any σ̃ ≪ σ is optimal between (p_1)_# σ̃ and (p_2)_# σ̃.
Any optimal plan is concentrated on a cyclically monotone set M ⊂ X × X. This means that for any family (x_1, y_1), …, (x_n, y_n) ∈ M and any permutation p : {1, …, n} → {1, …, n} we have
\[ \sum_{i=1}^{n} d(x_i, y_i)^2 \le \sum_{i=1}^{n} d(x_i, y_{p(i)})^2. \]
Intuitively this just means that optimal plans cannot be improved.
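As a toy illustration of cyclical monotonicity, the following sketch brute-forces an optimal matching between two uniform discrete measures on the real line and checks that its support satisfies the inequality "sum of d(x_i, y_i)^2 is at most the sum of d(x_i, y_p(i))^2" for every subfamily and every permutation. The helper names (`cost`, `optimal_matching`, `is_cyclically_monotone`) and the sample points are hypothetical, and discrete measures are of course not absolutely continuous, so this only illustrates the definition and is not part of the argument of the paper.

```python
from itertools import combinations, permutations

def cost(pairs):
    """Total quadratic cost sum of |x - y|^2 over a family of pairs."""
    return sum((x - y) ** 2 for x, y in pairs)

def optimal_matching(xs, ys):
    """Brute-force minimal-cost matching between uniform measures on xs and ys.

    Assumes equal weights, so that some optimal plan is a matching.
    """
    n = len(xs)
    best = min(permutations(range(n)),
               key=lambda p: cost([(xs[i], ys[p[i]]) for i in range(n)]))
    return [(xs[i], ys[best[i]]) for i in range(n)]

def is_cyclically_monotone(pairs, tol=1e-12):
    """Check the defining inequality for every subfamily and permutation."""
    for k in range(2, len(pairs) + 1):
        for family in combinations(pairs, k):
            base = cost(family)
            for p in permutations(range(k)):
                permuted = cost([(family[i][0], family[p[i]][1])
                                 for i in range(k)])
                if base > permuted + tol:
                    return False
    return True

xs = [0.0, 1.0, 2.0, 5.0]
ys = [0.5, 3.0, 1.5, 4.0]
M = optimal_matching(xs, ys)
print(M, is_cyclically_monotone(M))
```

A matching that can be improved by swapping targets, such as the pairs (0, 3) and (1, 0.5), fails the check, which is the sense in which optimal plans cannot be improved.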
Our proof is heavily based on restricting a given π ∈ OptGeo(µ_0, µ_1). Without mentioning it every time, we use the fact that for a Borel f : Geo(X) → [0, ∞) with ∫ f dπ = 1 we have f π ∈ OptGeo((e_0)_#(f π), (e_1)_#(f π)), analogously to the restrictions of optimal plans. Another fact which we will repeatedly use is that (restr_t^s)_# π ∈ OptGeo((e_t)_# π, (e_s)_# π).

2.3. Ricci curvature lower bounds. The Ricci curvature lower bounds are defined using the entropy functional Ent_m : P(X) → [−∞, ∞], which is given by
\[ \mathrm{Ent}_m(\mu) = \int_X \rho \log \rho \, dm \]
for any absolutely continuous measure µ = ρm ∈ P(X) for which the positive part of ρ log ρ is integrable. For other measures in P(X) we define Ent_m(µ) = ∞. Notice that for an absolutely continuous measure with support in a set of finite m-measure the negative part of ρ log ρ is always integrable. As we will later see, we may always assume the measures to be supported in a set of finite m-measure in the proof of Theorem 1.1.
Following the definition in [16] we call a metric measure space (X, d, m) a (weak) CD(K, ∞)-space, with some K ∈ R, provided that for any µ_0, µ_1 ∈ P_2(X) that are absolutely continuous with respect to m there exists a geodesic (µ_t) ∈ Geo(P_2(X)) along which Ent_m is K-convex, that is,
\[ \mathrm{Ent}_m(\mu_t) \le (1 - t)\,\mathrm{Ent}_m(\mu_0) + t\,\mathrm{Ent}_m(\mu_1) - \frac{K}{2}\, t(1 - t)\, W_2^2(\mu_0, \mu_1) \tag{2.2} \]
holds for all t ∈ [0, 1]. We say that a functional is K-convex along π ∈ OptGeo(µ_0, µ_1) if it is K-convex along the corresponding geodesic (e_t)_# π. If the inequality (2.2) is required to hold for all geodesics (µ_t) ∈ Geo(P_2(X)) with endpoints absolutely continuous with respect to m, the space is called a strong CD(K, ∞)-space.
In the proofs we will need Ent_m to be convex along restrictions f π. If the space (X, d, m) were only a weak CD(K, ∞)-space, K-convexity along restrictions would not be guaranteed. However, in strong CD(K, ∞)-spaces it follows directly from the definition.
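To make the K-convexity condition concrete, here is a small numerical sketch on the real line with the Lebesgue measure, assuming the usual form of the inequality, Ent(µ_t) ≤ (1 − t) Ent(µ_0) + t Ent(µ_1) − (K/2) t(1 − t) W_2^2(µ_0, µ_1). It uses three standard facts about one-dimensional Gaussians: the entropy of N(a, s^2) is −(1/2) log(2πe s^2), the Wasserstein geodesic between N(a_0, s_0^2) and N(a_1, s_1^2) is Gaussian with linearly interpolated mean and standard deviation, and W_2^2 = (a_0 − a_1)^2 + (s_0 − s_1)^2. The function names are hypothetical and the example is only an illustration.

```python
import math

def gaussian_entropy(s):
    """Integral of rho log rho for the density of N(a, s^2) w.r.t. Lebesgue measure."""
    return -0.5 * math.log(2 * math.pi * math.e * s * s)

def w2_squared(a0, s0, a1, s1):
    """Squared Wasserstein 2-distance between N(a0, s0^2) and N(a1, s1^2)."""
    return (a0 - a1) ** 2 + (s0 - s1) ** 2

def convexity_gap(t, a0, s0, a1, s1, K=0.0):
    """Right-hand side minus left-hand side of the K-convexity inequality at time t.

    The inequality holds at time t exactly when the returned value is >= 0.
    """
    s_t = (1 - t) * s0 + t * s1  # standard deviation of the interpolated Gaussian
    lhs = gaussian_entropy(s_t)
    rhs = ((1 - t) * gaussian_entropy(s0) + t * gaussian_entropy(s1)
           - 0.5 * K * t * (1 - t) * w2_squared(a0, s0, a1, s1))
    return rhs - lhs

times = [k / 20 for k in range(21)]
gaps_K0 = [convexity_gap(t, 0.0, 1.0, 3.0, 2.5, K=0.0) for t in times]
gaps_K1 = [convexity_gap(t, 0.0, 1.0, 3.0, 1.0, K=1.0) for t in times]
print(min(gaps_K0), min(gaps_K1))
```

With K = 0 the gap is nonnegative along the whole geodesic, reflecting the convexity of −log. With K = 1 and a pure translation the gap becomes negative in the interior of the interval, so the inequality with K = 1 fails in this example.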
Figure 1. The idea in the proof of Theorem 1.1 is to find two sets of geodesics Γ_1 and Γ_2 so that the transport supported on the union of the geodesics is a measure π satisfying the assumption of Theorem 1.1 and so that the two sets of geodesics agree until time t_1 and then branch out so that they become totally separated after time t_2. The entropies along the set Γ_1 and along the set Γ_2 are illustrated by the solid K-convex graphs. The entropy along Γ_1 ∪ Γ_2 is illustrated by the dashed graph. The non-K-convexity of this graph due to the drop of log 2 in the entropy contradicts the assumption of the theorem.

Proof of Theorem 1.1
The idea of the proof is similar to the proof of [15, Theorem 4]. We prove the claim by contradiction. First we find two geodesics in the Wasserstein space which start as the same geodesic and then branch out to two completely disjoint ones. By comparing the two geodesics separately and, on the other hand, their sum as one geodesic, we notice that there will be an extra drop of log 2 in the entropy during the time interval when the branching occurs. In order to arrive at a contradiction this branching has to happen in a small enough time interval, as in Figure 1.
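The drop of log 2 can be checked on a toy model. If two probability densities have disjoint supports, the entropy of their average equals the average of their entropies minus log 2, whereas if the two densities coincide the entropy of the average is unchanged. The sketch below verifies this for densities with respect to the counting measure on a six-point set; the reference measure, the function names, and the sample densities are assumptions made purely for illustration.

```python
import math

def entropy(density):
    """Sum of rho log rho over a finite set with counting reference measure."""
    return sum(r * math.log(r) for r in density if r > 0)

def average(d1, d2):
    """Density of the measure (mu_1 + mu_2) / 2."""
    return [(a + b) / 2 for a, b in zip(d1, d2)]

# Two probability densities with disjoint supports (after branching).
up = [0.5, 0.3, 0.2, 0.0, 0.0, 0.0]
down = [0.0, 0.0, 0.0, 0.1, 0.6, 0.3]

mean_of_entropies = 0.5 * (entropy(up) + entropy(down))
drop_disjoint = mean_of_entropies - entropy(average(up, down))

# Before branching the two measures coincide, and nothing is gained.
drop_equal = entropy(up) - entropy(average(up, up))

print(drop_disjoint, math.log(2), drop_equal)
```

In the proof the two regimes occur at the two ends of the short time interval in which the branching happens, which is why the interval has to be small compared with the K-convexity bound.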
There are two difficult steps in the proof before we arrive at the contradiction mentioned above. First of all we have to find the two geodesics that branch out to two completely separate ones. Once we have found them, we have to restrict the measures so that branching happens in a small enough interval. Here the difficulty is that when we restrict the measures, also their marginals change. By choosing the correct restrictions we can overcome this problem.
Let us start the proof with a simple lemma. It allows us to select the two disjoint geodesics needed for the contradiction. The geodesics will be selected using a probability measure on the product space Geo(X) × Geo(X). Notice that the constant 1/5 in the lemma is not sharp, however it is sufficient for our use.
Proof. We will first reduce the general case to the case where X is a finite set. Take ε > 0. Because the diagonal {(x, x) : x ∈ X} has zero σ-measure, there exists δ > 0 so that
\[ \sigma\big(\{(x, y) \in X \times X : d(x, y) < \delta\}\big) < \epsilon. \tag{3.1} \]
Partition the space X into a countable collection of Borel sets (Q_i)_i with diameter at most δ/2. Now there exists some n ∈ N so that Therefore, by combining (3.1) and (3.2) we have and so by forgetting a part of the space with arbitrarily small measure we may assume the space X to consist of n points. The existence of the set E in the case where X consists of n points follows easily: There are a total number of 2^n − 2 ways to select a non-empty set E ⊊ X and for any pair of points (i, j) with i ≠ j there are 2^{n−2} sets E ⊂ X with i ∈ E and j ∉ E. As a consequence and so there has to be a set E ⊂ X with Taking ε > 0 sufficiently small finishes the proof.
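The counting step in the finite case can be checked by brute force. The sketch below assumes a probability measure σ on pairs of points of {0, …, n − 1} that gives no mass to the diagonal, enumerates all 2^n − 2 non-empty proper subsets E, and sums σ(E × (X ∖ E)). Since each off-diagonal pair is separated by exactly 2^{n−2} of these sets, the sum equals 2^{n−2}, so some set E has mass at least 2^{n−2}/(2^n − 2) > 1/4, which is consistent with the constant 1/5 once a small part of the measure has been discarded. The function names and the random test measure are hypothetical.

```python
import random
from itertools import combinations

def cut_mass(sigma, E):
    """sigma(E x complement of E) for a measure given as an n-by-n matrix."""
    n = len(sigma)
    return sum(sigma[i][j] for i in E for j in range(n) if j not in E)

def all_cut_masses(sigma):
    """Map every non-empty proper subset E to sigma(E x complement of E)."""
    n = len(sigma)
    subsets = [frozenset(c) for k in range(1, n)
               for c in combinations(range(n), k)]
    return {E: cut_mass(sigma, E) for E in subsets}

def random_off_diagonal_measure(n, seed=0):
    """Random probability measure on pairs (i, j) with zero mass on the diagonal."""
    rng = random.Random(seed)
    w = [[rng.random() if i != j else 0.0 for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

n = 5
sigma = random_off_diagonal_measure(n)
masses = all_cut_masses(sigma)
total_mass = sum(masses.values())
best_E = max(masses, key=masses.get)
print(len(masses), total_mass, sorted(best_E), masses[best_E])
```

The maximum is at least the average over all admissible sets E, which is the pigeonhole step used in the proof.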
Now we are ready to continue with the proof of Theorem 1.1. The first reduction steps contain ideas which were used in the proofs of [9, Theorem 3.3] and [15,Theorem 4]. The rest of the proof is then close to that of [15,Theorem 4].
Proof of Theorem 1.1. Let us give the idea behind each step of the proof. We will prove the claim by contradiction, so we have a measure π that does not live on a non-branching set of geodesics. The first step is to restrict the measures in time and space to live inside a sufficiently small ball. This will help in the last step of the proof in estimates involving the extra term coming from K-convexity when K ≠ 0. In the second step we produce via disintegration a measure σ on the product space Geo(X) × Geo(X) which gives positive measure to pairs of geodesics which start as the same geodesic but after some time T branch out.
In the third step we push forward the measure σ on the product space to live on more suitable pairs of geodesics. The pairs of geodesics which we want to avoid are the ones that branch out and come back together infinitely often. This step will be needed only to achieve the reductions in the next step. In the fourth step of the proof we restrict the measure to geodesics which stay disjoint at least for some time δ > 0 (independent of the geodesics) immediately after they branch out for the first time. We also restrict the measure so that its marginals have bounded densities.
In the fifth step we find, for a given ε > 0, a time t so that during the time interval [t, t + ε] the measure sees lots of branching. In the sixth step we first use Lemma 3.1 to obtain two disjoint sets of geodesics so that their product has large measure. After this we further restrict the measure so that we can project the product measure to two measures π^up and π^down which have disjoint supports at time t + ε. Finally, in the seventh and last step we compare the entropies along π^up, π^down and π^up + π^down to obtain the contradiction which we already mentioned in the beginning of this section and in Figure 1.
Step 1: Localization to a small ball. Assume that Theorem 1.1 is not true. Then there exist µ_0, µ_1 ∈ P_2(X) and an optimal π ∈ OptGeo(µ_0, µ_1) that is not concentrated on non-branching geodesics. Because the space (X, d) is separable and the measure m locally finite, we can cover (X, d) with a countable collection of balls B(x_i, l_i/4) so that and m(B(x_i, l_i)) < ∞. Since π is not concentrated on non-branching geodesics there exist some i ∈ N and L > l_i so that π has some branching inside B(x_i, l_i/4) along geodesics with length at most L. That is, π(Γ) < 1 for every Γ ⊂ Geo(X) satisfying the following: if γ^1, γ^2 ∈ Γ so that γ^1_s = γ^2_s for all s ∈ [0, t_0] with γ^1_{t_0} ∈ B(x_i, l_i/4) and l(γ^1) ≤ L, then γ^1_s = γ^2_s also for all s for which γ^1_s ∈ B(x_i, l_i/4) or γ^2_s ∈ B(x_i, l_i/4). Therefore there exists t_1 ∈ (0, 1 − l_i/(4L)) so that π^r = (restr with π(Γ^r) > 0 defined as is not concentrated on non-branching geodesics. Now the measure π^r is supported on a set of geodesics that live inside the ball B(x_i, l_i/2). Thus without loss of generality, we may assume from the beginning that the original measure π is concentrated on geodesics living inside some ball B(x, l/2) ⊂ X with l having the same bound from above as l_i in (3.3) and m(B(x, l)) < ∞.
Step 2: Disintegrating the branching measure. We claim that from the assumption that π is not concentrated on a non-branching set of geodesics it follows that there exists some T ∈ (0, 1) so that the measure π_γ is not a Dirac mass for a (restr_0^T)_# π-positive set of curves γ ∈ Geo(X), where {π_γ} ⊂ P(Geo(X)) is the disintegration of π with respect to restr_0^T. The measure π_γ not being a Dirac mass means that it is concentrated on geodesics which coincide with γ in [0, T] and that a set of geodesics of positive π_γ-measure branch after time T. We prove the claim by contradiction. Assume that it is not true, so that for any T ∈ (0, 1) the measures π_γ are Dirac masses for (restr_0^T)_# π-almost every curve γ ∈ Geo(X). Take ε > 0. Since Geo(X) is separable there exists a countable Borel decomposition of Geo(X) into disjoint sets (A_{i,ε})_{i∈I} with diam(A_{i,ε}) < ε. For each i ∈ I the map M_{i,ε,T} : Geo(X) → [0, 1] : γ ↦ π_γ(A_{i,ε}) is Borel measurable and so M_{i,ε,T}^{-1}({1}) is a Borel set. Since π_γ are probability measures, the sets (M_{i,ε,T}^{-1}({1}))_{i∈I} are disjoint. By the assumption that almost every π_γ is a Dirac mass the Borel set
Figure 2. The measure σ is constructed by integrating up the measures π_γ × π_γ. Since branching for π_γ occurs only after time T, all the measures π_γ × π_γ and hence also the measure σ live on the diagonal of X × X until time T.
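Consider a measure σ ∈ P(Geo(X) × Geo(X)) obtained by integrating up the disintegrated measures π_γ as product measures π_γ × π_γ. That is, for any Borel measurable f : Geo(X) × Geo(X) → [0, ∞] we have
\[ \int f \, d\sigma = \int_{\mathrm{Geo}(X)} \int_{\mathrm{Geo}(X) \times \mathrm{Geo}(X)} f(\gamma^1, \gamma^2) \, d(\pi_\gamma \times \pi_\gamma)(\gamma^1, \gamma^2) \, d\big((\mathrm{restr}_0^T)_\# \pi\big)(\gamma). \]
The measures π_γ × π_γ are illustrated in Figure 2.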
Step 3: Reducing the number of branching points of pairs of geodesics. It might happen that part of the measure σ lives on pairs of geodesics (γ^1, γ^2) for which the set {t ∈ [0, 1] : γ^1_t = γ^2_t} consists of infinitely many intervals. We want to exclude this behaviour so that branching geodesics stay disjoint for a positive time immediately after branching. We will do this by a measurable selection.
We start by defining the subsets D, B, B_1 of Geo(X)^2. The first one is the diagonal set
D := {(γ^1, γ^2) ∈ Geo(X)^2 : γ^1 = γ^2},
the second one the set of pairs of geodesics branching after time T,
B := {(γ^1, γ^2) ∈ Geo(X)^2 \ D : restr_0^T γ^1 = restr_0^T γ^2},
and the final one the set of pairs of geodesics that branch exactly once after time T,
B_1 := {(γ^1, γ^2) ∈ B : the set {t ∈ [0, 1] : γ^1_t = γ^2_t} is an interval}.
The subsets D and B are clearly Borel. Let us show that B_1 is also Borel. For every ε, t_1, t_2 > 0 define the closed sets ) and let End^{-1} : End(B_1) → B_1 be its Suslin measurable right-inverse given by the Jankoff theorem [11]. (The set B_1 is a Suslin space as a Borel subset of a Polish space. The mapping End is continuous and thus End(B_1) is also a Suslin space. Suslin subsets of a Polish space are universally measurable and so Suslin measurability suffices for our considerations.) Now consider the Suslin measurable map Br : Geo(X)^2 → Geo(X)^2 given by otherwise.
Step 4: Further restrictions of the measure σ. Next we will show that by restricting and rescaling the measure σ we may assume that there exist S ∈ (T, 1), δ ∈ (0, (1 − S)/2) and C > 0 so that Br
Figure 3. The mapping Br takes a branching pair of geodesics (γ^1, γ^2) to another pair of geodesics (γ^3, γ^4) that have the same endpoints, but branch only once.
Step . Roughly speaking f (t) gives the amount of branching that will occur after time t.
Step 7: Comparing entropies along different geodesics. Now, similarly as in [15], we can estimate using the K-convexity first along the measure (π^up + π^down)/(2w) between times 0, t and t + ε, and then separately along π^up/w and π^down/w between times t, t + ε and 1 to get
Remark 3.2. In the proof of Theorem 1.1 we worked with two fixed marginals µ_0, µ_1. Therefore the assumption in Theorem 1.1 of being a strong CD(K, ∞)-space could be weakened. The slightly stronger (but more complicated-looking) version of Theorem 1.1 would be the following.

Proof of Corollary 1.4
Let us now outline the proof of Corollary 1.4. As usual, it suffices to prove that every optimal plan is given by a map. This implies uniqueness of optimal plans because for any π^1, π^2 ∈ OptGeo(µ_0, µ_1) also (π^1 + π^2)/2 ∈ OptGeo(µ_0, µ_1), and this average can be induced by a map only if π^1 = π^2. Suppose then that there are µ_0, µ_1 ∈ P_2(X) which are absolutely continuous with respect to m and that there is π ∈ OptGeo(µ_0, µ_1) which is not induced by an optimal map.
Because µ_0 = ρ_0 m is absolutely continuous, the union ⋃_{C≥0} Γ_C of the sets Γ_C = {γ ∈ Geo(X) : ρ_0(γ_0) ≤ C} has full π-measure. Therefore for some C ≥ 0 the measure π|_{Γ_C} is not induced by a map. Hence we may assume µ_0 to have bounded density. Similarly we may assume µ_1 to have bounded density. Exhausting the space X with larger and larger balls we may also assume µ_0 and µ_1 to have bounded supports. Therefore we may assume Ent_m(µ_0), Ent_m(µ_1) ∈ R.
In the proof of [9, Theorem 3.3] Gigli finds two probability measures π^1, π^2 ≪ π with π^1 ⊥ π^2 and (e_0)_# π^1 = (e_0)_# π^2 = m|_D for a compact D ⊂ X with m(D) > 0. But by [9, Lemma 3.2], which holds also under the strong CD(K, ∞) assumption, we have lim inf for the densities ρ^i_t of (e_t)_# π^i. Therefore for some small time t ∈ (0, 1) the sets {ρ^1_t > 0} and {ρ^2_t > 0} must intersect in a set E of positive m-measure. So far we have followed the proof of [9, Theorem 3.3]. Now the final step in the case of non-branching CD(K, ∞)-spaces would be to say that no two different geodesics in the support of an optimal transport can intersect. In essentially non-branching spaces this conclusion is not so clear, so we argue differently.
The heuristic idea is to mix the measures π^1 and π^2 so that at time t we are allowed to change from the geodesics where π^1 lives to the geodesics where π^2 lives, and vice versa.

Now let {π^left_x} be the disintegration of π^left with respect to e_1 and let {π^right_x} be the disintegration of π^right with respect to e_0.
Observe that the mapping