Mean square rate of convergence for random walk approximation of forward-backward SDEs

Abstract Let (Y, Z) denote the solution to a forward-backward stochastic differential equation (FBSDE). If one constructs a random walk $B^n$ from the underlying Brownian motion B by Skorokhod embedding, one can show $L_2$-convergence of the corresponding solutions $(Y^n,Z^n)$ to $(Y, Z).$ We estimate the rate of convergence based on smoothness properties, especially for a terminal condition function in $C^{2,\alpha}$. The proof relies on an approximative representation of $Z^n$ and uses the concept of discretized Malliavin calculus. Moreover, we use growth and smoothness properties of the partial differential equation associated to the FBSDE, as well as of the finite difference equations associated to the approximating stochastic equations. We derive these properties by probabilistic methods.


Introduction
Let $(\Omega, \mathcal{F}, P)$ be a complete probability space carrying the standard Brownian motion $B = (B_t)_{t \ge 0}$ and assume that $(\mathcal{F}_t)_{t \ge 0}$ is the augmented natural filtration. Let $(Y, Z)$ be the solution of the forward-backward stochastic differential equation (FBSDE)
$$X_s = x + \int_0^s b(r, X_r)\,dr + \int_0^s \sigma(r, X_r)\,dB_r, \qquad Y_s = g(X_T) + \int_s^T f(r, X_r, Y_r, Z_r)\,dr - \int_s^T Z_r\,dB_r, \qquad 0 \le s \le T. \tag{1}$$
Let $(Y^n, Z^n)$ be the solution of the FBSDE if the Brownian motion $B$ is replaced by a scaled random walk $B^n$ given by
$$B^n_t := \sqrt{h}\,\sum_{i=1}^{\lfloor t/h \rfloor} \varepsilon_i, \qquad 0 \le t \le T, \tag{2}$$
where $h = \frac{T}{n}$ and $(\varepsilon_i)_{i=1,2,\dots}$ is a sequence of independent and identically distributed (i.i.d.) Rademacher random variables. Then $(Y^n, Z^n)$ solves the discretized FBSDE
$$Y^n_s = g(X^n_T) + \int_{(s,T]} f(r, X^n_{r-}, Y^n_{r-}, Z^n_{r-})\,d[B^n]_r - \int_{(s,T]} Z^n_{r-}\,dB^n_r, \qquad 0 \le s \le T.$$
The aim of this paper is to study the rate of the $L_2$-approximation of $(Y^n_t, Z^n_t)$ to $(Y_t, Z_t)$ when $X$ satisfies (1). For this, we generate the random walk $B^n$ by Skorokhod embedding from the Brownian motion $B$. In this case the $L_p$-convergence of $B^n$ to $B$ is of order $h^{\frac14}$ for any $p > 0$. The special case $X = B$ has already been studied in [22], assuming a locally $\alpha$-Hölder continuous terminal condition function $g$ and a Lipschitz continuous generator. An estimate for the rate of convergence was obtained which is of order $h^{\frac{\alpha}{4}}$ for the $L_2$-norm of $Y^n_t - Y_t$, and of order $h^{\frac{\alpha}{4}}/\sqrt{T-t}$ for the $L_2$-norm of $Z^n_t - Z_t$. In the present paper, where we assume that $X$ is a solution of the stochastic differential equation (SDE) in (1), rather strong conditions on the smoothness and boundedness of $f$ and $g$ and also of $b$ and $\sigma$ are needed. In Theorem 3.1, the main result of the paper, we show that the convergence rate for $(Y^n_t, Z^n_t)$ to $(Y_t, Z_t)$ in $L_2$ is of order $h^{\frac14}$.

The paper is organized as follows. Section 2 contains the setting, the main assumptions, and the approximative representation $\hat Z^n$ of $Z^n$. Our main results about the approximation rate for the case of no generator (i.e. $f = 0$) and for the general case are in Section 3. One can see that, in contrast to what is known for time discretization schemes, for random walk schemes the Lipschitz generator seems to cause more difficulties than the terminal condition: while in the case $f = 0$ we need that $g'$ is locally $\alpha$-Hölder continuous, in the case $f \neq 0$ this property is required for $g''$. In Section 4 we recall some needed facts about Malliavin weights, the regularity of solutions to BSDEs, and properties of the associated partial differential equations (PDEs). Finally, we sketch there how to prove growth and smoothness properties of solutions to the finite difference equation associated to the discretized FBSDE. Section 5 contains technical results which mainly arise from the fact that the construction of the random walk by Skorokhod embedding forces us to compare our processes on different 'timelines', one coming from the stopping times of the Skorokhod embedding, the other from the equidistant deterministic times due to the quadratic variation process $[B^n]$.

The SDE and its approximation scheme
We introduce the forward SDE for $X$ and its discretized counterpart $X^n$, where $(\varepsilon_i)_{i=1,2,\dots}$ is a sequence of i.i.d. Rademacher random variables. Letting $\mathcal{G}_k := \sigma(\varepsilon_1, \dots, \varepsilon_k)$ (with $\mathcal{G}_0$ trivial), it follows that the associated discrete-time random walk $(B^n_{t_k})_{k=0}^n$ is $(\mathcal{G}_k)_{k=0}^n$-adapted. Recall (2) and $h = \frac{T}{n}$. If we extend the sequence $(X^n_{t_k})_{k \ge 0}$ to a process in continuous time by defining $X^n_t := X^n_{t_k}$ for $t \in [t_k, t_{k+1})$, it is the solution of the forward SDE (3).

We formulate our first assumptions. Assumption 2.1(ii) will not be used explicitly for our estimates, but it is required for Theorem 4.1 below.

Assumption 2.1. (i) $g$ is locally Hölder continuous with order $\alpha \in (0, 1]$ and polynomially bounded in the following sense: there exist $p_0 \ge 0$, $C_g > 0$ such that
$$|g(x) - g(y)| \le C_g\,(1 + |x|^{p_0} + |y|^{p_0})\,|x - y|^{\alpha}, \qquad x, y \in \mathbb{R}. \tag{6}$$
Notice that (6) implies $|g(x)| \le K(1 + |x|^{p_0 + \alpha})$ for some $K > 0$. From the continuity of $f$ we conclude a corresponding growth bound for $f$.

Notation:
• $\|\cdot\|_p := \|\cdot\|_{L_p(P)}$ for $p \ge 1$. For $p = 2$ we write simply $\|\cdot\|$.
• If a is a function, C(a) represents a generic constant which depends on a and possibly also on its derivatives.
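To make the preceding discretization concrete, the following minimal sketch simulates one path of the discretized forward equation. Since the displays for (3) and (4) are not reproduced here, the code assumes the standard Euler-type recursion $X^n_{t_{k+1}} = X^n_{t_k} + b(t_k, X^n_{t_k})h + \sigma(t_k, X^n_{t_k})\sqrt h\,\varepsilon_{k+1}$ driven by the Rademacher increments; the coefficient functions in the usage example are purely illustrative.

```python
import numpy as np

def simulate_X_n(b, sigma, x0, T, n, rng):
    """One path of the discretized forward equation, assuming the Euler-type
    recursion X^n_{t_{k+1}} = X^n_{t_k} + b*h + sigma*sqrt(h)*eps_{k+1}."""
    h = T / n
    eps = rng.choice([-1.0, 1.0], size=n)          # i.i.d. Rademacher variables
    t = np.linspace(0.0, T, n + 1)
    X = np.empty(n + 1)
    X[0] = x0
    for k in range(n):
        X[k + 1] = X[k] + b(t[k], X[k]) * h + sigma(t[k], X[k]) * np.sqrt(h) * eps[k]
    return t, X, eps

# illustrative coefficients (not taken from the paper)
rng = np.random.default_rng(0)
t, X, eps = simulate_X_n(lambda t, x: -x, lambda t, x: 1.0 + 0.1 * np.sin(x),
                         x0=1.0, T=1.0, n=200, rng=rng)
```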

The FBSDE and its approximation scheme
Recall the FBSDE (1) and its approximation (3). The backward equation in (3) can equivalently be written in the form (9) if one puts $X^n_r := X^n_{t_m}$, $Y^n_r := Y^n_{t_m}$, and $Z^n_r := Z^n_{t_m}$ for $r \in [t_m, t_{m+1})$.

Remark 2.1. Equations (3) and (9) do not contain any martingale orthogonal to the random walk $B^n$, since we are in a special case where the orthogonal martingale is zero (see [7, p. 3] or [34, Proposition 1.7.5]). Indeed, for the symmetric simple random walk $B^n$ the predictable representation property holds; i.e. for any $\mathcal{G}_n$-measurable (see (5)) random variable $\xi = F(\varepsilon_1, \dots, \varepsilon_n)$ there exists a representation
$$\xi = c + \sum_{m=1}^{n} h_m\,\varepsilon_m, \qquad h_m \ \mathcal{G}_{m-1}\text{-measurable}.$$
Put $c = E[F(\varepsilon_1, \dots, \varepsilon_n)]$. Our aim is to determine a $\mathcal{G}_{m-1}$-measurable $h_m$ such that $E[\xi\,|\,\mathcal{G}_m] - E[\xi\,|\,\mathcal{G}_{m-1}] = h_m\,\varepsilon_m$ for $m = 1, \dots, n$. By the tower property it holds that $\xi = c + \sum_{m=1}^n \big(E[\xi\,|\,\mathcal{G}_m] - E[\xi\,|\,\mathcal{G}_{m-1}]\big)$, which then gives the representation above; writing $E[\xi\,|\,\mathcal{G}_m] = a(\varepsilon_1, \dots, \varepsilon_m)$, one may take $h_m = \frac12\big(a(\varepsilon_1, \dots, \varepsilon_{m-1}, 1) - a(\varepsilon_1, \dots, \varepsilon_{m-1}, -1)\big)$.

One can derive an equation for $Z^n = (Z^n_{t_k})_{k=0}^{n-1}$ if one multiplies (9) by $\varepsilon_{k+1}$ and takes the conditional expectation with respect to $\mathcal{G}_k$, so that
$$Z^n_{t_k} = \frac{1}{\sqrt h}\,E^{\mathcal{G}_k}\big[Y^n_{t_{k+1}}\,\varepsilon_{k+1}\big], \tag{10}$$
where $E^{\mathcal{G}_k} := E(\,\cdot\,|\,\mathcal{G}_k)$.

Remark 2.2. For $n$ large enough, the BSDE (3) has a unique solution $(Y^n, Z^n)$ (see [36, Proposition 1.2]), and $(Y^n_{t_k}, Z^n_{t_k})_{k=0}^{n-1}$ is adapted to the filtration $(\mathcal{G}_k)_{k=0}^{n-1}$.

2.2.1. Representation for $Z$. We will use the following representation for $Z$, due to Ma and Zhang (see [30, Theorem 4.2]):
$$Z_t = \sigma(t, X_t)\,E_t\Big[g(X_T)\,N^t_T + \int_t^T f(s, X_s, Y_s, Z_s)\,N^t_s\,ds\Big], \tag{11}$$
where $E_t := E(\,\cdot\,|\,\mathcal{F}_t)$, and for all $s \in (t, T]$, we have (cf. Lemma 4.1)
$$N^t_s = \frac{1}{s-t}\int_t^s \frac{\nabla X_r}{\sigma(r, X_r)\,\nabla X_t}\,dB_r, \tag{12}$$
where $\nabla X = (\nabla X_s)_{s \in [0,T]}$ is the variational process; i.e., it solves
$$\nabla X_s = 1 + \int_0^s b_x(r, X_r)\,\nabla X_r\,dr + \int_0^s \sigma_x(r, X_r)\,\nabla X_r\,dB_r, \tag{13}$$
with $(X_s)_{s \in [0,T]}$ given in (1).
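The backward recursion behind (9) and (10) can be sketched numerically as follows. The code assumes the Euler-type forward step used above and treats the (implicit) step in $Y^n$ by a few fixed-point iterations; whether the paper's scheme evaluates the generator exactly in this way is an assumption of this sketch. The conditional expectation $E^{\mathcal{G}_k}$ is computed exactly by averaging over $\varepsilon_{k+1} = \pm 1$, which has exponential cost in $n$ and is therefore only meant for small $n$.

```python
import numpy as np

def solve_discrete_fbsde(b, sigma, f, g, x0, T, n, picard_iter=5):
    """Backward recursion for (Y^n, Z^n) on the binary tree of Rademacher paths,
    using Z^n_{t_k} = E[Y^n_{t_{k+1}} * eps_{k+1} | G_k] / sqrt(h), cf. (10)."""
    h = T / n
    sqh = np.sqrt(h)

    def value(k, x):
        # returns (Y^n_{t_k}, Z^n_{t_k}) at the node (t_k, x)
        if k == n:
            return g(x), 0.0
        t_k = k * h
        x_up = x + b(t_k, x) * h + sigma(t_k, x) * sqh   # eps_{k+1} = +1
        x_dn = x + b(t_k, x) * h - sigma(t_k, x) * sqh   # eps_{k+1} = -1
        y_up, _ = value(k + 1, x_up)
        y_dn, _ = value(k + 1, x_dn)
        z = (y_up - y_dn) / (2.0 * sqh)                  # = E[Y eps | G_k] / sqrt(h)
        y = 0.5 * (y_up + y_dn)
        for _ in range(picard_iter):                     # fixed-point iteration for the implicit step in y
            y = 0.5 * (y_up + y_dn) + h * f(t_k, x, y, z)
        return y, z

    return value(0, x0)

# illustrative data (not taken from the paper)
Y0, Z0 = solve_discrete_fbsde(lambda t, x: 0.0, lambda t, x: 1.0,
                              lambda t, x, y, z: -0.5 * y, np.tanh,
                              x0=0.0, T=1.0, n=10)
```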

Remark 2.3.
In the following we will assume that $g'$ exists. In such a case we have the following representation for $Z$:
$$Z_t = \sigma(t, X_t)\,E_t\Big[g'(X_T)\,\frac{\nabla X_T}{\nabla X_t} + \int_t^T f(s, X_s, Y_s, Z_s)\,N^t_s\,ds\Big]. \tag{14}$$

2.2.2. Approximation for $Z^n$. In this section we state the discrete counterpart to (11), which, in the general case of a forward process $X$, does not coincide with $Z^n$ (given by (10)). In contrast to the continuous-time case, where the variational process and the Malliavin derivative are connected by
$$\frac{\nabla X_t}{\nabla X_s} = \frac{D_s X_t}{\sigma(s, X_s)} \qquad (s \le t),$$
we cannot expect equality for the corresponding expressions if we use the discretized versions of the processes $(\nabla X_t)_t$ and $(D_s X_t)_{s \le t}$ introduced in (16). This counterpart $\hat Z^n$ to $Z$ is a key tool in the proof of the convergence of $Z^n$ to $Z$. As we will see in the proof of Theorem 3.1, the study of $Z^n_{t_k} - Z_{t_k}$ goes through the study of $Z^n_{t_k} - \hat Z^n_{t_k}$ and $\hat Z^n_{t_k} - Z_{t_k}$. Before defining the discretized versions of $(\nabla X_t)_t$ and $(D_s X_t)_{s \le t}$, we briefly introduce the discretized Malliavin derivative. We refer the reader to [4] for more information on this topic.
For any $\xi = F(\varepsilon_1, \dots, \varepsilon_n)$, the discretized Malliavin derivative is defined by (15). If $D^n_k X^n_t \neq 0$, the second ':=' in (15) holds as an identity. We are now able to define the discretized versions of $(\nabla X_t)_t$ and $(D_s X_t)_{s \le t}$.
Although exact equality fails, we can show that the difference of these terms converges in $L_p$ (see Lemma 5.4).

(ii) With the notation introduced above, (10) can be rewritten as
$$Z^n_{t_k} = E^{\mathcal{G}_k}\big[D^n_{k+1} Y^n_{t_{k+1}}\big]. \tag{17}$$
In order to define the discrete counterpart to (11), we first define the discrete counterpart to $(N^t_s)_{s \in (t,T]}$ given in (12). Notice that there is some constant such that the estimate (19) holds.

Definition 2.4. (Discrete counterpart to (14).) Let the process $\hat Z^n = (\hat Z^n_{t_k})_{k=0}^{n-1}$ be defined by (20). (A term built from $g$ itself also could have been used, but since we will assume that $g'$ exists, we work with the term involving $g'$.)
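The discretized Malliavin derivative used above acts by flipping the $k$th Rademacher variable. The following sketch assumes the usual finite-difference convention $D^n_k\,\xi = \big(F(\dots, \varepsilon_{k-1}, 1, \varepsilon_{k+1}, \dots) - F(\dots, \varepsilon_{k-1}, -1, \varepsilon_{k+1}, \dots)\big)/(2\sqrt h)$; the exact normalization in (15) is not reproduced here, so this is only an illustration.

```python
import numpy as np

def discrete_malliavin(F, eps, k, h):
    """Discretized Malliavin derivative of xi = F(eps_1, ..., eps_n) in direction k
    (1-based), assuming the finite-difference convention
    D^n_k xi = (F(..., +1, ...) - F(..., -1, ...)) / (2*sqrt(h))."""
    ep = np.array(eps, dtype=float)
    em = ep.copy()
    ep[k - 1], em[k - 1] = 1.0, -1.0
    return (F(ep) - F(em)) / (2.0 * np.sqrt(h))

# usage sketch: for xi = B^n_T = sqrt(h) * sum_i eps_i one gets D^n_k xi = 1,
# mirroring D_s B_T = 1 in the continuous-time Malliavin calculus
h = 0.01
eps = np.random.default_rng(1).choice([-1.0, 1.0], size=100)
val = discrete_malliavin(lambda e: np.sqrt(h) * e.sum(), eps, k=3, h=h)
```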
The study of the convergence of $E^{\mathcal{G}}_{0,x}|Z^n_{t_k} - \hat Z^n_{t_k}|^2$ requires stronger assumptions on the coefficients $b$, $\sigma$, $f$, and $g$.

Assumption 2.3. Assumptions 2.1 and 2.2 hold. Additionally, we assume that all first and second derivatives with respect to the variables $x, y, z$ of $b(t, x)$, $\sigma(t, x)$, and $f(t, x, y, z)$ exist and are bounded Lipschitz functions with respect to these variables, uniformly in time. Moreover, $g''$ exists and satisfies (6).

Proposition 2.1. If Assumption 2.3 holds, then
Proof. According to [7, Proposition 5.1] one has representations of $Y^n$ and $Z^n$ in terms of a function $u^n$, where $u^n$ is the solution of the finite difference equation (44) with terminal condition $u^n(t_n, x) = g(x)$. Notice that by the definition of $D^n_{m+1}$ in (15) the expression $D^n_{m+1} u^n(t_{m+1}, X^n_{t_{m+1}})$ depends in fact on $X^n_{t_m}$. Hence we can define a function $F^n$ accordingly. From (20) and (17) we conclude the following (we use $E := E^{\mathcal{G}}_{0,x}$ for $\|\cdot\|$). With the notation introduced in Definition 2.2 applied to $F^n$, we obtain a decomposition into terms $A_1$ and $A_2$.

For $A_1$ we use Definition 2.2 again and exploit the local $\alpha$-Hölder continuity provided by (63), together with Hölder's inequality and Lemma 5.4 Parts (i) and (iii). For the estimate of $A_2$ we notice that by our assumptions the relevant coefficients and their derivatives are bounded, and we split $A_2$ as in (22). The second expression on the right-hand side of (22) is bounded by $C(b, \sigma, T, \delta)\,h^{\frac12}$ as a consequence of Lemma 5.4 Parts (ii) and (iii). To show that the first expression is also bounded by $C(b, \sigma, T, \delta)\,h^{\frac12}$, we rewrite it using (16) and obtain (23). We take the $L_4$-norm of (23) and apply the Burkholder-Davis-Gundy (BDG) inequality and Hölder's inequality. The second term on the right-hand side of (23) will be used for Gronwall's lemma, while the first and last terms can be bounded by $C(b, \sigma, T)\,h^{\frac12}$, using Lemma 5.4(iii). For the last term we also use the Lipschitz continuity of $b_x$ and $\sigma_x$ in space and Lemma 5.4(i).

Main results
In order to compute the mean square distance between the solution to (1) and the solution to (3), we construct the random walk $B^n$ from the Brownian motion $B$ by Skorokhod embedding. Let
$$\tau_0 := 0, \qquad \tau_k := \inf\{t > \tau_{k-1} : |B_t - B_{\tau_{k-1}}| = \sqrt h\}, \qquad k = 1, \dots, n. \tag{24}$$
We will denote by $E_{\tau_k}$ the conditional expectation with respect to $\mathcal{F}_{\tau_k} := \mathcal{G}_k$. In this case we also use the notation $X_{\tau_k} := X^n_{t_k}$ for all $k = 0, \dots, n$, so that (4) turns into a recursion driven by the Brownian increments $B_{\tau_{k+1}} - B_{\tau_k}$.

Assumption 3.1. We assume that the random walk $B^n$ in (3) is given by
$$B^n_{t_k} := B_{\tau_k}, \qquad \text{i.e.} \quad \sqrt h\,\varepsilon_k := B_{\tau_k} - B_{\tau_{k-1}}, \qquad k = 1, \dots, n,$$
where the $\tau_k$, $k = 1, \dots, n$, are taken from (24).
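The stopping times in (24) can be realised numerically from a finely sampled Brownian path; the sketch below replaces the exact first hitting times by their discrete-grid approximations, which is an implementation shortcut rather than part of the construction in the paper.

```python
import numpy as np

def embed_random_walk(B, dt, h, n):
    """Approximates tau_k = inf{t > tau_{k-1} : |B_t - B_{tau_{k-1}}| = sqrt(h)}
    on a path B sampled on a grid of step dt, and returns the embedded walk
    B^n_{t_k} := B_{tau_k} together with the times tau_k."""
    sqh = np.sqrt(h)
    taus, walk = [0.0], [B[0]]
    j, last = 0, B[0]
    for _ in range(n):
        while j < len(B) and abs(B[j] - last) < sqh:
            j += 1
        if j == len(B):
            raise RuntimeError("path too short to embed n steps")
        taus.append(j * dt)
        last = B[j]
        walk.append(last)
    return np.array(taus), np.array(walk)

# usage sketch: E[tau_1] = h, and the embedded walk stays L_2-close to B,
# with ||B^n_{t_k} - B_{t_k}||_2 of order h^(1/4)
rng = np.random.default_rng(2)
T, n = 1.0, 50
h, dt = T / n, T / 200000
increments = rng.normal(0.0, np.sqrt(dt), size=int(2 * T / dt))
B = np.concatenate([[0.0], np.cumsum(increments)])
taus, walk = embed_random_walk(B, dt, h, n)
```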
Remark 3.1. Note that for $p > 0$ there exists a $C(p) > 0$ such that for all $k = 1, \dots, n$ the estimate (25) holds. The upper estimate is given in Lemma 5.
Proposition 3.1 states the convergence rate of $(Y^n_v, Z^n_v)$ to $(Y_v, Z_v)$ in $L_2$ when $f = 0$, and Theorem 3.1 generalizes this result to any $f$ which satisfies Assumption 2.3.

Proposition 3.1. Let Assumptions 2.1 and 3.1 hold. If $f = 0$ and $g \in C^1$ is such that $g'$ is a locally $\alpha$-Hölder continuous function in the sense of (6), then for all $0 \le v < T$, we have (for sufficiently large $n$) that

Remark 3.2.
As observed above, the filtration $\mathcal{G}_k$ coincides with $\mathcal{F}_{\tau_k}$, for all $k = 0, \dots, n$. The expectation $E_{0,x}$ appearing in Proposition 3.1 and in Theorem 3.1 is defined on the probability space $(\Omega, \mathcal{F}, P)$.

Remark 3.3.
In order to avoid too much notation for the dependencies of the constants, if for example only $g$ is mentioned and not $C_g$, this means that the estimate might depend also on the bounds of the derivatives of $g$.
From (25) one can see that the convergence rates stated in Proposition 3.1 and Theorem 3.1 are the natural ones for this approach. The results are proved in the next two sections. In both proofs, we will use the following remark.

Remark 3.4.
Since the process $(X_t)_{t \ge 0}$ is strong Markov, we can express conditional expectations with the help of an independent copy of $B$, denoted by $\tilde B$. For example, $E_{\tau_k}$ can be rewritten as an expectation $\tilde E$ taken with respect to $\tilde B$ only (we define $\tilde\tau_0 := 0$, $\tilde\tau_j := \inf\{t > \tilde\tau_{j-1} : |\tilde B_t - \tilde B_{\tilde\tau_{j-1}}| = \sqrt h\}$ for $j \ge 1$, and $\tau_n := \tau_k + \tilde\tau_{n-k}$ for $n \ge k$). In fact, to represent the conditional expectations $E_{t_k}$ and $E_{\tau_k}$, we work here with $\tilde E$ and with Brownian motions obtained by continuing $B$ after $t_k$ and after $\tau_k$, respectively, by the independent copy $\tilde B$.

Proof of Proposition 3.1: the approximation rates for the zero generator case
To shorten the notation, we use $E := E_{0,x}$. Let us first deal with the error of $Y$; here $\alpha = 1$ can be chosen when $g$ is locally Lipschitz continuous. It remains to bound the remaining expression, and by Lemma 5.2(v) we obtain the required estimate.

Let us now deal with the error of $Z$. We use a decomposition into two terms. For the first term we obtain the required bound by the assumption on $g$ and Lemma 5.2 Parts (i) and (iii). We compute the second term using $Z^n_{t_k}$ as given in (17) and the notation from Definition 2.2. We insert $\pm\,\tilde E\big(g_x^{(k+1,n+1)}\,\nabla X^{t_k, X_{t_k}}_{t_n}\big)$ and apply the Cauchy-Schwarz inequality, which leads to (29). For the estimate of $\tilde E\,|\nabla X^{t_k, X_{t_k}}_{t_n}|^2$ we use Lemma 5.2. Since $g'$ satisfies (6), we proceed with the corresponding Hölder estimate and use Lemma 5.4 and Lemma 5.2(v). For the last term in (29) we use Lemma 5.4 again.

Proof of Theorem 3.1: the approximation rates for the general case
Let $u : [0, T) \times \mathbb{R} \to \mathbb{R}$ be the solution of the PDE (38) associated to (1). We use the representations $Y_s = u(s, X_s)$ and $Z_s = \sigma(s, X_s)\,u_x(s, X_s)$ stated in Theorem 4.2 and define the function $F$ as in (30). From (1) and (3) we conclude a decomposition of the error, and we estimate the expressions on its right-hand side.

For the function $F$ defined in (30) we use Assumption 2.3 (which implies that (6) holds for $\alpha = 1$) to derive, by Theorem 4.2 and the mean value theorem, the Lipschitz-type estimate (31) for $x_1, x_2$. By (7), standard estimates on $(X_s)$, Theorem 4.1(i), and Proposition 4.1 for $p = 2$, we immediately get the estimate of the first term. For the estimate of $d_2$ one exploits the decomposition of the corresponding difference and then uses (31) and Lemma 5.2(v). For $d_3$ we start with Jensen's inequality and then continue similarly as above; the last term is treated in the same way. This implies the estimate (32), where $C = C(b, \sigma, f, g, T, p_0, \delta)$.
For $Z_{t_k} - Z^n_{t_k}$ we use the representations (14) and (17), the approximation (20), and Proposition 2.1. Instead of $N^{n,t_k}_{t_n}$ we will use here the notation $N^{n,\tau_k}_{\tau_n}$ to indicate its measurability with respect to the filtration $(\mathcal{F}_t)$. This leads to the decomposition (33). For the terminal condition, Proposition 3.1 provides the estimate (34). We continue with the generator terms and use $F$ defined in (30) to decompose the difference into the terms $t_1, \dots, t_4$ and a remaining term. As before, we rewrite the conditional expectations with the help of the independent copy $\tilde B$.

For $t_1$ we apply the conditional Hölder inequality and the estimate (37); moreover, for $0 \le t < s \le T$ we have the required bound by Theorem 4.1 and Proposition 4.1. For the estimate of $t_2$, Lemma 5.2, Lemma 5.3, (31), and (37) yield the desired order. For $t_3$ we use the conditional Hölder inequality, (31), (19), and Lemma 5.2. The term $t_4$ can be estimated analogously. Finally, for the remaining term of the estimate of $Z_{t_k} - Z^n_{t_k}$, we use (35) and (37).

Consequently, from (33), (34), and the estimates for the remaining term and for $t_1, \dots, t_4$, we obtain a bound for the error of $Z$. Then we use (32) and the above estimate, so that, summarizing the dependencies, there is a $C = C(b, \sigma, f, g, T, p_0, \delta)$ such that the asserted estimate holds. By Theorem 4.1 (note that by Assumption 2.3 on $g$ we have $\alpha = 1$) and by Proposition 4.1, the remaining expressions are of the required order, and hence the claimed rate follows.

Malliavin weights
We use the SDE from (1) started in $(t, x)$; see (36). Setting, for $s \in (t, T]$,
$$N^{t,x}_s := \frac{1}{s-t}\int_t^s \frac{\nabla X^{t,x}_r}{\sigma(r, X^{t,x}_r)}\,dB_r,$$
where $(\mathcal{F}^t_r)_{r \in [t,T]}$ is the augmented natural filtration of $(B^{t,0}_r)_{r \in [t,T]}$ and $\nabla X^{t,x}_s$ is given in (13), we obtain the Malliavin weight for the process started in $(t, x)$. Moreover, for $q \in (0, \infty)$ there exists a $\kappa_q > 0$ such that
$$\big\|N^{t,x}_s\big\|_{L_q(P_{t,x})} \le \frac{\kappa_q}{\sqrt{s - t}},$$
and we have the corresponding representation for conditional expectations of functionals of $X^{t,x}$.
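The effect of such a Malliavin weight can be checked by simulation. The sketch below assumes the common form $N^{t,x}_s = \frac{1}{s-t}\int_t^s \frac{\nabla X^{t,x}_r}{\sigma(r, X^{t,x}_r)}\,dB_r$ and the identity $\partial_x E\big[g(X^{t,x}_s)\big] = E\big[g(X^{t,x}_s)\,N^{t,x}_s\big]$; the precise weight of (36) and Lemma 4.1 is not reproduced, and the Euler discretization as well as the coefficient functions in the usage example are assumptions of this illustration.

```python
import numpy as np

def weighted_derivative_estimate(b, bx, sigma, sigmax, g, t, x, s,
                                 n_steps=200, n_mc=200_000, seed=3):
    """Monte Carlo estimate of E[g(X^{t,x}_s) * N^{t,x}_s], which (for the assumed
    weight) should approximate d/dx E[g(X^{t,x}_s)].  X and its variational process
    grad X are simulated with an Euler scheme; bx, sigmax are x-derivatives of b, sigma."""
    rng = np.random.default_rng(seed)
    dt = (s - t) / n_steps
    X = np.full(n_mc, float(x))
    gradX = np.ones(n_mc)                 # grad X^{t,x}_t = 1
    N = np.zeros(n_mc)
    r = t
    for _ in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt), size=n_mc)
        N += gradX / sigma(r, X) * dB     # integrand evaluated at the left endpoint
        gradX = gradX + bx(r, X) * gradX * dt + sigmax(r, X) * gradX * dB
        X = X + b(r, X) * dt + sigma(r, X) * dB
        r += dt
    N /= (s - t)
    return float(np.mean(g(X) * N))

# usage sketch with illustrative coefficients: for b = 0 and sigma = 1 the weight
# reduces to (B_s - B_t)/(s - t), and the estimate should be close to E[g'(x + B_s - B_t)]
est = weighted_derivative_estimate(lambda r, x: 0.0 * x, lambda r, x: 0.0 * x,
                                   lambda r, x: 1.0 + 0.0 * x, lambda r, x: 0.0 * x,
                                   np.tanh, t=0.0, x=0.3, s=1.0)
```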

Regularity of solutions to BSDEs
The following result originates from [20, Theorem 1], where path-dependent cases were also included. We formulate it only for our Markovian setting but use $P_{t,x}$ since we are interested in an estimate for all $(t, x) \in [0, T) \times \mathbb{R}$. A sketch of a proof of this formulation can be found in [22].
(ii) There exists a constant $C^z_{4.1} > 0$ such that for $0 \le t < s < T$ and $x \in \mathbb{R}$,

Properties of the associated PDE
The theorem below collects properties of the solution to the PDE associated to the FBSDE (1). For a proof see [43, Theorem 3.2] and [41]. We have the following:

(i) $u$ satisfies a polynomial growth bound of the type (8), where $c^1_{4.2}$ depends on $L_f$, $K_f$, $C_g$, $T$, and $p_0$, as well as on the bounds and Lipschitz constants of $b$ and $\sigma$.

(ii) (a) $\partial_x u$ exists and is continuous in $[0, T) \times \mathbb{R}$, and it satisfies a gradient bound with constant $c^2_{4.2}$,
where $c^2_{4.2}$ depends on $L_f$, $K_f$, $C_g$, $T$, $p_0$, and $\kappa_2 = \kappa_2(b, \sigma, T, \delta)$, as well as on the bounds and Lipschitz constants of $b$ and $\sigma$, and hence $c^2_{4.2} = c^2_{4.2}(L_f, K_f, C_g, b, \sigma, T, p_0, \delta)$.

(b) $\partial^2_x u$ exists and is continuous in $[0, T) \times \mathbb{R}$, and it satisfies a second-order bound with constant $c^3_{4.2}$, where $c^3_{4.2}$ depends on $L_f$, $C_g$, $T$, $p_0$, $\kappa_2 = \kappa_2(b, \sigma, T, \delta)$, $C^y_{4.1}$, and $C^z_{4.1}$, as well as on the bounds and Lipschitz constants of $b$ and $\sigma$, and hence $c^3_{4.2} = c^3_{4.2}(L_f, K_f, C_g, b, \sigma, T, p_0, \delta)$.

Using Assumption 2.3, we are now in a position to improve the bound on $\|Z_s - Z_t\|_{L_p(P_{t,x})}$ given in Theorem 4.1; this is done in Proposition 4.1. It is well-known (see e.g. [19]) that the solution $\nabla Y$ of the linear BSDE (40) can be represented as in (41), where $\Theta_r := (r, X_r, Y_r, Z_r)$, the adjoint process is defined accordingly, and $\tilde B$ denotes an independent copy of $B$. Notice that $\nabla X^{t,x}_t = 1$, so that the corresponding expression simplifies. Then, by (39), we obtain a representation of $Z$ in terms of $\nabla Y$. Since $(\nabla Y_s, \nabla Z_s)$ is the solution to the linear BSDE (40) with bounded $f_x$, $f_y$, $f_z$, we have that $\|\nabla Y_t\|_{L_{2p}(P_{t,x})} \le C(b, \sigma, f, g, T, p)$. Obviously, $\|X^{t,x}_s - x\|_{L_{2p}(P_{t,x})} \le C(b, \sigma, T, p)(s - t)^{\frac12}$.

So it remains to show the corresponding estimate for the remaining difference.
We intend to use (41) in the following. There is a certain degree of freedom in how to connect $B$ and $\tilde B$ in order to compute conditional expectations; here, unlike in (27), we connect them as follows. Since $f_y$ and $f_z$ are bounded, the $q$th moments under $\tilde E$ of the corresponding adjoint processes are bounded by $C(f, T, q)$. Similarly to (31), since $f_x$, $f_y$, $f_z$ are Lipschitz continuous with respect to the space variables,
$$|f_x(\Theta^{s,X_s}_r) - f_x(\Theta^{t,x}_r)| = \big|f_x\big(r, X^{s,X_s}_r, u(r, X^{s,X_s}_r), \sigma(r, X^{s,X_s}_r)\,u_x(r, X^{s,X_s}_r)\big) - f_x\big(r, X^{t,x}_r, u(r, X^{t,x}_r), \sigma(r, X^{t,x}_r)\,u_x(r, X^{t,x}_r)\big)\big| \le C(c^{2,3}_{4.2}, \sigma, f, T)\,\big(1 + |X^{s,X_s}_r|^{p_0+1} + |X^{t,x}_r|^{p_0+1}\big)\,\frac{|X^{s,X_s}_r - X^{t,x}_r|}{(T - r)^{\frac12}},$$
so that Lemma 5.2 yields
$$E\,|f_x(\Theta^{s,X_s}_r) - f_x(\Theta^{t,x}_r)|^q \le C(c^{2,3}_{4.2}, b, \sigma, f, T, p_0, q)\,\big(1 + |X_s|^{p_0+1} + |x|^{p_0+1}\big)^q\,\frac{|X_s - x|^q + |s - t|^{\frac q2}}{(T - r)^{\frac12}}.$$
The same holds for $|f_y(\Theta^{s,X_s}_r) - f_y(\Theta^{t,x}_r)|$ and $|f_z(\Theta^{s,X_s}_r) - f_z(\Theta^{t,x}_r)|$. Applying these inequalities and Gronwall's lemma, we arrive at a bound of order $|s - t|^{\frac12}$, with a constant $C(c^{2,3}_{4.2}, b, \sigma, f, g, T, p_0, p)$ and a polynomially bounded factor in $x$, valid for $p > 0$. For $J_2 \le C|t - s|$ it is enough to realise that the integrand is bounded. The estimate for $J_3$ follows similarly to that of $J_1$.
While for the solution to the PDE (38) one can observe in Theorem 4.2 the well-known smoothing property, which implies that $u$ is differentiable on $[0, T) \times \mathbb{R}$ even though $g$ is only Hölder continuous, in the following proposition, for the solution $u^n$ to the finite difference equation, we have to require from $g$ the same regularity as we want for $u^n$.
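For orientation, the following sketch implements a finite difference function $u^n$ of the type discussed here, assuming the recursion $u^n(t_n, x) = g(x)$ and $u^n(t_m, x) = \tfrac12\big[u^n(t_{m+1}, x^+) + u^n(t_{m+1}, x^-)\big] + h\,f(\dots)$ with $x^{\pm} = x + b(t_m, x)h \pm \sigma(t_m, x)\sqrt h$; the exact form of equation (44), in particular whether the generator step is implicit, is not reproduced here.

```python
import numpy as np
from functools import lru_cache

def make_u_n(b, sigma, f, g, T, n):
    """Sketch of a finite difference function u^n with Y^n_{t_m} = u^n(t_m, X^n_{t_m}),
    assuming an explicit generator step; the non-recombining recursion has
    exponential cost in n and is only meant for small n."""
    h = T / n
    sqh = np.sqrt(h)

    @lru_cache(maxsize=None)
    def u(m, x):
        if m == n:
            return float(g(x))
        t_m = m * h
        xp = x + b(t_m, x) * h + sigma(t_m, x) * sqh   # eps_{m+1} = +1
        xm = x + b(t_m, x) * h - sigma(t_m, x) * sqh   # eps_{m+1} = -1
        y_next = 0.5 * (u(m + 1, xp) + u(m + 1, xm))
        z = (u(m + 1, xp) - u(m + 1, xm)) / (2.0 * sqh)
        return y_next + h * f(t_m, x, y_next, z)

    return u

# illustrative data (not taken from the paper): u^n(0, x) approximates Y^n_0 for X^n_0 = x
u_n = make_u_n(lambda t, x: 0.0, lambda t, x: 1.0,
               lambda t, x, y, z: -0.5 * y, np.tanh, T=1.0, n=12)
val = u_n(0, 0.3)
```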