A New Augmented Lagrangian Approach for L^1-Mean Curvature Image Denoising

Variational methods are commonly used to solve noise removal problems. In this paper, we present an augmented Lagrangian-based approach that uses a discrete form of the L^1-norm of the mean curvature of the graph of the image as a regularizer, discretization being achieved via a finite element method. When a particular alternating direction method of multipliers is applied to the resulting saddle-point problem, its solution reduces to the iterative sequential solution of four subproblems. These subproblems are solved using Newton’s method, the conjugate gradient method, and a partial solution variant of the cyclic reduction method. The approach considered here differs from existing augmented Lagrangian approaches for the solution of the same problem: the augmented Lagrangian functional we use contains “only” three Lagrange multipliers, and the associated augmentation terms are all quadratic. In addition to the description of the solution algorithm, this paper contains the results of numerical experiments demonstrating the performance of the novel method discussed here.

Actually, as pointed out in [38] (see also [3], [12], and [51]), the ROF model has some significant drawbacks, such as the loss of image contrast, the smearing of corners, and the staircase effect. To remedy these unfavorable properties, several cures have been proposed (see [51] for a list of related references), among them the one introduced in [51]: namely, instead of (1.1) with J defined by (1.3), use the following model: (1.5) commonly known these days as the L^1-mean curvature denoising model; here s ≥ 1, with s = 2 being the most common choice. Among the formulations proposed so far, the fidelity term in (1.5) has the simplest form. In particular, as shown in [40], one can, by adding two linear transformations to this model, address other important image processing tasks related to deconvolution, inpainting, and superresolution. The rationale for (1.5) is discussed in much detail in [51]. However, to the best of our knowledge, the variational problem (1.5) is not yet fully understood mathematically, due to the nonconvexity and nonsmoothness of the functional in (1.5) and to the fact that a natural choice for V is not clear. Concerning this last issue, taking V = BV(Ω) is suggested in [51]. We have, however, a problem with such a choice since we think that the definition of V has to include conditions on the second derivatives. We can only hope that a variant of [31] dedicated to problem (1.5) will appear in the near future.
Considering the above situation, our goals in this paper are more modest: they are purely finite-dimensional and algorithmic. They can be summarized as follows: 1. Taking advantage of the relative simplicity of the formalism of the continuous problem, we derive in section 2 a (necessarily formal) augmented Lagrangian algorithm. Our algorithm is a simplified variant of the one considered in [52]: we use a projection onto a nonconvex set to treat a nonlinear constraint instead of treating it by penalty-duality, which would require one extra augmentation functional and the related Lagrange multiplier. Thus, our algorithm involves three augmentation terms instead of four and three Lagrange multipliers instead of four; this matters because, when several Lagrange multipliers are used, one of the main issues is their adjustment to optimize convergence. We will return to the details of this reduction in section 2. 2. Taking advantage of the augmented Lagrangian algorithm described in section 2, we define in section 3 a discrete version of this algorithm to be applied to a (kind of) mixed finite element approximation of problem (1.5). We choose finite element methods for approximating the problem instead of finite differences since, when applied on uniform triangulations (like the one in Figure 1) in particular, these finite element methods automatically generate finite difference approximations with, among other attributes, good accuracy, stability, and monotonicity properties. Moreover, the variational derivation of Galerkin/finite element approximations (like the one we use here) makes them a natural match for the solution of problems from the calculus of variations, such as (1.1) with J defined by (1.3), and (1.5) (see [24] for other examples). Another advantage of finite element approximations is their ability to handle nonuniform meshes, adaptively or not, which may be of interest for some applications.
More precisely, the minimization of the discrete counterpart of the functional in (1.5) is transformed into the iterative sequential solution of four subproblems: one smooth but nonlinear in R^2, one purely explicit and vertexwise, and two linear with positive definite and symmetric coefficient matrices, of scalar (assuming s = 2) and vector-valued type, respectively. 3. In section 5 we apply the algorithms defined in section 4 to the solution of a variety of test problems of variable complexity in order to evaluate the capability of the methodology discussed in this paper. The actual solution process is somewhat simplified by assuming that s = 2 (which was also recommended by Zhu and Chan [51]).
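The alternating structure just described (linear solves, an explicit pointwise step, and multiplier updates) can be illustrated on a miniature problem. The sketch below applies the same ADMM pattern to a 1D quadratic-plus-ℓ1 model with the split p = Du; it is a toy stand-in, not the mean-curvature model or the authors' implementation, and all names in it are hypothetical.

```python
import numpy as np

def admm_tv1d(f, mu=1.0, r=1.0, n_iter=200):
    """Toy ADMM for min_u 0.5*||u - f||^2 + mu*||D u||_1 with the split p = D u.

    Mirrors the alternating pattern used in the paper: a linear
    u-subproblem, an explicit pointwise p-subproblem, and a multiplier
    update (here with a single multiplier instead of three).
    """
    n = len(f)
    D = np.diff(np.eye(n), axis=0)      # forward-difference matrix, (n-1) x n
    A = np.eye(n) + r * D.T @ D         # SPD matrix of the u-subproblem
    p = np.zeros(n - 1)
    lam = np.zeros(n - 1)
    u = f.copy()
    for _ in range(n_iter):
        # 1) linear u-subproblem (solved directly here; CG/PSCR in the paper)
        u = np.linalg.solve(A, f + D.T @ (r * p - lam))
        # 2) explicit pointwise p-subproblem: soft shrinkage
        v = D @ u + lam / r
        p = np.sign(v) * np.maximum(np.abs(v) - mu / r, 0.0)
        # 3) multiplier update
        lam = lam + r * (D @ u - p)
    return u
```

The shrinkage step plays the role of the explicit vertexwise subproblem; the other three subproblems of the paper have no counterpart in this reduced split.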

2. Augmented Lagrangian formulation and basic solution algorithm.
2.1. Some preliminary observations. Despite the fact that problem (1.5) is not fully understood mathematically, we are going to take advantage of the simplicity of its formalism to derive a formal solution algorithm of the augmented Lagrangian type. This algorithm will be useful since in section 3 we will take it as a model to define a finite dimensional analogue dedicated to the solution of a finite element approximation of problem (1.5).
Augmented Lagrangian techniques have a well-established role in analyzing constrained optimization problems as well as in deriving general solution algorithms for such problems [5,10,22,25,29,32]. With BV regularization, an augmented Lagrangian reformulation can introduce one or two new variables to deal with ∇u in the nonsmooth regularization term (e.g., [15,48,40] and the references provided in the reviews therein) or, additionally, to represent u in the fidelity term [11,50]. In the latter case, three subproblems (alternating directions) typically appear and are then solved using linear solvers, explicit formulas (projection or shrinkage), and nonlinear optimization methods (for nonsmooth fidelity terms). Moreover, when more than one additional variable is introduced, one typically applies varying regularization parameters for the penalized constraints (e.g., [11,50]). More examples where variational formulations (including Euler's elastica) for image processing have been efficiently treated using augmented Lagrangian approaches can be found in, e.g., [20,30,46,50].
As already mentioned in section 1, an augmented Lagrangian algorithm was used in [52] for the solution of (1.5). The augmented Lagrangian approach we apply in this paper is of the ALG-2 type [22,29] (better known as ADMM, for alternating direction method of multipliers). The basic idea of ADMM, and the convergence proof in a convex setting, were presented in the 1970s by Glowinski and Marrocco [27] and Gabay and Mercier [23]. Augmented Lagrangian methods with partly similar ingredients for the solution of other challenging problems are described in, e.g., [8,16,17,19,18,21,28]; most of these problems are nonconvex, as is the one discussed here.
The solution method in [52] is also of the ALG-2 (or ADMM) type, but it uses a different (and, we think, more complicated) augmented Lagrangian functional, as that functional involves four augmentation functionals and four Lagrange multipliers. What made us uncomfortable was the Lagrange multiplier treatment of the nonlinear nonsmooth condition |q_1| − q_1 · q_2 = 0, where q_1 and q_2 belong (formally) to (L^2(Ω))^3. The related condition in our approach involves q_1 and q_2 belonging (formally) to (L^2(Ω))^2. The discrete analogue of the aforementioned equality will be treated by projection, avoiding the two terms associated with it in the augmented Lagrangian and reducing to three the number of Lagrange multipliers. This is a notable simplification considering that the adjustment of these parameters is one of the main issues associated with ADMM-type methods (it is discussed in [19] in a particular case). Also, in our approach, all the constraints treated by penalty-duality are linear, which is not the case in [52] (one of the constraints there is not only nonlinear but also nonsmooth).
Remark 1. At present, there is no available theory (as far as we know) for the convergence of ADMM methods for nonconvex problems, even in finite dimensions. Currently, the most popular publication concerning augmented Lagrangian and ADMM algorithms is certainly [7], a large review article (>100 pages) exclusively concerned with finite dimensional problems. The part of the article dedicated to nonconvex problems covers four (inconclusive) pages, suggesting that convergence proofs are difficult to obtain in most nonconvex cases. However, various investigations of nonconvex problems and comparisons with known solutions have demonstrated the capability of augmented Lagrangian methods in solving nonconvex problems (as shown, for example, in [18] and [29]). This certainly encouraged us to apply this methodology to the solution of problem (1.5).

2.2. An augmented Lagrangian formulation.
Assuming that a minimizer exists, the minimization problem (1.5) is clearly equivalent to (2.2), where (2.5) H(Ω; div) = {q ∈ (L^2(Ω))^2 : ∇· q ∈ L^2(Ω)}. Remark 2. In section 1, we mentioned that the choice of V in the denoising model (1.5) is a critical issue. Actually, a reasonable candidate is (for its simplicity) the Sobolev space W^{2,1}(Ω), since the two terms defining the cost functional in (1.5) make sense in that space (we recall that, from the Rellich-Kondrachov compact imbedding theorem, the injection of W^{2,1}(Ω) into L^q(Ω) is compact ∀q ∈ [1, +∞)). From a practical (but formal) point of view, there is an advantage to taking V = H^2(Ω) (= W^{2,2}(Ω)) for the following reasons: (i) H^2(Ω) is a Hilbert space; (ii) H^2(Ω) being dense in W^{2,1}(Ω), one obtains the same infimum when minimizing the cost functional in (1.5) over either space; and (iii) the two spaces lead to the same discrete minimization problem.
With problem (2.2) we associate the following augmented Lagrangian functional L_r : Υ → R (with r = (r_1, r_2, r_3), r_i > 0 ∀i = 1, 2, 3), where (q_1, q_2) ∈ E_12. Now, suppose that the augmented Lagrangian L_r has a saddle-point ω over Υ. It can be easily shown that if ω is a saddle-point of L_r over Υ, then u is a solution of the minimization problem (1.5). 2.3. The basic algorithm. A natural candidate for the solution of the saddle-point problem (2.9)-(2.10) is a particular ADMM called ALG-2 by various practitioners (see, e.g., [4,22,24,29]). Among the several algorithms of the ALG-2 type, the following is considered in this paper.
A more explicit formulation of subproblem (2.12) reads as (2.16). Similarly, the minimization problem (2.13) is equivalent to a well-posed linear variational problem in H(Ω; div), namely (2.17). Next, a more explicit formulation of the minimization problem (2.14) is given by (2.18). Remark 3. Assuming that the minimizing sequences converge to a limit, we do not know in which space the related weak convergence takes place, nor whether the functional under consideration has the weak lower semicontinuity property necessary to guarantee convergence to a minimizer. Since our main concern is to find a simpler alternative to the method discussed in [52], we leave this theoretical aspect of the problem aside.
Remark 4. As mentioned in the introduction, the augmented Lagrangian approach discussed in this section is largely formal, unlike its finite element realization discussed in section 3. This formality leads to various variational crimes, one of them being to take ϕ in L^2(Ω), and consequently q_3 in H(Ω; div), while the natural functional space for ϕ is obviously L^1(Ω). Actually, a similar variational crime is committed when one associates (as done by various practitioners today) with the functional J defined by (1.3) the augmented Lagrangian (2.20), which is well suited to operations in H^1(Ω), but definitely not in BV(Ω), the natural space in which to minimize the functional J. Of course, the finite dimensional analogues of (2.20) make sense, allowing, for example, the use of ADMM to solve the corresponding minimization problem.

3. Finite element realization.
3.1. Generalities. The rationale for using finite elements instead of finite differences was given in the introduction. An inspection of relations (2.16)-(2.19) shows that none of them explicitly involves derivatives of order higher than one, implying that finite element spaces consisting of piecewise polynomial functions are well suited for defining a discrete analogue of Algorithm 1. Moreover, the expected lack of smoothness of the solutions (or quasi solutions) of problem (1.5) strongly suggests employing low-degree polynomials (of degree typically less than or equal to one). The approximation we use is of the mixed type, like those used, for example, in [9,16,21,31]; it allows solving a nonsmooth fourth-order elliptic problem using approximation spaces commonly employed for the solution of second-order elliptic problems. For a thorough discussion of mixed finite element methods and some applications, see [6].
Concerning the solution of problem (1.5), we assume that Ω is a rectangle and denote by ∂Ω its boundary. Since Ω is polygonal, we can triangulate it using a standard finite element triangulation T_h (satisfying therefore the assumptions listed in, e.g., [24]). A typical (here uniform) finite element triangulation is shown in Figure 1.
We denote by Σ_h (respectively, Σ_0h) the finite set of the vertices of T_h (respectively, the finite set of those vertices of T_h that do not belong to ∂Ω). From now on, N_0h (respectively, N_h) denotes the number of elements of Σ_0h (respectively, Σ_h). Finally, we denote by Ω_j the polygon that is the union of those triangles of T_h that have P_j as a common vertex, and by |Ω_j| the measure of Ω_j.

3.2. Fundamental finite element spaces and the discrete divergence operator.
Following Remark 2, we assume from now on that V = H^2(Ω). Using the appropriate discrete Green's formula, there is no difficulty in approximating the saddle-point problem (2.9)-(2.10) using classical C^0-conforming finite element spaces. To approximate the spaces H^1(Ω) and H^2(Ω), we will use the space V_h defined below; here, P_1 is the space of the polynomials of two variables of degree less than or equal to one. Now, for j = 1, . . . , N_h, we uniquely define the shape function w_j associated with the vertex P_j. Other finite element spaces will prove useful in what follows. The first, denoted by V_0h, is the subspace of V_h consisting of the functions vanishing on ∂Ω. The other space, denoted by Q_h, is defined using P_0, the space of those polynomials that are constant; here, χ_T denotes the characteristic function of T. The linear space Q_h is a suitable candidate for the approximation of the space H(Ω; div), the main issue being to properly approximate the divergence of an arbitrary element of Q_h. Suppose that q ∈ H(Ω; div) and v ∈ H^1_0(Ω); the divergence theorem then yields the relations (3.12). Since the functions q and ∇w_j are constant over the triangles of T_h, the integrals on the right-hand sides of the equations in (3.12) can be computed exactly (and easily). On the other hand, to simplify the computation of the integrals on the left-hand sides, we advocate using the trapezoidal rule to compute (approximately this time) these integrals; we then obtain the discrete divergence operator. Remark 5. Albeit conceptually satisfactory, the use of discrete Green's formulas (as done above to approximate the divergence operator) may lead to spurious oscillations (see [9] for dramatic evidence of this unwanted phenomenon), particularly when combined with low-order approximations, as is the case here.
In order to eliminate (or at least strongly dampen) these unwanted oscillations, we advocate the following regularization (some also say stabilization) procedure: replace (3.12) by a Tikhonov-regularized variant involving a constant C (> 0); boundary layer thickness considerations suggest C ≈ 1. This kind of Tikhonov regularization procedure has been applied successfully to the solution of the Dirichlet problem for the Monge-Ampère equation in two dimensions, using mixed finite element approximations based on low-order C^0-conforming finite element spaces; see [9] for further details. It has, however, not yet been tested on problem (1.5).
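For concreteness, the two finite element ingredients used above (the constant gradients of the P_1 hat functions on each triangle and the vertex-based trapezoidal quadrature rule) can be sketched as follows; this is an illustrative fragment with hypothetical helper names, not the paper's code.

```python
import numpy as np

def p1_gradients(a, b, c):
    """Gradients of the three P1 hat functions on the triangle (a, b, c).

    Each gradient is constant on the triangle; grad(lambda_a) is
    perpendicular to the opposite edge bc, with the standard formula
    grad(lambda_a) = [y_b - y_c, x_c - x_b] / (2 * signed area).
    """
    a, b, c = (np.asarray(p, float) for p in (a, b, c))
    area2 = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    rot = lambda p, q: np.array([p[1] - q[1], q[0] - p[0]])
    return rot(b, c) / area2, rot(c, a) / area2, rot(a, b) / area2

def trapezoidal_tri(vertex_values, area):
    """Vertex ("trapezoidal") quadrature on a triangle: exact for P1
    functions, and the source of the lumped left-hand-side integrals."""
    return area * sum(vertex_values) / 3.0
```

On the reference triangle with vertices (0,0), (1,0), (0,1), the hat function at the first vertex is 1 − x − y, so its gradient is (−1, −1); the three gradients always sum to zero, a useful sanity check.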

3.3. Discrete Lagrangian and discretized subproblems.
Since our goal in this paper is to compute an approximation of the solution of problem (1.5) using a discrete variant of Algorithm 1, a first step in that direction is to define an approximation of the augmented Lagrangian (2.8); the candidate functional L_rh considered in this paper is given below. Based on L_rh, subproblem (2.12) can be approximated by (3.17). Since the functional in (3.17) does not contain derivatives of q_1 and q_2, its minimization can be performed pointwise (in practice, on the triangles of the finite element triangulation T_h). This leads to the solution, a.e. in Ω, of a four-dimensional problem of the type (x_1, x_2) = arg min_{(y_1, y_2) ∈ e_12} … (3.18). The term y_2 in (3.18) can be easily eliminated using the nonlinear relation in (3.19). This leads to the unconstrained (and nonconvex) two-dimensional problem (3.20). Since the objective function in (3.20) is differentiable, an obvious choice for the solution of the problem is Newton's method.
Problem (3.20) can be reduced even further by observing that if x is a solution of (3.20), then it follows from the Schwarz inequality in R^2 that x and the vector b_1 + b_2/√(1+|x|^2) are positively collinear; that is, there is α ≥ 0 such that (3.21) holds. It follows from (3.20) and (3.21) that, due to the resulting relation, we arrive at the one-dimensional form (3.25). Similarly, subproblem (2.17) can be approximated by (3.26). The bilinear functional on the left-hand side of (3.26) being positive definite and symmetric, an obvious choice for the solution of this problem is the conjugate gradient algorithm, initialized with the vector-valued function p_3^n. Subproblem (2.18) can be approximated by (3.27), whose closed-form solution is available. Finally, subproblem (2.19) can be approximated by (3.29). Problem (3.29) could be solved, for 1 ≤ s < 2, using, for example, a semismooth Newton method or a nonlinear overrelaxation method like that discussed in [26]. Alternatively, if s = 2, problem (3.29) reduces to (3.30). In that case, a wide variety of methods could be applied. This article advocates the radix-4 partial solution variant of the cyclic reduction (PSCR) method [36,37,43,47] in the cases where the discretization mesh is orthogonal. In other cases, subproblem (3.30) could be solved using, for example, multigrid-type methods.
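Newton's method for the pointwise problem can be sketched as follows. Since the exact coefficients of (3.20) are not reproduced above, the objective used here is a stand-in sharing the √(1+|x|^2) curvature nonlinearity; the function name and parameters are illustrative assumptions.

```python
import numpy as np

def newton_pointwise(b1, r1=1.0, r2=1.0, n_iter=50, tol=1e-10):
    """Newton's method for the stand-in pointwise objective
        phi(x) = (r1/2)*|x - b1|^2 + r2*sqrt(1 + |x|^2),
    which mimics the sqrt(1 + |x|^2) nonlinearity of (3.20) but does
    not reproduce its exact coefficients."""
    b1 = np.asarray(b1, float)
    x = b1.copy()                       # obvious initial guess, as in the paper
    I = np.eye(2)
    for _ in range(n_iter):
        s = np.sqrt(1.0 + x @ x)
        g = r1 * (x - b1) + r2 * x / s                     # gradient
        H = r1 * I + r2 * (I / s - np.outer(x, x) / s**3)  # Hessian
        step = np.linalg.solve(H, g)
        x -= step
        if np.linalg.norm(step) < tol:
            break
    return x
```

For the genuinely nonconvex objective of (3.20), the same loop applies, but the computed stationary point must additionally be tested as described in the text.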

4. Implementation.
All subproblems except the first can be addressed in a straightforward fashion. Subproblem (3.26) is solved using the conjugate gradient method without any preconditioning, since we use a uniform mesh in practice. Subproblem (3.30) is solved using the radix-4 PSCR method. Finally, the solution of subproblem (3.27) is a simple trianglewise operation.
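A minimal unpreconditioned conjugate gradient routine of the kind used for subproblem (3.26) might look like this; it is a generic SPD-solver sketch, not the authors' code.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Plain (unpreconditioned) conjugate gradient for an SPD system Ax = b.

    A may be a dense matrix or any callable implementing the
    matrix-vector product, so matrix-free operators also work.
    """
    matvec = A if callable(A) else (lambda v: A @ v)
    x = np.zeros_like(b, dtype=float) if x0 is None else np.asarray(x0, float).copy()
    r = b - matvec(x)                   # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p       # conjugate search direction
        rs = rs_new
    return x
```

In exact arithmetic CG converges in at most n iterations for an n × n system; on a uniform mesh the matrix of (3.26) is well conditioned enough that no preconditioner is needed, consistent with the text.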
Before a digital image v̂ : {1, . . . , Ŵ} × {1, . . . , Ĥ} → R can be represented as a member of the finite element space V_h, we must first choose how to deal with the dimensions of Ω and the spatial discretization step h; when one of these is chosen, the other is also fixed. We decided to normalize the dimensions of Ω = (0, W) × (0, H) by setting max(W, H) = 1. As a result, the spatial discretization step h is fixed to 1/(max(Ŵ, Ĥ) − 1), implying h ≪ 1 in practice. The fact that the spatial discretization step depends on the pixel size of the image could potentially have an effect on the final denoised image; see Remark 7 for further discussion.
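The normalization just described amounts to a few lines; `mesh_step` is a hypothetical helper name, not from the paper.

```python
def mesh_step(width_px, height_px):
    """Spatial discretization step for an image whose domain
    Omega = (0, W) x (0, H) is normalized so that max(W, H) = 1,
    with h = 1 / (max(W_px, H_px) - 1) as in the text."""
    h = 1.0 / (max(width_px, height_px) - 1)
    W = (width_px - 1) * h
    H = (height_px - 1) * h
    return h, W, H
```

For a 512 x 256 image this gives h = 1/511, so the larger image dimension spans exactly the unit interval.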

4.1. Solution of the first subproblem.
Apparently, the most involved part of the discrete analogue of Algorithm 1 is the solution of the first subproblem (3.17). We could solve either the two-dimensional form (3.20) or the one-dimensional form (3.25). Depending on b_1, b_2 and r_1, r_2, both forms can have multiple local minima due to the nonconvex nature of the mean curvature term. Our actual realization first applies Newton's method to the two-dimensional form (3.20), starting from the obvious initial guess p_1^n(x). Assuming that the method converges and that the computed point is actually a local minimum, we then use the explicit relation (3.21) to test the obtained solution candidate; only then is the candidate accepted.
If Newton's method fails, we proceed with the one-dimensional form (3.25) and apply the well-known bisection method. Some fine-tuning is needed because the best local minimum should be obtained to guarantee overall convergence. Hence, our actual heuristic algorithm for the minimization of the one-dimensional form (3.25) reads as follows (Algorithm 2). Above, Ψ is the objective function from the one-dimensional form (3.25), and the function bisection(Ψ, l, ς) applies the bisection search to the function Ψ on the interval l with accuracy tolerance ς. If the interval l is unbounded, it is adjusted by moving its right-hand boundary so that the value of Ψ at the right-hand boundary is larger than at the left-hand boundary. In addition, h is the spatial discretization step and σ (> 0) is the requested accuracy. Finally, we set the output accordingly, where Φ is the objective function from (3.20) and p_newton is the solution (or the result of the last iteration) obtained by Newton's method.
We obtained this heuristic algorithm by observing the cases where Newton's method failed to solve the two-dimensional form (3.20) and by investigating the behavior of the one-dimensional form (3.25) in those cases. Since p_1^n begins to approximate the gradient of u^n as the discrete analogue of Algorithm 1 progresses, and the solution of the one-dimensional form (3.25) is the length of p_1^n, it makes sense to concentrate on the interval [0, √2 h^{-1}] (assuming Im(u^n) ⊂ [0, 1]). The overall shape of the graph motivated us to divide the interval into subintervals as described above. In rare cases, the global minimum was actually located outside the range [0, √2 h^{-1}], which also had to be taken into account. In practice, the combination of the two aforementioned algorithms works well in almost all cases. Newton's method converges quickly for most of the triangles of T_h; for the few triangles for which Newton's method fails, one uses Algorithm 2. The cases where both algorithms fail are rare and are limited to individual triangles.
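A simplified stand-in for the bisection fallback might look as follows. The exact subdivision heuristic of Algorithm 2 is not reproduced, so the subinterval count and the derivative-sign bracketing below are assumptions made for illustration.

```python
import numpy as np

def minimize_1d(psi, h, n_sub=8, tol=1e-6):
    """Stand-in for the bisection fallback: split [0, sqrt(2)/h] into
    subintervals, bisect on the sign change of a finite-difference
    derivative of psi in each, and keep the best candidate found.
    The paper's exact subdivision heuristic is not reproduced here."""
    dpsi = lambda x, d=1e-7: (psi(x + d) - psi(x - d)) / (2 * d)
    edges = np.linspace(0.0, np.sqrt(2.0) / h, n_sub + 1)
    best_x, best_v = edges[0], psi(edges[0])
    for a, b in zip(edges[:-1], edges[1:]):
        if dpsi(a) < 0 < dpsi(b):       # interior minimum bracketed
            while b - a > tol:
                m = 0.5 * (a + b)
                if dpsi(m) > 0:
                    b = m
                else:
                    a = m
            cand = 0.5 * (a + b)
        else:                            # no bracket: compare the endpoints
            cand = a if psi(a) < psi(b) else b
        v = psi(cand)
        if v < best_v:
            best_x, best_v = cand, v
    return best_x
```

Scanning all subintervals rather than stopping at the first bracket mirrors the paper's concern that the best local minimum must be found to keep the outer iteration convergent.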

4.2. Initialization and stopping criterion.
Concerning the use of the discrete variant of Algorithm 1, several issues need to be addressed, the obvious ones being initialization, the stopping criterion, and the choice of the parameters r and ε. The choice of r and ε is addressed in section 5; thus only the initialization and the stopping criterion are discussed here.
A number of different initialization methods were considered, the most prominent being (4.2) and (4.3). In the case of initialization (4.2), only the term corresponding to the mean curvature of the image surface is nonzero in the functional L_rh; in the case of initialization (4.3), only the term |u − f|^s is nonzero. These differences play a major role in the overall behavior of the algorithm; the effect of both initializations is discussed in further detail in section 5.
The second issue to be addressed is the stopping criterion. Several candidates, depending on a tolerance δ > 0, were considered. The following criterion (4.5) was found to be the most straightforward, where ω^n = (u^n, p_1^n, p_2^n, p_3^n, ψ^n; λ_1^n, λ_2^n, λ_3^n). This criterion works well as long as the penalty parameters r_1, r_2, and r_3 are selected large enough to accurately enforce the equality constraints in (2.4).
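Assuming the norm in (4.5) is the Euclidean norm of the concatenated iterate (the paper does not spell this out, so this is an assumption), the relative-change quantity can be computed as follows.

```python
import numpy as np

def relative_change(omega_new, omega_old, eps=1e-30):
    """Relative-change stopping quantity ||w^{n+1} - w^n|| / ||w^n||
    for the concatenated iterate w = (u, p1, p2, p3, psi; lambda1..3),
    each component passed as an array in the two lists.

    eps guards against division by zero for an all-zero iterate."""
    a = np.concatenate([np.ravel(x) for x in omega_new])
    b = np.concatenate([np.ravel(x) for x in omega_old])
    return np.linalg.norm(a - b) / (np.linalg.norm(b) + eps)
```

The outer loop would stop once this value drops below the tolerance δ (δ = 10^{-4} in the experiments of section 5).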

5. Numerical results.
In order to demonstrate the capabilities of the discrete variant of Algorithm 1, we applied it to a large variety of test problems, including synthetic and photographic images. For all these test problems, the values of the noise function g are uniformly distributed on the closed interval [−p, p], p > 0. We took δ = 10^{-4} for the stopping criterion defined by (4.5). All images are grayscale, and the original images are scaled to the range [0, 1].
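The test-problem corruption described above (uniform noise on [−p, p] added to a [0, 1] grayscale image) can be reproduced as follows; the seed and helper name are illustrative, not from the paper.

```python
import numpy as np

def add_uniform_noise(image, p, seed=0):
    """Corrupt a grayscale image (values in [0, 1]) with noise uniformly
    distributed on [-p, p], matching the test-problem setup described
    in the text; the fixed seed makes the corruption reproducible."""
    rng = np.random.default_rng(seed)
    g = rng.uniform(-p, p, size=image.shape)
    return image + g, g
```

Returning the noise field g alongside the noisy image makes it easy to compute error measures against the clean original.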

5.1. Choice of initialization method and parameters.
The behavior of the algorithm varied drastically depending on its initialization. Initialization (4.2) had a tendency to cause extremely slow convergence and an almost imperceptible decrease in the value of the objective function in (1.5). By adjusting the parameters ε and r accordingly, reasonably good results were obtained in some cases; usually this required a large parameter ε and small values for r_1, r_2, and r_3. However, each parameter combination worked only for a specific problem, and the search for a combination applicable to all problems proved futile. When the method was successfully initialized this way, it retained a considerable amount of detail while leaving some residual noise.
On the other hand, initialization (4.3) led to completely different behavior: the convergence was much faster, particularly during the first few tens of iterations, although finding a suitable parameter combination was still difficult. The equality constraints in (2.4) and a crude dimensional analysis suggest that if |∇u|^2 ≪ 1, then, taking homogeneity into account, we should choose ε ∼ h^2, r_1 ∼ h^2, r_2 ∼ h^2, and r_3 ∼ h^4. On the other hand, the same analysis suggests that for |∇u|^2 ≫ 1, the augmentation functionals behave such that p_1 ∼ h^{-1}, p_2 ∼ 1, p_3 ∼ 1, and ψ ∼ h^{-1}, and thus we should choose ε ∼ h, r_1 ∼ h^2, r_2 ∼ 1, and r_3 ∼ h^2. In addition, from the formulation of subproblem (3.26), it can be seen that h^2 r_2 ≈ r_3 could be a suitable choice because it would balance the left-hand side terms; otherwise, the conjugate gradient method would converge extremely slowly.
However, by applying the discrete analogue of Algorithm 1 to a large number of test problems and observing the residuals associated with the equality constraints in (2.4), we concluded that we should choose ε ∼ h, r_1 ∼ h, r_2 ∼ 1, and r_3 ∼ h^2; the equality constraint associated with r_1 did not converge when r_1 ∼ h^2. Further testing led us to the parameter combination (5.1), where r_0 (> 0) is a parameter that can be tuned depending on the amount of noise.
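If one realizes the scalings ε ∼ h, r_1 ∼ h, r_2 ∼ 1, r_3 ∼ h^2 with unit proportionality constants, the combination could be coded as below. The actual constants of (5.1) are not reproduced above, so both r_0 and C here are illustrative tuning knobs, and the function is a hypothetical sketch.

```python
def parameter_combination(h, r0, C=1.0):
    """Hypothetical realization of the scalings eps ~ h, r1 ~ h,
    r2 ~ 1, r3 ~ h^2 arrived at in the text; the proportionality
    constants are assumptions, not the ones of (5.1)."""
    eps = C * h
    r1, r2, r3 = r0 * h, r0, r0 * h ** 2
    return eps, (r1, r2, r3)
```

Note that this choice automatically satisfies the balance h^2 r_2 ≈ r_3 suggested by subproblem (3.26).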
In (5.1), the parameters r_1, r_2, and r_3 are large enough to accurately enforce the equality constraints in (2.4) while keeping the convergence speed reasonable. When the initialization (4.3) is combined with these parameters, the method removes a considerable amount of noise while filtering out some detail. Based on these observations, we decided to use initialization (4.3) and the parameter combination (5.1) for all examples and comparisons.
Remark 6. A "simple" way to fix the problem with the selection of ε is to make ε a function of |∇u|, with C on the order of 1 (other nonlinear functions of |∇u| are possible). We intend to investigate this approach in another paper. A simple way to use this variable-in-space parameter ε is as follows: 1. Solve the image restoration problem using a simpler method (based on BV regularization, for example). Call u_0 the solution to this problem.
2. Set ε_0 = Ch and update ε_i accordingly for i = 1, . . . , M. We believe that M = 2 should be enough. Remark 7. In our approach, Ω = (0, W) × (0, H) is normalized such that max(W, H) = 1. This means that the spatial discretization step h depends on the pixel dimensions of the image. This issue was treated very differently in [51] and [52]: in these papers, the spatial discretization step h was chosen first and, thus, in contrast to our approach, the dimensions of Ω depend on the pixel dimensions of the image.
As pointed out in [51] and [52], the spatial discretization step h plays an important role in the behavior of the method, since h affects the magnitude of the gradient. In order to justify the choices made in this paper, we investigated the behavior of the discrete regularization term k(v). Our goal was to find out how k(v) behaves pointwise as a function of h; this refines the view on how h should be chosen. We generated a large number of test images containing only random noise and analyzed the obtained data statistically. In addition, we modified our implementation so that the parameter h considered in the remainder of this remark is consistent with the one used in [51] and [52], implying that h may be chosen freely, and we tested various values of this parameter (including some larger than 1). Figures 2, 3, and 4 show the obtained results. In Figure 2, the pixel values are uniformly distributed over various intervals; i.e., the figure shows how the regularization term reacts to changes in the "intensity" of the noise. In Figure 3, the nonzero pixel values are uniformly distributed in the interval [−0.2, 0.2], but only a certain percentage of the pixels contains nonzero values; i.e., the figure shows how the regularization term reacts to changes in the "quantity" of the noise. Figure 4 shows the same results in a scaled form. It is clear that k(v) exhibits three different behaviors as h is varied: 1. If h is small enough (∼ 10^{-3}), then k(v) ∼ h^{-1}, which is consistent with the aforementioned dimensional analysis. Another defining property is that the regularization term k(v) is unable to differentiate between low intensity and high intensity noise.
This has three main consequences. First, the regularization term is almost always very large when dealing with photographic images; thus the model tends to produce overly smooth solutions, and the parameter ε must be small to preserve small details. Second, the model may misidentify some noise as a true jump in the data and preserve it; this observation was also made in [51] and [52]. Third, the regularization term probably becomes even more difficult to deal with because the transition from a smooth image to a noisy image is extremely steep. This could also explain why the initialization (4.2) did not work: if u_0 = f, then the value of the regularization term would be the same almost everywhere at u_0, and a transition to even slightly lower "energy" would require significant smoothing of the image. In other words, it is nearly impossible to find a descending path from u_0 to the global minimizer because the regularization term is "flat" near u_0 (= f). It should be noted that the regularization term can still differentiate between low and high quantities of noise. 2. If h is large enough (∼ 10^1), then k(v) ∼ h^{-2}, which is again consistent with the aforementioned dimensional analysis. The results show that the value of the regularization term is directly proportional to the intensity of the noise (see, in particular, Figure 4). This is not particularly surprising because k(v) ∼ v when |∇v|^2 ≪ 1. Numerical experiments indicate that our method works very well when h = 1, but the output images are quite blurred, as expected, since the regularization term favors shallow gradients over steeper ones. In addition, the initialization (4.2) actually works even better than the initialization (4.3), in the sense that the method converges much faster; both initializations also lead to the same solution in most cases.
Again, this is not a surprise because the nonconvex part of the regularization term does not have a major impact when |∇v|^2 ≪ 1. 3. If h ∼ 10^{-1}, then the regularization term does not have a clear asymptotic behavior.
This parameter range is clearly the most interesting because of the way the regularization term reacts to changes in the intensity of the noise. The value of h determines how steep the transition from a smooth image to a noisy image is and thus has a significant effect on the final output image. Unfortunately, it is far from clear how h should be chosen. In principle, the parameter h determines how the regularization term reacts to noise, and the parameter ε determines how strongly this reaction is taken into account. However, in practice, there seems to be some overlap between these two parameters. Our choice of h falls into the first category if it is assumed that the number of pixels is large. Although this choice has many disadvantages, the effects of which can be seen in some of the numerical results presented in this paper, we believe that it is justified, at least in the context of this paper, for the following reasons: (i) If the input image contains a substantial amount of noise, then only a limited amount of information can be recovered even under the best conditions. Taking this into consideration, a solution that is a little too smooth is not a major disadvantage. In addition, if the original image is smooth, then small h is a reasonable choice. (ii) If h ≪ 1, then the two terms in the objective function can be easily balanced by selecting ε ∼ h. This makes tuning the parameters a much easier task. Considering that the goals of this paper are purely algorithmic, we do not want to focus too much on such tuning. (iii) Since the objective function is likely to be more challenging to deal with when h is chosen to be small, problems with small h can be seen as benchmark problems. In that sense, small h is well suited to the goals of this paper. Figure 5 shows the results obtained while applying the implementation to one of the synthetic test images (Test9).
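The scaling regimes described above can be reproduced numerically. The sketch below uses a simple finite-difference discretization on a uniform grid as a stand-in for the finite element discretization used in the paper; the helper name `curvature_l1` and the grid size are illustrative assumptions. It evaluates the summed absolute mean curvature of the graph of a pure-noise image for several values of h.

```python
import numpy as np

def curvature_l1(v, h):
    """Sum of |mean curvature| of the graph of v, computed with central
    finite differences on a grid with spacing h (an FD stand-in for the
    paper's finite element discretization)."""
    vy, vx = np.gradient(v, h)                       # first derivatives
    norm = np.sqrt(1.0 + vx**2 + vy**2)              # graph area element
    nx, ny = vx / norm, vy / norm                    # normalized gradient field
    # mean curvature = divergence of the normalized gradient
    kappa = np.gradient(nx, h, axis=1) + np.gradient(ny, h, axis=0)
    return float(np.abs(kappa).sum())

rng = np.random.default_rng(0)
# pure noise, uniformly distributed in [-0.2, 0.2], as in the experiment above
v = 0.2 * (2.0 * rng.random((256, 256)) - 1.0)

for h in (1e-3, 1e-2, 1e-1, 1.0, 10.0):
    print(h, curvature_l1(v, h))
```

For small h the gradients saturate the normalized gradient field, so the printed values scale roughly like h^{-1} (a factor of about 10 per decade); for large h, where |∇v|^2 ≪ 1, they scale roughly like h^{-2} (a factor of about 100 per decade), matching regimes 1 and 2 above.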
This synthetic test image contains only simple patterns and shapes. Its purpose is to show that the algorithm works effectively in the sense that edges, corners, and image contrast are preserved. Here the noise parameter p was 0.2, and the parameter r_0 was 0.015. The value of the objective function in (1.5) at the noisy image f was J(f) = 0.027159, and J(u^0) = 0.364139. Convergence to the given tolerance was achieved after 291 iterations with J(u^291) = 0.007054. The l^2-residual between the vector representations of the original and noisy images was 58.95; the corresponding residual between the original and output images was 9.049. The algorithm eliminated practically all noise, and the output image is almost indistinguishable from the original. However, when the difference between the output and noisy images is examined more closely, it is clear that the algorithm had some minor difficulties with the diagonal tips of the star. Figure 6 shows similar results for a second synthetic test image (Test6). This test image presents many different challenges to the algorithm (sine waves, high- and low-contrast text, gradients, edges, corners, and narrow bounces). Here p = 0.2, r_0 = 0.004, J(f) = 0.004503, and J(u^0) = 0.207021. Convergence to the given tolerance was achieved after 215 iterations with J(u^215) = 0.004130. The residuals between the noisy and original and the output and original images were 58.16 and 14.35, respectively. The algorithm filtered out a considerable amount of noise, and it is clear that it has no problems with sine waves or gradients. However, it had difficulties in preserving certain details, such as the low-contrast text and the narrowest bounces in the surface. Figure 7 shows results for a photographic image (Barbara). This test image is particularly challenging because it contains image details and noise on similar scales.
Here, p = 0.2, r_0 = 0.005, J(f) = 0.008823, and J(u^0) = 0.134979. After 268 iterations, the value of the objective function in (1.5) was 0.008394. The residuals between the noisy and original and the output and original images were 58.86 and 35.50, respectively. Again, the algorithm filtered out a considerable amount of noise, but some details were lost in the process; in particular, the algorithm was unable to resolve the stripes and grids in the tablecloth and pants. This can be seen clearly in the intersection plot.
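The residuals quoted above are plain l^2-norms of the difference between the vector representations of two images. A minimal sketch of this quality measure follows; the gradient test image and the naive neighbor-averaging "denoiser" are hypothetical stand-ins for illustration only, not the paper's algorithm or test data.

```python
import numpy as np

def l2_residual(a, b):
    """l2-norm of the difference of the vector representations of two images."""
    return float(np.linalg.norm((a - b).ravel()))

rng = np.random.default_rng(1)
u_orig = np.tile(np.linspace(0.0, 1.0, 128), (128, 1))  # smooth gradient image
f = u_orig + 0.2 * rng.standard_normal(u_orig.shape)    # additive noise
# crude smoothing stand-in for a denoiser: average of the four neighbors
u_out = 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0)
                + np.roll(f, 1, 1) + np.roll(f, -1, 1))

print(l2_residual(u_orig, f), l2_residual(u_orig, u_out))
```

Even this naive smoother reduces the l^2-residual to the original image, which is the sense in which the residual pairs (noisy vs. output) are compared throughout this section.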

Examples and comparisons.
The values of the objective function along the iterations are plotted in Figures 8, 9, and 10. The figures also include the normalized residuals defined in (5.5).
In all three cases, the value of the objective function drops sharply during the first few tens of iterations. However, in the last two cases, it takes a few tens of iterations more before the value of the objective function drops below J(f). Moreover, the decrease is not monotonic: the value of the objective function jumps momentarily after a few iterations. This jump takes place at the same time as a jump in the normalized residual R_1^n and was observed, to a varying extent, with almost all the test images. Similar behavior was also reported in [52]. Figure 11 shows a comparison between different test problems with varying amounts of noise. The relevant parameters, objective function values, iteration counts, and residuals are shown in Table 1. It is clear that the algorithm performed commendably when p = 0.05 or p = 0.1. In the case of p = 0.2, more details were lost in the process, and when p = 0.4, almost all small details were lost.
Finally, Figure 12 visualizes the influence of r_0 on the resulting denoised image. The relevant objective function values, iteration counts, and residuals are shown in Table 2. In all cases, p = 0.2. It is clear that the optimal value of r_0 is somewhere near 0.005. These results illustrate a clear trend that can be observed in all considered test images: the larger the value of r_0, the more reliably the algorithm behaves. Indeed, when r_0 = 0.001, the value of the objective function at the achieved solution u^n is higher than it is at the noisy image f, although the algorithm appears to work properly otherwise. On the other hand, a large value of r_0 means that more detail is lost in the process. Thus, selecting an optimal value for r_0 is a difficult balancing act. Fortunately, the optimal value of r_0 appears to depend mainly on the amount of noise and not as strongly on the image itself, at least when success is measured by the l^2-norm. If success is judged by visual inspection, then r_0 must be selected in an image-dependent manner, because even a small change in its value can have a visible impact on the final result. This is consistent with comments in [52]. Nevertheless, the values provided in Table 2 work fairly well for most images.
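When the original image is available for benchmarking, the l^2-based selection of r_0 just described can be automated with a simple parameter sweep. The sketch below is purely illustrative: `toy_denoise` is a crude neighbor-averaging stand-in whose smoothing strength is made to grow with r_0 (mimicking the detail-loss trend observed above), not the augmented Lagrangian solver, and the test image and candidate values are hypothetical.

```python
import numpy as np

def sweep_r0(f, u_orig, denoise, candidates):
    """Pick the candidate r0 that minimizes the l2-residual between the
    denoised image and the (known) original."""
    scores = {r0: float(np.linalg.norm((denoise(f, r0) - u_orig).ravel()))
              for r0 in candidates}
    best = min(scores, key=scores.get)
    return best, scores

def toy_denoise(f, r0):
    """Toy denoiser: repeated four-neighbor averaging; the number of
    rounds (~ 1000 * r0) loosely mimics stronger smoothing at larger r0."""
    u = f.copy()
    for _ in range(max(1, int(round(1000 * r0)))):
        u = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                    + np.roll(u, 1, 1) + np.roll(u, -1, 1))
    return u

rng = np.random.default_rng(2)
u_orig = np.zeros((64, 64))
u_orig[16:48, 16:48] = 1.0                              # simple block image
f = u_orig + 0.2 * rng.standard_normal(u_orig.shape)    # noisy observation
best, scores = sweep_r0(f, u_orig, toy_denoise, [0.001, 0.005, 0.02, 0.1])
print(best, scores)
```

In practice the original image is of course unavailable, which is precisely why the image-dependent, partly visual selection of r_0 discussed above remains necessary.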
All presented numerical results show that the L^1-mean curvature model allows both smooth transitions and large jumps without staircasing. In two dimensions the output results do not appear perfect, but the one-dimensional intersections demonstrate the desired overall behavior; the qualitative challenges and difficulties are mostly related to mixtures of image details and noise on similar scales.

Table 1
A comparison between different test problems with varying amounts of noise, with n denoting the number of iterations necessary to achieve convergence and u^n the achieved solution. The Residuals column shows the residuals between the vector representations of the original and noisy images and the original and output images.

6. Conclusions. This paper presents an image denoising algorithm based on an augmented Lagrangian approach that uses the L^1-mean curvature of the image surface as a regularizer. The main difference between this paper and the existing literature (e.g., [52]) is that our methodology relies on a novel augmented Lagrangian functional in which the equality constraints treated by augmentation-duality are all linear, resulting in different (and simpler) subproblems. The functionality of the proposed algorithm was demonstrated by applying it to a large set of test problems of different types, some of which were presented in further detail. Based on the numerical experiments, we conclude that the algorithm can remove considerable amounts of noise within a reasonable number of iterations. The CPU time used by our implementation is dominated by the solution of the first subproblem; thus we feel that efforts to improve our method should be directed toward this subproblem.