A fast Fourier transform based direct solver for the Helmholtz problem

This paper is devoted to the efficient numerical solution of the Helmholtz equation in a two- or three-dimensional rectangular domain with an absorbing boundary condition (ABC). The Helmholtz problem is discretized by standard bilinear and trilinear finite elements on an orthogonal mesh yielding a separable system of linear equations. The main key to high performance is to employ the Fast Fourier transform (FFT) within a fast direct solver to solve the large separable systems. Numerical results for both two- and three-dimensional problems are presented confirming the efficiency of the method discussed.


Introduction
This work presents an efficient fast direct solver employing FFT for the Helmholtz equation in a rectangular domain with absorbing boundary conditions (ABCs). Here, ω denotes the wave number, which is assumed to be a constant. This condition is valid for homogeneous media. For example, the iterative solution techniques for acoustic scattering by objects in homogeneous and layered media considered in [10,12] leads a sequence of such problems. In this work, we derive the fast solver for the case of having a first-order ABC imposed. However, the same method can be used for the second-order ABCs employed in [10]. These ABCs are the time-harmonic counterpart of the ABCs for the wave equation considered in [1]. Moreover, the solver can be used also for problems with Robin boundary conditions, which essentially broadens its applicability. Solving the Helmholtz equation is in general difficult or impossible to solve efficiently with most numerical methods. Difficulties related to the numerical solution of time-harmonic Helmholtz equations are discussed for instance in [19].
The numerical method proposed in this work is a fast direct solver, which is applicable for problems posed in rectangular domains and having suitable tensor product form matrices. This kind of diagonalization technique has already been proposed in [14]. However, we use FFT and sparsity in order to implement it efficiently. For example, bilinear or trilinear finite element discretizations employed in this paper lead to matrices with such suitable tensor product form. Also the fourth-order accurate modified versions of these elements [7] are applicable with the solver. To improve the efficiency this method employs FFT instead of cyclic reduction. Several other fast direct solvers for elliptic problems in rectangular domains are discussed, e.g., in [17].
The fast direct solvers can be used as efficient preconditioners in the iterative solution of problems in more general domains, see [3,11]. Other methods, which are based on an equivalent formulation of the original problem to enable preconditioning with fast direct methods, are referred to as fictitious domain, domain imbedding, or capacitance matrix methods, and they have been successfully applied also to acoustic scattering problems for instance in [6,8,9,11] as well as to scattering problems with multiple right-hand sides, where a specific method for such problems is considered, for example, in [18].
A different fast direct solver for the Helmholtz equation (1) has already been presented in [10], where cyclic reduction techniques are used in the solution procedure. In [10], the application of a specific cyclic reduction-type fast direct method is presented, called the partial solution variant of the cyclic reduction (PSCR) method, to the solution of the Helmholtz equation [15,16,20] and its computational efficiency is demonstrated. For three-dimensional problems, a general fast direct solver like the PSCR has excellent parallel scalability, see [16]. However, the use of FFT over cyclic reduction techniques is preferable, since FFT is generally faster. That is partly due to very fast implementations of FFT.
The reason for not applying FFT previously was that the ABCs prevent the diagonalization using FFT. The following three steps describe the basic idea to solve this problem and are discussed in more detail in this work: (1) Some of the boundary conditions are changed on the ABC part of the boundary to be of periodic type. This modified problem can be solved now with an FFT solver. (2) Since the boundary conditions have been changed to periodic ones, the residual vector is computed due to these incorrect boundary conditions. This has nonzero components only on the boundary parts, where we have changed the boundary conditions. The correction is computed by solving a problem with the original matrix, where the right-hand side vector is the residual. Only the component of the correction which lie on the changed boundaries are computed. For this the so-called partial solution problem a special technique exists, see [2,10,13,16] for example. Due to sparsity of the vectors the solution requires only O(N ) or O(N log N ) operations in case of two or three dimensions, respectively, where N is the total number of unknowns. After this step, the correct boundary values are known. (3) A similar problem to the one in the first step is solved, but now the right-hand side vector is adjusted so that the solution has correct values on the boundaries. From this it follows that the solution is also the solution of the original problem.
A similar method to the above one was already proposed in [5], but with the major difference that the problem in the second step was solved iteratively. The new solver employing the FFT method is efficient in terms of computational time and memory usage, especially in the threedimensional case. For three-dimensional problems, a general fast direct solver like the PSCR method [16] requires O(N (log N ) 2 ) operations, whereas the FFT based solver reduces the complexity to O(N log N ) operations.
The paper is organized as follows: In Section 2, we present the classic and variational formulation of the Helmholtz problem as well as its discretization by bilinear and trilinear finite elements leading to a system of linear equations. The main idea of the solver for this problem and some preliminaries, which appear in the initialization process of the implemented fast solver for the twoand three-dimensional case, are discussed in Section 3. Sections 4 and 5 are devoted to the twoand three-dimensional problems, respectively. Here, some preliminaries as well as the fast solver steps are discussed. Finally, numerical results for both two-and three-dimensional problems are presented in Section 6, and conclusions are drawn in Section 7.

Problem formulation and discretization
Let Ω ⊂ R d , d = 2, 3, be a d-dimensional rectangular domain and Γ = ∂Ω its boundary. We consider the Helmholtz equation describing the linear propagation of time-harmonic acoustic waves given by where ω denotes the wave number. Here, equation (3) is an approximation for the Sommerfeld radiation condition appearing for the general model problem in R d . The approximation is performed by truncating the unbounded domain at a finite distance, which provides the boundary condition at the truncation boundary. In order to reduce spurious reflections caused by this artificial boundary, we use local absorbing boundary conditions. In general, the boundary Γ can be decomposed into parts where different boundary conditions are imposed. Let now Γ = Γ B ∪ Γ N be decomposed into an absorbing boundary condition part Γ B and a Neumann boundary condition part Γ N . The Neumann boundary conditions are given by whereas in case of absorbing boundary conditions we have that where n denotes the outward normal to the boundary. This type of absorbing boundary conditions (6) are called first-order absorbing boundary conditions. Remark 1. In practical applications, so-called second-order absorbing boundary conditions are also important and were considered in [10] as well. However, the choice of the absorbing boundary conditions does not have any impact on the performance of the proposed solver. Thus, the same method can be used with given second-order absorbing boundary conditions, see [10].
In order to obtain the discrete version of the Helmholtz problem (2)-(3), let us first state its weak formulation. For that we introduce the Hilbert space V = H 1 (Ω). Let the source term f ∈ L 2 (Ω) be given. Multiplying (2) with a test function as well as applying integration by parts and the boundary condition (3), yields the weak problem: Find u ∈ V such that where Let us denote the mesh points by x j,l , l = 1, . . . , n j for every x j -direction with j ∈ {1, . . . , d} and the corresponding mesh size by h j = 1/(n j − 1). Thus, the mesh is equidistant in each direction x j . Moreover, we denote the full mesh size by N , which is given by N = n 1 × · · · × n d . Discretizing problem (7) by bilinear or trilinear finite elements on an orthogonal mesh leads to a system of linear equations given by where the matrix A has a separable tensor product form and f = (f 1 , . . . , f N ) denotes the discrete right-hand side. For the two-dimensional case, the matrix A is given by whereas in three dimensions it is given by Here, the n j × n j -matrices K j and M j are one-dimensional stiffness and mass matrices, respectively, in the x j -direction with possible modifications on the boundaries due to the absorbing boundary conditions. They are computed by one-dimensional numerical quadrature on the unit interval and are given as follows and where the first and last entries are including the corresponding boundary conditions. Absorbing boundary conditions (6) yield the entries k 1,1 = k nj ,nj = 1 − iωh j , whereas Neumann boundary conditions lead to k 1,1 = k nj ,nj = 1. The matrix M j is the same for both Neumann and (firstorder) absorbing boundary conditions. Remark 2. Without loss of generality let us assume that the absorbing boundary conditions are posed in direction of x 1 for both (opposite) sides.
In the next two sections, an efficient fast solver for solving the large linear systems Au = f is proposed. We split the discussion into two sections corresponding to the two-dimensional and three-dimensional problems with the matrix A given by (10) and (11), respectively.

Main idea and preliminaries
3.1. Basic steps. The idea for solving the problem Au = f is to consider an auxiliary problem where the system matrix B is derived by changing some absorbing boundary conditions to periodic ones. The key is that we can solve the modified (periodic) problem Bv = f now by using the FFT method, which is not possible for the original problem Au = f .
Before going into more details later, let us discuss the main steps of the solver.
Step 2: Step 3: Solve Note that after Step 2, we would have already had the identity u = v + w, but we do not use this identity directly to compute u, which will be explained in Subsections 4.2 and 5.2.

Preliminaries.
We have assumed that the absorbing boundary conditions are given in x 1direction on both opposite boundaries, see Remark 2. Since we have to change the boundary conditions on the two opposite boundaries to be of periodic type for the auxiliary problem (14), we consider the one-dimensional stiffness and mass matrices K 1 and M 1 and change the entries corresponding to the boundary parts into periodic conditions for the auxiliary problem (14). We denote these matrices by K B 1 and M B 1 . The n j × n j circulant matrices K B j and M B j are given as follows and which means that we have changed the boundary conditions on two opposite boundaries to be of periodic type. Hence, Let us consider now the original problem (9) and the auxiliary problem (14) in more detail. After a suitable permutation A and B have the block forms the subscripts b and r correspond to the nodes on the Γ B boundary and to the rest of the nodes, respectively. Note that the matrix B − A has the structure and only the matrix has to be saved for the application of the fast solver. Steps of the fast solver for Au = f include also the application of the so-called partial solution method, which is a special implementation of the method of separation of variables. This method involves the solution of the generalized eigenvalue problems where the matrices Λ A 1 and Λ B 1 contain the eigenvalues as diagonal entries and the matrices V 1 and W 1 contain the corresponding eigenvectors as their columns. Let us define the more general problem with circulant matrices K B j and M B j for the problem size n j corresponding to any x j -direction. The n j × n j eigenvector matrix W j for the generalized eigenvalue problem (25) is given by and the diagonal entries of the corresponding diagonal eigenvalue matrix Λ B j are for l = 1, . . . , n j . In order to apply the partial solution method the eigenvectors are normalized so that they satisfy the conditions: This can be achieved by multiplying V 1 and W 1 with the vectors s A 1 and s B 1 of length n 1 , respectively. We denote the general vectors by s A j and s B j of length n j . The entries of s A j are given by with the componentwise matrix norms which are weighted by the l-th eigenvectors of V j for l = 1, . . . , n j . Similarly, the entries of s B j are given by with the componentwise matrix norms which are weighted by the l-th eigenvectors of W j for l = 1, . . . , n j . In (28) and (29), I 1 denotes the identity matrix of length n 1 . In the following, I j and I jk denote the identity matrices of lengths n j and n j × n k , respectively. The eigenvector matrices V 1 and W 1 are used for solving the partial solution problems when needed in the steps of the solver. However, the generalized eigenvalue problems (23) and (24) have to be solved only once during the solution processin the initialization. The conditions (28) and (29) also lead to a convenient representation for the inverses of the system matrices A and B, which is discussed in Subsections 4.1 and 5.1 for the respective two-dimensional and three-dimensional case. In the next two sections, we discuss the two-dimensional and three-dimensional problems in more detail separately, since the efficient implementation of the initialization process and the steps of the fast solver differs in both cases.  (14) in the two-dimensional case is given by Using the conditions (29) as follows the inverse of B can be represented by where Similarly using the conditions (28), we obtain for the system matrix A the following representation: where Then the linear system is solved by solving the respective two subproblems consecutively in the application of the fast solver. Note that r denotes now some right-hand side which is in the different steps of the solver also different. However, we will describe that in more detail in the next subsection. We denote by the LU decomposition corresponding to H A . The structure of H B and H A is essential for the fast application of the direct solver. The diagonal blocks of these matrices are tridiagonal which makes the LU decomposition fast for them. The computational complexity is optimal O(N ).

4.2.
Fast solver in the two-dimensional case.

Step 1. The auxiliary problem
is solved, but only v b and not v r is computed. For that, the FFTf of the right-hand side f is computed first, where its coefficientsf k are given as followŝ for all k = 1, . . . , N and normalization might be needed, which means thatf is multiplied by the vector s B 1 , where its components are defined in (32) with j = 1. Note that computing the FFT corresponds to the multiplicationf = (W T 1 ⊗ I 2 ) f . The vectorf is saved, since it will be needed in Step 3 as well. Next, we apply the LU decomposition (41) with the right-hand sidef Performing the inverse FFT on the resulting vectorz 1 would provide both v b and v r , which would correspond to the multiplication v = (W 1 ⊗ I 2 )z 1 . However, since we only need v b , instead of that, we multiply the vectorz 1 by the matrix W 1 by taking advantage of the sparsity of the desired components v b . More precisely, we multiplyz 1 from the left side by the eigenvectors of W 1 which correspond only to the boundary Γ B denoted by W b 1 leading to which resembles the representation (36) for B −1 . The computational complexity of Step 1 is O (N log N ).

4.2.2.
Step 2. We introduce the additional vector w defined as w = u − v. Since and (21), we obtain the following problem: where C bb is defined by (22). We solve the problem (49) by using the representation (38), but compute again only w b and not w r , since the right-hand side as well as the desired components of the solution are both sparse. First, we compute then by using the LU decomposition (43) we solve and, finally, we solve which resembles the representation (38) for A −1 but corresponds only to the boundary Γ B . The computational complexity of Step 2 is of optimal order O(N ).

4.2.3.
Step 3. Finally, in order to obtain the solution u of the original problem (9), we solve now the problem Since we have already computed the Fourier transformationf of f in Step 1 by (45), we only need to compute the Fourier transformation of the second term of the right-hand side of equation (53) and due to sparsity again only the part corresponding to the boundary Γ B as follows leading to the Fourier transformation of the entire right-hand side of equation (53) denoted bŷ f +ĥ. Next, we apply the LU decomposition (41) with this right-hand side yielding In the last step all resulting components are needed (not only the ones corresponding to Γ B ) to obtain the solution u by applying the inverse Fourier transformation onz 3 , where its coefficientŝ u k are given as followsû for all k = 1, . . . , N . Again the vectorz 3 may need to be multiplied by the vector s B 1 with entries defined in (32) for n 1 before applying the inverse Fourier transformation on it. The computation of the inverse Fourier transformation corresponds to the multiplication The computational complexity of Step 3 is O (N log N ).

5.
The three-dimensional case 5.1. Preliminaries for the three-dimensional problem. A and B. The matrix B in the three-dimensional case is given by

Matrices
Using the equations (28) and (29), the inverses of A and B can be represented analogously as in the two-dimensional case as follows and where and respectively.

5.1.2.
Applying the two-dimensional fast solver. Linear systems with the block diagonal matrices H B and H A are again solved with a direct method. However, it is performed in a different way than for the two-dimensional case, since the LU decomposition is slow for large block tridiagonal problems. More precisely, the efficient implementation in three dimensions contains the application of the two-dimensional fast solver for n 1 subproblems of size n 2 × n 3 including the computation of the partial solution method in x 2 -direction. For that, one needs to solve the following generalized eigenvalue problems during the initialization process: with the matrices K B 2 and M B 2 as defined in (17) and (18) but for the x 2 -direction now. Moreover, the diagonal eigenvalue matrices Λ A 2 and Λ B 2 and the eigenvector matrices V 2 and W 2 are formed analogously as for the x 1 -direction only the problem size changes to n 2 . For that, the generalized eigenvalue problem with circulant matrices for a general problem size n j was defined in (25) and the corresponding eigenvector matrix is represented in (26). The eigenvectors are normalized to satisfy the conditions This can be achieved by multiplying V 2 and W 2 with the vectors s A 2 and s B 2 of length n 2 , respectively, which are introduced in (30)-(33). The fast solver includes the application of the partial solution method n 1 times for the following two-dimensional n 2 × n 3 subproblems in x 2direction corresponding to the matrices (61) and (62): and where l = 1, . . . , n 1 . This yields the direct solution for H −1 A and H −1 B . Note that the subproblems (66) and (67) reflect the structure of (61) and (62), respectively, in the framework of the twodimensional problem (10). Now the parameters p A and p B have been introduced in (66) and (67), respectively. The corresponding auxiliary problems are given by and which now reflect the structure of (61) and (62), respectively, in the framework of the twodimensional problem (34). As defined in (27), Λ B 1,l is the lth-diagonal entry of the diagonal eigenvalue matrix Λ B 1 . Analogously, we have denoted by Λ A 1,l the lth-diagonal entry of the diagonal eigenvalue matrix Λ A 1 . In summary, the two-dimensional problems (66) and (67) are solved for all l = 1, . . . , n 1 by applying n 1 times the two-dimensional FFT based fast solver from Subsection 4.2 using the auxiliary two-dimensional problems (68) and (69). The computational complexity of this step is now O (N log N ).

5.2.
Fast solver in the three-dimensional case. The three-dimensional solver has the same three main steps, but now instead of applying LU decomposition we apply the two-dimensional solver as described in the previous subsection. We present all the main steps of the solver similarly as for the two-dimensional case in Subsection 4.2 but in less detail. Hence, we refer the reader also to Subsection 4.2 for more details.

5.2.1.
Step 1. The auxiliary problem (44) is solved, but only v b and not v r is computed. For that, we need to computef = (W T 1 ⊗ I 23 ) f first, which is equivalent to the computation of the FFTf for the right-hand side f , where its coefficientsf k are given in (45) for all k = 1, . . . , N . The vectorf can be normalized by multiplying it with the vector s B 1 of length n 1 with components defined in (32). The vectorf is saved, since it will be needed in Step 3 as well. Next, we apply n 1 times the two-dimensional solver for the n 2 × n 3 problems (67) with the auxiliary problems (69) leading to the solution of the problem Since we only need v b , we now multiply the vectorz 1 by the matrix W 1 by taking advantage of the sparsity of the desired components v b . More precisely, we multiplyz 1 from the left side by the components of the eigenvectors of W 1 which correspond only to the boundary Γ B denoted by which resembles the representation (60) for B −1 .

5.2.2.
Step 2. We introduce the additional vector w defined as w = u − v. Since (48) and (21), we derive at problem (49) We solve this problem by using the representation (59), but compute again only w b and not w r , since the right-hand side as well as the desired components of the solution are both sparse. First, we solve then by applying n 1 times the two-dimensional solver for the n 2 × n 3 subproblems (66) with the auxiliary problems (68) leading to the solution of the problem Finally, we compute which resembles the representation (59) for A −1 but corresponds only to the boundary Γ B .

5.2.3.
Step 3. The last step yields the solution u of the original problem (9). First, we solve the problem (53) Since we have already computed the Fourier transformationf of f in Step 1 by (45), we only need to compute the Fourier transformation of the second term of the right-hand side of equation (53) and due to sparsity again only the part corresponding to the boundary Γ B as follows leading to the Fourier transformation of the entire right-hand side of equation (53) denoted bŷ f +ĥ. Next, we apply n 1 times the two-dimensional solver for the n 2 × n 3 problems (67) with the auxiliary problems (69) leading to the solution of the problem In the last step all resulting components are needed (not only the ones corresponding to Γ B ) to obtain the solution u by applying the inverse Fourier transformation onz 3 , where its coefficientŝ u k are defined as in (56) for all k = 1, . . . , N . The vectorz 3 may be normalized by multiplying it with the vector s B 1 of length n 1 with the components defined in (32) before applying the inverse Fourier transformation which corresponds to the multiplication finally leading to the solution u.
In the next section, we present numerical experiments for both the two-dimensional and threedimensional case.

Numerical results
For d = 2 and d = 3, the computational domain is chose as Ω = [0, 1] d the unit square and unit cube, respectively. The wave number is set ω = 2π for all numerical experiments. The discretization meshes are uniform with respect to each x j -direction, where the corresponding step sizes are denoted by h j = 1/(n j − 1), j = 1, . . . , d. The right-hand side is chosen as 0.01 for the first n 1 entries and 1 for all the other entries. In this set of experiments, the efficiency of the fast direct solver is discussed. The numerical experiments have been computed using MATLAB 9.3, R2017b.
In the following, we compare the CPU times in seconds for computing the solution by applying Matlab's backslash and the fast solver presented in this work. In our case, Matlab's backslash uses the sparse direct solver UMFPACK for computing the solution of the sparse linear systems (10) and (11), see [4]. Besides the computational times of applying Matlab's backslash and the FFT based direct solver also the computational times needed in the initialization process are presented. However, the initialization processes differ between the two-and three-dimensional problems, which has already been addressed in Sections 4 and 5. Summing up, in the two-dimensional case the LU decompositions are computed for the matrices H B and H A defined in (37) and (39), respectively, (see Subsection 4.1.2), whereas the three-dimensional problem is not solved using LU decomposition in three dimensions. In this case, the action of the inverses of H B and H A defined in (62) and (61), respectively, are computed by applying the fast solver in two dimensions n 1 times (see Subsection 5.1.2). This approach leads to a more efficient implementation in three dimensions.
The largest numerical experiments were computed in three dimensions with 135 005 697 (= 513 3 ) unknowns. Note that this size is already too large regarding the computational memory in Matlab for setting up the whole matrix matrix A as defined in (11), which would be needed in order to apply Matlab's backslash. However, the fast solver presented in this work still computes the solution in a reasonable amount of time without using too much computational memory, since the fast solver does not need to form the whole matrix A, but only of the left upper block (22) denoted by C bb of the matrix B − A defined in (21).

Remark 3. All numerical experiments presented in this section were performed on a laptop with
Intel(R) Core(TM) i5-6267U CPU @ 2.90GHz processor and 16 GB 2133 MHz LPDDR3 memory.
6.1. Numerical results for the two-dimensional case. For the first set of numerical experiments, we choose n = n 1 = n 2 . The CPU times in seconds for the initialization process as well as the application of the fast solver and Matlab's backslash for comparison are presented in Table 1 for different values of n. The largest numerical experiments in two dimensions have been computed for n = 2049 with 4 198 401 unknowns. As can be observed in Table 1 Table 2 presents the CPU times in seconds for various combinations of n 1 and n 2 in the twodimensional case. More precisely, we compare here the computational times with respect to a large difference in the magnitude of n 1 and n 2 . Both cases are discussed n 1 >> n 2 and n 2 >> n 1 . As one can observe in Table 2 it does not influence the computational times of the solver whether n 1 >> n 2 or n 2 >> n 1 . For n 1 = 65 and n 2 = 2049, the fast solver's CPU time was 0.10 seconds, and for n 1 = 2049 and n 2 = 65, it was 0.12 seconds. However, if n 1 is very large, it influences the computational times of the initialization process, more precisely, of solving the generalized eigenvalue problem (23) performed by applying Matlab's function eig, which requires the full representations of the matrices K 1 and M 1 , which are otherwise stored as sparse matrices. These increased computational times can be observed in the last two columns of Table 2. We note here that the solution method described in [10] would be faster for solving this generalized eigenvalue problem.  Table 2. CPU times in seconds for various combinations of n 1 and n 2 in the two-dimensional case 6.2. Numerical results for the three-dimensional case. In the three-dimensional case, we chose n j = n for all j = 1, 2, 3 for the first set of numerical experiments again. The CPU times in seconds for the initialization process as well as the application of the fast solver and Matlab's backslash for comparison are presented in Table 3 for different values of n. As can be observed from Table 3, the computational times for applying Matlab's backslash rise very quickly. For n = 65, its CPU time was 608.12 seconds, whereas the fast solver's CPU time was only 2.02 seconds. Then already for n = 129 with 2 146 689 unknowns, Matlab runs out of memory when the matrix A is formed which makes it impossible to solve the problem (9) Table 4. CPU times in seconds for various combinations of n 1 and n 2 and n 3 in the three-dimensional case Based on this it can be expected that, for example, a C++ implementation would be essentially faster than the Matlab implementation.

Conclusions
In this work, we have derived an FFT based direct solver for solving efficiently the Helmholtz equation in a rectangular domain with an absorbing boundary condition. The model problem and fast solver are discussed for two-and three-dimensional domains as well as numerical results for both cases are presented. The experiments show nearly optimal order of complexity O(N log N ) matching the efficiency of the FFT method. In particular, the numerical results in this work show the efficiency of the fast solver compared to Matlab's backslash and also with respect to the computational memory needed to solve the discretized Helmholtz problem for a large number of unknowns.