k-step shape estimators based on spatial signs and ranks

In this paper, shape matrix estimators based on spatial sign and rank vectors are considered. The estimators considered here are slight modifications of the estimators introduced in Dümbgen (1998) and Oja and Randles (2004) and further studied, for example, in Sirkiä et al. (2009). The shape estimators are computed using pairwise differences of the observed data; therefore there is no need to estimate the location center of the data. When the estimator is based on signs, the use of differences also implies that the estimator has the so-called independence property whenever the initial estimator has it. The influence functions and limiting distributions of the estimators are derived in the multivariate elliptical case. The estimators are shown to be highly efficient in the multinormal case, and for heavy-tailed distributions they outperform the shape estimator based on the sample covariance matrix.


Introduction and some notations
A p-variate random vector x is elliptically symmetric if its density function is of the form

f(x) = det(Σ)^{-1/2} g{(x − µ)^T Σ^{-1} (x − µ)},    (1)

where the p-vector µ denotes the location of the distribution and the positive definite symmetric p × p matrix (PDS(p)) Σ, the so-called scatter matrix, defines the shape and scale of its contours. If x is a random variable from an elliptical distribution, then the standardized variable z = Σ^{-1/2}(x − µ) has a spherical distribution with location 0, and z can be decomposed as z = ru, where r = ||z|| is the Euclidean length of z, and r and u = ||z||^{-1} z are independent with u uniformly distributed on the unit sphere. The scatter matrix Σ can be decomposed into two parts, that is, Σ = σ^2 Λ, where σ^2 is the scale parameter and the PDS(p) matrix Λ with Tr(Λ) = p is the shape matrix. Note that the condition Tr(Λ) = p is sometimes replaced by det(Λ) = 1 or Λ_11 = 1 (Paindaveine, 2008). A shape matrix carries information only about the shape and orientation of the data, and in several applications it is enough to estimate the shape matrix only.
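As a small numerical illustration of the definitions above (a sketch only; the choice of radius distribution and of Σ is ours), the decomposition z = ru and the normalization Tr(Λ) = p can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 100_000

# Spherical z = r * u: direction u uniform on the unit sphere,
# radius r drawn independently (any radius distribution is allowed).
g = rng.standard_normal((n, p))
u = g / np.linalg.norm(g, axis=1, keepdims=True)
r = rng.exponential(scale=2.0, size=n)
z = r[:, None] * u

# For u uniform on the unit sphere, E[u u^T] = (1/p) I_p.
M = u.T @ u / n
assert np.allclose(M, np.eye(p) / p, atol=0.01)

# A scatter matrix Sigma decomposes as sigma^2 * Lambda with Tr(Lambda) = p.
Sigma = np.array([[4.0, 1.0, 0.0],
                  [1.0, 2.0, 0.5],
                  [0.0, 0.5, 1.0]])
Lambda_ = p * Sigma / np.trace(Sigma)   # shape matrix
sigma2 = np.trace(Sigma) / p            # scale parameter
assert np.isclose(np.trace(Lambda_), p)
assert np.allclose(sigma2 * Lambda_, Sigma)
```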
In this paper, we denote the scatter matrix functional by C(x), where x is a random vector with cumulative distribution function F. C(x) is defined to be a scatter matrix if it is PDS(p) and affine equivariant so that

C(Ax + b) = A C(x) A^T    (2)

for every nonsingular p × p matrix A and every p-vector b. Applying the scatter matrix functional to the empirical distribution function F_n gives the scatter matrix estimate C. Further, the shape matrix functional is denoted by V(x); it is PDS(p) and affine equivariant in the sense that

V(Ax + b) = p A V(x) A^T / Tr(A V(x) A^T).    (3)

The corresponding estimate V is again obtained by applying the shape functional to the empirical distribution function F_n. Note that in the elliptical model, different scatter matrices are not comparable, since a correction factor is needed in order to guarantee Fisher consistency towards Σ. All shape matrices, however, estimate the same population quantity Λ and are directly comparable without any modifications.

Tyler (1987) introduced a simple shape matrix estimator based on spatial sign vectors. His estimator is distribution-free under elliptically symmetric distributions and highly efficient in the case of heavy-tailed distributions. Under the multivariate normal model the efficiencies are, however, quite low. More efficient estimators are obtained by using pairwise differences instead of the original observations in Tyler's approach (Dümbgen, 1998), or spatial rank vectors as in Oja and Randles (2004) and Sirkiä et al. (2009). Since Dümbgen's shape matrix and the spatial rank covariance matrix use the pairwise differences of the observed data, there is no need to estimate the location center. The use of differences in Dümbgen's estimator also implies that the estimator has the important independence property, meaning that V(x) is a diagonal matrix for all x with independent components. This property is useful in several multivariate analysis problems, for example in independent component analysis (Tyler et al., 2009).
The estimators derived in Dümbgen (1998) and Oja and Randles (2004) are computed using iterative algorithms. Due to this property and the use of pairwise differences, the estimation procedure is rather time-consuming when the sample size is very large. In this paper, we therefore consider so called k-step versions of these estimators, that is, the estimators obtained after the k-th iteration step. Similar estimators have been considered for example in Oja et al. (2006) and Tyler et al. (2009), where 1-step M-estimators and 1-step W-estimators were used in invariant coordinate selection and independent component analysis. Other examples include the 1-step signed-rank scatter matrix introduced by Hallin et al. (2006), and the 1-step Tyler's shape matrix used in the context of principal axis analysis by Critchley et al. (2008).
The outline of the paper is as follows. In Section 2, the estimators are defined, and in Sections 3 and 4 their robustness and efficiency properties are studied. In Section 5, the properties of the estimators are examined using simple simulation studies. The paper is concluded with some final comments in Section 6. All the proofs are found in the Appendix.

Shape estimators based on spatial sign and rank vectors
The spatial sign function is defined as

S(x) = ||x||^{-1} x for x ≠ 0, and S(0) = 0,

where ||x|| = (x^T x)^{1/2} is the Euclidean length of the vector x. Applying the spatial sign function to the data points x_1, . . . , x_n produces the spatial sign vectors S(x_i). The centered spatial rank function is defined as

R(x) = ave_i { S(x − x_i) };

thus, unlike the sign function, the rank function depends on the data. If x_1, . . . , x_n is distributed according to F, then R(x) converges uniformly in probability to the theoretical rank function R_F(x) = E_F[S(x − x_1)] (Möttönen et al., 1995). At a spherical distribution,

R_F(x) = q_F(r) u    (4)

for some bounded increasing function q_F(r). In the following, the spatial rank vectors are denoted by R_i = R(x_i), i = 1, . . . , n.

In Visuri et al. (2000), the symmetrised spatial sign covariance matrix and the spatial rank covariance matrix were introduced. These estimates are given by

SCov = ave_{i<j} { S(x_i − x_j) S^T(x_i − x_j) }  and  RCov = ave_i { R_i R_i^T }.

Since the estimates are based on differences, they can be computed without the need for a location estimate. The main disadvantage is that these estimates are only orthogonally equivariant, that is, they satisfy (2) only for orthogonal p × p matrices A. The estimators are therefore not genuine scatter matrix estimators.
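The sign and rank functions and the two covariance matrices above translate directly into code. A minimal numpy sketch (the function names are our own; the i = j terms of the symmetrised sign average are zero and are simply included in the mean):

```python
import numpy as np

def spatial_sign(X):
    """Row-wise spatial sign S(x) = x / ||x||, with S(0) = 0."""
    nrm = np.linalg.norm(X, axis=-1, keepdims=True)
    return np.divide(X, nrm, out=np.zeros_like(X), where=nrm > 0)

def spatial_rank(X):
    """Centered spatial ranks R(x_i) = ave_j { S(x_i - x_j) }."""
    diffs = X[:, None, :] - X[None, :, :]
    return spatial_sign(diffs).mean(axis=1)

def sign_cov(X):
    """Symmetrised spatial sign covariance matrix:
    average of S(x_i - x_j) S(x_i - x_j)^T over all pairs."""
    diffs = X[:, None, :] - X[None, :, :]
    S = spatial_sign(diffs)
    n = X.shape[0]
    return np.einsum('ija,ijb->ab', S, S) / (n * n)

def rank_cov(X):
    """Spatial rank covariance matrix: ave { R(x_i) R(x_i)^T }."""
    R = spatial_rank(X)
    return R.T @ R / X.shape[0]
```

Both estimates need all pairwise differences, so time and memory grow as O(n^2); this is the computational burden discussed later in the paper.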
To derive affine equivariant versions of these estimates, one has to proceed as in Hettmansperger and Randles (2003) and apply the spatial sign and rank functions to transformed data points. Let V be a PDS(p) matrix with Tr(V) = p and write z_i = V^{-1/2} x_i, i = 1, . . . , n, for the standardized observations. Then the affine equivariant version of the symmetrised spatial sign covariance matrix is the V that satisfies

p ave_{i<j} { S(z_i − z_j) S^T(z_i − z_j) } = I_p    (5)

and for which Tr(V) = p. The resulting estimate is also known as Dümbgen's estimate (Dümbgen, 1998); it is a symmetrised version of Tyler's M-estimate (Tyler, 1987), which is found as a solution to p ave{ S(z_i) S^T(z_i) } = I_p. The properties of Dümbgen's estimator are studied in Dümbgen (1998) and Dümbgen and Tyler (2005). An iteration step suggested by the estimating equation (5) is

V ← V^{1/2} [ p ave_{i<j} { S(z_i − z_j) S^T(z_i − z_j) } ] V^{1/2},    (6)

which is repeated until the sequence converges. In the end, the resulting estimate is scaled so that Tr(V) = p. This algorithm is similar to that considered in Tyler (1987), only computed using pairwise differences instead of the original observations. For the proof of convergence, see Tyler (1987).

The affine equivariant version of the spatial rank covariance matrix is obtained by solving

ave_i { R(z_i) R^T(z_i) } ∝ I_p.    (7)

The resulting estimate is again scaled so that Tr(V) = p. This estimate is considered, for example, in Oja and Randles (2004) and in Sirkiä et al. (2009). An iteration step suggested by the estimating equation (7) is now

V ← V^{1/2} [ ave_i { R(z_i) R^T(z_i) } ] V^{1/2}.    (8)

Note that the above equation does not include any standardization term, since the resulting estimate is scaled to have trace equal to p. The algorithm (8) seems always to converge in practice; however, a proof of this is still an open question. In Sirkiä et al. (2009), it is shown that the above estimates are highly efficient even in the multinormal case. The main drawback is that, since these estimates are based on pairwise differences, they are rather slow to compute when the sample size is very large.
Motivated by this computational inconvenience and the fact that there is, so far, no proof of the convergence of (8), we consider in the following so called k-step versions of the estimates. These are simply obtained by starting with some √n-consistent shape matrix estimate, for example Tyler's M-estimate, and repeating the steps in (6) and (8) k times.
Definition 1. Let V_0 be some √n-consistent shape matrix estimate, which is used as a starting value. The k-step sign estimate V_k is then computed by iterating

V_j = V_{j-1}^{1/2} [ p ave_{i<l} { S(z_i − z_l) S^T(z_i − z_l) } ] V_{j-1}^{1/2},  z_i = V_{j-1}^{-1/2} x_i,  j = 1, . . . , k,

and then standardized so that Tr(V_k) = p.
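A direct numpy sketch of the k-step sign estimate (our own implementation, not the SpatialNP code; it assumes the iteration step V_j = V_{j-1}^{1/2} [p ave_{i<l} S(z_i − z_l)S^T(z_i − z_l)] V_{j-1}^{1/2} with z_i = V_{j-1}^{-1/2} x_i and a final trace standardization):

```python
import numpy as np

def sqrtm_pds(V):
    """Symmetric square root of a PDS matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(V)
    return (U * np.sqrt(w)) @ U.T

def k_step_sign(X, V0, k=1):
    """k-step sign estimator: k symmetrised-sign steps starting from a
    root-n consistent shape estimate V0, then scaled so that Tr = p."""
    n, p = X.shape
    V = V0.copy()
    iu = np.triu_indices(n, k=1)            # index pairs i < l
    for _ in range(k):
        Vh = sqrtm_pds(V)
        Z = X @ np.linalg.inv(Vh)           # z_i = V^{-1/2} x_i (Vh symmetric)
        D = (Z[:, None, :] - Z[None, :, :])[iu]
        S = D / np.linalg.norm(D, axis=1, keepdims=True)
        V = Vh @ (p * (S.T @ S) / len(S)) @ Vh
    return p * V / np.trace(V)              # standardize Tr(V_k) = p
```

With V0 the identity (or Tyler's M-estimate) and a spherical sample, the result should be close to I_p.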

Definition 2. Let V_0 be some √n-consistent shape matrix estimate, which is used as a starting value. The k-step rank estimate V_k is then computed by iterating

V_j = V_{j-1}^{1/2} [ ave_i { R(z_i) R^T(z_i) } ] V_{j-1}^{1/2},  z_i = V_{j-1}^{-1/2} x_i,  j = 1, . . . , k,

and then standardized so that Tr(V_k) = p.
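The k-step rank estimate admits the same kind of sketch (again our own code, assuming the step V_j = V_{j-1}^{1/2} ave_i{R(z_i)R^T(z_i)} V_{j-1}^{1/2}); no standardizing constant is needed inside the loop because of the final trace scaling:

```python
import numpy as np

def sqrtm_pds(V):
    """Symmetric square root of a PDS matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(V)
    return (U * np.sqrt(w)) @ U.T

def k_step_rank(X, V0, k=1):
    """k-step rank estimator: k rank-covariance steps starting from a
    root-n consistent shape estimate V0, then scaled so that Tr = p."""
    n, p = X.shape
    V = V0.copy()
    for _ in range(k):
        Vh = sqrtm_pds(V)
        Z = X @ np.linalg.inv(Vh)                    # z_i = V^{-1/2} x_i
        D = Z[:, None, :] - Z[None, :, :]
        nrm = np.linalg.norm(D, axis=2, keepdims=True)
        S = np.divide(D, nrm, out=np.zeros_like(D), where=nrm > 0)
        R = S.mean(axis=1)                           # centered spatial ranks
        V = Vh @ (R.T @ R / n) @ Vh
    return p * V / np.trace(V)                       # standardize Tr(V_k) = p
```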
To compute the influence functions of the k-step estimators, we need to define the corresponding functionals. Write V_k = V_k(x), where x ∼ F, for the functional corresponding to the k-step sign estimate. Then V_k is given by the implicit equation

V_k = V_{k-1}^{1/2} [ p E{ S(z_1 − z_2) S^T(z_1 − z_2) } ] V_{k-1}^{1/2},    (9)

where z_i = V_{k-1}^{-1/2} x_i and the x_i's are independent and distributed according to F.
For the k-step rank estimator, the functional corresponding to the estimator is given by

V_k = p V_{k-1}^{1/2} E{ R_F(z_1) R_F^T(z_1) } V_{k-1}^{1/2} / Tr( V_{k-1}^{1/2} E{ R_F(z_1) R_F^T(z_1) } V_{k-1}^{1/2} ),    (10)

where again z_i = V_{k-1}^{-1/2} x_i and the x_i's are independently distributed according to F. Note that, to guarantee that Tr(V_k) = p, (10) now includes a scaling term.
The above k-step estimators now inherit some important properties of the initial estimator. The estimators are naturally affine equivariant shape estimators and satisfy (3), as the initial estimator V_0 is affine equivariant. The independence property of the k-step sign estimators is also inherited from the initial estimator, as stated in the following theorem. It is still unclear whether the rank estimators have the same property.
Theorem 1. Assume that x is a random p-vector with independent components and V 0 (x) has the independence property, that is, V 0 (x) is diagonal. Then the k-step sign functional has the independence property.
In the following sections, we examine the robustness and efficiency properties of the k-step estimators. It is shown that when, for example, Tyler's M-estimate is used as a starting value, even the resulting 1-step estimators are highly robust and efficient shape matrix estimators.

Influence functions
The robustness of a functional T against a single outlier x can be measured using the influence function (Hampel et al., 1986). Let

F_ε = (1 − ε) F + ε ∆_x

denote the contaminated distribution, where ∆_x is the cdf of the distribution with probability mass one at the point x. The influence function of T is then given by

IF(x; T, F) = lim_{ε→0} [ T(F_ε) − T(F) ] / ε.

Hampel et al. (1986) showed that, for any scatter functional C(F), the influence function of C at a spherical F_0, symmetric around the origin and with C(F_0) = I_p, is given by

IF(x; C, F_0) = α_C(r) u u^T + β_C(r) I_p,

where r = ||x|| and u = ||x||^{-1} x, and the weight functions α_C and β_C depend on the functional and the distribution F_0. For shape functionals V(F), the influence function reduces to

IF(x; V, F_0) = α_V(r) ( u u^T − p^{-1} I_p ),    (11)

where α_V is again a weight function that contains all the information on the influence of x on the estimator. For robust estimators, α_V is continuous and bounded, and for comparisons between different shape estimators we only need to compare the functions α_V. The influence function of the k-step sign functional defined in (9) is given by the following theorem.
Theorem 2. At spherical F_0, the influence function of the k-step sign functional V_k with initial shape matrix V_0 is given by (11) with a weight function α_{V_k}, where e_1 = (1, 0, . . . , 0)^T, r = ||x||, and x and x_1 are independent and spherically distributed.
Note that when k → ∞, the weight function reduces to α_V(r) = 2(p + 2)(1 − p g(r)), that is, to the weight function of the regular Dümbgen shape matrix (Sirkiä et al., 2009). For the k-step rank functional defined in (10), the following result holds.

Theorem 3. At spherical F_0, the influence function of the k-step rank functional V_k with initial shape matrix V_0 is given by (11) with a weight function α_{V_k}, where e_1 = (1, 0, . . . , 0)^T, r = ||x||, and x, x_2 and x_3 are independent and spherically distributed.
When k → ∞, the influence function of the regular rank covariance functional is obtained. In that case, α_V(r) = (p + 2)(g_1(r) − g_2(r))/c_F^2. Figure 1 illustrates the functions α_{V_k} in the bivariate standard normal case for the k-step sign and rank estimators with k = 1, 2, 3 and ∞, when Tyler's M-estimator with α_{V_0} = p + 2 (first row) or the sample covariance matrix with α_{V_0} = r^2 (second row) is used as a starting value.
As seen in the figure, when k = 3 and Tyler's M-estimator is used as a starting value, the functions α_{V_k} are already very similar to those obtained as k → ∞. For all estimators, the functions α_{V_k} are clearly continuous. The influence functions are also bounded, as stated in the following corollary. When the sample covariance matrix is used as an initial estimator, the influence functions of the k-step estimators are naturally unbounded. However, as k increases, the resulting estimators are clearly more robust than the sample covariance matrix, as outlying observations have much less influence on them.
Corollary 1. For k-step sign and rank estimators, the functions α V k are bounded when α V 0 is bounded.

Asymptotic distributions and efficiencies
In what follows, we denote

C_{p,p}(V) = (I_{p^2} + K_{p,p})(V ⊗ V) − (2/p) vec(V) vec^T(V),

where K_{p,p} is the commutation matrix, that is, the p^2 × p^2 block matrix whose (i, j)-block is equal to the p × p matrix that has 1 at entry (j, i) and zeros elsewhere. By vec(V) we mean the vectorization of a matrix V, obtained by stacking the columns of V on top of each other. If V = I_p, then

C_{p,p}(I_p) = I_{p^2} + K_{p,p} − (2/p) vec(I_p) vec^T(I_p).

The limiting distributions of the k-step sign and rank estimators are given by the following theorem.
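These matrix quantities are easy to construct explicitly. The sketch below assumes the form C_{p,p}(V) = (I_{p^2} + K_{p,p})(V ⊗ V) − (2/p) vec(V) vec^T(V) stated above; the key property K_{p,p} vec(A) = vec(A^T) is checked directly:

```python
import numpy as np

def commutation(p):
    """Commutation matrix K_{p,p}: its (i, j)-block is the p x p matrix
    with a 1 at entry (j, i) and zeros elsewhere; K vec(A) = vec(A^T)."""
    K = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            K[i * p + j, j * p + i] = 1.0
    return K

def C_pp(V):
    """C_{p,p}(V) = (I + K_{p,p})(V kron V) - (2/p) vec(V) vec(V)^T."""
    p = V.shape[0]
    v = V.reshape(-1, 1, order='F')        # vec(V): stack the columns
    return (np.eye(p * p) + commutation(p)) @ np.kron(V, V) \
        - (2.0 / p) * (v @ v.T)
```

Note that C_{p,p}(I_p) annihilates vec(I_p), consistent with the trace normalization Tr(V_k) = p: the diagonal directions carry no asymptotic variance in the trace direction.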
Theorem 4. Let F_0 be a spherical distribution. Assume that the initial estimate V_0 is √n-consistent and that √n vec(V_0 − I_p) has a limiting multinormal distribution. Then the limiting distribution of √n vec(V_k − I_p) is also multinormal with mean 0 and asymptotic covariance matrix τ C_{p,p}(I_p), where

τ = E[α_{V_k}^2(r)] / (p(p + 2))

denotes the asymptotic variance of any off-diagonal element of V_k and the functions α_{V_k}(r) are defined in Theorems 2 and 3.
The limiting distributions in the elliptical case can be derived using the affine equivariance properties of the estimators.
Corollary 2. Let F be an elliptical distribution with population shape parameter V. Then the limiting distribution of √n vec(V_k − V) is multivariate normal with mean 0 and asymptotic covariance matrix τ C_{p,p}(V), where τ is given in Theorem 4.
Note that only the scalar τ is needed to characterize the limiting distribution of a shape estimator. Therefore, in order to compare the asymptotic efficiencies of different estimators, one only has to compare these scalars.
In Table 1, we list the asymptotic relative efficiencies of the k-step sign and rank estimators with k = 1, 2, 3 and ∞ with respect to the regular shape estimator. By the regular shape estimator, we mean the estimator obtained by standardizing the sample covariance matrix. The efficiencies are computed at p-variate t-distributions with selected values of the dimension p and degrees of freedom ν, where ν = ∞ refers to the multivariate normal case. The asymptotic variance of an off-diagonal element of the regular shape estimator at the t-distribution is (ν − 2)/(ν − 4). In the table, the notation k = 0 refers to Tyler's M-estimator, which is used as the starting value; the asymptotic variance of its off-diagonal elements is (p + 2)/p. When k = ∞, the efficiencies of the regular Dümbgen estimator and the rank covariance estimator are listed.
As seen in Table 1, when p is small, taking just one step in the estimation procedure yields a significant increase in efficiency as compared to the initial Tyler's M-estimator (k = 0). After three steps, the efficiencies of the k-step estimators are already very close to those of Dümbgen's estimator or the rank covariance estimator (k = ∞). When p increases, there is little difference between the efficiencies of the different k-step estimators. In fact, for high-dimensional data, the 1-step sign estimator seems to provide a robust and highly efficient alternative to the regular shape estimator.
In the case of multivariate normal data, the limiting efficiencies of the sign and rank covariance estimators are very close to each other. When high-dimensional heavy-tailed data are encountered, the sign estimators seem to outperform the rank estimators. In other cases, the rank estimators are slightly more efficient than the sign estimators.

Simulation studies
In this section, we use simple simulation studies to compare the finite-sample efficiencies of different k-step shape estimators with those of the regular shape estimator. M = 2500 samples of sizes n = 50, 100 and 200 and dimensions p = 3 and 5 were drawn from t-distributions with ν = 5, 8 and ∞ degrees of freedom, where ν = ∞ corresponds to multinormal samples. For every estimator and distribution, the mean squared errors

MSE = (1/M) Σ_{m=1}^{M} n ( V_{k,ij}^{(m)} )^2

were computed,

Table 1: AREs of the k-step sign and rank (in brackets) estimators as compared to the regular shape estimator at different p-variate t-distributions with selected values of the dimension p and degrees of freedom ν. k = 0 refers to Tyler's M-estimator that is used as the starting value; k = ∞ refers to Dümbgen's estimator and the regular rank estimator.
where V_{k,ij}^{(m)}, with i ≠ j, denotes an off-diagonal element of the corresponding k-step estimate computed from the m-th generated sample. The estimated finite-sample efficiencies were then computed as ratios of the simulated MSEs and are listed in Tables 2 and 3 for p = 3 and p = 5, respectively. The corresponding asymptotic relative efficiencies (denoted by n = ∞) from Table 1 are also listed for easy reference.
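The simulation scheme is straightforward to reproduce. A reduced sketch for the regular (covariance-based) shape estimator only, with a smaller M than in the paper: at the spherical normal the true off-diagonal value is 0 and the simulated MSE should be near the asymptotic variance of 1 quoted above for ν = ∞:

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, M = 3, 100, 500

mse = 0.0
for _ in range(M):
    X = rng.standard_normal((n, p))     # spherical normal: true shape is I_p
    C = np.cov(X, rowvar=False)
    V = p * C / np.trace(C)             # regular shape estimate, Tr(V) = p
    mse += n * V[0, 1] ** 2             # true off-diagonal element is 0
mse /= M
```

Replacing the covariance step by a k-step sign or rank estimate and taking the ratio of the two simulated MSEs gives the finite-sample efficiencies of Tables 2 and 3.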
The results in Tables 2 and 3 show that when the samples are generated from multinormal distributions, the finite-sample and limiting efficiencies are almost the same even when n = 50. When sampling is done from the heavy-tailed distributions, the convergence to the limiting efficiency is clear but much slower. Especially for small sample sizes the loss in efficiency is remarkable, but also for n = 200 the efficiencies are still far from the asymptotic ones.
In Figure 2, examples of the average computation times (in seconds) of different k-step sign (left column) and rank (right column) estimators are illustrated as functions of n for p = 2, 3 and 5. The average was taken over 100 trials. Tyler's M-estimator was again used as the initial estimator. The estimators were computed on an ordinary workstation using C functions called from R, similar to the implementations found in the R-package SpatialNP (Sirkiä et al., 2008).
Since these estimators operate on pairs of observations instead of the observations themselves, it is clear that in relative terms they are much more time-consuming than, for example, the ordinary covariance matrix. However, as shown in Figure 2, in absolute terms the increase is not unbearable, at least in a simple application. As the number of dimensions increases, the time consumption of the sign estimator grows faster as a function of n than that of the rank estimator. This is because the computation of an outer product becomes heavier in higher dimensions, and the number of outer products to be computed is of order n^2 for the sign estimator but only n for the rank estimator.
The use of pairs also implies that the increase in time consumption as a function of the sample size is essentially quadratic. This, together with the iterative nature of the original estimators, can indeed lead to rather long computing times, but it should be kept in mind that in practice the iteration is just a repetition of the step until some convergence criterion is met. This criterion is basically a small number which is usually chosen somewhat arbitrarily, but it has a severe impact on how many times the iteration step is repeated. On the other hand, it has been previously shown that even with a very modest number of steps the properties of the k-step estimators are already close to those of the (theoretical) limit. Using the k-step versions removes the need for choosing the convergence criterion and, as can be seen from the figure, can greatly reduce the total time consumption.

Table 2: Finite-sample efficiencies of the k-step sign (left column) and rank (right column) estimators as compared to the regular shape estimator at different 3-variate t-distributions with selected values of ν.

Table 3: Finite-sample efficiencies of the k-step sign (left column) and rank (right column) estimators as compared to the regular shape estimator at different 5-variate t-distributions with selected values of ν.

Conclusions
In this paper, we proposed so called k-step versions of the shape estimators introduced by Dümbgen (1998) and Oja and Randles (2004). The motivation for this study arose mainly from the fact that at the moment there is no proof for the existence and uniqueness of the estimator based on spatial rank vectors (Oja and Randles, 2004). From the practical point of view, the long computation times of the estimators may also cause problems in some applications.
The estimators proposed in this paper appear to be very practical, with known asymptotic properties. We showed that the k-step estimators inherit some important properties of the initial estimators, namely affine equivariance, robustness and limiting normality. The k-step sign estimator was also shown to have the important independence property if the initial estimator has it. This property is needed, for example, if the independent component analysis problem is solved using the technique utilizing two scatter (or shape) matrices. Note that in our robustness and simulation studies we used as a starting value Tyler's M-estimator (Tyler, 1987), which does not possess the independence property. A simple example of a robust initial estimator with the independence property is obtained when Tyler's M-estimator is computed using the n/2 pairwise differences x_{2i} − x_{2i−1}, i = 1, . . . , n/2.
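The suggestion above is easy to implement: apply the standard fixed-point algorithm of Tyler (1987) to the n/2 differences x_{2i} − x_{2i−1}. A sketch (our own code; the symmetric-root form of the update below is algebraically identical to Tyler's usual iteration):

```python
import numpy as np

def tyler_on_differences(X, n_iter=100):
    """Tyler's M-estimate computed from the n/2 differences x_{2i} - x_{2i-1}:
    a location-free starting value that inherits the independence property."""
    D = X[1::2] - X[0::2]                  # pairwise differences
    m, p = D.shape
    V = np.eye(p)
    for _ in range(n_iter):
        w, U = np.linalg.eigh(V)
        Vih = (U / np.sqrt(w)) @ U.T       # V^{-1/2}
        Vh = (U * np.sqrt(w)) @ U.T        # V^{1/2}
        Z = D @ Vih                        # standardized differences
        S = Z / np.linalg.norm(Z, axis=1, keepdims=True)
        V = Vh @ (p * (S.T @ S) / m) @ Vh  # Tyler fixed-point step
        V = p * V / np.trace(V)            # keep Tr(V) = p throughout
    return V
```

For data with independent components, the resulting shape estimate is (approximately) diagonal, as required of an initial estimator for Theorem 1.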
In the simulation studies of Section 5 we showed that by taking just a few steps in the estimation procedure, robust and highly efficient alternatives to the traditional spatial sign and rank covariance estimators are obtained. The computation times of these estimators are very low as compared to those of Dümbgen's estimator and the spatial rank covariance matrix. Programs are available in the R-package SpatialNP.
Proof of Theorem 3. The functional (10) may alternatively be written in the form (12). Then, proceeding as in the proof of Theorem 2, that is, applying the functional (12) to F_ε = (1 − ε)F_0 + ε∆_x and taking the derivative with respect to ε at ε = 0, we obtain (13), where u_12, u_13, u_1x, u_x2 and u_x3 are defined as in the previous proof. Next we use the results given in the proof of Lemma 4 in Sirkiä et al. (2009). The terms on the second line of (13) include Tr(IF(x; V_{k−1}, F_0)) and are therefore equal to zero. The remaining expectations reduce to expressions in the functions g_1(r) and g_2(r), where c_F^2 = E_F[q_F^2(r)] and q_F(r) is defined in (4); the functions g_1(r) and g_2(r) are given in the proof of Lemma 3 in Sirkiä et al. (2009). The result then follows by proceeding as in the proof of Theorem 2.
Proof of Corollary 1. Note that g(r) is clearly finite, since 0 ≤ (x_1 − re_1)_2^2 ≤ ||x_1 − re_1||^2. By a similar argument, it is seen that g_1(r) − g_2(r) in the influence function of the k-step rank estimator is finite. The corresponding influence functions are therefore bounded when α_{V_0} is bounded.
Proof of Theorem 4. Consider first the limiting distribution of the 1-step sign estimator; the estimator is scaled so that Tr(V_1) = p. Let now x_1, . . . , x_n be a sample from a spherically symmetric distribution and write x_ij = x_i − x_j, r_ij = ||x_ij|| and u_ij = r_ij^{-1} x_ij, for simplicity. Further, write V_0^* = √n(V_0 − I_p) and note that, since V_0 is √n-consistent, V_0^* is bounded in probability. Now,

[x_ij^T V_0^{-1} x_ij]^{-1} = ||x_ij||^{-2} ( 1 + n^{-1/2} u_ij^T V_0^* u_ij + o_p(n^{-1/2}) ).
The result then follows by collecting the above expansions (omitting the o_p(n^{-1/2}) terms).

Proof of Corollary 2. Let X = {x_1, . . . , x_n} denote a random sample from a spherical distribution and write V_k(X) for the estimator computed on the sample. Now, due to the affine equivariance of V_k, in the elliptical case

V_k(V^{1/2} X) = p V^{1/2} V_k(X) V^{1/2} / Tr( V^{1/2} V_k(X) V^{1/2} ).

Then, using some simple matrix algebra, we get

√n vec( V_k(V^{1/2} X) − V ) = W (V^{1/2} ⊗ V^{1/2}) √n vec( V_k(X) − I_p ) + o_p(1),

where W = I_{p^2} − p^{-1} vec(V) vec^T(I_p). By Slutsky's theorem, √n vec(V_k(V^{1/2} X) − V) now has a limiting normal distribution with asymptotic covariance matrix

τ W (V^{1/2} ⊗ V^{1/2}) C_{p,p}(I_p) (V^{1/2} ⊗ V^{1/2}) W^T.

Finally, note that W [2p^{-1} vec(V) vec^T(V)] W^T = 0; the result thus follows.