Rank scores tests of multivariate independence

New rank scores test statistics are proposed for testing whether two random vectors are independent. The tests are asymptotically distribution-free for elliptically symmetric marginal distributions. Recently, Gieser and Randles (1997), Taskinen, Kankainen and Oja (2003) and Taskinen, Oja and Randles (2005) introduced and discussed different multivariate extensions of the quadrant test, Kendall's tau and Spearman's rho statistics. In this paper, standardized multivariate spatial signs and the (univariate) ranks of the Mahalanobis-type distances of the observations from the origin are combined to construct rank scores tests of independence. The limiting distributions of the test statistics are derived under the null hypothesis as well as under contiguous sequences of alternatives. Three different choices of the score functions, namely the sign scores, the Wilcoxon scores and the van der Waerden scores, are discussed in greater detail. The small sample and limiting efficiencies of the test procedures are compared and the robustness properties are illustrated by an example. It is remarkable that, in the multinormal case, the limiting Pitman efficiency of the van der Waerden scores test equals that of the classical parametric Wilks' test.


Introduction
Puri and Sen (1971) introduced a nonparametric analogue to Wilks' test where the data vectors are replaced by the vectors of their componentwise ranks. Gieser and Randles (1997) and Taskinen et al. (2003) proposed invariant extensions of the univariate quadrant test of Blomqvist (1950). The former test procedure is based on interdirection counts and the latter on standardized spatial signs. If the marginal distributions of $x_i^{(1)}$ and $x_i^{(2)}$ are elliptic, these two tests are asymptotically equivalent. Later Taskinen et al. (2005) proposed multivariate invariant extensions of Kendall's tau and Spearman's rho.
Our plan is as follows. In Section 2, we explain the test constructions starting with standardized spatial signs and ranks of the lengths of the standardized vectors. The test statistics for multivariate dependence are then introduced. Special choices of the score functions then yield the sign test, the Wilcoxon scores test and the van der Waerden scores test. In Section 3, the limiting distribution of the test statistic is derived under the null hypothesis and under interesting sequences of contiguous alternatives. The finite-sample and limiting efficiencies of the new procedures are then compared to those of the classical Wilks' test in Section 4, and the robustness properties are illustrated by an example in the final Section 5. The proofs are postponed to Appendix I.

The rank scores test statistics
2.1. Spatial signs and ranks of the distances from the origin

Consider a random sample $x_1, \ldots, x_n$ from a k-variate distribution. The spatial sign of a vector $x$ is defined as

$$S(x) = \begin{cases} \|x\|^{-1} x, & x \neq 0, \\ 0, & x = 0, \end{cases}$$

where $\|x\| = (x^T x)^{1/2}$ is the (Euclidean) length of the vector $x$. The spatial signs $S(x_i)$ and the ranks $\mathrm{rank}(\|x_i\|)$ of the distances from the origin are not, however, invariant under affine transformations of the data vectors. In order to construct invariant test statistics, the data points have to be standardized before spatial signs and ranks are formed. For the standardization we need affine equivariant $\sqrt{n}$-consistent location vector and scatter matrix estimates, $\hat\mu$ and $\hat C$. The transformed data points are then given as

$$z_i = \hat C^{-1/2}(x_i - \hat\mu), \quad i = 1, \ldots, n.$$

The vectors $u_i = S(z_i)$, $i = 1, \ldots, n$, are called standardized spatial sign vectors. Standardized sign vectors are affine invariant in the sense that if $u_i^*$ are calculated from $x_i^* = A x_i + b$, $i = 1, \ldots, n$, with a nonsingular $k \times k$ matrix $A$ and a k-vector $b$, then $u_i^* = P u_i$, $i = 1, \ldots, n$, for some orthogonal $P$. See e.g. Taskinen et al. (2005). The ranks $R_i = \mathrm{rank}(\|z_i\|)$ are naturally affine invariant (in the usual sense).

Note that, in the standardization, the scatter matrix estimate $\hat C$ may be replaced by a $\sqrt{n}$-consistent affine equivariant shape matrix estimate $\hat V$, as only the directions and the ranks of the distances are used in the analysis. For the shape matrices, see Ollila et al. (2003). Note also that, if the standardization is done using location vector and scatter (or shape) matrix estimates that do not require any moment assumptions on the underlying data (e.g. Tyler's shape matrix and the transformation retransformation spatial median in Hettmansperger and Randles, 2002), then the resulting test procedures are valid without any moment assumptions.
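As an illustration, the standardization described above may be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the sample mean and sample covariance matrix are used as the location and scatter estimates (these do require second moments, unlike the estimates recommended in the text), the dimension is at least two, and the function names `spatial_sign`, `inv_sqrt` and `standardized_signs_and_ranks` are hypothetical.

```python
import numpy as np


def spatial_sign(x):
    """Row-wise spatial sign S(x) = x / ||x||, with S(0) = 0."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    safe = np.where(norms > 0, norms, 1.0)  # avoid division by zero
    return np.where(norms > 0, x / safe, 0.0)


def inv_sqrt(c):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def standardized_signs_and_ranks(x):
    """Return (u, R): standardized sign vectors u_i = S(z_i) and ranks of ||z_i||.

    Assumption: location and scatter are the sample mean and sample covariance.
    """
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=0)
    c = np.cov(x, rowvar=False)
    z = (x - mu) @ inv_sqrt(c)             # rows are z_i = C^{-1/2} (x_i - mu)
    r = np.linalg.norm(z, axis=1)
    ranks = np.argsort(np.argsort(r)) + 1  # ranks 1..n (no ties for continuous data)
    return spatial_sign(z), ranks
```

With the symmetric square root used here, an affine transformation of the data changes the sign vectors only by a common orthogonal matrix, so the inner products $u_i^T u_j$ and the ranks $R_i$ are unchanged.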

New test statistics
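Our test statistic for testing the null hypothesis of independence is obtained as follows. For $i = 1, \ldots, n$, write $u_i^{(1)}$ for the p-dimensional standardized sign vectors based on $x_i^{(1)}$, let $r_i^{(1)} = \|z_i^{(1)}\|$ be the lengths of the standardized vectors, and let $R_i^{(1)}$ be the rank of $r_i^{(1)}$ among $r_1^{(1)}, \ldots, r_n^{(1)}$. For the second random vector, write similarly $u_i^{(2)}$ for the q-dimensional standardized sign vectors based on $x_i^{(2)}$ and let $r_i^{(2)}$ and $R_i^{(2)}$ be constructed as before. The test statistic is then as follows.

Definition 2.1. The rank test statistic for testing $H_0$ is

$$T_n = \frac{npq}{\sigma_a^2 \sigma_b^2} \left\| \mathrm{ave}\left\{ a\!\left(\frac{R_i^{(1)}}{n+1}\right) b\!\left(\frac{R_i^{(2)}}{n+1}\right) u_i^{(1)} u_i^{(2)T} \right\} \right\|^2,$$

where $a$ and $b$ are score functions, $\|A\|^2 = \mathrm{tr}(A^T A)$, and $\sigma_a^2 = E[a^2(U)]$ and $\sigma_b^2 = E[b^2(U)]$ with $U$ uniformly distributed on (0, 1).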
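Note that since standardized sign vectors and ranks are invariant with respect to the group of affine transformations, the invariance of $T_n$ easily follows. As score functions, one may use optimal location score functions; see Hallin and Paindaveine (2002). In the following, some choices of the score functions and the resulting test statistics are given.

Definition 2.2. For $a(u) = 1$ and $b(u) = 1$, the sign test of independence (Taskinen et al., 2003) with test statistic

$$T_{0n} = npq \left\| \mathrm{ave}\left\{ u_i^{(1)} u_i^{(2)T} \right\} \right\|^2$$

is obtained. For $a(u) = u$ and $b(u) = u$, one gets the Wilcoxon (scores) test of independence with test statistic

$$T_{1n} = \frac{9npq}{(n+1)^4} \left\| \mathrm{ave}\left\{ R_i^{(1)} R_i^{(2)} u_i^{(1)} u_i^{(2)T} \right\} \right\|^2.$$

Finally, the score functions $a(u) = (\Psi_p^{-1}(u))^{1/2}$ and $b(u) = (\Psi_q^{-1}(u))^{1/2}$, where $\Psi_k$ is the cdf of the chi-square distribution with k degrees of freedom, yield the van der Waerden (scores) test of independence with test statistic

$$T_{2n} = n \left\| \mathrm{ave}\left\{ \left[\Psi_p^{-1}\!\left(\frac{R_i^{(1)}}{n+1}\right)\right]^{1/2} \left[\Psi_q^{-1}\!\left(\frac{R_i^{(2)}}{n+1}\right)\right]^{1/2} u_i^{(1)} u_i^{(2)T} \right\} \right\|^2.$$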
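The three tests may be sketched in Python as follows. The sketch is not a definitive implementation: it assumes that the statistic is $npq/(\sigma_a^2\sigma_b^2)$ times the squared Frobenius norm of the average of $a(R_i^{(1)}/(n+1))\, b(R_i^{(2)}/(n+1))\, u_i^{(1)} u_i^{(2)T}$, it standardizes with the sample mean and sample covariance matrix (a simplifying assumption requiring second moments), and the names `standardize` and `rank_scores_test` are hypothetical. The p-value is based on the chi-square approximation with pq degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2


def standardize(x):
    """Standardized sign vectors and ranks of distances.

    Assumption: sample mean and sample covariance are used for standardization.
    """
    x = np.asarray(x, dtype=float)
    vals, vecs = np.linalg.eigh(np.cov(x, rowvar=False))
    inv_root = vecs @ np.diag(vals ** -0.5) @ vecs.T  # symmetric C^{-1/2}
    z = (x - x.mean(axis=0)) @ inv_root
    r = np.linalg.norm(z, axis=1)                     # assumed nonzero
    ranks = np.argsort(np.argsort(r)) + 1
    return z / r[:, None], ranks


def rank_scores_test(x1, x2, scores="vdw"):
    """Return (statistic, p-value) for 'sign', 'wilcoxon' or 'vdw' scores."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n, p = x1.shape
    q = x2.shape[1]
    u1, rank1 = standardize(x1)
    u2, rank2 = standardize(x2)
    v1, v2 = rank1 / (n + 1), rank2 / (n + 1)
    if scores == "sign":          # a(u) = b(u) = 1, E[a^2(U)] = 1
        a, b, var_a, var_b = np.ones(n), np.ones(n), 1.0, 1.0
    elif scores == "wilcoxon":    # a(u) = b(u) = u, E[U^2] = 1/3
        a, b, var_a, var_b = v1, v2, 1.0 / 3.0, 1.0 / 3.0
    elif scores == "vdw":         # a(u) = sqrt(chi2_p quantile), E[a^2(U)] = p
        a, b = np.sqrt(chi2.ppf(v1, p)), np.sqrt(chi2.ppf(v2, q))
        var_a, var_b = float(p), float(q)
    else:
        raise ValueError("scores must be 'sign', 'wilcoxon' or 'vdw'")
    # p x q matrix ave{ a_i b_i u1_i u2_i^T }
    h = (u1 * (a * b)[:, None]).T @ u2 / n
    stat = n * p * q * np.sum(h ** 2) / (var_a * var_b)
    return float(stat), float(chi2.sf(stat, p * q))
```

For example, `rank_scores_test(x1, x2, "wilcoxon")` returns the Wilcoxon scores statistic and its approximate p-value for two data matrices with n rows each.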

Limiting distributions
In order to derive the limiting distribution of $T_n$, we assume that the marginal distributions of $x^{(1)}$ and $x^{(2)}$ are elliptically symmetric. The marginal density functions are then of the form

$$f(x) = |\Sigma|^{-1/2} f_0\left(\Sigma^{-1/2}(x - \mu)\right),$$

where $\Sigma$ is a positive definite symmetric matrix and $f_0(z) = \exp\{-\rho(\|z\|)\}$ with $z = \Sigma^{-1/2}(x - \mu)$. Note that if $r = \|z\|$ and $u = z/r$, then $r$ and $u$ are independent. In the following we denote the cdf of $r^{(1)}$ by $G_1$ and the cdf of $r^{(2)}$ by $G_2$.
To establish a limiting distribution of our test statistic under the null hypothesis, we need the following lemma.
Now the limiting distribution can be found easily.
Theorem 3.2. Under $H_0$ and for elliptically distributed $x^{(1)}$ and $x^{(2)}$, the limiting distribution of $T_n$ is a chi-square distribution with pq degrees of freedom.
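Next we derive the limiting distribution of $T_n$ under alternative sequences similar to those used in Gieser and Randles (1997). As $T_n$ is affine invariant, we restrict to the spherical case only; see Appendix II for a discussion of the alternative sequences. Let thus $x_i^{(1)}$ and $x_i^{(2)}$ be independent with spherical marginal densities $\exp\{-\rho_1(\|x^{(1)}\|)\}$ and $\exp\{-\rho_2(\|x^{(2)}\|)\}$, respectively, and write

$$\begin{pmatrix} y_i^{(1)} \\ y_i^{(2)} \end{pmatrix} = \begin{pmatrix} (1-\Delta) I_p & \Delta M_1 \\ \Delta M_2 & (1-\Delta) I_q \end{pmatrix} \begin{pmatrix} x_i^{(1)} \\ x_i^{(2)} \end{pmatrix}, \qquad (3.1)$$

where $\Delta = \delta/\sqrt{n}$. If $T_n^*$ is calculated from the transformed observations in (3.1), we get the following result.

Theorem 3.3. Under general assumptions (stated in the Appendix), the limiting distribution of $T_n^*$ is a noncentral chi-square distribution with pq degrees of freedom and noncentrality parameter

$$\frac{\delta^2}{pq\,\sigma_a^2 \sigma_b^2} \left\| c_1 M_1 + c_2 M_2^T \right\|^2,$$

where

$$c_1 = E[a(G_1(r_i^{(1)}))\,\psi_1(r_i^{(1)})]\; E[b(G_2(r_i^{(2)}))\, r_i^{(2)}] \quad \text{and} \quad c_2 = E[a(G_1(r_i^{(1)}))\, r_i^{(1)}]\; E[b(G_2(r_i^{(2)}))\,\psi_2(r_i^{(2)})],$$

with optimal location score functions $\psi_1(r) = \rho_1'(r)$ and $\psi_2(r) = \rho_2'(r)$.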
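The alternative sequences can be illustrated with a short Python sketch. It assumes that the transformation referred to as (3.1) has the form $y^{(1)} = (1-\Delta)x^{(1)} + \Delta M_1 x^{(2)}$ and $y^{(2)} = \Delta M_2 x^{(1)} + (1-\Delta)x^{(2)}$ with $\Delta = \delta/\sqrt{n}$; the function name `make_dependent` and the default choice of a (rectangular) identity matrix for $M_1$ with $M_2 = M_1^T$ are hypothetical conveniences, not part of the paper.

```python
import numpy as np


def make_dependent(x1, x2, delta, m1=None, m2=None):
    """Introduce dependence between independent samples x1 (n x p) and x2 (n x q).

    Assumed form of the transformation, with D = delta / sqrt(n):
        y1 = (1 - D) x1 + D M1 x2,   y2 = D M2 x1 + (1 - D) x2.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n, p = x1.shape
    q = x2.shape[1]
    if m1 is None:
        m1 = np.eye(p, q)      # hypothetical default; equals I when p == q
    if m2 is None:
        m2 = np.asarray(m1).T  # the simplifying choice M1 = M2^T
    d = delta / np.sqrt(n)
    # Observations are stored as rows, hence the transposed matrices.
    y1 = (1.0 - d) * x1 + d * x2 @ np.asarray(m1).T
    y2 = d * x1 @ np.asarray(m2).T + (1.0 - d) * x2
    return y1, y2
```

With `delta = 0` the samples are returned unchanged, which corresponds to the null hypothesis of independence.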

Limiting Pitman efficiencies
In this section we consider the sign, Wilcoxon and van der Waerden tests of independence, $T_{0n}$, $T_{1n}$ and $T_{2n}$. We compare the limiting and finite-sample efficiencies of the new tests to those of Wilks' likelihood ratio test $W_n$. The comparisons are made in the multivariate normal distribution, t distribution and contaminated normal distribution cases. Since $-n \log W_n$ has, under the alternative sequences, a limiting noncentral chi-square distribution with pq degrees of freedom, the asymptotic relative efficiencies are simply the ratios of the noncentrality parameters; for the new tests these depend on the constants $c_1$ and $c_2$ given in Theorem 3.3. Note that $\psi(r) = r$ for the multivariate normal distribution, $\psi(r) = (k + \nu) r/(\nu + r^2)$ for the k-variate t distribution with $\nu$ degrees of freedom, and, for the k-variate contaminated normal distribution with cdf $F(x) = (1-\epsilon)\Phi_k(x) + \epsilon\,\Phi_k(x/c)$,

$$\psi(r) = r\, \frac{(1-\epsilon)\exp(-r^2/2) + \epsilon\, c^{-(k+2)} \exp(-r^2/(2c^2))}{(1-\epsilon)\exp(-r^2/2) + \epsilon\, c^{-k} \exp(-r^2/(2c^2))}.$$

Assume now for simplicity that $M_1 = M_2^T$. For the limiting efficiency of the sign test of independence, we refer to Taskinen et al. (2003). The limiting efficiency of the Wilcoxon test $T_{1n}$ with respect to Wilks' test is obtained from Theorem 3.3 with $a(u) = b(u) = u$. The resulting efficiencies for t distributions with selected degrees of freedom and dimensions are listed in Table 1, and those for contaminated normal distributions with $\epsilon = 0.1$ and selected values of $c$ in Table 2. The efficiencies were derived using numerical integration.
Further, the limiting efficiency of the van der Waerden test $T_{2n}$ relative to $W_n$ is obtained in the same way from Theorem 3.3. For the multivariate normal distribution, ARE($T_{2n}$, $W_n$) = 1, and for the contaminated normal distribution the efficiency depends only on $\epsilon$ and $c$. These efficiencies do not depend on the dimensions at all. For the efficiencies at certain contaminated normal distributions, see Figure 1. The efficiencies for the t distribution with 5 degrees of freedom were derived using numerical integration and are listed in Table 3.

Some comments are in order. First of all, the limiting efficiencies of the Wilcoxon test $T_{1n}$ decrease with increasing dimension, while the efficiencies of the sign test $T_{0n}$ and the van der Waerden test $T_{2n}$ increase or stay constant. Due to this property, for low dimensions the efficiencies of $T_{1n}$ are higher than those of $T_{0n}$, but for high dimensions $T_{0n}$ outperforms $T_{1n}$. The van der Waerden scores test is the most efficient one in all considered cases. When the underlying distribution is multivariate normal, it is as efficient as Wilks' test. When the distribution becomes heavy-tailed, its efficiencies are higher than those of $T_{0n}$ and $T_{1n}$ (for the contaminated normal distribution with $\epsilon = 0.1$ and $c = 3$ and $c = 6$, the efficiencies of $T_{2n}$ are 1.254 and 1.891). For comparisons of limiting efficiencies, see also Figures 2 and 3.
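The unit efficiency of the van der Waerden scores test in the multinormal case can be made plausible by a small numerical check in Python. The sketch assumes that the van der Waerden score is the square root of the chi-square quantile function: for a spherical k-variate normal vector the squared radius is chi-square distributed with k degrees of freedom, so the score evaluated at the cdf of the radius returns the radius itself, which is the optimal location score $\psi(r) = r$. The function names `vdw_score` and `radius_cdf_normal` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def vdw_score(u, k):
    """Assumed van der Waerden score: square root of the chi-square(k) quantile."""
    return np.sqrt(chi2.ppf(u, k))


def radius_cdf_normal(r, k):
    """Cdf of r = ||z|| for a k-variate standard normal z (r^2 is chi-square(k))."""
    return chi2.cdf(np.asarray(r, dtype=float) ** 2, k)


r = np.linspace(0.1, 4.0, 40)
# Largest deviation between the score at G(r) and the optimal score psi(r) = r.
max_dev = {k: float(np.max(np.abs(vdw_score(radius_cdf_normal(r, k), k) - r)))
           for k in (2, 3, 8)}
```

The deviations collected in `max_dev` are at the level of numerical rounding error, in line with the van der Waerden scores reproducing the normal location scores.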

A simulation study
A simple simulation study was used to compare the finite-sample efficiencies of $W_n$, $T_{0n}$, $T_{1n}$ and $T_{2n}$. In total, 1500 independent $x^{(1)}$- and $x^{(2)}$-samples of sizes n = 50 and n = 200 were generated from a multivariate standard normal distribution, from a t distribution with 5 degrees of freedom and from a contaminated normal distribution with $\epsilon = 0.1$ and $c = 6$. The transformation in (3.1) with $M_1 = M_2^T = I$ was applied for chosen values of $\Delta = \delta/\sqrt{n}$ to introduce dependence into the model. The tests were applied using location and shape estimates chosen so that the standardized spatial sign vectors $\hat S_i$ of each sample satisfy $\mathrm{ave}\{\hat S_i\} = 0$ and $k\,\mathrm{ave}\{\hat S_i \hat S_i^T\} = I_k$ (with $k = p$ for the first sample and $k = q$ for the second), that is, the transformation retransformation spatial median and Tyler's M-estimate (Tyler, 1987; Hettmansperger and Randles, 2002). For the transformation retransformation technique, see also Chakraborty et al. (1998). The critical values used in the test constructions were based on the chi-square approximations to the null distributions.
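The shape estimate used in the standardization may be sketched in Python with the usual fixed-point iteration for Tyler's M-estimate. This is a simplified sketch and not the procedure used in the study: the location is taken as known (the origin by default) instead of being estimated jointly by the transformation retransformation spatial median, the shape matrix is normalized to have trace k, and the function name `tyler_shape` with its parameters `center`, `tol` and `max_iter` is hypothetical.

```python
import numpy as np


def tyler_shape(x, center=None, tol=1e-9, max_iter=500):
    """Fixed-point iteration for Tyler's shape matrix V, normalized to trace k.

    Assumption: the location is known (origin unless `center` is given).
    At the solution, k * ave{ s_i s_i^T } = I_k for s_i = S(V^{-1/2} y_i).
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    y = x - (np.zeros(k) if center is None else np.asarray(center, dtype=float))
    v = np.eye(k)
    for _ in range(max_iter):
        # Squared Mahalanobis-type distances y_i^T V^{-1} y_i (assumed nonzero).
        d2 = np.einsum("ij,jk,ik->i", y, np.linalg.inv(v), y)
        v_new = k * (y / d2[:, None]).T @ y / n
        v_new *= k / np.trace(v_new)  # fix the scale, which is not identified
        if np.linalg.norm(v_new - v) < tol:
            return v_new
        v = v_new
    return v
```

Only the directions of the centered observations enter the iteration, which is why no moment assumptions are needed for this estimate.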
In Figure 2, the empirical powers as well as the exact limiting powers (n = ∞) computed using Theorem 3.3 are given for p = q = 3. In the multivariate normal case $W_n$ is slightly better than $T_{1n}$ and $T_{2n}$ and much better than $T_{0n}$. In the t distribution case no big differences can be seen between the tests, and in the contaminated normal case $T_{1n}$ and $T_{2n}$ outperform $W_n$ and $T_{0n}$. In Figure 3, the empirical powers are illustrated for p = q = 8. In the multivariate normal case $T_{0n}$ and $T_{2n}$ are slightly more powerful than $T_{1n}$. In the considered t distribution case $T_{1n}$ performs poorly, but when the underlying distribution is contaminated normal, $T_{1n}$ performs very well. When p = q = 8, the sizes of $T_{0n}$ and $T_{2n}$ are often slightly below 0.05. The size of $T_{1n}$ is very close to 0.05 in all cases, and for heavy-tailed distributions the size of $W_n$ often exceeds 0.05.
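The limiting powers can be computed from the noncentral chi-square distribution once the noncentrality parameter is known. The following Python sketch takes the noncentrality parameter as an input rather than computing it from the score functions; the function name `limiting_power` and the default level 0.05 are assumptions made for illustration.

```python
from scipy.stats import chi2, ncx2


def limiting_power(noncentrality, p, q, alpha=0.05):
    """Limiting power of a level-alpha test whose statistic is asymptotically
    noncentral chi-square with p*q degrees of freedom."""
    df = p * q
    critical_value = chi2.ppf(1.0 - alpha, df)
    if noncentrality == 0:
        # Under the null hypothesis the power equals the size of the test.
        return float(chi2.sf(critical_value, df))
    return float(ncx2.sf(critical_value, df, noncentrality))
```

For a fixed number of degrees of freedom the limiting power is increasing in the noncentrality parameter, so comparing noncentrality parameters is equivalent to comparing limiting powers.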

A robustness study and final comments
Finally, a simple simulation study was used to illustrate the robustness of the test statistics proposed above. Independent $x^{(1)}$- and $x^{(2)}$-samples of size n = 30 were generated from a bivariate standard normal distribution, and the transformation in (3.1) with $M_1 = M_2^T = I_2$ was applied for chosen values of $\Delta$ to introduce "positive" dependence into the model. By positive dependence we mean that each $x^{(1)}$-coordinate is positively dependent on each $x^{(2)}$-coordinate. Then the first observation vectors in each sample were replaced by contaminated vectors.

Figure 3. Empirical powers for p = q = 8 using the multivariate normal distribution (first row), multivariate t distribution with $\nu = 5$ (second row) and contaminated normal distribution with $\epsilon = 0.1$ and $c = 6$ (third row). The thick solid line denotes $W_n$, the thin solid line $T_{1n}$, the thick dotted line $T_{2n}$ and the thin dotted line $T_{0n}$.
In Figure 4, the mean p-values are illustrated as a function of the contamination value $c$ for $\Delta = 0$ and for $\Delta = 0.2$. In the null hypothesis case ($\Delta = 0$), all tests give p-values close to 0.5 when the contamination value is near zero. Note also that $T_{0n}$ and $T_{1n}$ give practically the same p-values as Wilks' test. When the contamination value is high, the p-values given by Wilks' test decrease considerably, and some decrease is also seen in the p-values of $T_{1n}$ and $T_{2n}$. In the considered case, the sign test $T_{0n}$ seems to be the most robust one, since its mean p-value is constant as a function of $c$. When $\Delta = 0.2$, the contamination slightly increases the mean p-values of the rank scores tests. In the case of Wilks' test, the p-values first increase and then decrease to zero with the contamination value. A careful analysis shows, however, that the small p-values for large contamination values erroneously indicate "negative" dependence.

In the paper, new affine invariant rank scores procedures were proposed for testing whether two random vectors are independent. The test statistics were constructed using standardized spatial signs and ranks of the lengths of the standardized vectors. It is remarkable that the proposed tests are valid without any moment assumptions on the underlying data, as long as the standardization is done using location vector and scatter (or shape) matrix estimates that do not require any moment assumptions. Three different score functions, namely the sign scores, the Wilcoxon scores and the van der Waerden scores, were considered in more detail. The tests have good limiting and finite-sample efficiencies and, as illustrated by an example, the tests are resistant to outliers.

can be made as small as one wishes. For the latter convergence note that $a^2(u)\,du$.

5. Next decompose $\hat H - H$ into two parts as follows.
i )) u i u (2 i u (2 i u (2) T i )} =:

So it is enough to show that $H_1 \to_p 0$ and $H_2 \to_p 0$. We proceed by proving $E[\mathrm{vec}(H_i)] \to 0$ and $\mathrm{Var}[\mathrm{vec}(H_i)] \to 0$ for $i = 1, 2$.

6. As the standardized sign vectors are equivariant and the ranks (of distances) are invariant under sign changes of the original data vectors, $E[\mathrm{vec}(H_1)] = 0$ and $E[\mathrm{vec}(H_2)] = 0$.

7. As

8. Consider next the variance of $\mathrm{vec}(H_2)$. To shorten the notation, write $a_i = a(G_1(r_i))$. The variance can then be written as i v ( i v ( i u (2 which converges (use again the sign change property) to zero when $E[1/r_i] < \infty$.

9. The result follows as $\mu^*$ and $C^*$ are bounded in probability.
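Proof of Theorem 3.2

By Lemma 3.1, the limiting distribution of $T_n$ can be found using the limiting distribution of $\sqrt{n}\,\mathrm{vec}(H)$. Since, for $i = 1, \ldots, n$,

$$E\left[a(G_1(r_i^{(1)}))\, b(G_2(r_i^{(2)}))\, \mathrm{vec}(u_i^{(1)} u_i^{(2)T})\right] = 0$$

and

$$E\left[a^2(G_1(r_i^{(1)}))\, b^2(G_2(r_i^{(2)}))\, \mathrm{vec}(u_i^{(1)} u_i^{(2)T})\, \mathrm{vec}(u_i^{(1)} u_i^{(2)T})^T\right] = \frac{\sigma_a^2 \sigma_b^2}{pq} I_{pq},$$

where $\sigma_a^2 = E[a^2(U)]$ and $\sigma_b^2 = E[b^2(U)]$ with $U$ uniformly distributed on (0, 1), the central limit theorem implies that

$$\sqrt{n}\,\mathrm{vec}(H) \to_d N_{pq}\left(0, \frac{\sigma_a^2 \sigma_b^2}{pq} I_{pq}\right).$$

Proof of Theorem 3.3

In the proof, we apply LeCam's third lemma. See for example Hájek et al. (1999, Section 7.1). For testing $H_0$ against $H_\Delta$, the optimal likelihood ratio test statistic is

$$L = \sum_{i=1}^n \left\{ \log f_\Delta(y_i^{(1)}, y_i^{(2)}) - \log f_0(y_i^{(1)}, y_i^{(2)}) \right\}.$$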
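Gieser (1993) considered the asymptotic representation

$$L = \sqrt{n}\,\delta K - \tfrac{1}{2}\delta^2 \sigma^2 + o_P(1),$$

where $K = \frac{1}{n}\sum_{i=1}^n k_i$ with

$$k_i = \left[p - \psi_1(r_i^{(1)})\, r_i^{(1)}\right] + \psi_1(r_i^{(1)})\, r_i^{(2)}\, u_i^{(1)T} M_1 u_i^{(2)} + \left[q - \psi_2(r_i^{(2)})\, r_i^{(2)}\right] + \psi_2(r_i^{(2)})\, r_i^{(1)}\, u_i^{(2)T} M_2 u_i^{(1)}.$$

If in the above representation $E(k_i) = 0$ and $E(k_i^2) = \sigma^2$, the sequence of alternatives is contiguous to the null hypothesis (LeCam's first lemma). See Gieser (1993) for mild conditions. Write then $\mathrm{vec}(H) = \frac{1}{n}\sum_{i=1}^n h_i$, where $H$ is given in Lemma 3.1. We assume that, under $H_0$, $\sqrt{n}\,\mathrm{vec}(H)$ and $L$ are jointly asymptotically normal with asymptotic covariance $\delta E_0(h_i k_i)$, where $E_0$ denotes the expectation taken under the null hypothesis. Then by LeCam's third lemma,

$$\sqrt{n}\,\mathrm{vec}(H) \to_d N_{pq}\left(\delta E_0(h_i k_i), \frac{\sigma_a^2 \sigma_b^2}{pq} I_{pq}\right)$$

under the alternative sequences. Using the independence of $r_i^{(1)}$, $u_i^{(1)}$, $r_i^{(2)}$ and $u_i^{(2)}$, it is easy to see that

$$E_0(h_i k_i) = \frac{1}{pq}\left\{ E[a(G_1(r_i^{(1)}))\,\psi_1(r_i^{(1)})]\, E[b(G_2(r_i^{(2)}))\, r_i^{(2)}]\, \mathrm{vec}(M_1) + E[a(G_1(r_i^{(1)}))\, r_i^{(1)}]\, E[b(G_2(r_i^{(2)}))\,\psi_2(r_i^{(2)})]\, \mathrm{vec}(M_2^T) \right\}.$$

Appendix II: Some notions on alternative sequences

For all elliptic cases, it is enough to consider the alternative sequences in (3.1), where $\Delta = \delta/\sqrt{n}$ and $x_i^{(1)}$ and $x_i^{(2)}$ are independent with spherical marginal distributions. This is because, for the weighted sum of elliptical marginals, .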

Figure 1. ARE($T_{2n}$, $W_n$) as a function of $c$ at the contaminated normal model with $\epsilon$ = 0, 0.05, 0.10, 0.20.

Figure 2. Empirical powers for p = q = 3 using the multivariate normal distribution (first row), multivariate t distribution with $\nu = 5$ (second row) and contaminated normal distribution with $\epsilon = 0.1$ and $c = 6$ (third row). The thick solid line denotes $W_n$, the thin solid line $T_{1n}$, the thick dotted line $T_{2n}$ and the thin dotted line $T_{0n}$.

Figure 4. Mean p-values for the true null hypothesis $H_0$: $\Delta = 0$ (left figure) and for the alternative hypothesis $H_1$: $\Delta = 0.2$ (right figure) as a function of the contamination value as described in the text. The thick solid line refers to $W_n$, the thin solid line to $T_{1n}$, the thick dotted line to $T_{2n}$ and the thin dotted line to $T_{0n}$.

Table 2. ARE($T_{1n}$, $W_n$) at different p- and q-variate contaminated normal distributions for $\epsilon = 0.1$ and for selected values of $c$.