Influence functions and efficiencies of the canonical correlation and vector estimates based on scatter and shape matrices

In this paper, the influence functions and limiting distributions of the canonical correlations and coefficients based on affine equivariant scatter matrices are developed for elliptically symmetric distributions. General formulas for limiting variances and covariances of the canonical correlations and canonical vectors based on scatter matrices are obtained. The use of the so-called shape matrices in canonical analysis is also investigated. The scatter and shape matrices based on the affine equivariant Sign Covariance Matrix as well as Tyler's shape matrix serve as examples. Their finite sample and limiting efficiencies are compared to those of the Minimum Covariance Determinant estimators and S-estimator through theoretical and simulation studies. The theory is illustrated by an example.


Introduction
The purpose of canonical correlation analysis (CCA) is to describe the linear interrelations between p- and q-variate (p ≤ q) random vectors. New coordinate systems are found for both vectors in such a way that, in both systems, the marginals of the random variables are uncorrelated and have unit variances, and that the covariance matrix between the two random vectors is (R, 0), where R is a diagonal matrix with descending positive diagonal elements. The new variables and their correlations are called canonical variates and canonical correlations, respectively. Moreover, the rows of the transformation matrix are called canonical vectors. Canonical analysis is one of the fundamental contributions to multivariate inference by Harold Hotelling (1936).
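To be more specific, assume that $x$ and $y$ are $p$- and $q$-variate random vectors, $p \le q$ and $k = p + q$. Let $F$ be the cumulative distribution function of the $k$-variate variable $z = (x^T, y^T)^T$. Decompose its covariance matrix (if it exists) as
$$\Sigma = \Sigma(F) = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix},$$
where $\Sigma_{xx}$ and $\Sigma_{yy}$ are nonsingular. In canonical analysis, one thus finds a $p \times p$ matrix $A = A(F)$, a $q \times q$ matrix $B = B(F)$ and a $p \times p$ diagonal matrix $R = R(F) = \mathrm{diag}(\rho_1, \dots, \rho_p)$, $\rho_1 \ge \dots \ge \rho_p$, such that
$$A^T \Sigma_{xx} A = I_p, \quad B^T \Sigma_{yy} B = I_q \quad \text{and} \quad A^T \Sigma_{xy} B = (R, 0). \tag{1}$$
The diagonal elements of $R$ are called the canonical correlations, the columns of $A$ and $B$ the canonical vectors, and the random vectors $A^T x$ and $B^T y$ give the canonical variates.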
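Simple calculations show that
$$\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} A = A R^2 \quad \text{and} \quad \Sigma_{yy}^{-1}\Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy} B = B (R, 0)^T (R, 0).$$
Therefore $A$ and (the first $p$ columns of) $B$ contain the eigenvectors of the matrices
$$M_A = \Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \quad \text{and} \quad M_B = \Sigma_{yy}^{-1}\Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}, \tag{2}$$
respectively. The eigenvalues of $M_A$ and $M_B$ are the same and are given by the diagonal elements of $R^2$, that is, by the squared canonical correlations. We will assume throughout the paper that $\rho_1 > \dots > \rho_p$ to avoid multiplicity problems. From (1) we see that the eigenvectors need to be chosen such that
$$A^T \Sigma_{xx} A = I_p \quad \text{and} \quad B^T \Sigma_{yy} B = I_q. \tag{3}$$
Alternatively, one can also find eigenvalues and orthonormal eigenvectors $A_0$ and $B_0$ of symmetric matrices as
$$\Sigma_{xx}^{-1/2}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\Sigma_{xx}^{-1/2} A_0 = A_0 R^2 \quad \text{and} \quad \Sigma_{yy}^{-1/2}\Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1/2} B_0 = B_0 (R, 0)^T (R, 0),$$
with $A_0^T A_0 = I_p$ and $B_0^T B_0 = I_q$. The regular canonical vectors are then $A = \Sigma_{xx}^{-1/2} A_0$ and $B = \Sigma_{yy}^{-1/2} B_0$. For more information on the canonical analysis problem, see e.g. Johnson and Wichern (1998, chapter 10).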
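The symmetric formulation above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes NumPy, the function names `inv_sqrt` and `canonical_analysis` are hypothetical, and it uses a singular value decomposition of the whitened cross-covariance $\Sigma_{xx}^{-1/2}\Sigma_{xy}\Sigma_{yy}^{-1/2}$ in place of the two symmetric eigenproblems (the left and right singular vectors are the orthonormal eigenvectors $A_0$ and the first $p$ columns of $B_0$, and the singular values are the canonical correlations).

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    w, U = np.linalg.eigh(S)
    return (U / np.sqrt(w)) @ U.T  # U diag(w^{-1/2}) U^T

def canonical_analysis(Sigma, p):
    """Canonical correlations and vectors from a k x k covariance matrix.

    The first p coordinates belong to x, the remaining q = k - p to y (p <= q).
    Returns (rho, A, B1), where B1 holds the first p columns of B.
    """
    Sxx, Sxy, Syy = Sigma[:p, :p], Sigma[:p, p:], Sigma[p:, p:]
    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Whitened cross-covariance; its SVD is A0 diag(rho) B0_1^T
    K = Sxx_is @ Sxy @ Syy_is
    A0, rho, B0t = np.linalg.svd(K, full_matrices=False)
    # Back-transform the orthonormal vectors to the regular canonical vectors
    return rho, Sxx_is @ A0, Syy_is @ B0t.T

# Illustrative input: a covariance matrix already in canonical form
R = np.diag([0.8, 0.2])
Sigma = np.block([[np.eye(2), R], [R, np.eye(2)]])
rho, A, B1 = canonical_analysis(Sigma, p=2)
```

The SVD returns singular values in descending order, which matches the ordering $\rho_1 \ge \dots \ge \rho_p$; the signs of the columns of `A` and `B1` are arbitrary, as they are in the text.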
To estimate the population canonical correlations and vectors one typically estimates Σ by the sample covariance matrix, and afterwards computes the eigenvalues and eigenvectors of the sample counterparts of the matrices M_A and M_B given in (2). This procedure is optimal for a multivariate normal distribution F, but it turns out to be less efficient at heavier tailed model distributions. Moreover, the sample covariance matrix is highly sensitive to outliers, and a canonical analysis based on this matrix will then give unreliable results. For these reasons, it can be appropriate to estimate Σ by another, more robust estimator. As such, Karnel (1991) proposed to use M-estimators and Croux and Dehon (2002) the Minimum Covariance Determinant estimator. However, no asymptotic theory has been developed yet for canonical analysis based on robust covariance matrix estimators.
It was only quite recently that Anderson (1999) completed the asymptotic theory for canonical correlation analysis based on the sample covariance matrix. In this paper we study the asymptotic distribution of estimates of canonical correlations and canonical vectors based on more general estimators of the population covariance matrix, the so-called scatter matrices. The results are not restricted to the normal case, but are valid for the class of elliptically symmetric model distributions. Moreover, the asymptotic distribution for canonical analysis based on shape matrices is also derived.
The plan of the paper is as follows. In Section 2, we review scatter matrices and define the canonical correlation and vector functionals based on scatter functionals. We also treat shape matrices, which estimate the form of the underlying elliptical distribution, but carry no size information. In Section 3, we give the expressions for the influence functions of canonical correlation and vector functionals based on any regular scatter and shape matrix functional, and in Section 4, the limiting distributions and the limiting efficiencies of canonical correlations and vectors are derived. Numerical values for the asymptotic efficiencies at normal distributions are presented for shape matrices based on the Sign Covariance Matrix (Ollila et al., 2003b), the Minimum Covariance Determinant estimators (Rousseeuw 1985) and the S-estimator (Davies 1987). We also consider Tyler's shape matrix estimator (Tyler 1987). By means of a simulation study, the finite sample efficiencies are compared with the limiting ones in Section 5, and a real data example illustrates the methods. The Appendix collects all the proofs and additional lemmas.

A functional $V = V(F)$, or alternatively $V(z)$, is a shape matrix if it is $PDS(k)$ with $\mathrm{Det}(V) = 1$ and affine equivariant in the sense that
$$V(D^T z + b) = [\mathrm{Det}(D^T D)]^{-1/k}\, D^T V(z) D.$$
The condition $\mathrm{Det}(V) = 1$ is sometimes replaced by the condition $\mathrm{Tr}(V) = k$, but the former is more convenient here. See Ollila et al. (2003a) and Hallin and Paindaveine (2004). If $C(F)$ is a scatter matrix then
$$V(F) = [\mathrm{Det}(C(F))]^{-1/k}\, C(F)$$
is the associated shape matrix. It can be seen as a standardized version of $C(F)$. However, a shape matrix can be given without any reference to a scatter matrix; Tyler's shape matrix (Tyler 1987) serves as an example. For the above elliptical distribution $F$, $V(F) = [\mathrm{Det}(D^T D)]^{-1/k} D^T D$.
This means that in the elliptic model, shape matrices estimate the same population quantity and are directly comparable without any modifications. Note that in several multivariate inference problems, the test and estimation procedures may be based on the shape matrix only.
Finally note that if $C(F)$ is a scatter matrix, the functional $S(F) = \mathrm{Det}(C(F))$ is a global scalar-valued scale measure. The scale measure $\mathrm{Det}(\Sigma(F))$ given by the regular covariance matrix is the well-known Wilks' generalized variance. In general, we will say that $S(F)$ is a scale measure if it is nonnegative and affine equivariant in the sense that $S(Gz) = \mathrm{Det}(G)^2 S(z)$ for all nonsingular $k \times k$ matrices $G$. Note that the shape and scale information may be combined to build a scatter matrix since
$$C(F) = S(F)^{1/k}\, V(F).$$

Canonical correlation and vector functionals based on scatter and shape matrices are now defined as follows. We assume that the $k$-variate distribution of $z = (x^T, y^T)^T$ is elliptic with cumulative distribution function $F$ and that $p \le q$. Consider the scatter matrix
$$C(F) = \begin{pmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{pmatrix}$$
with nonsingular $C_{xx}$ and $C_{yy}$. The matrices $A = A(F)$, $B = B(F)$ and $R = R(F)$ chosen so that
$$A^T C_{xx} A = I_p, \quad B^T C_{yy} B = I_q \quad \text{and} \quad A^T C_{xy} B = (R, 0)$$
then yield the canonical vectors and correlations. The canonical correlations in $R$ remain unchanged for all scatter matrices $C$. Write $B = (B_1, B_2)$. If the $p$ canonical correlations are distinct, then the $p \times p$ matrix $A$ and the $q \times p$ matrix $B_1$ are unique up to a sign and the $q \times (q - p)$ matrix $B_2$ is unique up to multiplication on the right by an orthogonal $(q - p) \times (q - p)$ matrix. The values of the canonical vectors $A$ and $B$ will depend on the scatter functional $C$ used via the constant $c_0$. If, however, the scatter functional is such that $C(F) = \Sigma$, then the canonical vectors become comparable over the different scatter matrix estimators used.

Now let $A(F)$, $B(F)$ and $R(F)$ be determined by a shape matrix functional
$$V(F) = \begin{pmatrix} V_{xx} & V_{xy} \\ V_{yx} & V_{yy} \end{pmatrix}.$$
Again the canonical correlations in $R$ remain unchanged for all $V$. The canonical vectors are unique up to a constant. We therefore choose $A^*$ and $B^*$ such that $A^{*T} V_{xx} A^* = I_p$ and $B^{*T} V_{yy} B^* = I_q$. If the shape functional $V$ is associated to a scatter functional $C$, then
$$A^* = [\mathrm{Det}(C)]^{1/(2k)} A \quad \text{and} \quad B^* = [\mathrm{Det}(C)]^{1/(2k)} B.$$
We call $A^*$ and $B^*$ the standardized canonical vectors. These standardized canonical vectors are comparable between any two scatter or shape matrix functionals used, whether a correction factor has been used or not.
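The invariance of the canonical correlations under rescaling of the scatter matrix, and the determinant-one normalization of the shape matrix, can be checked numerically. This is a small sketch assuming NumPy; the function names `shape_from_scatter` and `canonical_correlations` and the randomly generated matrix are hypothetical and serve only as an illustration.

```python
import numpy as np

def shape_from_scatter(C):
    """Shape matrix V = Det(C)^(-1/k) C, so that Det(V) = 1."""
    k = C.shape[0]
    _, logdet = np.linalg.slogdet(C)  # log-determinant, stable for larger k
    return C * np.exp(-logdet / k)

def canonical_correlations(C, p):
    """Square roots of the eigenvalues of Cxx^{-1} Cxy Cyy^{-1} Cyx, descending."""
    Cxx, Cxy, Cyy = C[:p, :p], C[:p, p:], C[p:, p:]
    M_A = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    eig = np.sort(np.real(np.linalg.eigvals(M_A)))[::-1]
    return np.sqrt(np.clip(eig, 0.0, None))  # clip guards against tiny negatives

# Illustrative positive definite "scatter" matrix with p = 2, q = 3
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
C = G @ G.T + 5.0 * np.eye(5)

V = shape_from_scatter(C)
rho_C = canonical_correlations(C, p=2)
rho_scaled = canonical_correlations(7.3 * C, p=2)  # arbitrary rescaling of C
rho_V = canonical_correlations(V, p=2)
```

All three sets of canonical correlations agree because a scalar factor on the scatter matrix cancels in $C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx}$; only the canonical vectors pick up the factor, which is what the standardization by the shape matrix removes.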

Influence functions
Influence functions are often used for robustness considerations. The influence function measures the robustness of a functional $T$ against a single outlier, that is, the effect of an infinitesimal contamination located at a single point $z$ on the estimator (see Hampel et al., 1986). Consider hereafter the contaminated distribution
$$F_\varepsilon = (1 - \varepsilon) F + \varepsilon \Delta_z,$$
where $\Delta_z$ is the cdf of a distribution with probability mass one at a single point $z$. Then the influence function of $T$ is defined as
$$IF(z; T, F) = \lim_{\varepsilon \downarrow 0} \frac{T(F_\varepsilon) - T(F)}{\varepsilon}.$$
Lemma 1 in Croux and Haesbroeck (2000) states that, for any scatter functional $C(F)$, there exist two real valued functions $\gamma_C$ and $\delta_C$ such that the influence function of $C$ at a spherical $F_0$, symmetric around the origin and with $C(F_0) = I_k$, is given by (4). Using the definition of the determinant and basic derivation rules, the influence function of the scale functional associated with the scatter functional is seen to be (5), and by the chain rule, the influence function of the associated shape functional is (6), where $\gamma_V = \gamma_C$, see Ollila et al. (2003a). The influence functions of scatter, shape and scale functionals at elliptical $F$ are given in Lemma 1 in the Appendix.
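For orientation, the three influence functions discussed in this paragraph (scatter, scale and shape at the spherical $F_0$) can be sketched as follows. This is a reconstruction under an assumption, not a quotation of the paper's displays: it takes the cited lemma in the form commonly stated in the robust statistics literature, with $r = \|z\|$ and $u = z / \|z\|$ (symbols introduced here for illustration), and with the sign convention on $\delta_C$ assumed. The scale and shape lines then follow from the first one, using that the derivative of the determinant at $I_k$ is the trace and applying the chain rule to $V = \mathrm{Det}(C)^{-1/k} C$.

```latex
\begin{align*}
% Assumed form of the scatter influence function at spherical F_0
IF(z; C, F_0) &= \gamma_C(r)\, u u^T - \delta_C(r)\, I_k, \\
% Derivative of Det at I_k is the trace; tr(u u^T) = 1
IF(z; \mathrm{Det}(C), F_0) &= \mathrm{tr}\, IF(z; C, F_0)
  = \gamma_C(r) - k\, \delta_C(r), \\
% Chain rule applied to V = Det(C)^{-1/k} C at C = I_k
IF(z; V, F_0) &= IF(z; C, F_0) - \tfrac{1}{k}\, IF(z; \mathrm{Det}(C), F_0)\, I_k
  = \gamma_C(r) \left[ u u^T - \tfrac{1}{k} I_k \right].
\end{align*}
```

Under this assumed form the function $\delta_C$ cancels in the last line, which is consistent with the statement that the shape functional is governed by $\gamma_V = \gamma_C$ alone.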
To derive the influence functions of canonical correlation and vector functionals $R(F)$, $A(F)$ and $B_1(F)$ based on $C(F)$, we introduce the following notation. Write the canonical variates as where $r$ stands for the length of the vector $z$ and $(u^T, v^T)^T$ is the direction vector, that is, the unit vector in the direction of $z$. Throughout the paper the cumulative distribution function of $z$ is denoted by $F$. The influence functions at the elliptical $F$ are now as follows (all proofs are found in the Appendix):

Theorem 1. Let $C$ be the affine equivariant scatter matrix functional used to obtain the canonical correlations $R$ and the canonical vectors $A$ and $B_1$. Then the influence functions of the functionals $R$, $A$ and $B_1$ at the $k$-variate elliptical distribution $F$ are Here $H_1$ is a diagonal matrix with diagonal elements The elements of $H_3$ are for $i = 1, \dots, q$, $j = 1, \dots, p$, $i \ne j$ and $\rho_i = 0$ as $i > p$, and Finally, the elements of

The influence functions of the canonical correlations $R$, and the standardized canonical vectors $A^*$ and $B_1^*$ based on a shape matrix functional $V$ are obtained using the fact that where $C$ is a related scatter matrix constructed as $C(F) = S(F)^{1/k} V(F)$ for a given scale measure $S$, as described in Section 2.
Theorem 2. Let $V$ be the affine equivariant shape matrix functional used to obtain the canonical correlations $R$ and the standardized canonical vectors $A^*$ and $B_1^*$. Then the influence functions of the functionals $R$, $A^*$ and $B_1^*$ at the $k$-variate elliptical distribution $F$ are with $H_1$, $H_2$ and $H_3$ as in Theorem 1.
Note that the above influence functions factorize into a product of a function of $r$ and a function of $(u, v)$, where $r$ and $(u, v)$ are statistically independent (see the proof of Theorem 1). Since $H_1(u, v, R)$, $H_2(u, v, R)$ and $H_3(u, v, R)$ are continuous functions on the periphery of an ellipsoid, it follows that the influence functions for the canonical correlations and standardized canonical vectors are bounded as soon as the associated $\gamma_V$ is bounded. Figure 1 illustrates the functions $\gamma_V$ for the shape estimators used in the efficiency and robustness comparisons in Sections 4 and 5 at the bivariate standard normal distribution. The influence functions can be found in Ollila et al. (2003a) for Tyler's M-estimator, in Ollila et al. (2003b) for the affine equivariant sign covariance matrix (SCM) and in Lopuhaä (1989) for the S-estimator. The influence functions of the Minimum Covariance Determinant (MCD) estimator and the Reweighted MCD-estimator (RMCD) are given in Croux and Haesbroeck (1999). As seen in Figure 1, the function $\gamma_V$ is bounded for Tyler's M-estimator, the MCD-estimators and the S-estimator.

Limiting distributions and efficiencies
Assume next that $z_1, \dots, z_n$ is a random sample from an elliptical distribution $F$ with corresponding spherical distribution $F_0$ and that a correction factor is used to adjust the estimate so that $C(F_0) = I_k$. Let then $\hat C$ be the estimator associated to the functional $C(F)$, that is $\hat C = C(F_n)$, where $F_n$ is the empirical distribution function computed from the sample. We will assume throughout the paper that the limiting distribution of $\sqrt{n}\, \mathrm{vec}(\hat C - C)$ is multivariate normal with zero mean vector and covariance matrix
$$ASV(\hat C; F) = E\left[ \mathrm{vec}\{IF(z; C, F)\}\, \mathrm{vec}\{IF(z; C, F)\}^T \right]$$
(cf. Huber, 1981). Here "vec" vectorizes a matrix by stacking the columns on top of each other. Tyler (1982) showed that, at the spherical $F_0$, the above covariance matrix may be written as
$$ASV(\hat C_{12}; F_0)\, (I_{k^2} + I_{k,k}) + ASC(\hat C_{11}, \hat C_{22}; F_0)\, \mathrm{vec}(I_k)\, \mathrm{vec}(I_k)^T,$$
where $I_{k,k}$ is a $k^2 \times k^2$ matrix with $(i, j)$-block being equal to a $k \times k$ matrix that has 1 at entry $(j, i)$ and zero elsewhere. $ASV(\hat C_{12}; F_0)$ represents the variance of any off-diagonal element of $\hat C$ at spherical $F_0$ and $ASC(\hat C_{11}, \hat C_{22}; F_0)$ is the covariance between any two distinct diagonal elements of $\hat C$ at $F_0$. Note also that
$$ASV(\hat C_{11}; F_0) = 2\, ASV(\hat C_{12}; F_0) + ASC(\hat C_{11}, \hat C_{22}; F_0).$$
Similarly, we assume that the limiting distribution of $\sqrt{n}\, \mathrm{vec}(\hat V - V)$ is $k^2$-variate normal with zero mean vector and covariance matrix
$$ASV(\hat V_{12}; F_0) \left( I_{k^2} + I_{k,k} - \tfrac{2}{k}\, \mathrm{vec}(I_k)\, \mathrm{vec}(I_k)^T \right),$$
where $ASV(\hat V_{12}; F_0)$ is the variance of any off-diagonal element of $\hat V$ at $F_0$. The limiting distribution of the shape matrix estimator is thus characterized by one single number, while the limiting distribution of a scatter matrix estimator is completely determined by two numbers. Limiting variances are derived in Lemma 5 in the Appendix.

Write now $\hat R$, $\hat A$ and $\hat B_1$ for the canonical correlation and vector estimators based on $\hat C$ and let $R$, $A$ and $B_1$ be the corresponding functional values. If $\rho_1 > \dots > \rho_p > 0$, then at elliptical $F$, the limiting distributions of $\hat R$, $\hat A$ and $\hat B_1$ are multivariate normal. See Lemma 3 in the Appendix for the exact expressions. To compute the marginal distributions of canonical correlations and vectors at elliptical $F$, the following covariances of the elements of $\hat R$, $\hat A$ and $\hat B_1$ at the canonical distribution $F$ of $z$ are needed.
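The statement that a scatter estimator's limiting covariance at $F_0$ is determined by two numbers, and a shape estimator's by one, can be made concrete with a short sketch. It assumes NumPy and assumes the structure $a\,(I_{k^2} + I_{k,k}) + b\,\mathrm{vec}(I_k)\mathrm{vec}(I_k)^T$ for scatter and $a\,(I_{k^2} + I_{k,k} - \tfrac{2}{k}\mathrm{vec}(I_k)\mathrm{vec}(I_k)^T)$ for shape; the function names are hypothetical.

```python
import numpy as np

def commutation(k):
    """k^2 x k^2 matrix whose (i, j)-block has a 1 at entry (j, i), zeros elsewhere."""
    K = np.zeros((k * k, k * k))
    for i in range(k):
        for j in range(k):
            K[i * k + j, j * k + i] = 1.0
    return K

def scatter_asv(k, asv12, asc):
    """Assumed limiting covariance of vec(C-hat) at spherical F_0."""
    vec_I = np.eye(k).reshape(-1, 1)
    return asv12 * (np.eye(k * k) + commutation(k)) + asc * (vec_I @ vec_I.T)

def shape_asv(k, asv12):
    """Assumed limiting covariance of vec(V-hat) at spherical F_0."""
    vec_I = np.eye(k).reshape(-1, 1)
    return asv12 * (np.eye(k * k) + commutation(k) - (2.0 / k) * (vec_I @ vec_I.T))

k = 3
idx_11, idx_22, idx_12 = 0, k + 1, k  # positions of C11, C22, C12 in vec(C)

# Sample covariance matrix at the normal model: ASV(C12) = 1, ASC(C11, C22) = 0
S_cov = scatter_asv(k, asv12=1.0, asc=0.0)
var_c11 = S_cov[idx_11, idx_11]       # equals 2 * ASV(C12) + ASC
var_c12 = S_cov[idx_12, idx_12]
cov_c11_c22 = S_cov[idx_11, idx_22]

# Shape version: the variance of the trace-like combination vanishes
S_shape = shape_asv(k, asv12=1.0)
vec_I = np.eye(k).reshape(-1)
trace_var = vec_I @ S_shape @ vec_I
```

With the sample covariance values the sketch returns variance 2 for a diagonal element and 1 for an off-diagonal element, in line with the values quoted for the classical case. The zero variance along $\mathrm{vec}(I_k)$ in the shape version reflects the constraint $\mathrm{Det}(V) = 1$, which removes the size information.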
Theorem 3. Let $\hat C_{12}$ be any off-diagonal and $\hat C_{11}$ any diagonal element of the scatter matrix $\hat C$. At the canonical distribution $F$ we have that: (v) For $j = 1, \dots, p$, and with $q \ge i > p$, the asymptotic variance of $\hat b_{ij}$ is given by $(\rho_j^{-2} - 1)\, ASV(\hat C_{12}; F_0)$. All the other limiting covariances between elements of $\hat R$, $\hat A$ or $\hat B_1$ are equal to zero.
The special case of the sample covariance matrix Cov at the normal distribution gives the limiting covariances obtained by Anderson (1999). In this special case $ASV(\mathrm{Cov}_{11}; F_0) = 2$ and $ASV(\mathrm{Cov}_{12}; F_0) = 1$, and expressions (i), (iii) and (iv) correspond with those of Anderson (1999). Note that the second statement of Theorem 3 then gives a zero asymptotic covariance matrix between $[\hat a_{ii}, \hat b_{ii}]^T$ and $[\hat a_{jj}, \hat b_{jj}]^T$. Anderson (1999) also assumed $p = q$, and therefore did not report the last statement of Theorem 3 for Cov.
From Theorem 3 and affine equivariance (as stated in Lemma 3 in the Appendix) one easily obtains the marginal distributions of canonical correlation and vector estimates at elliptical $F$. Here $\hat a_1, \dots, \hat a_p$ and $\hat b_1, \dots, \hat b_p$ denote the columns of $\hat A$ and $\hat B_1$, and $a_1, \dots, a_p$ and $b_1, \dots, b_p$ are the columns of $A$ and $B_1$, respectively.
Corollary 1. Let $F$ be an elliptical distribution, then $\sqrt{n}(\hat r_j - \rho_j)$, $\sqrt{n}(\hat a_j - a_j)$ and $\sqrt{n}(\hat b_j - b_j)$ have limiting normal distributions with zero mean and asymptotic variances for every $1 \le j \le p$. For $q \ge k > p$, we put $\rho_k = 0$.
Note that the multiplication of $B_2 = (b_{p+1}, \dots, b_q)$ by an orthogonal matrix does not affect the value of the asymptotic variances $ASV(\hat b_j; F)$ of the first $p$ canonical vectors. Moreover, Corollary 1 implies that the asymptotic relative efficiency of the estimate $\hat r_{j,C}$ based on a scatter matrix $\hat C$ with respect to $\hat r_{j,C^*}$ based on a scatter matrix $\hat C^*$ at elliptical $F$ is simply and the asymptotic relative efficiencies of two canonical vector estimates $\hat a_{j,C}$ and $\hat a_{j,C^*}$ are determined by the following ratios The above relative efficiencies thus equal relative efficiencies of diagonal and off-diagonal elements of the scatter matrices at spherical $F_0$.

Now let $\hat R$, $\hat A^*$ and $\hat B_1^*$ be the canonical correlation and standardized canonical vector estimators based on a shape matrix estimator $\hat V$. Again, if $\rho_1 > \dots > \rho_p > 0$, then at elliptical $F$, the limiting distributions of $\hat R$, $\hat A^*$ and $\hat B_1^*$ are multivariate normal (see Lemma 4 in the Appendix). At the canonical distribution $F$ all asymptotic covariances of canonical correlation and standardized vector estimates are as follows.
Theorem 4. Let $\hat V_{12}$ be any off-diagonal and $\hat V_{11}$ any diagonal element of the shape matrix , is given by (v) For $j = 1, \dots, p$, and with $q \ge i > p$, the asymptotic variance of $c_R\, \hat b^*_{ij}$ is given by $(\rho_j^{-2} - 1)\, ASV(\hat V_{12}; F_0)$. All the other limiting covariances between elements of $\hat R$, $\hat A^*$ or $\hat B_1^*$ are equal to zero.
Combining Lemma 4 and Theorem 4 one again easily obtains the marginal distributions of the canonical correlations and standardized canonical vectors based on a shape matrix estimator.
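Corollary 2. Let $F$ be an elliptical distribution, then $\sqrt{n}(\hat r_j - \rho_j)$, $\sqrt{n}(\hat a^*_j - a^*_j)$ and $\sqrt{n}(\hat b^*_j - b^*_j)$ have limiting normal distributions with zero mean and asymptotic variances where $\rho_k = 0$, as $k > p$.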
Note that now all the asymptotic efficiencies of canonical correlation and vector estimates based on $\hat V$ relative to estimates based on $\hat V^*$ are given by
$$\frac{ASV(\hat V^*_{12}; F_0)}{ASV(\hat V_{12}; F_0)}.$$
Table 1 lists these asymptotic relative efficiencies of canonical correlation and vector estimates based on robust shape matrices with respect to the estimates based on the classical shape matrix at the k-variate normal distribution. The robust shape matrices considered are based on the affine equivariant sign covariance matrix (SCM), a 25% breakdown S-estimator with biweight loss function, a 25% breakdown Reweighted Minimum Covariance Determinant (RMCD) estimator, Tyler's M-estimator and the 25% breakdown MCD-estimator. For the SCM, see Ollila et al. (2003b). Davies (1987) and Lopuhaä (1989) showed that under general assumptions, the S-estimator of scatter has a limiting normal distribution. For the MCD and RMCD scatter estimators asymptotic normality has been shown by Butler et al. (1993) and by Lopuhaä (1999). Their limiting variances have been computed by Croux and Haesbroeck (1999). Finally, Tyler (1987) showed the limiting normality of Tyler's M-estimator. The asymptotic relative efficiency of Tyler's M-estimator at the normal model equals k/(k + 2). Other examples of asymptotically normal scatter estimators which could be used here include the projection depth weighted scatter estimator by Zuo and Cui (2004). Recently, Hallin and Paindaveine (2004) and Hallin et al. (2004) have developed optimal nonparametric tests and corresponding estimates for shape.
The SCM estimator, being a covariance matrix built from affine equivariant sign vectors, has a very high efficiency at the normal model. S-estimators have a slightly lower efficiency, but in contrast to the SCM they have a high breakdown point. The other high breakdown point estimators, RMCD and MCD, suffer from larger losses in efficiency. Tyler's M-estimator has a low breakdown point, but is very fast to compute (see Hettmansperger and Randles, 2002), and has good efficiency properties in larger dimensions. For the efficiencies at heavy-tailed distributions, see Ollila et al. (2003a; 2003b) and Croux and Haesbroeck (1999), for example.
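To illustrate why Tyler's shape matrix is cheap to compute, the sketch below implements the standard fixed-point iteration for Tyler's M-estimator, normalized to determinant one as in this paper. It is a hedged illustration rather than the computation used by the authors: it assumes NumPy, it assumes the data are already centered at a known location (the simultaneous location estimation of Hettmansperger and Randles is not included), and the function name `tyler_shape`, the tolerance and the iteration cap are hypothetical choices.

```python
import numpy as np

def tyler_shape(Z, tol=1e-9, max_iter=500):
    """Tyler's shape matrix for centered data Z (n x k), scaled so that Det(V) = 1.

    Fixed-point iteration: V <- (k / n) * sum_i z_i z_i^T / (z_i^T V^{-1} z_i),
    followed by renormalization of the determinant.
    """
    n, k = Z.shape
    V = np.eye(k)
    for _ in range(max_iter):
        # Squared Mahalanobis-type distances z_i^T V^{-1} z_i
        d = np.einsum("ij,jk,ik->i", Z, np.linalg.inv(V), Z)
        V_new = (k / n) * (Z.T / d) @ Z
        V_new /= np.linalg.det(V_new) ** (1.0 / k)
        if np.linalg.norm(V_new - V) < tol:
            return V_new
        V = V_new
    return V

# Illustrative data: centered bivariate normal sample with a known covariance
rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
Z = rng.standard_normal((2000, 2)) @ np.linalg.cholesky(Sigma).T
V_hat = tyler_shape(Z)
V_true = Sigma / np.linalg.det(Sigma) ** 0.5  # population shape matrix for k = 2
```

Each iteration needs only one matrix inverse and one weighted cross-product, so the cost per step is comparable to computing a sample covariance matrix. Because the weights depend only on the directions of the observations, the estimate carries shape information but no scale information.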

Finite-sample efficiencies
In this section we compare, by means of a modest simulation study, finite-sample efficiencies of canonical correlation and vector estimates based on the robust shape matrices with corresponding estimates based on the classical shape matrix. First, $M = 1000$ samples of sizes $n = 20, 50, 100, 300$ were generated from three different $2p$-variate normal distributions with fixed covariance matrices
$$\Sigma = \begin{pmatrix} I_p & R \\ R & I_p \end{pmatrix},$$
where $R = \mathrm{diag}(\rho_1, \dots, \rho_p)$. Our choices for canonical correlations were (a) $\rho_1 = 0.8$, $\rho_2 = 0.2$, (b) $\rho_1 = 0.6$, $\rho_2 = 0.4$ and (c) $\rho_1 = 0.9$, $\rho_2 = 0.6$, $\rho_3 = 0.3$. The estimated quantities were the canonical correlations and the standardized canonical vectors. The estimated values were compared with the theoretical ones by the following mean squared errors (MSE). The MSE of the $j$th canonical correlation is given by
$$MSE(\hat r_j) = \frac{1}{M} \sum_{m=1}^{M} \left( \hat r_j^{(m)} - \rho_j \right)^2,$$
where $\rho_j$ is the true canonical correlation and $\hat r_j^{(m)}$ the corresponding estimate computed from the $m$th generated sample. Further, the MSE of the $j$th canonical vector is measured by
$$MSE(\hat a^*_j) = \frac{1}{M} \sum_{m=1}^{M} \left[ \cos^{-1}\!\left( \frac{|a_j^{*T} \hat a_j^{*(m)}|}{\|a^*_j\|\, \|\hat a_j^{*(m)}\|} \right) \right]^2,$$
where $a^*_j$ is the theoretical vector and $\hat a_j^{*(m)}$ the estimate obtained from the $m$th generated sample. Thus, this MSE is the average squared angle between the estimated and the true standardized canonical vectors. Working with the angle has the advantage that the same MSEs are obtained whether one works with the standardized or unstandardized canonical vectors. The estimated efficiencies were then computed as ratios of the simulated MSEs and are listed in Tables 2-4.
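The simulation design described above can be sketched for the classical estimator as follows. This is an illustrative outline, not the authors' code: it assumes NumPy, uses the sample covariance matrix only (a robust shape estimator would be plugged in at the marked line), takes the MSEs as plain averages over replications, and the names `cancor` and `simulate_mse` as well as the reduced number of replications are hypothetical.

```python
import numpy as np

def cancor(S, p):
    """Canonical correlations and x-side canonical vectors from a scatter estimate S."""
    def inv_sqrt(M):
        w, U = np.linalg.eigh(M)
        return (U / np.sqrt(w)) @ U.T
    Sxx_is, Syy_is = inv_sqrt(S[:p, :p]), inv_sqrt(S[p:, p:])
    A0, rho, _ = np.linalg.svd(Sxx_is @ S[:p, p:] @ Syy_is, full_matrices=False)
    return rho, Sxx_is @ A0

def simulate_mse(rhos, n, M, seed=0):
    """MSE of canonical correlations and mean squared angle of canonical vectors."""
    rng = np.random.default_rng(seed)
    rhos = np.asarray(rhos, dtype=float)
    p = len(rhos)
    R = np.diag(rhos)
    Sigma = np.block([[np.eye(p), R], [R, np.eye(p)]])
    L = np.linalg.cholesky(Sigma)
    a_true = np.eye(p)  # true canonical vectors are the unit vectors for this Sigma
    sq_err = np.zeros((M, p))
    sq_angle = np.zeros((M, p))
    for m in range(M):
        Z = rng.standard_normal((n, 2 * p)) @ L.T
        S = np.cov(Z, rowvar=False)  # replace with a robust scatter or shape estimate
        r_hat, A_hat = cancor(S, p)
        sq_err[m] = (r_hat - rhos) ** 2
        # |cosine| of the angle between each true and estimated vector (sign-free)
        cosines = np.abs(np.sum(a_true * A_hat, axis=0)) / np.linalg.norm(A_hat, axis=0)
        sq_angle[m] = np.arccos(np.clip(cosines, 0.0, 1.0)) ** 2
    return sq_err.mean(axis=0), sq_angle.mean(axis=0)

# Setting (a) of the text, with fewer replications to keep the run short
mse_r, mse_a = simulate_mse([0.8, 0.2], n=100, M=200)
```

A finite-sample efficiency in the sense of the text is then the ratio of the classical MSE to the MSE obtained with the robust estimator on samples from the same design. Because the angle is invariant to the length of the vectors, the same value results for standardized and unstandardized canonical vectors.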
Table 2: Finite-sample efficiencies of the canonical correlation and vector estimates based on five robust shape matrices. Samples were generated from a 4-variate normal distribution. The quantities to be estimated were $\rho_1 = 0.8$, $\rho_2 = 0.2$, $a^*_1 = (1, 0)^T$ and $a^*_2 = (0, 1)^T$.

As seen in Table 2, the finite-sample efficiencies converge to the asymptotic ones listed in the previous section. For the SCM and the S-estimator the finite-sample efficiencies are very stable over the different sample sizes. For the other estimators, the convergence to the limiting variance is slower. The MCD is more efficient and the RMCD is less efficient at small sample sizes than one would expect from the asymptotic results. This finding, at least for the canonical correlation coefficients, is consistent over all considered simulation setups. For small samples (n = 20, n = 50), Tyler's estimator seems to be more efficient than RMCD, but for larger sample sizes the RMCD is of course more precise, given its larger asymptotic efficiency.
In the second case samples were generated from a 4-variate normal distribution, such that the true canonical correlations were closer to each other than in the previous case. Corresponding finite-sample efficiencies are given in Table 3.
Table 3: Finite-sample efficiencies of the canonical correlation and vector estimates. Samples were generated from a 4-variate normal distribution. The quantities to be estimated were $\rho_1 = 0.6$, $\rho_2 = 0.4$, $a^*_1 = (1, 0)^T$ and $a^*_2 = (0, 1)^T$.

As compared to the earlier case, the differences between the finite-sample and asymptotic efficiencies are now more pronounced, especially for small sample sizes. This holds in particular for the canonical vectors: even in the case n = 300, the efficiencies are still quite different from the asymptotic ones for some estimators. This simulation experiment suggests that, when the canonical correlations are closer to each other, the convergence to the limit distribution for the canonical vectors is slower. This is because the canonical vectors of different orders are harder to distinguish. Comparing the different estimators reveals again that, also at finite samples, the SCM and S-estimator outperform the other estimators in terms of statistical efficiency. The RMCD estimator now behaves much better at the small sample sizes.
In the third case samples were generated from a 6-variate normal distribution, so p = q = 3. Efficiencies of the first canonical correlation and vector estimates are reported in Table 4. Again, as n increases, the efficiencies converge to the asymptotic ones. Note that, by comparing Table 4 with Tables 2 and 3, the asymptotic efficiencies are indeed larger in the higher dimensional setting. However, this does not systematically carry over to all finite sample sizes.
Table 4: Finite-sample efficiencies of the first canonical correlation and vector estimates. Samples were generated from a 6-variate normal distribution. The quantities to be estimated were $\rho_1 = 0.9$ and $a^*_1 = (1, 0, 0)^T$.

Finally, the finite-sample efficiencies of canonical correlation and vector estimates were compared in the case of a heavy-tailed distribution. Samples were generated from a 6-variate t-distribution with 5 degrees of freedom and fixed covariance matrix with R = diag(0.9, 0.6, 0.3). Resulting efficiencies are given in Table 5.
Table 5: Finite-sample efficiencies of the first canonical correlation and vector estimates. Samples were generated from a 6-variate t-distribution with 5 degrees of freedom. The quantities to be estimated were $\rho_1 = 0.9$ and $a^*_1 = (1, 0, 0)^T$.

As compared to the multinormal case, the convergence to the asymptotic efficiencies is now much slower. This slow convergence now also occurs for the SCM and S-estimators. Especially for small sample sizes the loss in efficiency is remarkable, but also in the case n = 300, the efficiencies are substantially below the asymptotic ones. This holds for all considered estimators. Comparing the different estimators, we see that the SCM is no longer the most efficient estimator, while the more robust estimators behave much better. Among the estimators considered here, the S-estimator seems to give the best compromise between efficiency and robustness.
The FAST-MCD algorithm of Rousseeuw and Van Driessen (1999) was used to compute the 25% breakdown point MCD and RMCD estimators. The S-estimator was computed with the SURREAL algorithm of Ruppert (1992). For the computation of the SCM, the same approximations as in Ollila et al. (2003b, Section 7) were used.

An example
In this section we apply the proposed methods through a simple example. We consider the Linnerud data (Tenenhaus, p. 15) consisting of 20 observations and wish to describe the relationships between two sets of variables, namely x_1 = weight, x_2 = waist measurement, x_3 = pulse and y_1 = pull-ups, y_2 = bendings, y_3 = jumps. In order to compare the methods proposed above, we consider canonical correlation and vector estimates obtained from different shape matrices. Estimates as well as corresponding standard deviations, obtained using the asymptotic results given in Corollary 2, are listed in Table 6.
The coefficients of the different canonical vectors are often used to interpret the canonical variates, since they give the weight of every variable. By reporting the standard error around these coefficients, one can quickly see whether these coefficients are significantly different from zero or not. Although reporting these standard errors is not common practice in canonical analysis (probably also because the asymptotic distribution of the canonical vectors has only been established recently, even in the classical case), it helps to detect non-significant coefficients and to avoid overinterpretation. For example, one sees that for all shape matrices considered $a^*_1$ is mainly determined by $x_2$, and to a lesser extent by $x_1$. On the other hand, for none of the considered shape estimators is $b^*_1$ significantly affected by $y_1$. Note that standard errors are larger for the less efficient estimators, like the MCD. Differences between the estimation procedures do not seem to be substantial. A more detailed picture is given by the plot of the first canonical variates $(x_1, y_1)$ in Figure 2. The fitted lines result from the canonical analysis and have equation $y_1 = \hat\rho_1 x_1$. We see that the Classical and the SCM approach, both having a zero breakdown point, have been attracted by the outliers in the upper right and lower left corner of the plot. The MCD and RMCD have been more resistant with respect to these outliers, and the data cloud is more concentrated around the linear fit, as is also witnessed by the higher values for the first correlation coefficient of these estimators.
Table 6: Canonical correlation and vector estimates for the Linnerud data given by the classical shape matrix, the SCM-, the S-, the RMCD-based, Tyler's, and the MCD-based shape matrix.The standard deviations are reported between parentheses.

Conclusion
The asymptotic behaviour of canonical correlations has been widely studied in the literature (e.g. Hsu, 1941; Eaton and Tyler, 1994), but less attention has been given to the limiting distribution of canonical vectors. Anderson (1999) reviews previous work on the asymptotics of canonical analysis, and clearly states the asymptotic variances and covariances of both canonical correlations and vectors derived from the sample covariance matrix. It is not without interest to have information on the asymptotic variance of the canonical vectors since it allows, for example, to compute (asymptotic) standard errors around the coefficients of the canonical vectors. Since these coefficients are often interpreted as the contributions of the original marginal variables to the canonical vectors, it is useful to check on their significance.
In this paper a full treatment of the asymptotic distribution of the canonical correlations and canonical vectors derived from any regular affine equivariant scatter matrix estimator is given. Results hold not only at the normal, but at any elliptical distribution where the scatter matrix being used is well defined and asymptotically normal. Moreover, we allow for a different dimension of the two multivariate variables x and y, a situation often occurring in practice. The advantage of working with shape matrices, yielding standardized canonical vectors, has also been pointed out. Here too, a full treatment of the asymptotic distribution of the canonical correlations and standardized canonical vectors derived from any regular affine equivariant shape matrix estimator has been presented. In the paper we have considered five shape estimators in more detail. We have shown that the canonical correlations and vectors based on SCM- and S-estimators have good limiting and finite-sample efficiencies and, as illustrated by an example, especially MCD-based estimators are resistant to outliers.

To prove Theorems 1 and 2, we use the following affine invariance property of the canonical correlation functional R(F) and affine equivariance properties of the canonical vector functionals A(F), B(F), A*(F) and B*(F). The proofs are straightforward and follow from the affine equivariance properties of C(F) and V(F).
Lemma 2. Let $z = (x^T, y^T)^T$ follow the $k$-dimensional distribution $F$ and write $R(F)$, $A(F)$ and $B(F)$ alternatively as $R((x^T, y^T)^T)$, $A((x^T, y^T)^T)$ and $B((x^T, y^T)^T)$. Then for every nonsingular $p \times p$ and $q \times q$ matrices $\tilde A$ and $\tilde B$, and similarly for standardized canonical vectors $A^*(F)$ and $B^*(F)$,

Proof of Theorem 1
Let $F$ be the cdf of the canonical variates Due to Lemma 2, it is enough to compute the influence functions at $F$, where and $B_{22}$ is an orthogonal $(q - p) \times (q - p)$ matrix. Then $C_{xx}(F) = I_p$, $C_{yy}(F) = I_q$ and $C_{xy}(F) = C_{yx}^T(F) = (R, 0)$. The influence functions of $A$, $B$ and $R$ at $F$ are obtained as follows. From the conditions $A^T C_{xx} A = I_p$ and $B^T C_{yy} B = I_q$ we have that and where and $H_i$ is a $2p \times 2p$ matrix with four non-zero elements namely , where $\rho_i = 2\Delta_i (1 + \Delta_i^2)^{-1}$. The direction vector of $z$ equals then $(u^T, v^T)^T = H^{-1}(s^T, t^T)^T$. (Thus also $r$ and $(u^T, v^T)^T$ are independent.) Now equation (4) gives and affine equivariance of $C$ yields Combining (12) with the formulas for influence functions yields the expressions for the influence functions at $F$. Then by the definition of the influence function and Lemma 2 we have From the above relations between the influence functions at the canonical distribution and at the elliptical $F$, the desired influence functions follow.
Proof of Theorem 2
First note that the canonical correlations derived from $V$ or the associated scatter matrix $C$ are the same. Therefore it follows from Theorem 1 and (6) that The influence functions of where $IF(z; \mathrm{Det}(C), F) = \mathrm{Det}(C(F))\, IF(z_0; \mathrm{Det}(C), F_0)$ was used together with (5). Similarly So by Lemma 2 at elliptical $F$ the influence functions become

Proof of Lemma 3
The asymptotic normality of $\hat R$, $\hat A$ and $\hat B_1$ follows simply by the delta-method, see for example Anderson (1999). The asymptotic variances are obtained using Theorem 1 and the following property of the vec-operator:

Further, the limiting distributions of canonical correlation and standardized vector estimators $\hat R$, $\hat A^*$ and $\hat B_1^*$ based on the shape matrix estimator $\hat V$ are as follows. The proof is as the proof of Lemma 3. To prove Theorems 3 and 4, we use the following Lemma. The results follow from (4) and (6).
Lemma 5.At spherical distribution F 0 , the limiting variances of any diagonal and off-diagonal elements of C are and .
Further, the limiting variance of any off-diagonal element of V is .
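The structure of such limiting variances can be sketched by a short derivation. This is an illustration under an assumption, not a restatement of the lemma: it supposes that the influence function of $C$ at the spherical $F_0$ has the form $\gamma_C(r)\, u u^T - \delta_C(r)\, I_k$ with $r = \|z\|$ and $u = z/\|z\|$ independent, that the shape influence function is $\gamma_V(r)\,[u u^T - \tfrac{1}{k} I_k]$, and that each asymptotic variance is the expected squared influence function of the corresponding element. The moments of $u$ are those of a vector uniformly distributed on the unit sphere.

```latex
\begin{align*}
% Off-diagonal element: only the gamma-term contributes
ASV(\hat C_{12}; F_0) &= E[\gamma_C^2(r)]\, E[u_1^2 u_2^2]
  = \frac{E[\gamma_C^2(r)]}{k(k+2)}, \\
% Diagonal element: expand the square (gamma_C u_1^2 - delta_C)^2
ASV(\hat C_{11}; F_0) &= E\big[(\gamma_C(r)\, u_1^2 - \delta_C(r))^2\big]
  = \frac{3\, E[\gamma_C^2(r)]}{k(k+2)}
    - \frac{2\, E[\gamma_C(r)\, \delta_C(r)]}{k}
    + E[\delta_C^2(r)], \\
% Shape matrix: same off-diagonal computation with gamma_V
ASV(\hat V_{12}; F_0) &= \frac{E[\gamma_V^2(r)]}{k(k+2)}.
\end{align*}
```

As a sanity check of the assumed form, the sample covariance matrix at the normal model corresponds to $\gamma_C(r) = r^2$ and $\delta_C(r) = 1$, with $E[r^2] = k$ and $E[r^4] = k(k+2)$; the expressions then give $ASV(\hat C_{12}; F_0) = 1$ and $ASV(\hat C_{11}; F_0) = 3 - 2 + 1 = 2$, matching the values quoted for the classical case in Section 4.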
Proof of Theorem 3
Consider for example the limiting variance of $\hat r_i$, for $1 \le i \le p$. Lemma 3 gives Use now the transformation (11), then where $s_i$ and $t_i$ are different marginals of a vector distributed uniformly on the periphery of the $k$-dimensional unit sphere, and also independent of $r$. Then, after some tedious calculations, When carrying out the calculations, symmetry properties of $s_i$ and $t_i$ can be used together with $E[s_i^2] = 1/k$, $E[s_i^4] = 3/(k(k + 2))$ and $E[s_i^2 t_i^2] = 1/(k(k + 2))$ (see Lemma 5 in Ollila et al., 2003b). Other limiting variances and covariances are obtained in a more or less similar way, by carefully carrying out computations along the lines above.
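The three moments of the uniform distribution on the unit sphere that drive these calculations are easy to check by simulation. The following sketch assumes NumPy; the function name `sphere_moments`, the number of draws and the seed are hypothetical choices.

```python
import numpy as np

def sphere_moments(k, n_draws=200_000, seed=0):
    """Monte Carlo estimates of E[s^2], E[s^4] and E[s^2 t^2] for two distinct
    coordinates (s, t) of a vector uniformly distributed on the unit sphere in R^k."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n_draws, k))
    # Normalizing a standard normal vector gives a uniform direction
    S = G / np.linalg.norm(G, axis=1, keepdims=True)
    s, t = S[:, 0], S[:, 1]
    return np.mean(s**2), np.mean(s**4), np.mean(s**2 * t**2)

k = 4
m2, m4, m22 = sphere_moments(k)
exact = (1 / k, 3 / (k * (k + 2)), 1 / (k * (k + 2)))  # values quoted in the text
```

The simulated values agree with the closed-form moments up to Monte Carlo error, and the factor $1/(k(k+2))$ is what turns $E[\gamma^2(r)]$ into the variance of an off-diagonal element.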

Proof of Theorem 4
As the proof of Theorem 3.

Canonical correlations and vectors based on scatter and shape matrices
Let us first define the scatter and shape functionals. A $k \times k$ matrix valued statistical functional $C = C(F)$ is a scatter matrix if it is positive definite and symmetric ($PDS(k)$) and affine equivariant. We can denote $C(F)$ alternatively as $C(z)$ if $z \sim F$. Affine equivariance then means that $C(D^T z + b) = D^T C(z) D$ for all nonsingular $k \times k$ matrices $D$ and $k$-vectors $b$. This implies that, for a spherically symmetric distribution $F_0$, $C(F_0) = c_0 I_k$ with some constant $c_0 > 0$ depending on $C$ and $F_0$. If $F$ is the cdf of an elliptically distributed random vector $z = D^T z_0 + b$, where $z_0 \sim F_0$, then $C(F) = c_0 D^T D$. Therefore a correction factor is needed for Fisher consistency of $C(F)$ towards $\Sigma(F)$. Introducing such a correction factor also allows comparisons between different scatter matrix estimates at a specific model.

Figure 1: Examples of the function γ V for some shape estimators at the bivariate (k = 2) standard normal distribution.

Figure 2: Scatterplot of the first canonical variates based on classical and robust shape matrices.

Table 1: Asymptotic relative efficiencies of the canonical correlation and vector estimates based on several robust shape matrices relative to the estimates based on the classical sample covariance matrix at a k-variate normal distribution.