On Mardia’s Tests of Multinormality

. Classical multivariate analysis is based on the assumption that the data come from a multivariate normal distribution. The tests of multinor-mality have therefore received very much attention. Several tests for assessing multinormality, among them Mardia’s popular multivariate skewness and kurtosis statistics, are based on standardized third and fourth moments. In Mardia’s construction of the aﬃne invariant test statistics, the data vectors are ﬁrst standardized using the sample mean vector and the sample covariance matrix. In this paper we investigate whether, in the test construction, it is ad-vantagoeus to replace the regular sample mean vector and sample covariance matrix by their aﬃne equivariant robust competitors. Limiting distributions of the standardized third and fourth moments and the resulting test statistics are derived under the null hypothesis and are shown to be strongly dependent on the choice of the location vector and scatter matrix estimate. Finally, the eﬀects of the modiﬁcation on the limiting and ﬁnite-sample eﬃciencies are illustrated by simple examples in the case of testing for the bivariate normality. In the cases studied, the modiﬁcations seem to increase the power of the tests.


Introduction
Classical methods of multivariate analysis are for the most part based on the assumption that the data are coming from a multivariate normal population. In the one-sample multinormal case, the regular sample mean vector and sample covariance matrix are complete and sufficient, but on the other hand highly sensitive to outlying observations. It is also well known that the tests and estimates based on the sample mean vector and sample covariance matrix have poor efficiency properties in the case of heavy tailed noise distributions. Testing for departures from multinormality helps to make practical choices between competing methods of analysis.
In the univariate case, standardized third and fourth moments b 1 and b 2 are often used to indicate the skewness and kurtosis. For a random sample x 1 , ..., x n from a p-variate distribution with sample mean vectorx and sample covariance matrix S, Mardia (1970Mardia ( ,1974Mardia ( ,1980 defined the p-variate skewness and kurtosis statistics as b 1,p 2 , respectively. The statistics b 1,p and b 2,p are functions of standardized third and fourth moments, respectively. From the definition one easily sees that b 1,p and b 2,p are invariant under affine transformations. In the univariate case these reduce to the usual univariate skewness and kurtosis statistics b 1 and b 2 . Mardia advocated using the skewness and kurtosis statistics to test for multinormality as they are distribution-free under multinormality. See also Bera and John (1983) and Koziol (1993) for the use of the standardized third and fourth moments in the test construction.
Intuitively, as in the univariate case, the skewness and kurtosis statistics b 1,p and b 2,p compare the variation measured by third and fourth moments to that measured by second moments. The second, third and fourth moments are all highly non-robust statistics, however, and more efficient procedures may be obtained through comparisons of robust and nonrobust measures of variations. It is therefore an appealing idea to study whether it is useful or reasonable to replace the regular sample mean vectorx and sample covariance matrix S by some affine equivariant robust competitors, location vector estimateμ and scatter matrix estimateΣ.
The paper is organised as follows. In Section 2 the limiting distributions of Mardia's test statistics under the null model of multinormality are discussed. The test statistics are based on the standardized third and fourth moments; the regular mean vector and regular covariance matrix are used in the standardization. Section 3 lists the tools and assumptions for alternative √ n-consistent affine equivariant location and scatter estimates. The location and scatter estimates are then used in the regular manner to standardize the data vectors. The limiting distributions of the standardized third and fourth moments in this general case are found in Section 4. As an illustration of the theory, the effect of the way of standardization on the limiting and small sample efficiencies in the bivariate case are studied in Sections 5 and 6. The auxiliary lemmas and proofs have been postponed to the Appendix.

Limiting null distributions of Mardia's statistics
In this section we recall the wellknown results concerning the limiting null distributions of the Mardia's skewness and kurtosis statistics. As, due to affine invariance, the distributions of the test statistics do not depend on the unknown µ and Σ, it is not a restriction to assume in the following derivations that x 1 , ..., x n is a random sample from the N p (0, I p ) distribution. Write for the data vectors standardized in the classical way. Mardia's skewness statistic can be decomposed as and the Mardia's kurtosis statistic is similarly See e.g. Koziol (1993). Under multinormality b 1,p and b 2,p are affine invariant, all the standardized third and fourth moments are asymptotically normal and asymptotically independent and consequently the limiting distributions of are a chi-square distribution with p(p+1)(p+2)/6 degrees of freedom and a N (0, 1) distribution, respectively.

Location and scatter estimates
In Mardia's test construction, the data vectors are standardized using the regular sample mean vectorx and sample covariance matrix S. In this paper we thus consider what happens if these are replaced by other √ n-consistent affine equivariant location and scatter estimates of µ and Σ, say,μ andΣ. Next we list assumptions and useful notations for location vector estimateμ and scatter matrix estimateΣ. First we assume that, for the N p (0, I p ) distribution, the influence functions of these location and scatter functionals at z are γ(r) u and α(r) uu T − β(r) I p , respectively, where r = ||z|| and u = ||z|| −1 z. We will later see that the limiting distributions of the modified test statistics depend on the used location and scatter estimates through these functions. For the inluence functions of location and scatter functionals, see also Hampel et al. (1986), Croux and Haesbroek (2000) and Ollila et al. (2003a,b).
We further assume that, again under the N p (0, I p ) distribution, Finally, assume that, for the N p (0, I p ) distribution, the expected values E(γ 2 (r)), E(α 2 (r)) and E(β 2 (r)) are finite. These assumptions imply that the limiting distributions of √ nμ and √ n vec(Σ − I p ) are p-and p 2 -variate normal distributions with zero mean vectors and covariance matrices

Limiting distribution of the standardized third and fourth moments
Consider now any estimatorsμ andΣ and write for the standardized observations. Denote the standardized third and fourth moments by All the other covariances are zero. and .
We close this section with some discussion on the implications of the above results. First note that, if the regular mean vector and sample covariance matrix are used in the standardization, then simply γ(r) = r, α(r) = r 2 and β(r) = 1 which gives κ = τ 1 = τ 2 = 0 and all the third moments and fourth moments are asymptotically mutually independent, respectively. For other location and scatter statistics this is not necessarily true and also the limiting variances may vary. This means that the limiting distributions and efficiencies of b 1,p and b 2,p may drastically change. The limiting distribution of nb 1,p /6 is a weighted sum of chi-square variables with one degree of freedom which makes the calculation of the approximate p-value less practical. The limiting distribution of √ n(b 2,p − p(p + 2)) is still normal with mean value zero, but its limiting variance depends on the chosen scatter matrix. In the last two section we illustrate the impact of the choice of the location vector and scatter matrix estimate on the limiting distribution and efficiency in the bivariate case. The extension to the general p-variate case is straightforward.

Tests for bivariate normality
Assume now that x 1 , ..., x n is a random sample from an unknown bivariate distribution and we wish to test the null hypothesis H 0 of bivariate normality. For limiting power considerations we use the contaminated bivariate normal distribution, the so called Tukey distribution: We say that the distribution of x is a bivariate Tukey distribution T (∆, µ, σ) if the pdf of x is  It is now possible to construct the limiting distributions of the affine invariant test statistics b 1,2 and b 2,2 . The comparison of the effect of different standardizations is possible but not easy, however, as the limiting distributions may be of different type. This is avoided if one uses the following test statistic versions: Definition 5.3. Modified Mardia's skewness and kurtosis statistics in the bivariate case are b * 1,2 = T 1 T Ω −1 1 T 1 and b * 2,2 = (1, 1, 2)T 2 . Note that, if the sample mean and covariance matrix are used, the definition gives b * 1,2 = b 1,2 /6 and b * 2,2 = b 2,2 − 8, respectively. In the former case, with general µ andΣ, the affine equivariance property is lost.
The alternative to the regular sample mean vector and sample covariance matrix used in this example is the affine equivariant Oja median (Oja, 1983) and the related scatter matrix estimate based on the Oja sign covariance matrix (SCM).
See Visuri et al. (2000) and Ollila et al. (2003b) for the SCM. Note also that the scatter matrix estimate based on the Oja SCM is asymptotically equivalent with the zonoid covariance matrix (ZCM) and can be seen as an affine equivariant extension on mean deviation. For the definition and use of the ZCM, see Koshevoy et al (2003). We chose these statistics as their influence functions are relatively simple at the N 2 (0, I 2 ) case, the influence functions are given by γ(r) = 2 2 π , α(r) = 2 2 2 π r − 1 and β(r) = 1.
The constants needed for the asymptotic variances and covariances are then Consider first the alternative contiguous sequence H n 1 with µ = (r, 0, ..., 0) T . Using classical LeCam's third lemma one then easily sees that, under the alternative sequence H n 1 , the limiting distribution of nb * 1,2 is a noncentral chi-square distribution with 4 degrees of freedom and noncentrality parameter δ T 1 Ω −1 1 δ 1 , where δ 1 depends on the location estimate used in the standardization. In the regular case (sample mean vector), δ 1 = (0, 0, r 3 , 0) T and the Oja median gives The function q(r) = ||R(µ)|| is easily computable and defined through the so called spatial rank function . See Möttönen and Oja (1995).
Next we consider the sequence H n 2 .Then and under H n 2 the limiting distribution of n(b * 2,2 ) 2 /V ar(b * 2,2 ) is a noncentral chisquare distribution with one degree of freedom and noncentrality parameter [(1, 1, 2)δ 2 ] 2 /V ar(b * 2,2 ), where δ 2 in this case depends on the scatter estimate used in the standardization. In the regular case (sample covariance matrix) we get Figure 1 the asymptotical relative efficiencies of the new tests (using the Oja median and the ZCM estimate) relative to the regular tests (using the sample mean and sample covariance matrix) are illustrated for different values of r and for different values of σ. The AREs are then simply the ratios of the noncentrality parameters. The new tests seem to perform very well: The test statistics using robust standardization seem to perform better than the tests with regular location and scatter estimates.

A real data example
In our real data example we consider two bivariate data sets of n = 50 observations and the combined data set (n = 100). The data sets are shown and explained in Figure 2. We wish to test the null hypothesis of bivariate normality. Again the the skewness and kurtosis statistics to be compared are those based on the sample mean vector and sample covariance matrix (regular Mardia's b 1,2 and b 2,2 ) and those based on the Oja median and the ZCM (denoted by b * 1,2 and b * 2,2 ). The values of the test statistics and the approximate p-values were calculated for the three data sets ('Boys','Girls', 'Combined'). See Table 1 for the results. As expected, the observed p-values were much smaller in the combined data case.
As discussed in the introduction, the statistics b 1,2 and b 2,2 compare the third and fourth central moments to the second ones (given by the sample covariance matrix). The statistics b * 1,2 and b * 2,2 use robust location vector and scatter matrix estimates in this comparison. One can then expect that, for data sets with outliers or for data sets coming from heavy tailed distributions, the comparison to robust estimates tends to produce higher values of the test statistics and smaller p-values.  In our examples for the analyses of bivariate data we used the Oja median and the ZCM to standardize the data vectors. In the general p-variate case (with large p and large n) these estimates are, however, computationally highly intensive, and should be replaced by more practical robust location and scatter estimates. and As the observations come from the N p (0, I p ) distribution (all the moments exist), the standard law of large numbers and central limit theorem together with Slutsky's lemma yield the result.