This Reprint May Differ from the Original in Pagination and Typographic Detail.

Examining the Double-deficit Hypothesis in an Orthographically Consistent Language

Abstract
We examined the double-deficit hypothesis in Finnish. One hundred and five Finnish children with high familial risk for dyslexia and 90 children with low familial risk were followed from the age of 3.5 years until grade 3. Children's phonological awareness, rapid automatized naming (RAN) speed, text reading, and spelling were assessed. A deficit in RAN predicted slow reading speed across time and spelling difficulties after grade 1. A deficit in phonological awareness predicted difficulties in spelling, but only in the familial risk sample. The effect of familial risk was significant in the development of phonological awareness, RAN, reading, and spelling. Our findings suggest that the basic premise of the double-deficit hypothesis, namely that RAN and phonological awareness are separable deficits with different effects on reading and spelling outcomes, also holds in a consistent orthography.
Over the last three decades, phonological awareness, broadly defined as the ability to perceive and manipulate the sounds of spoken words, has dominated research on the probable causes of reading difficulties (e.
Several studies have found that children with impaired phonological awareness in kindergarten or in grade 1 later experience reading difficulties, and that deficits in phonological awareness are likely a universal characteristic of children with dyslexia. Although the phonological deficit hypothesis has been able to account for a large proportion of reading impairments, it has not been able to explain the heterogeneity of deficits observed in children and adults with dyslexia. Wolf and Bowers (1999) proposed an alternative conceptualization of dyslexia, the double-deficit hypothesis (DDH), according to which phonological deficits and deficits in rapid automatized naming (RAN) speed are largely independent sources of reading difficulties. According to the DDH, the majority of poor readers can be classified into three groups: two with a single deficit in either phonological awareness or RAN, and one with a double deficit in both phonological awareness and RAN. Because phonological awareness and RAN deficits are assumed to have independent negative effects on reading, children …

Vukovic and Siegel (2006) suggested that longitudinal studies are needed to improve our understanding of the DDH. To our knowledge, only three studies have examined the DDH longitudinally, and they have provided partly inconsistent findings (Kirby et al., 2003; Papadopoulos et al., 2009; Wimmer et al., 2000). Wimmer et al. (2000) examined the DDH with two samples of German-speaking children; the first was followed from school entry to grade 3 (Study 1) and the second from kindergarten to grade 4 (Study 2). The three deficit groups, along with the no-deficit group (identified at the first assessment), were compared on reading accuracy and rate, and on spelling in grade 3 or 4. The double-deficit and naming-deficit groups differed significantly from both the no-deficit and the phonological-deficit groups on most of the speed measures.
With respect to spelling, all three deficit groups performed significantly more poorly than the no-deficit group, with no reliable differences between the deficit groups. In reading accuracy, all three deficit groups performed close to ceiling. Wimmer et al. (2000) interpreted the high accuracy scores of the deficit groups as a function of two factors: the consistent orthographic system in which German-speaking children learn to read, and the synthetic phonics teaching approach, which emphasizes phonological coding. Although Wimmer et al.'s (2000) findings illustrate the areas in which the deficit groups may differ in an orthographically consistent language, they provide little information about how the three deficit groups developed across time.
This limitation was addressed by Kirby et al. (2003), who examined the DDH in English in a longitudinal study extending from kindergarten until grade 5. The three deficit groups, along with the no-deficit group, were compared on word identification, word attack, and passage comprehension. The participants in the no-deficit group performed consistently well, and those in the double-deficit group performed consistently poorly. Participants with single phonological deficits performed poorly at the beginning but then approached the no-deficit group in reading. The participants in the naming-deficit group did poorly throughout, almost as poorly as the double-deficit participants. Kirby et al. (2003) concluded that the double-deficit group lagged behind the no-deficit group by almost two years of achievement and showed no sign of catching up.

Papadopoulos et al. (2009) examined the DDH in a longitudinal study with Greek-speaking children followed from kindergarten until grade 2. They formed the deficit groups in grade 1 and then examined the groups' performance retrospectively (in kindergarten) and prospectively (in grade 2). The three deficit groups, along with the no-deficit group, were compared in grades 1 and 2 on reading accuracy and speed, orthographic processing, and reading comprehension. In grade 1, the double-deficit group performed significantly more poorly than the other three groups on all measures. The single-deficit groups also performed more poorly than the no-deficit group on both the reading accuracy and the reading speed tasks, and the phonological-deficit group also performed more poorly than the no-deficit group in orthographic processing. In grade 2, there was a clear tendency to catch up, particularly in reading accuracy, where gains were most evident for the phonological-deficit group; the difficulties in reading speed, however, persisted in the naming-deficit and double-deficit groups.
In summary, the previous longitudinal studies have offered support for the DDH, although there are notable inconsistencies between them. For example, Kirby et al. (2003) showed that the double-deficit group performed consistently poorly on reading accuracy in English. In contrast, Wimmer et al. (2000) found no significant differences among the groups on reading accuracy in German. Inconsistent findings have been reported even between orthographically consistent languages (German and Greek): Papadopoulos et al. (2009) found no differences between the naming-deficit and the phonological-deficit groups on reading speed, whereas Wimmer et al. (2000) found that the naming-deficit group performed significantly worse than the phonological-deficit group on reading speed. One potential reason for this inconsistency is the different age and reading level of the children: in Wimmer et al.'s (2000) study, the DDH grouping was based on assessment prior to the beginning of reading instruction, whereas in Papadopoulos et al.'s (2009) study it was based on assessment at the end of grade 1, after reading instruction had already exerted some influence on the classification measures.

Overview of the Present Study
The purpose of the present study was to examine the DDH longitudinally in an orthographically consistent language (Finnish). To our knowledge, this is the first study to examine the DDH in Finnish. Compared to previous longitudinal studies, the current study makes three important contributions. First, we included a sample of children with high familial risk for dyslexia (and matched controls), as indexed by the incidence of dyslexia in first-degree relatives.
The inclusion of the high-risk sample contrasts markedly with the relatively small unselected school samples used in previous longitudinal studies (e.g., Kirby et al., 2003; Papadopoulos et al., 2009; Wimmer et al., 2000). The probability of identifying children with phonological and/or naming difficulties is higher in a familial risk sample than in unselected samples because the proportion of children with a genetic vulnerability for cognitive deficits is higher (e.g., Scarborough, 1990; Snowling, Gallagher, & Frith, 2003). The incidence of dyslexia is also higher in a familial risk sample: in the present sample, 35.8% of the high-risk children were found to have dyslexia in grade 2, whereas the comparable percentage in the low-risk sample was 9.8% (Puolakanaho et al., 2008). In Puolakanaho et al.'s (2008) study, familial risk was a significant predictor of dyslexia in addition to phonological awareness, RAN, and letter knowledge. In the present study, we compare and describe in more detail the prediction of reading and spelling development from a double-deficit viewpoint in the two samples.
Second, we trace the development of the double-deficit groups in RAN and phonological awareness from the age of 3.5 years until grade 3 (approaching age 10). This developmental span is considerably longer than in previous longitudinal studies, begins earlier, and, importantly, begins prior to formal reading instruction.
Previous studies have examined cross-time correlations of phonological awareness and RAN (e.g., Puolakanaho et al., 2008; Scarborough, 1998b), but the consistency of phonological and RAN deficits across time has rarely been investigated. Spector (2005) reported considerable instability in phonological awareness and RAN deficits. High instability of the groups may reflect true changes in the skills, problems with the reliability of the measures, or problems with the cut-off criteria, all of which weaken predictive power. Spector's (2005) results came from a sample of school-age children and were thus likely confounded by the effects of reading acquisition and instruction. In the present study, we examined stability between age 3.5 years and grade 3 across five measurement occasions.
Third, we control for early reading skill (before school entry, at age 6.5 years) as a possible confound of group differences in literacy outcomes. It is critical that the groups are matched on reading ability at the outset of the study; otherwise, later differences in reading could simply reflect an early advantage in reading. In line with Kirby et al. (2003) and Wimmer et al. (2000), we formed our deficit groups prior to the commencement of formal reading instruction. However, neither Wimmer et al. (2000) nor Kirby et al. (2003) reported whether there were children in the no-deficit group who could already read some words in kindergarten.
Controlling for reading level even prior to school entry is important in this study because Finnish is a highly consistent orthography and about one third of Finnish children can read before they enter formal education (Lerkkanen, Rasku-Puttonen, Aunola, & Nurmi, 2004; Silvén, Poskiparta, & Niemi, 2004).
In this study we asked the following questions:
1. Can we identify the DDH groups in the two samples varying in familial risk for dyslexia?
2. How stable are deficits in RAN and phonological awareness between 3.5 years and grade 3?
3. Do phonological and/or naming deficits predict difficulties in reading and spelling development from grade 1 to grade 3?

Participants
Participants were 194 children followed from birth as part of the Jyväskylä Longitudinal Study of Dyslexia (JLD; e.g., Lyytinen et al., 2008). Participating families were recruited with the help of maternity clinics throughout Central Finland. Children born to families with one or both parents diagnosed as dyslexic were assigned to the high-risk group (n = 105). Those born to families with no report of reading difficulties were assigned to the low-risk group (n = 90). Parental risk status was confirmed with an extensive individual assessment of reading/spelling, phonological, and orthographic processing (see Leinonen, Müller, Leppänen, Aro, Ahonen, & Lyytinen, 2001). The high-risk and low-risk samples were matched on parental educational level. All the children spoke Finnish as their native language and had no mental, physical, or sensory impairments. Of the 222 children recruited, 22 withdrew from the study. Six children were omitted from this study due to missing data.

The geographic distribution of the sample across Central Finland ensured that few of the participating children were classmates. In addition, Finnish schools follow the same national curriculum, and the classroom-level effect on the variance in students' reading skills is small (e.g., Torppa et al., 2007).

Measures and Procedure
RAN, phonological awareness, reading (accuracy and speed), spelling, and IQ measures were all administered individually by a trained examiner. Prior to school entry, the assessments were scheduled according to the children's age. From school entry onwards, they were scheduled according to the month of the school year, and thus the children's ages varied.
Consequently, the effect of age was examined at school age. Where standardized variables were used, all standardizations were based on the distribution of the JLD low-risk sample because it was considered more representative of the age cohort.

General Cognitive Ability
The Wechsler Intelligence Scale for Children, Third Edition (WISC-III; Wechsler, 1991) was administered in grade 2, and the scaled scores of four performance subtests (Picture Completion, Block Design, Object Assembly, and Coding) were used to form the performance IQ measure. Similarly, verbal IQ was calculated from the scaled scores of five subtests: Similarities, Vocabulary, Comprehension, Series of Numbers, and Arithmetic. Total IQ was calculated from the verbal and performance scaled scores according to the manual.

Phonological Awareness
At age 3.5 years, phonological awareness was measured by identification of word size segments, identification of subword segments, and blending. At ages 5.5 and 6.5 years, phonological awareness was measured by identification of subword segments, initial phoneme identification, initial phoneme production, and blending. Composite scores of these tasks were formed by calculating the means of the standardized scores of each task. Cronbach's alpha reliability coefficients for the composite scores at each age were .45 at age 3.5 years, .76 at age 5.5 years, and .82 at age 6.5 years. At school age, phonological awareness was measured by a common unit task (see below).
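The reliability estimates in this section are Cronbach's alpha coefficients. For readers less familiar with the statistic, a minimal Python sketch of the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), follows. The function name, the item-by-child list layout, and the toy data are assumptions for illustration only, not the JLD analysis code or data.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for k items (or tasks).

    items[i][j] is child j's score on item i. This layout is an assumption
    made for the illustration.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs a score for every child")
    # Each child's total score across all items
    totals = [sum(item[j] for item in items) for j in range(n)]
    # Population variances; the n vs. n - 1 choice cancels in the ratio below
    item_var_sum = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))


# Toy data: two identical items give alpha of 1 (up to rounding);
# two uncorrelated items give alpha of 0.
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))
print(cronbach_alpha([[1, 2, 1, 2], [1, 1, 2, 2]]))
```

Because alpha rises with both the number of items and their intercorrelation, the lower value at age 3.5 years is consistent with the shorter, noisier tasks used at that age.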
Identification of word size segments from compound words. The child was presented with three pictures of objects with simultaneous pronunciation of the names of the illustrated objects (e.g., 'lentokone' [aeroplane]; 'soutuvene' [rowing boat]; 'polkupyörä' [bicycle]). All targets were compound words, and the child was required to identify the picture on a computer screen containing a specified part of the compound (e.g., In which picture can you hear the sound 'kone' (plane)?). Two practice items were administered to each child, and a third practice item followed automatically if the child failed the first two. There were eight items in this computer-aided task at age 3.5 years (for details, see Puolakanaho, Poikkeus, Ahonen, Tolvanen, Poikkeus, & Lyytinen, 2003), and the Cronbach's alpha reliability coefficient was .71.
Identification of subword segments. The task was the same as the word-level identification task but required the child to identify subword units within the target (e.g., the /koi/ in the word 'koira' (dog)). No practice items were given because the procedure was identical to that of the previous task. The size of the segment to be identified varied from one to four phonemes, that is, from a single phoneme up to two syllables. Segments came from the beginning, end, or middle of the word. There were eight items in this computer-aided task at age 3.5 years, and 20 items at 5.5 and 6.5 years. Cronbach's alpha reliability coefficients were .53 at age 3.5 years, .82 at age 5.5 years, and .76 at age 6.5 years.
Blending. The child was presented with segments (words, syllables, or phonemes), each separated by 750 ms, and was required to blend the segments and pronounce the resulting word (e.g., per-ho-nen (butterfly)). One test item consisted of a compound word, eight items required synthesis of syllables, and three items required synthesis of syllables and phonemes. Only a response containing the correctly assembled form was coded as correct, and one point was awarded for each correct answer. The principle of assembly was reinforced in two practice items. There were nine items in this computer-aided task at age 3.5 years, and 16 items at 5.5 and 6.5 years. Cronbach's alpha reliability coefficients were .78 at age 3.5 years, .78 at age 5.5 years, and .80 at age 6.5 years.
Initial sound identification. In this task, the child was shown four pictures of objects with simultaneous presentation of each object's name. The child was then required to select the correct picture on the basis of an orally presented initial phoneme of one target (e.g., "In the beginning of which word do you hear ____?"). There were nine items at ages 5.5 and 6.5 years. Cronbach's alpha reliability coefficients were .95 at age 5.5 years and .65 at age 6.5 years.
Initial sound production. The experimenter showed a picture to the child and asked what she/he saw in the picture. Next, the child was asked to listen to the word and to articulate the initial sound (phoneme or letter name) of the object's name. The score was the number of correct initial sounds or letter names. There were eight items at age 5.5 years and 10 items at age 6.5 years. Cronbach's alpha reliability coefficients were .87 at age 5.5 years and .94 at age 6.5 years.
Common unit. In grade 1, the child was asked to say aloud the sound that she/he heard as being the same in both words of a word pair presented via headphones. The shared sound was always one phoneme, situated either at the beginning (e.g., lahja - lintu) or in the middle (e.g., muutos - keitin) of the words (four word pairs of each type were used). The words varied in length from four to six phonemes. Before formal assessment, the children performed two practice trials. Correct responses were praised with emphasis on the common sound and the desired response. Throughout testing, the question put to the child was "which sound is the same in ...?" No reference was made to positional concepts such as "beginning" or "in the middle". Responses were scored as correct if they corresponded to the shared sound segment in the word pair; both letter names and phoneme sounds were scored as correct. The sum of correct responses on the eight items was used as the individual's score. In grade 3, the task was similar to the grade 1 task, except that nonwords (5-12 phonemes in length) were used instead of words. There were three practice nonword pairs followed by 15 experimental items. The position of the common phoneme varied between the words in a nonword pair as well as between the pairs (e.g., nohdit - latsukurje, tuvinoiski - rolla). The sum of correct responses out of the 15 trials was used as the individual's score. Cronbach's alpha reliability coefficients were .74 in grade 1 and .81 in grade 3.

Rapid Naming Speed
RAN for objects and colors was assessed at ages 3.5, 5.5, and 6.5 years, in January of grade 1, and in November of grade 3 using the standard procedure (see Denckla & Rudel, 1976), in which the child is asked to name a series of five different visual stimuli as rapidly as possible. All the Finnish object names were high-frequency two-syllable words, and all color names were either two or three syllables long. The child's performance was timed, and errors and self-corrections were recorded. At ages 3.5 and 5.5 years, the score was the time taken to name 30 items arranged in five rows of six (5 stimuli × 6 repetitions). From age 6.5 years onwards, the traditional matrix of 50 items (5 stimuli × 10 repetitions) was used. The total time (in seconds) to complete the matrix was used as the child's score. Because few naming errors occurred, they were not considered further.
The RAN composite score was the mean of the standardized object and color naming scores at ages 5.5 and 6.5 years. At age 3.5 years and at the school assessments, color naming was not available, and the RAN score was the standardized object naming score only. All RAN scores were multiplied by -1 to ease interpretation (a higher RAN score means better performance). The test-retest correlations were mostly high: the correlation between the RAN object and color composites at ages 5.5 and 6.5 was .67, between RAN objects in grade 1 and grade 3 it was .66, and between the 6.5-year RAN composite and grade 1 RAN objects it was .63. However, the correlation between RAN objects at 3.5 years and the 5.5-year RAN composite was only .30.
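The scoring steps just described (standardize against the low-risk sample, average the object and color scores, multiply by -1) can be sketched in Python as follows. This is a hypothetical illustration: the helper names, the use of the sample standard deviation, and the toy naming times are assumptions, not the study's procedure or data.

```python
from statistics import mean, stdev


def standardize(values, reference):
    """z-scores of `values` relative to the mean and SD of a reference sample
    (assumed here to be the low-risk group's scores on the same task)."""
    m, sd = mean(reference), stdev(reference)
    return [(v - m) / sd for v in values]


def ran_composite(objects_sec, colors_sec, ref_objects_sec, ref_colors_sec):
    """Mean of standardized object- and color-naming times, multiplied by -1
    so that a higher composite means faster naming."""
    z_obj = standardize(objects_sec, ref_objects_sec)
    z_col = standardize(colors_sec, ref_colors_sec)
    return [-(o + c) / 2 for o, c in zip(z_obj, z_col)]


# Invented naming times in seconds: one fast child and one slow child,
# scored against a tiny invented reference sample.
scores = ran_composite([40, 60], [30, 50], [40, 50, 60], [30, 40, 50])
print(scores)
```

Here the fast child receives a positive composite and the slow child a negative one, which is the orientation the sign flip is meant to produce.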

Nonword Reading
In kindergarten (age 6.5 years), the children were asked to read a list of nine bisyllabic nonwords (vcv, cvcv, vcvc; e.g., ame, hopa, olus). Before the task, the participants were presented with a practice list of three items. The nonword items were derived from a test battery compiled as part of a pan-European collaboration (COST A8 Action 'Learning Disorders as a Barrier to Human Development'; for more details, see Wimmer, 2003; Seymour et al., 2003). The score was the number of items the child read accurately.

Text-Reading Accuracy and Speed
A text-reading task was used to measure reading accuracy and speed. It was administered individually in a laboratory setting as part of the JLD assessments in grade 1 (May), grade 2 (June), and grade 3 (April). The text was a fictional story of 124 words (901 letters) in grades 1 and 2, and a more difficult story of 189 words (1,189 letters) in grade 3. The reading accuracy score was the percentage of words read correctly, and the reading speed measure was the number of words read per minute.
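The two text-reading scores are simple ratios. The Python sketch below assumes that words per minute was computed from all words in the text divided by total reading time; the text does not say whether only correctly read words were counted, so that choice, like the function name and the example timing, is an assumption.

```python
def text_reading_scores(words_in_text, words_correct, reading_time_sec):
    """Return (accuracy in %, speed in words per minute) for one reading trial.

    Assumption: speed counts all words in the text, not only correctly read ones.
    """
    if not 0 <= words_correct <= words_in_text:
        raise ValueError("words_correct must be between 0 and words_in_text")
    if reading_time_sec <= 0:
        raise ValueError("reading time must be positive")
    accuracy_pct = 100.0 * words_correct / words_in_text
    words_per_minute = words_in_text * 60.0 / reading_time_sec
    return accuracy_pct, words_per_minute


# Hypothetical child reading the 124-word grade 1-2 text with 118 words correct in 124 s
acc, wpm = text_reading_scores(124, 118, 124.0)
print(round(acc, 1), wpm)
```

With these invented numbers the child scores about 95% accuracy, which illustrates why accuracy on this text clusters near the ceiling while speed still varies widely.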

Spelling
Spelling was assessed with nonwords. The nonwords were presented one at a time via headphones from computer sound files, and the children were asked to write them on a piece of paper. An individual's score was the number of correctly spelled nonwords. There were 18 items (nine bisyllabic and nine more complex nonwords) in grade 1 and 12 items (all four-syllable) in grades 2 and 3. The spelling items were the same in grades 2 and 3 but different in grade 1. Cronbach's alpha reliability coefficients were .85 in grade 1, .86 in grade 2, and .71 in grade 3.

Results

Preliminary Data Analyses
The distributions of RAN, reading accuracy, and grade 1 spelling deviated from normality. The few extremely low response times for RAN were moved to the tails of the distributions. A ceiling effect emerged for reading accuracy, with 70%, 75%, and 80% of children in grades 1, 2, and 3, respectively, exceeding 90% accuracy. Grade 1 spelling accuracy also showed a ceiling effect. Transformations did not correct the skewness in these measures.
Because of the ceiling effect in text-reading accuracy at all grades, this measure was not considered in further analyses. The distribution of nonword reading prior to school entry was U-shaped, so we classified the children into readers and nonreaders. The ability to read at least two short nonwords correctly was considered evidence of reading ability; this cut-off score was also supported by the distribution. Table 2 presents the descriptive statistics and the results of group comparisons on all the measures used in the study. On all measures, the high-risk group showed lower mean performance, on average, than the low-risk group. On many measures, there was also more variability in the high-risk sample than in the low-risk sample.

Identifying the Double-Deficit Hypothesis (DDH) Groups
First, we examined whether we could identify the DDH groups. The classification was based on the RAN composite (mean of standardized color and object naming scores) and the phonological awareness composite (mean of standardized scores on segment identification, initial phoneme identification, initial phoneme production, and blending) at age 6.5 years. We chose the 20th percentile of the low-risk sample distribution as the cut-off score for identifying deficits in RAN and phonological awareness (see Figure 1 for within-sample scatter plots with cut-off lines). This cut-off is similar to the one used in previous longitudinal studies in orthographically consistent languages (Papadopoulos et al., 2009; Wimmer et al., 2000). Using this cut-off, we identified a group of children (n = 25) scoring at or below the 20th percentile in both phonological awareness and RAN; they constituted the Double Deficit (DD) group. Those (n = 29) scoring at or below the 20th percentile on the RAN composite but above it in phonological awareness constituted the Naming Deficit (ND) group, and those (n = 27) scoring at or below the 20th percentile on the phonological awareness composite but above it in RAN constituted the Phonological Deficit (PD) group. The 113 children scoring above the 20th percentile on both RAN and phonological awareness were allocated to the Double Asset (DA) group.
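The classification rule above can be expressed compactly. The Python sketch below takes the cut-offs from a low-risk reference distribution and assigns each child to DD, ND, PD, or DA, treating a score at or below the cut-off as a deficit; both composites are assumed to be oriented so that higher means better, as for the sign-flipped RAN score. The percentile definition (`statistics.quantiles` with `method="inclusive"`, i.e. linear interpolation) is an assumption: statistical packages interpolate percentiles slightly differently, and the text does not state which definition was used. Function names are hypothetical.

```python
from statistics import quantiles


def percentile_cutoff(reference_scores, pct=20):
    """pct-th percentile of a reference distribution.

    Assumption: linear interpolation via statistics.quantiles(method="inclusive").
    """
    # n=100 returns the 99 cut points for the 1st..99th percentiles
    return quantiles(reference_scores, n=100, method="inclusive")[pct - 1]


def ddh_group(pa_score, ran_score, pa_cut, ran_cut):
    """Assign a DDH group; a score at or below the cut-off counts as a deficit.

    Both composites are assumed to be oriented so that higher = better.
    """
    pa_deficit = pa_score <= pa_cut
    ran_deficit = ran_score <= ran_cut
    if pa_deficit and ran_deficit:
        return "DD"  # double deficit
    if ran_deficit:
        return "ND"  # naming deficit only
    if pa_deficit:
        return "PD"  # phonological deficit only
    return "DA"      # double asset


# Toy reference distribution with scores 1..100
cut = percentile_cutoff(list(range(1, 101)))
print(cut, ddh_group(10, 50, cut, cut))
```

In the study, the two cut-offs would be computed separately from the low-risk sample's phonological awareness and RAN composites and then applied to every child in both samples.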
Because 52 children were reading at least two nonwords prior to school entry, we further divided the children based on their early reading skill. Of the 52 children who were able to read two or more nonwords at age 6.5, 47 belonged to the DA group and five belonged to the ND group. The DA group was further divided into the Double Asset Readers group (DAR, n = 47) and the Double Asset non-Readers group (DAnR, n = 66). The five ND group children who could read were omitted from further analyses. In order to avoid a potential confounding effect of low IQ, we omitted two additional children from the group comparisons because their performance IQ was below 80 at age 8. One of these children came from the DD group and the other one from the DAnR group.
Thus, the final group sizes were: DD (n = 24), PD (n = 27), ND (n = 24), DAnR (n = 65), and DAR (n = 47). There were no significant differences between the final groups in gender, in average age at any school assessment, in performance IQ at age 8 years, or in parental education level. The DDH groups were differently represented in the two familial risk samples (χ²(4) = 10.63, p < .05; see Table 3 for the cross-tabulation of samples and DDH classification). Two thirds of the children in the deficit groups came from the high-risk sample, whereas there was a slight overrepresentation of children from the low-risk sample (55% vs. 45%) in the double asset groups. Examination of the adjusted standardized residuals revealed that high-risk children were significantly overrepresented in the ND group (adjusted standardized residual = 2.2).
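The adjusted standardized residuals mentioned above compare each observed cell count with its expected count under independence, scaled so that they are approximately standard normal; values beyond about ±1.96 are commonly read as deviating at roughly the .05 level. A minimal Python sketch of the Pearson chi-square statistic and these residuals follows. The function name is hypothetical, and the counts in the usage lines are invented for illustration; they are not the study's Table 3.

```python
from math import sqrt


def chi_square_with_adjusted_residuals(table):
    """Pearson chi-square, degrees of freedom, and adjusted standardized
    residuals for an r x c table of observed counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Adjusted residual: (O - E) / sqrt(E * (1 - row share) * (1 - column share))
            denom = sqrt(expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            res_row.append((observed - expected) / denom)
        residuals.append(res_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, residuals


# Invented 2 x 2 counts, purely illustrative
chi2, df, res = chi_square_with_adjusted_residuals([[10, 0], [0, 10]])
print(chi2, df, res[0][0])
```

For a table of two risk samples by five DDH groups, as in this study, the same function returns df = (2 - 1) × (5 - 1) = 4.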
Note that the effect of familial risk status was examined in all subsequent analyses for three reasons: the sample differences were significant on each measure used; the correlation between RAN and phonological awareness at age 6.5 was lower in the high-risk sample (r = .34) than in the low-risk sample (r = .56; z = 1.87, two-sided p = .06); and combining the two samples would have raised concerns about generalizability.
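The text does not name the test used to compare the two correlations. A common choice for independent samples, assumed here, is Fisher's r-to-z transformation: z = (atanh(r2) - atanh(r1)) / sqrt(1/(n1 - 3) + 1/(n2 - 3)). The Python sketch below plugs in the full sample sizes from the Participants section (105 and 90) as an assumption; the numbers of children with complete data at age 6.5 may differ, so the result (about 1.9) should not be expected to reproduce the reported z = 1.87 exactly.

```python
from math import atanh, erfc, sqrt


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two independent correlations.

    Returns (z, two-sided p). Assumes bivariate normality within each sample.
    """
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (atanh(r2) - atanh(r1)) / se
    # Two-sided p from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_two_sided = erfc(abs(z) / sqrt(2))
    return z, p_two_sided


# High-risk r = .34, low-risk r = .56; sample sizes are assumed, see above
z, p = compare_independent_correlations(0.34, 105, 0.56, 90)
print(round(z, 2), round(p, 3))
```

The transformation is needed because the sampling distribution of r is skewed when the population correlation is far from zero, whereas atanh(r) is approximately normal with variance 1/(n - 3).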

Comparison of RAN and Phonological Awareness in the DDH Groups
Before examining the differences between the groups in reading speed and spelling, we compared them on RAN and phonological awareness. This was done to address Schatschneider, Carlson, Francis, Foorman, and Fletcher's (2002) concern that the double-deficit subtype may show the most severe deficits in reading and spelling merely because its RAN and phonological awareness difficulties are also more severe than those of the single-deficit groups. The means, standard deviations, and pairwise comparisons for phonological awareness and RAN in the five DDH groups are reported in Table 4. Note that at the age of classification, the RAN performance of the DD and ND groups was almost identical. Similarly, at the age of classification, the phonological awareness performance of the DD and PD groups was almost identical. Thus, the separation of the groups on the two measures was optimal. Table 4 also presents the group comparisons for RAN and phonological awareness at the other available measurement occasions. The group differences in RAN were very stable: the DD and ND groups did not differ from each other at any assessment point and were poorer than the DAnR and DAR groups at every point. The groups with no naming deficit (PD, DAnR, and DAR) did not differ from each other in RAN at any age. Group differences in phonological awareness, on the other hand, were less stable, but the DD and PD groups did not differ from each other from age 5.5 years onwards. The DD group was poorer than the DAnR and DAR groups at every assessment of phonological awareness, and the PD group was poorer than the DAnR and DAR groups at ages 5.5 and 6.5 years and in grade 3.
To address our second research question, about the stability of the deficits in RAN and phonological awareness, we assessed how many children in the DD, ND, and PD groups could also be identified with a RAN or phonological awareness deficit (using the same 20th-percentile cut-off from the low-risk sample as at age 6.5 years) at ages 3.5 and 5.5 years and in grades 1 and 3. The analysis showed reasonable stability: of the children classified as having a RAN deficit at age 6.5, 90.1% were also classified as having a RAN deficit on at least one other measurement occasion, and 77.5% on three or more of the five assessment occasions. Of the children classified as having a phonological awareness deficit at age 6.5, 92.2% were also classified as having a phonological awareness deficit on at least one other measurement occasion, and 64.7% on three or more of the five assessment occasions.
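The stability percentages can be computed from per-child deficit flags across the five occasions. In the hypothetical Python sketch below, each child is represented by five booleans (3.5 years, 5.5 years, 6.5 years, grade 1, grade 3), and "three or more occasions" is assumed to count the age-6.5 classification itself; the text does not spell this out, and the toy flags are invented.

```python
OCCASIONS = ("3.5 y", "5.5 y", "6.5 y", "grade 1", "grade 3")
INDEX_65 = OCCASIONS.index("6.5 y")


def stability_summary(flags_by_child):
    """Return (% with a deficit on at least one other occasion,
               % with a deficit on three or more of the five occasions)
    among children flagged at age 6.5.

    Each entry of flags_by_child is a sequence of five booleans, True meaning
    the child scored at or below the cut-off on that occasion.
    """
    if any(len(f) != len(OCCASIONS) for f in flags_by_child):
        raise ValueError("each child needs one flag per occasion")
    flagged = [f for f in flags_by_child if f[INDEX_65]]
    if not flagged:
        raise ValueError("no children flagged at age 6.5")
    n = len(flagged)
    # sum(f) counts True flags; the 6.5 flag is always one of them here
    at_least_one_other = sum(1 for f in flagged if sum(f) >= 2)
    three_or_more = sum(1 for f in flagged if sum(f) >= 3)
    return 100 * at_least_one_other / n, 100 * three_or_more / n


T, F = True, False
toy = [
    (T, F, T, F, F),  # one other occasion
    (F, F, T, F, F),  # flagged at 6.5 only
    (T, T, T, F, F),  # three occasions
    (T, T, T, T, T),  # all five occasions
    (T, T, F, T, T),  # not flagged at 6.5, so excluded
]
print(stability_summary(toy))
```

Counting the age-6.5 classification toward the "three or more" total is the more literal reading of "three or more measurement points out of the five"; if it were excluded, the threshold in the code would become `sum(f) >= 4`.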

Prediction of Reading and Spelling Development
Our third research question concerned the prediction of reading and spelling development. The analyses included (a) pairwise mean comparisons of reading and spelling performance between the five DDH groups; (b) mixed-design ANOVAs (see below); and (c) a cross-tabulation of DDH groups and dyslexia diagnosis at the end of grade 2.

The means and standard deviations for the five DDH groups, and the results of ANOVA pairwise comparisons of reading and spelling, are presented in Table 5. Table 5 reports results for the combined high-risk and low-risk samples because of the small sizes of the DDH groups within each sample; however, all group comparisons were also conducted within the samples and are described in the text. The comparisons between the five DDH groups on reading speed at each assessment point (see Table 5) indicated that all deficit groups read more slowly than the DAR group, and the two DDH groups with a RAN deficit (DD and ND) also differed significantly from the DAnR group in reading speed in grades 1, 2, and 3. The ND group showed the slowest development in reading speed, and its mean reading speed at the end of grade 3 was about half that of the DAR group. The groups with single or double deficits did not differ significantly from each other in reading speed. The DDH group comparison results were very similar in the high-risk and low-risk samples.
In spelling, the DAR and DAnR groups were more accurate in each grade than the groups with single or double deficits (with the exception of the ND group in grade 1; see Table 5). The DDH group comparisons within the familial risk samples showed that children in the low-risk sample rarely had spelling difficulties, and the DDH groups differed only in grade 1 spelling, where children in the ND group made more spelling errors on average than the other groups. In the high-risk sample, the DDH group comparisons yielded results similar to those for the combined sample reported in Table 5.
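According to the note to Table 5, the pairwise comparisons used a Bonferroni correction. With five DDH groups there are 5 × 4 / 2 = 10 pairwise comparisons, so a minimal sketch of the adjustment multiplies each raw p-value by 10 and caps the result at 1. The raw p-values used in the check are hypothetical placeholders, and the function name is illustrative.

```python
from itertools import combinations

GROUPS = ["DD", "PD", "ND", "DAnR", "DAR"]

# All unordered pairs of DDH groups: 10 comparisons for 5 groups
PAIRS = list(combinations(GROUPS, 2))


def bonferroni_adjust(raw_p, m=None):
    """Bonferroni-adjust a {pair: p} mapping.

    Each raw p is multiplied by the number of comparisons m (by default the
    number of entries in raw_p) and capped at 1.0.
    """
    m = len(raw_p) if m is None else m
    return {pair: min(1.0, p * m) for pair, p in raw_p.items()}
```

Passing all ten pairs (or setting `m=10` explicitly) matters: adjusting only a subset with a smaller `m` would under-correct.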
To verify the results of the pairwise group comparisons, we next examined the effects of familial risk status, phonological deficit, and RAN deficit on reading and spelling development with mixed-design ANOVAs. The analyses included time as a within-subjects factor and three between-subjects factors: RAN deficit (no deficit, deficit), phonological deficit (no deficit, deficit), and familial risk (high-risk, low-risk). The analysis for reading speed included three time points (grades 1, 2, and 3), but the analysis for spelling included only two (grades 2 and 3) because the nonword spelling measure was different in grade 1. Only children who were nonreaders prior to school entry were included, to avoid the confounding effect of early reading on the results. The results are reported in Table 6. For reading speed, the main effects of time, familial risk status, and RAN deficit, and the RAN × time interaction were significant. The main effects indicate that reading speed improved across time, that the high-risk sample read more slowly than the low-risk sample, and that children with a RAN deficit read more slowly than children without a RAN deficit. Further analysis of the RAN × time interaction revealed that children with a RAN deficit developed reading speed more slowly than children without a RAN deficit (details of this analysis are available from the first author).
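The mixed-design ANOVAs crossed time (within subjects) with three between-subjects factors. As a simplified illustration of how such a design partitions variance, the sketch below handles only one between-subjects factor (for example, RAN deficit vs. no deficit) plus time. It assumes complete data, with every child measured at every time point, computes F ratios only, and applies no sphericity correction. It is not the software or the full three-factor model used in the study, and the function name is hypothetical.

```python
from statistics import fmean


def split_plot_anova(groups):
    """F ratios for a mixed (split-plot) ANOVA with ONE between-subjects
    factor and ONE within-subjects factor (time).

    groups: list of groups; each group is a list of subjects; each subject
    is a list of scores with one entry per time point (complete data).
    """
    subjects = [s for g in groups for s in g]
    n_subj = len(subjects)
    n_time = len(subjects[0])
    n_groups = len(groups)
    grand = fmean(x for s in subjects for x in s)

    # Total and between-subjects variability
    ss_total = sum((x - grand) ** 2 for s in subjects for x in s)
    ss_subjects = n_time * sum((fmean(s) - grand) ** 2 for s in subjects)

    # Between-subjects factor, tested against subjects-within-groups
    group_means = [fmean(x for s in g for x in s) for g in groups]
    ss_group = n_time * sum(len(g) * (m - grand) ** 2
                            for g, m in zip(groups, group_means))
    ss_subj_error = ss_subjects - ss_group

    # Within-subjects factor (time) and the group x time interaction
    time_means = [fmean(s[t] for s in subjects) for t in range(n_time)]
    ss_time = n_subj * sum((m - grand) ** 2 for m in time_means)
    ss_cells = sum(len(g) * (fmean(s[t] for s in g) - grand) ** 2
                   for g in groups for t in range(n_time))
    ss_interaction = ss_cells - ss_group - ss_time
    # Remaining within-subject variability: error term for time effects
    ss_time_error = ss_total - ss_subjects - ss_time - ss_interaction

    df_group = n_groups - 1
    df_subj_error = n_subj - n_groups
    df_time = n_time - 1
    df_interaction = df_group * df_time
    df_time_error = df_subj_error * df_time

    ms_time_error = ss_time_error / df_time_error
    return {
        "group": (ss_group / df_group) / (ss_subj_error / df_subj_error),
        "time": (ss_time / df_time) / ms_time_error,
        "group_x_time": (ss_interaction / df_interaction) / ms_time_error,
    }
```

A p-value for each ratio could then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df1, df2)` if SciPy is available. With three time points, as in the reading-speed analysis, a sphericity correction such as Greenhouse-Geisser would normally also be considered.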
The mixed-design ANOVA for nonword spelling indicated significant main effects of time, familial risk status, and RAN deficit, as well as significant interactions between familial risk status and phonological deficit and between RAN deficit and phonological deficit. The main effects indicate that spelling accuracy improved significantly from grade 2 to grade 3, that the high-risk sample was poorer in nonword spelling than the low-risk sample, and that children with a RAN deficit were poorer in nonword spelling than children without a RAN deficit. Further examination showed that the interaction between familial risk status and phonological deficit reflected a difference in the effect of phonological deficit between the two samples: phonological deficit predicted spelling difficulties only in the high-risk sample (details of this analysis are available from the first author). The interaction between RAN deficit and phonological deficit reflected the equally poor performance of children with any deficit (RAN, phonological, or double) compared with children without deficits (see Figure 2; details of this analysis are available from the first author).
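The RAN × phonological deficit interaction in spelling can be read as a departure from additivity. In a 2 × 2 layout of cell means, the interaction contrast is (both − phonological only) − (RAN only − neither). It is zero when the two deficit effects simply add up. The sketch below uses hypothetical mean error counts, not the values shown in Figure 2. It shows that equally poor performance in all three deficit cells yields a nonzero, less-than-additive contrast.

```python
def interaction_contrast(neither, ran_only, pa_only, both):
    """2 x 2 interaction contrast on cell means.

    Equivalent to: observed `both` minus the additive prediction
    neither + (ran_only - neither) + (pa_only - neither).
    Zero = additive effects; negative here = less than additive
    (for error scores, where higher means worse).
    """
    return (both - pa_only) - (ran_only - neither)


# Hypothetical mean spelling errors per cell (illustrative only)
no_cumulative = interaction_contrast(neither=2, ran_only=6, pa_only=6, both=6)
additive = interaction_contrast(neither=2, ran_only=6, pa_only=6, both=10)
```

In the "no cumulative effect" case, the double-deficit cell is 4 errors below the additive prediction of 10. This pattern matches the description above: any single deficit was as harmful for spelling as having both.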
Finally, we examined how many children in each DDH group were diagnosed with dyslexia at the end of grade 2. Table 7 reports the results separately for the high-risk, low-risk, and combined samples. There was a statistically significant association between dyslexia diagnosis and DDH grouping (χ²(4) = 18.60, p < .001, for the high-risk sample; χ²(4) = 24.30, p < .001, for the low-risk sample; and χ²(4) = 40.05, p < .001, for the combined sample). None of the DAR group children (who had no phonological or naming deficits and who were readers before school age) were diagnosed with dyslexia. The ND group had more children diagnosed with dyslexia than would be expected by chance alone (adjusted standardized residuals were 4.1, 2.8, and 2.1 for the total, high-risk, and low-risk samples, respectively). In addition, in the combined and low-risk samples, the DD group had more children diagnosed with dyslexia than would be expected by chance alone (adjusted standardized residuals were 3.2 and 4.2 for the total and low-risk samples, respectively). There were two interesting differences between the high-risk and low-risk samples. First, phonological deficit alone did not seem to predict dyslexia in the low-risk sample, as only one out of ten PD children had dyslexia, whereas in the high-risk sample seven out of 17 PD children (41.2%) later had dyslexia. Second, DAnR children do not seem to be at high risk for dyslexia if they come from the low familial risk group, as only one out of 35 children (2.8%) was diagnosed with dyslexia, whereas in the high-risk sample almost one third of the DAnR children (eight out of 29, 27.6%) were diagnosed with dyslexia at the end of grade 2.
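The χ² tests and adjusted standardized residuals reported above come from cross-tabulations of DDH group by diagnosis. The following is a minimal sketch of both statistics for any r × c table of counts. A 5 × 2 table (five DDH groups by diagnosed/not diagnosed) gives df = (5 − 1)(2 − 1) = 4, matching the reported χ²(4). The function name and the 2 × 2 toy counts used in the check are illustrative only, not the study's data.

```python
from math import sqrt


def chi_square_with_adjusted_residuals(table):
    """Pearson chi-square, its df, and Haberman's adjusted standardized
    residuals for an r x c table of observed counts (rows = groups,
    columns = outcome categories). Assumes every expected count is > 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Adjusted residual: (O - E) / sqrt(E * (1 - row share) * (1 - col share))
            se = sqrt(expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            res_row.append((observed - expected) / se)
        residuals.append(res_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, residuals
```

Under independence, adjusted residuals are approximately standard normal. Cells with values beyond about ±1.96 therefore contain more (positive) or fewer (negative) cases than expected at roughly the .05 level. This is how values such as 4.1 for the ND group can be interpreted.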

Discussion
This study examined the DDH in Finnish, a highly consistent orthography. We asked three research questions: Can we identify the DDH groups in a sample of 6.5-year-old Finnish children? How stable are RAN and phonological deficits across time? And do phonological and/or naming deficits predict difficulties in reading and spelling development from grade 1 to grade 3? In all analyses, we controlled for the effect of familial risk and reading ability prior to school entry.
We identified five DDH groups based on the 20th-percentile cutoff in phonological awareness and RAN at age 6.5, which in Finland is just before school entry in the autumn of the year in which children turn seven. The comparison of the development of phonological and RAN skills in the DDH groups from age 3.5 to the end of grade 3 showed that the group differences were most pronounced at the age of classification but were evident already at 3.5 years of age. For RAN, the group differences remained stable across the school years. For phonological awareness, the group differences were less stable.
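The five-group classification can be expressed as a small decision rule. This sketch makes several assumptions. The cutoff is the 20th percentile of the low-risk sample, computed with Python's `statistics.quantiles` using its default interpolation, which may differ from the method used in the study. Both scores are taken to be oriented so that higher means better, so RAN times would need their sign reversed. The reader/nonreader split is applied only to children without a deficit, which is consistent with the five group labels. Function names are hypothetical.

```python
from statistics import quantiles


def cutoff_20th(low_risk_scores):
    """20th-percentile cut point of the low-risk distribution
    (statistics.quantiles, default 'exclusive' method)."""
    return quantiles(low_risk_scores, n=5)[0]


def ddh_group(pa, ran, reads_before_school, pa_cut, ran_cut):
    """Assign one child to a DDH group.

    Assumes higher scores = better performance on both measures, and that a
    deficit means scoring below the low-risk 20th-percentile cut point.
    """
    pa_deficit = pa < pa_cut
    ran_deficit = ran < ran_cut
    if pa_deficit and ran_deficit:
        return "DD"    # double deficit
    if pa_deficit:
        return "PD"    # phonological deficit
    if ran_deficit:
        return "ND"    # naming deficit
    return "DAR" if reads_before_school else "DAnR"  # double asset reader / nonreader
```

Because the cut points come from the low-risk sample only, the same thresholds are applied to high-risk children. This lets the share of high-risk children with deficits differ from the nominal 20%.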
As expected, the percentage of children with a RAN and/or phonological deficit was higher in the high familial risk sample than in the low familial risk sample, and the average skill levels in RAN, phonological awareness, reading, and spelling were lower in the high-risk sample (e.g., Elbro et al., 1998; Gallagher et al., 2000; Lyytinen et al., 2006; Puolakanaho et al., 2008; Scarborough, 1990; Snowling et al., 2003). In the high-risk sample, 50% of the children had a RAN, phonological, or double deficit, whereas the comparable percentage was 28% in the low-risk sample. Our findings complement recent behavioral genetic studies showing that rapid naming and, to a lesser extent, phonological awareness difficulties are heritable (e.g., Petrill et al., 2010; Samuelsson et al., 2007) and that the genetic effects on RAN and phonological awareness are partially independent (e.g., Compton, Davis, DeFries, Gayan, & Olson, 2001; Grigorenko et al., 1997). It should be noted, however, that there was more variation in the high-risk sample than in the low-risk sample. This difference stems from the heterogeneity of the high-risk sample: the most severe deficits in phonological awareness and RAN were found among high-risk children, while many other children in the high familial risk sample had above-average skills in RAN, phonological awareness, reading, and spelling.
The examination of deficit stability showed that over 90% of the children classified as having a RAN or phonological awareness deficit at age 6.5 also performed below the 20th percentile on at least one other measurement occasion, irrespective of the risk sample they came from. A more enduring deficit, identified at three or more assessment points, was found for 78% of the children with a RAN deficit and 65% of the children with a phonological awareness deficit. These figures are similar to those reported by Spector (2005) for school-aged children. The lower stability of the phonological awareness deficit may be attributed to two factors: the change in phonological awareness measures at school entry, and the effect that formal reading instruction has on phonological awareness but not on RAN (when measured with objects and colors). Korkman, Barron-Linnankoski, and Lahti-Nuuttila (1999) demonstrated that reading instruction influences subsequent phonological awareness in Finnish but has no effect on subsequent RAN. A similar null effect of reading instruction on subsequent RAN (when measured with colors and objects) was observed by Compton (2003).
Our final research question addressed the effects of phonological and naming deficits on subsequent reading and spelling development. The results indicated that a RAN deficit was predictive of slow development of text-reading speed. The effect of familial risk was also significant, reflecting the overall slower reading speed observed in the high-risk sample. These findings are in line with previous studies that have shown a unique association between RAN and reading fluency (e.g., de Jong & van der Leij, 1999; Georgiou, Parrila, & Liao, 2008; Georgiou, Parrila, & Papadopoulos, 2008; Katzir et al., 2008; Savage & Frederickson, 2005; see also Kirby et al., 2010, for a review).
In spelling, there were two interesting interactions: one between phonological deficit and RAN deficit, and another between risk status and phonological deficit. Further examination of the phonological deficit × RAN deficit interaction revealed that it likely resulted from the absence of a cumulative effect of having a double deficit: children with a RAN or phonological deficit alone performed as poorly in spelling as children with a double deficit. This finding that both RAN and phonological deficits are linked to spelling difficulties is similar to that of Wimmer et al. (2000). The interaction between familial risk and phonological deficit reflected the fact that phonological deficit predicted poor nonword spelling in the high-risk sample but not in the low-risk sample. That phonological awareness is a good predictor of spelling among high-risk children is easy to understand, because spelling in Finnish requires sensitivity to phoneme-level information. Finnish morphology is agglutinative, with very rich and complex sequential inflections and frequent stem variations (for a more detailed description of Finnish morphology, see, e.g., Lyytinen et al., 2006). Many morphological variants of the same word differ by only one phoneme; for example, the inflections of the word talo ('a house') include forms such as talossa ('in a house') and talosta ('from a house'). To use such inflections, children must have well-specified phonological representations. Why spelling was not similarly predicted in the low-risk sample requires further examination.
One explanation could be the small number of children with spelling and/or phonological difficulties. However, although severe spelling and phonological difficulties were less common in the low-risk than in the high-risk sample, there was a reasonable amount of variation in the low-risk sample as well.
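The talo example above can be made concrete by counting the edits that separate inflected forms. The sketch below computes the Levenshtein edit distance. As a rough approximation for Finnish's highly consistent orthography, it assumes that each letter stands for one phoneme, so a geminate such as ss counts as two symbols. It is a generic illustration, not an analysis from the study.

```python
def levenshtein(a, b):
    """Minimum number of single-symbol insertions, deletions, and
    substitutions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute if different
        prev = curr
    return prev[-1]
```

Under the one-letter-one-phoneme assumption, talossa and talosta are a single substitution apart. This is the kind of fine phonological contrast that the text argues spelling in Finnish depends on.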
The effects of RAN deficit on reading speed and spelling development were persistent.
Evidence showing the persistence of RAN effects has also been reported by Kirby et al. (2003).
The naming speed deficit may affect the development of reading fluency and spelling through its effects on the formation of orthographic representations, which are essential for the development of both skills. Wolf and Bowers (1999) proposed that inadequate development of the ability to form orthographic representations for commonly seen letter strings may be caused by slow retrieval of letter identities, which is reflected in performance on RAN tasks. According to Wolf and Bowers' hypothesis, the processes underlying slow naming speed may contribute to reading failure in three ways: (a) by impeding the appropriate amalgamation of connections between phonemes and orthographic patterns at sub-word and word levels of representation, (b) by limiting the quality of orthographic codes in memory, and (c) by increasing the amount of repeated practice needed to unitize codes before representations of adequate quality are achieved. Moll et al. (2009), however, argued that RAN is not a measure of orthographic processing; they found that RAN predicted unique variance in reading fluency even after spelling was accounted for, whereas RAN accounted for hardly any variance in word-reading fluency when it was entered after nonword-reading fluency.
Overall, the DDH group comparisons suggested that a naming deficit predicted slow reading speed and, after grade 1, spelling difficulties, whereas a phonological deficit predicted early spelling difficulties, but only in the high-risk sample. The double-deficit group also performed significantly more poorly in reading and spelling than the no-deficit groups, but its performance did not differ from that of the single-deficit groups.
It is also worth noting that 27.6% of the high-risk children (but only 2.8% of the low-risk children) who were classified into the double asset nonreader (DAnR) group in kindergarten were diagnosed with dyslexia in grade 2. This finding suggests that familial risk has a negative impact on reading acquisition despite adequate performance in RAN and phonological awareness (see also Puolakanaho et al., 2008), and that additional risk factors contribute to the development of reading difficulties. Identifying these additional risk factors is an important task for future studies. Our results suggest, however, that when kindergarten assessments are carried out to identify children at risk for future reading difficulties, information on familial risk should also be obtained.
Some limitations of the present study are worth reporting. First, our sample size was relatively small. In a larger dataset, more children with deficits may be identified with a more stringent cut-off score than the one used in the current study or with a more sophisticated analysis method, such as mixture models (e.g., Torppa et al., 2007). Second, we used different measures of phonological awareness across time and the interpretations regarding its development should be viewed with some caution. However, in an assessment of phonological awareness between the ages of 3.5 and 10, the measures must necessarily change. Third, the distribution of the reading and spelling accuracy measures deviated from normality.
Measurement of reading and spelling accuracy in Finnish is challenging, given the almost one-to-one mapping between graphemes and phonemes. Fourth, because of the participants' age at the first two measurement points, only non-alphanumeric RAN tasks (Colors and Objects) were administered. Previous studies have shown that RAN digits and letters are more strongly related to reading than RAN colors and objects (e.g., Georgiou, Parrila, Kirby, & Stephenson, 2008; Georgiou, Parrila, & Papadopoulos, 2008; Wolf, Bally, & Morris, 1986).
To conclude, the present study offered support for the view that RAN and phonological awareness are partly independent deficits (Wolf & Bowers, 1999). Our findings indicated that RAN and phonological awareness were separable deficits that predict different kinds of literacy problems. These findings may be useful in the development of screening and intervention strategies. In screening, the assessment of both skills, as well as information on familial risk for dyslexia, is valuable for predicting reading and spelling development. In addition, because different problems in literacy development were predicted by partly different cognitive deficits (RAN and/or phonological awareness), and differently in the two samples, individually designed reading instruction and intervention may be needed.

ᵃ Means and standard deviations (standard deviations in parentheses). Parental education, based on a parental questionnaire, was classified on a 7-point scale: 1 = only comprehensive school (CS); 2 = CS and short-term vocational courses; 3 = CS and a vocational school degree; 4 = CS and a vocational college degree; 5 = CS and a lower university degree / a polytechnic degree; 6 = upper secondary general school and a lower university degree / a polytechnic degree; 7 = CS or upper secondary general school diploma and a higher university degree (Master's or Doctorate-level degree).

ᵃ Standardized values are reported here for the RAN and phonological awareness composites. Standardization was based on the distribution of the total low-risk sample for each measure at each time point. The mean values for the low-risk sample in this study are not exactly zero because not all children were included in the analyses of this paper, owing to missing data on key variables. Note also that the spelling measure in grade 1 was different from the spelling measure used in grades 2 and 3. *p < .05. **p < .01. ***p < .001.

Note. Superscript numbers indicate that group means differ significantly (p ≤ .05; two-tailed ANOVA F tests). Superscript numbers refer to: 1 = Double Deficit, 2 = Phonological Deficit, 3 = Naming Deficit, 4 = Double Asset Nonreaders, and 5 = Double Asset Readers. The pairwise comparisons are based on Bonferroni correction. The F-test results were confirmed with Welch's robust test of equality of means when equality of variances could not be assumed. Because the F tests and Welch tests did not differ, F values are reported. *p < .05. **p < .01. ***p < .001.

Note. χ²(4) = 40.05, p < .001, for the total sample; χ²(4) = 18.60, p < .001, for the high-risk sample; and χ²(4) = 24.30, p < .001, for the low-risk sample.

Scatter plots of RAN and phonological awareness at 6.5 years in the high-risk and low-risk samples. Values are standardized based on the low-risk sample's distribution (0 = low-risk sample average, 1 = one standard deviation above the low-risk sample average). DD = double deficit group, PD = phonological deficit group, ND = naming deficit group, DAnR = double asset nonreader group, and DAR = double asset reader group.