An interactional “live eye tracking” study in autism spectrum disorder: combining qualitative and quantitative approaches in the study of gaze

ABSTRACT Recent studies on gaze behaviours in individuals with autism spectrum disorder (ASD) have utilised “live eye tracking.” Such research has focused on generating quantitative eye tracking measurements, which provide limited (if any) qualitative contextual details of the actual interactions in which gaze occurs. This article presents a novel methodological approach that combines live eye tracking with qualitative interaction analysis, multimodally informed conversation analysis. Drawing on eye tracking and wide-angle video recordings, this combination renders visible some of the functions, or what gaze “does,” in interactional situations. The participants include three children with ASD and their adult co-participants during body-movement gaming sessions. The article demonstrates how quantitative eye tracking research can be extended qualitatively using a microanalytic interaction analysis to recontextualise the gaze shifts identified. The findings in this article show that the co-participants treat a child’s gaze shifts differently depending on when these occur in a stream of other action. The study suggests that introducing this qualitative dimension to eye tracking research could increase its ecological validity and offer new insight into gaze behaviours in ASD.


Introduction
There has been an increase in the use of eye tracking technologies to examine gaze behaviours in autism spectrum disorder (ASD). Gaze behaviours in ASD have often been described as "atypical," marked by reduced attention to socially salient information (e.g., von Hofsten et al. 2009). Gaze behaviours have also been proposed to underpin difficulties in communication and social interaction associated with ASD. The present study sets out to respond to some of the challenges that have emerged in this primarily quantitative research on gaze, which tends to focus on looking time measurements to predefined areas of interest (AOIs) even in naturalistic communicative situations (e.g., Falck-Ytter 2015). As discussed below, gaze-related research can benefit from combining eye tracking with a qualitative video-based approach to social interaction to better capture what gaze is used for in social situations.
Eye tracking has been predominantly used to study whether individuals with ASD gaze at socially salient information, including other people's faces and the eye region. While ample research has been generated, the overall evidence is inconclusive. Some studies have shown that, compared to typically developing individuals, individuals with ASD look more at nonsocial objects than socially salient stimuli (Falck-Ytter & von Hofsten 2011; Guillon et al. 2014; von Hofsten et al. 2009), have impairments in using gaze to orient to others' communicative cues (Chawarska, Macari & Shic 2012; Falck-Ytter et al. 2012; Fletcher-Watson et al. 2009; Gillespie-Lynch et al. 2013), fail to predict social events (Ruffman, Garnham & Rideout 2001; Senju & Johnson 2009; von Hofsten et al. 2009), and gaze more frequently at the mouth rather than the eye region of faces (e.g., Hanley et al. 2015; Klin et al. 2002). However, contradictory findings draw a complex picture. For instance, Falck-Ytter (2010) found similar goal-directed eye movements in children with ASD and in typically developing individuals when observing actions performed by other people. Other studies have failed to find increased looking durations at the mouth region or decreased looking durations at the eye area in individuals with ASD (e.g., Dapretto et al. 2006; see Falck-Ytter & von Hofsten 2011 for a review).
One explanation for such variation in research findings could be the type of stimuli used in eye tracking (Chevallier et al. 2015; Falck-Ytter & von Hofsten 2011; Freeth, Foulsham & Kingstone 2013; Risko et al. 2012). Videos can provide more dynamic and richer information on social events than static images (Falck-Ytter & von Hofsten 2011; Falkmer et al. 2011). However, videos can also be problematic insofar as the scenes may not realistically represent everyday situations, and thus the stimuli might not be ecologically valid (Norbury et al. 2009). Video stimuli can also raise concerns beyond the type of scenes viewed. Falck-Ytter and von Hofsten (2011) have stressed that the viewing of videos places participants in the role of a passive receiver of social information and lacks interactional opportunities for participation with other people (Falck-Ytter & von Hofsten 2011; Gobel, Kim & Richardson 2015; Guillon et al. 2014). Recent developments have attempted to overcome some of these limitations using face-to-face situations in which eye movements are measured using mobile equipment; to date, only a handful of such "live eye tracking" studies exist (Falck-Ytter 2015; Falck-Ytter, Carlström & Johansson 2015; Freeth et al. 2013; Hanley et al. 2014; Hanley et al. 2015; Magrelli et al. 2013; Nadig et al. 2010; Noris et al. 2012; Thorup et al. 2016; Vabalas & Freeth 2016).
Live eye tracking can better capture how gaze is used in naturalistic situations. For instance, some studies of children with ASD have shown a reduced tendency to look at an adult's face during a storytelling situation but not in a cognitive testing situation with an experimenter (Falck-Ytter 2015; Falck-Ytter et al. 2015). Such evidence suggests that children with ASD might not have a fundamental deficit in their social use of gaze, but that the degree to which gaze is used varies in different contexts. This also links with the notion of ASD-related impairments in declarative interactions (i.e., the goal is in social sharing) rather than in imperative interactions (i.e., attention is shared for instrumental purposes, such as to receive assistance; e.g., Camaioni et al. 2003; Maljaars et al. 2011; Broekhof et al. 2015).
That gaze behaviours seem to vary in different contexts, however, calls into question what gaze is used for in the first place: what an instance of gaze "does" when the eyes are directed to another person in a social situation. Eye tracking research has been mostly concerned with studying "where" individuals look (AOIs) and "when" individuals look at the AOI, that is, focusing on the timing of gaze. Less attention has been paid to such timing of gaze in naturalistic interactions, which might provide important insight into the "social" connotations of different gaze behaviours.
"Where" and "when" gaze occurs Quantitatively oriented eye tracking researchers, including Falck-Ytter et al. (2013), have warned that recording overall "looking times" at a target might be insufficient when measuring competency in the social use of gaze. The argument stands that the attentional differences between children with ASD and those with typical development might be "related to the exact timing of their eye movements, rather than the distribution of gaze over long periods of time" (Falck-Ytter et al. 2013, p. 2250. For instance, Hanley et al. (2014) found that children with ASD were slower to gaze at an adult after an unexpected event to check their awareness of the situation. Thus, such event-related gaze measurements might be more informative about what makes some instances of gazing "social." However, even in the most novel live eye tracking studies, the actual interactions have not received detailed attention. As Nadig et al. (2010Nadig et al. ( , p. 2737 have put it, "the time spent looking at a region does not necessarily capture the function of visual attention"-that is, what kinds of functions gaze serves cannot be reduced merely to their quantity. To date, perhaps closest to a naturalistic examination of the timing of gaze has been the study by Falck-Ytter (2015), which measured the target and the timing of gaze in a face-to-face situation: as an adult (experimenter) was reading a story to a child. While the findings indicated that the gaze of children with ASD was less frequently aligned with the experimenter's gaze (unlike in interactions with typically developing children), only limited information was available about the stream of action as the alignment happened or was expected to happen. Such details can be crucial in identifying the functions of gaze at a particular moment.
Even where the gaze shifts have been examined temporally, that is, showing when (relative to some aspect of interaction or stimuli) children look at an experimenter's facial area, the exact contextual details, including the experimenter's actions, have not been fully explored. Such examination would require at least two methodological considerations to complement the quantified eye tracking measurements: the use of wide-angle video recordings showing the contributions of all participants in an interaction, and a qualitative interactional framework for an analysis of these contributions.

Interactional approaches to gaze
To date, the social scientific approaches to social interaction have been underutilised in eye tracking research. In experimental social psychological research, gaze measurements have been traditionally taken to indicate emotional states and interpersonal attitudes. For instance, mutual gaze has been associated with intimacy between interactants (e.g., Argyle & Dean 1965), with increased eye contact correlating with interpersonal attraction (Exline & Winters 1965). However, more naturalistic examinations have considered what gaze accomplishes as part of social interaction rather than how it reflects on interpersonal psychology. One such approach, conversation analysis (CA), is gaining popularity in the field of ASD research (e.g., O'Reilly, Lester & Muskett 2016). This framework draws on audio-visual recordings of social interactions in naturalistic everyday and institutional settings, which are carefully transcribed for a qualitative analysis of the organisation of interactions. Such microanalytic research elaborates on the sequences of actions through which participants organise their conduct structurally and collaboratively. In short, this means that an action by one party can be responsive to another party's action; thus, an action can place specific constraints on others to produce a response at a particular moment (Schegloff 2007). For example, in spoken interactions, a question (i.e., an initiating action) projects an answer (i.e., a responsive action) to be a relevant contribution from the recipient in the next turn. The responsive action provides evidence of what the prior contribution has been taken to have done and what (if anything) is expected to occur next (Sacks et al., 1974).
The extension of CA work that examines multimodal interaction (e.g., Korkiakangas & Rae 2014; Mondada 2014; Solomon et al. 2016; Stivers & Sidnell 2005) focuses on the structural organisation of actions beyond speech, including gaze and gesture. Multimodally informed CA has provided evidence for how "atypical" means of interacting can constitute significant resources for individuals with ASD to engage in social situations. For example, repetitive tapping movements (Dickerson, Stribling & Rae 2007), "problematic" behaviours (Damico & Nelson 2005), and "inflexibility" (Muskett et al. 2010), which could all be glossed as symptomatic behaviours inherent to ASD, can be shown to have interactional relevance when investigated in the context of their occurrence. In order to render visible such relevancies, a multimodal approach to CA does not rely on coding of behaviours a priori; rather, this approach examines how individuals use resources such as gaze, through video recordings and detailed transcriptions of these recordings. The prior multimodal work around gaze suggests that interactants do not engage in continuous mutual gaze with one another. Gaze toward another person can become relevant at particular moments (Goodwin 1981; Stivers & Rossano 2010) and accomplish specific actions; thus, the "function" of gaze varies. Table 1 summarises some of the functions suggested in social scientific research on interpersonal situations.
In the multimodal work of social interaction, gaze has been analysed as part of interactional face-to-face encounters. Stivers and Rossano (2010) have shown that speakers can mobilise a response from others by shifting their gaze to them and that gaze alone can be sufficient to elicit the response; speech is not always required. Gaze can therefore constitute a social action in its own right and progress the organisation of an interactional exchange. Also, young children have been shown to use gaze for interactional purposes, for instance, to appeal for assistance by shifting their gaze to their caregivers (Kidwell 2009). This suggests that gaze can have multiple functions in interaction, which have to be identified not as decontextualized categories but as part of other streams of activity.
The existing corpus of multimodally informed CA studies on gaze in ASD is still limited (Dickerson et al. 2005; Dickerson & Robins 2015; Korkiakangas 2011; Korkiakangas & Rae 2014; Tuononen et al. 2016; Wiklund 2012). However, the key message in these studies has been consistent: a decontextualised examination of gaze (whereby the occurrence of gaze is codified and quantified) cannot fully explain what gaze is used for or identify its significance for the participants themselves (see Dickerson & Robins 2015; Korkiakangas & Rae 2014; Tuononen et al. 2016). For instance, the finding that children with ASD are capable of mobilising a response from their co-participants by turning to look at them after a turn-at-talk (Dickerson et al. 2005; Korkiakangas & Rae 2014) could not have been captured had gaze shifts alone been identified for an aggregated distributional analysis.

Source | Gaze function
Goffman (1963), Goodwin (1981), Heath (1986) | Displaying attention and (dis)engagement
Argyle et al. (1973), Goodwin (1981), Kendon (1967) | Displaying participation roles
Argyle and Dean (1965), Argyle et al. (1973) | Displaying intimacy
Exline and Winters (1965) | Displaying interpersonal attitudes
Heath (1986), Kendon (1967), Lerner (2003) | Regulating turn taking or turn allocation
Argyle et al. (1973), Goodwin (1980), Kendon (1967) | Monitoring others' expressions
Stivers and Rossano (2010) | Soliciting a response
Haddington (2006), Kidwell (2005, 2009) | Implementing social actions (e.g., appealing for assistance, displaying a stance)

The placement of gaze in relation to other activities in progress, such as speech, is thus an important measure of the interactional work implicated by an instance of gaze. This requires careful identification of what other parties are doing in these interactions: the actions of the person who is shifting his or her gaze are to be examined in relation to rather than in isolation from such contributions. However, that the prior interactional studies have based their judgements of eye movements solely on video recordings can pose limitations to the judgements made about those gaze behaviours. An eye tracking study has a greater potential to provide precise measurements of where and when gaze shifts occur. While eye tracking has been traditionally used for quantitative research on gaze, it has the potential to work in harmony with the qualitative multimodally informed CA. This combination of approaches is yet to be utilised in the field of ASD studies (we are only aware of a few studies on typically developing adults' gaze behaviours during conversations in laboratory setups; Auer in press; Hirvenkari et al. 2013; Holler & Kendrick 2015; Kendrick & Holler 2017) and could provide a novel contribution to the event-related measures of eye movements (Falck-Ytter et al. 2013).
In this article, we demonstrate how this methodological combination works in practice. We will focus on gaze shifts identified in the eye tracking data recorded with three children with ASD during educational gaming sessions. We will limit our focus to gaze shifts that were directed from the game (a screen) toward other people in the room. First, we code the location and duration of the gaze shifts through eye tracking. We then draw on a multimodal approach to CA to examine these instances in the context of their occurrence, which involves the co-participants' responses to these gaze shifts. This qualitative microanalytic examination of the previously quantified gaze measurements can render visible some of the different functions that looking at another person has in naturalistic situations.

Data
Data for this study include wide-angle video recordings and live eye tracking data recorded at schools for children with special needs. The data collection took place during weekly activity group sessions. During the sessions, children with ASD used various technology applications with their familiar teachers and special needs assistants (hereafter referred to as educators) in the presence of researchers. This study focuses on the children's gazing practices during a Kinect® body-movement game. In this game, the children were required to control an avatar on the screen using their body movements (i.e., hands and feet) to catch moving virtual objects. The children were able to choose the virtual objects they wanted to catch among the options presented on the game screen (see Figure 1). The room also included other game-playing areas on the other side of the Kinect area.
Data were collected using two tripod-mounted digital cameras and SensoMotoric Instruments (SMI) mobile eye tracking glasses (sampling rate 30 Hz). The cameras were used to capture a wide-angle view of the game playing setting (including the Kinect screen) from the front and back, enabling us to situate the gazing practices within broader social situations.
The eye tracking data were exported using the SMI software as scan path raw video (i.e., no fixation filter was used) to allow the inclusion of quick eye movements, saccades, in the analysis. Each video presented the gaze cursor (seen in orange in Figure 1) on the scene captured by the eye-tracker's video cameras. Data were initially calibrated prior to the playing sessions using 1-point calibration (see footnote 1). To ensure continuing tracking quality, we manually recalibrated data using the SMI software, for instance, if the children touched the glasses or made rapid head movements that caused the glasses to move.
Using the video annotation tool ELAN (developed by Max Planck Institute for Psycholinguistics), the scan path videos for each participant were synchronised with the video recordings of the participants' whole body movements. The data pool included approximately 232 minutes of scan path video. To quantify the data, we chose a sample of 9 minutes and 45 seconds of video for each child (excluding noncalibrated data), which approximately covered their first time playing the game using the glasses.

Participants
This article involves data from three children: 11-year-old Matti, 8-year-old Veli, and 6-year-old Roope, who represent children of different ages and with different types of ASD. With Matti and Veli, the data sample covers their first session with Kinect, whereas data from Roope's second session are included, as longer periods of noncalibrated data were removed from his data.
Figure 1. Video camera view of the broad social situation (frame 1), a child playing the Kinect game (frame 2) and an eye-tracker view (frame 3).
Footnote 1. 1-point calibration was chosen for quick calibration; initial attempts showed that a 3-point calibration procedure was slow and challenging for the participating children.
The children had been previously assessed for diagnostic and educational purposes, and these clinical documents were available for the researchers. An extensive assessment procedure was considered unnecessarily distressing for the children; hence, the selected tests were kept to a minimum. These included the Comprehension of Instructions test to evaluate comprehension of oral instructions (NEPSY-II) and the Spatial Span test to evaluate visual working memory (Wechsler Memory Scale, WMS-III). The Sally-Anne false-belief task was also administered to evaluate children's theory-of-mind abilities. The teachers completed the Finnish version of the Autism Spectrum Screening Questionnaire (ASSQ), and the children's parents completed the Social Communication Questionnaire (SCQ) and the Strengths and Difficulties Questionnaire (SDQ).
One of the children, Veli, has a "pure" ASD diagnosis, whereas Matti and Roope have other comorbid diagnoses, representing "autism plus" (see Gillberg & Fernell 2014). The interactions examined in this article also include the children's adult co-participants: two special needs assistants (Kaisa and Mirja) and a researcher, Tommi. All the participants are native Finnish speakers. Table 2 provides information on Matti, Veli, and Roope based on their clinical documents. Table 3 presents the children's scores on NEPSY-II, WMS-III, the Sally-Anne task, ASSQ, SCQ, and SDQ.
Written consent for children to participate in the study was obtained from the children's guardians and the educators. The children's willingness to participate was monitored throughout the sessions. All the names of people and places have been changed to prevent the participants from being recognised. The photos captured from the videos have been used with permission, with the facial features of the participants anonymised. The work was fully assessed by the ethics committee of the researchers' home institution and carried out in accordance with the Declaration of Helsinki as revised in 2000.

Coding of eye tracking data
The videos were coded according to the location of the gaze cursor in relation to specific AOIs using ELAN (see, e.g., Holler & Kendrick 2015). Coding was made on a frame-by-frame basis to determine the exact onsets and durations of the gaze shifts toward people. The AOIs included people (when the child gazed at any body part of the people in the environment), which was further specified as educators (i.e., familiar teachers and special needs assistants), researchers (i.e., researchers who organised the activity group sessions weekly), children (i.e., other familiar children who participated in the activity group sessions), and nonpresent people (i.e., people who were not present at the Kinect playing station, e.g., another child working at another activity station). We also further differentiated whether the children gazed at others' head area or other body parts. However, we did not attempt to differentiate between looking at others' eye or mouth region, which would likely yield inaccurate results using live eye tracking. All noncalibrated or "off-screen" (looking at an area that fell beyond the bounds of the scene camera) data were identified and removed from the analysis.
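As an illustration of how frame-by-frame coding yields onsets and durations, the following minimal Python sketch collapses a sequence of per-frame AOI labels into gaze shifts. It is not the procedure carried out in ELAN for this study. The label names, the function name frames_to_gaze_shifts, and the assumption that one coded video frame corresponds to one sample at 30 frames per second are hypothetical and serve only to make the logic concrete.

```python
from itertools import groupby

# Assumption: the coded video advances at 30 frames per second,
# matching the 30 Hz sampling rate of the eye tracking glasses.
FPS = 30

# Hypothetical labels for frames that are not gaze shifts toward people.
NON_PEOPLE_LABELS = ("screen", "uncalibrated", "off_screen")


def frames_to_gaze_shifts(frame_labels, fps=FPS, exclude=NON_PEOPLE_LABELS):
    """Collapse per-frame AOI labels into gaze shifts.

    Each run of identical consecutive labels becomes one gaze shift with
    an onset and a duration in seconds. Runs whose label is listed in
    `exclude` are skipped but still advance the frame counter.
    """
    shifts = []
    frame_index = 0
    for label, run in groupby(frame_labels):
        run_length = len(list(run))
        if label not in exclude:
            shifts.append(
                {
                    "aoi": label,
                    "onset_s": frame_index / fps,
                    "duration_s": run_length / fps,
                }
            )
        frame_index += run_length
    return shifts


# Illustrative input only: 1 s on the screen, 0.5 s on an educator's head,
# 0.5 s on the screen, then 0.2 s on a researcher's body.
example_labels = (
    ["screen"] * 30
    + ["educator_head"] * 15
    + ["screen"] * 15
    + ["researcher_body"] * 6
)
example_shifts = frames_to_gaze_shifts(example_labels)
```

At 30 frames per second the temporal resolution of such coding is about 33 ms per frame, so a gaze shift that lasts a single frame would be recorded with a duration of roughly 0.03 s.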

A multimodal approach to CA
We extended the coding of eye tracking data with a qualitative multimodal approach to CA. This enabled a structural analysis of how the co-participants treated a child's gaze shifts toward people in the room, which in turn demonstrated what kind of functions they assigned to the gaze shifts. Rather than decontextualising gaze shifts categorically a priori, multimodally informed CA is discovery-oriented and proceeds inductively, that is, examining the timing of the gaze shifts in relation to other activities in progress and how the gaze shifts are treated by the gaze-recipients. The interactional relevancy of an instance of gaze shift can be determined by a co-participant's response to it, whether this is by the party who has been gazed at or another party who sees the gaze shift. This interactional framework thus grounds observations empirically to participants' own orientation and responsiveness to each other's conduct in a stretch of interaction (see Schegloff 2007). The so-called next-turn proof procedure (Sacks et al. 1974), wherein a participant's understanding of the preceding action (e.g., talk, gaze, or gesture) is displayed in the next turn-at-talk or other action, is a crucial requirement for the structural analysis of an interactional episode to be accepted as valid (Peräkylä 2004). Thus, a function of gaze is here measured in terms of a co-participant's treatment of the gaze and what was accomplished with a gaze shift in interaction, rather than through a rendition of what an individual seemed to "aim for." A co-participant's response demonstrates how the gaze shift becomes treated functionally: undertaking a specific work (if any) in interaction. The approach draws on detailed transcriptions of the gaze shifts identified on the recorded data. The transcripts show how gaze moves in relation to speech or during silent intervals and in relation to a child's own or a co-participant's actions in the environment. 
Such information helps to recontextualise the gaze shifts for the analysis of their functions. The rigour of multimodal interaction analysis is produced by co-examining the primary data and transcripts with fellow analysts and following transcription conventions to ensure proper documentation of the primary data (Peräkylä 2004). The transcriptions also enable third parties to access the observations made from the data. This further contributes to the methodological rigour wherein participants' own orientations to one another are at issue, rather than the orientations imposed by an analyst. The present study is limited to examining the gaze shifts directed toward other people in the room in a specific game-playing setting (Kinect body-movement game). Thus, from the entire data pool (i.e., 232 minutes of video material), the cases in which a child's gaze moved to another person were transcribed. Talk was transcribed according to the conventions described by Jefferson (in Atkinson & Heritage 1984), and gaze using the notation developed by Goodwin (1981; see Figure 11 in Appendix). The latter enables us to represent how gaze moves in time and co-occurs with speech, silence, and other actions.

A distributional examination of eye tracking data
To examine where the children gazed (i.e., the gaze target), the frequencies, total durations and percentages of their gaze shifts were counted based on the AOI coding. Owing to the small sample size and the illustrative nature of the distributional examination, no statistical tests were employed. Table 4 presents the distributional examination of the target of children's gaze shifts (AOIs). Table 4 shows that during the game playing session, Veli spent more time looking at people (9.15%) compared to Matti (2.68%) and Roope (1.48%). The children also differed in terms of the persons they looked at the most: Matti looked at researchers (58.24%), Veli gazed at other children (37.60%), and Roope gazed at educators (76.94%) the longest. Veli also spent more time looking at people who were not present at the Kinect station (13.53 s) than Matti (1.86 s) and Roope (0.24 s). Of the people that the children gazed at, Matti looked more at their body areas (84.32%), whereas Veli (52.18%) and Roope (62.60%) looked more at their head areas (although for Veli the difference is small).
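The kind of tally described above (frequencies, total durations, and percentages per AOI) can be sketched in a few lines of Python. This is a hedged illustration rather than the computation behind Table 4. The function name summarise_gaze_shifts, the dictionary keys, and the example values are hypothetical, and the choice of the 9 minutes 45 seconds sample (585 s) as the denominator for the overall share of time spent looking at people is an assumption.

```python
from collections import defaultdict

# Assumption: percentages of overall looking time are taken relative to the
# 9 min 45 s sample analysed for each child.
SAMPLE_DURATION_S = 9 * 60 + 45  # 585 seconds


def summarise_gaze_shifts(shifts, sample_duration_s=SAMPLE_DURATION_S):
    """Tally frequency, total duration, and percentage share per AOI.

    `shifts` is a list of dicts with the hypothetical keys "aoi" and
    "duration_s", one dict per gaze shift toward a person.
    """
    per_aoi = defaultdict(lambda: {"count": 0, "total_s": 0.0})
    for shift in shifts:
        entry = per_aoi[shift["aoi"]]
        entry["count"] += 1
        entry["total_s"] += shift["duration_s"]

    people_total_s = sum(entry["total_s"] for entry in per_aoi.values())

    summary = {}
    for aoi, entry in per_aoi.items():
        # Share of this AOI within all time spent looking at people.
        share = 100 * entry["total_s"] / people_total_s if people_total_s else 0.0
        summary[aoi] = {
            "count": entry["count"],
            "total_s": entry["total_s"],
            "percent_of_people_time": share,
        }

    return {
        "per_aoi": summary,
        "people_total_s": people_total_s,
        # Share of the whole sample spent looking at people.
        "percent_of_sample": 100 * people_total_s / sample_duration_s,
    }


# Illustrative values only; these are not data from the study.
example = summarise_gaze_shifts(
    [
        {"aoi": "educators", "duration_s": 2.0},
        {"aoi": "researchers", "duration_s": 1.0},
        {"aoi": "educators", "duration_s": 1.0},
    ]
)
```

Keeping the two percentages separate mirrors the distinction in the text between the share of the session spent looking at people at all and the share of that looking time directed at each category of person.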
The coding suggests that the children differed in the distributional pattern of their attention to different AOIs. However, this coding is decontextualized and provides little information about the contingencies under which these gaze shifts occurred (such as what was said or done just before a child turned to look at an AOI). Thus, the focus on the overall looking times on the AOIs does not delineate the social functions (if any) of these gaze shifts. To examine this in detail, we demonstrate how to recontextualise the gaze shifts using a multimodal approach to CA.
In the following section, we draw on the wide-angle video recordings of the gaming sessions to situate the identified gaze shifts within the broader streams of activity. We analyse a selection of gaze shifts to exemplify how the different functions of gaze shifts can be rendered visible through a qualitative structural analysis of the interactional situations: what elicits a child's gaze to another person, and how a co-participant responds to a child's gaze shift.

Analysing gaze shifts through a multimodal approach to CA
We used a multimodally informed CA to study the functions of the gaze shifts identified with eye tracking. As the children turned to look at other people, we show how to examine what elicited their gaze, and how the co-participants responded to the gaze, and thus what the gaze accomplished. In line with the principle of sequential organisation of social interactions (Schegloff 2007), we focus on examining gaze in both responsive and initiating environments. We begin with examples in which gaze emerges in a responsive environment within a stream of action (i.e., gaze shift occurred in response to something). We then move to examples of gaze in an initiating environment (i.e., gaze shift initiates an interactional exchange) and elaborate on the different functions within these environments. Figure 2 outlines the four types of functions found in the analysis. The interactional implications (if any) have been developed from the analysis of how the gaze was elicited and how the co-participants responded to these gaze shifts. While each example involves a gaze shift directed to a person in the room (indicated by the gaze cursor coloured orange), each instance differs in the degree to which this gaze can be characterised as an interactional contribution. In each case, we dissect these functions using detailed transcriptions of what was said and done when the gaze shift occurred, with the analysis unfolding moment-by-moment at the level of tenths of seconds. Drawing on the principles of CA, each function is empirically grounded in the analysis of what the gaze shift accomplishes at the moment it occurs, withholding psychologised accounts of participants' internal motivations.

Responsive gaze shifts
In the environment of responsive gaze shifts, a child's gaze was either spontaneously directed to a person or an event (e.g., sound) or elicited from the child using a prompt (e.g., a request to look). We first consider an instance of the former in Figure 3. Veli has been playing the game, but then something captures his attention and brings his playing to a halt. Veli shifts his gaze from the screen to the door. (Prior to the beginning of the extract, another child, Roope, had caused some distractions by making loud noises in the room. Roope's special needs assistant, Mirja, decided to take Roope outside until he had calmed down. The door from which they had exited a moment earlier is next to Veli and the gaming screen.) We join in as the sound of footsteps apparently catches Veli's attention. In line 1, Veli orients to the sound of footsteps and shifts his gaze towards the door in anticipation of someone entering the room. Mirja steps in (line 3) and says noni ("alright"), notifying their return to the room (line 4). Figure 4 zooms in on Veli's gaze movements upon Mirja's entrance during a silent interval of 2.3 seconds. Veli looks at Mirja and then shifts his gaze to the corridor in an apparent expectation of Roope, who enters the room shortly after Mirja. Veli shifts his gaze to Roope (line 6) and monitors him as Mirja instructs Roope to "go and sit down." As Mirja continues speaking to Roope, odotettaan kun meidän vuoro tullee ("let's wait until our turn comes"), Veli shifts his gaze between Mirja and Roope, and then re-orients back to the screen. Even though Veli has been looking at Mirja and Roope during their stretch of interaction, neither pays any particular attention to Veli looking at them.
Such instances of monitoring were relatively common in the eye tracking dataset: a sound or a movement catches a child's attention, and the child responds by shifting his gaze to that location. In such cases, the people who were being monitored would not notice or respond to the gaze directed at them: the gaze shift was not treated as a move to enter in mutual interaction. Thus, while Veli's gaze was directed at both Mirja and Roope, the moment when he turned to look at them indicated his responsive role as an observer of a situation, which did not implicate him as a "ratified" co-participant (see Goffman 1981;Goodwin 1981). Veli was not addressing Mirja and Roope in any way by monitoring their return to the room; consequently, neither Mirja nor Roope responded to Veli's gaze directed at them. The primary function of the responsive monitoring gaze was to check what was happening in the environment without having interactional implications to the parties being looked at. Occasionally, a child's gaze shift to another person was responsive to a prompt or an explicit call for attention. In Figure 5 we consider an instance from a different gaming session involving Roope. Roope had been struggling to master the interface of the game: even though he had been moving his hands in front of the screen, he had failed to catch any virtual objects. Such difficulties were relatively common during the first play trials as the children were learning to connect their body movements with the avatar on the screen. Roope managed to complete one game level, but in the next level the instructions had changed and now required him to use both hands to capture the virtual objects. We join in as Roope stands in front of the screen holding his right index finger extended. As this position would not allow him to catch any objects, Mirja intervenes and provides instructions for Roope.
Mirja waves her hands in the air demonstrating how to catch the virtual objects. She also prompts verbally, molemmilla käsillä Roope ("with both hands Roope"). While Mirja speaks, Roope sustains his gaze at the screen. On hearing his name mentioned ("Roope," end of line 1), Roope shifts his gaze toward her. Figure 6 zooms in on Roope's gaze movements from the screen to Mirja precisely when he hears his name being called; the gaze fixates on Mirja's facial area at the end of the lexical item "Roope." His gaze then moves to the wall, then to Mirja's hand demonstrating the catching gesture, and back to the wall.
Since Roope did not turn to look at Mirja as she first spoke in line 1, Mirja uses a highly specific address term ("Roope") to redirect his attention to the instructions given. In line 2 Mirja further requests Roope's gaze, kato ("look") and repeats her instruction while using her hands to demonstrate the required hand movements to catch the objects (line 3). Gaze is a key resource for recipients to show whether or not they are "acting as hearers" (Goodwin 1981); here the timing of Roope's gaze shift indicates that he responds to Mirja by attending to her demonstration.
In these two examples, a gaze shift has emerged in response to a change in the environment (i.e., an observed interaction between two people entering the room) or a prompt (i.e., a direct pursuit of attention) from a co-participant. Both examples demonstrate the children's ability to attend to socially relevant information (i.e., other people) in a responsive role. While the monitoring gaze did not have interactionally relevant connotations, the gaze shift following a prompt demonstrated a child's recipiency to an interactional pursuit, which sought the child's realignment of gaze.

Initiating gaze shifts
The children also directed their gaze from the screen to a person close-by in an initiating environment. The functions of such gaze shifts were rendered visible by examining when the gaze shift occurred in relation to the activities in progress and how the people present responded to such gaze. In Figure 7, Matti shifts his gaze to a co-present adult after several unsuccessful attempts to catch virtual objects in the game.
Matti halts playing and shifts his gaze from the screen toward Tommi, who is also looking at the screen and does not notice Matti's gaze reaching him. Here Matti's gaze addresses Tommi as the selected party among all the other people in the room (see also Dickerson et al. 2005). While looking at Tommi, Matti brings his hands down and then quietly says something (inaudible from audio). Gaze can be used as a resource to mobilise a response from a co-participant in the environments where talk has not so implicated (Stivers & Rossano 2010); even though his quiet utterance might have passed unnoticed, Matti pursues some kind of response by sustaining his gaze at Tommi. Kaisa, who has been observing the situation on the side of the room, sees Matti's gaze shifting to Tommi, and that Tommi seems unavailable to respond (i.e., Tommi is not "acting as a hearer"; see Goodwin 1981) and steps in instead. She quickly prompts Matti to use both hands, molemmat kädet saat ottaa käyttöön ("you can use both hands") (line 2), overlapping with Matti's quiet utterance. Figure 8 demonstrates how Kaisa's instruction starts precisely in response to Matti's head turn as he shifts his gaze to Tommi.
The timing of Matti's gaze shift suggests that he is eliciting assistance with the game: Matti turns to Tommi directly after his game had come to a halt. The direction of Matti's gaze selects Tommi, who is also a researcher controlling the game using a laptop, as the recipient who would be able to provide him the assistance he needs. Matti's perseverance with looking at Tommi thus shows competence in speaker-selection (see also Korkiakangas & Rae 2014): he does not withdraw his gaze until he receives some instructions, even though it is Kaisa, not Tommi, who provides these instructions.
That Kaisa responds to Matti's gaze shift with an instruction provides strong evidence that his gaze was designed to elicit assistance: Kaisa's prompt is produced precisely as Matti shifts his gaze to Tommi; not a moment before when Matti was still gazing at the screen. On hearing Kaisa's instructions to Matti, Tommi quickly reorients to Matti to monitor the situation. Matti then shifts his gaze back to the screen to follow Kaisa's instructions (lines 2-3). This re-orientation to the screen underscores that the function of Matti's gaze was to elicit assistance or instructions on the game, since when these had been received, Matti was ready to re-orient back to the game.
Here Matti's gaze has emerged in an initiating environment. It was also treated as initiating an interactional exchange for imperative purposes: to mobilise other people to help him with the game. However, some of the gaze shifts to the people in the room occurred when no apparent trouble was taking place. These gaze shifts were also treated differently, and some of them included an element of social sharing. We consider this in Figure 9. The example involves Roope, who has been playing the game rather successfully for a good while.
In line 1, Roope follows a moving virtual object on the screen with his gaze and his hand while uttering pum ("boom"). Roope apparently imitates the sound of the virtual objects being "caught." Having a slight smile on his face, he turns to Tommi. Tommi notices Roope's gaze to him and looks back at him, thus attending to Roope (line 2). Roope sustains his gaze at Tommi, suggesting that he is pursuing a response from Tommi (Stivers & Rossano 2010). A silent interval occurs during which Tommi does not immediately say anything, resulting in a momentary halt in the progressivity of their interaction. Tommi's lack of responsiveness suggests his uncertainty about what he should respond to: whether to Roope's utterance pum ("boom") or to his game-related actions on the screen (Figure 10). Since Roope's gaze remains sustained at Tommi, Tommi responds by appraising his playing, hyvin menee ("it's going well") (line 3). While Roope's utterance pum ("boom") did not by its design or delivery make a response relevant, Roope's gaze was designed to invite Tommi to respond (see also Korkiakangas & Rae 2014; Stivers & Rossano 2010). The timing of Roope's gaze shift right after his utterance suggests that rather than initiating interaction for any instrumental purpose, Roope seeks to share an observation of successful object catching with Tommi (presumably in response to the sound of virtual objects) and elicits a response to that observation. Thus, Roope's gaze to Tommi suggests a declarative purpose, yet Tommi's delayed response indicates that mutual understanding was not immediately reached in terms of what Roope wanted to share with Tommi.

Summary
The multimodal approach to CA can complement quantitatively oriented (AOI-based) eye tracking measurements with a qualitative examination, which helps to examine the functions of the gaze shifts identified. The present study, which was based on precise measurements of gaze, rendered visible four types of functions within initiating and responsive environments that varied in their interactional implications, that is, the degrees to which the gaze shifts were taken to contribute to some stretch of social interaction. Monitoring the activities of others without addressing them appeared to be broadly devoid of interactional connotations, while those gaze shifts that were interactionally geared either responded to an interactional bid by another person or initiated an interactional exchange with them. The timing of the gaze shifts with respect to other activities in progress was crucial in delineating the interactional connotations by the co-participants, who either (a) responded by engaging in interaction with the child (i.e., with gaze and some talk) or (b) did not respond to the gaze shifts, indicating that the child's looking was not treated as betokening interactional participation from the child.
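To make the structure of this summary concrete, the four functions and two environments can be written down as a small annotation scheme. The following Python sketch is only an illustration: the class and field names (`GazeFunction`, `GazeShiftAnnotation`, `treated_as_interactional`) are hypothetical and not part of the study's method, and the four records simply restate the analyst's qualitative readings of Figures 3, 5, 7, and 9. The scheme records interpretive judgements reached through sequential analysis; it does not derive them automatically.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    RESPONSIVE = "responsive"
    INITIATING = "initiating"


class GazeFunction(Enum):
    MONITORING = "monitoring"
    RESPONSE_TO_PROMPT = "response to prompt"
    ELICITING_ASSISTANCE = "eliciting assistance"
    SHARING = "sharing"


# Each function belongs to exactly one environment, as in the summary above.
ENVIRONMENT_OF = {
    GazeFunction.MONITORING: Environment.RESPONSIVE,
    GazeFunction.RESPONSE_TO_PROMPT: Environment.RESPONSIVE,
    GazeFunction.ELICITING_ASSISTANCE: Environment.INITIATING,
    GazeFunction.SHARING: Environment.INITIATING,
}


@dataclass(frozen=True)
class GazeShiftAnnotation:
    """One analysed gaze shift (hypothetical record layout)."""
    child: str
    figure: int                      # figure in which the extract is shown
    function: GazeFunction           # the analyst's qualitative reading
    treated_as_interactional: bool   # did co-participants treat it as participation?

    @property
    def environment(self) -> Environment:
        return ENVIRONMENT_OF[self.function]


# The four extracts discussed in the analysis, restated as records.
annotations = [
    GazeShiftAnnotation("Veli", 3, GazeFunction.MONITORING, False),
    GazeShiftAnnotation("Roope", 5, GazeFunction.RESPONSE_TO_PROMPT, True),
    GazeShiftAnnotation("Matti", 7, GazeFunction.ELICITING_ASSISTANCE, True),
    GazeShiftAnnotation("Roope", 9, GazeFunction.SHARING, True),
]

per_environment = Counter(a.environment for a in annotations)
```

A scheme of this kind would only be useful once each gaze shift has been analysed in its sequential context; the `function` field stores the outcome of that analysis rather than replacing it.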

Discussion
This study set out to demonstrate how to combine live eye tracking and a multimodal approach to CA to study the functions of gaze shifts, that is, what gaze accomplishes in an interactional context (here limited to a game-playing activity). The co-participants responded to the children's gaze shifts differently depending on when these occurred, that is, how a gaze shift was timed with the on-going game, talk, and body movements or other people's activities. In this study, the children's gaze shifts were identified in responsive and initiating environments, and their functions varied in the degrees to which they provided an interactional contribution to a situation at hand.
Previous experimental studies in ASD have rarely considered the use of gaze beyond a child's responsive role, and even less examination has been conducted on the actions of co-interactants, especially on their responses to a child's gaze. However, the responses from others are important in the analysis of what gaze does in real-time exchanges and of what makes some instances of gaze socially oriented and, more precisely, interactional. Streeck (1993) talks about the "contractual" nature of mutual gaze, which requires an act of recognition of what is "going on" between participants whose eyes meet. In the present study, the co-participants were making judgements about the degree to which a child's gaze implicated "ratified" interactional participation. Such judgements were not based on the occurrence of a child's gaze shift alone, but rather on how the gaze shift was timed in relation to other activities in progress.
The children in the present study shifted their gaze to a co-participant occasionally to initiate interaction. This aligns with previous interactional research on children with ASD that has shown competency in the use of gaze, for example, to address another person (Dickerson et al. 2005) and to elicit feedback from a co-participant (Korkiakangas & Rae 2014) in a similar manner as participants with typical development (see, e.g., Goodwin 1981). Such competencies can be rendered visible when behaviours such as gaze are examined in detailed interactional contexts and as tied with the contributions of other people. Dickerson and Robins (2015) have stressed that interactional nuances are likely to be lost when gaze behaviours are coded in a decontextualised manner, rather than transcribed and analysed sequentially. Thus, while eye tracking research traditionally codifies and quantifies the overall looking times at an AOI, it remains a limited approach for rendering the different functions of gaze visible. The current study has demonstrated how the functions of gaze shifts can be identified through an analysis of line-by-line transcription of the events that occurred in wide-angle video recordings as the gaze shifts emerged.
The current study has limitations that should be acknowledged. The study was focused only on three children in a specific educational setting. Thus, the children's production of gaze shifts should be primarily understood in this specific game-playing context. However, as the identified functions of gaze here align with prior multimodally informed CA work on gaze in everyday interactions (e.g., Korkiakangas & Rae 2014; Stivers & Rossano 2010), they seem possible in different contexts (see Peräkylä 2004 for generalizability in CA), and offer hypotheses for further research with a larger sample of children. Despite the context-specificity of this study, it foregrounds the benefits of bringing qualitative interaction analysis to quantitative eye tracking measurements. Extending this methodological combination to diverse interactional contexts and comparing gaze behaviours with those of typically developing children would be a fruitful line of inquiry.
While the use of mobile eye tracking glasses offers potential for researching interactions with multiple participants wearing the glasses (as in Holler & Kendrick 2015), our study was limited to the Kinect gaming sessions where only children wore the glasses. The use of eye tracking equipment also warrants some consideration from a research perspective. While this technology can provide more accurate measurements of gaze behaviours than is possible in merely video-based analysis (Guillon et al. 2014), there has been a tendency to over-rely on the quantified eye tracking measurements in isolation. In this article, we have suggested that a more detailed and ecologically valid approach in the study of gaze would entail eye tracking in combination with the qualitative analysis of wide-angle video recordings.
Mobile eye tracking equipment can also interfere with the interactions taking place. Although the technology allows participants to move freely in space, the participants are usually actively aware of the situations in which they are wearing the glasses. In the present study, the game-playing activity began with calibrating the device and the children were asked to refrain from touching the glasses unless necessary due to its effect on the calibration. Thus, live eye tracking brings in elements from experimentally oriented research and does not result in fully natural situations, as the gaze-related investigation is usually apparent to all participants. Despite these limitations, mobile eye tracking promotes more ecologically valid research compared with computer-presented stimuli (e.g., using table-mounted eye-trackers), since gaze can be analysed in actual interactions between co-participants.
The interactional framework can complement live eye tracking research with a bottom-up, data-driven approach (as has been called for by Falck-Ytter et al. 2013) in at least three ways. First: the setting. The naturalistic environments where children are allowed to act rather freely (although here the setting was restricted within the parameters of the Kinect game) can render visible the different purposes for which gaze is used in social settings: those designed for social interaction and those devoid of such implications. For instance, the gaming activity meant that children sometimes required help with catching the virtual objects; when faced with difficulties, children were found to request assistance by turning their gaze to the adults in the room (see also Kidwell 2009). The gaming took place in a large room with several people present, which naturally provided opportunities to monitor their activities. While experimental rigour requires that procedures are kept "stable" and "reproducible," there can be enormous merit in examining more free-flowing naturalistic interactions.
Second, the wide-angle video footage is crucial for locating the gazing practices within the on-going streams of activity. While synchronising eye tracking data with video recordings can be time-consuming and laborious, it provides a valuable resource for the interactional exploration of gaze-in-context. Third, the combination of eye tracking with an interactional framework is paramount for dissecting the social qualities of gaze. These can be examined empirically by analysing the actions of all parties on a moment-by-moment basis. For instance, a co-interactant can respond to an instance of gaze from a child differently depending on when the gaze occurs during a stretch of interaction. This, in turn, can indicate what kind of work, if any, the gaze accomplished in that interaction.
Since the primary aim of live eye tracking is to study gaze behaviours in real social situations, such research could benefit greatly from a framework that situates the gaze observations within the interactions in which they have emerged. This combination of live eye tracking and qualitative microanalytic study of social interaction enables us to study individuals with ASD as active social participants, rather than as passive observers of information, which has been the predominant approach taken in eye tracking research to date.
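The synchronisation step mentioned above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the eye tracker and the wide-angle camera each keep their own clock and that a single constant offset between them has been established (e.g., from an event visible in both recordings); the study does not describe its procedure at this level of detail. All names (`GazeShift`, `TranscriptLine`, `line_in_progress`) and all timestamps are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GazeShift:
    t_eye: float   # onset on the eye tracker clock, in seconds
    source: str    # where the gaze was before the shift
    target: str    # where the gaze lands


@dataclass(frozen=True)
class TranscriptLine:
    number: int
    t_video: float  # onset on the video clock, in seconds
    speaker: str
    text: str


def line_in_progress(shift: GazeShift,
                     lines: List[TranscriptLine],
                     offset: float) -> Optional[TranscriptLine]:
    """Return the transcript line already under way when the shift begins.

    `offset` is the assumed constant difference (video clock minus eye
    tracker clock). `lines` must be sorted by onset time. Returns None if
    the shift precedes the first transcribed line.
    """
    t = shift.t_eye + offset
    onsets = [line.t_video for line in lines]
    i = bisect_right(onsets, t) - 1
    return lines[i] if i >= 0 else None


# Made-up example values, not data from the study.
lines = [
    TranscriptLine(1, 10.0, "Adult", "(verbal prompt)"),
    TranscriptLine(2, 12.5, "Adult", "(request to look)"),
    TranscriptLine(3, 14.0, "Adult", "(repeated instruction)"),
]
shift = GazeShift(t_eye=11.0, source="screen", target="adult")
located = line_in_progress(shift, lines, offset=2.0)
```

Here the shift begins at 13.0 s on the video clock and therefore falls within line 2. Locating a gaze shift in this way only identifies where to look in the transcript; what the gaze accomplishes there remains a matter for the sequential analysis.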