Adaptive Vocabulary Learning Environment for Late Talkers

Abstract: The main aim of this research is to provide children who have an early language delay with an adaptive way to train their vocabulary that takes the individuality of the learner into account. The suggested system is a mobile game-based learning environment that provides simple tasks in which the learner chooses, from multiple pictures presented on the screen, the picture that corresponds to a played-back sound. Our basic assumption is that the more similar the concepts (in our case, words) are, the harder the recognition task is. The system chooses the pictures to be presented on the screen by calculating the distances between the concepts in different dimensions. The distances are considered to consist of semantic, visual and auditory similarities. Each similarity factor can be measured with different methods. According to the user's feedback, the weights of the factors and the similarity distance are adjusted to modify the level of difficulty in further iterations. The system is designed to retrieve knowledge about the learners by recognizing the aspects that are difficult for them. The proposed solution can be considered a self-adaptive system that tries to recognize an individual model of the learner and apply it to further facilitate his/her learning process. The use of the system will be demonstrated in future work.


INTRODUCTION
The focus of our work is on techniques that facilitate learning for children with an early language delay. In the literature, these children are often called late talkers, a term also adopted in this paper. Late talkers are children who learn to form sentences later and have smaller vocabularies than their more typical peers, who start putting words together before turning two (Preston et al., 2010). This early language delay has been connected with a risk of later difficulties in language learning such as dyslexia (Lyytinen et al., 2001; Lyytinen, 2015; Lyytinen et al., 2005). A recent finding (Lyytinen, 2015), which confirms an earlier observation (Lyytinen et al., 2005), reveals that if such an early language delay includes receptive language, i.e. comprehension of spoken language during the early years, these children will face serious difficulties in becoming fully literate. Lyytinen et al. (2005) documented a fact that has more recently also been noted by Nematzadeh et al. (2011): many late talkers catch up with their peers, but some continue at a slower learning pace and are later considered to have a Specific Language Impairment (SLI) (Thal et al., 1997; Desmarais et al., 2008).
Every child is different and learns at a different pace and with different strategies. Information technology has long been seen as a cost-efficient solution for meeting students' individual needs in learning (Murray and Pérez, 2015). Learning with mobile devices (M-learning) has been recognized to motivate children to learn and to hold their attention while they solve problems (Skiada et al., 2014). M-learning can provide a stress-free environment that combines ubiquitous learning with individualisation, so that learners can proceed at their own pace. Games, likewise, are designed to generate a positive effect in players and are most successful and engaging when they facilitate the flow experience (Kiili, 2005). Flow is considered to be a state of mind in which people forget their surroundings and lose track of time while occupying themselves with tasks that are neither too easy (which would lead to boredom) nor too difficult (which would lead to anxiety) (Csikszentmihalyi, 1990). To facilitate a flow experience, that is, to provide a sufficient challenge, a learning game should take the learner's individual needs into account.
The key to preventing late-talking children from continuing at a slow learning pace is to intervene in their vocabulary learning as early as possible. Broadening the vocabulary creates new opportunities to form sentences. As young children are the target group of our study, we faced the challenge of providing them with a simple and motivating way to learn.
In this research, we decided to develop an adaptive mobile application with gamified elements, such as rewards and animations, that are attractive for children. The application is developed for touchscreens that are 7 inches or larger to reduce the effect of motor skills on task performance. The simple functionality on the surface is intended to ensure that the system is not too complicated even for very young children.
This paper consists of six sections. Section 2 discusses the related work, and Section 3 concentrates on late talkers and on how they could benefit from adaptive learning technologies. In Section 4, we describe the proposed system's architecture. In Section 5, conclusions are drawn. Section 6 presents future work and plans for evaluation.

RELATED WORK
Many have recognized the importance and the benefits of developing digital systems for young children. Many game-based learning solutions for preschool children can be found under categories such as "Preschool games", "Language Arts from Phonics through Reading", "Word Games", "Animal Games", "Phonetics", etc. Unfortunately, they are not made with children who have learning difficulties in mind. Research and development of ICT-supported learning for children with disabilities has not received as much attention, and research findings in this field are also difficult to access (Istenic Starcic and Bagon, 2014). However, some contributions have been made, and in the next paragraphs we discuss a few of them.
The PAL system proposed by Newell, Booth and Beattie (1991) was created for children with poor motor control; its key-saving features sped up text creation, and it was found very useful for children with spelling problems. It was also observed that children who were on the verge of being classified as non-readers showed a significant improvement in their work. Skiada et al. (2014) suggested a mobile application named "EasyLexia", which was built taking into account both the needs of dyslexic children and usability. EasyLexia included tasks that train reading skills, memory, concentration and mathematical logic, providing both auditory and visual stimuli. The results of Skiada et al. showed that children preferred doing the exercises with the mobile application over the pen-and-paper version.
Concentrating on the touch screen kept the children's focus better, which underlines the value of M-learning as a significant method in today's learning. A fun mobile learning application developed for children with dyslexia demonstrates the potential of mobile learning applications in such environments (Saleh and Alias, 2012). In addition, "Dyseggia", a game application with word exercises for children with dyslexia suggested in (Rello et al., 2012), showed positive results for the use of technology in today's learning methods. Singleton and Simmons (2001) proposed the multisensory drill-and-practice computer program Wordshark, which is designed to improve children's spelling and word recognition skills. Wordshark is presented in a game format and is used to practise words, learn new words and find out whether children can read and spell particular words. As a reward for good work on these tasks, Wordshark enables earning teaching points. This utility does not have components for assessment or diagnosis and does not automatically adjust the task to the individual learner, but it motivates pupils and is a useful reinforcement resource for pupils with dyslexia and for others with special educational needs.
Possibly the most extensive empirical documentation of the efficiency of game-based training of reading skills comes from the research based on Graphogame (see graphogame.info). Its effects on reading skills among children with dyslexia, and also among typical learners with insufficient reading instruction in developing countries, have been documented in detail in tens of studies listed on those pages.
While M-learning and game-based learning have proven their benefits in helping children with learning difficulties, little has been done so far, especially in the area of adaptive learning, for the target group of late talkers in vocabulary learning.

LATE TALKERS
The child's vocabulary in preschool age is dependent on the social conditions of education. Children who start school with greater literacy skills and background knowledge have a persisting advantage over those children who do not have these skills (Snow et al., 1998).
The connection between an early language delay and later difficulties in language learning has been studied extensively. Preston et al. (2010) mention that longitudinal studies (Scarborough and Dobrich, 1990; Paul et al., 1997; Stothard et al., 1998; Rescorla, 2002, 2005, 2009) have shown that the delay predicts later difficulties in, e.g., reading.
A longitudinal study of dyslexia (Lyytinen et al., 2001) found that late talkers with a familial risk for dyslexia are more likely to have such problems than typical children with the same familial risk.

How Late Talkers Learn
Many studies have noted that late talkers use different strategies in learning. One of these differences is that some of these children have difficulties in their general cognitive abilities such as attention, categorization and memory skills (Nematzadeh et al., 2011). Another is related to the connectedness of their semantic network structures, which is intuitively connected with the ability to form sentences. Beckage et al. (2010) and Nematzadeh et al. (2011) noted that the semantic network structures of late talkers are less connected than those of their age peers. Beckage et al. (2010) retrieved the vocabularies from questionnaires filled in by the children's parents and formed the connections between words using co-occurrence in a corpus of child-directed speech. They also noted that there was more variance in the late talkers' network structures than in the typical talkers'. Nematzadeh et al. (2011) studied the matter by teaching novel words to the children and then connecting the learned words by the similarity of their meanings: the late talkers learned fewer words, and the words they learned were semantically further apart rather than closer.
The results allow us to conclude that these children require more personalisation during their learning process. The personalisation should allow them to learn at their own pace and should keep their attention. In addition, it can support them in forming the associative connections in their vocabulary in order to create a better understanding of categorisation.

Adaptive Learning Systems for Children with Learning Difficulties
It is known that learning is improved when the instructions are given to the learner in a personalised manner (Murray and Pérez, 2015). This knowledge, together with background theories of education, has driven a decades-long trend of creating technologically enhanced learning environments that adapt, one way or another, to the learners' needs. In the literature, these learning environments are often referred to as "adaptive" or "intelligent" tutoring or learning systems. According to Gifford (2013), adaptive learning is a methodology that is centred on "creating a learning experience that is unique" for every individual learner through the intervention of computer software. Adaptive learning systems allow organising content, identifying the way to learn according to the learner's knowledge and using assessment results to provide personalised feedback for each learner (Sonwalkar, 2005).
Adaptive learning systems combine many features and functions (Venable, 2011) to provide relevant content and support and to guide the user through the adaptive learning courses or modules: pre-test, pacing and control, feedback and assessment, progress tracking and reports, motivation and reward.
These systems can be either simple or algorithm-based (Oxman et al., 2014). Simpler adaptive learning systems are rule-based, created using a series of if-then statements. Algorithm-based systems take advantage of advanced mathematical formulas and machine learning concepts to adapt with greater specificity to individual learners. Earlier research by Brusilovsky and Peylo (2003) divides these systems into Adaptive Hypermedia Systems and Intelligent Tutoring Systems. By these technologies we mean different ways to add adaptive or intelligent functions to learning systems. Adaptive Hypermedia Systems include adaptive presentation and adaptive navigation support, as well as adaptive information filtering, which includes collaborative filtering and content-based filtering. Intelligent Tutoring Systems include curriculum sequencing, intelligent solution analysis, problem-solving support and intelligent collaborative learning, which includes adaptive group formation and adaptive collaboration support.
Finding an optimal way to present concepts to children during their first years of life might help diminish their risk of developing difficulties in language learning in the future. That is why a learning system for late talkers should have adaptive content presentation. To make this possible, we should use adaptive presentation technology, which aims to adapt the content of a hypermedia page to the user's goals, knowledge and other information stored in the user model. In a system with adaptive presentation, the pages are not static, but adaptively generated or assembled from pieces for each user (Brusilovsky, 1999). Accordingly, in our system, the content shown to the children should not be predefined, but should be able to adapt according to the user's feedback.
We assume that it is harder to recognize a particular concept among several other similar ones. Every child can perceive images, and the sounds corresponding to these images, differently. For one child, it is difficult to distinguish between words that sound similar. For another child, it could be difficult to distinguish between words whose pictures look similar, for example because they have a similar shape or a similar colour. Another child may have difficulty differentiating words that are semantically close to each other.
With the proposed solution, we develop an adaptive learning system for late talkers that takes into account the learners' personal qualities in their perception of a concept. The system will learn and build a personal model of a learner based on his/her answers while changing the complexity of concept representation, and will apply this model to further facilitate the learning process. Therefore, the system could speed up the vocabulary learning process individually for each user.

SYSTEM DESCRIPTION
As a basis for the learning system improvement, we have chosen the "Graphogame" learning tool (Richardson and Lyytinen, 2014), a learning environment that teaches children to read by playing back the sound of a letter and asking the learner to choose the correct letter from multiple choices presented on a screen, adapting the given choices according to the similarities between the letters and the user's feedback. We are going to utilise the functionality of the "Graphogame" tool, replacing the letters with images of words. We are not going to use predefined sets of images or populate a set with random images, because such a strategy lacks an individual approach and does not take the personal specifics of a learner into account. Therefore, our approach should be able to take into account differences between visual and auditory stimuli and to personalise further picture selection based on intelligent analysis of the user's answers.

Concept Similarity
We assume that concept learning and recognition depend on the individual perception of several parameters, such as the sound and visual representations of concepts as well as their semantics. Our approach is based on manipulating the complexity level of concept recognition caused by these factors. Taking into account the users' feedback (correctness of answers), the system will automatically increase or decrease the level of complexity, adapting it to the individual learning abilities of the users, and in this way will facilitate the learning process.
We highlight three main factors that influence the complexity: visual similarity, similarity of sound-based phonetic representations and semantic similarity of the concepts. Assuming that sets of more similar concepts are more difficult to recognize (distinguish), we automate the process of image selection using a multidimensional concept similarity metric, defined as an aggregation of the similarity values of the three factors. We are then able to personalize our learning tool by changing the delta (Δ) of concept similarity as well as by "fiddling" with the levels of influence of the three factors on the aggregated similarity.

Visual Similarity
Measuring the distance between two images is a central problem in image recognition and computer vision, and many definitions for the metric have been suggested (Wang et al., 2005). When visual similarity is considered, image features such as the shape of the presented object, colour distribution, brightness and contrast are usually taken into account. The most commonly used image metric is Euclidean distance, owing to its simplicity. Euclidean distance is computed as the square root of the sum of the squared differences between corresponding pixels in the images (Wang et al., 2005). However, in pattern recognition, it performs poorly compared to, e.g., Tangent distance (Simard et al., 1993), which succeeds well in tasks such as recognizing handwritten digits (Wang et al., 2005). On shape matching tasks, the approach of Latecki and Lakämper (2000), for example, gives intuitive results by basing the metric on the correspondence between the visual parts of objects. The Image Euclidean Distance (IMED) metric (Wang et al., 2005) is based on Euclidean distance. IMED takes the spatial relationships of pixels into consideration and outperformed traditional Euclidean distance in face recognition tasks in evaluations performed by its authors. Besides shape, colour is a very dominant visual feature. According to Deng et al. (2001), the distance between the colours of two pictures can be measured by comparing their colour histograms, which represent the colour distribution in an image. However, colour histograms do not consider spatial knowledge and have a high cost in retrieval and search. The authors proposed a "dominant colour descriptor", which consists of the representative colours in a region and their distribution. A similarity measure for the descriptor and an efficient colour indexing scheme for image retrieval were also suggested. The method performed fast and efficiently in the experiments. However, the descriptor did not take into account the spatial relationships between the colour regions, and for high-level matches the correspondence was unstable.
Some of the visual similarity features might be more valuable than others. For example, take a set of images created by the same designer: they might be drawn in the same style using the same colour palette, which makes all the images quite similar in spite of the differences between the objects they represent. In this case, the shape feature might be more valuable in estimating the actual human perception of visual similarity. There might be various automatic techniques to recognize the relevance of different features, but in our solution we are going to use a manually defined coefficient for each feature and leave possible automation of this process for future work.
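To make the difference between the two families of measures concrete, the following minimal sketch computes a pixel-wise Euclidean distance and a histogram-based distance. It assumes equally sized greyscale images stored as nested lists of intensities in the range 0 to 255; the function names, the bin count and the use of intensity histograms in place of full colour histograms are illustrative assumptions, not part of the described system.

```python
import math


def euclidean_distance(img_a, img_b):
    """Square root of the summed squared differences of corresponding pixels."""
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    squared = sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(img_a, img_b)
        for pa, pb in zip(row_a, row_b)
    )
    return math.sqrt(squared)


def histogram(img, bins=4, max_value=256):
    """Normalised intensity histogram (hypothetical stand-in for a colour histogram)."""
    counts = [0] * bins
    total = 0
    for row in img:
        for pixel in row:
            counts[min(pixel * bins // max_value, bins - 1)] += 1
            total += 1
    return [count / total for count in counts]


def histogram_distance(img_a, img_b, bins=4):
    """L1 distance between the two normalised histograms, in the range [0, 2]."""
    hist_a, hist_b = histogram(img_a, bins), histogram(img_b, bins)
    return sum(abs(x - y) for x, y in zip(hist_a, hist_b))


# Two images with the same pixels in different positions.
checker = [[0, 255], [255, 0]]
inverse = [[255, 0], [0, 255]]
print(euclidean_distance(checker, inverse))   # large: every pixel differs
print(histogram_distance(checker, inverse))   # zero: same intensity distribution
```

The two toy images illustrate the limitation noted above: a histogram comparison ignores where the colours are located, while the pixel-wise distance is sensitive to it.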

Phonetic Similarity
Along with visual similarity, the phonetic similarity of concepts can also affect children's performance in distinguishing them. Much research and many practical implementations exist in the area of automated voice and speech recognition (Petajan, 1990; Astradabadi, 1998; Potamianos et al., 1997). Adaptive speech recognition technology is not yet at the point where machines understand all speech, in any acoustic environment, or by any person, but it is used on a day-to-day basis in a number of applications and services (Docsoft Inc., 2009). The vast difference in anatomy and physiology between the speech production and perception systems of humans makes speech difficult to analyse (Kessler, 2005). Unfortunately, these systems are expensive, and they cannot always correctly recognize the input from a person who speaks with a dialect or an accent; they also have problems with recognizing words from people who habitually combine words from different languages.
Because the complexity of voice recognition is considered high, especially for sound samples created by different persons, we decided to calculate the phonetic similarity of the concepts as the string-based similarity of their phonetic transcriptions. In a non-orthographic language, the phonetic representations of the words are more reliable than their written forms. However, they do not take into account, e.g., different dialects that would sound more native to the learner. In spite of these disadvantages, it was decided that the standardised language provides enough information on the phonetic distances. The auditory representation of the words is provided in such a way that it follows the patterns of these phonetic representations, aiming at the most standardised way of speaking.
There are several string-based techniques that could be applied to match the similarity of phonetic transcriptions: Edit Distance, which finds how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other; the Jaro-Winkler measure (Winkler, 1999); the N-gram similarity function (Kondrak, 2005); Soundex (Russell and Odell, 1918), a phonetic similarity measure that partitions consonants into numbered groups from which the resulting code is then compiled; Daitch-Mokotoff (Mokotoff, 1997), which has much more complex conversion rules than Soundex, so that the resulting code is shaped not only by single characters but also by sequences of several characters; and Metaphone, which transforms the original word according to the rules of the English language, using much more complex rules and thus losing significantly less information, as letters are not divided into groups (Euzenat and Shvaiko, 2013). In our solution, we allow several measuring functions to be used, with further weighted aggregation of their results (e.g., weighted product or weighted sum).
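As an illustration of the first technique in the list, the sketch below computes the edit (Levenshtein) distance between two transcription strings and turns it into a similarity in [0, 1]. The normalisation by the longer string's length and the example transcriptions are assumptions made for illustration only.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def phonetic_similarity(transcription_a, transcription_b):
    """Similarity in [0, 1]; the normalisation is an assumption, not from the paper."""
    longest = max(len(transcription_a), len(transcription_b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(transcription_a, transcription_b) / longest


# Hypothetical transcriptions: "cat" and "hat" differ in one symbol out of three.
print(phonetic_similarity("kæt", "hæt"))
```

Because the comparison works on symbols of the transcription rather than on spelling, each phoneme should be encoded as a single character for the distance to be meaningful.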

Semantic Similarity
Semantic similarity between concepts plays an important role in understanding semantic sense. Measuring it is not a trivial task. The most widely used technique is to measure semantic similarity based on a domain ontology, i.e. a conceptual model that describes the corresponding domain. Ontology-based semantic similarity can be measured with different methods (Sanchez, 2012; Bin et al., 2009):
• Edge-counting (or graph-based) methods calculate the minimum path length connecting the corresponding ontological nodes through 'is-a' links (Sanchez et al., 2012). Equally distant pairs of concepts that belong to an upper level of the taxonomy are counted as less similar than those that belong to a lower level (Wu and Palmer, 1994). Very often the shortest path length is combined with the depth of the ontology in a nonlinear function (Li et al., 2003), sometimes with the overlap between the nodes (Alvarez and Lim, 2007). In (Al-Mubaid and Nguyen, 2006), the authors applied a cluster-based measure on top of the minimum path length and taxonomical depth. The taxonomical edge-counting method was extended by including non-taxonomic semantic links in the notion of a path (Hirst and St-Onge, 1998).
• Feature-based measures use taxonomical features extracted from the ontology. The similarity between two concepts can be computed as a function of their common and differential features, i.e. as a function of their properties (Sanchez et al., 2012). Such a facet-based classification could be combined with the similarity of the values of common properties.
• Combined measures include edge-counting-based and information content (IC)-based measures with edge weights (Jiang and Conrath, 1997).
In our current solution, we have limited our focus to the domain of animals. Each domain brings certain specifics to the semantic similarity metric. Semantic similarity could be measured differently depending on the context. Such context-dependent similarities could be calculated separately and then aggregated using different weights for different contexts. Thus, for the chosen domain, we may highlight several classifications:
• Biological species-based classification: this metric is based on the subclass hierarchy of animals classified by biological families. In this case, the most suitable approaches to measure similarity are graph-based techniques (e.g. Jaccard metric, scaled shortest path, depth of the subsumer and closeness to the concepts, etc.) (Euzenat and Shvaiko, 2013; Bouquet et al., 2004; Leacock and Chodorow, 1998; Haase et al., 2004). Thus, we calculate the semantic similarity of concepts based on the locations of the corresponding nodes in the graph, using a taxonomy-based ontology that represents the class hierarchy of animals.
• Geolocation-based classification: here we distinguish animals by the geographical regions they live in. It is a complex metric that integrates continent-based, latitude-based and climate-zone-based clustering. Similarity between the clusters is calculated based on the hierarchy of climate groups and the similarity of different continents.
• Domestication-based classification: animals could also be divided into those that are fully domesticated and live in homes and on farms, those that we may meet in a zoo, and those that live only in the wild and are most probably seen only in videos and photos. In this case, the distances between the classes could be predefined.
Other metrics that define semantic similarity in other contexts may also exist. Therefore, analogously to the other similarity factors, the final semantic similarity measure could be aggregated by a weighted product or sum.
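The edge-counting idea for the species-based classification can be sketched as a breadth-first search over 'is-a' links. The toy taxonomy, the function names and the mapping from path length to similarity, 1 / (1 + length), are hypothetical choices made for this example.

```python
from collections import deque

# Hypothetical child -> parent 'is-a' links for a tiny animal taxonomy.
IS_A = {
    "dog": "canine", "wolf": "canine",
    "cat": "feline", "lion": "feline",
    "canine": "carnivore", "feline": "carnivore",
    "carnivore": "mammal", "cow": "bovine", "bovine": "mammal",
    "mammal": "animal", "eagle": "bird", "bird": "animal",
}


def build_graph(is_a):
    """Undirected adjacency sets built from the child -> parent links."""
    graph = {}
    for child, parent in is_a.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_length(graph, start, goal):
    """Number of edges on the shortest path, found by breadth-first search."""
    if start not in graph or goal not in graph:
        raise KeyError("unknown concept")
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # the two concepts are not connected


def semantic_similarity(graph, a, b):
    """Assumed mapping of path length to [0, 1]: identical concepts score 1."""
    length = path_length(graph, a, b)
    return 0.0 if length is None else 1.0 / (1.0 + length)


graph = build_graph(IS_A)
print(path_length(graph, "dog", "wolf"))   # siblings under "canine"
print(path_length(graph, "dog", "eagle"))  # connected only through "animal"
```

A depth-aware measure such as the one by Wu and Palmer (1994) would additionally need the depth of the common subsumer, which this sketch does not compute.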

Concept Similarity Measure
Since all of our concept similarity factors (visual, phonetic and semantic) could be represented as weighted functions of various similarity measuring techniques, we may define a general formula to calculate similarity between the concepts (Figure 1).
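One possible form of such an aggregation is a normalised weighted sum, sketched below under the assumption that every individual measure has already been scaled to [0, 1]. The function names and default weights are illustrative; a weighted product, which the text also mentions, would be an equally valid choice.

```python
def weighted_sum(values, weights):
    """Normalised weighted sum of similarity values that lie in [0, 1]."""
    values, weights = list(values), list(weights)
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def concept_similarity(visual, phonetic, semantic, weights=(1.0, 1.0, 1.0)):
    """Aggregate the three factor similarities with the weight vector."""
    return weighted_sum((visual, phonetic, semantic), weights)


# Each factor may itself be a weighted combination of several measures,
# e.g. two hypothetical phonetic measures weighted 2:1.
phonetic = weighted_sum([0.9, 0.6], [2.0, 1.0])
print(concept_similarity(visual=0.2, phonetic=phonetic, semantic=0.5))
```

Setting one weight to zero removes the corresponding factor from the aggregate, which is how the influence of a single dimension could be isolated for a learner.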

System Architecture and Adaptation Logic
The system is presented as a game-based learning environment for mobile phones and tablets. Its aim is to teach recognition of vocabulary items from multiple choices of their visual and auditory representations. The task of the learner is to recognize the word that he or she hears from a group of pictures presented on the screen. All pictures except the one presenting the correct answer are hereafter referred to as distraction items. The general architecture of the proposed facilitation solution is shown in Figure 2.
Based on the calculated similarity measures between the chosen concept ("Random Concept" in the figure) and all other concepts, we rank the concepts and select the group closest to the defined delta (Δ) value. The initial value of Δ could be, for instance, the average of the similarity values of all the measured concepts. Depending on the number of distraction items (since their number also influences the overall complexity), the system chooses the concepts whose similarity values are closest to Δ. Depending on the user's feedback (answer), the value of Δ is changed in the feedback analysis module. If the user gives a wrong answer, the value of Δ is increased (moved towards the concepts with lower similarity). Otherwise, the complexity can be increased: by decreasing the value of Δ, a group of more similar concepts will be selected next time.
At the same time, by manipulating the coefficients of influence of the visual, phonetic and semantic factors (the vector of weights, $\vec{w} = (w_v, w_p, w_s)$), we are able to recognize the level of difficulty that each individual factor brings for a particular learner. Once the user provides a wrong answer, the next time the same concept is chosen the algorithm changes the vector of weights, giving more preference to one of the factors. In this way, the system collects statistics on the personal learning model of the user while already trying to personalize the complexity of the tasks. All collected statistics, including the value of Δ, the vector of weights $\vec{w}$, the values of similarities between the chosen concept and the distraction concepts, etc., are stored in the Log module of the system and are further used as a labelled learning sample to recognize the individual user's features of concept perception. Furthermore, this will allow the system to personalize the strategy for the individual learning process and to develop the learner's ability to overcome individual difficulties in vocabulary learning.
For the current research, we used breadth-first search to calculate the semantic similarity from the graph of the concepts used, Levenshtein distance for the phonetic similarity and Euclidean distance for the visual similarity. In future work, we will use other algorithms, and combine and compare them, to conclude which are the most relevant.
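The selection and feedback steps can be sketched as follows. To match the direction of adjustment described above (a larger Δ selects less similar concepts), the sketch treats Δ as a target distance, with distance taken as 1 minus similarity; this interpretation, the step size and all names are assumptions for illustration.

```python
def select_distractors(target, similarities, delta, count):
    """Pick the `count` concepts whose distance to the target is closest to delta.

    `similarities` maps each candidate concept to its similarity with the
    target in [0, 1]; distance is assumed to be 1 - similarity.
    """
    candidates = [concept for concept in similarities if concept != target]
    # Sort by closeness to delta; the name breaks ties deterministically.
    candidates.sort(key=lambda c: (abs((1.0 - similarities[c]) - delta), c))
    return candidates[:count]


def update_delta(delta, answered_correctly, step=0.1):
    """Correct answer: smaller delta (harder). Wrong answer: larger delta (easier)."""
    delta += -step if answered_correctly else step
    return min(1.0, max(0.0, delta))


# Hypothetical similarities of other animals to the target concept "dog".
similarities = {"wolf": 0.9, "cat": 0.6, "cow": 0.4, "eagle": 0.1}
delta = 0.5
print(select_distractors("dog", similarities, delta, count=2))
delta = update_delta(delta, answered_correctly=False)  # next round is easier
```

Clamping Δ to [0, 1] keeps the target within the range of possible distances, so repeated wrong or correct answers cannot push it beyond the easiest or hardest available set.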
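A minimal sketch of the weight adjustment and logging could look like the following. The text does not specify which factor receives more preference after a mistake, so the round-robin policy, the boost size and the structure of the log entries are purely hypothetical.

```python
FACTORS = ("visual", "phonetic", "semantic")


def emphasise_factor(weights, factor, boost=0.2):
    """Return new weights, normalised to sum to 1, with more weight on one factor."""
    if factor not in weights:
        raise KeyError(factor)
    raised = dict(weights)
    raised[factor] += boost
    total = sum(raised.values())
    return {name: value / total for name, value in raised.items()}


def next_factor(mistake_count):
    """Hypothetical policy: probe the factors in turn, one per mistake."""
    return FACTORS[mistake_count % len(FACTORS)]


def log_trial(log, concept, distractors, delta, weights, correct):
    """Append one labelled sample to the (assumed list-based) log."""
    log.append({
        "concept": concept,
        "distractors": list(distractors),
        "delta": delta,
        "weights": dict(weights),
        "correct": correct,
    })


log = []
weights = {name: 1.0 / len(FACTORS) for name in FACTORS}
log_trial(log, "dog", ["cat", "cow"], 0.5, weights, correct=False)
# After the wrong answer, emphasise one factor for the next presentation of "dog".
weights = emphasise_factor(weights, next_factor(mistake_count=0))
print(weights)
```

Comparing the learner's error rate across trials logged with different emphasised factors is one way the stored samples could indicate which dimension is hardest for that learner.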

CONCLUSIONS
In this work, we concentrated on developing an adaptive vocabulary learning system for late talkers who have difficulties in learning language due to individual reasons. The system is a mobile learning application which aims to provide an optimal way of vocabulary presentation for young children.
The functionality behind the system is based on different similarity factors (visual, phonetic and semantic) of the learning objectives, which in our case are words. We assumed that the more similar the words are, the more difficult it is to distinguish them. Thus, as the learners improve, the system is able to provide them with more challenging tasks. In addition to using the information on learners' mistakes in the system's adaptation, the data is collected and further analysed from the angles of semantic network connectivity, emphasis of the dimensions and the overall educational effectiveness of the system.
Existing approaches to studying the characteristics of semantic networks depend on parents' knowledge of their children's vocabularies (cf. Beckage et al., 2010). Our approach has the advantage of not only impartiality but also automaticity, which makes it possible to collect a larger amount of data with less effort and bias. However, the methodology has some disadvantages. Firstly, the system does not require the children to form the words, only to recognize them, and therefore it cannot be considered a way of training word composition. Secondly, it is likely that not all of the children's answers are equally valuable, since multiple-choice questions can be answered randomly and wrong answers could also be given intentionally.

FUTURE WORK
To evaluate the system's efficiency, it will be tested with a group of children consisting of both 1-3-year-old late talkers and other children with more typical language proficiency. The test group will be split into halves. For every child, the concepts will first be filtered: by testing, the system will exclude the words that the child already knows from the provided sample of animal words and will only work with the words the child does not know. For the first group of children, a fixed delta set to the lowest difficulty level will be used and will not be changed during the learning and testing process. For the second group, the delta will change for every learner individually according to the adaptation logic of the system. After a period of learning and testing, there will be a final test that checks the number of words the children have learned. Knowing the percentage of learned concepts in each group, we can conclude whether the proposed adaptive system is effective. After the evaluation, we will use the gathered data for further development of the algorithm. As the current system relies on metrics made by adults, it is possible that they do not reflect the children's associations. The gathered association networks can be further used as a base for the algorithm instead of the adult-made ontologies. The evaluation results could also show whether recognizable patterns exist in the mistakes the children make.
We will also extend the knowledge base to include other domains of vocabulary. Data such as the user's answers could be automatically sent to a database on a cloud server. The algorithm could be extended to consider several users' feedback, which could be used to form more reliable user models. Applied to a larger set and multiple vocabulary domains, analysis of the children's answers in the system can yield valuable information about children's semantic networks.
The system could also be modified to be more intuitive and motivating. For example, it could be extended to provide interactive storybook-type tasks, where a child could listen to a story illustrated on the screen and then answer multiple-choice questions about the context of the story.