Improving search engine results using different machine learning models and tools

Ambaye, Michael

dc.contributor.advisor	Khriyenko, Oleksiy
dc.contributor.author	Ambaye, Michael
dc.date.accessioned	2020-12-11T07:49:35Z
dc.date.available	2020-12-11T07:49:35Z
dc.date.issued	2020
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/73119
dc.description.abstract	The aim of this thesis is to provide viable methods for improving the return position (RP) of a relevant document when a user submits a natural language query (NLQ). For demonstration, we use IBM's Watson Discovery Service (WDS) as a search engine that uses supervised machine learning. This feature of WDS enables a user to train the tool so that it learns to associate the language used in an NLQ with the language used in documents labeled as relevant. Rather than mapping an NLQ directly to the relevant document, WDS builds a model in which queries using similar language are associated with documents whose language is similar to that of the document labeled as relevant. The search engine first retrieves the first 100 documents and then ranks them based on the training examples provided by the user. In other words, the training examples are applied only after the search is complete and the first 100 documents have been collected. These 100 documents are retrieved based on which options have been enabled, such as keywords, entities, relations, semantic roles, concepts, category classification, sentiment analysis, emotion analysis, and element classification (Watson Discovery Service, 2019). Limiting re-ranking to 100 documents presents a challenge when the user uses language that is not present in the ingested documents. For example, the ingested documents could be technical documents written in formal language, while the user searches with a term that is commonly used among colleagues. In that case, even when a training example links the user's wording to the relevant document, the user will not get the expected documents, because they were not among the first 100 documents and therefore were never re-ranked.
Therefore, this thesis goes through various tools and methods that enable us to improve the return position of the relevant documents that a user expects.	en
dc.format.extent	67
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.rights	In Copyright	en
dc.subject.other	IBM Watson
dc.subject.other	natural language query
dc.subject.other	Watson discovery
dc.title	Improving search engine results using different machine learning models and tools
dc.type	master thesis
dc.identifier.urn	URN:NBN:fi:jyu-202012117065
dc.type.ontasot	Pro gradu -tutkielma	fi
dc.type.ontasot	Master’s thesis	en
dc.contributor.tiedekunta	Informaatioteknologian tiedekunta	fi
dc.contributor.tiedekunta	Faculty of Information Technology	en
dc.contributor.laitos	Informaatioteknologia	fi
dc.contributor.laitos	Information Technology	en
dc.contributor.yliopisto	Jyväskylän yliopisto	fi
dc.contributor.yliopisto	University of Jyväskylä	en
dc.contributor.oppiaine	Tietotekniikka	fi
dc.contributor.oppiaine	Mathematical Information Technology	en
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc
dc.type.publication	masterThesis
dc.contributor.oppiainekoodi	602
dc.subject.yso	tiedonhaku
dc.subject.yso	hakuohjelmat
dc.subject.yso	koneoppiminen
dc.subject.yso	big data
dc.subject.yso	kyselykielet
dc.subject.yso	tiedonhallinta
dc.subject.yso	luonnollinen kieli
dc.subject.yso	kieli ja kielet
dc.subject.yso	Query
dc.subject.yso	tiedonhakujärjestelmät
dc.subject.yso	information retrieval
dc.subject.yso	search engines
dc.subject.yso	machine learning
dc.subject.yso	big data
dc.subject.yso	query languages
dc.subject.yso	information management
dc.subject.yso	natural language
dc.subject.yso	languages
dc.subject.yso	Query
dc.subject.yso	information retrieval systems
dc.format.content	fulltext
dc.rights.url	https://rightsstatements.org/page/InC/1.0/
dc.type.okm	G2
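
The retrieve-then-re-rank behaviour described in the abstract can be illustrated with a minimal sketch. It is not the Watson Discovery Service API: the function names, the term-overlap scoring, the toy documents, and the `learned_score` table are all hypothetical stand-ins, and the cut-off is shrunk from 100 to 2 so the effect is visible on three documents.

```python
from typing import Dict, List


def first_pass(query: str, docs: Dict[str, str], k: int) -> List[str]:
    """Return up to k document ids ranked by naive term overlap with the query."""
    query_terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        overlap = len(query_terms & set(text.lower().split()))
        scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def rerank(candidates: List[str], learned_score: Dict[str, float]) -> List[str]:
    """Reorder only the candidates; learned_score stands in for a trained model."""
    # sorted() is stable, so unscored candidates keep their first-pass order.
    return sorted(candidates, key=lambda doc_id: -learned_score.get(doc_id, 0.0))


# Hypothetical corpus: the formal manual is the relevant document, but the
# user asks with workplace slang ("juice box") that the manual never uses.
docs = {
    "manual": "procedure for replacing the hydraulic pump assembly",
    "memo": "the juice box is scheduled for a check on monday",
    "notes": "the team reviewed the juice box budget",
}
query = "how to swap the juice box"

# Assumed outcome of training: this phrasing should lead to the manual.
learned_score = {"manual": 0.9}

narrow = rerank(first_pass(query, docs, k=2), learned_score)
wide = rerank(first_pass(query, docs, k=3), learned_score)
print(narrow)
print(wide)
```

With the narrow cut-off the manual never reaches the re-ranker, so the training signal has no effect; once the candidate set is wide enough to include it, the same signal moves it to the top. This is the gap the thesis sets out to close.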