dc.contributor.advisor | Khriyenko, Oleksiy | |
dc.contributor.author | Ambaye, Michael | |
dc.date.accessioned | 2020-12-11T07:49:35Z | |
dc.date.available | 2020-12-11T07:49:35Z | |
dc.date.issued | 2020 | |
dc.identifier.uri | https://jyx.jyu.fi/handle/123456789/73119 | |
dc.description.abstract | The aim of this thesis is to provide viable methods for improving the return position (RP) of a relevant document when a user submits a natural language query (NLQ).
For demonstration, we use IBM's Watson Discovery Service (WDS), a search engine that supports supervised machine learning. This feature lets a user train the tool to associate the language used in an NLQ with the language used in documents labelled as relevant. Rather than mapping an NLQ directly to a specific relevant document, WDS builds a model in which queries using similar language are associated with documents whose language is similar to that of the documents labelled as relevant.
The search engine first retrieves an initial set of 100 documents and then ranks them based on the training examples provided by the user. In other words, the training examples are applied only after the search is complete and the first 100 documents have been collected.
The first 100 documents are retrieved based on which options are enabled, such as keywords, entities, relations, semantic roles, concepts, category classification, sentiment analysis, emotion analysis, and element classification (Watson Discovery Service, 2019).
Limiting re-ranking to these 100 documents presents a challenge when the user's wording does not appear in the ingested documents. For example, the ingested documents could be technical documents written in formal language, while the user searches with a term that is commonly used among colleagues. In that case, even if a training example links the user's wording to the relevant document, the user will not get the expected document, because it will not be among the first 100 documents and therefore will not be re-ranked. This thesis therefore examines various tools and methods for improving the return position of the relevant documents that a user expects. | en |
dc.format.extent | 67 | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.rights | In Copyright | en |
dc.subject.other | IBM Watson | |
dc.subject.other | natural language query | |
dc.subject.other | Watson discovery | |
dc.title | Improving search engine results using different machine learning models and tools | |
dc.type | master thesis | |
dc.identifier.urn | URN:NBN:fi:jyu-202012117065 | |
dc.type.ontasot | Pro gradu -tutkielma | fi |
dc.type.ontasot | Master’s thesis | en |
dc.contributor.tiedekunta | Informaatioteknologian tiedekunta | fi |
dc.contributor.tiedekunta | Faculty of Information Technology | en |
dc.contributor.laitos | Informaatioteknologia | fi |
dc.contributor.laitos | Information Technology | en |
dc.contributor.yliopisto | Jyväskylän yliopisto | fi |
dc.contributor.yliopisto | University of Jyväskylä | en |
dc.contributor.oppiaine | Tietotekniikka | fi |
dc.contributor.oppiaine | Mathematical Information Technology | en |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | |
dc.type.publication | masterThesis | |
dc.contributor.oppiainekoodi | 602 | |
dc.subject.yso | tiedonhaku | |
dc.subject.yso | hakuohjelmat | |
dc.subject.yso | koneoppiminen | |
dc.subject.yso | big data | |
dc.subject.yso | kyselykielet | |
dc.subject.yso | tiedonhallinta | |
dc.subject.yso | luonnollinen kieli | |
dc.subject.yso | kieli ja kielet | |
dc.subject.yso | Query | |
dc.subject.yso | tiedonhakujärjestelmät | |
dc.subject.yso | information retrieval | |
dc.subject.yso | search engines | |
dc.subject.yso | machine learning | |
dc.subject.yso | big data | |
dc.subject.yso | query languages | |
dc.subject.yso | information management | |
dc.subject.yso | natural language | |
dc.subject.yso | languages | |
dc.subject.yso | Query | |
dc.subject.yso | information retrieval systems | |
dc.format.content | fulltext | |
dc.rights.url | https://rightsstatements.org/page/InC/1.0/ | |
dc.type.okm | G2 | |