dc.contributor.advisor: Khriyenko, Oleksiy
dc.contributor.author: Ambaye, Michael
dc.date.accessioned: 2020-12-11T07:49:35Z
dc.date.available: 2020-12-11T07:49:35Z
dc.date.issued: 2020
dc.identifier.uri: https://jyx.jyu.fi/handle/123456789/73119
dc.description.abstract: The aim of this thesis is to provide viable methods for improving the return position (RP) of a relevant document when a user submits a natural language query (NLQ). For demonstration, we use IBM's Watson Discovery Service (WDS), a search engine that uses supervised machine learning. This feature of WDS lets a user train the tool so that it learns to associate the language used in an NLQ with the language used in documents labelled as relevant. Rather than mapping an NLQ directly to a relevant document, WDS builds a model in which queries using similar language are associated with documents whose language is similar to that of the documents labelled as relevant. The search engine first retrieves the first 100 documents and then ranks them based on the training examples provided by the user. In other words, the training examples are applied only after the search is complete and the first 100 documents have been collected. The first 100 documents are retrieved based on whichever options have been enabled, such as keywords, entities, relations, semantic roles, concepts, category classification, sentiment analysis, emotion analysis, and element classification (Watson Discovery Service, 2019). Re-ranking only these 100 documents presents a challenge when the user's wording does not appear in the ingested documents. For example, the ingested documents could be technical documents written in official language, while the user searches with a term commonly used among colleagues. In that case, even if a training example links the user's wording to the relevant document, the user will not get the expected document, because it was not among the first 100 documents and is therefore never re-ranked.
Therefore, in this thesis, we go through various tools and methods that enable us to improve the return position of the relevant documents a user expects. [en]
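The retrieve-then-re-rank behaviour described in the abstract can be illustrated with a small toy sketch. This is not the Watson Discovery Service API: the function names (`first_pass`, `rerank`, `search`), the term-overlap scoring, and the sample documents are all assumptions made for illustration only. The sketch shows why a trained relevance signal cannot help a document that never enters the first-pass candidate set.

```python
from __future__ import annotations

# Toy corpus (hypothetical): the manual uses official wording, the others use shop-floor wording.
DOCS = {
    "manual": "hydraulic actuator assembly replacement procedure",
    "memo": "jack broke again during shift",
    "notes": "jack inventory count",
}

# Hypothetical learned relevance: training marked "manual" as the relevant document.
LEARNED = {"manual": 1.0}


def tokenize(text: str) -> set[str]:
    """Lower-case whitespace tokenization (a deliberate simplification)."""
    return set(text.lower().split())


def first_pass(query: str, docs: dict[str, str], k: int) -> list[str]:
    """Return up to k ids of documents sharing at least one term with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(body)), doc_id) for doc_id, body in docs.items()]
    hits = [(score, doc_id) for score, doc_id in scored if score > 0]
    # Highest overlap first; ties broken alphabetically by id for determinism.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits[:k]]


def rerank(candidates: list[str], learned: dict[str, float]) -> list[str]:
    """Re-order only the retrieved candidates by the learned relevance score."""
    return sorted(candidates, key=lambda doc_id: -learned.get(doc_id, 0.0))


def search(query: str, docs: dict[str, str], learned: dict[str, float], k: int = 100) -> list[str]:
    """Two-stage search: first-pass retrieval, then re-ranking of those candidates."""
    return rerank(first_pass(query, docs, k), learned)


# Colloquial wording shares no term with the manual, so it is never re-ranked.
print(search("fix jack", DOCS, LEARNED))  # ['memo', 'notes']

# Once the manual enters the candidate set, re-ranking moves it to the top.
print(search("jack shift actuator", DOCS, LEARNED, k=2))  # ['manual', 'memo']
```

In the first call the trained signal for "manual" has no effect, because re-ranking only sees what the first pass returned. This is the gap between the user's wording and the ingested documents' wording that the abstract describes.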
dc.format.extent: 67
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.rights: In Copyright [en]
dc.subject.other: IBM Watson
dc.subject.other: natural language query
dc.subject.other: Watson discovery
dc.title: Improving search engine results using different machine learning models and tools
dc.type: master thesis
dc.identifier.urn: URN:NBN:fi:jyu-202012117065
dc.type.ontasot: Pro gradu -tutkielma [fi]
dc.type.ontasot: Master’s thesis [en]
dc.contributor.tiedekunta: Informaatioteknologian tiedekunta [fi]
dc.contributor.tiedekunta: Faculty of Information Technology [en]
dc.contributor.laitos: Informaatioteknologia [fi]
dc.contributor.laitos: Information Technology [en]
dc.contributor.yliopisto: Jyväskylän yliopisto [fi]
dc.contributor.yliopisto: University of Jyväskylä [en]
dc.contributor.oppiaine: Tietotekniikka [fi]
dc.contributor.oppiaine: Mathematical Information Technology [en]
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc
dc.type.publication: masterThesis
dc.contributor.oppiainekoodi: 602
dc.subject.yso: tiedonhaku
dc.subject.yso: hakuohjelmat
dc.subject.yso: koneoppiminen
dc.subject.yso: big data
dc.subject.yso: kyselykielet
dc.subject.yso: tiedonhallinta
dc.subject.yso: luonnollinen kieli
dc.subject.yso: kieli ja kielet
dc.subject.yso: Query
dc.subject.yso: tiedonhakujärjestelmät
dc.subject.yso: information retrieval
dc.subject.yso: search engines
dc.subject.yso: machine learning
dc.subject.yso: big data
dc.subject.yso: query languages
dc.subject.yso: information management
dc.subject.yso: natural language
dc.subject.yso: languages
dc.subject.yso: Query
dc.subject.yso: information retrieval systems
dc.format.content: fulltext
dc.rights.url: https://rightsstatements.org/page/InC/1.0/
dc.type.okm: G2


Unless otherwise stated, the license for this item is In Copyright.