dc.contributor.advisor | Khriyenko, Oleksiy | |
dc.contributor.author | Hajihashemi Varnousfaderani, Elahe | |
dc.date.accessioned | 2024-01-08T07:20:46Z | |
dc.date.available | 2024-01-08T07:20:46Z | |
dc.date.issued | 2023 | |
dc.identifier.uri | https://jyx.jyu.fi/handle/123456789/92552 | |
dc.description.abstract | Information Retrieval systems such as search engines, originally designed to assist users in finding information, have evolved to become more potent and have found utility in wider range of applications by incorporating contextual comprehension using Language Models. Selecting the proper Language Model corresponding to the desired task is a challenging multi-objectives problem as each model has specific set of attributes which affect the performance. Accuracy, resource and time consumption are the most important objectives considered in assessing the quality of a search system. These objectives are addressed in this research by exploring the performance of two Language Models with variant characteristics in developing a semantic search pipeline. The studied Language Models include a distilled version of BERT model fine-tuned on specific task and GPT-2 as a general pre-trained model with huge number of parameters.
The semantic search pipeline, which maps contents and queries into a common vector space using a Large Language Model and retrieves the most relevant results, is implemented in this study as the experimental setup of the qualitative research. Using evaluation metrics to assess a model's performance requires ground truth data. Therefore, this research proposes several approaches for generating synthetic ground truth to address evaluation and fine-tuning challenges when labeled data is scarce. To pursue the research objectives, quantitative data are gathered in an experimental setting, and conclusions and recommendations are drawn by analyzing the results of the experiments.
The experimental results indicate that model size should not be the primary criterion when selecting a language model for downstream tasks; the model architecture and fine-tuning on a task-specific dataset also strongly affect performance. As the results show, the smaller model fine-tuned for semantic textual similarity surpasses the larger general model. The experiment investigating the proposed annotation-generation approaches indicates that these methods are readily applicable for computing evaluation metrics and can be extended to fine-tuning.
The results demonstrate that task-oriented transfer learning through distillation and fine-tuning can compensate for the learning capacity instilled in general models by a larger number of parameters, although future research should examine the sensitivity of this finding to the values assigned to various variables in this study, e.g., the number of tokens used when splitting large texts into smaller chunks. Moreover, it would be worthwhile to fine-tune the large general model as well in the future, so that the two models can be compared under more comparable conditions. | en
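As an illustration of the pipeline described in the abstract (embedding contents and queries into a shared vector space, then ranking by similarity), the following is a minimal sketch assuming the sentence-transformers library; the checkpoint name is illustrative only, and the thesis's own models (a distilled BERT fine-tuned for semantic textual similarity, and GPT-2) and chunking parameters are not reproduced here.

```python
# Minimal semantic search sketch, assuming sentence-transformers is installed.
# The checkpoint below is an assumption for illustration, not the thesis's model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

documents = [
    "Search engines retrieve documents that match a user's query.",
    "Language models map text into dense vector representations.",
    "Fine-tuning adapts a pre-trained model to a downstream task.",
]

# Map contents and the query into the same vector space.
doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode(
    "How do models represent text as vectors?", convert_to_tensor=True
)

# Rank documents by cosine similarity and keep the most relevant results.
hits = util.semantic_search(query_embedding, doc_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {documents[hit['corpus_id']]}")
```

The key design choice, as in the abstract, is that relevance is computed entirely in the shared embedding space, so the same encoder must be applied to both contents and queries.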
dc.format.extent | 157 | |
dc.language.iso | eng | |
dc.rights | In Copyright | |
dc.subject.other | semantic search | |
dc.subject.other | large language models | |
dc.subject.other | generative models | |
dc.subject.other | fine-tuning | |
dc.subject.other | transfer learning | |
dc.title | Challenges and insights in semantic search using language models | |
dc.type | master thesis | |
dc.identifier.urn | URN:NBN:fi:jyu-202401081055 | |
dc.type.ontasot | Master’s thesis | en |
dc.type.ontasot | Pro gradu -tutkielma | fi |
dc.contributor.tiedekunta | Faculty of Information Technology | en |
dc.contributor.tiedekunta | Informaatioteknologian tiedekunta | fi |
dc.contributor.laitos | Information Technology | en |
dc.contributor.laitos | Informaatioteknologia | fi |
dc.contributor.yliopisto | University of Jyväskylä | en |
dc.contributor.yliopisto | Jyväskylän yliopisto | fi |
dc.contributor.oppiaine | Mathematical Information Technology | en |
dc.contributor.oppiaine | Tietotekniikka | fi |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | |
dc.rights.copyright | © The Author(s) | |
dc.rights.accesslevel | openAccess | |
dc.type.publication | masterThesis | |
dc.contributor.oppiainekoodi | 602 | |
dc.subject.yso | luonnollisen kielen käsittely | |
dc.subject.yso | tiedonhaku | |
dc.subject.yso | mallintaminen | |
dc.subject.yso | tekoäly | |
dc.subject.yso | koneoppiminen | |
dc.subject.yso | natural language processing | |
dc.subject.yso | information retrieval | |
dc.subject.yso | modelling (representation) | |
dc.subject.yso | artificial intelligence | |
dc.subject.yso | machine learning | |
dc.rights.url | https://rightsstatements.org/page/InC/1.0/ | |