Challenges and insights in semantic search using language models
Date
2023
Copyright
© The Author(s)
Information retrieval systems such as search engines, originally designed to help users find information, have become more powerful and have found use in a wider range of applications by incorporating contextual comprehension through language models. Selecting the proper language model for a given task is a challenging multi-objective problem, as each model has a specific set of attributes that affect its performance. Accuracy, resource consumption, and time consumption are the most important objectives considered in assessing the quality of a search system. This research addresses these objectives by exploring the performance of two language models with different characteristics in developing a semantic search pipeline. The studied language models are a distilled version of BERT fine-tuned on a specific task and GPT-2, a general pre-trained model with a large number of parameters.
The semantic search pipeline, which maps the contents and queries into a common vector space using a large language model and then finds the most relevant results, is implemented in this study as the experimental setup of the qualitative research. Using evaluation metrics to assess a model's performance requires ground truth data. Therefore, this research proposes various approaches for generating synthetic ground truth to tackle evaluation and fine-tuning challenges when labeled data is scarce. To meet the research objectives, quantitative data is gathered in an experimental setting, and conclusions and recommendations are drawn from an analysis of the experimental results.
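The pipeline described above (embed contents and queries into one vector space, then rank by similarity) can be illustrated with a minimal sketch. It is not the thesis implementation: the `embed` function is a hypothetical bag-of-words stand-in for a language-model encoder such as the distilled BERT or GPT-2 models studied, and cosine similarity is assumed as the relevance measure.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a language-model encoder.

    A real pipeline would return a dense vector from the model;
    here a sparse bag-of-words count vector is used instead.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_k=3):
    """Return the top_k (score, document) pairs, most similar first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Illustrative documents, not data from the study.
documents = [
    "language models for semantic search",
    "cooking pasta with tomato sauce",
    "fine-tuning a distilled BERT model",
]
results = search("semantic search with language models", documents, top_k=2)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

Replacing `embed` with a real encoder leaves the ranking step unchanged, which is why the choice of model can be compared within the same pipeline.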
The experimental results indicate that model size should not be the major criterion in selecting a language model for downstream tasks. The model architecture and fine-tuning on a specialized dataset also dramatically affect performance. As the results show, the smaller model fine-tuned for semantic textual similarity surpasses the larger general model. The experiment investigating the proposed approaches for generating annotations indicates that those methods are reasonably applicable to computing evaluation metrics and can be extended to fine-tuning.
The results demonstrate that task-oriented transfer learning through distillation and fine-tuning can compensate for the learning capacity that a larger number of parameters instills in general models. This finding should be investigated further in future research with respect to the values chosen for various variables in this study, e.g., the number of tokens used when splitting large texts into smaller chunks. Moreover, it would be worthwhile to fine-tune the large general model as well in future work, so that the two models can be compared under more comparable conditions.
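The chunk-size variable mentioned above can be illustrated with a short sketch of splitting a long text into fixed-size token chunks. This is an assumption-laden illustration rather than the method used in the study: `chunk_tokens` is a hypothetical helper, and whitespace splitting stands in for the model-specific tokenizer a real pipeline would use.

```python
def chunk_tokens(text, max_tokens):
    """Split text into consecutive chunks of at most max_tokens tokens.

    Assumption: tokens are whitespace-separated words. A real pipeline
    would count tokens with the language model's own tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = text.split()
    return [
        " ".join(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


# Illustrative input only.
chunks = chunk_tokens("one two three four five six seven", 3)
print(chunks)
```

Smaller values of `max_tokens` produce more, shorter chunks to embed, so this setting trades retrieval granularity against the number of vectors that must be computed and searched.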
Collections
- Pro gradu -tutkielmat [29104]
Related items
Showing items with similar title or keywords.
- Towards a Great Design of Conceptual Modelling
  Kiyoki, Yasushi; Thalheim, Bernhard; Duží, Marie; Jaakkola, Hannu; Chawakitchareon, Petchporn; Heimbürger, Anneli (IOS Press, 2020)
  Humankind faces a most crucial mission; we must endeavour, on a global scale, to restore and improve our natural and social environments. This is a big challenge for global information systems development and for their ...
- Explainable AI for Industry 4.0 : Semantic Representation of Deep Learning Models
  Terziyan, Vagan; Vitko, Oleksandra (Elsevier, 2022)
  Artificial Intelligence is an important asset of Industry 4.0. Current discoveries within machine learning and particularly in deep learning enable qualitative change within the industrial processes, applications, systems ...
- Continuous Software Engineering Practices in AI/ML Development Past the Narrow Lens of MLOps : Adoption Challenges
  Vänskä, Sini; Kemell, Kai-Kristian; Mikkonen, Tommi; Abrahamsson, Pekka (Politechnika Wroclawska Oficyna Wydawnicza, 2024)
  Background: Continuous software engineering practices are currently considered state of the art in Software Engineering (SE). Recently, this interest in continuous SE has extended to ML system development as well, primarily ...
- Artificial Intelligence for Cybersecurity : A Systematic Mapping of Literature
  Wiafe, Isaac; Koranteng, Felix N.; Obeng, Emmanuel N.; Assyne, Nana; Wiafe, Abigail; Gulliver, Stephen R. (IEEE, 2020)
  Due to the ever-increasing complexities in cybercrimes, there is the need for cybersecurity methods to be more robust and intelligent. This will make defense mechanisms to be capable of making real-time decisions that can ...
- Enhancing Holonic Architecture with Natural Language Processing for System of Systems
  Ashfaq, Muhammad; Sadik, Ahmed; Mikkonen, Tommi; Waseem, Muhammad; Mäkitalo, Niko (SCITEPRESS Science And Technology Publications, 2024)
  The ever-growing complexity and dynamic nature of modern System of Systems (SoS) necessitate efficient communication mechanisms to ensure interoperability and collaborative functioning among constituent systems (CS), ...