Approaching Optimal pH Enzyme Prediction with Large Language Models
Zaretckii, M., Buslaev, P., Kozlovskii, I., Morozov, A., & Popov, P. (2024). Approaching Optimal pH Enzyme Prediction with Large Language Models. ACS Synthetic Biology, Early online. https://doi.org/10.1021/acssynbio.4c00465
Published in
ACS Synthetic Biology
Date
2024
Copyright
© The Authors. Published by American Chemical Society
Enzymes are widely used in biotechnology due to their ability to catalyze chemical reactions: food production, laundry, pharmaceuticals, textiles, and brewing all benefit from enzymes. Proton concentration (pH) is one of the key factors that determine enzyme function and efficiency. Usually an enzyme is active only within a narrow range of pH values. Designing an enzyme with optimal activity in a given pH range is therefore a common problem in biotechnology. A large part of this task can be completed in silico, by predicting the optimal pH of designed candidates. The success of such computational methods critically depends on the available data. In this study, we developed a language-model-based approach to predict the optimal pH range from the enzyme sequence. We used different splitting strategies based on sequence similarity, protein family annotation, and enzyme classification to validate the robustness of the proposed approach. The derived machine-learning models demonstrated high accuracy across proteins from different protein families and for proteins with lower sequence similarity to the training set. The proposed method is fast enough for high-throughput virtual exploration of protein space in search of sequences with desired optimal pH levels.
...
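The abstract mentions validation splits based on protein family annotation but gives no implementation details. The following is a minimal sketch of that general idea, not the authors' code: whole groups (for example, family labels) are assigned to either the training or the test set, so no family appears in both. The function name `group_split`, the toy labels `famA` to `famD`, and the `test_fraction` parameter are all hypothetical.

```python
import random
from collections import defaultdict


def group_split(groups, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) such that no group label appears in both.

    `groups` holds one label per sample (e.g. a protein family annotation).
    This is an illustrative helper, not taken from the paper.
    """
    # Collect the sample indices that belong to each group label.
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    # Shuffle the group labels reproducibly.
    labels = sorted(by_group)
    random.Random(seed).shuffle(labels)

    # Move whole groups into the test set until it reaches the target size;
    # every remaining group goes to the training set.
    target = test_fraction * len(groups)
    train_idx, test_idx = [], []
    for label in labels:
        if len(test_idx) < target:
            test_idx.extend(by_group[label])
        else:
            train_idx.extend(by_group[label])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical toy annotation: one family label per enzyme sequence.
families = ["famA", "famA", "famB", "famC", "famC", "famD"]
train_idx, test_idx = group_split(families, test_fraction=0.3, seed=1)
print("train:", train_idx, "test:", test_idx)
```

The same pattern would apply to the other splitting criteria named in the abstract if the group label were replaced by a sequence-similarity cluster identifier or an enzyme classification number; how those labels are obtained is outside the scope of this sketch.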
Publisher
American Chemical Society
ISSN
2161-5063
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/241748096
Related funder(s)
Research Council of Finland
Funding program(s)
Postdoctoral Researcher, AoF
Additional information about funding
P.B. was supported by the Academy of Finland (grant 342908). A.M. was supported by the Russian Science Foundation (RSF-22-74-10098).
Related items
Showing items with similar title or keywords.
- GraphBNC : Machine Learning‐Aided Prediction of Interactions Between Metal Nanoclusters and Blood Proteins
  Pihlajamäki, Antti; Matus, María Francisca; Malola, Sami; Häkkinen, Hannu (Wiley-VCH Verlag, 2024)
  Hybrid nanostructures between biomolecules and inorganic nanomaterials constitute a largely unexplored field of research, with the potential for novel applications in bioimaging, biosensing, and nanomedicine. Developing ...
- Molecular docking and oxidation kinetics of 3-phenyl coumarin derivatives by human CYP2A13
  Juvonen, Risto O.; Jokinen, Elmeri M.; Huuskonen, Juhani; Kärkkäinen, Olli; Raunio, Hannu; Pentikäinen, Olli T. (Informa Healthcare, 2021)
  1. CYP2A13 enzyme is expressed in human extrahepatic tissues, while CYP2A6 is a hepatic enzyme. Reactions catalyzed by CYP2A13 activate tobacco-specific nitrosamines and some other toxic xenobiotics in lungs. 2. To compare ...
- Firefront Forecasting in Boreal Forests : Machine Learning Approach to Predict Wildfire Propagation
  Raita-Hakola, Anna-Maria; Pölönen, Ilkka (Copernicus GmbH, 2024)
  Wildfires have become increasingly prevalent worldwide due to climate change, posing significant threats to human lives, property, and natural ecosystems. The rapid progression of wildfires necessitates predictive computational ...
- Predicting Children's Myopia Risk : A Monte Carlo Approach to Compare the Performance of Machine Learning Models
  Artiemjew, Piotr; Cybulski, Radosław; Emamian, Mohammad; Grzybowski, Andrzej; Jankowski, Andrzej; Lanca, Carla; Mehravaran, Shiva; Młyński, Marcin; Morawski, Cezary; Nordhausen, Klaus; Pärssinen, Olavi; Ropiak, Krzysztof (SCITEPRESS Science and Technology Publications, 2024)
  This study presents the initial results of the Myopia Risk Calculator (MRC) Consortium, introducing an innovative approach to predict myopia risk by using trustworthy machine-learning models. The dataset included approximately ...
- Lipid monitoring of Chlorella vulgaris using non-invasive near-infrared spectral imaging
  Pääkkönen, Salli; Pölönen, Ilkka; Calderini, Marco; Yli-Tuomola, Aliisa; Ruokolainen, Visa; Vihinen-Ranta, Maija; Salmi, Pauliina (Springer Nature, 2024)
  Microalgal lipids are molecules of biotechnological interest for their application in sustainable food and energy production. However, lipid production is challenged by the time-consuming and laborious monitoring of lipid ...