dc.contributor.author: Zaretckii, Mark
dc.contributor.author: Buslaev, Pavel
dc.contributor.author: Kozlovskii, Igor
dc.contributor.author: Morozov, Alexander
dc.contributor.author: Popov, Petr
dc.date.accessioned: 2024-09-11T08:42:14Z
dc.date.available: 2024-09-11T08:42:14Z
dc.date.issued: 2024
dc.identifier.citation: Zaretckii, M., Buslaev, P., Kozlovskii, I., Morozov, A., & Popov, P. (2024). Approaching Optimal pH Enzyme Prediction with Large Language Models. ACS Synthetic Biology, Early online. https://doi.org/10.1021/acssynbio.4c00465
dc.identifier.other: CONVID_241748096
dc.identifier.uri: https://jyx.jyu.fi/handle/123456789/97023
dc.description.abstract: Enzymes are widely used in biotechnology due to their ability to catalyze chemical reactions: food production, laundry, pharmaceutics, textiles, and brewing all benefit from the use of various enzymes. Proton concentration (pH) is one of the key factors that determine enzyme function and efficiency. Usually an enzyme is active only within a narrow range of pH values. Designing an enzyme with optimal activity in a given pH range is therefore a common problem in biotechnology. A large part of this task can be completed in silico, by predicting the optimal pH of designed candidates. The success of such computational methods critically depends on the available data. In this study, we developed a language-model-based approach to predict the optimal pH range from the enzyme sequence. We used different splitting strategies based on sequence similarity, protein family annotation, and enzyme classification to validate the robustness of the proposed approach. The derived machine-learning models demonstrated high accuracy across proteins from different protein families and proteins with lower sequence similarities compared with the training set. The proposed method is fast enough for the high-throughput virtual exploration of protein space in the search for sequences with desired optimal pH levels. [en]
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.publisher: American Chemical Society
dc.relation.ispartofseries: ACS Synthetic Biology
dc.rights: CC BY 4.0
dc.subject.other: enzyme optimal pH
dc.subject.other: large language models
dc.subject.other: machine learning
dc.subject.other: protein engineering
dc.title: Approaching Optimal pH Enzyme Prediction with Large Language Models
dc.type: article
dc.identifier.urn: URN:NBN:fi:jyu-202409115903
dc.contributor.laitos: Kemian laitos [fi]
dc.contributor.laitos: Department of Chemistry [en]
dc.type.uri: http://purl.org/eprint/type/JournalArticle
dc.type.coar: http://purl.org/coar/resource_type/c_2df8fbb1
dc.description.reviewstatus: peerReviewed
dc.relation.issn: 2161-5063
dc.relation.volume: Early online
dc.type.version: publishedVersion
dc.rights.copyright: © The Authors. Published by American Chemical Society
dc.rights.accesslevel: openAccess [fi]
dc.relation.grantnumber: 342908
dc.subject.yso: laskennallinen kemia
dc.subject.yso: pH
dc.subject.yso: entsyymit
dc.subject.yso: biotekniikka
dc.subject.yso: koneoppiminen
dc.subject.yso: in silico -menetelmä
dc.subject.yso: kielimallit
dc.format.content: fulltext
jyx.subject.uri: http://www.yso.fi/onto/yso/p23053
jyx.subject.uri: http://www.yso.fi/onto/yso/p4555
jyx.subject.uri: http://www.yso.fi/onto/yso/p4769
jyx.subject.uri: http://www.yso.fi/onto/yso/p2348
jyx.subject.uri: http://www.yso.fi/onto/yso/p21846
jyx.subject.uri: http://www.yso.fi/onto/yso/p28353
jyx.subject.uri: http://www.yso.fi/onto/yso/p40335
dc.rights.url: https://creativecommons.org/licenses/by/4.0/
dc.relation.doi: 10.1021/acssynbio.4c00465
dc.relation.funder: Research Council of Finland [en]
dc.relation.funder: Suomen Akatemia [fi]
jyx.fundingprogram: Postdoctoral Researcher, AoF [en]
jyx.fundingprogram: Tutkijatohtori, SA [fi]
jyx.fundinginformation: P.B. was supported by the Academy of Finland (grant 342908). A.M. was supported by the Russian Science Foundation (RSF-22–74–10098).
dc.type.okm: A1



Except where otherwise noted, this item's license is described as CC BY 4.0