dc.contributor.author: Poso, Venla
dc.contributor.author: Lipsanen, Mikko
dc.contributor.author: Toivanen, Ida
dc.contributor.author: Välisalo, Tanja
dc.date.accessioned: 2024-10-22T11:21:55Z
dc.date.available: 2024-10-22T11:21:55Z
dc.date.issued: 2024
dc.identifier.citation: Poso, V., Lipsanen, M., Toivanen, I., & Välisalo, T. (2024). Making Sense of Bureaucratic Documents : Named Entity Recognition for State Authority Archives. In Archiving 2024 Final Program and Proceedings (pp. 6-10). Society for Imaging Science & Technology. Archiving, 21. https://doi.org/10.2352/issn.2168-3204.2024.21.1.2
dc.identifier.other: CONVID_243581273
dc.identifier.uri: https://jyx.jyu.fi/handle/123456789/97591
dc.description.abstract: The usability and accessibility of digitised archival data can be improved using deep learning solutions. In this paper, the authors present their work in developing a named entity recognition (NER) model for digitised archival data, specifically state authority documents. The entities for the model were chosen based on surveying different user groups. In addition to common entities, two new entities were created to identify businesses (FIBC) and archival documents (JON). The NER model was trained by fine-tuning an existing Finnish BERT model. The training data also included modern digitally born texts to achieve good performance with various types of inputs. The finished model performs fairly well with OCR-processed data, achieving an overall F1 score of 0.868, and particularly well with the new entities (F1 scores of 0.89 and 0.97 for JON and FIBC, respectively). [en]
dc.format.extent: 106
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.publisher: Society for Imaging Science & Technology
dc.relation.ispartof: Archiving 2024 Final Program and Proceedings
dc.relation.ispartofseries: Archiving
dc.rights: CC BY 4.0
dc.title: Making Sense of Bureaucratic Documents : Named Entity Recognition for State Authority Archives
dc.type: conferenceObject
dc.identifier.urn: URN:NBN:fi:jyu-202410226451
dc.contributor.laitos: Musiikin, taiteen ja kulttuurin tutkimuksen laitos [fi]
dc.contributor.laitos: Informaatioteknologian tiedekunta [fi]
dc.contributor.laitos: Department of Music, Art and Culture Studies [en]
dc.contributor.laitos: Faculty of Information Technology [en]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper
dc.relation.isbn: 978-0-89208-366-2
dc.type.coar: http://purl.org/coar/resource_type/c_5794
dc.description.reviewstatus: nonPeerReviewed
dc.format.pagerange: 6-10
dc.relation.issn: 2161-8798
dc.type.version: publishedVersion
dc.rights.copyright: © Authors 2024
dc.rights.accesslevel: openAccess [fi]
dc.relation.conference: Archiving Conference
dc.relation.grantnumber: 358726
dc.subject.yso: koneoppiminen (machine learning)
dc.subject.yso: arkistoaineistot (archival materials)
dc.subject.yso: nimien tunnistus (named entity recognition)
dc.subject.yso: tekstintunnistus (text recognition)
dc.subject.yso: digitointi (digitisation)
dc.subject.yso: valtionarkistot (state archives)
dc.format.content: fulltext
jyx.subject.uri: http://www.yso.fi/onto/yso/p21846
jyx.subject.uri: http://www.yso.fi/onto/yso/p28616
jyx.subject.uri: http://www.yso.fi/onto/yso/p38590
jyx.subject.uri: http://www.yso.fi/onto/yso/p5825
jyx.subject.uri: http://www.yso.fi/onto/yso/p23839
jyx.subject.uri: http://www.yso.fi/onto/yso/p23829
dc.rights.url: https://creativecommons.org/licenses/by/4.0/
dc.relation.doi: 10.2352/issn.2168-3204.2024.21.1.2
dc.relation.funder: Research Council of Finland [en]
dc.relation.funder: Suomen Akatemia [fi]
jyx.fundingprogram: Research infrastructures, AoF [en]
jyx.fundingprogram: Tutkimusinfrastruktuuri, SA [fi]
dc.type.okm: B3


Except where otherwise noted, this item's license is described as CC BY 4.0