dc.contributor.author | Poso, Venla | |
dc.contributor.author | Lipsanen, Mikko | |
dc.contributor.author | Toivanen, Ida | |
dc.contributor.author | Välisalo, Tanja | |
dc.date.accessioned | 2024-10-22T11:21:55Z | |
dc.date.available | 2024-10-22T11:21:55Z | |
dc.date.issued | 2024 | |
dc.identifier.citation | Poso, V., Lipsanen, M., Toivanen, I., & Välisalo, T. (2024). Making Sense of Bureaucratic Documents : Named Entity Recognition for State Authority Archives. In Archiving 2024 Final Program and Proceedings (pp. 6-10). Society for Imaging Science & Technology. Archiving, 21. https://doi.org/10.2352/issn.2168-3204.2024.21.1.2 | |
dc.identifier.other | CONVID_243581273 | |
dc.identifier.uri | https://jyx.jyu.fi/handle/123456789/97591 | |
dc.description.abstract | The usability and accessibility of digitised archival data can be improved using deep learning solutions. In this paper, the authors present their work in developing a named entity recognition (NER) model for digitised archival data, specifically state authority documents. The entities for the model were chosen based on surveying different user groups. In addition to common entities, two new entities were created to identify businesses (FIBC) and archival documents (JON). The NER model was trained by fine-tuning an existing Finnish BERT model. The training data also included modern digitally born texts to achieve good performance with various types of inputs. The finished model performs fairly well with OCR-processed data, achieving an overall F1 score of 0.868, and particularly well with the new entities (F1 scores of 0.89 and 0.97 for JON and FIBC, respectively). | en |
dc.format.extent | 106 | |
dc.format.mimetype | application/pdf | |
dc.language.iso | eng | |
dc.publisher | Society for Imaging Science & Technology | |
dc.relation.ispartof | Archiving 2024 Final Program and Proceedings | |
dc.relation.ispartofseries | Archiving | |
dc.rights | CC BY 4.0 | |
dc.title | Making Sense of Bureaucratic Documents : Named Entity Recognition for State Authority Archives | |
dc.type | conferenceObject | |
dc.identifier.urn | URN:NBN:fi:jyu-202410226451 | |
dc.contributor.laitos | Musiikin, taiteen ja kulttuurin tutkimuksen laitos | fi |
dc.contributor.laitos | Informaatioteknologian tiedekunta | fi |
dc.contributor.laitos | Department of Music, Art and Culture Studies | en |
dc.contributor.laitos | Faculty of Information Technology | en |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | |
dc.relation.isbn | 978-0-89208-366-2 | |
dc.type.coar | http://purl.org/coar/resource_type/c_5794 | |
dc.description.reviewstatus | nonPeerReviewed | |
dc.format.pagerange | 6-10 | |
dc.relation.issn | 2161-8798 | |
dc.type.version | publishedVersion | |
dc.rights.copyright | © Authors 2024 | |
dc.rights.accesslevel | openAccess | fi |
dc.relation.conference | Archiving Conference | |
dc.relation.grantnumber | 358726 | |
dc.subject.yso | machine learning | |
dc.subject.yso | archival materials | |
dc.subject.yso | named entity recognition | |
dc.subject.yso | text recognition | |
dc.subject.yso | digitisation | |
dc.subject.yso | state archives | |
dc.format.content | fulltext | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p21846 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p28616 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p38590 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p5825 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p23839 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p23829 | |
dc.rights.url | https://creativecommons.org/licenses/by/4.0/ | |
dc.relation.doi | 10.2352/issn.2168-3204.2024.21.1.2 | |
dc.relation.funder | Research Council of Finland | en |
dc.relation.funder | Suomen Akatemia | fi |
jyx.fundingprogram | Research infrastructures, AoF | en |
jyx.fundingprogram | Tutkimusinfrastruktuuri, SA | fi |
dc.type.okm | B3 | |