Making Sense of Bureaucratic Documents : Named Entity Recognition for State Authority Archives
Poso, V., Lipsanen, M., Toivanen, I., & Välisalo, T. (2024). Making Sense of Bureaucratic Documents : Named Entity Recognition for State Authority Archives. In Archiving 2024 Final Program and Proceedings (pp. 6-10). Society for Imaging Science & Technology. Archiving, 21. https://doi.org/10.2352/issn.2168-3204.2024.21.1.2
Published in
Archiving
Date
2024
Copyright
© Authors 2024
The usability and accessibility of digitised archival data can be improved using deep learning solutions. In this paper, the authors present their work in developing a named entity recognition (NER) model for digitised archival data, specifically state authority documents. The entities for the model were chosen based on surveying different user groups. In addition to common entities, two new entities were created to identify businesses (FIBC) and archival documents (JON). The NER model was trained by fine-tuning an existing Finnish BERT model. The training data also included modern digitally born texts to achieve good performance with various types of inputs. The finished model performs fairly well with OCR-processed data, achieving an overall F1 score of 0.868, and particularly well with the new entities (F1 scores of 0.89 and 0.97 for JON and FIBC, respectively).
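The abstract reports F1 scores but does not say how they were computed. As a minimal sketch, assuming the common entity-level convention in which a predicted entity counts as correct only when both its type and its exact token span match the annotation, the score could be computed as follows. The entity labels JON and FIBC come from the abstract; the BIO tagging scheme, the function names, and the toy tag sequences are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence.

    Assumption: an I- tag that does not continue a span of the same
    type is ignored rather than treated as the start of a new entity.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current entity
        if label is not None:
            spans.add((label, start, i))  # close the open entity
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]  # open a new entity
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 over a list of tagged sentences."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = extract_spans(gold_tags), extract_spans(pred_tags)
        tp += len(g & p)  # exact type-and-span matches
        fp += len(p - g)  # predicted but not annotated
        fn += len(g - p)  # annotated but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two sentences, three annotated entities.
gold = [["B-JON", "I-JON", "O", "B-FIBC"], ["O", "B-FIBC", "I-FIBC"]]
pred = [["B-JON", "I-JON", "O", "O"], ["O", "B-FIBC", "I-FIBC"]]
score = entity_f1(gold, pred)
```

In this toy case the prediction finds two of the three annotated entities and makes no false predictions, so precision is 1.0, recall is 2/3, and the F1 score is 0.8. Per-entity scores such as those reported for JON and FIBC would follow from the same counts restricted to one entity type.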
Publisher
Society for Imaging Science & Technology
Parent publication ISBN
978-0-89208-366-2
Conference
Archiving Conference
Is part of publication
Archiving 2024 Final Program and Proceedings
ISSN
2161-8798
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/243581273
Related funder(s)
Research Council of Finland
Funding program(s)
Research infrastructures, AoF
Related items
Showing items with similar title or keywords.
-
Untapped data resources : Applying NER for historical archival records of state authorities
Poso, Venla; Välisalo, Tanja; Toivanen, Ida; Holmila, Antero; Ojala, Jari (University of Oslo Library, 2023)
Archives around the world are digitising their material at a growing speed. The National Archives of Finland launched a mass digitisation process in 2019 aiming to digitise vast amounts of state authority archives. In order ...
-
Relaatiotietokanta valjastaa arkistot strategiatutkimukseen [A relational database harnesses archives for strategy research]
Cheung, Zeerim (Kansallisarkisto, 2020)
The article author's recent doctoral dissertation presents a new, analytically structured method of historical research, based on digitising large archival collections and analysing them from a theory-driven perspective ...
-
Kansallisarkisto kohti vuotta 2025 [The National Archives towards 2025]
Ojala, Jari (Kansallisarkisto, 2020)
-
Tulevaisuuden työkalu metsien suojeluarvon määritykseen? : boreaalisten puulajien tunnistus hyperspektrikuvauksen avulla [A future tool for determining the conservation value of forests? : identifying boreal tree species using hyperspectral imaging]
Kauniskangas, Laura (2022)
Identifying tree species from hyperspectral images, combined with current forest inventory, is considered a potential way to increase the cost-efficiency of forest management and to map forest conservation needs comprehensively. Research ...
-
Investigating document availability : ethnographic study on a technical specification
Kauppinen, Kaisa (1999)