Semantic annotation and big data techniques for patent information processing

Abstract

This thesis analyzes approaches for generating semantic annotations on patent records, as well as on other structured data, by relying on the structure and semantic representation of documents. Information in patent records reflects how real-world technologies evolve, and the approximately 3 million new patent applications filed each year capture the global inventive frontier. The volume of this information is too large to be analyzed effectively by human effort alone, which necessitates Big data approaches that analyze it with computer-aided tools and techniques. Big data is a term that describes a volume of structured, semi-structured and unstructured data so large that it is difficult to process using traditional database and software tools and techniques. Currently, technical information, such as patents, is typically stored in data repositories that do not support advanced Big data methods to structure and interpret documents. In the emerging field of Semantic technology, annotation, Web search, interpretation and aggregation can be addressed by ontology-based semantic annotation. This thesis examines semantic annotation and other Big data methodologies and their basic requirements, and reviews the current generation of semantic annotation and other Big data systems. As a use case, this thesis demonstrates how semantic annotation and other Big data techniques are employed to enhance the human processes whereby people retrieve information and carry out analysis or discovery within a large collection of patent information.

Main Author

Mwakyusa, Phesto Enock

Format

Theses Master thesis

Published

2017

Subjects

semantic annotation (semanttinen annotointi)

Data Mining

Semantic annotation

Patent information

big data

data mining (tiedonlouhinta)

patents (patentit)

annotation (annotointi)

The permanent address of the publication

https://urn.fi/URN:NBN:fi:jyu-201710234047

Language

English

Similar Items