AUTOMATIC GENERATION OF TEMPORAL DATA PROVENANCE FROM BIODIVERSITY INFORMATION SYSTEMS

Zaenal Akbar, Dadan R. Saleh, Yulia Aris Kartika, Widya Fatriasari, Adilla A. Krisnadhi, Deded Sarip Nawawi

Research output: Contribution to journal › Article › peer-review

Abstract

Aim/Purpose: Although the significance of data provenance has been recognized in a variety of sectors, there is currently no standardized technique or approach for gathering it. Existing automated techniques mostly employ workflow-based strategies. Unfortunately, the majority of current information systems do not adopt such strategies, particularly biodiversity information systems, in which data are acquired by many people using a wide range of equipment, tools, and protocols.

Background: This article presents an automated technique for producing temporal data provenance that is independent of biodiversity information systems. The approach relies on changes in the contextual information of data items. By mapping these changes to a schema, a standardized representation of data provenance can be created, from which temporal information can be inferred automatically.

Methodology: The research methodology consists of three main activities: database event detection, event-schema mapping, and temporal information inference. First, a list of events is detected from databases. The detected events are then mapped to an ontology to obtain a common representation of data provenance. Finally, rule-based reasoning is applied automatically to the derived data provenance to infer temporal information, producing a temporal provenance.

Contribution: This paper provides a new method for generating data provenance automatically without interfering with the existing biodiversity information system. In addition, it does not require an information system to adhere to any particular form. Ontology and the rule-based system, the core components of the solution, have been confirmed to be highly valuable in biodiversity science.

Findings: Detaching the solution from any biodiversity information system provides scalability in the implementation. Based on the evaluation of a typical biodiversity information system for species traits of plants, a large amount of temporal information can be generated. Using rules to encode different types of knowledge provides high flexibility in generating temporal information, enabling different temporal-based analyses and reasoning.

Recommendations for Practitioners: The strategy is based on the contextual information of data items, yet most information systems save only the most recent values. As a result, database snapshots must be stored frequently for the solution to function properly. Furthermore, a more practical technique for recording changes in contextual information would be preferable.

Recommendations for Researchers: The capability to uniformly represent events using a schema has paved the way for automatic inference of temporal information. Therefore, a richer representation of temporal information should be investigated further. This work also demonstrates that rule-based inference provides the flexibility to encode different types of knowledge from experts, so a variety of temporal-based data analyses and reasoning can be performed. It would therefore be worthwhile to investigate multiple kinds of domain-oriented knowledge using the solution.

Impact on Society: Using a typical information system to store and manage biodiversity data has not prevented us from generating data provenance. Since there is no restriction on the type of information system, our solution has a high potential to be widely adopted.

Future Research: The data analysis in this work was limited to species traits data. However, there are other types of biodiversity data, including genetic composition, species population, and community composition. In the future, this work will be expanded to cover all those types of biodiversity data. The ultimate goal is to have a standard methodology or strategy for collecting provenance from any biodiversity data, regardless of how the data were stored or managed.
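
The three activities named in the methodology (event detection, event-schema mapping, and temporal inference) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the snapshot layout, the activity labels (Creation, Modification, Deletion), the record name "trait-1", and the single "before" rule are all assumptions made for illustration, and the abstract does not name the ontology or the rule language actually used.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Event:
    record_id: str
    kind: str               # "insert", "update", or "delete"
    observed_at: datetime   # time of the snapshot in which the change was seen


def detect_events(old: dict, new: dict, observed_at: datetime) -> list:
    """Activity 1 (assumed form): diff two database snapshots keyed by record id."""
    events = []
    for rid in sorted(old.keys() | new.keys()):
        if rid not in old:
            events.append(Event(rid, "insert", observed_at))
        elif rid not in new:
            events.append(Event(rid, "delete", observed_at))
        elif old[rid] != new[rid]:
            events.append(Event(rid, "update", observed_at))
    return events


# Hypothetical schema terms; the real ontology is not given in the abstract.
ACTIVITY_TERMS = {"insert": "Creation", "update": "Modification", "delete": "Deletion"}


def map_to_schema(event: Event) -> dict:
    """Activity 2 (assumed form): express an event in a common provenance shape."""
    return {
        "entity": event.record_id,
        "activity": ACTIVITY_TERMS[event.kind],
        "time": event.observed_at,
    }


def infer_before(records: list) -> list:
    """Activity 3 (assumed rule): for the same entity, an earlier activity is 'before' a later one."""
    facts = []
    ordered = sorted(records, key=lambda r: r["time"])
    for a, b in combinations(ordered, 2):
        if a["entity"] == b["entity"] and a["time"] < b["time"]:
            facts.append((a["activity"], "before", b["activity"], a["entity"]))
    return facts


# Usage with three made-up snapshots of a species-trait table.
s0 = {}
s1 = {"trait-1": {"leaf_length_cm": "12"}}
s2 = {"trait-1": {"leaf_length_cm": "13"}}
t1, t2 = datetime(2022, 1, 1), datetime(2022, 2, 1)

events = detect_events(s0, s1, t1) + detect_events(s1, s2, t2)
records = [map_to_schema(e) for e in events]
facts = infer_before(records)
```

Because the sketch only reads stored snapshots, it runs outside the information system itself, which mirrors the abstract's point that the approach does not interfere with the existing system but does depend on snapshots being kept.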

Original language: English
Pages (from-to): 361-385
Number of pages: 25
Journal: Interdisciplinary Journal of Information, Knowledge, and Management
Volume: 17
Publication status: Published - 2022

Keywords

  • biodiversity
  • ontology
  • rule-based reasoning
  • temporal data provenance
