TY - GEN
T1 - On Generating SHACL Shapes from Collective Collection of Plant Trait Data
AU - Saleh, Dadan Ridwan
AU - Kartika, Yulia Aris
AU - Akbar, Zaenal
AU - Krisnadhi, Adila Alfa
AU - Fatriasari, Widya
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/11/22
Y1 - 2022/11/22
N2 - Collective data collection has become common in various domains, including biodiversity science. Multiple individuals work on the same biological samples or specimens using various scientific tools to measure different characteristics. Moreover, the measurements are typically regulated by different data collection procedures and protocols. Integrating and guaranteeing the quality of the data has become a significant issue. One solution is to adopt the RDF (Resource Description Framework) data model in combination with a language for validating RDF graphs such as SHACL (Shapes Constraint Language). The RDF data model provides flexibility in accommodating multiple data schemas, while SHACL uses a set of conditions, so-called shapes, to validate the RDF data graphs. The remaining challenge is an effective method to define SHACL shapes that can be used to validate any given RDF data. This work introduces a semi-automatic database-driven solution to generate SHACL shapes. The solution relies on the database's internal structure and data items' values. The solution was applied to a traits database from natural fiber plants in Indonesia, where a high number of individual shapes were successfully generated. Furthermore, a qualitative evaluation indicated the appropriate quality of the shapes. This work contributes to increasing the quality of biodiversity data collections, which has become an essential factor in Big Biodiversity Data processing.
AB - Collective data collection has become common in various domains, including biodiversity science. Multiple individuals work on the same biological samples or specimens using various scientific tools to measure different characteristics. Moreover, the measurements are typically regulated by different data collection procedures and protocols. Integrating and guaranteeing the quality of the data has become a significant issue. One solution is to adopt the RDF (Resource Description Framework) data model in combination with a language for validating RDF graphs such as SHACL (Shapes Constraint Language). The RDF data model provides flexibility in accommodating multiple data schemas, while SHACL uses a set of conditions, so-called shapes, to validate the RDF data graphs. The remaining challenge is an effective method to define SHACL shapes that can be used to validate any given RDF data. This work introduces a semi-automatic database-driven solution to generate SHACL shapes. The solution relies on the database's internal structure and data items' values. The solution was applied to a traits database from natural fiber plants in Indonesia, where a high number of individual shapes were successfully generated. Furthermore, a qualitative evaluation indicated the appropriate quality of the shapes. This work contributes to increasing the quality of biodiversity data collections, which has become an essential factor in Big Biodiversity Data processing.
KW - biodiversity
KW - collective data
KW - plant trait data
KW - SHACL
UR - http://www.scopus.com/inward/record.url?scp=85149381281&partnerID=8YFLogxK
U2 - 10.1145/3575882.3575945
DO - 10.1145/3575882.3575945
M3 - Conference contribution
AN - SCOPUS:85149381281
T3 - ACM International Conference Proceeding Series
SP - 326
EP - 330
BT - Proceeding - 2022 9th International Conference on Computer, Control, Informatics and Its Applications
PB - Association for Computing Machinery
T2 - 9th International Conference on Computer, Control, Informatics and Its Applications: Digital Transformation Towards Sustainable Society for Post Covid-19 Recovery, IC3INA 2022
Y2 - 22 November 2022 through 23 November 2022
ER -