Increasing FAIRness of FAIRagro data through AI-supported metadata enrichment
This use case pilot explores how AI-based methods can support metadata enrichment in order to improve the FAIRness of data in agrosystems research.
Partner:
Informationszentrum Lebenswissenschaften (ZB MED)
Description
High-quality metadata is essential for the FAIR principles. As research data management (RDM) becomes crucial, stakeholders recognize the need for meaningful metadata and automated generation processes. These advancements ensure future metadata FAIRness but do not address legacy metadata, which lacks standardized collection practices.
FAIRagro is developing a metadata schema that harmonizes the heterogeneous metadata of the participating Research Data Infrastructures (RDIs) to enable a central search, increasing the Findability of agrosystem resources. To make the transformation of legacy RDI metadata and its integration into the FAIRagro Central Search Service as efficient as possible, this pilot project tests how well state-of-the-art AI-based text mining techniques, e.g. deep learning models and few-shot learning, can automatically extract relevant information from unstructured data (e.g. dataset abstracts, related publications, the data itself) using Named Entity Recognition (NER). These tasks involve identifying both general and agrosystem domain-specific entities and relations. The goal of this pilot is to extract information on two core entities (Crops, Soil) from two FAIRagro RDIs (OpenAgrar, BonaRes Repository) and make it available in a structured way. Furthermore, it evaluates whether text mining is a viable method for enriching metadata to fit the schema developed to power the Central Search Service.
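To illustrate what "extracting entities and making them available in a structured way" could look like, here is a minimal sketch. It is not the pilot's method: it replaces the deep learning and few-shot models mentioned above with a plain dictionary (gazetteer) lookup, and the term lists, the field names `Crop` and `Soil`, and the sample abstract are all hypothetical placeholders rather than parts of the FAIRagro schema.

```python
import re
from collections import defaultdict

# Hypothetical mini-gazetteer (assumption, for illustration only).
# A real system would use trained NER models or curated vocabularies.
GAZETTEER = {
    "Crop": ["winter wheat", "wheat", "maize", "barley"],
    "Soil": ["luvisol", "chernozem", "sandy loam"],
}


def build_pattern(terms):
    # Longer terms first, so a multi-word term is preferred over a
    # shorter term that starts at the same position.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


PATTERNS = {label: build_pattern(terms) for label, terms in GAZETTEER.items()}


def extract_entities(text):
    """Return entity mentions with label and character offsets."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append({
                "label": label,
                "text": match.group(0),
                "start": match.start(),
                "end": match.end(),
            })
    return sorted(entities, key=lambda entity: entity["start"])


def to_metadata(entities):
    """Group normalized mentions by label into structured fields."""
    fields = defaultdict(set)
    for entity in entities:
        fields[entity["label"]].add(entity["text"].lower())
    return {label: sorted(values) for label, values in fields.items()}


# Hypothetical abstract text, not taken from any repository.
abstract = "Yield of winter wheat and maize was measured on a Luvisol in 2019."
print(to_metadata(extract_entities(abstract)))
```

For the sample abstract, the sketch groups "winter wheat" and "maize" under `Crop` and "luvisol" under `Soil`. A learned NER model would replace `extract_entities` while the grouping step into schema fields could stay largely the same.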
The outcome will show to what extent AI methods are ready to make agrosystem resources FAIRer and to help participating RDIs extend the metadata they provide in a resource-efficient way. If the pilot is successful, further developments can support metadata extension for all FAIRagro infrastructures or even for other domains, opening up possibilities such as future collaboration across NFDI consortia.