Increasing FAIRness of FAIRagro data through AI supported metadata enrichment
This use case pilot explores how AI-based methods can support metadata enrichment in order to improve the FAIRness of data in agrosystems research.
Partner:
Informationszentrum Lebenswissenschaften (ZB MED)
Description
High-quality metadata is essential for the FAIR principles. As research data management (RDM) grows in importance, stakeholders increasingly recognize the need for meaningful metadata and for automated processes to generate it. These advancements help ensure the FAIRness of future metadata but do not address legacy metadata, which was collected without standardized practices.
FAIRagro is developing a metadata schema that harmonizes the heterogeneous metadata of the participating Research Data Infrastructures (RDIs) to enable a central search, increasing the Findability of agrosystem resources. To make the transformation of legacy RDI metadata and its integration into the FAIRagro Central Search Service as efficient as possible, this pilot project tests how well state-of-the-art AI-based text mining techniques, e.g. deep learning models and few-shot learning, can automatically extract relevant information from unstructured data (e.g. dataset abstracts, related publications, or the data itself) using Named Entity Recognition (NER). These tasks involve identifying both general and agrosystem domain-specific entities and relations. The goal of this pilot is to extract information on two core entities (Crops, Soil) from two FAIRagro RDIs (OpenAgrar, BonaRes Repository) and make it available in a structured way. Furthermore, it evaluates whether text mining is a viable method for enriching metadata to fit the schema developed to power the Central Search Service.
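To make the idea of turning unstructured text into structured metadata concrete, the following minimal sketch uses dictionary-based (gazetteer) entity matching. This is a deliberately simple stand-in for the deep learning and few-shot NER models the pilot evaluates, not the pilot's actual method. The vocabularies, the `Entity` class, the function names, the sample abstract, and the output field names are all hypothetical and are not taken from the FAIRagro schema.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-vocabularies (assumption); a real system would rely on
# a curated thesaurus or a trained NER model instead.
GAZETTEER = {
    "Crop": ["winter wheat", "wheat", "maize", "barley"],
    "Soil": ["loess", "sandy loam", "chernozem"],
}


@dataclass(frozen=True)
class Entity:
    label: str  # entity type, e.g. "Crop" or "Soil"
    text: str   # matched surface form as it appears in the input
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


def extract_entities(abstract):
    """Return non-overlapping gazetteer matches, preferring longer terms."""
    candidates = []
    for label, terms in GAZETTEER.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, abstract, flags=re.IGNORECASE):
                candidates.append(Entity(label, m.group(0), m.start(), m.end()))
    # Consider longer matches first so "winter wheat" wins over the nested "wheat".
    candidates.sort(key=lambda e: (-(e.end - e.start), e.start))
    chosen = []
    for cand in candidates:
        if all(cand.end <= c.start or cand.start >= c.end for c in chosen):
            chosen.append(cand)
    return sorted(chosen, key=lambda e: e.start)


def to_metadata(entities):
    """Group matched terms by entity type into a structured record."""
    record = {}
    for e in entities:
        values = record.setdefault(e.label, [])
        term = e.text.lower()
        if term not in values:
            values.append(term)
    return record


# Invented sample text, used only for illustration.
abstract = (
    "Yield of winter wheat and maize was measured on a sandy loam; "
    "wheat straw was removed."
)
metadata = to_metadata(extract_entities(abstract))
print(metadata)
```

A dictionary lookup like this only finds terms that are already listed and cannot resolve ambiguous mentions or relations between entities, which is one reason learned NER models are of interest for this task.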
The outcome will show how far AI methods are ready to make agrosystem resources FAIRer and to assist participating RDIs in extending their provided metadata in a resource-efficient way. If the pilot is successful, further development could extend this support to all FAIRagro infrastructures or even to other domains, opening up possibilities such as future collaboration across NFDI consortia.
More Use Cases
Use case 1: Exploiting genotype × location × year × management interactions for sustainable crop production
Use case 2: Assessing tradeoffs for optimal crop nitrogen management
Use case 3: Streamlining pest and disease data to advance integrated pest management
Use case 4: Learning from incomplete data
Use case 5: Noninvasive phenotyping with autonomous robots
Use case 6: Automated data flows for crop simulation models
Use Case 7: Next Generation Environmental and eXtended Tools for Extreme Events & Plant Resilience Assessment (NEXT-Gen-EXPERT)
Use Case (Pilot) 8: Systematic Approaches for Efficient Data Synchronization in Horticultural Sciences (HortSEEDS)
Use Case (Pilot) 9: Increasing FAIRness of FAIRagro data through AI supported metadata enrichment