This use case pilot explores how AI-based methods can support metadata enrichment in order to improve the FAIRness of data in agrosystems research.

Partners

Informationszentrum Lebenswissenschaften (ZB MED)

This team from our partner institution is working on this use case (link to German page).

Background

High-quality metadata is essential for implementing the FAIR principles. As research data management (RDM) becomes crucial, stakeholders recognize the need for meaningful metadata and for automated processes to generate it. These advancements help ensure the FAIRness of future metadata, but they do not address legacy metadata, which was collected without standardized practices.

FAIRagro is developing a metadata schema that harmonizes the heterogeneous metadata of participating Research Data Infrastructures (RDIs) to enable a central search and increase the Findability of agrosystem resources. To make the transformation of legacy RDI metadata and its integration into the FAIRagro Central Search Service as efficient as possible, this pilot tests how well state-of-the-art AI-based text mining techniques (e.g., deep learning models and few-shot learning) can automatically extract relevant information from unstructured data such as dataset abstracts, related publications, or the data itself. The extraction relies on Named Entity Recognition (NER) and involves identifying both general and agrosystem-specific entities and relations. The goal of the pilot is to extract information on two core entities (Crops, Soil) from two FAIRagro RDIs (OpenAgrar, BonaRes Repository) and make it available in a structured form. The pilot also evaluates whether text mining is a viable method for enriching metadata according to the schema developed to power the Central Search Service.
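To illustrate what "extracting entities and making them available in a structured way" can look like, here is a minimal sketch in Python. It is not the pilot's method: instead of a deep learning or few-shot NER model it uses a simple dictionary (gazetteer) lookup, which is a common baseline. The term lists, the Entity class, and the function names are hypothetical and chosen only for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-gazetteers, for illustration only. A real system would
# rely on a curated vocabulary or a trained NER model instead.
GAZETTEER = {
    "Crop": ["winter wheat", "wheat", "maize", "barley"],
    "Soil": ["loamy sand", "chernozem", "luvisol"],
}


@dataclass(frozen=True)
class Entity:
    label: str   # entity type, e.g. "Crop" or "Soil"
    text: str    # matched surface form as it appears in the input
    start: int   # character offset where the match begins
    end: int     # character offset just past the match


def extract_entities(text):
    """Find gazetteer terms in text, keeping the longest non-overlapping matches."""
    candidates = []
    for label, terms in GAZETTEER.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                candidates.append(Entity(label, m.group(0), m.start(), m.end()))

    # Longest matches first, so "winter wheat" wins over the nested "wheat".
    candidates.sort(key=lambda e: (-(e.end - e.start), e.start))
    chosen = []
    for cand in candidates:
        if all(cand.end <= c.start or cand.start >= c.end for c in chosen):
            chosen.append(cand)
    return sorted(chosen, key=lambda e: e.start)


def to_metadata(entities):
    """Group unique, lower-cased entity strings by label as a structured record."""
    record = {}
    for ent in entities:
        values = record.setdefault(ent.label, [])
        term = ent.text.lower()
        if term not in values:
            values.append(term)
    return record


abstract = "Yield of winter wheat and maize on a Luvisol in a long-term field trial."
metadata = to_metadata(extract_entities(abstract))
print(metadata)
```

A lookup like this only finds terms that are already in the list. A learned NER model, as tested in the pilot, is meant to generalize beyond a fixed vocabulary and to also capture relations between entities.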

The outcome will show to what extent AI methods are ready to make agrosystem resources FAIRer and to help participating RDIs extend the metadata they provide in a resource-efficient way. If the pilot is successful, the approach can be developed further to support metadata extension in all FAIRagro infrastructures or even in other domains, opening up possibilities for future collaboration, e.g. across NFDI consortia.

Progress & next steps

UC Update: Increasing FAIRness of FAIRagro data through AI supported metadata enrichment

The UC8 team led by Juliane Fluck presents how AI-supported approaches are used to enhance the quality and completeness of metadata within FAIRagro research data infrastructures. A manually annotated training corpus was created to enable automated extraction of crop and soil metadata. Although some annotation categories show lower consistency, the corpus remains valuable for model training, evaluation, and development. The overarching goal is to improve data discoverability through the FAIRagro Search Hub and RDIs and to support researchers in producing high-quality metadata.
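The update does not say how annotation consistency was measured. One common way to quantify it per category is Cohen's kappa between two annotators, which corrects raw agreement for agreement expected by chance. The sketch below assumes two annotators labelled the same sequence of tokens; the labels, the example data, and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def kappa_per_category(labels_a, labels_b):
    """Kappa for each category, treating it as a yes/no decision per item."""
    categories = sorted(set(labels_a) | set(labels_b))
    return {
        cat: cohen_kappa([lab == cat for lab in labels_a],
                         [lab == cat for lab in labels_b])
        for cat in categories
    }


# Hypothetical token-level labels from two annotators ("O" = no entity).
annotator_1 = ["Crop", "Crop", "Soil", "O"]
annotator_2 = ["Crop", "Soil", "Soil", "O"]
print(cohen_kappa(annotator_1, annotator_2))
print(kappa_per_category(annotator_1, annotator_2))
```

Categories with a noticeably lower kappa than the rest are candidates for revised annotation guidelines or for more cautious use in model evaluation.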

Increasing FAIRness of FAIRagro data through AI supported metadata enrichment

Abdelmalak, A., Fluck, J., Golz, L., Husain, M., Meier, K., Riegler, H., Schneider, G., Specka, X., Svoboda, N., & on behalf of the FAIRagro consortium. (2025). Increasing FAIRness of FAIRagro data through AI supported metadata enrichment. FAIRagro Plenary 2025, Julius Kühn-Institut (JKI) Bundesforschungsinstitut für Kulturpflanzen Königin-Luise-Straße 19, 14195, Berlin. Zenodo.
https://doi.org/10.5281/zenodo.17349978

Any questions about this use case?

Please contact Anne Sennhenn for further information.