Data Quality Tool

Sven Gedicke explains the Data Quality Tool

Task Area 3 (TA3) “Standardization, Interoperability and Quality” primarily works on facilitating the reuse, quality control and annotation of research data. The main prerequisites for this are practicable guidelines as well as standards for quality-related and legal metadata.

One of the six working areas of TA3 is “Data Quality Annotation, Curation and Feedback/Review” – within which Sven Gedicke from the University of Bonn and his colleagues are developing the topic “Light-weight algorithms for in-field data quality assessment in agricultural science”.

Sven Gedicke, FAIRagro, Uni Bonn

Sven Gedicke (SG) explains what this is all about and what the advantages are:

You are developing an interface for evaluating the quality of data “on site”, i.e. in the field. How should I imagine this in concrete terms?

SG: Agricultural scientists already frequently use mobile devices such as smartphones or tablets to collect data in the field. Simple apps often replace the analog field book and facilitate the documentation of data collection. We are building on this established working method and extending it with a direct analysis of the quality of the data that has just been recorded. Our toolbox is therefore developed as a browser-based application so that it can be used flexibly on all devices and operating systems. Researchers should be able to open the application on the devices they already use and display various quality metrics for the data they have just recorded, for example on plausibility or anomalies in the data.
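To make the idea of an in-field plausibility check concrete, here is a minimal sketch in TypeScript (the natural choice for a browser-based toolbox). The field names, rules and messages are purely illustrative assumptions, not the project's actual schema or API:

```typescript
// Illustrative sketch: a range-based plausibility check for freshly
// recorded field values. Rule and field names are hypothetical.

interface PlausibilityRule {
  field: string; // name of the recorded variable, e.g. "soilMoisture"
  min: number;   // lower bound of the plausible range
  max: number;   // upper bound of the plausible range
}

interface QualityFlag {
  field: string;
  value: number;
  message: string;
}

// Check one recorded data point against simple range rules and
// return human-readable flags for anything that looks implausible.
function checkPlausibility(
  record: Record<string, number>,
  rules: PlausibilityRule[]
): QualityFlag[] {
  const flags: QualityFlag[] = [];
  for (const rule of rules) {
    const value = record[rule.field];
    if (value === undefined) {
      flags.push({ field: rule.field, value: NaN, message: "missing value" });
    } else if (value < rule.min || value > rule.max) {
      flags.push({
        field: rule.field,
        value,
        message: `outside plausible range [${rule.min}, ${rule.max}]`,
      });
    }
  }
  return flags;
}
```

A check like this could run the moment a value is entered, so the researcher sees a flag before leaving the plot.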

What are the advantages of this development for the researcher?

SG: Our toolbox makes researchers aware of potential quality problems while they are still collecting data in the field, so that discrepancies or missing values can be identified and rectified immediately, for example by re-recording a conspicuous data point. This prevents problems from only becoming apparent later in the office, when corrections would be much more time-consuming or, in the worst case, a new field inspection would be necessary. Thanks to this early feedback, our solution saves time and resources and increases overall data quality.

Speaking of metadata and reusability of data – does this development also bring advantages in this area?

SG: Absolutely! In addition to supporting data collection, our toolbox also aims to improve the availability of quality-related metadata for published datasets. The additional calculation and documentation of quality metrics represents a considerable additional effort for data providers, which is why such information has often been missing from published datasets to date. By using our toolbox, this quality information can be automatically generated and saved as part of the data collection process. This means that it is also directly available for subsequent use as metadata on data quality. This significantly increases the transparency and reusability of the data and makes a targeted contribution to the implementation of the FAIR principles.
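To illustrate how quality metadata could fall out of the collection process automatically, here is a small TypeScript sketch that summarizes a recorded variable into a quality report. The metadata fields shown are my own illustration, not a FAIRagro metadata standard:

```typescript
// Illustrative sketch: deriving quality-related metadata as a by-product
// of data collection, so it can be published alongside the dataset.
// The metadata fields here are hypothetical examples.

interface QualityMetadata {
  variable: string;
  recordCount: number;
  missingCount: number;
  completeness: number;   // fraction of non-missing values, 0..1
  min: number | null;
  max: number | null;
}

function summarizeQuality(
  variable: string,
  values: (number | null)[]
): QualityMetadata {
  const present = values.filter((v): v is number => v !== null);
  return {
    variable,
    recordCount: values.length,
    missingCount: values.length - present.length,
    completeness: values.length === 0 ? 0 : present.length / values.length,
    min: present.length ? Math.min(...present) : null,
    max: present.length ? Math.max(...present) : null,
  };
}
```

A report like this could be serialized to JSON and attached to the published dataset, so reusers see completeness and value ranges without recomputing them.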

Have you found solutions to the challenge that mobile devices in the field often have no connection to a server and only limited storage space?

SG: As a stable internet connection is not always guaranteed in the field, we deliberately avoid server-side calculations and instead rely on local algorithms that can be executed directly on the mobile device. We attach particular importance to ensuring that the processes are memory-efficient and deliver results in real time despite the limited resources of mobile devices. This approach not only improves the accessibility and reliability of the application in the field, but also brings advantages in terms of data protection: all calculations are performed locally so that the data does not leave the device.
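One well-known way to compute statistics locally in constant memory, as the interview describes, is an online algorithm such as Welford's method: it maintains a running mean and variance without storing the full series, so each new measurement can be scored on the device in real time. This is a generic sketch of that technique, not the project's actual implementation:

```typescript
// Illustrative sketch: Welford's online algorithm keeps a running mean
// and variance in O(1) memory, so new field measurements can be scored
// immediately on the device without storing the whole series.

class RunningStats {
  private n = 0;
  private mean = 0;
  private m2 = 0; // sum of squared deviations from the running mean

  add(x: number): void {
    this.n += 1;
    const delta = x - this.mean;
    this.mean += delta / this.n;
    this.m2 += delta * (x - this.mean);
  }

  get count(): number { return this.n; }
  get average(): number { return this.mean; }
  get variance(): number { return this.n > 1 ? this.m2 / (this.n - 1) : 0; }

  // z-score of a candidate value against everything seen so far
  zScore(x: number): number {
    const sd = Math.sqrt(this.variance);
    return sd === 0 ? 0 : (x - this.mean) / sd;
  }
}
```

Because only three numbers are stored per variable, memory use stays constant no matter how long the measurement campaign runs.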

There are already institutionalized processes for data collection. Can your development be integrated there as a kind of supplement, or is it a completely new workflow?

SG: We are currently developing our toolbox as a standalone interface that can be used flexibly on any end device with browser access. However, our long-term goal is to integrate quality analysis directly into existing data collection applications, for example as a plugin or optional module. To achieve this, our algorithms would have to be made available as a modular library that can be seamlessly embedded in existing architectures.
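The modular-library idea could look roughly like this: each quality check sits behind a common interface, and a host data-collection app registers the checks it needs. Interface and names are my own illustration, not the project's actual API:

```typescript
// Illustrative sketch: quality checks as small plugins behind a common
// interface, so an existing data-collection app could embed them.
// All names here are hypothetical.

interface QualityCheck {
  name: string;
  run(values: number[]): string[]; // returns human-readable issues
}

class QualityToolbox {
  private checks: QualityCheck[] = [];

  register(check: QualityCheck): void {
    this.checks.push(check);
  }

  runAll(values: number[]): Map<string, string[]> {
    const report = new Map<string, string[]>();
    for (const check of this.checks) {
      report.set(check.name, check.run(values));
    }
    return report;
  }
}
```

A design like this keeps the algorithms independent of any particular host application, which is exactly what embedding as a plugin or optional module would require.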

In the video, you present the product at a relatively early stage of development. What is the time horizon until “market maturity”?

SG: As the heterogeneity of the data collected in agricultural science is enormous (from simple in-situ measurements to complex point clouds), we have deliberately opted for a bottom-up strategy. We are starting with simpler, widely used data types and then gradually developing additional functionalities based on the specific needs of the specialist community. We currently have a prototype implementation in which analysis functions for time series data have been implemented. Our aim is to provide a stable and fully functional version by the end of 2026 that can be used both in the field and for the subsequent use of published data. Accompanying documentation and video tutorials will be produced to make it easier to get started.
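For time series data, one simple analysis of the kind such a prototype might offer is a sliding-window spike check: flag points that deviate strongly from their recent predecessors. Window size and threshold below are illustrative assumptions, not the prototype's actual parameters:

```typescript
// Illustrative sketch: flag time-series points that deviate strongly
// from a short sliding window of preceding values. Parameters are
// hypothetical defaults.

function flagSpikes(
  series: number[],
  window = 5,
  threshold = 3
): number[] {
  const flagged: number[] = [];
  for (let i = window; i < series.length; i++) {
    const recent = series.slice(i - window, i);
    const mean = recent.reduce((a, b) => a + b, 0) / window;
    const sd = Math.sqrt(
      recent.reduce((a, b) => a + (b - mean) ** 2, 0) / window
    );
    // skip windows with zero spread; otherwise flag large z-scores
    if (sd > 0 && Math.abs(series[i] - mean) / sd > threshold) {
      flagged.push(i); // index of the suspicious point
    }
  }
  return flagged;
}
```

A researcher could re-inspect or re-record exactly the flagged indices while still standing in the field.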

Sven Gedicke – Light-Weight Algorithms for In-Field Data Quality Assessment in Agricultural Science. International Conference on Digital Technologies for Sustainable Crop Production (DIGICROP 2025), July 8–9, 2025.

Watch this video on YouTube: https://www.youtube.com/watch?v=cKK7DzXUgjU