Better, easier, more beautiful work … what stressed researcher wouldn’t want that?
Task Area 4 of FAIRagro develops and provides the central infrastructure services for the consortium. One of the four work packages of TA 4 is Measure 4.4: Scientific Workflow Infrastructure (SciWIn). An essential part of this work package is the design and development of the SciWIn client. Harald von Waldow and Jens Krumsieck (both from the Thünen Institute) explain what this is and what it involves.
SciWIn stands for Scientific Workflow Infrastructure – what exactly is a “workflow” in this context?
We use the term “computational workflow” in a rather loose sense. Scientists very often work on highly interactive processes such as data extraction, exploration, cleansing, transformation, visualization and analysis, always digitally, using scripts and all kinds of digital tools. The whole process ultimately leads to one or more sequences of calculation steps, where each step connects input to output through a data processing operation. Successful sequences of this kind, i.e. those that the researcher wants to reuse, are “workflows” in this sense.
What are the challenges of these workflows?
There is currently no established practice for storing, reproducing and organizing such workflows, or for communicating them in an orderly manner, for example to colleagues or cooperation partners. Formalized workflow description languages do exist, such as Snakemake, Nextflow or CWL.
However, these languages are quite complex and must be learned before computational workflows can be described with them. This is why only a few scientific domains with the relevant skills, such as bioinformatics, have adopted such tools. Quantitative scientists in many other fields have no means to manage such workflows systematically. They therefore have to rely on ad hoc data management techniques and risk losing track of their work and becoming less efficient.
SciWIn as a solution to these challenges
To meet these challenges, we are developing the SciWIn client in FAIRagro: we meet scientists directly at their digital workbench, the computer terminal, where they perform iterative and highly interactive processes such as data extraction, cleansing, visualization, exploration, analysis and transformation.
The SciWIn client is a command-line tool (s4n) designed for easy creation, recording, annotation and execution of computational workflows.
What Git is to versioning, s4n is to provenance management: from simple one-step computations to complex multi-branch pipelines, s4n records the interdependencies of data and code artifacts. These records can be retraced and re-executed, even on other computers. The individual artifacts and calculation steps form a directed graph that can be annotated with metadata. s4n will also support this annotation.
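To make the idea of a directed graph of artifacts and calculation steps more concrete, here is a minimal Python sketch. It is not the s4n implementation and does not use any s4n API; the `Step` class, the helper functions and the file names are all hypothetical and chosen purely for illustration. Each step records which artifacts it reads and writes, and the graph is derived from those records.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Step:
    """One calculation step: inputs -> data processing operation -> outputs."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def execution_order(steps):
    """Return step names in an order that respects data dependencies."""
    # Map each artifact to the name of the step that produces it.
    producer = {out: s.name for s in steps for out in s.outputs}
    # A step depends on the producers of its inputs; raw inputs have none.
    deps = {
        s.name: {producer[i] for i in s.inputs if i in producer}
        for s in steps
    }
    return list(TopologicalSorter(deps).static_order())


def steps_needed(artifact, steps):
    """Retrace which steps must run to regenerate the given artifact."""
    producer = {out: s for s in steps for out in s.outputs}
    needed, visited, stack = set(), set(), [artifact]
    while stack:
        current = stack.pop()
        step = producer.get(current)
        if step is None:
            continue  # a raw input: nothing produced it
        needed.add(step.name)
        for upstream in step.inputs:
            if upstream not in visited:
                visited.add(upstream)
                stack.append(upstream)
    return needed


# Hypothetical example: one cleaning step feeding two branches.
STEPS = [
    Step("clean", ("raw.csv",), ("clean.csv",)),
    Step("analyze", ("clean.csv", "params.yml"), ("stats.json",)),
    Step("plot", ("clean.csv",), ("figure.png",)),
]

if __name__ == "__main__":
    print(execution_order(STEPS))
    print(steps_needed("figure.png", STEPS))
```

In this sketch, regenerating `figure.png` only requires the `clean` and `plot` steps, which is the kind of retracing that recorded provenance makes possible. Metadata annotation could be modelled as extra fields on the steps and artifacts, but that is left out here.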
Finally, s4n aims to package the resulting workflow in the Workflow Run RO-Crate format, which is emerging as a standard. In this way, the SciWIn client will become part of an innovative ecosystem for the FAIR handling of research data and software and will, for example, enable the publication of workflows via workflowhub.eu.
Get involved: Let us know if you have already tried SciWIn as a FAIRagro service (download from GitHub) or share your ideas and feedback with us in the form of a GitHub issue.
If you would like to know more, please contact Harald von Waldow.