WEBVTT video1373535558

NOTE
Transcribed with noScribe vers. 0.7
Audio file: M:/FAIRagro Talk/talk sciwin video transkription/video1373535558.mp4
(Language: Auto (auto) | Speaker detection: auto | Overlapping speech: 1 | Timestamps: 0 | Filler words: 1 | Mark pauses: 1)


NOTE media: M:/FAIRagro Talk/talk sciwin video transkription/video1373535558.mp4

1
00:00:00.010 --> 00:00:20.530
<v S01>It is yours. Thank you. Today I will present to you what we have done in the last one and a half years. By "we" I mean, of course, me and Antonia, Xaver, Harald, Florian and Patrick.

2
00:00:22.070 --> 00:00:27.410
<v S01>Unfortunately, Florian and Patrick left the project a few months ago.

3
00:00:27.410 --> 00:00:28.670
<v S01>

4
00:00:28.670 --> 00:00:40.864
<v S01>Today's talk will be about SciWIn. SciWIn stands for Scientific Workflow Infrastructure, and its purpose is to make your workflows work for you, as the subtitle says.

5
00:00:42.460 --> 00:00:51.020
<v S01>First of all, what is a workflow? Workflow is a commonly used term for a lot of things.

6
00:00:51.020 --> 00:01:04.830
<v S01>To compile these slides, I searched the internet for a commonly used definition of workflow and found this one by Software AG, which says a workflow is an orchestrated

7
00:01:04.830 --> 00:01:15.930
<v S01>and repeatable pattern of business activity enabled by the systematic organization of resources into processes that transform materials, provide services or process information.

8
00:01:15.930 --> 00:01:25.230
<v S01>This definition is not only way too long, it also does not fit what we define as a workflow.

9
00:01:25.610 --> 00:01:33.470
<v S01>For example, the term business activity does not fit at all with what we define as a workflow.

10
00:01:33.790 --> 00:01:45.910
<v S01>What we mean by workflow is a scientific computational workflow. In short, a set of scripts and tools that run sequentially or in parallel to transform input data into output data.

11
00:01:45.910 --> 00:02:07.980
<v S01>So, a much simpler definition. A short example: on the left side we have input data in the form of dataset 1, and on the right side output data in the form of figure 1 and table 1, and in between there is a transformation and maybe an image plot. That is how simple a workflow can be.

12
00:02:07.980 --> 00:02:09.280
<v S01>

13
00:02:09.280 --> 00:02:17.830
<v S01>Why do we even need workflows? A little example: to transform your data you can use, I don't know, a black box.

14
00:02:17.830 --> 00:02:25.240
<v S01>You can transform it by hand in Excel or whatever tool you like, or have one long, messy script

15
00:02:25.240 --> 00:02:26.320
<v S01>

16
00:02:26.320 --> 00:02:35.710
<v S01>and tell your readers: I have data sets and results, and something magic happens in between. But how do you publish such a process?

17
00:02:35.710 --> 00:02:50.800
<v S01>If you use a workflow, it's all described beforehand. You have data sets and results like in the example above, and defined steps that you can describe as actions in, for example, a publication.

18
00:02:50.800 --> 00:02:57.330
<v S01>So, workflows make data processing transparent, reproducible and also automatable.

19
00:02:58.304 --> 00:03:04.220
<v S01>Speaking of automatable, we also have to discuss how to execute workflows.

20
00:03:04.220 --> 00:03:19.900
<v S01>There are things called workflow management systems, and you will see different levels of workflow management systems as we climb the hills of this mountain.

21
00:03:20.220 --> 00:03:30.710
<v S01>The first one I call the poor man's WfMS, which is you executing the single scripts by hand. This has a lot of problems.

22
00:03:30.710 --> 00:03:39.790
<v S01>Not only is it error-prone, it also leaves data in an undefined state because you may forget to re-execute a step of your workflow.

23
00:03:40.150 --> 00:03:51.872
<v S01>And it's also repetitive. People do not like repetitive tasks. And it has this kind of "it works on my machine" flair, which is bad.

24
00:03:51.872 --> 00:03:53.344
<v S01>

25
00:03:53.344 --> 00:04:05.330
<v S01>Also, you could write one god-like script which executes every single step you like. But this also faces some problems.

26
00:04:05.330 --> 00:04:13.090
<v S01>You do not have a well-defined environment. You do not have machine actionable metadata about the execution.

27
00:04:13.470 --> 00:04:22.010
<v S01>And you have to code complex step orchestration logic, like parallelism through multithreading or even multi-machine execution.

28
00:04:22.010 --> 00:04:27.170
<v S01>And it&#x27;s also hard to onboard new team members to a long script.

29
00:04:27.170 --> 00:04:29.184
<v S01>

30
00:04:29.184 --> 00:04:36.270
<v S01>So people invented proper WfMS, that is, workflow management system software where everything is managed.

31
00:04:36.270 --> 00:04:42.880
<v S01>You have a well-defined environment separated from your business logic, so separated from your code.

32
00:04:43.776 --> 00:04:47.220
<v S01>It's more reproducible and you can easily share it with coworkers.

33
00:04:47.220 --> 00:04:48.512
<v S01>

34
00:04:48.512 --> 00:04:57.790
<v S01>It is scalable through parallelism. Workflow management systems can even distribute workloads among multiple computers.
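
NOTE
Editor's sketch, not part of the talk: one way to picture "scalable through parallelism" is to group workflow steps into batches whose members do not depend on each other. The step names are hypothetical, and Python's standard graphlib module stands in for the scheduler of a real workflow management system.
```python
from graphlib import TopologicalSorter
# Hypothetical step names; each step maps to the set of steps it depends on.
dependencies = {
    "transform": {"download"},
    "plot": {"transform"},
    "table": {"transform"},
}
def parallel_batches(deps):
    """Group steps into batches; steps in the same batch do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step that can run right now
        batches.append(ready)
        sorter.done(*ready)  # mark them finished so their dependants become ready
    return batches
batches = parallel_batches(dependencies)
print(batches)
```
Steps inside one batch could be handed to separate threads or machines. A real workflow manager adds environments, retries and provenance on top of this ordering.
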

35
00:04:58.470 --> 00:05:07.790
<v S01>You get visualization and provenance for free, but you have a new system or a new language that you have to learn.

36
00:05:07.990 --> 00:05:16.896
<v S01>So minor downsides. So if workflows are that awesome, why aren&#x27;t our researchers using workflows yet?

37
00:05:16.896 --> 00:05:17.952
<v S01>

38
00:05:17.952 --> 00:05:22.000
<v S01>So first of all, maybe because they&#x27;re not aware that workflows even exist.

39
00:05:22.420 --> 00:05:26.360
<v S01>So researchers usually are trained on science, not on coding or workflows.

40
00:05:26.360 --> 00:05:27.552
<v S01>

41
00:05:27.552 --> 00:05:36.310
<v S01>Then, if researchers do look into it, they will find that there is a lot of tool fragmentation.

42
00:05:37.060 --> 00:05:39.540
<v S01>There are a lot of different tools, different languages.

43
00:05:39.540 --> 00:05:47.520
<v S01>So which workflow language do you start with? CWL, WDL, Nextflow, Snakemake or whatever.

44
00:05:47.520 --> 00:05:48.800
<v S01>

45
00:05:48.800 --> 00:05:52.330
<v S01>Also workflow descriptions can be very verbose.

46
00:05:52.530 --> 00:06:00.170
<v S01>So all these workflow languages have a lot of boilerplate code to describe what happens in your execution.

47
00:06:00.170 --> 00:06:12.420
<v S01>So it is hard if you have to write all of that by hand. There are also demanding technologies being used, for example Docker. And then there are the reward structures.

48
00:06:12.600 --> 00:06:18.304
<v S01>FAIR data is often simply not rewarded by the journals you want to publish in.

49
00:06:19.168 --> 00:06:21.090
<v S01>So is there enough return on investment?

50
00:06:21.290 --> 00:06:22.560
<v S01>That's the question.

51
00:06:22.560 --> 00:06:24.992
<v S01>

52
00:06:24.992 --> 00:06:30.280
<v S01>Speaking of fragmented tools, there are a lot of workflow management systems solutions.

53
00:06:30.600 --> 00:06:44.670
<v S01>I just compiled a few logos of the most mentioned workflow management systems on the internet, with the more system-like ones on the left-hand side and the more language-like ones on the right-hand side.

54
00:06:45.570 --> 00:06:52.530
<v S01>The CWL team compiled a list of 362 different workflow management systems.

55
00:06:53.030 --> 00:07:03.776
<v S01>Of course, they wanted to have a new universal standard, which they created: the Common Workflow Language, CWL.

56
00:07:03.776 --> 00:07:05.472
<v S01>

57
00:07:05.472 --> 00:07:11.490
<v S01>CWL is an open standard describing command line tools and their connections to create a workflow.

58
00:07:11.770 --> 00:07:14.550
<v S01>So it&#x27;s metadata describing the execution.

59
00:07:14.550 --> 00:07:17.470
<v S01>It describes what to run, not how to run.

60
00:07:17.670 --> 00:07:21.568
<v S01>So what programs and not what algorithms do we run.

61
00:07:21.568 --> 00:07:22.624
<v S01>

62
00:07:22.624 --> 00:07:28.850
<v S01>It's a community-based standard, and it's a standard, not a specific software package.

63
00:07:29.030 --> 00:07:36.032
<v S01>So there's no vendor lock-in and you have a workflow that is portable between systems.

64
00:07:36.032 --> 00:07:37.312
<v S01>

65
00:07:37.312 --> 00:07:38.770
<v S01>Also it&#x27;s YAML based.

66
00:07:39.110 --> 00:07:47.140
<v S01>So it's readable for humans and also for machines, which makes it easily parsable by software.

67
00:07:47.140 --> 00:07:48.352
<v S01>

68
00:07:48.352 --> 00:07:54.590
<v S01>And there are three types of CWL files: Workflow, CommandLineTool and ExpressionTool.

69
00:07:54.790 --> 00:07:58.208
<v S01>However, expression tools are not used so often.

70
00:07:59.200 --> 00:08:06.230
<v S01>The tools describe granular steps used in workflows, and workflows describe how the tools connect.

71
00:08:06.230 --> 00:08:16.340
<v S01>So for example, on the right side, the green boxes are tools and the whole figure with all those arrows is a workflow.

72
00:08:16.340 --> 00:08:18.528
<v S01>

73
00:08:18.528 --> 00:08:20.910
<v S01>Let&#x27;s look into the structure of CWL documents.

74
00:08:20.910 --> 00:08:24.510
<v S01>Here we have a command line tool and a workflow.

75
00:08:24.510 --> 00:08:25.536
<v S01>

76
00:08:25.536 --> 00:08:40.870
<v S01>First of all, all CWL files start with a small block of metadata telling the reader or the software parsing it, which version of CWL was used and which file class it is.

77
00:08:40.870 --> 00:08:44.670
<v S01>So this is a command line tool on the left side and a workflow on the right side.

78
00:08:44.670 --> 00:08:45.790
<v S01>

79
00:08:45.790 --> 00:08:47.770
<v S01>And then there are input parameters.

80
00:08:48.390 --> 00:08:53.070
<v S01>In the blue box, I put there one input parameter for each file.

81
00:08:53.510 --> 00:08:56.170
<v S01>So imagine there are a lot more.

82
00:08:56.390 --> 00:09:03.232
<v S01>So the file gets long very quickly because, for example, one input parameter of the command line tool has five lines of YAML code.

83
00:09:04.160 --> 00:09:07.880
<v S01>And yeah, as you can see, it has ID and a type each.

84
00:09:08.140 --> 00:09:11.220
<v S01>Then there are output parameters also with ID and type.

85
00:09:12.080 --> 00:09:20.720
<v S01>Command line tool outputs have an output binding, just as the inputs have an input binding, and they use a glob pattern to find the files.

86
00:09:20.720 --> 00:09:22.496
<v S01>

87
00:09:22.496 --> 00:09:27.880
<v S01>In workflows, the outputs have an output source, which refers to an output of a workflow step.

88
00:09:28.340 --> 00:09:31.960
<v S01>Then CWL command line tools have requirements.

89
00:09:31.960 --> 00:09:34.140
<v S01>In this case, there's a Docker requirement.

90
00:09:34.380 --> 00:09:40.140
<v S01>Here, for example, GDAL is used, and the base command is ogr2ogr.

91
00:09:40.400 --> 00:09:42.740
<v S01>And for workflows, there&#x27;s a list of steps.

92
00:09:43.000 --> 00:09:51.640
<v S01>So as you can see, each kind of block, for inputs, outputs, steps or requirements, has only one entry here.

93
00:09:51.860 --> 00:09:56.540
<v S01>These can become very lengthy files very quickly.
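
NOTE
Editor's sketch, not part of the talk: the document structure described above (metadata, inputs, outputs, requirements, base command, steps) can be written down as plain Python dictionaries. The field names follow the CWL standard; the ids, file names and the Docker image name are placeholders and are not the values shown on the slide.
```python
import json
# CommandLineTool: what to run, with which inputs, and how to find the outputs.
tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": "ogr2ogr",
    "requirements": {"DockerRequirement": {"dockerPull": "example/gdal:latest"}},
    "inputs": [
        {"id": "destination", "type": "string", "inputBinding": {"position": 1}},
        {"id": "source", "type": "File", "inputBinding": {"position": 2}},
    ],
    "outputs": [
        {"id": "converted", "type": "File", "outputBinding": {"glob": "$(inputs.destination)"}},
    ],
}
# Workflow: how tools connect; outputSource points at "<step id>/<output id>".
workflow = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "inputs": [{"id": "source", "type": "File"}, {"id": "destination", "type": "string"}],
    "outputs": [{"id": "result", "type": "File", "outputSource": "convert/converted"}],
    "steps": [
        {
            "id": "convert",
            "run": "convert.cwl",
            "in": {"source": "source", "destination": "destination"},
            "out": ["converted"],
        },
    ],
}
# JSON is a subset of YAML, so the printed text has the same structure as a .cwl file.
print(json.dumps(tool, indent=2))
print(json.dumps(workflow, indent=2))
```
Even this minimal pair takes dozens of lines once serialized, which illustrates why the files grow quickly with every additional parameter.
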

94
00:09:56.540 --> 00:10:06.070
<v S01>So our solution to that is SciWIn Client, because we believe you are not supposed to write those CWL files by hand.

95
00:10:06.670 --> 00:10:15.690
<v S01>SciWIn Client is a command line application to produce, edit and execute CWL, and it is written in Rust.

96
00:10:15.690 --> 00:10:17.312
<v S01>

97
00:10:17.312 --> 00:10:21.728
<v S01>So I will now show you what features SciWIn client has.

98
00:10:21.728 --> 00:10:23.328
<v S01>

99
00:10:23.328 --> 00:10:33.790
<v S01>So I think the most prominent feature is the Easy CommandLineTool Creation where you have your research environment with code and data set.

100
00:10:34.150 --> 00:10:38.920
<v S01>And then a Git repository is added.

101
00:10:38.920 --> 00:10:45.280
<v S01>So for those who don't know Git, Git is like a time machine for code, data and files.

102
00:10:45.500 --> 00:10:55.620
<v S01>It is a version control system and Git knows so-called commits, which are snapshots in time of a well-defined status of your research.

103
00:10:55.620 --> 00:10:58.820
<v S01>So then you execute s4n.

104
00:10:59.020 --> 00:11:05.020
<v S01>s4n is short for SciWIn because there are four letters between the S and the N in SciWIn.

105
00:11:05.020 --> 00:11:06.500
<v S01>

106
00:11:06.500 --> 00:11:11.500
<v S01>s4n create and then your usual command, for example python calculation.py or whatever.

107
00:11:11.500 --> 00:11:12.544
<v S01>

108
00:11:12.544 --> 00:11:16.730
<v S01>Then it creates a commit which contains the generated CWL file.

109
00:11:17.110 --> 00:11:21.952
<v S01>Then you have your research environment with a command line tool in it.

110
00:11:22.656 --> 00:11:23.488
<v S01>How does it work?

111
00:11:23.488 --> 00:11:24.800
<v S01>

112
00:11:24.800 --> 00:11:35.980
<v S01>So you call s4n create, as I said, with a command, which is the command you usually use to execute your Python file.

113
00:11:36.100 --> 00:11:46.780
<v S01>For example, you can see that the script call python calculation.py is then used to generate the base command block.

114
00:11:46.940 --> 00:11:55.470
<v S01>And from the arguments you give to the file, the input parameters are generated.

115
00:11:55.950 --> 00:12:11.100
<v S01>It can also determine the input type, for example whether you mean a string, such as the name of a file that does not exist yet, or a file that already exists, just by checking whether the file exists.

116
00:12:11.360 --> 00:12:14.400
<v S01>You enter it as an argument.
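
NOTE
Editor's sketch, not part of the talk and not SciWIn Client's actual implementation: the idea of deriving a base command and typed inputs from a command line can be illustrated as follows. The function name, the generated ids and the assumption that the first two tokens form the base command are all hypothetical.
```python
import os
import shlex
import tempfile
def sketch_tool(command, base_tokens=2):
    """Split a command into a base command and typed inputs (illustration only)."""
    tokens = shlex.split(command)
    base, args = tokens[:base_tokens], tokens[base_tokens:]
    inputs = []
    for position, arg in enumerate(args):
        # An argument naming an existing file is treated as File, anything else as string.
        kind = "File" if os.path.isfile(arg) else "string"
        inputs.append({"id": f"arg{position}", "type": kind, "value": arg})
    return {"baseCommand": base, "inputs": inputs}
with tempfile.TemporaryDirectory() as workdir:
    data = os.path.join(workdir, "population.csv")
    with open(data, "w") as handle:
        handle.write("year,count\n")
    # The first argument exists on disk, the second one does not exist yet.
    result = sketch_tool(f"python calculation.py {shlex.quote(data)} not_yet_there.txt")
types = [item["type"] for item in result["inputs"]]
print(result["baseCommand"], types)
```
The existence check is the whole trick: the same string is typed differently depending on whether a file with that name is present when the command is wrapped.
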

117
00:12:14.400 --> 00:12:24.680
<v S01>Then we still need the output parameters and there Git comes into play because Git tracks file changes.

118
00:12:24.860 --> 00:12:28.190
<v S01>As I said, there are commits, so snapshots in time.

119
00:12:28.290 --> 00:12:35.970
<v S01>And you can ask the Git command, what&#x27;s changed between two points in time.

120
00:12:35.970 --> 00:12:40.192
<v S01>And you can then build the output parameters by using Git.
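
NOTE
Editor's sketch, not part of the talk: the output detection described above amounts to comparing two snapshots of the working tree. SciWIn Client uses Git for this according to the talk; here a plain content hash of every file stands in for Git so the example stays self-contained. All file and function names are hypothetical.
```python
import hashlib
import os
import tempfile
def snapshot(root):
    """Map every file below root to a hash of its content."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                state[os.path.relpath(path, root)] = hashlib.sha256(handle.read()).hexdigest()
    return state
def changed_files(before, after):
    """Files that are new or whose content differs between two snapshots."""
    return sorted(path for path, digest in after.items() if before.get(path) != digest)
with tempfile.TemporaryDirectory() as repo:
    with open(os.path.join(repo, "data.csv"), "w") as handle:
        handle.write("a,b\n1,2\n")
    before = snapshot(repo)
    # Stand-in for running the wrapped command, which writes a result file.
    with open(os.path.join(repo, "results.csv"), "w") as handle:
        handle.write("sum\n3\n")
    outputs = changed_files(before, snapshot(repo))
print(outputs)
```
Every path reported as changed is a candidate output parameter; unchanged inputs such as data.csv are ignored.
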

121
00:12:40.192 --> 00:12:41.568
<v S01>

122
00:12:41.568 --> 00:12:44.580
<v S01>And the runtime environment is defined by options.

123
00:12:44.600 --> 00:12:50.420
<v S01>For example, Dockerfiles, network connection or environment variables.

124
00:12:50.420 --> 00:12:55.970
<v S01>So as I learned at the Plenary, not every researcher knows about Docker.

125
00:12:55.970 --> 00:12:57.056
<v S01>

126
00:12:57.056 --> 00:13:01.520
<v S01>So I will briefly tell you what Docker is in a nutshell.

127
00:13:01.780 --> 00:13:08.260
<v S01>Docker containers are reproducible, isolated and small research environments.

128
00:13:08.800 --> 00:13:10.680
<v S01>The research is optional here.

129
00:13:11.380 --> 00:13:14.800
<v S01>So it&#x27;s like a virtual machine, but way smaller.

130
00:13:14.800 --> 00:13:20.450
<v S01>It contains all necessary code, data and tools to run an app or a workflow.

131
00:13:20.830 --> 00:13:29.330
<v S01>And it enables it to run everywhere on a laptop, HPC, on a cloud, wherever, with consistent results.

132
00:13:29.370 --> 00:13:31.790
<v S01>So it resolves the "it works on my machine" problem.

133
00:13:31.790 --> 00:13:33.024
<v S01>

134
00:13:33.024 --> 00:13:35.890
<v S01>You can build your own images from special instruction files.

135
00:13:36.390 --> 00:13:44.470
<v S01>These are so-called Dockerfiles; there is an example at the bottom, which consists of install commands,

136
00:13:44.470 --> 00:13:50.230
<v S01>for example, for the Python packages. Existing images can be downloaded from a registry.

137
00:13:50.230 --> 00:13:56.730
<v S01>And when an image is running, it's called a container. Next feature.

138
00:13:57.090 --> 00:13:58.490
<v S01>Streamlined workflow creation.

139
00:13:58.770 --> 00:14:05.950
<v S01>So imagine you now have a research environment where you did the previous step multiple times.

140
00:14:05.950 --> 00:14:10.510
<v S01>You have a lot of code, a lot of data sets and a lot of CWL files.

141
00:14:10.610 --> 00:14:13.550
<v S01>You now want to connect them to workflows.

142
00:14:13.550 --> 00:14:23.670
<v S01>So there is the connect command, which takes the workflow name and the parameters from and to, and which adds arrows

143
00:14:23.870 --> 00:14:27.750
<v S01>to the so-called directed acyclic graph you can see on the right side.

144
00:14:27.870 --> 00:14:36.890
<v S01>So for example, the highlighted red election arrow is added by the call that is described in the middle terminal view.

145
00:14:37.050 --> 00:14:42.336
<v S01>So you can use this command to build workflows step by step.

146
00:14:42.336 --> 00:14:43.968
<v S01>

147
00:14:43.968 --> 00:14:44.704
<v S01>How does it work?

149
00:14:45.472 --> 00:14:52.490
<v S01>So you use s4n connect main; main here is the workflow name.

150
00:14:53.376 --> 00:14:54.890
<v S01>So it can be anything.

151
00:14:54.890 --> 00:14:56.192
<v S01>

152
00:14:56.192 --> 00:14:57.300
<v S01>I chose main here.

153
00:14:57.760 --> 00:15:04.860
<v S01>Then you have from @inputs/election, which, because of @inputs, creates an input parameter.

154
00:15:04.860 --> 00:15:06.960
<v S01>You can also create an output parameter

155
00:15:07.220 --> 00:15:09.360
<v S01>of a workflow by using @outputs.

156
00:15:09.660 --> 00:15:21.110
<v S01>And if you use the step-ID/slot-name syntax, then you can combine CWL command line tools.

157
00:15:21.690 --> 00:15:33.984
<v S01>In the next version, @inputs and @outputs are optional because, of course, if there is no slash, the user means inputs or outputs.

158
00:15:34.752 --> 00:15:46.320
<v S01>The files are then resolved by searching for a CWL file whose name matches the step ID you gave.

159
00:15:46.580 --> 00:15:53.024
<v S01>And the slot is looked up in that CWL file, whether it is a tool or a workflow.
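
NOTE
Editor's sketch, not part of the talk and not SciWIn Client's actual code: the endpoint syntax described above (@inputs/name, @outputs/name or step/slot) can be parsed with a few lines. The function names, the tuple layout and the step name "plot" are hypothetical.
```python
def parse_endpoint(text):
    """Split '@inputs/name', '@outputs/name' or 'step/slot' into (kind, step, slot)."""
    head, _, slot = text.partition("/")
    if not slot:
        raise ValueError(f"expected '<step>/<slot>', got {text!r}")
    if head == "@inputs":
        return ("workflow_input", None, slot)
    if head == "@outputs":
        return ("workflow_output", None, slot)
    return ("step", head, slot)
def connect(workflow, source, target):
    """Record one arrow of the workflow graph in a simple list of connections."""
    pair = (parse_endpoint(source), parse_endpoint(target))
    workflow.setdefault("connections", []).append(pair)
    return workflow
# One arrow: the workflow input "election" feeds the slot "election" of the step "plot".
main = connect({}, "@inputs/election", "plot/election")
print(main["connections"])
```
The step part of a step/slot pair is what would then be matched against a CWL file name, and the slot part against an input or output id inside that file.
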

160
00:15:53.024 --> 00:15:54.976
<v S01>

161
00:15:54.976 --> 00:16:01.400
<v S01>So you may say that download election data/election, in this case, is quite lengthy.

162
00:16:01.640 --> 00:16:04.224
<v S01>How do I get information of what to put here?

163
00:16:04.768 --> 00:16:06.960
<v S01>There is the list command.

164
00:16:07.160 --> 00:16:17.680
<v S01>So s4n list -a lists all CWL files in your folder structure with their slot information.

165
00:16:17.680 --> 00:16:23.510
<v S01>So, on the right-hand side, there is an example table.

166
00:16:23.690 --> 00:16:26.490
<v S01>You can see a tool or workflow name.

167
00:16:26.650 --> 00:16:33.070
<v S01>Then there are the inputs and the outputs, which you can just copy and paste to use with the connect command.

168
00:16:33.350 --> 00:16:36.030
<v S01>And if you

169
00:16:36.510 --> 00:16:39.170
<v S01>use list on a specific CWL file,

170
00:16:39.350 --> 00:16:43.680
<v S01>you get a detailed information sheet about that CWL file.

171
00:16:43.780 --> 00:16:48.540
<v S01>In this case, it's a workflow, and where the three dots are under connection status

172
00:16:48.540 --> 00:17:05.820
<v S01>there is a bigger table, which due to size limitations isn't here on the slides, but it tells you, for each input and output of the steps, whether it is properly connected, uses a tool default, or has no connection, marked with a red X.

173
00:17:06.040 --> 00:17:10.112
<v S01>So if there's a red X, connections still have to be made.

174
00:17:10.112 --> 00:17:12.480
<v S01>

175
00:17:12.480 --> 00:17:17.184
<v S01>You can also reuse existing workflows with the install command.

176
00:17:17.184 --> 00:17:18.240
<v S01>

177
00:17:18.240 --> 00:17:28.650
<v S01>In this case, the command shown installs an ARC from the PLANTdataHub called Ru_ChlamyHeatstress, whatever this is.

178
00:17:28.830 --> 00:17:33.150
<v S01>But imagine there maybe is a Workflow or CommandLineTool you want to use.

179
00:17:33.290 --> 00:17:37.130
<v S01>You just install it, which uses Git submodules under the hood.

180
00:17:37.130 --> 00:17:44.710
<v S01>So it's later committed as a link to the code or data, not as a copy of it.

181
00:17:44.990 --> 00:17:47.370
<v S01>And it&#x27;s fixed to the current commit.

182
00:17:47.990 --> 00:17:49.930
<v S01>So remember snapshot in time.

183
00:17:50.210 --> 00:17:57.690
<v S01>So if the code in the ARC changes, it will not break your workflows.

184
00:17:57.690 --> 00:17:58.944
<v S01>

185
00:17:58.944 --> 00:18:03.840
<v S01>This can then, of course, be used in your own workflows with the connect command.

186
00:18:04.380 --> 00:18:06.580
<v S01>And you can also use uninstall to remove.

187
00:18:06.580 --> 00:18:13.472
<v S01>When you are finished with building your workflow, you of course want to execute it.

188
00:18:13.472 --> 00:18:14.752
<v S01>

189
00:18:14.752 --> 00:18:20.850
<v S01>There are two major possibilities, local execution and remote execution.

190
00:18:21.550 --> 00:18:31.510
<v S01>Local execution is possible using, of course, the CWL reference implementation called cwltool, which you can install with pip,

191
00:18:31.510 --> 00:18:40.850
<v S01>or you use s4n execute local, which supports a subset of the CWL specification.

192
00:18:41.150 --> 00:18:48.930
<v S01>So it's not 100% compatible with cwltool, but it should work for most cases.

193
00:18:49.350 --> 00:18:56.450
<v S01>It exists because cwltool is not able to run on Windows.

195
00:18:56.450 --> 00:19:07.390
<v S01>So, if you have tested your workflow a lot of times locally, you may want to use remote execution.

196
00:19:07.750 --> 00:19:18.170
<v S01>There we settled on a platform called reana, which is developed by CERN, and you can use a reana instance,

197
00:19:18.170 --> 00:19:28.430
<v S01>if you manage to get a login or credentials on one, with s4n execute remote or, of course, the official reana client.

198
00:19:28.430 --> 00:19:29.440
<v S01>

199
00:19:29.440 --> 00:19:36.960
<v S01>There are, to my knowledge, three publicly available instances: FAIRagro, CERN and PUNCH4NFDI.

200
00:19:36.960 --> 00:19:41.984
<v S01>However, CERN and PUNCH4NFDI may be harder to get an account on.

201
00:19:42.880 --> 00:19:49.800
<v S01>So the FAIRagro instance is operated by us, or more specifically by Xaver.

202
00:19:50.020 --> 00:19:54.300
<v S01>It is under the URL reana.bi.denbi.de.

203
00:19:54.740 --> 00:20:05.040
<v S01>And currently you have to ask Xaver for an account, but we are adding the FAIRagro Keycloak login in early 2026.

204
00:20:07.680 --> 00:20:09.984
<v S01>So remote execution, how does it work?

205
00:20:09.984 --> 00:20:11.680
<v S01>

206
00:20:11.680 --> 00:20:20.090
<v S01>There is the s4n execute remote start command, where you pass your workflow CWL file and the inputs.

207
00:20:20.230 --> 00:20:27.210
<v S01>And then SciWIn Client communicates with reana's REST API, which runs on our reana Kubernetes cluster,

208
00:20:27.430 --> 00:20:37.730
<v S01>which is a five node Kubernetes cluster, which has about 280 CPUs, 640 gigabytes of RAM and five terabytes of storage available.

209
00:20:37.730 --> 00:20:51.520
<v S01>This gives you easier execution compared to reana client because SciWIn Client auto generates a lot of boilerplate config files you need for reana to work.

210
00:20:52.192 --> 00:20:57.820
<v S01>And there are also subcommands other than start, which I have at the top.

211
00:20:57.820 --> 00:21:01.400
<v S01>There is status, which queries the execution status.

212
00:21:01.760 --> 00:21:02.700
<v S01>There's download.

213
00:21:02.960 --> 00:21:06.620
<v S01>So if the workflow finished running, you can download resulting files.

214
00:21:06.800 --> 00:21:11.820
<v S01>And there's rocrate, which creates a Provenance Run Crate from the execution results.

215
00:21:12.620 --> 00:21:17.300
<v S01>Provenance Run Crate is a profile of Workflow Run RO-Crate.

216
00:21:17.300 --> 00:21:24.990
<v S01>So now you have all your results and all your workflows finished.

217
00:21:25.230 --> 00:21:28.330
<v S01>So if you're writing a README file or a paper,

218
00:21:28.630 --> 00:21:30.710
<v S01>you may want to visualize your workflows.

219
00:21:31.290 --> 00:21:36.990
<v S01>So there is the s4n visualize command where you add the path to your workflow file.

220
00:21:36.990 --> 00:21:41.230
<v S01>And if you just use that, it gives you Mermaid code.

221
00:21:41.390 --> 00:21:49.340
<v S01>Mermaid is a diagram language that is compatible with GitHub Markdown.

222
00:21:49.480 --> 00:21:55.500
<v S01>For example, you can put it in a README file and then it generates an image like the one on the top right.

223
00:21:55.500 --> 00:22:16.690
<v S01>And if you want to publish an image of the workflow in a paper, you can use -r dot, pipe the output into the Graphviz dot executable and redirect the result into an SVG file to get an image that looks like the one on the bottom right.
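
NOTE
Editor's sketch, not part of the talk: the DOT text that a Graphviz-style renderer consumes is simple enough to generate by hand. This is not the output format of s4n visualize, only an illustration of the idea; the function name and the step names are hypothetical.
```python
def to_dot(edges, name="workflow"):
    """Render (source, target) pairs as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)
# Hypothetical step names for illustration.
dot = to_dot([("download", "transform"), ("transform", "plot")])
print(dot)
```
Text like this can be piped into the dot executable to obtain an SVG file, which is the route the talk describes for publication-quality figures.
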

224
00:22:16.690 --> 00:22:29.400
<v S01>So now I will show you an overview of how SciWIn Client integrates into existing research data management infrastructure.

225
00:22:30.220 --> 00:22:35.420
<v S01>SciWIn Client operates on a Git repository, which lives on a file system.

226
00:22:35.580 --> 00:22:38.100
<v S01>CWL also uses Docker.

227
00:22:38.260 --> 00:22:45.760
<v S01>And this optionally can be an Annotated Research Context or ARC. ARCs can be pushed, of course, to PLANTdataHub.

228
00:22:45.760 --> 00:23:02.144
<v S01>And as SciWIn Client uses GitHub or GitLab to install workflows from, you can also use ARCs from PLANTdataHub to combine into new ARCs, for example, because the PLANTdataHub uses GitLab.

229
00:23:02.816 --> 00:23:10.850
<v S01>We are currently planning to have automatic publishing on workflowhub.eu.

230
00:23:10.850 --> 00:23:17.630
<v S01>And also on the left side, you can see there is the execution possibility with reana.

231
00:23:17.890 --> 00:23:22.510
<v S01>This is not limited to the de.NBI Cloud reana instance of FAIRagro.

232
00:23:22.670 --> 00:23:27.930
<v S01>You can use any reana instance you find or get an account for.

233
00:23:28.130 --> 00:23:32.760
<v S01>Unfortunately, reana isn't able to build Dockerfiles itself.

234
00:23:32.760 --> 00:23:42.620
<v S01>So the Docker image is built locally and then pushed to ttl.sh, which is an ephemeral Docker registry.

235
00:23:42.920 --> 00:23:46.660
<v S01>So it just gets a UUID, so a weird

236
00:23:46.980 --> 00:23:49.600
<v S01>character string,

237
00:23:49.940 --> 00:23:58.560
<v S01>and it vanishes after a few hours, so it's ephemeral, as it says.

238
00:23:58.560 --> 00:24:06.380
<v S01>And reana is able to create a Provenance Run Crate, as I said, which is a profile of Workflow Run RO-Crate.

239
00:24:07.140 --> 00:24:10.730
<v S01>So what&#x27;s next? Ideas for the future.

240
00:24:10.950 --> 00:24:17.060
<v S01>As I already mentioned, we are planning to automate WorkflowHub.eu publications.

241
00:24:17.520 --> 00:24:22.800
<v S01>We want to work together with DATAplant on a better ARC Stack interop UX.

242
00:24:22.800 --> 00:24:31.510
<v S01>So we want to create a common user experience for both consortia.

243
00:24:31.690 --> 00:24:36.470
<v S01>We want to generate even more code.

244
00:24:36.670 --> 00:24:41.190
<v S01>So we also want to semi-automate the creation of Dockerfiles.

245
00:24:41.410 --> 00:24:44.550
<v S01>So researchers do not have to work with Dockerfiles.

246
00:24:45.230 --> 00:24:51.930
<v S01>And we are thinking about integrating the Tool Registry Service API for the install command,

247
00:24:51.930 --> 00:24:58.850
<v S01>which makes all workflows from WorkflowHub and also Dockstore available, which is about 3000 workflows.

248
00:24:59.570 --> 00:25:04.050
<v S01>And also we are planning a graphical user interface for SciWIn.

249
00:25:04.090 --> 00:25:06.490
<v S01>We may rebrand to SciWIn Studio then.

250
00:25:07.424 --> 00:25:15.760
<v S01>For example, this is a screenshot of software like Blender, where you just have to click and connect

251
00:25:16.720 --> 00:25:22.660
<v S01>the connections of a workflow and then have your pipeline created visually.

252
00:25:22.660 --> 00:25:26.800
<v S01>With that, I thank you for your attention. On this slide

253
00:25:27.020 --> 00:25:28.500
<v S01>there are the most important links.

254
00:25:28.680 --> 00:25:32.800
<v S01>There is the download link to SciWIn client, which the QR code also leads to.

255
00:25:33.020 --> 00:25:37.240
<v S01>There&#x27;s documentation at fairagro.github.io and then SciWIn client.

256
00:25:37.500 --> 00:25:39.780
<v S01>And also there is an example repository.

257
00:25:42.304 --> 00:25:42.912
<v S01>So thank you.

258
00:25:46.528 --> 00:25:55.000
<v S00>Thank you, Jens, for this great insight into SciWIn and s4n.

259
00:25:55.680 --> 00:26:02.840
<v S00>For all of you who want to look at the slides again, they will be available soon on our website.

260
00:26:02.920 --> 00:26:10.870
<v S00>As well as the video from the presentation we just witnessed.

261
00:26:10.870 --> 00:26:15.800
<v S01>So there is still some time for questions.

262
00:26:15.800 --> 00:26:16.085
<v S01>

