CNT: Semi-Automatic Translation from CWL to Nextflow for Genomics Workflows

Martin L. Putra*, In Kee Kim, Haryadi S. Gunawi*, Robert L. Grossman*
CNT @ BIBE’23
 
Two popular workflow languages:
CWL
Used in many production-grade genomics workflows.
Has been around longer and has a wider user base.
Decoupled from any workflow engine.
Nextflow
Rising in popularity.
Coupled with its own workflow engine.
Native support for parallel workflow execution.
Parallel workflow execution: independent steps, such as {2, 3, 4}, can be executed concurrently.
Sequential workflow execution: steps run one at a time, and the order of {2, 3, 4} might vary depending on the workflow engine.
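To make the contrast concrete, the sketch below groups the steps of a small DAG into "waves" that a parallel engine could run concurrently, then flattens them into one valid sequential order. The edge list is an assumption (step 1 feeding steps 2, 3 and 4, which all feed step 5), since the slide's exact edges are not recoverable; neither engine is claimed to schedule exactly this way.

```python
from collections import defaultdict

# Hypothetical DAG: step 1 feeds steps 2, 3 and 4, which all feed step 5.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5)]

def waves(edges):
    """Group steps into waves whose inputs are all ready.

    Every step in a wave depends only on earlier waves, so a parallel
    engine may run a whole wave concurrently.
    """
    preds = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        preds[dst].add(src)
        nodes.update((src, dst))
    done, result = set(), []
    while len(done) < len(nodes):
        ready = sorted(n for n in nodes - done if preds[n] <= done)
        if not ready:
            raise ValueError("cycle detected: not a DAG")
        result.append(ready)
        done.update(ready)
    return result

def sequential_order(edges):
    """One valid one-at-a-time schedule (a topological order).

    Any permutation of steps within a wave is equally valid, which is why
    a sequential engine may run {2, 3, 4} in varying order.
    """
    return [n for wave in waves(edges) for n in wave]

print(waves(EDGES))             # [[1], [2, 3, 4], [5]]
print(sequential_order(EDGES))  # [1, 2, 3, 4, 5]
```

The parallel engine finishes in three waves; the sequential one needs five consecutive steps for the same DAG.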
Production system using CWL
Moving it to Nextflow by manual translation requires learning:
Dataflow programming model
Nextflow syntax & operators
JavaScript → Groovy
Automate?
 
The Need for an Automatic Translator
Manual translation is time-consuming and requires domain-specific expertise.
There are no existing tools for automatic translation.
We propose CNT, the first semi-automatic translator from CWL to Nextflow.
Production system using CWL
Our solution: CNT (CWL-to-Nextflow Translator)
Tool- and workflow-level translation.
Partially handles JavaScript.
Automate?
CNT
High similarity
High coverage
High performance gain*
*Evaluated against a sequential workflow engine.
Outline
Challenges
Fully Automatic Translation
Semi-Automatic Translation
Evaluation
Conclusion
 
Outline: Challenges
Workflow as DAG
Vertex: analysis step.
Edge: flow of data.
Challenge #1: Exploration & ordering of subworkflows.
A step can be a (sub)workflow: each step is either a single tool or a subworkflow.
Subworkflow step 2 expands into steps 2a–2d, one of which is another subworkflow to explore! Nesting can continue up to arbitrary depth.
Challenge #2: Ordering of input & output variables.
Difference in invocations:
CWL: named arguments.
Nextflow: positional arguments.
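Order affects invocation! In CWL, Step1 produces outputs 1a: X, 1b: Y, 1c: Z, and Step2 binds its inputs by name (2b: 1b, 2a: 1a, 2c: 1c). In Nextflow, process Step2 declares its inputs in a fixed order (file 2b, file 2c, var 2a), so it must be invoked as Step2(1b, 1c, 1a).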
Challenge #3: Scripting languages.
JavaScript usage is complex and pervasive; it must be translated to Groovy or to Nextflow operators.
Example of JavaScript in CWL:

```yaml
arguments:
  - valueFrom: |
      ${
        function to_rg() {
          var readgroup_str = "@RG";
          var keys = Object.keys(inputs.readgroup_meta).sort();
          for (var i = 0; i < keys.length; i++) {
            var key = keys[i];
            var value = inputs.readgroup_meta[key];
            if (key.length == 2 && value != null) {
              readgroup_str = readgroup_str + "\\t" + key + ":" + value;
            }
          }
          return readgroup_str;
        }
        // The slide cuts off here; calling to_rg() and closing ${ } is an assumed completion.
        return to_rg();
      }
```
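For reference, a Python port of `to_rg()` pins down the behavior a Groovy translation has to reproduce: only two-character keys with non-null values are kept, in sorted key order. This is an illustrative port, not CNT output, and it assumes string values (JavaScript and Python stringify booleans differently).

```python
def to_rg(readgroup_meta):
    """Illustrative Python port of the to_rg() JavaScript above.

    Keeps keys that are exactly two characters long and whose value is not
    None (JavaScript: null/undefined), visiting keys in sorted order.
    """
    readgroup_str = "@RG"
    for key in sorted(readgroup_meta):
        value = readgroup_meta[key]
        if len(key) == 2 and value is not None:
            # "\\t" is a backslash followed by "t", as in the CWL source,
            # not a tab character.
            readgroup_str += "\\t" + key + ":" + str(value)
    return readgroup_str

meta = {"SM": "sample1", "ID": "rg1", "PL": None, "library": "lib1"}
print(to_rg(meta))  # @RG\tID:rg1\tSM:sample1
```

Even this short function mixes sorting, filtering, and string building, which is why such blocks resist pattern-based translation.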
 
Summary of Challenges
C1. Exploration & ordering of subworkflows.
C2. Ordering of input & output variables.
C3. Scripting language.
 
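CNT: Summary of Design
C1. Exploration & ordering of subworkflows: automatic.
C2. Ordering of input/output variables: automatic.
C3. Scripting language: automatic and manual.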
Outline: Fully Automatic Translation
Fully automatic translation (addresses C1 and C2) has three stages:
1. Tool-level translation
2. Graph-dependency analysis
3. Correctness check
Tool-Level Translation
Gather all CWL files by recursive exploration.
Classify each file as a CommandLineTool (a single tool) or a Workflow (a subworkflow).
For each CommandLineTool, call the tool-level translation module.
For each Workflow, repeat the recursive exploration.
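The exploration can be sketched as a recursive walk over already-parsed CWL documents. The in-memory `CWL_FILES` dictionary, its file names, and the `translate_tool` callback are hypothetical stand-ins: a real tool would parse YAML, and this sketch assumes the map form of `steps` with `run:` pointing to a file path (inline `run:` definitions are not handled).

```python
# Hypothetical, already-parsed CWL documents keyed by file path.
CWL_FILES = {
    "main.cwl": {"class": "Workflow",
                 "steps": {"s1": {"run": "tool1.cwl"}, "s2": {"run": "sub.cwl"}}},
    "sub.cwl": {"class": "Workflow",
                "steps": {"s2a": {"run": "tool2a.cwl"}, "s2c": {"run": "subsub.cwl"}}},
    "subsub.cwl": {"class": "Workflow",
                   "steps": {"s2c1": {"run": "tool1.cwl"}}},
    "tool1.cwl": {"class": "CommandLineTool"},
    "tool2a.cwl": {"class": "CommandLineTool"},
}

def explore(path, files, table=None, translate_tool=print):
    """Recursively classify CWL files, starting from `path`.

    Returns {path: class}. Each CommandLineTool is passed once to
    `translate_tool` (a stand-in for a tool-level translation module);
    each Workflow is explored recursively, to arbitrary depth.
    """
    if table is None:
        table = {}
    if path in table:  # already visited, e.g. a tool reused by two workflows
        return table
    kind = files[path]["class"]
    table[path] = kind
    if kind == "CommandLineTool":
        translate_tool(path)
    elif kind == "Workflow":
        for step in files[path]["steps"].values():
            explore(step["run"], files, table, translate_tool)
    else:
        raise ValueError(f"unsupported CWL class {kind!r} in {path}")
    return table

translated = []
table = explore("main.cwl", CWL_FILES, translate_tool=translated.append)
print(table)
print(translated)  # ['tool1.cwl', 'tool2a.cwl']
```

Memoizing visited paths keeps a tool that appears in several subworkflows from being translated twice.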
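For the example workflow, exploration records each step's type and path:
Step 1: Tool, < >
Step 2: Workflow, < >
Step 2a: Tool, < >
Step 2b: Tool, < >
Step 2c: Workflow, < >
Step 2d: Tool, < >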
 
[Tool-level Translation Module] More details in our paper!
 
Graph-Dependency Analysis
CWL’s named-argument invocation abstracts away positional information.
Insight: the positional information can be reconstructed given a suitable data structure.
Create a DAG data structure.
Store input & output variable names as vertex attributes.
Store both the caller’s and the callee’s perspective.
Reconstruct each type signature by traversing edges and reconciling the caller’s and callee’s perspectives.
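A minimal sketch of the idea, assuming each step is a vertex that holds the callee's declared input order and types plus the caller's by-name bindings (`"Step1/1b"` meaning output `1b` of `Step1`). The `Vertex` class, its attribute names, the CWL types, and the output format are hypothetical; the data reuse the Step1/Step2 example from Challenge #2.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    """One workflow step (hypothetical attribute names)."""
    name: str
    declared_inputs: list                        # callee: [(input, type)] in declared order
    bindings: dict                               # caller: {input: "step/output" or workflow input}
    outputs: dict = field(default_factory=dict)  # callee: {output: type}

def edges_of(vertices):
    """Yield (producer, output, consumer, input) for step-to-step bindings."""
    for v in vertices.values():
        for inp, src in v.bindings.items():
            if "/" in src:
                producer, out = src.split("/", 1)
                yield producer, out, v.name, inp

def reconstruct(vertices):
    """Reconcile caller and callee views into a typed signature and a
    positional call for every step."""
    incoming = {name: {} for name in vertices}
    for producer, out, consumer, inp in edges_of(vertices):
        incoming[consumer][inp] = out  # caller's input name -> producer's output
    result = {}
    for v in vertices.values():
        signature = ", ".join(f"{ty} {inp}" for inp, ty in v.declared_inputs)
        # Inputs reached by an edge take the upstream output; others are workflow inputs.
        args = [incoming[v.name].get(inp, v.bindings[inp]) for inp, _ in v.declared_inputs]
        result[v.name] = (f"{v.name}({signature})", f"{v.name}({', '.join(args)})")
    return result

steps = {
    "Step1": Vertex("Step1", [("reads", "File")], {"reads": "fastq"},
                    {"1a": "string", "1b": "File", "1c": "File"}),
    "Step2": Vertex("Step2", [("2b", "File"), ("2c", "File"), ("2a", "string")],
                    {"2b": "Step1/1b", "2a": "Step1/1a", "2c": "Step1/1c"}),
}
for sig, call in reconstruct(steps).values():
    print(sig, "->", call)
# Step1(File reads) -> Step1(fastq)
# Step2(File 2b, File 2c, string 2a) -> Step2(1b, 1c, 1a)
```

The caller's dictionary order (2b, 2a, 2c) never matters: the callee's declared order alone fixes the positions.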
 
 
Correctness Check
Variable type is important for intermediate data staging.
The check tracks files, ensuring each one gets the proper variable type (i.e., ‘File’ or ‘Path’).
More details in our paper!
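One way to picture the file-tracking check, under stated assumptions: map CWL types to Nextflow input qualifiers (Nextflow stages `path` inputs into the task directory, while `val` inputs are passed as plain values) and flag any edge where a file flows into an input that is not `path`. The function names, the edge format, and the sample data are hypothetical; this is not CNT's implementation. `str.removesuffix` needs Python 3.9+.

```python
# Hypothetical file-tracking check; not CNT's implementation.
FILE_TYPES = {"File", "Directory"}

def nf_qualifier(cwl_type):
    """Pick a Nextflow input qualifier for a CWL type string.

    File and Directory (optionally with "?" or "[]") must be staged, so they
    need `path`; everything else can be passed as a plain `val`.
    """
    base = cwl_type.rstrip("?").removesuffix("[]")
    return "path" if base in FILE_TYPES else "val"

def check_staging(edges, output_types, qualifiers):
    """Return (consumer, input) pairs that receive a file but are not `path`.

    edges:        [(producer, output, consumer, input), ...]
    output_types: {(producer, output): CWL type of that output}
    qualifiers:   {(consumer, input): qualifier emitted by translation}
    """
    problems = []
    for producer, out, consumer, inp in edges:
        needs_path = nf_qualifier(output_types[(producer, out)]) == "path"
        if needs_path and qualifiers[(consumer, inp)] != "path":
            problems.append((consumer, inp))
    return problems

edges = [("Step1", "1a", "Step2", "2a"),
         ("Step1", "1b", "Step2", "2b"),
         ("Step1", "1c", "Step2", "2c")]
output_types = {("Step1", "1a"): "string",
                ("Step1", "1b"): "File",
                ("Step1", "1c"): "File[]"}
# Suppose a draft translation declared 2c as `val` by mistake.
qualifiers = {("Step2", "2a"): "val",
              ("Step2", "2b"): "path",
              ("Step2", "2c"): "val"}
print(check_staging(edges, output_types, qualifiers))  # [('Step2', '2c')]
```

Because the types travel along edges, the mistake is caught at the consumer even though the producer declared it correctly.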
 
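Outline: Semi-Automatic Translation
Focus: JavaScript expressions.
A regex-based automatic classifier assigns each expression to one of three patterns:
Pattern #1: Object attribute access.
Pattern #2: Object method call.
Pattern #3: ‘Actual’ code block.
Attribute accesses and method calls receive semi-automated handling and are auto-translated; actual code blocks are translated manually, following a guideline, to produce the final result.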
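A regex-based classifier in the spirit of the three patterns might look like the sketch below. The regular expressions are assumptions (the slides do not show CNT's patterns): anything that is neither a plain attribute chain nor an attribute chain ending in method calls falls through to Pattern #3.

```python
import re

# Hypothetical patterns; the slides do not show CNT's actual regexes.
_CHAIN = r"(?:inputs|self|runtime)(?:\.\w+)+"
_CALL = r"\([^()]*\)"  # one call with no nested parentheses

# Pattern #1: attribute access only, e.g. $(inputs.bam.basename)
ATTRIBUTE_ACCESS = re.compile(r"\$\(" + _CHAIN + r"\)")
# Pattern #2: attribute chain ending in one or more method calls,
#             e.g. $(inputs.bam.basename.replace(".bam", ""))
METHOD_CALL = re.compile(r"\$\(" + _CHAIN + _CALL + r"(?:(?:\.\w+)+" + _CALL + r")*\)")

def classify(expr):
    """Return 1 (attribute access), 2 (method call) or 3 ('actual' code)."""
    expr = expr.strip()
    if ATTRIBUTE_ACCESS.fullmatch(expr):
        return 1
    if METHOD_CALL.fullmatch(expr):
        return 2
    return 3  # ${ ... } blocks and anything more complex

for e in ["$(inputs.bam.basename)",
          '$(inputs.bam.basename.replace(".bam", ""))',
          "${ return inputs.threads * 2; }"]:
    print(classify(e), e)
# 1 $(inputs.bam.basename)
# 2 $(inputs.bam.basename.replace(".bam", ""))
# 3 ${ return inputs.threads * 2; }
```

Erring toward Pattern #3 is the conservative choice: a misclassified expression then goes to a human instead of being auto-translated incorrectly.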
The semi-automated handling relies on a non-exhaustive mapping table.
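A non-exhaustive mapping table can be as simple as a dictionary from CWL File attributes to Nextflow path attributes, applied to Pattern #1 expressions. The entries, the assumption that a CWL input `bam` becomes a Nextflow variable of the same name, and returning `None` for unmapped attributes are all illustrative choices, not CNT's actual table.

```python
import re

# Hypothetical, deliberately non-exhaustive mapping table:
# CWL File attribute -> Nextflow path attribute
ATTRIBUTE_MAP = {
    "basename": "name",      # file name with extension
    "nameroot": "baseName",  # file name without its last extension
    "dirname": "parent",     # containing directory (a Path object in Nextflow)
}

PARAM_REF = re.compile(r"\$\(inputs\.(\w+)\.(\w+)\)")

def translate(expr):
    """Rewrite $(inputs.<var>.<attr>) as ${<var>.<nf_attr>}.

    Returns None if any reference uses an attribute missing from the table,
    signalling that the expression needs manual translation.
    """
    unmapped = False

    def repl(match):
        nonlocal unmapped
        var, attr = match.groups()
        if attr not in ATTRIBUTE_MAP:
            unmapped = True
            return match.group(0)
        return "${" + var + "." + ATTRIBUTE_MAP[attr] + "}"

    result = PARAM_REF.sub(repl, expr)
    return None if unmapped else result

print(translate("samtools index $(inputs.bam.basename)"))  # samtools index ${bam.name}
print(translate("$(inputs.bam.size)"))                     # None
```

Returning `None` rather than a partial string keeps unmapped expressions visible for the manual step.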
 
35
 
Challenges
Automatic Translation
Manual Translation
Evaluation
Conclusion
36
CNT @ BIBE’23
 
MD5 Similarity
Translation Coverage
Performance gain
37
CNT @ BIBE’23
 
Two production workflows:
GDC DNA-Seq & RNA-Seq alignment.
High MD5 similarity for both.
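The slides do not define how MD5 similarity is computed. One plausible reading, assumed here, is the fraction of output files whose MD5 digests match between the cwltool run and the Nextflow run, pairing files by relative path. The sketch computes that figure with the standard library; the function names and directory layout are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def md5_match_rate(cwl_dir, nf_dir):
    """Fraction of files under cwl_dir whose twin (same relative path) under
    nf_dir has an identical MD5 digest; missing twins count as mismatches."""
    cwl_dir, nf_dir = Path(cwl_dir), Path(nf_dir)
    names = [p.relative_to(cwl_dir) for p in cwl_dir.rglob("*") if p.is_file()]
    if not names:
        return 0.0
    same = sum(1 for n in names
               if (nf_dir / n).is_file() and md5sum(cwl_dir / n) == md5sum(nf_dir / n))
    return same / len(names)

# Demo on throwaway directories: one identical file, one differing file.
with tempfile.TemporaryDirectory() as cwl_out, tempfile.TemporaryDirectory() as nf_out:
    for d, payload in ((cwl_out, b"2"), (nf_out, b"3")):
        Path(d, "a.txt").write_bytes(b"same")
        Path(d, "b.txt").write_bytes(payload)
    print(md5_match_rate(cwl_out, nf_out))  # 0.5
```

Byte-level digests are strict: any embedded timestamp or command line in an output file counts as a mismatch under this reading.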
Translation Coverage
73%–81% fully automated.
Expected to reduce development time by ~75%.
Performance Gain*
Speedup: avg. 52.5% for RNA-Seq, avg. 30% for DNA-Seq.
CPU utilization: avg. 65% for RNA-Seq, avg. 25.5% for DNA-Seq.
*Compared against cwltool, a sequential CWL workflow engine.
 
Outline: Conclusion
The first semi-automatic translator from CWL to Nextflow.
High translation accuracy and coverage.
Potential to reduce development time and increase job-processing throughput.
 
 
 
Thank you! Questions?
Slide Note

[30 secs] [0-30]

Hello everyone. I am Martin Putra. In this talk I will present CNT, a tool to semi-automatically translate CWL, a popular workflow language used in many production systems, to Nextflow, a continuously growing language with attractive support for parallelization. This is joint work with In Kee Kim from the University of Georgia, and Haryadi Gunawi and Robert Grossman from the University of Chicago.


Presentation Transcript


  1. CNT: Semi-Automatic Translation from CWL to Nextflow for Genomics Workflows. Martin L. Putra*, In Kee Kim, Haryadi S. Gunawi*, Robert L. Grossman*

  2. Two popular workflow languages. CWL: used in many production-grade genomics workflows; has been around longer, with a wider user base; decoupled from any workflow engine. Nextflow: rising in popularity; coupled with its own workflow engine; native support for parallel workflow execution.

  3. Parallel workflow execution: independent steps can be executed concurrently. Sequential workflow execution: the order of {2, 3, 4} might vary depending on the workflow engine.

  4. "Let's use Nextflow." Production system using CWL: "Uhh... sure." Manual translation: dataflow programming model; Nextflow syntax & operators; JavaScript → Groovy. Automate?

  5. The Need for an Automatic Translator. Manual translation is time-consuming and requires domain-specific expertise. There are no existing tools for automatic translation. We propose CNT, the first semi-automatic translator from CWL to Nextflow.

  6. "Let's use Nextflow." Production system using CWL: "Oh, sure!" Our solution: CNT (CWL-to-Nextflow Translator), with tool- and workflow-level translation and partial handling of JavaScript. Automate?

  7. CNT: high similarity, high coverage, high performance gain*. *Evaluated against a sequential workflow engine.

  8. Outline: Challenges; Fully Automatic Translation; Semi-Automatic Translation; Evaluation; Conclusion.

  9. Outline: Challenges.

  10. Workflow as DAG: each vertex is an analysis step; edges show the flow of data.

  11. Challenge #1: Exploration & ordering of subworkflows. A step can be a (sub)workflow: each step is either a single tool or a subworkflow.

  12. Challenge #1 (continued): each step is either a single tool or a subworkflow.

  13. Challenge #1 (continued): step 2 expands into steps 2a–2d, one of which is another subworkflow to explore, up to arbitrary depth.

  14. Challenge #2: Ordering of input & output variables. Difference in invocations: CWL uses named arguments; Nextflow uses positional arguments.

  15. Challenge #2 (continued): order affects invocation! In CWL, Step1 outputs 1a: X, 1b: Y, 1c: Z, and Step2 binds its inputs by name (2b: 1b, 2a: 1a, 2c: 1c). In Nextflow, process Step2 { file 2b; file 2c; var 2a } takes positional inputs, so the call is Step2(1b, 1c, 1a).

  16. Challenge #3: Scripting languages. JavaScript usage is complex and pervasive; it must be translated to Groovy or Nextflow operators. Example of JavaScript in CWL: an arguments valueFrom expression defining to_rg(), which builds an "@RG" read-group string from the two-letter keys of inputs.readgroup_meta.

  17. Summary of Challenges: C1. Exploration & ordering of subworkflows. C2. Ordering of input & output variables. C3. Scripting language.

  18. CNT: Summary of Design. C1. Exploration & ordering of subworkflows: automatic. C2. Ordering of input/output variables: automatic. C3. Scripting language: automatic and manual.

  19. Outline: Fully Automatic Translation.

  20. Fully Automatic Translation: (1) tool-level translation, (2) graph-dependency analysis, (3) correctness check.

  21. Fully Automatic Translation: overview of the three stages (repeated).

  22. Tool-Level Translation (fully automatic; addresses Ch#1 and Ch#2). Gather all CWL files by recursive exploration. Classify CommandLineTool vs. Workflow files (CommandLineTool == single tool; Workflow == subworkflow). For each CommandLineTool, call the tool-level translation module. If Workflow, repeat the recursive exploration.

  23. Tool-Level Translation: exploration table (Step, Type, Path): 1, Tool, < >; 2, Workflow, < >.

  24. Tool-Level Translation (continued): same exploration table as slide 23.

  25. Tool-Level Translation (continued): after exploring step 2, the table lists 1, Tool; 2, Workflow; 2a, Tool; 2b, Tool; 2c, Workflow; 2d, Tool (paths shown as < >).

  26. Tool-Level Translation: [Tool-level Translation Module]. More details in our paper!

  27. Fully Automatic Translation: (1) tool-level translation, (2) graph-dependency analysis, (3) correctness check.

  28. Graph-Dependency Analysis: CWL's named-argument invocation abstracts away positional information. Insight: this information can be reconstructed given a suitable data structure.

  29. Graph-Dependency Analysis: create a DAG data structure, with input & output variable names as vertex attributes. Store both the caller's and the callee's perspective. Reconstruct the type signature by traversing edges and reconciling the caller's and callee's perspectives.

  30. Fully Automatic Translation: (1) tool-level translation, (2) graph-dependency analysis, (3) correctness check.

  31. Correctness Check: variable type is important for intermediate data staging. The check tracks files, ensuring the proper variable type (i.e., File or Path). More details in our paper!

  32. Outline: Semi-Automatic Translation.

  33. Semi-Automatic Translation. Focus: JavaScript expressions. A regex-based automatic classifier detects Pattern #1 (object attribute access), Pattern #2 (object method call), and Pattern #3 ('actual' code block). Attribute accesses and method calls get semi-automated handling and are auto-translated; actual code blocks are translated manually with a guideline to produce the final result.

  34. Semi-Automatic Translation (continued): same classifier; the semi-automated handling uses a non-exhaustive mapping table.

  35. Outline: Evaluation.

  36. Evaluation: MD5 similarity, translation coverage, performance gain.

  37. MD5 Similarity: two production workflows, GDC DNA-Seq and RNA-Seq alignment. High MD5 similarity for both.

  38. Translation Coverage: 73%–81% fully automated. Expected to reduce development time by ~75%.

  39. Performance Gain*: speedup averages 52.5% for RNA-Seq and 30% for DNA-Seq; CPU utilization averages 65% for RNA-Seq and 25.5% for DNA-Seq. *Compared against cwltool, a sequential CWL workflow engine.

  40. Outline: Conclusion.

  41. Conclusion: the first semi-automatic translator from CWL to Nextflow. High translation accuracy and coverage. Potential to reduce development time and increase job-processing throughput.

  42. Thank you! Questions?

  43. Backup Slides
