【9.3.5.5】Debugging Workflows

问题:

  • 如何检查CWL文件中的错误?
  • 如何获取更多信息以帮助解决错误?
  • 使用CWL时常见的错误消息是什么?

目标:

  • 检查CWL文件中的错误
  • 输出调试信息
  • 解释和修复常见的错误消息

在处理CWL工作流时,您可能会遇到错误。可能存在许多不同的错误。检查终端中的错误消息总是非常重要的,因为它会为您提供有关错误的信息。此错误消息将为您提供错误类型以及包含错误的代码行。其中一些错误将在本期节目中解释。

作为检查CWL脚本是否包含任何错误的第一步,您可以使用–validate标志运行工作流。

cwltool --validate CWL_SCRIPT.cwl

脚本可能已经过验证,但是仍然会出现错误。如果遇到错误,最佳做法是使用–debug标志运行工作流。这将为您提供有关您遇到的错误的详细信息。

cwltool --debug CWL_SCRIPT.cwl

一、YAML errors

首先,YAML语法中的错误。在编写一段代码时,很容易出错。

一些非常常见的YAML错误包括:

Tabs

使用制表符而不是空格。在YAML文件中,缩进是使用空格而不是制表符进行的。请下载并运行此示例,其中包含一个制表符。

cwltool tab-error.cwl workflow_input.yml

ERROR Tool definition failed validation:
while scanning for the next token
file:///tab-error.cwl:5:1:   found character '\t' that cannot start any token

字段名称类型 Field Name Typos

字段名称中的打字。例如,很容易忘记字段名称中的大写字母。字段名称中有拼写错误的错误将显示无效字段。

rna_seq_workflow_fieldname_fail.cwl

cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly: File
  ref_fruitfly_genome: Directory

steps:
  quality_control:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly
    out: [html_file]

  mapping_reads:
    requirements:
      ResourceRequirement:
        ramMin: 5120
    run: bio-cwl-tools/STAR/STAR-Align.cwl
    in:
      RunThreadN: {default: 4}
      GenomeDir: ref_fruitfly_genome
      ForwardReads: rna_reads_fruitfly
      OutSAMtype: {default: BAM}
      SortedByCoordinate: {default: true}
      OutSAMunmapped: {default: Within}
    out: [alignment]

  index_alignment:
    run: bio-cwl-tools/samtools/samtools_index.cwl
    in:
      bam_sorted: mapping_reads/alignment
    out: [bam_sorted_indexed]

outputs:
  qc_html:
    type: File
    outputsource: quality_control/html_file
  bam_sorted_indexed:
    type: File
    outputSource: index_alignment/bam_sorted_indexed

workflow_input_debug.yml

rna_reads_fruitfly:
  class: File
  location: rnaseq/GSM461177_1_subsampled.fastqsanger
  format: http://edamontology.org/format_1930  # FASTQ
ref_fruitfly_genome:
  class: Directory
  location: rnaseq/dm6-STAR-index

运行:

cwltool rna_seq_workflow_fieldname_fail.cwl workflow_input_debug.yml


ERROR Tool definition failed validation:
rna_seq_workflow_fieldname_fail.cwl:1:1:   Object `rna_seq_workflow_fieldname_fail.cwl` is not valid
                                          because
                                            tried `Workflow` but
rna_seq_workflow_fieldname_fail.cwl:35:1:     the `outputs` field is not valid because
rna_seq_workflow_fieldname_fail.cwl:36:3:       item is invalid because
rna_seq_workflow_fieldname_fail.cwl:38:5:         invalid field `outputsource`, expected one of:
                                                  'label', 'secondaryFiles', 'streamable', 'doc', 'id',
                                                  'format', 'outputSource', 'linkMerge', 'pickValue', 'type'

Variable Name Typos

变量名称中的打字。与字段名称中的拼写错误类似,在引用变量时很容易出错。这些错误将显示字段引用未知标识符。

rna_seq_workflow_varname_fail.cwl

cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly: File
  ref_fruitfly_genome: Directory

steps:
  quality_control:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly
    out: [html_file]

  mapping_reads:
    requirements:
      ResourceRequirement:
        ramMin: 5120
    run: bio-cwl-tools/STAR/STAR-Align.cwl
    in:
      RunThreadN: {default: 4}
      GenomeDir: ref_fruitfly_genome
      ForwardReads: rna_reads_fruitfly
      OutSAMtype: {default: BAM}
      SortedByCoordinate: {default: true}
      OutSAMunmapped: {default: Within}
    out: [alignment]

  index_alignment:
    run: bio-cwl-tools/samtools/samtools_index.cwl
    in:
      bam_sorted: mapping_reads/alignments
    out: [bam_sorted_indexed]

outputs:
  qc_html:
    type: File
    outputSource: quality_control/html_file
  bam_sorted_indexed:
    type: File
    outputSource: index_alignment/bam_sorted_indexed

运行:

cwltool rna_seq_workflow_varname_fail.cwl workflow_input_debug.yml

报错:

ERROR Tool definition failed validation:
rna_seq_workflow_varname_fail.cwl:8:1:  checking field `steps`
rna_seq_workflow_varname_fail.cwl:29:3:   checking object
                                          `rna_seq_workflow_varname_fail.cwl#index_alignment`
rna_seq_workflow_varname_fail.cwl:31:5:     checking field `in`
rna_seq_workflow_varname_fail.cwl:32:7:       checking object
                                              `rna_seq_workflow_varname_fail.cwl#index_alignment/bam_sorted`
                                                Field `source` references unknown identifier
                                                `mapping_reads/alignments`, tried
                                                file:///.../rna_seq_workflow_varname_fail.cwl#mapping_reads/alignments

二、接线错误 Wiring error

当您忘记将工作流步骤的输出添加到输出部分时,经常会出现连接错误。这不会导致错误消息,但目录中不会有任何输出。要获得所需的输出,您必须再次运行工作流。最佳实践是在运行脚本之前检查您的输出部分,以确保您想要的所有输出都在那里。

三、 类型不匹配 Type mismatch

当变量之间的类型不匹配时,就会发生类型错误。当您在inputs部分声明一个变量时,该变量的类型必须与YAML inputs文件中的类型和工作流步骤中使用的类型相匹配。发生此错误时显示的错误消息将告诉您发生不匹配的行。

rna_seq_workflow_type_fail.cwl

cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly: int
  ref_fruitfly_genome: Directory

steps:
  quality_control:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly
    out: [html_file]

  mapping_reads:
    requirements:
      ResourceRequirement:
        ramMin: 5120
    run: bio-cwl-tools/STAR/STAR-Align.cwl
    in:
      RunThreadN: {default: 4}
      GenomeDir: ref_fruitfly_genome
      ForwardReads: rna_reads_fruitfly
      OutSAMtype: {default: BAM}
      SortedByCoordinate: {default: true}
      OutSAMunmapped: {default: Within}
    out: [alignment]

  index_alignment:
    run: bio-cwl-tools/samtools/samtools_index.cwl
    in:
      bam_sorted: mapping_reads/alignment
    out: [bam_sorted_indexed]

outputs:
  qc_html:
    type: File
    outputSource: quality_control/html_file
  bam_sorted_indexed:
    type: File
    outputSource: index_alignment/bam_sorted_indexed

运行:

cwltool rna_seq_workflow_type_fail.cwl workflow_input_debug.yml

报错:

ERROR Tool definition failed validation:

rna_seq_workflow_type_fail.cwl:5:3:   Source 'rna_reads_fruitfly' of type "int" is incompatible
rna_seq_workflow_type_fail.cwl:12:7:   with sink 'reads_file' of type "File"
rna_seq_workflow_type_fail.cwl:5:3:   Source 'rna_reads_fruitfly' of type "int" is incompatible
rna_seq_workflow_type_fail.cwl:23:7:   with sink 'ForwardReads' of type ["File", {"type":
                                       "array", "items": "File"}]

四、格式错误

有些文件需要在YAML输入文件中指定特定的格式,例如RNA-seq分析中的fastq文件。如果未指定格式,则会发生错误。例如,您可以使用EDAM本体。

rna_seq_workflow_debug.cwl

cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly: File
  ref_fruitfly_genome: Directory

steps:
  quality_control:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly
    out: [html_file]

  mapping_reads:
    requirements:
      ResourceRequirement:
        ramMin: 5120
    run: bio-cwl-tools/STAR/STAR-Align.cwl
    in:
      RunThreadN: {default: 4}
      GenomeDir: ref_fruitfly_genome
      ForwardReads: rna_reads_fruitfly
      OutSAMtype: {default: BAM}
      SortedByCoordinate: {default: true}
      OutSAMunmapped: {default: Within}
    out: [alignment]

  index_alignment:
    run: bio-cwl-tools/samtools/samtools_index.cwl
    in:
      bam_sorted: mapping_reads/alignment
    out: [bam_sorted_indexed]

outputs:
  qc_html:
    type: File
    outputSource: quality_control/html_file
  bam_sorted_indexed:
    type: File
    outputSource: index_alignment/bam_sorted_indexed

workflow_input_undefined.yml

rna_reads_fruitfly:
  class: File
  location: rnaseq/GSM461177_1_subsampled.fastqsanger
ref_fruitfly_genome:
  class: Directory
  location: rnaseq/dm6-STAR-index

运行:

cwltool rna_seq_workflow_debug.cwl workflow_input_undefined.yml

报错:

ERROR Exception on step 'mapping_reads'
ERROR [step mapping_reads] Cannot make job: Expected value of 'ForwardReads' to have format http://edamontology.org/format_1930 but
  File has no 'format' defined: {
    "class": "File",
    "location": "file:///.../rnaseq/GSM461177_1_subsampled.fastqsanger",
    "size": 142867948,
    "basename": "GSM461177_1_subsampled.fastqsanger",
    "nameroot": "GSM461177_1_subsampled",
    "nameext": ".fastqsanger"
}

总结:

  • Run the workflow with the –validate option to check for errors
  • The –debug option will output more information
  • ‘Wiring’ errors won’t necessarily yield an error message

参考资料

药企,独角兽,苏州。团队长期招人,感兴趣的都可以发邮件聊聊:tiehan@sina.cn
个人公众号,比较懒,很少更新,可以在上面提问题,如果回复不及时,可发邮件给我: tiehan@sina.cn