vidarr

Analysis provenance tracking server

Anatomy of a Submission

The following is a real submission request for bcl2fastq and a dissection of what is in the request.

{
  "arguments": {
    "bcl2fastq.basesMask": "y51,i8n*,i8n*,y51",
    "bcl2fastq.lanes": [
      2
    ],
    "bcl2fastq.mismatches": 1,
    "bcl2fastq.modules": "bcl2fastq/2.20.0.422 bcl2fastq-jail/3.1.2b barcodex-rs/0.1.2",
    "bcl2fastq.process.bcl2fastq": null,
    "bcl2fastq.process.bcl2fastqJail": null,
    "bcl2fastq.process.extraOptions": null,
    "bcl2fastq.process.ignoreMissingBcls": null,
    "bcl2fastq.process.ignoreMissingFilter": null,
    "bcl2fastq.process.ignoreMissingPositions": null,
    "bcl2fastq.process.memory": 14,
    "bcl2fastq.process.temporaryDirectory": "$TMP",
    "bcl2fastq.process.threads": null,
    "bcl2fastq.runDirectory": {
      "contents": {
        "configuration": "/.mounts/labs/prod/archive/A00469/200820_A00469_0115_BHTMWLDMXX",
        "externalIds": [
          {
            "id": "4703_2_LDI44042",
            "provider": "pinery-miso"
          }
        ]
      },
      "type": "EXTERNAL"
    },
    "bcl2fastq.samples": [
      {
        "acceptableUmiList": null,
        "barcodes": [
          "TGGCATGT-TGAAGACG"
        ],
        "inlineUmi": false,
        "name": "PBCM_0044_Ly_n_PE_3072005_CM_200820_A00469_0115_BHTMWLDMXX_2_TGGCATGT-TGAAGACG",
        "patterns": null
      }
    ],
    "bcl2fastq.timeout": 20
  },
  "attempt": 0,
  "consumableResources": {
    "priority": 1
  },
  "engineParameters": {
    "final_call_logs_dir": "/scratch2/groups/gsi/production/cromwell/cromwell-prod.hpc.oicr.on.ca_call_logs",
    "final_workflow_log_dir": "/scratch2/groups/gsi/production/cromwell/cromwell-prod.hpc.oicr.on.ca_workflow_logs"
  },
  "externalKeys": [
    {
      "id": "4703_2_LDI44042",
      "provider": "pinery-miso",
      "versions": {
        "pinery-hash-7": "cf4b332c847a9594463167453b993a285a11d7c13b29fcfe2fa951b2345ce17d",
        "shesmu-sha1": "15C99BE88D03CE10E575155DD3BCBF5BA7ED3D93"
      }
    }
  ],
  "labels": {},
  "metadata": {
    "bcl2fastq.fastqs": [
      {
        "fastqs": {
          "contents": [
            {
              "outputDirectory": "/oicr/data/archive/seqware/seqware_analysis_12/hsqwprod/"
            },
            [
              {
                "id": "4703_2_LDI44042",
                "provider": "pinery-miso"
              }
            ]
          ],
          "type": "MANUAL"
        },
        "name": "PBCM_0044_Ly_n_PE_3072005_CM_200820_A00469_0115_BHTMWLDMXX_2_TGGCATGT-TGAAGACG"
      }
    ]
  },
  "mode": "RUN",
  "target": "hpc",
  "workflow": "bcl2fastq",
  "workflowVersion": "3.1.3"
}

Starting at the end, the "mode" property indicates whether we want to run the workflow. Besides "RUN", "DRY_RUN" and "VALIDATE" are also possible. Both check that the workflow could be executed, and "DRY_RUN" also searches the database to determine whether the workflow has been run previously. Neither starts it.
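As a small illustration of switching modes, the sketch below copies a request and replaces its "mode" with one of the three values named above. The helper name `with_mode` is hypothetical and not part of Víðarr; it only manipulates the JSON document before it is sent.

```python
import copy

# The three modes named in the text above.
KNOWN_MODES = {"RUN", "DRY_RUN", "VALIDATE"}


def with_mode(request: dict, mode: str) -> dict:
    """Return a deep copy of `request` with its "mode" replaced.

    Hypothetical helper for illustration; it does not contact a server.
    """
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    updated = copy.deepcopy(request)
    updated["mode"] = mode
    return updated
```

This makes it easy to send the same request first as "VALIDATE" and then as "RUN" without editing the original object.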

The "workflow" and "workflowVersion" properties set which workflow to execute. The selected workflow determines which "labels" are required. bcl2fastq doesn’t require any labels, so this is an empty object. The labels must match the labels given when the workflow was registered. The version determines what inputs and outputs the workflow requires and informs the "arguments" and "metadata". More details follow.

The "target" determines the execution target on the Víðarr server. The "consumableResources", "arguments", "metadata" and "engineParameters" will have to match the target configuration.

The "attempt" field allows relaunching a failed workflow run. If this submission matches an existing workflow run, that run has either failed or not started (waiting on resources), and "attempt" is one more than the previous attempt, then the previous attempt is discarded and the workflow is retried with the arguments and metadata provided.
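The retry rule can be read as a single predicate. The following is a minimal sketch of that rule as described here, not Víðarr's actual implementation; the state names `"FAILED"` and `"WAITING"` and the function name `may_retry` are assumptions made for illustration.

```python
# Illustrative state names; the real server may label these differently.
RETRYABLE_STATES = {"FAILED", "WAITING"}


def may_retry(previous_attempt: int, previous_state: str, attempt: int) -> bool:
    """Return True if a submission with `attempt` would replace the previous run.

    The previous run must have failed or not yet started, and the new
    attempt number must be exactly one more than the previous one.
    """
    return previous_state in RETRYABLE_STATES and attempt == previous_attempt + 1
```

Under this reading, resubmitting the example request after a failure would mean changing "attempt" from 0 to 1.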

The workflow version defines what input it takes and what output it provides. The exact data that must be provided is defined by a combination of the target and the workflow version.

For most simple inputs, the data is exactly what is expected by the workflow (e.g., if the workflow requests an integer, the submission must provide an integer). The two special cases are files and directories. For these, the submitter must provide information to the input provisioner plugin on how to fetch the file. The input can be a file output by another workflow in the Víðarr database, referenced by its Víðarr ID; in that case, "type" is "INTERNAL". For bcl2fastq, the input is not the output of an existing workflow, so "type": "EXTERNAL" is found in "arguments"."bcl2fastq.runDirectory". The "contents"."configuration" is the information required by the input provisioner. The "contents"."externalIds" are the external IDs associated with the input.
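The shape of an EXTERNAL input can be sketched as a small builder that mirrors "bcl2fastq.runDirectory" in the request above. The function name `external_input` is hypothetical, and the sketch assumes the configuration is a plain path string, as it is in this example; other input provisioners may expect something different.

```python
from typing import Iterable, Tuple


def external_input(configuration: str, external_ids: Iterable[Tuple[str, str]]) -> dict:
    """Build an EXTERNAL file or directory input.

    `external_ids` is an iterable of (provider, id) pairs. The layout
    follows the bcl2fastq example request; it is an illustration only.
    """
    return {
        "type": "EXTERNAL",
        "contents": {
            "configuration": configuration,
            "externalIds": [
                {"id": identifier, "provider": provider}
                for provider, identifier in external_ids
            ],
        },
    }
```

For example, `external_input(run_path, [("pinery-miso", "4703_2_LDI44042")])` produces an object with the same structure as the run directory argument in the request.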

Víðarr is built to associate data with an external LIMS. The "externalKeys" property lists all the keys that are known to the workflow. A key consists of a provider, an identifier, and versions. The versions are used in matching (details in the architecture guide). All the input provider+identifiers must be in this list. For EXTERNAL inputs, the provider+identifiers must be listed with each input, just as in the bcl2fastq request above. For INTERNAL inputs, Víðarr knows the provider+identifier from the database.
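A submitter could check this rule locally before sending a request. The sketch below is an assumption-laden illustration, not server code: the function names are hypothetical, and it only inspects top-level values of "arguments" that have "type": "EXTERNAL", as in the example request.

```python
def external_input_ids(arguments: dict) -> set:
    """Collect (provider, id) pairs from every top-level EXTERNAL input."""
    found = set()
    for value in arguments.values():
        if isinstance(value, dict) and value.get("type") == "EXTERNAL":
            for ext in value.get("contents", {}).get("externalIds", []):
                found.add((ext["provider"], ext["id"]))
    return found


def missing_external_keys(request: dict) -> set:
    """Return input (provider, id) pairs that are absent from "externalKeys"."""
    declared = {(key["provider"], key["id"]) for key in request.get("externalKeys", [])}
    return external_input_ids(request.get("arguments", {})) - declared
```

An empty result means every EXTERNAL input identifier appears in "externalKeys"; INTERNAL inputs are not covered, since Víðarr resolves those from its database.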

All the outputs must also be associated with these external identifiers. Most workflows use "type": "ALL" to associate the outputs with all the provider+identifiers found in the input. Splitting workflows (root workflows including bcl2fastq and bam-merge-preprocessing) use "type": "MANUAL" to assign each output appropriately. All keys must be accounted for in the output. The output provisioner determines what additional data must be provided to perform the output provisioning.
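The requirement that all keys be accounted for can also be checked before submission. This is a rough sketch under stated assumptions: the function names are hypothetical, and it assumes that MANUAL outputs look like the "bcl2fastq.fastqs" entry in the example, where "contents" is a list holding the provisioner data and a list of external IDs. Outputs with "type": "ALL" are not handled here.

```python
def manual_output_ids(node) -> set:
    """Recursively collect (provider, id) pairs from MANUAL outputs."""
    found = set()
    if isinstance(node, dict):
        if node.get("type") == "MANUAL" and isinstance(node.get("contents"), list):
            # Assumption from the example: list-valued items hold external IDs.
            for item in node["contents"]:
                if isinstance(item, list):
                    found.update((ext["provider"], ext["id"]) for ext in item)
            return found
        for value in node.values():
            found |= manual_output_ids(value)
    elif isinstance(node, list):
        for item in node:
            found |= manual_output_ids(item)
    return found


def unaccounted_keys(request: dict) -> set:
    """Return (provider, id) pairs in "externalKeys" not assigned to any MANUAL output."""
    declared = {(key["provider"], key["id"]) for key in request.get("externalKeys", [])}
    return declared - manual_output_ids(request.get("metadata", {}))
```

For the bcl2fastq request above, the single key is assigned to the single FASTQ output, so this check would return an empty set.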