vidarr

Analysis provenance tracking server

Overall Design and Goals

Niassa/SeqWare started off as three things: a LIMS, a workflow engine, and a metadata tracking system. We have replaced the LIMS with our Pinery interface over MISO and, previously, GeoSpiza. We have outsourced the workflow engine to the Broad’s Cromwell. Víðarr is our replacement for the metadata tracking.

It schedules workflows using a workflow engine, collects the output from these workflows, and stores the metadata about the output files along with their connections to the Pinery LIMS information. It supports using Cromwell as a workflow engine, with the future goal of supporting other workflow engines. It has to support provisioning out other kinds of data and associating LIMS metadata with them, with the future goal of provisioning out logs, QC information, and database fragments (i.e., ETL data). It supports communicating with multiple workflow engines at once so it can schedule on a local HPC and a cloud instance.

Design Summary

The Víðarr analysis tracking system has:

Components of Víðarr include:

Additional components include:

The goal for Víðarr web service is to:

Shesmu Integration

Shesmu needs file provenance and can use Cerberus as a plugin. Shesmu also needs to run workflows and can use the /api/submit endpoint for that purpose. To know what it can run, it uses /api/workflows to get the known workflows and /api/targets to find out where it can run them; it then generates an action definition for every valid combination of workflow version and target. The submit action in Shesmu has commands to re-run failed workflows (Reattempt) and to unload and re-run (Reprocess). For more details, see the Víðarr plugin for Shesmu.
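As a rough illustration of how one action definition per valid workflow version and target combination could be derived, here is a minimal Python sketch. The data shapes, the `languages` field, the compatibility rule, and the `target::workflow::version` naming are all assumptions made for this example; they are not the actual response format of /api/workflows or /api/targets, nor the names the Shesmu plugin generates.

```python
from itertools import product


def action_definitions(workflows, targets):
    """Pair each workflow version with every target that can run it.

    Hypothetical rule: a target lists the workflow languages its engine
    accepts, and a workflow version is valid on a target only if its
    language is in that list.
    """
    actions = []
    for wf, (target_name, target) in product(workflows, sorted(targets.items())):
        if wf["language"] in target["languages"]:
            # Hypothetical naming scheme for the generated action.
            actions.append(f'{target_name}::{wf["name"]}::{wf["version"]}')
    return actions


# Made-up stand-ins for what the two endpoints might report.
workflows = [
    {"name": "bcl2fastq", "version": "1.0.0", "language": "WDL_1_0"},
    {"name": "bcl2fastq", "version": "1.1.0", "language": "WDL_1_0"},
    {"name": "legacy_align", "version": "2.0.0", "language": "NIASSA"},
]
targets = {
    "hpc": {"languages": ["WDL_1_0", "NIASSA"]},
    "cloud": {"languages": ["WDL_1_0"]},
}

actions = action_definitions(workflows, targets)
for action in actions:
    print(action)
```

In this made-up data set, `legacy_align` is only offered on the `hpc` target because the `cloud` target does not list its language, so five of the six possible combinations become actions.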

How a workflow runs

Víðarr takes a submission request that includes:

Once received:

  1. The target and workflow name and version are checked. If they are not found, or the workflow is not supported by the workflow engine, the request is rejected.
  2. The parameters, metadata, labels, consumable resource values, and engine parameters are type checked.
  3. References to other Víðarr files used as input are resolved.
  4. External keys are checked.
  5. The identifier is calculated.
  6. The database is checked for the identifier. If the identifier is not found, the workflow run is started. If it is found in a failed state and the request asked for reattempting, the workflow run is restarted. Otherwise, if it is found, the identifier is returned to the client.
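The last two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not say how the identifier is calculated, so hashing the canonical JSON of the request is a stand-in, and the in-memory dictionary, the state names, and the function names are hypothetical.

```python
import hashlib
import json


def run_identifier(request):
    """Derive a deterministic identifier from a request (assumed scheme).

    Canonical JSON (sorted keys, fixed separators) makes the hash
    independent of the order in which the client listed its fields.
    """
    payload = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def handle_submission(db, identifier, reattempt):
    """Decide what to do with a checked request (step 6 above).

    `db` is a plain dict mapping identifier -> state, standing in for
    the real database.
    """
    state = db.get(identifier)
    if state is None:
        db[identifier] = "RUNNING"  # unknown identifier: start a new run
        return "started"
    if state == "FAILED" and reattempt:
        db[identifier] = "RUNNING"  # failed run, reattempt requested: restart
        return "restarted"
    return "existing"  # known run: just hand the identifier back


db = {}
request = {"workflow": "bcl2fastq", "version": "1.0.0", "parameters": {"lanes": [1, 2]}}
ident = run_identifier(request)
print(handle_submission(db, ident, reattempt=False))  # first submission
print(handle_submission(db, ident, reattempt=False))  # duplicate submission
```

Because the identifier depends only on the request content in this sketch, submitting the same request twice maps to the same run instead of launching a duplicate.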

Once a workflow is started (or restarted), it goes through phases:

We keep running into the issue of running out of temporary disk space. Shesmu can throttle, but it’s often too late. The Víðarr server has a number of consumable resources (e.g., scratch disk), and workflow runs can reserve a chunk of a resource. The reservation decreases the available amount of that resource until the workflow run has finished. If there isn’t enough of the resource, the workflow run must wait until later. Consumable resources can also be used to block workflows from starting, in ways similar to Shesmu throttlers. The implementation is intentionally flexible.
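The reservation idea can be illustrated with a small Python sketch. The class name, method names, and units are assumptions for this example and do not reflect the actual Víðarr plugin interface; the sketch only shows the bookkeeping of reserving and releasing a quota.

```python
class ConsumableResource:
    """A fixed quota that workflow runs reserve chunks of (hypothetical)."""

    def __init__(self, name, total):
        self.name = name
        self.total = total
        self.reserved = {}  # run identifier -> amount currently held

    @property
    def available(self):
        return self.total - sum(self.reserved.values())

    def try_reserve(self, run_id, amount):
        """Reserve `amount` for `run_id`; return False if the run must wait."""
        if run_id in self.reserved:
            return True  # already holding a reservation, nothing to do
        if amount > self.available:
            return False  # not enough left; the caller should retry later
        self.reserved[run_id] = amount
        return True

    def release(self, run_id):
        """Give the reserved amount back once the run has finished."""
        self.reserved.pop(run_id, None)


scratch = ConsumableResource("scratch-disk-gb", total=100)
print(scratch.try_reserve("run-a", 70))  # fits within the quota
print(scratch.try_reserve("run-b", 50))  # does not fit; must wait
scratch.release("run-a")                 # run-a finished
print(scratch.try_reserve("run-b", 50))  # now fits
```

A resource that blocks workflows for other reasons, in the style of a throttler, could follow the same shape by having `try_reserve` return `False` based on some external condition rather than a numeric quota.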