vidarr

Analysis provenance tracking server

Loading and Unloading Data

It can be desirable to remove successfully completed workflow runs from the Víðarr database. Some reasons include:

Víðarr has a way to unload workflow runs from the database. These extracts can be used as archives or loaded into a Víðarr instance (either the same one or another at a later date). This can be used to split up a Víðarr instance, move or copy workflow runs between environments, create backups, or migrate workflow runs from another system into Víðarr.

Workflow runs are selected using a filter and then one of three operations happens:

The unload operation is always recursive, since doing anything else would leave orphaned records in the database. Unloaded data is written to a file on the server, and the path to that file is returned in the HTTP response. Copy-out data is returned directly in the HTTP response.

A request looks like:

{
  "filter": ...,
  "recursive": true
}

This can be sent to either the /api/copy-out or /api/unload endpoints. The unload endpoint ignores the value of the "recursive" property and always works recursively.
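As a minimal sketch of how such a request might be assembled, the following Python builds the request without sending it. The function name `build_extract_request`, the `base_url` parameter, the use of `POST`, and the JSON content type are assumptions for illustration; the filter is passed in as-is because its format is not described here.

```python
import json
import urllib.request


def build_extract_request(base_url, filter_obj, recursive=True, unload=False):
    """Build (but do not send) a request for /api/copy-out or /api/unload.

    Assumptions: the server accepts an HTTP POST with a JSON body, and
    base_url is the root URL of the Vidarr server.
    """
    endpoint = "/api/unload" if unload else "/api/copy-out"
    # The unload endpoint ignores "recursive", so including it is harmless.
    body = {"filter": filter_obj, "recursive": recursive}
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once a real filter and server address are supplied.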

The output format is similar to /api/provenance with "versionPolicy": "ALL" in the request. It consists of a JSON object with the following fields:

{
  "workflows": [...],
  "workflowVersions": [...],
  "workflowRuns": [...]
}

The "workflows" are objects with two properties: "name", which is a string, and "labels", which is an object mapping label names to the types of the label values. The labels are in the same format as the request to create a new workflow.

The "workflowVersions" are objects with "name" and "version" properties and all the properties used when adding a new workflow version using the /api/workflow/version endpoint.

Every workflow run in the request must refer to a workflow and workflow version that appear in the definitions provided. Extra definitions may be provided. If a workflow is new to Víðarr, it will be created with max-in-flight set to 0.
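The requirement above can be checked before a file is submitted. The sketch below is one possible pre-flight check, not the server's own validation. It assumes each workflow run names its workflow and version in fields called "workflowName" and "workflowVersion"; those field names are an assumption and should be adjusted to match the actual file.

```python
def missing_definitions(data):
    """Return sorted (workflow, version) pairs used by runs but not defined.

    Extra definitions are allowed, so only missing ones are reported.
    """
    workflows = {w["name"] for w in data["workflows"]}
    versions = {(v["name"], v["version"]) for v in data["workflowVersions"]}
    missing = set()
    for run in data["workflowRuns"]:
        # Assumed field names; not confirmed by the format description.
        key = (run["workflowName"], run["workflowVersion"])
        if key[0] not in workflows or key not in versions:
            missing.add(key)
    return sorted(missing)
```

An empty list means every run is covered by both a workflow and a workflow version definition.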

Unloaded data can then be inserted by pushing this object to /api/load. The workflows will be validated and installed if necessary. If there are any errors, the entire load will be abandoned.
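A load request might be prepared along these lines. This is a sketch under assumptions: the name `build_load_request`, the `base_url` parameter, the use of `POST`, and the JSON content type are illustrative. The check for the three top-level fields is a local convenience and does not replace the server's validation.

```python
import json
import urllib.request

REQUIRED_KEYS = ("workflows", "workflowVersions", "workflowRuns")


def build_load_request(base_url, data):
    """Build (but do not send) a request that pushes data to /api/load.

    Raises ValueError if a top-level field is absent, since the server
    abandons the entire load on any error.
    """
    absent = [key for key in REQUIRED_KEYS if key not in data]
    if absent:
        raise ValueError("missing top-level fields: " + ", ".join(absent))
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/load",
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )
```

The `data` argument would typically come from `json.load` on a previously unloaded file.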

During the load procedure, all the identifiers are verified. If the contents of the file have been manipulated, then the identifiers must also be updated. For information on how identifiers are calculated, see Víðarr identifiers.

An example load file is available in examples/loadable_workflows.json.