Víðarr Identifiers
Rather than using incrementing identifiers, Víðarr identifies workflow versions, workflow runs, and output files using SHA-256 hashes of key metadata. The assumption is that if two objects have the same hash, they must be equivalent. All workflow run matching is done by hash matching.
Every hash is the SHA-256 of the data described below. All strings are converted to UTF-8 encoded bytes. Strings are not permitted to contain the NUL (zero) byte. Some hashes contain the IDs of other hashes. The hashes are encoded as ASCII strings in lowercase hexadecimal representation. All JSON objects have keys in alphabetical order.
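The conventions above can be sketched as a few helpers. This is a minimal illustration, not Víðarr's own code: the function names are hypothetical, and the choice of compact JSON separators is an assumption, since the text only states that keys are in alphabetical order.

```python
import hashlib
import json


def hex_digest(data: bytes) -> str:
    """Return the SHA-256 of data as a lowercase hexadecimal ASCII string."""
    return hashlib.sha256(data).hexdigest()


def encode_string(text: str) -> bytes:
    """Encode a string as UTF-8, rejecting any embedded NUL (zero) byte."""
    if "\0" in text:
        raise ValueError("strings must not contain the NUL byte")
    return text.encode("utf-8")


def canonical_json(value) -> bytes:
    """Serialise a JSON value with object keys sorted.

    Assumption: compact separators and no ASCII escaping; the text does
    not specify whitespace or escaping rules.
    """
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
```

Sorting keys matters because two JSON objects with the same content but different key order would otherwise serialise to different bytes and so produce different hashes.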
Workflow Versions
A workflow version hash is present for each version of a workflow installed. Even if the same WDL file is installed under two different names, there will be two different workflow version hashes. It is computed as follows:
- name
- NUL
- version
- NUL
- HEX_DIGITS(SHA256(WDL file UTF-8 bytes))
- JSON(output parameters)
- JSON(input parameters)
- for filename, contents in accessory files; sorted by filename:
  - NUL
  - filename
  - NUL
  - HEX_DIGITS(SHA256(contents))
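The recipe above can be sketched as follows. This is an illustration under stated assumptions rather than Víðarr's implementation: `workflow_version_hash` and its parameters are hypothetical names, accessory file contents are assumed to be text that is UTF-8 encoded before hashing, and JSON is assumed to be serialised compactly with sorted keys.

```python
import hashlib
import json
from typing import Any, Mapping

NUL = b"\0"


def _json(value: Any) -> bytes:
    # Assumption: compact JSON with sorted keys; only key order is specified.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def _hex_sha256(data: bytes) -> bytes:
    # HEX_DIGITS(SHA256(...)): lowercase hexadecimal digest as ASCII bytes.
    return hashlib.sha256(data).hexdigest().encode("ascii")


def workflow_version_hash(
    name: str,
    version: str,
    wdl: str,
    output_parameters: Any,
    input_parameters: Any,
    accessory_files: Mapping[str, str],
) -> str:
    digest = hashlib.sha256()
    digest.update(name.encode("utf-8"))
    digest.update(NUL)
    digest.update(version.encode("utf-8"))
    digest.update(NUL)
    digest.update(_hex_sha256(wdl.encode("utf-8")))
    digest.update(_json(output_parameters))
    digest.update(_json(input_parameters))
    # Accessory files are visited in filename order so the result is stable.
    for filename in sorted(accessory_files):
        digest.update(NUL)
        digest.update(filename.encode("utf-8"))
        digest.update(NUL)
        digest.update(_hex_sha256(accessory_files[filename].encode("utf-8")))
    return digest.hexdigest()
```

Because the name is part of the input, installing the same WDL file under two names yields two different hashes, as described above.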
Workflow Runs
Each workflow run has a hash computed from the data that is considered to uniquely identify it; this does not include all the information in a workflow run. That is, different workflow runs that agree on this data intentionally share the same hash. It is computed as follows:
- workflow-name
- for input in input-ids; sorted and unique
  - NUL
  - hash from input =~ vidarr:server/hash
- for provider, identifier in external-keys; sorted by provider, then identifier:
  - NUL
  - NUL
  - provider
  - NUL
  - identifier
  - NUL
- for name, value in labels; sorted by key:
  - NUL
  - name
  - NUL
  - JSON(value)
  - NUL
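A sketch of the workflow run hash, following the list above. The names here are hypothetical, and several details are assumptions: the hash is taken to be everything after the last `/` of an input ID of the form `vidarr:server/hash`, input IDs are de-duplicated and sorted as whole strings before the hash is extracted, and JSON is serialised compactly with sorted keys.

```python
import hashlib
import json
from typing import Any, Iterable, Mapping, Tuple

NUL = b"\0"


def hash_from_input(input_id: str) -> str:
    # Assumption: the hash is the final path segment of "vidarr:server/hash".
    if not input_id.startswith("vidarr:") or "/" not in input_id:
        raise ValueError(f"not a Vidarr identifier: {input_id!r}")
    return input_id.rsplit("/", 1)[1]


def workflow_run_hash(
    workflow_name: str,
    input_ids: Iterable[str],
    external_keys: Iterable[Tuple[str, str]],
    labels: Mapping[str, Any],
) -> str:
    digest = hashlib.sha256()
    digest.update(workflow_name.encode("utf-8"))
    # Inputs: unique and sorted, each preceded by a NUL.
    for input_id in sorted(set(input_ids)):
        digest.update(NUL)
        digest.update(hash_from_input(input_id).encode("utf-8"))
    # External keys: tuple sorting orders by provider, then identifier.
    for provider, identifier in sorted(external_keys):
        digest.update(NUL + NUL)
        digest.update(provider.encode("utf-8"))
        digest.update(NUL)
        digest.update(identifier.encode("utf-8"))
        digest.update(NUL)
    # Labels: sorted by key, with the value serialised as JSON.
    for name in sorted(labels):
        digest.update(NUL)
        digest.update(name.encode("utf-8"))
        digest.update(NUL)
        digest.update(
            json.dumps(labels[name], sort_keys=True, separators=(",", ":")).encode("utf-8")
        )
        digest.update(NUL)
    return digest.hexdigest()
```

Two submissions that list the same inputs, external keys, and labels in a different order, or repeat an input, produce the same hash, which is what allows workflow runs to be matched by hash.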
Output Analysis: Files
The files provisioned out are given the ID:
- workflow-run-identifier
- BASENAME(final output path)
Note that if the output provisioning workflow renames files, the new file name is the one that goes into the hash.
Output Analysis: URLs
The URLs provisioned out are given the ID:
- workflow-run-identifier
- URL
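Both output identifiers (files and URLs) can be sketched in a few lines. The function names are hypothetical. The components are concatenated without a separator because the lists show none, and POSIX-style paths are assumed for the base name; both points are assumptions rather than confirmed behaviour.

```python
import hashlib
import posixpath


def output_file_id(workflow_run_id: str, final_output_path: str) -> str:
    """Hash the workflow run identifier followed by the file's base name."""
    digest = hashlib.sha256()
    digest.update(workflow_run_id.encode("utf-8"))
    # Only the base name is used, so the directory does not affect the ID.
    digest.update(posixpath.basename(final_output_path).encode("utf-8"))
    return digest.hexdigest()


def output_url_id(workflow_run_id: str, url: str) -> str:
    """Hash the workflow run identifier followed by the full URL."""
    digest = hashlib.sha256()
    digest.update(workflow_run_id.encode("utf-8"))
    digest.update(url.encode("utf-8"))
    return digest.hexdigest()
```

Under this sketch, moving a file to another directory keeps its ID, while renaming it changes the ID.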
Nulls in Hashes
The NUL characters are a kind of insurance against malicious names. Say the hash was just name followed by version; then foobar + 1.0.0 becomes indistinguishable from foo + bar1.0.0. Although it is unlikely that anyone would construct such a name, the NULs are an easy way to prevent anyone from trying.
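The collision described above is easy to reproduce. The sketch below uses two hypothetical helpers, `naive_hash` and `separated_hash`, and hashes only the name and version (not the full workflow version recipe) to isolate the effect of the separator.

```python
import hashlib

NUL = b"\0"


def naive_hash(name: str, version: str) -> str:
    # Plain concatenation: the boundary between the fields is lost.
    return hashlib.sha256((name + version).encode("utf-8")).hexdigest()


def separated_hash(name: str, version: str) -> str:
    # A NUL after each field marks where one field ends and the next begins.
    data = name.encode("utf-8") + NUL + version.encode("utf-8") + NUL
    return hashlib.sha256(data).hexdigest()


# Without separators the two pairs hash identically.
print(naive_hash("foobar", "1.0.0") == naive_hash("foo", "bar1.0.0"))
# With NUL separators they do not.
print(separated_hash("foobar", "1.0.0") == separated_hash("foo", "bar1.0.0"))
```

This only works because strings are not permitted to contain the NUL byte themselves; otherwise a name could embed its own separator.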