A Shesmu script contains:
The version can determine what language features are available and provides a mechanism to change syntax in the future. Currently, only one version is supported.
Version 1;
The input declaration determines the input format that will be read by olives in the file. This is the only required entry in a file.
Input format;
Pragmas modify the behaviour of the entire script, mostly to do with when it will execute.
Constants and functions are automatically imported from plugins and other files but can also be defined locally.
Olives then process the input data. Define olives are reusable sets of olive clauses.
Olives, define olives, functions, and constants may be mixed in any order.
After the Input line, various script modifiers can be added.
Import qname;
Access any qualified name by its final segment (i.e., Import std::string::to_path; will make to_path the same as std::string::to_path).
Import qname As name;
Access a qualified name by a custom name (i.e., Import std::string::to_path As pathify; will make pathify the same as std::string::to_path).
Import qname::{name1 [As name1P][, name2 …]};
Apply several of the above access patterns at once. That is:
Import std::string::{length As strlen, to_path};
is equivalent to:
Import std::string::length As strlen;
Import std::string::to_path;
Import qname::*;
Access all children of a qualified name (i.e., Import std::string::*; will make to_path the same as std::string::to_path, length the same as std::string::length, and so on).
Timeout integer;
Stop the script after integer seconds. The usual integer suffixes, especially minutes and hours, can be added here. When a script runs over its time budget, the Prometheus variable shesmu_run_overtime will be set.
Frequency integer;
Run the script every integer seconds. The usual integer suffixes can be used. Seconds is the default, and others such as minutes and hours can also be used.
RequiredServices service1[, service2, …];
Ensures that the script will only run if none of the specified services are throttled. Multiple services are separated by commas.
Check format Into name = collector Require expr;
This prevents the script from running based on input data. It takes all the input data from the format format and sorts it into one giant group using collector to aggregate the records. Once done, it evaluates expr with name defined to that aggregate value. If expr is true, the script can run; otherwise, it will be blocked.
This can be useful to require temperamental data sources to have provided input:
Check unix_file Into c = Count Require c > 0;
Since tuple types can get unwieldy, a type alias can be created:
TypeAlias name type;
This will make name available in all the places where types are permitted.
Note that all the variables are already available as variable_type.
These are the olives, functions, and constants. Define olives, functions, constants, actions, refillers, and gangs are in separate namespaces, so it is possible to reuse the same name for any of these without an error.
[Export] name = expression;
Creates a new constant. The name cannot be used for any other constant. If Export is present, this constant will be available to other scripts as olive::script::name.
[Export] Function name(type1 arg1[, …]) expr;
Create a new function. The function must take at least one argument. The possible types are defined below. If Export is present, this function will be available to other scripts as olive::script::name.
[Export] Define name([type1 arg1[, …]]) clauses;
Create a new define olive. This is a section of an olive that can be reused among different olives in the file. It is intended for when olives share similar logic. The define olive cannot be used after reshaping has been done, so it must occur early in the olive. If reshaping is required, write the define olives in a nested way.
Parameters are optional.
Olive [Description "info"] [Tag tagname1 [Tag tagname2 …]] clauses terminal;
Create a new olive. What the olive does is determined by terminal. Olives have optional descriptions and tags, which will be displayed in the UI. For olives that produce actions, any tags will be added to the action and can be used for filtering.
Terminals determine what an olive will do.
Run action tags With param1 = expr1[, param2 = expr2[, …]];
Creates an action to be scheduled and run. action determines which action will be run and what parameters are available. Optional parameters can be conditionally assigned:
param = expr If condition
Tags can also be attached to the action. These tags, unlike the ones at the start of the olive, are dynamically generated. This makes it possible to create tags based on the data; for instance, to tag actions by project/customer. See Dynamic Tags.
Alert label1 = expr1[, label2 = expr2[, …]] [Annotations ann1 = aexpr1[, …]] For timeexpr;
Creates a Prometheus alert. According to Prometheus’s design, an alert is defined by its labels, all of which must be strings. For additional data that might change, use Annotations, which are also string-valued. An alert has a finite duration, after which it will expire unless refreshed. timeexpr defines the number of seconds an alert should fire for. Every time the olive is re-run, the alert will be refreshed.
This may be used in Reject and Require clauses.
Refill refiller With param1 = expr1[, param2 = expr2[, …]];
Replace the contents of a database with the output from the olive. Each record the olive emits is another “row” sent to the database. How the refiller interprets the data and its behaviour is defined by the refiller.
Tags can be attached to an action based on the data in the olive. They can be any string. Duplicate tags are removed.
Tag expr
Adds the result of expr, which must be a string, to the tags associated with this action.
Tags expr
Adds the elements in the result of expr, which must be a list of strings, to the tags associated with this action.
An olive can have many clauses that filter and reshape the data. All clauses can be preceded with Label "text" to have text appear in the dataflow diagram instead of the name of the clause.
Dump [Label name1] expr1[, [Label name2] expr2[, …]] To dumper
Dump All To dumper
Exports data to a dumper for debugging analysis. The expressions can be of any type. If All is used, all variables are dumped in alphabetical order. In some output formats (e.g., TSV), column order is preserved. Column names can be provided with the Label prefix. If no column name is provided, Shesmu will attempt to infer an “obvious” column name (the variable name if a variable or a simple transformation of a variable). If no column name is obvious and none is provided, Shesmu will create an arbitrary name.
This may be used in Reject and Require clauses. This does not reshape the data.
Flatten name In expr
Creates copies of a row with an additional variable name for each value in the list provided by expr. If there are no items in the list, the row is dropped.
This reshapes the data.
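A minimal Python model of the Flatten clause may help make the row-copying concrete. It is not Shesmu code. Rows are modelled as dictionaries, and `flatten_clause` is a hypothetical name used only for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def flatten_clause(rows: Iterable[Row], name: str,
                   expr: Callable[[Row], List[Any]]) -> List[Row]:
    """Model of `Flatten name In expr` (a sketch, not Shesmu's implementation)."""
    out: List[Row] = []
    for row in rows:
        # One copy of the row per list element; an empty list yields no
        # copies, so that row is dropped.
        for value in expr(row):
            copy = dict(row)
            copy[name] = value  # the additional variable
            out.append(copy)
    return out


rows = [{"run": "r1", "lanes": [1, 2]}, {"run": "r2", "lanes": []}]
flattened = flatten_clause(rows, "lane", lambda r: r["lanes"])
```

Here `r1` becomes two rows (one per lane) and `r2` disappears because its list is empty.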
Group By discriminator1[, …] [Where condition] Into collectionname1 = collector1[, …] [OnReject reject1[ reject2[ …]] Resume]
Group By discriminator1[, …] Using grouper param = expr1[, …] [With output[, …]] [Where condition] Into collectionname1 = collector1[, …] [OnReject reject1[ reject2[ …]] Resume]
Performs a grouping of the data. First, rows are collected in subgroups by their discriminators. If Using is provided, those subgroups are modified by the grouper. Finally, all items in a subgroup are passed through the collectors. The output will have all the discriminators and collectors as variables.
Discriminators come in multiple forms:
Syntax | Behaviour |
---|---|
name = expr | Compute the value from expr for each row and use it for grouping; assign it to name in the output. name can use destructuring. |
name = OnlyIf expr | Compute the value from expr, which must be an optional value, for each row and, if it contains a value, use it for grouping; assign it to name in the output. name can use destructuring. |
name = Univalued expr | Compute the value from expr, which must be a list, for each row and, if it contains a single value, use it for grouping; assign it to name in the output. name can use destructuring. |
name | Use an existing variable name for grouping and copy it to the output. |
@gang | Use all variables in a gang for grouping and copy them to the output. |
Custom groupers take parameters. Some parameters are per-row, which may use variables, and some are fixed, which must use constants or parameters to a define olive. Custom groupers may also define output variables. These are available in the collectors. They have default names; if those names are a problem, With can be used to rename them.
The collectors aggregate the values in a group. They are described in another section. Each collector can have Where filters that limit the collected data. Optionally, a Where filter can be applied to all the collectors by providing condition.
Rows which are rejected are passed to the rejection handlers. These are Monitor or Dump clauses or an Alert terminal. Rejection handlers can only access the discriminators.
This reshapes the data.
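The basic grouping steps (bucket rows by their discriminators, apply the shared Where condition, then run each collector) can be sketched in Python. This is an illustrative model, not Shesmu's implementation. `group_by` is a hypothetical name. Groupers, OnlyIf/Univalued discriminators, and rejection handlers are not modelled. A group whose rows are all filtered out is kept here with empty collector input.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Any]


def group_by(rows: Iterable[Row],
             discriminators: Dict[str, Callable[[Row], Any]],
             collectors: Dict[str, Callable[[List[Row]], Any]],
             where: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Sketch of `Group By d1 = e1, ... [Where condition] Into c1 = coll1, ...`."""
    names = list(discriminators)
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        # Subgroups are keyed by the tuple of discriminator values.
        key = tuple(discriminators[name](row) for name in names)
        groups[key].append(row)

    output: List[Row] = []
    for key, members in groups.items():
        # The shared Where condition filters what every collector sees.
        seen = [r for r in members if where is None or where(r)]
        # Output rows hold only the discriminators and the collector results.
        result: Row = dict(zip(names, key))
        for collector_name, collect in collectors.items():
            result[collector_name] = collect(seen)
        output.append(result)
    return output


rows = [
    {"project": "a", "size": 3},
    {"project": "a", "size": 5},
    {"project": "b", "size": 1},
]
summary = group_by(
    rows,
    discriminators={"project": lambda r: r["project"]},
    collectors={"count": len, "total": lambda rs: sum(r["size"] for r in rs)},
    where=lambda r: r["size"] > 1,
)
```

Note that the `size` variable does not survive into `summary`. Only the discriminator and collector names do, which is what "reshapes the data" means here.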
Join outerkey To input innerkey
IntersectionJoin outerkey To input innerkey
Join outerkey To Call name(args) innerkey
IntersectionJoin outerkey To Call name(args) innerkey
Does a join where incoming rows are joined against rows from the input data source or the output of the Define olive name. Names between the two data sources must not overlap. Rows are joined if outerkey and innerkey match:
Operation | Outer Key | Inner Key | Behaviour |
---|---|---|---|
Join | k | i | Matches if k = i. |
IntersectionJoin | k | i | Matches if For x In k: Any x In i |
IntersectionJoin | [k] | i | Matches if k In i |
IntersectionJoin | k | [i] | Matches if i In k |
In Join, keys are values that must match exactly. In IntersectionJoin, the keys are lists of values and the join occurs if any items are found in both the inner and outer key lists. Consider a situation where a process outputs several files and another process ingests a subset of them; an IntersectionJoin could be used to find the downstream processes that used some of those files.
If a set-to-single value join is required, use IntersectionJoin and put the single element in a list.
This reshapes the data.
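The matching rules in the table above can be expressed as Python predicates. This is a hypothetical model, and the function names are invented for illustration. The nested-loop `join` is the simplest correct strategy, not how Shesmu evaluates joins. The sketch also assumes key values are hashable.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def join_matches(outer_key: Any, inner_key: Any) -> bool:
    """`Join`: the keys must be equal."""
    return outer_key == inner_key


def intersection_join_matches(outer_keys: Iterable[Any],
                              inner_keys: Iterable[Any]) -> bool:
    """`IntersectionJoin`: match if any value appears in both key lists."""
    inner = set(inner_keys)
    return any(k in inner for k in outer_keys)


def join(outer_rows: List[Row], inner_rows: List[Row],
         outer_key: Callable[[Row], Any], inner_key: Callable[[Row], Any],
         intersection: bool = False) -> List[Row]:
    """Naive nested-loop join producing one merged row per matching pair."""
    matches = intersection_join_matches if intersection else join_matches
    out: List[Row] = []
    for outer in outer_rows:
        for inner in inner_rows:
            if matches(outer_key(outer), inner_key(inner)):
                # Variable names may not overlap, so merging loses nothing.
                out.append({**outer, **inner})
    return out


produced = [{"job": "p1", "outputs": ["f1", "f2"]}]
consumed = [{"step": "c1", "inputs": ["f2", "f9"]},
            {"step": "c2", "inputs": ["f7"]}]
linked = join(produced, consumed, lambda r: r["outputs"],
              lambda r: r["inputs"], intersection=True)
```

Only `c1` is linked to `p1`, because `f2` is the one file that appears in both key lists.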
LeftJoin outerkey To [Prefix prefix] input innerkey [Where condition] collectionname1 = collector1[, …]
LeftIntersectionJoin outerkey To [Prefix prefix] input innerkey [Where condition] collectionname1 = collector1[, …]
LeftJoin outerkey To [Prefix prefix] Call name(args) innerkey [Where condition] collectionname1 = collector1[, …]
LeftIntersectionJoin outerkey To [Prefix prefix] Call name(args) innerkey [Where condition] collectionname1 = collector1[, …]
Does a left-join operation between the current data and the data from the input data format or the output of the Define olive name. This is done using a merge join where keys are computed for both datasets and then only matching entries are processed. outerkey is the key on the incoming data and innerkey is the key on the data being joined against. A tuple can be used if joining on multiple keys is required. Each row in the outer data is treated as a kind of group, and the matching inner rows are processed through the collectors. This means that outer data is used only once, but inner data may be reused multiple times if multiple outer rows have the same key. Each collector can have Where filters that limit the collected data. Optionally, a Where filter can be applied to all the collectors by providing condition.
When doing a left join, there will likely be collisions between many variables, including all the signatures. While it is possible to reshape the data to avoid this conflict, the Prefix option allows renaming the joined data rather than the source data.
Operation | Outer Key | Inner Key | Behaviour |
---|---|---|---|
LeftJoin | k | i | Matches if k = i. |
LeftIntersectionJoin | k | i | Matches if For x In k: Any x In i |
LeftIntersectionJoin | [k] | i | Matches if k In i |
LeftIntersectionJoin | k | [i] | Matches if i In k |
In LeftJoin, keys are values and must match exactly. In LeftIntersectionJoin, the keys are lists of values and the join occurs if any items are found in both the inner and outer key lists. Consider a situation where a process outputs several files and another process ingests a subset of them; a LeftIntersectionJoin could be used to find the downstream processes that used some of those files.
If a set-to-single value join is required, use LeftIntersectionJoin and put the single element in a list.
This reshapes the data.
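A Python sketch of the left-join semantics follows. Each outer row is kept and acts as a group, and the matching inner rows are fed to the collectors. It is a model under stated assumptions, not Shesmu code. `left_join` is a hypothetical name, only the exact-key LeftJoin form is shown, and Prefix is not modelled.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def left_join(outer_rows: List[Row], inner_rows: List[Row],
              outer_key: Callable[[Row], Any], inner_key: Callable[[Row], Any],
              collectors: Dict[str, Callable[[List[Row]], Any]],
              where: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Sketch of `LeftJoin outerkey To input innerkey [Where condition] c = collector`."""
    # Index the inner rows by key once so each outer row can look up its matches.
    by_key: Dict[Any, List[Row]] = {}
    for inner in inner_rows:
        by_key.setdefault(inner_key(inner), []).append(inner)

    output: List[Row] = []
    for outer in outer_rows:
        # Each outer row acts as a group; an inner row can serve many outer rows.
        matched = [r for r in by_key.get(outer_key(outer), [])
                   if where is None or where(r)]
        result = dict(outer)  # outer variables are kept unchanged
        for name, collect in collectors.items():
            result[name] = collect(matched)
        output.append(result)
    return output


runs = [{"run": "r1"}, {"run": "r2"}]
files = [{"file_run": "r1", "size": 10}, {"file_run": "r1", "size": 5}]
joined = left_join(runs, files, lambda o: o["run"], lambda i: i["file_run"],
                   {"file_count": len,
                    "total_size": lambda rs: sum(r["size"] for r in rs)})
```

Unlike Join, `r2` survives even though no file matches it. Its collectors simply see no rows.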
Let assignment1[, assignment2[, …]]
Reshapes the data by creating new variables from existing expressions. There are several assignments available:
Syntax | Behaviour |
---|---|
name = expr | Compute the value from expr and assign it to name. |
name | Copy an existing variable name without modification. |
@gang | Copy all variables in a gang without modification. |
name = OnlyIf expr | Compute an optional value from expr; if it contains a value, assign it to name; if empty, discard the row. |
name = Univalued expr | Compute a list from expr; if it contains exactly one value, assign it to name; otherwise, discard the row. |
Prefix name, … With prefix | Copy all the variables, renaming them by adding the supplied prefix. This is useful for self-joins. |
This reshapes the data.
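The assignment forms that can discard a row can be modelled in Python as follows. The `let` function and its `(kind, name, expr)` triples are hypothetical. Python's `None` stands in for an empty optional, so this sketch cannot represent an optional that holds a null.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
# (kind, name, expr) where kind is "expr", "only_if", or "univalued".
Assignment = Tuple[str, str, Callable[[Row], Any]]


def let(row: Row, assignments: List[Assignment]) -> Optional[Row]:
    """Sketch of `Let ...`: returns the new row, or None if the row is discarded."""
    out: Row = {}
    for kind, name, expr in assignments:
        value = expr(row)
        if kind == "only_if":
            # None stands in for an empty optional: discard the row.
            if value is None:
                return None
        elif kind == "univalued":
            distinct = set(value)
            if len(distinct) != 1:
                return None  # zero or several distinct values: discard
            value = distinct.pop()
        out[name] = value  # only the listed names survive the reshape
    return out


row = {"project": "a", "ref": "hg38", "libs": ["L1"]}
reshaped = let(row, [("expr", "project", lambda r: r["project"]),
                     ("univalued", "lib", lambda r: r["libs"])])
```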
Monitor metric "help" { name = expr1[, …] }
Exports the number of rows as a Prometheus variable. metric must be unique and shesmu_user_metric will be the Prometheus variable name. This variable will have help associated with it as the help text, and the names define the keys used. The expressions must return strings.
This may be used in Reject and Require clauses. This does not reshape the data.
Pick Max expr By expr1[, expr2[, …]]
Pick Min expr By expr1[, expr2[, …]]
Performs a grouping by expr1, expr2, … and then allows a single row with the largest or smallest value of expr to pass and discards the rest. If there is a tie, an arbitrary row with the largest or smallest value of expr will be passed on.
This does not reshape the data.
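A Python model of Pick follows, assuming rows are dictionaries. `pick` is a hypothetical name. Where Shesmu picks an arbitrary row on a tie, this sketch happens to keep the first one it saw.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def pick(rows: List[Row], value: Callable[[Row], Any],
         by: List[Callable[[Row], Any]], largest: bool = True) -> List[Row]:
    """Sketch of `Pick Max/Min expr By expr1, ...`; rows are passed through unchanged."""
    best: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(f(row) for f in by)
        current = best.get(key)
        if current is None:
            best[key] = row
        elif largest and value(row) > value(current):
            best[key] = row
        elif not largest and value(row) < value(current):
            best[key] = row
        # On a tie the first row seen is kept: one possible arbitrary choice.
    return list(best.values())


rows = [{"sample": "s1", "version": 1}, {"sample": "s1", "version": 3},
        {"sample": "s2", "version": 2}]
latest = pick(rows, lambda r: r["version"], [lambda r: r["sample"]])
```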
Reject cond [OnReject reject1[ reject2[ …]] Resume]
Filter rows from the input. If cond is false, the row will be kept; if true, it will be discarded. This is the opposite of Where. Rows which are rejected are passed to the rejection handlers. These are Monitor or Dump clauses or an Alert terminal.
This does not reshape the data.
Require name = expr [OnReject reject1[ reject2[ …]] Resume]
Evaluate expr, which must return an optional. If the result is empty, the row will be discarded; if the optional has a value, this value will be assigned to name. The name can use destructuring. Discarded rows are given to the rejection handlers, which are Monitor or Dump clauses or an Alert terminal.
This reshapes the data.
Where expr
Filter rows from the input. If expr is true, the row will be kept; if false, it will be discarded. This is the opposite of Reject.
This does not reshape the data.
([expr1[, expr2[, …]]])
Call a define olive. If the olive takes any parameters, they must be provided and cannot use any data from the input format. The define olive cannot be called after reshaping the data.
Since the define olive may itself reshape the data, this clause is considered to reshape the data only if the define olive does.
In a grouping operation, a collector will see all the data and aggregate it into a resulting property.
Any expr
Check if expr returns true for at least one row. If none are collected, the result is false.
All expr
Check if expr is true for all rows. If none are collected, the result is true.
Count
Count the number of matched rows.
Flatten expr
Collect all values into a list from existing lists (duplicates are removed).
Dict keyexpr = valueexpr
Collects the results into a dictionary. Duplicate keys are resolved arbitrarily.
LexicalConcat expr With delimiter
Concatenate all values, which must be strings, into a single string separated by delimiter, which must also be a string.
List expr
Collect all values into a list (duplicates are removed).
Max expr
Collect the largest value; if none are collected, the group is rejected.
Min expr
Collect the smallest value; if none are collected, the group is rejected.
None expr
Check if expr is false for all rows. If none are collected, the result is true.
PartitionCount expr
Collect a counter of the number of times expr was true and the number of times it was false. The resulting value will be an object with two fields: matched_count, with the number of rows that satisfied the condition, and not_matched_count, with the number that failed the provided condition.
Sum expr
Compute the sum of the resulting value from expr, which must be an integer or floating-point number.
Tuple expr Require count
Take the inputs and put them into a tuple. The values must be ordered. Tuples have a defined number of elements, so there must be exactly the right number of items available. If there are exactly count items, a tuple of this length will be produced with all of the items in the input order. If there are too few or too many items, an empty optional will be returned instead.
Depending on the situation, Skip and Limit can be used to trim the input appropriately.
Univalued expr
Collect exactly one value; if none are collected, or more than one distinct value is collected, the group is rejected. It is fine if the same value is collected multiple times.
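Several of the collectors above have simple Python analogues, sketched here as an illustration rather than as Shesmu's implementation. The function names and the `REJECT` marker are hypothetical, and `None` stands in for the empty optional returned by Tuple.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

REJECT = object()  # hypothetical marker meaning "the group is rejected"


def all_collector(values: Iterable[bool]) -> bool:
    return all(values)  # True when nothing is collected


def none_collector(values: Iterable[bool]) -> bool:
    return not any(values)  # True when nothing is collected


def partition_count(values: Iterable[bool]) -> Dict[str, int]:
    flags = list(values)
    matched = sum(1 for flag in flags if flag)
    return {"matched_count": matched, "not_matched_count": len(flags) - matched}


def tuple_collector(values: Iterable[Any], count: int) -> Optional[Tuple[Any, ...]]:
    items = list(values)
    # None stands in for the empty optional when the length is wrong.
    return tuple(items) if len(items) == count else None


def univalued(values: Iterable[Any]) -> Any:
    distinct = set(values)
    # Repeats of one value are fine; zero or several distinct values reject.
    return distinct.pop() if len(distinct) == 1 else REJECT
```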
Where expr collector
Performs filtering before collector.
{ name1 = collector1, name2 = collector2, … }
Performs multiple collections at once and converts the results into an object. This can be very useful to share a Where condition while collecting multiple pieces of information.
`collector`
This allows using optional values in other collectors. For instance, suppose there is an optional number and the minimum is desired; one could write: `Min x?`.
There are special behaviours for how to handle records with missing data:
For example, in Where x != ` ` Min x Default 1234, since x will never have the empty optional, the default will never be used.
OnlyIf All requires that no empty input makes it to the collector; if any empty optional input is found, the output from the collector is replaced by the empty optional. So, if v = OnlyIf All `Min x?` was given `3` and ` `, it would produce v == ` `.
OnlyIf Any requires that at least one non-empty input makes it to the collector; if no non-empty input is found, the output from the collector is replaced by the empty optional. So, if v = OnlyIf Any `Min x?` was given `3` and ` `, it would produce v == `3`, but ` ` and ` ` would produce v == ` `.
Require All requires that no empty input makes it to the collector; if any empty optional input is found, the group will be rejected. So, if v = Require All `Min x?` was given `3` and ` `, the group would not be present in the output.
Require Any requires that at least one non-empty input makes it to the collector; if no non-empty input is found, the group will be rejected. So, if v = Require Any `Min x?` was given `3` and ` `, it would produce v == 3, but if the input were ` ` and ` `, the group would not be present in the output.
Shesmu has the following expressions, from lowest precedence to highest precedence.
Switch refexpr (When testexpr Then valueexpr)* Else altexpr
Compares refexpr to every testexpr for equality and returns the matching valueexpr. If none match, returns altexpr. The altexpr and every valueexpr must have the same type. The refexpr and every testexpr must have the same type, but not necessarily the same type as the altexpr and valueexpr.
If testexpr Then trueexpr Else falseexpr
Evaluates testexpr and, if true, returns trueexpr; if false, returns falseexpr. testexpr must be boolean, and both trueexpr and falseexpr must have the same type.
IfDefined tests Then trueexpr Else falseexpr
Performs a conditional compilation. The tests are a comma-separated list of constant names or function names prefixed with Function. If all of these items are defined, trueexpr is used; otherwise, falseexpr is used. It’s important to note that, unlike If, this is a compile-time decision. Therefore, trueexpr and falseexpr don’t have to return the same type, and the unused path can depend on constants and functions that are not defined.
This expression is intended for use with the simulator’s constants. This allows embedding logic like:
Where IfDefined shesmu::simulator::run
Then shesmu::simulator::run == run_name
Else True
to allow the simulator to be used as a diagnostic tool.
For var In expr: modifications… collector
Takes the elements in a dictionary, list, JSON blob, or optional, processes them using the supplied modifications, and then computes a result using the collector. The modifications and collectors are described below.
expr Type | var Type | Operation |
---|---|---|
[t] | t | Processes each item in the list. |
`t` | t | If the optional contains a value, process it; otherwise, act as if the empty list has been provided. |
k -> v | {k, v} | Process each pair of items in a dictionary. |
json | json | If the value is a JSON array, use the elements; if a JSON object, use the values. Otherwise, act as if the empty list has been provided. |
Any scalar JSON value is treated as an empty collection.
For var Fields expr: modifications… collector
Takes the properties in a JSON object, processes them using the supplied modifications, and then computes a result using the collector. The modifications and collectors are described below.
Any scalar or array JSON value is treated as an empty collection. var will be a tuple of the property name and value.
For var From startexpr To endexpr: modifications… collector
Iterates over the range of numbers from startexpr, inclusive, to endexpr, exclusive, processes them using the supplied modifications, and then computes a result using the collector. The modifications and collectors are described below.
For var Splitting expr By /regex/flags: modifications… collector
Takes the string expr, splits it into chunks delimited by regex, processes them using the supplied modifications, and then computes a result using the collector. The modifications and collectors are described below.
flags sets the behaviour of the regular expression. For details, see regular expression flags.
For var Zipping expr With expr: modifications… collector
This takes two expressions, which must be lists containing tuples. The first elements of each tuple, which must be the same type, are matched up, and then the tuples are joined and iterated over. The modifications and collectors are described below.
Since entries might not match, the non-index elements in the tuple are converted to optionals.
So, For {index, left, right} Zipping [ {"a", 1}, {"b", 2} ] With [ {"a", True} ]: will produce:
index | left | right |
---|---|---|
"a" | `1` | `True` |
"b" | `2` | ` ` |
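The Zipping example can be reproduced with a small Python model for two-element tuples. It assumes each list uses an index at most once and uses `None` for an empty optional. It returns indices in left-list order, which is a choice of this sketch; Shesmu's own ordering is not specified here. `zipping` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def zipping(left: Sequence[Tuple[Any, Any]],
            right: Sequence[Tuple[Any, Any]]) -> List[Tuple[Any, Optional[Any], Optional[Any]]]:
    """Sketch of `For {index, left, right} Zipping left With right` for pairs."""
    left_map: Dict[Any, Any] = dict(left)   # assumes unique indices per list
    right_map: Dict[Any, Any] = dict(right)
    # Visit every index that appears in either list, left-list order first.
    indices = list(left_map) + [k for k in right_map if k not in left_map]
    # dict.get returns None (the stand-in for an empty optional) when unmatched.
    return [(i, left_map.get(i), right_map.get(i)) for i in indices]


rows = zipping([("a", 1), ("b", 2)], [("a", True)])
```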
Match refexpr (When algmatch Then valueexpr)* (Else altexpr | Remainder (name) altexpr)?
Allows separating the algebraic or optional value returned by refexpr and accessing its contents. An optional value is treated as an algebraic value with NONE and SOME{x} entries. A When branch can be provided for every possible algebraic type returned by refexpr. If all possible types are matched, the matching is exhaustive. If the matching is not exhaustive, the remaining cases can be handled via Else or Remainder. Else allows an expression to be used in all other cases, much like the Else in a Switch. Remainder provides access to the case being handled.
For example:
Function analysis_for_project(string project)
Switch project
When "a" Then CANCER {"hg38"}
When "b" Then CANCER {"hg19"}
When "c" Then VIRAL {"hpv", "hg19"}
When "d" Then VIRAL {"hpv", "hg19"}
Else SEQUENCING_ONLY;
Function reference_for_analysis(CANCER{string} | VIRAL{string, string} analysis)
# Match is exhaustive, so no Else/Remainder
Match analysis
When CANCER{genome} Then genome
When VIRAL{_, genome} Then genome;
...
# Determine if this olive should run on this data; use Else to cover other cases
Where Match analysis_for_project(project)
When CANCER {_} Then True
Else False
...
...
Let
project, sample,
reference = OnlyIf
# We remove the SEQUENCING_ONLY case and pass the other values to reference_for_analysis
Match analysis_for_project(project)
When SEQUENCING_ONLY Then ``
Remainder (a) `reference_for_analysis(a)`
...
Two special pieces of syntax are allowed in When:
When NAME _ will match the name and discard the value’s contents.
When NAME * will match the name and turn all fields in an object algebraic value into variables.
For details on algebraic values, see Algebraic Values without Algebra.
Order expr
Rearrange the values in a tuple in ascending order. The types of the values must be homogeneous and orderable. For example, Order {"b", "c", "a"} would result in {"a", "b", "c"}, while Order {"a", 1} is an error.
This can be useful to ensure ranges provided by user data are in order:
Begin
{min, max} = Order {min_from_user, max_from_user};
Return max - min;
End
As type
Convert a value to or from JSON. If type is json, then the result from expr will be converted to JSON in the Shesmu-standard way. If type is any other type, then expr must be a json value and it will be converted from JSON to the matching Shesmu type. Since the conversion from JSON to Shesmu cannot be guaranteed, it will return an optional of type. To create a JSON null value, use ` ` As json.
Begin
name0 = expr0;
[name1 = expr1;]
[…]
Return expr;
End
Creates a local variable name0 by evaluating expr0. This variable is then accessible in expr1, and so on. Finally, expr is evaluated with all the defined names and its result is used. The names can use destructuring.
Tabulate
name0 = (expr00, expr01, …);
[name1 = (expr10, expr11, …);]
[…]
End
Performs an order-sensitive matched assignment. This language construct is useful for generating data for the Vidarr retry types. Suppose a Vidarr job has two fields, memory and time, and the job should be run with an increasing amount of memory and time if it fails. This could be accomplished as follows:
Run vidarr::production::hpc::some_job With
arguments =
Begin
{; memory, timeout} = Tabulate
memory = 5Gi, 10Gi;
timeout = 1hours, 6hours;
End;
Return {
foo__memory = memory,
foo__files = files,
foo__timeout = timeout,
foo__modules = "foo/1.2.3"
};
End,
...
This construct checks that all the names have the same number of values (two in this case) and turns each of them into an ordered dictionary with the keys as strings containing numbers (the format expected by Vidarr).
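A Python sketch of what Tabulate produces follows. It assumes the numbered keys start at "0"; the exact numbering expected by Vidarr is an assumption here. `tabulate` is a hypothetical name, and the string values are placeholders for the Shesmu literals in the example above.

```python
from typing import Any, Dict, List


def tabulate(columns: Dict[str, List[Any]]) -> Dict[str, Dict[str, Any]]:
    """Sketch of `Tabulate`: turn each value list into a numbered dictionary."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("every name must have the same number of values")
    # Keys are numbers rendered as strings; counting from "0" is an assumption.
    # Python dicts keep insertion order, so each result is an ordered dictionary.
    return {name: {str(i): value for i, value in enumerate(values)}
            for name, values in columns.items()}


retry = tabulate({"memory": ["5Gi", "10Gi"], "timeout": ["1hours", "6hours"]})
```

Position i of every name lines up with position i of every other name, which is what makes the assignment "matched".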
The names can use destructuring:
Run vidarr::production::hpc::some_job With
arguments =
Begin
{; foo_memory, foo_timeout; bar_timeout} = Tabulate
{foo_memory, foo_timeout} = {5Gi, 1hours}, {10Gi, 6hours};
bar_timeout = (30mins, 90mins);
End;
Return {
foo__memory = foo_memory,
foo__files = files,
foo__timeout = foo_timeout,
foo__modules = "foo/1.2.3",
foo__bar__timeout = bar_timeout
};
End,
...
Default default
Computes an optional value using expr; if this value is empty, returns default. The type of expr must be the optional version of the type of default.
For details on optional values, see the Mandatory Guide to Optional Values.
|| expr
Logical short-circuiting or. If the operands are boolean, the result is boolean.
If both are optionals of the matching type, the first optional is returned if it has a value; otherwise, the second is returned.
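The optional forms of Default and || can be modelled with the following Python sketch, with `None` standing in for an empty optional. The helper names are hypothetical. A present-but-falsy value such as 0 still counts as a value.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def or_optional(first: Optional[T], second: Optional[T]) -> Optional[T]:
    """`first || second` on optionals: keep the first if it has a value."""
    return first if first is not None else second


def default(value: Optional[T], fallback: T) -> T:
    """`value Default fallback`: unwrap the optional or use the fallback."""
    return value if value is not None else fallback
```

Testing `is not None` rather than truthiness is the important design choice. Python's own `or` would wrongly treat 0 or an empty string as missing.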
&& expr
Logical short-circuiting and. Both operands must be boolean and the result is boolean.
== expr
Compare two values for equality. This is supported for all types. For tuples, the values in the tuples must be the same. For lists, the items must be the same, but the order is not considered.
!= expr
Compare two values for inequality. This is the logical complement of ==.
< expr
<= expr
>= expr
> expr
Compare two values for order. This is only defined for integers and dates. For dates, the lesser value occurs temporally earlier.
~ /re/flags
Check whether expr, which must be a string, matches the provided regular expression.
flags sets the behaviour of the regular expression. For details, see regular expression flags.
!~ /re/flags
Check whether expr, which must be a string, does not match the provided regular expression.
flags sets the behaviour of the regular expression. For details, see regular expression flags.
=~ /re/flags
Matches expr, which must be a string, against the provided regular expression and returns a tuple of the values of each capture group. Since individual capture groups may be missing, this returns an optional tuple of optional strings. If the outer optional is the missing value, then the regular expression failed to match. If any element of the tuple is the missing value, then that capture group did not match.
flags sets the behaviour of the regular expression. For details, see regular expression flags.
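Python's `re` module can model the shape of the =~ result. No match gives the outer empty optional, and each capture group that did not participate gives an inner empty optional. This sketch assumes the whole string must match (`re.fullmatch`) and that Python's regex dialect is close enough for the pattern used. Shesmu's regular-expression dialect and flags may differ. `regex_capture` is hypothetical.

```python
import re
from typing import Optional, Tuple


def regex_capture(text: str, pattern: str) -> Optional[Tuple[Optional[str], ...]]:
    """Sketch of `text =~ /pattern/`: None if no match, else the capture groups."""
    match = re.fullmatch(pattern, text)  # whole-string matching is an assumption
    if match is None:
        return None  # outer empty optional: the expression did not match
    # groups() yields None for each capture group that did not participate.
    return match.groups()


parts = regex_capture("run_42", r"run_(\d+)(_x)?")
```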
+ expr
Adds two values.
Left | Right | Result | Description |
---|---|---|---|
integer | integer | integer | Summation |
float | integer | float | Summation |
integer | float | float | Summation |
float | float | float | Summation |
date | integer | date | Add seconds to date |
path | path | path | Resolve paths (concatenate, unless second path starts with /) |
path | string | path | Append component to path |
string | string | string | Concatenate two strings |
string | integer | string | Concatenate a string and an integer value by first converting it |
string | float | string | Concatenate a string and a floating-point value by first converting it |
string | date | string | Concatenate a string and a date value by first converting it |
string | path | string | Concatenate a string and a path value by first converting it |
[x] | [x] | [x] | Union of two lists (removing duplicates) |
[x] | x | [x] | Add item to list (removing duplicates) |
{a1, …, an} | {b1, …, bn} | {a1, …, an, b1, …, bn} | Concatenate two tuples |
{fa1 = a1, …, fan = an} | {fb1 = b1, …, fbn = bn} | {fa1 = a1, …, fan = an, fb1 = b1, …, fbn = bn} | Merge two objects (with no duplicate fields) |
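A few rows of the + table can be modelled in Python to make the less obvious cases concrete. The helper names are hypothetical. `PurePosixPath` happens to follow the same rule as the path row, where an absolute right-hand path replaces the left one. Shesmu lists, which have no duplicates, are modelled as Python lists kept free of repeats.

```python
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath
from typing import Any, Dict, List, Tuple


def add_lists(left: List[Any], right: List[Any]) -> List[Any]:
    """[x] + [x]: union without duplicates (order here is first-seen)."""
    result = list(left)
    for item in right:
        if item not in result:
            result.append(item)
    return result


def add_paths(left: str, right: str) -> str:
    """path + path: an absolute right-hand path replaces the left one."""
    return str(PurePosixPath(left) / right)


def add_tuples(left: Tuple[Any, ...], right: Tuple[Any, ...]) -> Tuple[Any, ...]:
    return left + right  # plain concatenation


def add_objects(left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, Any]:
    overlap = left.keys() & right.keys()
    if overlap:
        raise ValueError(f"duplicate fields: {sorted(overlap)}")
    return {**left, **right}


def add_seconds(when: datetime, seconds: int) -> datetime:
    return when + timedelta(seconds=seconds)


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
```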
- expr
Subtracts two values.
Left | Right | Result | Description |
---|---|---|---|
integer | integer | integer | Difference |
float | integer | float | Difference |
integer | float | float | Difference |
float | float | float | Difference |
date | integer | date | Subtract seconds from date |
date | date | integer | Difference in seconds |
[x] | [x] | [x] | Difference of two lists (first list without items from second) |
[x] | x | [x] | Remove item from list (if present) |
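Similarly, the list and date rows of the - table can be modelled in Python. The helpers are hypothetical. Truncating fractional seconds in `date_difference` is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, List


def subtract_lists(left: List[Any], right: List[Any]) -> List[Any]:
    """[x] - [x]: items of the first list that are not in the second."""
    return [item for item in left if item not in right]


def subtract_item(items: List[Any], item: Any) -> List[Any]:
    """[x] - x: remove the item if present; otherwise the list is unchanged."""
    return [existing for existing in items if existing != item]


def subtract_seconds(when: datetime, seconds: int) -> datetime:
    return when - timedelta(seconds=seconds)


def date_difference(later: datetime, earlier: datetime) -> int:
    """date - date: whole seconds (truncating any fraction is an assumption)."""
    return int((later - earlier).total_seconds())


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
```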
* expr
Multiplies two values.
Left | Right | Result | Description |
---|---|---|---|
integer | integer | integer | Multiplication |
float | integer | float | Multiplication |
integer | float | float | Multiplication |
float | float | float | Multiplication |
string | integer | string | Repeats string a specified number of times |
/ expr
Divides two values.
Left | Right | Result | Description |
---|---|---|---|
integer | integer | integer | Division |
float | integer | float | Division |
integer | float | float | Division |
float | float | float | Division |
% expr
Computes the remainder of dividing the first value by the second, both of which must be integers.
In haystack
Determines if the expression needle is present in the haystack and returns the result as a boolean. needle may be any type, but haystack must be either a list of the same type or a dictionary with keys of the same type.
?
This must be used inside optional creation. Evaluates expr, which must have an optional type, and provides the inner (non-optional) value inside. If the expression has an empty optional, the entire optional creation will be the empty optional.
These may be nested for function calls on optional values. For example:
x = `foo(x?)? + 3`
For details on optional values, see the Mandatory Guide to Optional Values.
! expr
Compute the logical complement of the expression, which must be a boolean.
- expr
Computes the arithmetic additive inverse of expr, which must be an integer.
Count expr
Counts the number of elements in expr, which must be a list.
ConvertWdlPair expr
WDL has a pair type, Pair[X, Y], which can be represented in Shesmu in two ways: as a tuple, {X, Y}; or as an object, {left = X, right = Y}. The tuple form better matches how pairs are written in WDL, while the object form better matches how pairs are encoded as JSON. This function converts between the two representations.
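The conversion is symmetric, which a small Python model makes easy to see. This assumes Shesmu tuples map to Python tuples and objects to dicts; `convert_wdl_pair` is a hypothetical name.

```python
def convert_wdl_pair(value):
    """Sketch of ConvertWdlPair: {X, Y} <-> {left = X, right = Y}."""
    if isinstance(value, tuple):
        if len(value) != 2:
            raise ValueError("a WDL pair tuple has exactly two elements")
        # Tuple form (WDL-like) -> object form (JSON-like)
        return {"left": value[0], "right": value[1]}
    if isinstance(value, dict):
        if set(value) != {"left", "right"}:
            raise ValueError("a WDL pair object has exactly the fields left and right")
        # Object form -> tuple form
        return (value["left"], value["right"])
    raise TypeError("expected a 2-tuple or a {left, right} object")
```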
`expr`
Puts the value of expr in an optional. In expr, the ? suffix may be used to apply changes to the entire optional.
For example:
Begin
x = `3`;
Return `x? * 2`;
End
In this example, x? gets the value inside the variable x, which may be missing. If it is missing, the block returns an empty optional; otherwise it returns an optional containing the original value multiplied by 2.
``
Creates an optional that contains no value.
For details on optional values, see the Mandatory Guide to Optional Values.
expr[n]
Extracts an element from a tuple (or integer-indexed map). n is an integer that specifies the zero-based index of the item in the tuple. The result type is the type of that position in the tuple. If n is beyond the number of items in the tuple, an error occurs.
The expr can also be an optional of a tuple. If it is, the result will be an optional of the appropriate type.
expr[indexexpr]
Extracts the value from a dictionary. The resulting value will always be optional in case the key specified by indexexpr is missing.
The expr can also be an optional of a dictionary.
expr.field
Extracts a field from a named tuple or JSON object. field is the name of the field. The result type is the type of that field in the named tuple, or json when accessing a JSON blob. If field is not in the named tuple, an error occurs. If field is not in the JSON blob (or the access is applied to a JSON scalar or array), the result is a JSON null value.
The expr can also be an optional of a named tuple or JSON object. If it is, the result will be an optional of the appropriate type.
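Unlike named tuples, JSON field access never fails. A minimal Python model, assuming JSON null is represented as `None` and using a hypothetical helper `json_field`:

```python
import json
from typing import Any


def json_field(blob: Any, field: str) -> Any:
    """Model `expr.field` on a JSON value; JSON null is represented as None."""
    if isinstance(blob, dict):
        # JSON object: the field's value, or null when the field is absent
        return blob.get(field)
    # Scalars and arrays have no fields, so the result is JSON null
    return None


record = json.loads('{"name": "run1", "lanes": [1, 2]}')
```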
expr.{field1, field2, …}
Extracts multiple fields from a named tuple and constructs a tuple with the results. The type of each element is the type of the corresponding field in the named tuple. If any field is not in the named tuple, an error occurs.
The expr can also be an optional of a named tuple. If it is, the result will be an optional of the appropriate type.
ActionName
Gets the name of the action being executed as a string? (an optional string). For Refill and Alert olives, this is the empty optional.
NAME {expr, expr, …}
NAME {@name}
NAME {field = expr, field = expr, …}
Shesmu supports creating algebraic values. The name of an algebraic type is a combination of uppercase letters, digits, and underscores. It must start with an uppercase letter and be at least two characters long. Algebraic values come in three types: ones that contain no information (and work something like an enum in other languages), ones that are associated with a sequence of values, much like a tuple, and ones that contain named fields, much like a named tuple/object. It is also possible to use a gang to create a tuple-like algebraic value.
For details on algebraic values, see Algebraic Values without Algebra.
Date YYYY-mm-dd
Date YYYY-mm-ddTHH:MM:SSZ
Date YYYY-mm-ddTHH:MM:SS+zz
Date YYYY-mm-ddTHH:MM:SS-zz
EpochSecond s
EpochMilli m
Specifies a date and time. If the time is not specified, it is assumed to be midnight UTC. EpochSecond and EpochMilli give the date as a number of seconds or milliseconds since the UNIX epoch.
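The following Python sketch shows how these forms map to instants. It covers only the date-only, `Z`, and epoch forms (the `+zz`/`-zz` offset forms are not modelled), and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_literal(text: str) -> datetime:
    """Parse the `Date YYYY-mm-dd` and `Date YYYY-mm-ddTHH:MM:SSZ` forms."""
    for pattern in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"):
        try:
            # A date without a time is midnight UTC; the Z form is also UTC
            return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unsupported date literal: {text!r}")


def epoch_second(s: int) -> datetime:
    # `EpochSecond s`: seconds since 1970-01-01T00:00:00Z
    return EPOCH + timedelta(seconds=s)


def epoch_milli(m: int) -> datetime:
    # `EpochMilli m`: milliseconds since 1970-01-01T00:00:00Z
    return EPOCH + timedelta(milliseconds=m)
```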
{expr, expr, …}
Creates a new tuple with the elements as specified. The type of the tuple is determined based on the elements.
Instead of an expression to create a single element in a tuple, ...expr can be used to insert all the elements of a tuple inline into the new tuple.
{field = expr, field = expr, … [; (var | @gang) …]}
Creates a new named tuple with the fields as specified. The type of the named tuple is determined based on the elements.
Instead of field = expr, ...expr can be used; this copies all the fields in expr, which must be an object. If some fields are to be excluded, use the form: ...expr Without field1 field2 …
A field can also be created from a variable of the same name by placing the name after a ;. For example, { a = 1; b } is shorthand for { a = 1, b = b }. Named fields can be omitted if there are none (i.e., {; b, c} is shorthand for {b = b, c = c}). If a gang is used here, all the members of the gang are created as fields.
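The spread, Without, and `;` shorthand rules combine as in this Python model, where objects are dicts and the enclosing variables are passed in as `scope`. `build_object` is a hypothetical helper, and rejecting duplicate field names is an assumption of this sketch.

```python
from typing import Any, Dict, Iterable, Mapping, Optional


def build_object(
    fields: Mapping[str, Any],
    spread: Optional[Mapping[str, Any]] = None,
    without: Iterable[str] = (),
    shorthand: Iterable[str] = (),
    scope: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Model `{ ...spread Without w1 w2, f = v, ...; name1, name2 }`."""
    result: Dict[str, Any] = {}
    if spread is not None:
        excluded = set(without)
        # `...expr Without f1 f2`: copy every field except the excluded ones
        result.update({k: v for k, v in spread.items() if k not in excluded})

    def add(name: str, value: Any) -> None:
        # Assumption: a field name may appear only once in the new object
        if name in result:
            raise ValueError(f"duplicate field: {name}")
        result[name] = value

    for name, value in fields.items():
        add(name, value)
    for name in shorthand:
        # `; name` is shorthand for `name = name`
        add(name, (scope or {})[name])
    return result
```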
{@name}
Creates a new tuple with the elements as specified in the gang name.
[expr, expr, …]
Creates a new list from the specified elements. All the expressions must be of the same type.
Dict {keyexpr = valueexpr, …}
Creates a new dictionary from the specified elements. All keys must be the same type and all values must be the same type. If duplicate keys are present, one will be selected arbitrarily.
Instead of keyexpr = valueexpr, ...expr can be used; this copies all the entries in expr, which must be a dictionary. If some entries are to be excluded or transformed, use a For ... Dict to preprocess the dictionary.
'path'
Paths are UNIX-like paths that can be manipulated. A single quote can be included in a path as \'.
"parts"
Specifies a new string literal. A string may contain the following special items in addition to text:
\t: a tab character
\n: a new line character
\{: an open brace character
{expr}: a string interpolation; the expression must be a string, integer, or date
{expr:n}: a zero-padded integer string interpolation; the expression must be an integer and n is the number of digits to pad to
{@name}: interpolates a name from a gang; the variables in the gang must be strings or integers
(expr)
A subexpression.
An integer literal. Integers may be suffixed by one of the following multipliers:
Unit | Multiplier |
---|---|
G | 1000^3 |
Gi | 1024^3 |
M | 1000^2 |
Mi | 1024^2 |
k | 1000 |
ki | 1024 |
mins | 60 |
hours | 3600 |
days | 86400 |
weeks | 604800 |
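The suffix table amounts to a simple multiplication, as this Python sketch of a literal parser shows. `parse_integer_literal` is a hypothetical helper; negative values are left to the unary minus operator.

```python
import re

# Multipliers copied from the table above
SUFFIXES = {
    "G": 1000**3, "Gi": 1024**3, "M": 1000**2, "Mi": 1024**2,
    "k": 1000, "ki": 1024, "mins": 60, "hours": 3600,
    "days": 86400, "weeks": 604800,
}


def parse_integer_literal(text: str) -> int:
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if match is None:
        raise ValueError(f"not an integer literal: {text!r}")
    digits, suffix = match.groups()
    if suffix and suffix not in SUFFIXES:
        raise ValueError(f"unknown suffix: {suffix!r}")
    # No suffix means a multiplier of 1
    return int(digits) * SUFFIXES.get(suffix, 1)
```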
True
False
The boolean true and false values, respectively.
Location
This creates a string containing the script's source path, line, column, and hash. This is meant to help locate the originating olive in alerts and other output.
function(expr, expr, …)
Calls a function. Functions are provided by external services to Shesmu, and some are provided as tables of values.
The value of a variable. There are different kinds of variables in Shesmu, including those associated with Define olives and with Map, Reduce, and Filter. Only stream variables may be used as discriminators in Group clauses.
Distinct
Discards any duplicate items in the list.
Let x = expr
Replaces each item in the list with the value computed by expr. The values will be named x in the downstream operations.
Flatten (x In expr modifications…)
Flatten (x Fields expr modifications…)
Flatten (x From startexpr To endexpr modifications…)
Flatten (x Splitting expr By /regex/flags modifications…)
Performs nested iteration in the same way as For. The variable name available in the downstream operations is x. Additional list modifications can also be applied. The additional operations inside the parentheses can also see the outer variable.
Where expr
Eliminates any item in the list where expr evaluates to false.
Limit expr
Truncates the list after the number of items specified by expr, which must return an integer. The list must already be sorted.
Skip expr
Discards the number of items specified by expr, which must return an integer, from the beginning of the list. The list must already be sorted.
Sort expr
Sorts the items in a list based on an integer or date returned by expr.
Reverse
Reverses the items in a list. The list must already be sorted.
Subsample(subsampler, subsampler, subsampler, …)
Performs sampling on items in a list based on the given subsamplers (the order matters). The list must already be sorted.
For example, Subsample(Fixed 1, Squish 5) will first select the first item and then randomly select five more items from the rest of the list.
Fixed integer
Selects the first integer items in a sorted list.
Fixed integer While condition
Selects the first integer items in a sorted list while condition evaluates to true.
Squish integer
Randomly selects integer items from a sorted list.
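A rough Python model of chaining subsamplers is below. It assumes each subsampler only sees the items that earlier subsamplers have not already selected and that Squish keeps the original order; `fixed`, `fixed_while`, `squish`, and `subsample` are hypothetical names, not Shesmu internals.

```python
import random
from typing import Callable, List, Optional, Sequence

# A subsampler receives the items not yet taken and returns the indices it selects
Subsampler = Callable[[Sequence, random.Random], List[int]]


def fixed(n: int) -> Subsampler:
    # `Fixed n`: the first n items
    return lambda items, rng: list(range(min(n, len(items))))


def fixed_while(n: int, condition: Callable) -> Subsampler:
    # `Fixed n While condition`: the first n items, stopping at the first failure
    def select(items: Sequence, rng: random.Random) -> List[int]:
        chosen: List[int] = []
        for index, item in enumerate(items[:n]):
            if not condition(item):
                break
            chosen.append(index)
        return chosen
    return select


def squish(n: int) -> Subsampler:
    # `Squish n`: n randomly chosen items, kept in their original order
    def select(items: Sequence, rng: random.Random) -> List[int]:
        return sorted(rng.sample(range(len(items)), min(n, len(items))))
    return select


def subsample(items: Sequence, *subsamplers: Subsampler,
              rng: Optional[random.Random] = None) -> list:
    if rng is None:
        rng = random.Random()
    remaining = list(items)
    selected = []
    for subsampler in subsamplers:
        chosen = set(subsampler(remaining, rng))
        selected.extend(item for index, item in enumerate(remaining) if index in chosen)
        # Assumption: later subsamplers only see items that were not selected yet
        remaining = [item for index, item in enumerate(remaining) if index not in chosen]
    return selected
```

With `subsample(range(20), fixed(1), squish(5))`, the result always starts with item 0 and contains five more distinct items drawn from the remaining nineteen.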
Count
Returns the number of items in the list.
First expr
Returns the value of expr for the first item in the list, or an empty optional if no items are present.
Since this returns an optional, it may be useful to chain with Default.
For details on optional values, see the Mandatory Guide to Optional Values.
LexicalConcat expr With delimexpr
FixedConcat expr With delimexpr
Creates a string from expr, which must return a string, for each item in the list, separated by the value of delimexpr, which must also be a string. LexicalConcat sorts the strings lexicographically before joining. FixedConcat assumes the strings have already been sorted by a Sort operation before joining.
List expr
Evaluates expr for every item and collects all the unique values into a list.
Dict keyexpr = valueexpr
Evaluates keyexpr and valueexpr for every item and collects all the results into a dictionary. If the same key is produced more than once, the value kept is chosen arbitrarily.
Max sortexpr
Min sortexpr
Finds the maximum or minimum item in a list, based on sortexpr, which must be an integer or date. If the list is empty, an empty optional is returned.
Since this returns an optional, it may be useful to chain with Default.
For details on optional values, see the Mandatory Guide to Optional Values.
None expr
All expr
Any expr
Checks whether none, all, or any (some) of the items in the list meet the condition specified in expr, which must return a Boolean.
{name1 = modifications… collector, name2 = modifications… collector, …}
Produces an object. Each field in the object is made by sending the same items through individual collectors. Consider something like:
For x In xs: Where x > 5 { count = Count, sum = Sum x }
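The example above can be modelled in Python as a dictionary of collector functions that all receive the same filtered items. `collect_object` is a hypothetical helper, with `len` standing in for Count and `sum` for Sum.

```python
from typing import Any, Callable, Dict, Iterable, List

Collector = Callable[[List[Any]], Any]


def collect_object(items: Iterable[Any], collectors: Dict[str, Collector]) -> Dict[str, Any]:
    # Materialize once so that every collector sees exactly the same items
    materialized = list(items)
    return {name: collector(materialized) for name, collector in collectors.items()}


xs = [3, 6, 9, 2, 7]
# For x In xs: Where x > 5 { count = Count, sum = Sum x }
summary = collect_object((x for x in xs if x > 5), {"count": len, "sum": sum})
print(summary)  # {'count': 3, 'sum': 22}
```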
PartitionCount expr
Produces an object with two fields: matched_count is the number of items for which expr was true, and not_matched_count is the number of items for which expr was false.
Reduce(a = initialexpr) expr
Performs a reduction operation on all the items in the list. a is the accumulator, which is initially set to initialexpr and is returned at the end. For every item, expr is evaluated with a set to the previously returned value.
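PartitionCount and Reduce correspond to familiar Python idioms. In this sketch, `partition_count` and `reduce_collector` are hypothetical names; the accumulator behaviour of Reduce maps onto `functools.reduce` with an initial value.

```python
from functools import reduce
from typing import Any, Callable, Dict, Iterable, TypeVar

A = TypeVar("A")


def partition_count(items: Iterable[Any], predicate: Callable[[Any], bool]) -> Dict[str, int]:
    matched = not_matched = 0
    for item in items:
        if predicate(item):
            matched += 1
        else:
            not_matched += 1
    return {"matched_count": matched, "not_matched_count": not_matched}


def reduce_collector(items: Iterable[Any], initial: A, step: Callable[[A, Any], A]) -> A:
    # Reduce(a = initial) step: `a` starts at `initial` and each step's result
    # becomes the new value of `a`; the final `a` is returned
    return reduce(step, items, initial)
```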
Sum expr
Evaluates expr for every item and computes the sum of all the results. expr must return an integer or a floating-point number.
Table name = value, … With format
This collects items into a table and formats that table as a string. This can be useful for creating HTML, Markdown, or JIRA tables. The name, which must evaluate to a string, will be the name of the column, and value, which must also produce a string, will be the contents of that column for every item. The format determines how the text is laid out. It is an object with the following properties:
data_start: the leader for each row
data_separator: the text to place in between inner columns
data_end: the trailer for each row
header_start: the leader for the first row
header_separator: the text to place in between inner columns of the first row
header_end: the trailer for the first row
header_underline: optional text to add on the second line for each column
For a few common formats, this object would be defined as:
html = {
data_start = "<tr><td>",
data_separator = "</td><td>",
data_end = "</td></tr>",
header_start = "<tr><th>",
header_separator = "</th><th>",
header_end = "</th></tr>",
header_underline = ``
}
markdown = {
data_start = "|",
data_separator = "|",
data_end = "|",
header_start = "|",
header_separator = "|",
header_end = "|",
header_underline = `"|---"`
}
jira = {
data_start = "|",
data_separator = "|",
data_end = "|",
header_start = "||",
header_separator = "||",
header_end = "||",
header_underline = ``
}
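To see how the format properties fit together, here is a Python sketch of the layout. Two details are assumptions not stated above: rows are joined with newlines, and the underline line is the underline text repeated once per column followed by header_end (which yields a valid Markdown separator row). `format_table` is a hypothetical helper, and an empty optional is modelled as `None`.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Column = Tuple[str, Callable[[Any], str]]


def format_table(items: Sequence[Any], columns: List[Column], fmt: Dict[str, Any]) -> str:
    names = [name for name, _ in columns]
    lines = [fmt["header_start"] + fmt["header_separator"].join(names) + fmt["header_end"]]
    underline = fmt.get("header_underline")
    if underline is not None:
        # Assumption: underline repeated once per column, then closed with header_end
        lines.append(underline * len(names) + fmt["header_end"])
    for item in items:
        cells = [value(item) for _, value in columns]
        lines.append(fmt["data_start"] + fmt["data_separator"].join(cells) + fmt["data_end"])
    # Assumption: rows are separated by newlines
    return "\n".join(lines)


markdown = {
    "data_start": "|", "data_separator": "|", "data_end": "|",
    "header_start": "|", "header_separator": "|", "header_end": "|",
    "header_underline": "|---",
}
jira = {
    "data_start": "|", "data_separator": "|", "data_end": "|",
    "header_start": "||", "header_separator": "||", "header_end": "||",
    "header_underline": None,
}
rows = [{"name": "a", "size": "1"}, {"name": "b", "size": "2"}]
cols = [("Name", lambda r: r["name"]), ("Size", lambda r: r["size"])]
print(format_table(rows, cols, markdown))
```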
Univalued expr
Evaluates expr for each item in the list and returns the result if it is the same for every item.
If the results differ or there are no items, an empty optional is returned.
Since this returns an optional, it may be useful to chain with Default.
For details on optional values, see the Mandatory Guide to Optional Values.
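A short Python model of Univalued, with the empty optional modelled as `None` and a hypothetical helper name:

```python
from typing import Any, Iterable, Optional

_NOTHING = object()


def univalued(values: Iterable[Any]) -> Optional[Any]:
    """Return the single shared value, or None (empty optional) otherwise."""
    first = _NOTHING
    for value in values:
        if first is _NOTHING:
            first = value
        elif value != first:
            # Two different results: there is no single value to report
            return None
    # No items at all also gives the empty optional
    return None if first is _NOTHING else first
```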
There are a small number of types in the language, listed below. Each has syntax as it appears in the language and a descriptor that is used for machine-to-machine communication.
Name | Syntax | Descriptor |
---|---|---|
Integer | integer | i |
Float | float | f |
String | string | s |
Boolean | boolean | b |
Date | date | d |
List | [ inner] | a inner |
Empty List | [] | A |
Tuple | { t1, t2, …} | t n t1 t2 …, where n is the number of elements in the tuple |
Object | { field1 = t1, field2 = t2, …} | o n field1$ t1 field2$ t2 …, where n is the number of fields in the object |
Optional | inner? | q inner or Q |
Path | path | p |
JSON | json | j |
Algebraic | NAME | u1 NAME$t0 |
Algebraic | NAME { t1, t2, …} | u1 NAME$t n t1 t2 …, where n is the number of elements in the tuple |
Algebraic | NAME { field1 = t1, field2 = t2, …} | u1 NAME$o n field1$ t1 field2$ t2 …, where n is the number of fields in the object |
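The descriptor rules compose recursively. This Python sketch encodes a hypothetical type tree (scalar names as strings, composites as `(kind, body)` pairs) into descriptor strings; algebraic types are left out, and writing object fields in sorted name order is an assumption of this sketch.

```python
from typing import Any

SCALARS = {"integer": "i", "float": "f", "string": "s", "boolean": "b",
           "date": "d", "path": "p", "json": "j"}


def descriptor(t: Any) -> str:
    """Encode a hypothetical type tree such as ("list", "integer") as a descriptor."""
    if isinstance(t, str):
        # "empty_list" is a made-up name for the [] type
        return "A" if t == "empty_list" else SCALARS[t]
    kind, body = t
    if kind == "list":
        return "a" + descriptor(body)
    if kind == "optional":
        return "q" + descriptor(body)
    if kind == "tuple":
        # t, element count, then each element's descriptor
        return f"t{len(body)}" + "".join(descriptor(e) for e in body)
    if kind == "object":
        # o, field count, then name$descriptor for each field (sorted: an assumption)
        return f"o{len(body)}" + "".join(f"{name}${descriptor(body[name])}" for name in sorted(body))
    raise ValueError(f"unsupported type: {t!r}")
```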
All the variables are already available as variable_type.
For details on optional values, see the Mandatory Guide to Optional Values. For details on algebraic values, see Algebraic Values without Algebra.
ArgumentType name(number)
Provides the type of an argument to a function. The number is the zero-based index of the argument.
In type
Provides the inner type of a list or optional.
InputType format variable
Provides the type of variable from the input format format. Variables from the current input format selected with Input are also available as variable_type.
ReturnType name
Provides the return type of the function name.
type[number]
Provides the type of an element in a tuple.
type.field
Provides the type of a field in an object.
Descriptors are a machine-friendly form Shesmu uses to communicate type information between systems. Most of this does not involve human interaction, but some plugin configuration files require type information in descriptor form. For JSON configuration files, there is a JSON-enhanced descriptor. Any string is treated as a normal descriptor, but composite types can be expanded to a more readable form:
{ "is": "optional", "inner": X } // X?
{ "is": "list", "inner": X } // [X]
{ "is": "dictionary", "key": K, "value": V } // K -> V
{ "is": "object", "fields": { "f1": F1, "f2": F2 } } // { f1 = F1, f2 = F2 }
[ E1, E2 ] // {E1, E2}
Mixing the two representations is fine (e.g., ["qb", "s"] is equivalent to [{"is": "optional", "inner": "b"}, "s"] or t2qbs).
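Converting a JSON-enhanced descriptor back to the plain form is a short recursive walk, sketched here in Python. `plain_descriptor` is a hypothetical helper; dictionaries are not handled because the types table gives no dictionary descriptor, and sorted object field order is an assumption.

```python
import json
from typing import Any


def plain_descriptor(node: Any) -> str:
    """Convert a JSON-enhanced descriptor into a plain descriptor string."""
    if isinstance(node, str):
        # Any string is already a normal descriptor
        return node
    if isinstance(node, list):
        # [E1, E2] is the tuple {E1, E2}
        return f"t{len(node)}" + "".join(plain_descriptor(e) for e in node)
    if isinstance(node, dict):
        kind = node.get("is")
        if kind == "optional":
            return "q" + plain_descriptor(node["inner"])
        if kind == "list":
            return "a" + plain_descriptor(node["inner"])
        if kind == "object":
            fields = node["fields"]
            # Assumption: fields are written in sorted order by name
            return f"o{len(fields)}" + "".join(
                f"{name}${plain_descriptor(fields[name])}" for name in sorted(fields))
        if kind == "dictionary":
            raise NotImplementedError("dictionary descriptors are not covered by this sketch")
    raise ValueError(f"unrecognized descriptor: {node!r}")
```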
Regular expressions can have modified behaviour. Any combination of the following flags can be used after a regular expression:
i: perform a case-insensitive match. This only works on ASCII characters unless u or e are also set.
m: perform a multi-line match. This makes ^ and $ work on lines in the text rather than on the text as a whole.
s: perform a single-line match. This makes . also match line terminators.
u: use Unicode case in matching instead of ASCII.
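The m and s flags behave like the multi-line and dot-all flags found in most regex engines. The Python sketch below maps the letters to Python's `re` analogues for illustration only: Python's case-insensitive matching is already Unicode-aware for string patterns, so the ASCII-only default of i is not reproduced, `u` is a no-op here, and `e` is not modelled. `compile_with_flags` is a hypothetical helper.

```python
import re

# Python analogues of the flag letters (an assumption for illustration)
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def compile_with_flags(pattern: str, letters: str):
    value = 0
    for letter in letters:
        if letter == "u":
            # Python str patterns already use Unicode case folding
            continue
        if letter not in FLAG_MAP:
            raise ValueError(f"flag not modelled in this sketch: {letter!r}")
        value |= FLAG_MAP[letter]
    return re.compile(pattern, value)


# Without m, ^ only matches at the start of the whole text
assert compile_with_flags(r"^b", "").search("a\nb") is None
assert compile_with_flags(r"^b", "m").search("a\nb") is not None
```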