Shesmu Decision-Action Language Reference

A Shesmu script contains:

a version
an input declaration
pragmas if required
type aliases if required
constants and functions
define olives
olives

The version can determine what language features are available and provides a mechanism to change syntax in the future. Currently, only one version is supported.

Version 1;

The input declaration determines the input format that will be read by olives in the file. This is the only required entry in a file.

Input format;

Pragmas modify the behaviour of the entire script, mostly to do with when it will execute.

Constants and functions are automatically imported from plugins and other files but can also be defined locally.

Olives then process the input data. Define olives are a reusable set of olive data.

Olives, define olives, functions, and constants may be mixed in any order.

Pragmas

After the Input line, various script modifiers can be added.

Imports

Import qname;

Access any qualified name by its final section (e.g., Import std::string::to_path; will make to_path the same as std::string::to_path).

Import qname As name;

Access a qualified name by a custom name (e.g., Import std::string::to_path As pathify; will make pathify the same as std::string::to_path).

Import qname::{ name1 [As name1P] [, name2 …]};

Perform multiple of the above access patterns at once. That is:

Import std::string::{length As strlen, to_path};

is equivalent to:

Import std::string::length As strlen;
Import std::string::to_path;

Import qname::*;

Access all children of a qualified name (e.g., Import std::string::*; will make to_path the same as std::string::to_path, length the same as std::string::length, and so on).

Timeouts

Timeout integer ;

Stop the script after integer seconds. The usual integer suffixes, especially minutes and hours, can be added here. When a script runs over its time budget, the Prometheus variable shesmu_run_overtime will be set.

Frequency

Frequency integer ;

Run the script every integer seconds. The usual integer suffixes can be used; seconds is the default, and others such as minutes and hours are accepted.

Required Services

RequiredServices service1 [, service2, …] ;

Ensures that the olive will only run if the specified services are not throttled. Multiple services are separated by a comma.

Data Checks

Check format Into name = collector Require expr;

This prevents the script from running based on input data. It takes all the input data from the format format and sorts it into one giant group using collector to aggregate the records. Once done, it evaluates expr with name defined to that aggregate value. If expr is true, the script can run; otherwise, it will be blocked.

This can be useful to require temperamental data sources to have provided input:

Check unix_file Into c = Count Require c > 0;
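
The check can be pictured with a small Python model. This is only a sketch of the semantics described above, not Shesmu code or Shesmu's implementation; the function name check_allows_run and the record layout are hypothetical.

```python
def check_allows_run(records, collector, require):
    """Model of `Check format Into name = collector Require expr;`."""
    # All records from the format land in one single group.
    aggregate = collector(records)
    # The script may run only if the requirement holds for the aggregate.
    return bool(require(aggregate))


# Model of: Check unix_file Into c = Count Require c > 0;
no_data = check_allows_run([], len, lambda c: c > 0)
some_data = check_allows_run([{"path": "/tmp/a"}], len, lambda c: c > 0)
```

With no records the model blocks the script; with at least one record it lets the script run.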

Type Aliases

Since tuple types can get unwieldy, a type alias can be created:

TypeAlias name type;

This will make name available in all the places where types are permitted. Note that all the variables are already available as variable_type.

Top-level Elements

These are the olives, functions, and constants. Define olives, functions, constants, actions, refillers, and gangs are in separate namespaces, so it is possible to reuse the same name for any of these without an error.

[Export] name = expression ;

Creates a new constant. The name cannot be used for any other constant. If Export is present, this constant will be available to other scripts as olive::script::name.

[Export] Function name(type1 arg1[, …]) expr;

Create a new function. The function must take at least one argument. The possible types are defined below. If Export is present, this function will be available to other scripts as olive::script::name.

[Export] Define name([type1 arg1[, …]]) clauses ;

Create a new define olive. This is a section of olive that can be reused among different olives in the file. It is intended for when olives share similar logic. The define olive cannot be used after reshaping has been done, so it must occur early in the olive. If reshaping is required, write the define olives in a nested way.

Parameters are optional.

Olive [Description "info"] [Tag tagname1 [Tag tagname2 …]] clauses terminal ;

Create a new olive that does something. The something is determined by terminal. Olives have optional descriptions and tags which will be displayed in the UI. For olives that produce actions, any tags will be added to the action and can be used for filtering.

Olive Terminals

Terminals determine what an olive will do.

Run action tags With param1 = expr1[, param2 = expr2[, …]];

Creates an action to be scheduled and run. action determines which action will be run and what parameters are available. Optional parameters can be conditionally assigned:

param = expr If condition

Tags can also be attached to the action. These tags, unlike the ones at the start of the olive, are dynamically generated. This makes it possible to create tags based on the data, for instance, to tag actions by project or customer. See Dynamic Tags.

Alert label1 = expr1[, label2 = expr2[, …]] [Annotations ann1 = aexpr1[, …]] For timeexpr;

Creates a Prometheus alert. According to Prometheus’s design, an alert is defined by its labels, all of which must be strings. For additional data that might change, use Annotations, which are also string-valued. An alert has a finite duration, after which it will expire unless refreshed. timeexpr defines the number of seconds an alert should fire for. Every time the olive is re-run the alert will be refreshed.

This may be used in Reject and Require clauses.

Refill refiller With param1 = expr1[, param2 = expr2[, …]];

Replace the contents of a database with the output from the olive. Each record the olive emits is another “row” sent to the database. How the refiller interprets the data and its behaviour is defined by the refiller.

Dynamic Tags

Tags can be attached to an action based on the data in the olive. They can be any string. Duplicate tags are removed.

Tag expr

Adds the result of expr, which must be a string, to the tags associated with this action.

Tags expr

Adds the elements in the result of expr, which must be a list of strings, to the tags associated with this action.

Clauses

An olive can have many clauses that filter and reshape the data. All clauses can be preceded with Label "text" to have text appear in the dataflow diagram instead of the name of the clause.

Dump [Label name1] expr1[, [Label name2] expr2[, …]] To dumper
Dump All To dumper

Exports data to a dumper for debugging analysis. The expressions can be of any type. If All is used, all variables are dumped in alphabetical order. In some output formats (e.g., TSV), column order is preserved. Column names can be provided with the Label prefix. If no column name is provided, Shesmu will attempt to infer an “obvious” column name (the variable name if a variable or a simple transformation of a variable). If no column name is obvious and none is provided, Shesmu will create an arbitrary name.

This may be used in Reject and Require clauses. This does not reshape the data.

Flatten name In expr

Creates copies of a row with an additional variable name for each value in the list provided by expr. If there are no items in the list, the row is dropped.

This reshapes the data.
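
A minimal Python model of Flatten, with rows represented as dictionaries. It only illustrates the row-copying behaviour described above; the helper name flatten and the sample rows are hypothetical.

```python
def flatten(rows, name, expr):
    """Model of `Flatten name In expr` over rows represented as dicts."""
    output = []
    for row in rows:
        # One copy of the row per list element; an empty list yields no copies,
        # so the row is dropped.
        for value in expr(row):
            output.append({**row, name: value})
    return output


rows = [
    {"sample": "s1", "files": ["a.bam", "b.bam"]},
    {"sample": "s2", "files": []},
]
flattened = flatten(rows, "file", lambda row: row["files"])
```

Here s1 produces two rows, each with an extra file variable, and s2 disappears because its list is empty.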

Group By discriminator1[, …] [Where condition] Into collectionname1 = collector1[, …] [OnReject reject1[ reject2[ …]] Resume]
Group By discriminator1[, …] Using grouper param = expr1[, …] [With output[, …]] [Where condition] Into collectionname1 = collector1[, …] [OnReject reject1[ reject2[ …]] Resume]

Performs a grouping of the data. First, rows are collected in subgroups by their discriminators. If Using is provided, those subgroups are modified by the grouper. Finally, all items in a subgroup are passed through the collectors. The output will have all the discriminators and collectors as variables.

Discriminators come in multiple forms:

Syntax	Behaviour
name `=` expr	Compute the value from expr for each row and use it for grouping; assign it to name in the output. name can use destructuring.
name `= OnlyIf` expr	Compute the value from expr, which must be an optional value, for each row and, if it contains a value, use it for grouping; assign it to name in the output. name can use destructuring.
name `= Univalued` expr	Compute the value from expr, which must be a list, for each row and, if it contains a single value, use it for grouping; assign it to name in the output. name can use destructuring.
name	Use an existing variable name for grouping and copy it to the output.
`@`gang	Use all variables in a gang for grouping and copy them to the output.

Custom groupers take parameters. Some parameters are per-row, which may use variables, and some are fixed, which must use constants or parameters to a define olive. Custom groupers may also define output variables. These are available in the collectors. They have default names; if those names are a problem, With can be used to rename them.

The collectors aggregate from the values in a group. They are described in another section. Each collector can have Where filters that limit the collected data. Optionally, a Where filter can be applied to all the collectors by providing condition.

Rows which are rejected are passed to the rejection handlers. These are Monitor or Dump clauses or an Alert terminal. Rejection handlers can only access the discriminators.

This reshapes the data.
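
The basic form of grouping (without Using, Where, or OnReject) can be sketched in Python as follows. This is an illustrative model only: the helper group_by is hypothetical, discriminators are limited to existing variable names, and collectors are plain functions over the rows in a subgroup.

```python
from collections import defaultdict


def group_by(rows, discriminators, collectors):
    """Model of `Group By d1, ... Into name1 = collector1, ...`."""
    subgroups = defaultdict(list)
    for row in rows:
        # Rows with equal discriminator values fall in the same subgroup.
        key = tuple(row[d] for d in discriminators)
        subgroups[key].append(row)
    output = []
    for key, members in subgroups.items():
        # The output has only the discriminators and the collected values.
        out_row = dict(zip(discriminators, key))
        for name, collector in collectors.items():
            out_row[name] = collector(members)
        output.append(out_row)
    return output


rows = [
    {"project": "a", "size": 1},
    {"project": "a", "size": 2},
    {"project": "b", "size": 5},
]
grouped = group_by(
    rows,
    ["project"],
    {"file_count": len, "total_size": lambda members: sum(m["size"] for m in members)},
)
```

Note that size is no longer a variable after grouping; only project, file_count, and total_size survive, which is why the clause reshapes the data.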

Join outerkey To input innerkey
IntersectionJoin outerkey To input innerkey
Join outerkey To Call name(args) innerkey
IntersectionJoin outerkey To Call name(args) innerkey

Does a join where incoming rows are joined against rows from the input data source or the output of the Define olive name. Names between the two data sources must not overlap. Rows are joined if outerkey and innerkey match:

Operation	Outer Key	Inner Key	Behaviour
`Join`	k	i	Matches if k = i.
`IntersectionJoin`	k	i	Matches if `For x In` k`: Any x In` i
`IntersectionJoin`	`[`k`]`	i	Matches if k `In` i
`IntersectionJoin`	k	`[`i`]`	Matches if i `In` k

In Join, keys are values that must match exactly. In IntersectionJoin, the keys are lists of values and the join occurs if any items are found in both the inner and outer key lists. Consider a situation where a process outputs several files and another process ingests a subset of them; an IntersectionJoin could be used to find the output process that used some of the input.

If a set-to-single value join is required, use IntersectionJoin and put the single element in a list.

This reshapes the data.
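
The two matching rules can be compared with a short Python model. It is a sketch under simplifying assumptions: rows are dictionaries, keys are computed by plain functions, and a naive nested loop is used; the helper names and sample data are hypothetical.

```python
def join(outer_rows, inner_rows, outer_key, inner_key):
    """Model of `Join`: keys must be equal."""
    return [
        {**outer, **inner}
        for outer in outer_rows
        for inner in inner_rows
        if outer_key(outer) == inner_key(inner)
    ]


def intersection_join(outer_rows, inner_rows, outer_key, inner_key):
    """Model of `IntersectionJoin`: key lists must share at least one item."""
    return [
        {**outer, **inner}
        for outer in outer_rows
        for inner in inner_rows
        if set(outer_key(outer)) & set(inner_key(inner))
    ]


producers = [
    {"producer": "p1", "outputs": ["a", "b"]},
    {"producer": "p2", "outputs": ["c"]},
]
consumers = [{"consumer": "c1", "inputs": ["b", "x"]}]
matched = intersection_join(
    producers, consumers, lambda r: r["outputs"], lambda r: r["inputs"]
)
# Set-to-single join: wrap the single value in a list.
single = intersection_join(
    producers, [{"file": "c"}], lambda r: r["outputs"], lambda r: [r["file"]]
)
```

Only p1 is paired with c1, because b is the one file they share. The merged dictionaries also show why variable names from the two sides must not overlap.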

LeftJoin outerkey To [Prefix prefix] input innerkey [Where condition] collectionname1 = collector1 [, …]
LeftIntersectionJoin outerkey To [Prefix prefix] input innerkey [Where condition] collectionname1 = collector1 [, …]
LeftJoin outerkey To [Prefix prefix] Call name(args) innerkey [Where condition] collectionname1 = collector1 [, …]
LeftIntersectionJoin outerkey To [Prefix prefix] Call name (args) innerkey [Where condition] collectionname1 = collector1 [, …]

Does a left-join operation between the current data and the data from the input data format or the output of the Define olive name. This is done using a merge join where keys are computed for both datasets and then only matching entries are processed. outerkey is the key on the incoming data and innerkey is the key on the data being joined against. A tuple can be used if joining on multiple keys is required. Each row in the outer data is treated as a kind of group and the matching inner keys are processed through the collectors. This means that outer data is used only once but inner data may be reused multiple times if multiple outer rows have the same key. Each collector can have Where filters that limit the collected data. Optionally, a Where filter can be applied to all the collectors by providing condition.

When doing left join, there will likely be collisions between many variables, including all the signatures. While it is possible to reshape the data to avoid this conflict, the Prefix option allows renaming the joined data rather than the source data.

Operation	Outer Key	Inner Key	Behaviour
`LeftJoin`	k	i	Matches if k = i.
`LeftIntersectionJoin`	k	i	Matches if `For x In` k`: Any x In` i
`LeftIntersectionJoin`	`[`k`]`	i	Matches if k `In` i
`LeftIntersectionJoin`	k	`[`i`]`	Matches if i `In` k

In LeftJoin, keys are values and must match exactly. In LeftIntersectionJoin, the keys are lists of values and the join occurs if any items are found in both the inner and outer key lists. Consider a situation where a process outputs several files and another process ingests a subset of them; a LeftIntersectionJoin could be used to find the output process that used some of the input.

If a set-to-single value join is required, use LeftIntersectionJoin and put the single element in a list.

This reshapes the data.
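
A Python model of LeftJoin may help show how outer rows act as groups. It is a sketch with assumptions that go beyond the text above: the outer row's variables are assumed to be kept in the output, outer rows with no matches are assumed to pass through with collectors applied to an empty set, and Prefix is not modelled. The helper name left_join and the sample data are hypothetical.

```python
def left_join(outer_rows, inner_rows, outer_key, inner_key, collectors):
    """Model of `LeftJoin outerkey To input innerkey name = collector, ...`."""
    output = []
    for outer in outer_rows:
        # Each outer row is used once; it collects every matching inner row.
        matches = [i for i in inner_rows if inner_key(i) == outer_key(outer)]
        out_row = dict(outer)
        for name, collector in collectors.items():
            out_row[name] = collector(matches)
        output.append(out_row)
    return output


lanes = [
    {"lane": 1, "run": "r1"},
    {"lane": 2, "run": "r1"},
    {"lane": 1, "run": "r2"},
]
qc = [
    {"qc_run": "r1", "passed": True},
    {"qc_run": "r1", "passed": False},
]
joined = left_join(
    lanes,
    qc,
    lambda row: row["run"],
    lambda row: row["qc_run"],
    {"qc_count": len, "all_passed": lambda ms: all(m["passed"] for m in ms)},
)
```

Both r1 lanes reuse the same two QC rows, while the r2 lane collects nothing and so gets a count of zero.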

Let assignment1[, assignment2[, …]]

Reshapes the data by creating new variables from existing expressions. There are several assignments available:

Syntax	Behaviour
name `=` expr	Compute the value from expr and assign it to name.
name	Copy an existing variable name without modification.
`@`gang	Copy all variables in a gang without modification.
name `= OnlyIf` expr	Compute an optional value from expr; if it contains a value, assign it to name; if empty, discard the row.
name `= Univalued` expr	Compute a list from expr; if it contains exactly one value, assign it to name; otherwise discard the row.
`Prefix` name`,` … `With` prefix	Copy all the variables, but renaming them by adding the supplied prefix. This is useful for self-joins.

This reshapes the data.

Monitor metric "help" {name = expr1[, …] }

Exports the number of rows as a Prometheus variable. metric must be unique, and the Prometheus variable name will be shesmu_user_ followed by metric. This variable will have help associated with it as the help text, and the names define the keys used. The expressions must return strings.

This may be used in Reject and Require clauses. This does not reshape the data.

Pick Max expr By expr1[, expr2[, …]]
Pick Min expr By expr1[, expr2[, …]]

Performs a grouping by expr1, expr2, … and then allows a single row with the largest or smallest value of expr to pass and discards the rest. If there is a tie, an arbitrary row with the largest or smallest value of expr will be passed on.

This does not reshape the data.
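
As a sketch of the behaviour, the following Python model keeps one row per By key. It is not Shesmu's implementation; the helper name pick is hypothetical, and on a tie this model happens to keep the first row seen, which is just one possible "arbitrary" choice.

```python
def pick(rows, value, by, largest=True):
    """Model of `Pick Max expr By ...` (or `Pick Min` when largest=False)."""
    best = {}
    for row in rows:
        key = by(row)
        if key not in best:
            best[key] = row
            continue
        candidate, current = value(row), value(best[key])
        better = candidate > current if largest else candidate < current
        if better:
            best[key] = row
    # Surviving rows are unchanged, so the data is not reshaped.
    return list(best.values())


rows = [
    {"sample": "x", "timestamp": 1},
    {"sample": "x", "timestamp": 3},
    {"sample": "y", "timestamp": 2},
]
newest = pick(rows, lambda r: r["timestamp"], lambda r: r["sample"])
oldest = pick(rows, lambda r: r["timestamp"], lambda r: r["sample"], largest=False)
```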

Reject cond OnReject reject1[ reject2[ …]] Resume

Filter rows from the input. If cond is false, the row will be kept; if true, it will be discarded. This is the opposite of Where. Rows which are rejected are passed to the rejection handlers. These are Monitor or Dump clauses or an Alert terminal.

This does not reshape the data.

Require name = expr OnReject reject1[ reject2[ …]] Resume

Evaluate expr, which must return an optional. If the result is empty, the row will be discarded; if the optional has a value, this value will be assigned to name. The name can use destructuring. Discarded rows are given to the reject clauses which are Monitor or Dump clauses or an Alert terminal.

This reshapes the data.

Where expr

Filter rows from the input. If expr is true, the row will be kept; if false, it will be discarded. This is the opposite of Reject.

This does not reshape the data.

name([expr1[, expr2[, …]]])

Call a define olive. If the olive takes any parameters, they must be provided and cannot use any data from the input format. The define olive cannot be called after reshaping the data.

The define olive may reshape the data, so this clause is considered to reshape the data only if the define olive does.

Grouping Collectors

In a grouping operation, a collector will see all the data and aggregate it into a resulting property.

Any expr

Check if expr returns true for at least one row. If none are collected, the result is false.

All expr

Check if expr is true for all rows. If none are collected, the result is true.

Count

Count the number of matched rows.

Flatten expr

Collect all values into a list from existing lists (duplicates are removed).

Dict keyexpr = valueexpr

Collects the results into a dictionary. If the same key is produced more than once, one of its values is chosen arbitrarily.

LexicalConcat expr With delimiter

Concatenate all values, which must be strings, into a single string separated by delimiter, which must also be a string.

List expr

Collect all values into a list (duplicates are removed).

Max expr

Collect the largest value; if none are collected, the group is rejected.

Min expr

Collect the smallest; if none are collected, the group is rejected.

None expr

Check if expr is false for all rows. If none are collected, the result is true.

PartitionCount expr

Collect a counter of the number of times expr was true and the number of times it was false. The resulting value will be an object with two fields: matched_count with the number of rows that satisfied the condition and not_matched_count with the number that failed the provided condition.

Sum expr

Compute the sum of the resulting value from expr, which must be an integer or floating point number.

Tuple expr Require count

Take the inputs and put them into a tuple. The values must be ordered. Tuples have a defined number of elements, so there must be exactly the right number of items available. If there are exactly count items, a tuple of this length will be produced with all of the items in the input order. If there are too few or too many items, an empty optional will be returned instead.

Depending on the situation, Skip and Limit can be used to trim the input appropriately.

Univalued expr

Collect exactly one value; if none are collected, the group is rejected; if more than one are collected, the group is rejected. It is fine if the same value is collected multiple times.

Where expr collector

Performs filtering before collector.

{ name1 = collector1, name2 = collector2, … }

Performs multiple collections at once and converts the results into an object. This can be very useful to share a Where condition while collecting multiple pieces of information.

[behaviour] ` collector `

This allows using optional values in other collectors. For instance, suppose there is an optional number and the minimum is desired, one could write: ` Min x? `.

There are special behaviours for how to handle records with missing data:

The unspecified behaviour is to simply drop empty optionals and proceed with the collector as normal. This is equivalent to Where x != ` ` Min x Default 1234; since x will never be the empty optional, the default will never be used.
OnlyIf All requires that no empty input makes it to the collector; if any empty input is found, the output from the collector is replaced by the empty optional. So, if v = OnlyIf All `Min x?` was given `3` and ` `, it would produce v == ` `.
OnlyIf Any requires that at least one non-empty input makes it to the collector; if no non-empty input is found, the output from the collector is replaced by the empty optional. So, if v = OnlyIf Any `Min x?` was given `3` and ` `, it would produce v == `3`, but ` ` and ` ` would produce v == ` `.
Require All requires that no empty input makes it to the collector; if any empty input is found, the group will be rejected. So, if v = Require All `Min x?` was given `3` and ` `, the group would not be present in the output.
Require Any requires that at least one non-empty input makes it to the collector; if no non-empty input is found, the group will be rejected. So, if v = Require Any `Min x?` was given `3` and ` `, it would produce v == 3, but if the input were ` ` and ` `, the group would not be present in the output.
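
The five behaviours can be compared side by side with a Python model of Min over optional inputs. This is a sketch only: empty optionals are modelled as None, a rejected group is modelled by the hypothetical sentinel REJECTED, and the function name min_with_behaviour is not part of Shesmu.

```python
REJECTED = object()  # hypothetical sentinel: the group is dropped from the output


def min_with_behaviour(values, behaviour=None):
    """Model of `[behaviour] Min x?` where None stands for an empty optional."""
    present = [v for v in values if v is not None]
    has_empty = len(present) < len(values)
    if behaviour == "OnlyIf All" and has_empty:
        return None
    if behaviour == "OnlyIf Any" and not present:
        return None
    if behaviour == "Require All" and has_empty:
        return REJECTED
    if behaviour == "Require Any" and not present:
        return REJECTED
    if not present:
        # Min rejects the group when nothing is collected.
        return REJECTED
    # Default behaviour: empty optionals were dropped above.
    return min(present)


results = {
    name: min_with_behaviour([3, None], name)
    for name in (None, "OnlyIf All", "OnlyIf Any", "Require All", "Require Any")
}
```

For the inputs 3 and an empty optional, only OnlyIf All and Require All differ from the default: the first yields an empty optional and the second drops the group.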

Expressions

Shesmu has the following expressions, from lowest precedence to highest precedence.

Flow Control

Switch refexpr (When testexpr Then valueexpr)* Else altexpr

Compares refexpr to every testexpr for equality and returns the matching valueexpr. If none match, returns altexpr. The altexpr and every valueexpr must have the same type. The refexpr and every testexpr must have the same type, but not necessarily the same type as the altexpr and valueexpr.

If testexpr Then trueexpr Else falseexpr

Evaluates testexpr and if true, returns trueexpr; if false, returns falseexpr. testexpr must be boolean and both trueexpr and falseexpr must have the same type.

IfDefined tests Then trueexpr Else falseexpr

Performs a conditional compilation. The tests are a comma-separated list of constant names or function names, with each function name preceded by Function. If all of these items are defined, trueexpr is used; otherwise, falseexpr is used. It’s important to note that, unlike If, this is a compile-time decision. Therefore, trueexpr and falseexpr don’t have to return the same type and the unused path can depend on constants and functions that are not defined.

This expression is intended for use with the simulator’s constants. This allows embedding logic like:

 Where IfDefined shesmu::simulator::run
   Then shesmu::simulator::run == run_name
   Else True

to allow the simulator to be used as a diagnostic tool.

For var In expr: modifications… collector

Takes the elements in a dictionary, list, JSON blob, or optional, processes them using the supplied modifications, and then computes a result using the collector. The modifications and collectors are described below.

expr Type	x Type	Operation
`[`t`]`	t	Processes each item in the list.
` t `	t	If the optional contains a value, process it; otherwise act like the empty list has been provided.
k `->` v	`{`k`,` v`}`	Process each pair of items in a dictionary.
`json`	`json`	If the type is a JSON array, use the elements; if a JSON object use the values. Otherwise, acts as if the empty list.

Any scalar JSON value is treated as an empty collection.

For var Fields expr: modifications… collector

Takes the properties in a JSON object, processes them using the supplied modifications, and then computes a result using the collector. The modifications and collectors are described below.

Any scalar or array JSON value is treated as an empty collection. var will be a tuple of the property name and value.

For var From startexpr To endexpr: modifications… collector

Iterates over the range of numbers from startexpr, inclusive, to endexpr, exclusive, processes them using the supplied modifications, and then computes a result using the collector. The modifications and collectors are described below.

For var Splitting expr By /regex/flags: modifications… collector

Takes the string expr and splits it into chunks delimited by regex and then processes them using the supplied modifications and then computes a result using the collector. The modifications and collectors are described below.

flags sets the behaviour of the regular expression. For details, see regular expression flags.

For var Zipping expr With expr: modifications… collector

This takes two expressions, which must be lists containing tuples. The first elements of each tuple, which must be the same type, are matched up and then the tuples are joined and iterated over. The modifications and collectors are described below.

Since entries might not match, the non-index elements in the tuple are converted to optionals.

So, For {index, left, right} Zipping [ {"a", 1}, {"b", 2} ] With [ {"a", True} ]: will produce:

`index`	`left`	`right`
`"a"`	`1`	`True`
`"b"`	`2`	` `
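
The matching step can be modelled in Python as an outer join on the first tuple element. This sketch assumes two-element tuples, models the empty optional as None, and assumes that an index present on only one side still appears in the result; the output order and the helper name zipping are also assumptions.

```python
def zipping(left, right):
    """Model of `For {index, left, right} Zipping left With right:`."""
    left_by_index = dict(left)
    right_by_index = dict(right)
    # Every index from either side appears once.
    indices = list(left_by_index) + [
        index for index in right_by_index if index not in left_by_index
    ]
    # A missing entry on either side becomes an empty optional (None).
    return [
        (index, left_by_index.get(index), right_by_index.get(index))
        for index in indices
    ]


zipped = zipping([("a", 1), ("b", 2)], [("a", True)])
```

The result reproduces the table above: a is matched on both sides, while b has no right-hand value.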

Match refexpr (When algmatch Then valueexpr)* (Else altexpr | Remainder (name) altexpr)?

Allows separating the algebraic or optional value returned by refexpr and accessing its contents. An optional value is treated as an algebraic value with NONE and SOME{x} entries. A When branch can be provided for every possible algebraic type returned by refexpr. If all possible types are matched, the matching is exhaustive. If the matching is not exhaustive, the remaining cases can be handled via Else or Remainder. Else allows an expression to be used in all other cases, much like the Else in a Switch. Remainder provides access to the case being handled.

For example:

Function analysis_for_project(string project)
  Switch project
    When "a" Then CANCER {"hg38"}
    When "b" Then CANCER {"hg19"}
    When "c" Then VIRAL {"hpv", "hg19"}
    When "d" Then VIRAL {"hpv", "hg19"}
    Else SEQUENCING_ONLY;

Function reference_for_analysis(CANCER{string} | VIRAL{string, string} analysis)
  # Match is exhaustive, so no Else/Remainder
  Match analysis
    When CANCER{genome} Then genome
    When VIRAL{_, genome} Then genome;

...
  # Determine if this olive should run on this data; use Else to cover other cases
  Where Match analysis_for_project(project)
      When CANCER {_} Then True
      Else False
...

...
   Let
     project, sample,
     reference = OnlyIf
       # We remove the SEQUENCING_ONLY case and pass the other values to reference_for_analysis
       Match analysis_for_project(project)
         When SEQUENCING_ONLY Then ``
         Remainder (a) `reference_for_analysis(a)`
...

Two special pieces of syntax are allowed in When:

When NAME _ will match the name and discard the value’s contents
When NAME * will match the name and turn all fields in an object algebraic value into variables

For details on algebraic values, see Algebraic Values without Algebra.

Order expr

Rearrange the values in a tuple in ascending order. The types of the values must be homogeneous and orderable. For example, Order {"b", "c", "a"} would result in {"a", "b", "c"}, while Order {"a", 1} is an error.

This can be useful to ensure ranges provided by user data are in order:

Begin
 {min, max} = Order {min_from_user, max_from_user};
 Return max - min;
End
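
The same idea in a Python model, for comparison. Shesmu rejects mixed types when the script is compiled; this sketch can only raise an error at run time, and the helper name order is hypothetical.

```python
def order(values):
    """Model of `Order {...}`: sort a tuple of same-typed values."""
    if len({type(v) for v in values}) > 1:
        raise TypeError("Order needs values of a single type")
    return tuple(sorted(values))


# The user supplied the bounds in the wrong order.
low, high = order((10, 3))
span = high - low
```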

JSON Conversion

expr As type

Convert a value to or from JSON. If type is json, then the result from expr will be converted to JSON in the Shesmu-standard way. If type is any other type, then expr must be a json value and it will be converted from JSON to the matching Shesmu type. Since the conversion from JSON to Shesmu cannot be guaranteed, it will return an optional of type. To create a JSON null value, use ` ` As json.

Blocks

Begin name0 = expr0; [ name1 = expr1;] […] Return expr; End

Creates a local variable name0 by evaluating expr0. This variable is then accessible in expr1, and so on for each later assignment. Finally, expr is evaluated with all the defined names and its result is used. The names can use destructuring.

Tabulation

Tabulate name0 = ( expr00 , expr01 , … ); [ name1 = ( expr10 , expr11 , … );] […] End

Performs an order-sensitive matched assignment. This language construct is useful for generating data for the Vidarr retry types. Suppose a Vidarr job has two fields, memory and time, and the job should be run with an increasing amount of memory and time if it fails. This could be accomplished as follows:

Run vidarr::production::hpc::some_job With
  arguments =
     Begin
        {; memory, timeout} = Tabulate
             memory = 5Gi, 10Gi;
             timeout = 1hours, 6hours;
            End;
        Return {
            foo__memory = memory,
            foo__files = files,
            foo__timeout = timeout,
            foo__modules = "foo/1.2.3"
        };
     End,
   ...

This construct checks that all the names have the same number of values (two in this case) and turns each of them into an ordered dictionary with the keys as strings containing numbers (the format expected by Vidarr).

The names can be destructuring:

Run vidarr::production::hpc::some_job With
  arguments =
     Begin
        {; foo_memory, foo_timeout; bar_timeout} = Tabulate
             {foo_memory, foo_timeout} = {5Gi, 1hours}, {10Gi, 6hours};
             bar_timeout = (30mins, 90mins);
            End;
        Return {
            foo__memory = foo_memory,
            foo__files = files,
            foo__timeout = foo_timeout,
            foo__modules = "foo/1.2.3",
            foo__bar__timeout = bar_timeout
        };
     End,
   ...

Optional Coalescence

expr Default default

Computes an optional value using expr; if this value is empty, returns default. The type of expr must be the optional version of the type of default.

For details on optional values, see the Mandatory Guide to Optional Values.

Logical Disjunction and Optional Merging

expr || expr

Logical short-circuiting or. If both operands are boolean, the result is boolean.

If both operands are optionals of the same type, the result is the first optional if it has a value; otherwise, it is the second.
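
Default and the optional form of || can be contrasted with a Python model in which None stands for the empty optional. The helper names are hypothetical. The explicit None checks matter: Python's own `or` would treat 0 or an empty string as missing, which is not what an optional means.

```python
def default(optional, fallback):
    """Model of `expr Default default`: unwrap, or use the fallback."""
    return fallback if optional is None else optional


def optional_or(first, second):
    """Model of `expr || expr` on optionals: the result is still an optional."""
    return second if first is None else first


unwrapped = default(None, 5)
merged = optional_or(None, 7)
```

The difference is in the result type: Default always produces a plain value, while || on two optionals can still produce an empty optional when both are empty.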

Logical Conjunction

expr && expr

Logical short-circuiting and. Both operands must be boolean and the result is boolean.

Comparison

Equality

expr == expr

Compare two values for equality. This is supported for all types. For tuples, the values in the tuples must be the same. For lists, the items must be the same, but the order is not considered.

Inequality

expr != expr

Compare two values for inequality. This is the logical complement to ==.

Ordering

expr < expr
expr <= expr
expr >= expr
expr > expr

Compare two values for order. This is only defined for integers and dates. For dates, the lesser value occurs temporally earlier.

Regular Expression

expr ~ /re/flags

Check whether expr, which must be a string, matches the provided regular expression.

flags sets the behaviour of the regular expression. For details, see regular expression flags.

expr !~ /re/flags

Check whether expr, which must be a string, does not match the provided regular expression.

flags sets the behaviour of the regular expression. For details, see regular expression flags.

expr =~ /re/flags

Matches expr, which must be a string, against the provided regular expression and returns a tuple of the values of each capture group. Since individual capture groups may be missing, this returns an optional tuple of optional strings. If the outer optional is the missing value, then the regular expression failed to match. If any element of the tuple is the missing value, then that capture group did not match.

flags sets the behaviour of the regular expression. For details, see regular expression flags.
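
The nested optionals returned by =~ resemble what Python's re module returns, which makes for a compact model. This is an analogy only: re.search is used for illustration, and whether Shesmu requires the whole string to match is not stated here. The helper name capture and the pattern are hypothetical.

```python
import re


def capture(text, pattern, flags=0):
    """Model of `expr =~ /re/flags`: None, or a tuple with None for absent groups."""
    match = re.search(pattern, text, flags)
    if match is None:
        # Outer optional is empty: the expression did not match at all.
        return None
    # Inner optionals: groups that did not participate are None.
    return match.groups()


matched = capture("run_123", r"run_(\d+)(_retry)?")
failed = capture("sample_9", r"run_(\d+)(_retry)?")
```

In the first call the second capture group is absent, so only that element is empty; in the second call the whole result is empty.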

Disjunction

Addition

expr + expr

Adds two values.

Left	Right	Result	Description
`integer`	`integer`	`integer`	Summation
`float`	`integer`	`float`	Summation
`integer`	`float`	`float`	Summation
`float`	`float`	`float`	Summation
`date`	`integer`	`date`	Add seconds to date
`path`	`path`	`path`	Resolve paths (concatenate, unless second path starts with `/`)
`path`	`string`	`path`	Append component to path
`string`	`string`	`string`	Concatenate two strings
`string`	`integer`	`string`	Concatenate a string and an integer value by first converting it
`string`	`float`	`string`	Concatenate a string and a floating-point value by first converting it
`string`	`date`	`string`	Concatenate a string and a date value by first converting it
`string`	`path`	`string`	Concatenate a string and a path value by first converting it
`[`x`]`	`[`x`]`	`[`x`]`	Union of two lists (removing duplicates)
`[`x`]`	x	`[`x`]`	Add item to list (removing duplicates)
`{`a1`,`…`,` an`}`	`{`b1`,`…`,` bn`}`	`{`a1`,`…`,` an`,`b1`,`…`,` bn`}`	Concatenate two tuples
`{`fa1`=`a1`,` …`,` fan`=`an`}`	`{`fb1`=`b1`,`…`,` fbn`=`bn`}`	`{`fa1`=`a1`,`…`,` fan`=`an`,`fb1`=`b1`,`…`,` fbn`=`bn`}`	Merge two objects (with no duplicate fields)
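
A few of the less obvious rows can be modelled in Python. This is a sketch, not Shesmu's implementation: paths are modelled with pathlib (whose `/` operator happens to follow the same resolve rule), lists are modelled as sets, objects as dictionaries, and the duplicate-field restriction is modelled as a run-time error. All helper names are hypothetical.

```python
from pathlib import PurePosixPath


def add_paths(left, right):
    """path + path: concatenate, unless the second path starts with `/`."""
    return PurePosixPath(left) / PurePosixPath(right)


def add_lists(left, right):
    """list + list: union with duplicates removed."""
    return set(left) | set(right)


def add_objects(left, right):
    """object + object: merge, refusing duplicate fields."""
    clash = left.keys() & right.keys()
    if clash:
        raise ValueError(f"duplicate fields: {sorted(clash)}")
    return {**left, **right}


relative = add_paths("/data", "runs/1")
absolute = add_paths("/data", "/etc/hosts")
```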

Subtraction

expr - expr

Subtracts two values.

Left	Right	Result	Description
`integer`	`integer`	`integer`	Difference
`float`	`integer`	`float`	Difference
`integer`	`float`	`float`	Difference
`float`	`float`	`float`	Difference
`date`	`integer`	`date`	Subtract seconds from date
`date`	`date`	`integer`	Difference in seconds
`[`x`]`	`[`x`]`	`[`x`]`	Difference of two lists (first list without items from second)
`[`x`]`	x	`[`x`]`	Remove item from list (if present)

Conjunction

Multiplication

expr * expr

Multiplies two values.

Left	Right	Result	Description
`integer`	`integer`	`integer`	Multiplication
`float`	`integer`	`float`	Multiplication
`integer`	`float`	`float`	Multiplication
`float`	`float`	`float`	Multiplication
`string`	`integer`	`string`	Repeats string a specified number of times

Division

expr / expr

Divides two values.

Left	Right	Result	Description
`integer`	`integer`	`integer`	Division
`float`	`integer`	`float`	Division
`integer`	`float`	`float`	Division
`float`	`float`	`float`	Division

Modulus

expr % expr

Computes the remainder of dividing the first value by the second, both of which must be integers.

Suffix Operators

List Membership

needle In haystack

Determines if the expression needle is present in the haystack and returns the result as a boolean. needle may be any type, but haystack must be either a list of the same type or a dictionary with keys of the same type.

Optional Use

expr ?

This must be used inside optional creation. Evaluates expr, which must have an optional type, and provides the inner (non-optional) value inside. If the expression has an empty optional, the entire optional creation will be the empty optional.

These may be nested for function calls on optional values. For example:

x = `foo(x?)? + 3`

For details on optional values, see the Mandatory Guide to Optional Values.

Unary Operators

Boolean Not

! expr

Compute the logical complement of the expression, which must be a boolean.

Integer Negation

- expr

Computes the arithmetic additive inverse of the expression, which must be an integer.

List Size

Count expr

Counts the number of elements in expr, which must be a list.

WDL Pair Conversion

ConvertWdlPair expr

WDL has a pair type, Pair[X, Y], which can be represented in Shesmu in two ways: as a tuple, {X, Y}; or as an object, {left = X, right = Y}. The tuple form better matches how pairs are written in WDL, while the object form better matches how pairs are encoded as JSON. This function converts between the two representations.
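
As an illustration of the two representations, here is a small Python sketch in which a Python tuple stands in for the Shesmu tuple and a dict with `left` and `right` keys stands in for the object. The function name `convert_wdl_pair` is hypothetical and the code is only a model of the conversion, not Shesmu's implementation.

```python
def convert_wdl_pair(pair):
    """Convert between the tuple form and the left/right object form."""
    if isinstance(pair, tuple):
        # Tuple form {X, Y} becomes object form {left = X, right = Y}.
        left, right = pair
        return {"left": left, "right": right}
    # Object form becomes tuple form.
    return (pair["left"], pair["right"])
```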

Optional Creation

` expr `

Puts the value of expr in an optional. In expr, the ? suffix may be used to apply changes to the entire optional.

For example:

Begin
  x = `3`;
  Return `x? * 2`;
End

In this example, the x? will get the value inside the variable x, which may be missing. If it is missing, the block will return an empty optional; otherwise it will return an optional containing the original value multiplied by 2.

``

Creates an optional that contains no value.

For details on optional values, see the Mandatory Guide to Optional Values.
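
The short-circuiting behaviour of ? in the block above can be modelled in Python, with `None` standing in for the empty optional. This is a sketch of the semantics only; `double_if_present` is a hypothetical name.

```python
from typing import Optional


def double_if_present(x: Optional[int]) -> Optional[int]:
    # Models `x? * 2` inside optional creation: if x is missing, the
    # whole expression becomes the empty optional.
    if x is None:
        return None
    # Otherwise the inner value is used and the result is wrapped again.
    return x * 2
```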

Access Operators

Tuple and Dictionary Access

expr [ n ]

Extracts an element from a tuple (or integer-indexed map). n is an integer that specifies the zero-based index of the item in the tuple. The result type will be based on the type of that position in the tuple. If n is beyond the number of items in the tuple, an error occurs.

The expr can also be an optional of a tuple. If it is, the result will be an optional of the appropriate type.

expr [ indexexpr ]

Extracts the value from a dictionary. The resulting value will always be optional in case the key specified by indexexpr is missing.

The expr can also be an optional of a dictionary.

Named Tuple Access

expr . field

Extracts a field from a named tuple or JSON object. field is the name of the field. For a named tuple, the result type is the type of that field; for a JSON blob, the result is also a JSON blob. If field is not in the named tuple, an error occurs. If field is not in the JSON blob (or the access is applied to a scalar or array), the result is a JSON null value.

The expr can also be an optional of a named tuple or JSON object. If it is, the result will be an optional of the appropriate type.

Swizzled Named Tuple Access

expr .{ field1, field2 , … }

Extracts multiple fields from a named tuple and constructs a tuple with the results. The type of each element will be based on the type of the corresponding field in the named tuple. If any field is not in the named tuple, an error occurs.

The expr can also be an optional of a named tuple. If it is, the result will be an optional of the appropriate type.

Terminals

Action Name Literal

ActionName

Get the name of the action being executed as a string?. In the case of Refill and Alert olives, this will be the missing optional value.

Algebraic Values

NAME
NAME{expr, expr, …}
NAME{@name}
NAME{field = expr, field = expr, …}

Shesmu supports creating algebraic values. The name of an algebraic type is a combination of uppercase letters, digits, and underscores. It must start with an uppercase letter and be at least two characters long. Algebraic values come in three kinds: ones that contain no information (and work something like an enum in other languages), ones that are associated with a sequence of values, much like a tuple, and ones that contain named fields, much like a named tuple/object. It is also possible to use a gang to create a tuple-like algebraic value.

For details on algebraic values, see Algebraic Values without Algebra.

Date Literal

Date YYYY-mm-dd
Date YYYY-mm-ddTHH:MM:SSZ
Date YYYY-mm-ddTHH:MM:SS+zz
Date YYYY-mm-ddTHH:MM:SS-zz
EpochSecond s
EpochMilli m

Specifies a date and time. If the time is not specified, it is assumed to be midnight UTC.
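
The following Python sketch models the instants these literals denote. It assumes, from their names, that EpochSecond and EpochMilli count seconds and milliseconds from the Unix epoch; the helper names are hypothetical.

```python
from datetime import datetime, timezone


def date_only(year: int, month: int, day: int) -> datetime:
    # A date without a time is taken to be midnight UTC.
    return datetime(year, month, day, tzinfo=timezone.utc)


def epoch_second(s: int) -> datetime:
    # Assumption: seconds since 1970-01-01T00:00:00Z.
    return datetime.fromtimestamp(s, tz=timezone.utc)


def epoch_milli(ms: int) -> datetime:
    # Assumption: milliseconds since 1970-01-01T00:00:00Z.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```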

Tuple Literal

{expr, expr, …}

Creates a new tuple with the elements as specified. The type of the tuple is determined based on the elements.

Instead of an expression to create a single element in a tuple, a ...expr can be used to insert all the elements in a tuple inline into the new tuple.

Named Tuple Literal

{field = expr, field = expr, = name ... [; (var | @gang) ...]}

Creates a new named tuple with the fields as specified. The type of the named tuple is determined based on the elements.

Instead of field = expr, a ...expr can be used and this will copy all the elements in expr, which must be an object. If some fields are to be excluded, use the form: ...expr Without field1 field2 …

A field can also be created from a variable of the same name by placing the name after a ;. For example, { a = 1; b } is shorthand for { a = 1, b = b }. Named fields can be omitted if there are none (i.e., {; b, c} is shorthand for {b = b, c = c}). If a gang is used here, this will create all the members of the gang as fields.

Synthetic Tuple

{@name}

Creates a new tuple with the elements as specified in the gang name.

List Literal

[expr, expr, …]

Creates a new list from the specified elements. All the expressions must be of the same type.

Dictionary Literal

Dict { keyexpr = valueexpr, … }

Creates a new dictionary from the specified elements. All keys must be the same type and all values must be the same type. If duplicate keys are present, one will be selected arbitrarily.

Instead of keyexpr = valueexpr, a ...expr can be used and this will copy all the elements in expr, which must be a dictionary. If some entries are to be excluded or transformed, use a For ... Dict to preprocess the dictionary.

Path Literals

'path'

Paths are UNIX-like paths that can be manipulated. They may contain \' if necessary.

String Literal

"parts"

Specifies a new string literal. A string may contain the following special items in addition to text:

\t for a tab character
\n for a new line character
\{ for an open brace character
{expr} for a string interpolation; the expression must be a string, integer, or date
{expr:n} for a zero-padded integer string interpolation; the expression must be an integer and n is the number of digits to pad to
{@name} for interpolating the variables of a gang; the variables in the gang must be strings and integers
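
To make the zero-padded form concrete, the following Python sketch models what {expr:n} produces for a non-negative integer. The function name `interpolate_padded` is hypothetical, and the treatment of values wider than n digits is an assumption based on ordinary zero-padding.

```python
def interpolate_padded(value: int, digits: int) -> str:
    # Pad with leading zeros up to the requested number of digits.
    # Assumption: values that are already wider are left untouched.
    return f"{value:0{digits}d}"
```

Under this model, padding 7 to three digits gives 007.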

Sub-expression

(expr)

A subexpression.

Integer Literal

An integer literal. An integer may be suffixed by one of the following multipliers:

Unit	Multiplier
G	1000^3
Gi	1024^3
M	1000^2
Mi	1024^2
k	1000
ki	1024
mins	60
hours	3600
days	86400
weeks	604800
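
The table can be read as a lookup from suffix to multiplier. The following Python sketch computes the value of a suffixed literal; `literal_value` is a hypothetical helper and it assumes the suffix follows the digits directly, with no whitespace.

```python
import re

# Multipliers copied from the table above.
MULTIPLIERS = {
    "G": 1000**3, "Gi": 1024**3,
    "M": 1000**2, "Mi": 1024**2,
    "k": 1000, "ki": 1024,
    "mins": 60, "hours": 3600, "days": 86400, "weeks": 604800,
}


def literal_value(text: str) -> int:
    # Split the literal into its digits and an optional suffix.
    m = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if m is None:
        raise ValueError(f"not an integer literal: {text!r}")
    digits, suffix = m.groups()
    if suffix == "":
        return int(digits)
    if suffix not in MULTIPLIERS:
        raise ValueError(f"unknown suffix: {suffix!r}")
    return int(digits) * MULTIPLIERS[suffix]
```

For example, 5mins is 300 and 2Gi is 2147483648.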

Boolean Literals

True
False

The boolean true and false values, respectively.

Source Location String

Location

This creates a string containing the script's source path, line, column, and hash. This is meant to help locate the originating olive in alerts and other output.

Function Call

function(expr, expr, …)

Call a function. Functions are provided by external services to Shesmu and some are provided as tables of values.

Variables

var

The value of a variable. There are different kinds of variables in Shesmu:

stream variables, attached to the data being processed (or the grouped versions of it)
parameters, as specified in Define olives
constants, provided by plugins
lambda variables, as specified in list operations (e.g., Map, Reduce, Filter)

Only stream variables may be used as discriminators in Group clauses.

List Modifiers

Distinct

Distinct

Discards any duplicate items in the list.

Map

Let x = expr

Replaces each item in the list with the value computed by expr. The values will be named x in the downstream operations.

Flatten

Flatten ( x In expr modifications )
Flatten ( x Fields expr modifications )
Flatten ( x From startexpr To endexpr modifications…)
Flatten ( x Splitting expr By /regex/flags modifications )

Performs nested iteration in the same way as For. The variable name available in the downstream operations is x. Additional list modifications can also be applied. The additional operations inside the brackets can also see the outer variable.

Filter

Where expr

Eliminates any item in the list where expr evaluates to false.

Limit

Limit expr

Truncates the list after the number of items specified by expr, which must return an integer. The list must already be sorted.

Skip

Skip expr

Discards the number of items specified by expr, which must return an integer, from the beginning of the list. The list must already be sorted.

Sort

Sort expr

Sorts the items in a list based on an integer or date returned by expr.

Reverse

Reverse

Reverses the items in a list. The list must already be sorted.

Subsample

Subsample(subsampler, subsampler, subsampler, …)

Perform sampling on items in a list based on the given subsamplers (the order matters). The list must already be sorted. For example: Subsample(Fixed 1, Squish 5) will first select the first item and then randomly select five more items in the rest of the list.
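
The example Subsample(Fixed 1, Squish 5) can be modelled in Python as follows. This is a sketch under assumptions: the function name is hypothetical, the random generator is passed in so the result is reproducible, and the order of the randomly chosen items is not meant to reflect what Shesmu does.

```python
import random


def fixed_then_squish(items, fixed, squish, rng):
    # Fixed: take the first `fixed` items of the sorted list.
    head = items[:fixed]
    # Squish: randomly pick up to `squish` items from what remains.
    rest = items[fixed:]
    picked = rng.sample(rest, min(squish, len(rest)))
    return head + picked
```

Calling `fixed_then_squish(list(range(10)), 1, 5, random.Random(0))` always keeps the first item and adds five distinct items drawn from the other nine.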

Subsamplers

Fixed

Fixed integer

Select the first integer items in a sorted list.

FixedWithCondition

Fixed integer While condition

Select the first integer items in a sorted list while condition is evaluated to be true.

Squish

Squish integer

Randomly select integer items from a sorted list.

Collectors

Count

Count

Returns the number of items in the list.

First Item

First expr

Returns the value of expr for the first item in the list, or an empty optional if no items are present.

Since this returns optional, it may be useful to chain with Default.

For details on optional values, see the Mandatory Guide to Optional Values.

Concatenate Strings

LexicalConcat expr With delimexpr
FixedConcat expr With delimexpr

Creates a string from expr, which must return a string, for each item in the list separated by the value of delimexpr, which must also be a string.

LexicalConcat sorts the strings lexicographically before joining.
FixedConcat assumes the strings are sorted by a Sort operation before joining.
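
The difference between the two collectors is only in ordering. A minimal Python model, with hypothetical function names and with Python's default string ordering standing in for lexicographic order:

```python
def lexical_concat(values, delim):
    # Sort the strings first, then join them.
    return delim.join(sorted(values))


def fixed_concat(values, delim):
    # Join the strings in the order they arrive (already sorted upstream).
    return delim.join(values)
```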

List

List expr

Evaluates expr for every item and collects all the unique results into a list.

Dictionary

Dict keyexpr = valueexpr

Evaluates keyexpr and valueexpr for every item and collects all the results into a dictionary. Duplicate keys are resolved arbitrarily.

Optima

Max sortexpr
Min sortexpr

Finds the minimum or maximum item in a list, based on the sortexpr, which must be an integer or date. If the list is empty, an empty optional is returned.

Since this returns optional, it may be useful to chain with Default.

For details on optional values, see the Mandatory Guide to Optional Values.

Item Matches

None expr
All expr
Any expr

Checks whether none, all, or any (some) of the items in the list meet the condition specified in expr, which must return a Boolean.

Object Collector

{ name1 = modifications… collector, name2 = modifications… collector, … }

Produces an object. Each field in the object is made by sending the same items through individual collectors. Consider something like:

For x In xs: Where x > 5 { count = Count, sum = Sum x }
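
Read as a pipeline, the expression above filters once and then feeds the surviving items to both collectors. A Python sketch of the same computation, with a dict standing in for the resulting object and a hypothetical function name:

```python
def count_and_sum_over_five(xs):
    # Where x > 5: keep only the items that pass the filter.
    kept = [x for x in xs if x > 5]
    # Each field is produced by its own collector over the same items.
    return {"count": len(kept), "sum": sum(kept)}
```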

Partitioned Counter

PartitionCount expr

Produces an object with two fields: matched_count is the number of items for which expr was true, and not_matched_count is the number of items for which expr was false.
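
As a model of the result, the following Python sketch counts both partitions in one pass. The function name and the use of a predicate callback in place of expr are illustrative choices, not part of Shesmu.

```python
def partition_count(items, predicate):
    # Count the items for which the condition holds.
    matched = sum(1 for item in items if predicate(item))
    # Everything else falls in the other partition.
    return {
        "matched_count": matched,
        "not_matched_count": len(items) - matched,
    }
```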

Reduce

Reduce(a = initialexpr ) expr

Performs a reduction operation on all the items in the list. a is the accumulator, which is initially set to initialexpr. For every item, expr is evaluated with a set to the previously returned value, and the result becomes the new value of a. The final value of a is returned.
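
This is the same shape as a left fold. A Python sketch using `functools.reduce`, where the `step` callback plays the role of expr and receives the accumulator and the current item (the wrapper name is hypothetical):

```python
from functools import reduce


def reduce_items(items, initial, step):
    # step(a, x) is evaluated for every item x, with a holding the value
    # returned for the previous item (or the initial value at the start).
    return reduce(step, items, initial)
```

For instance, summing squares over 1, 2, 3 starting from 0 gives 14.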

Sum

Sum expr

Evaluates expr for every item and computes the sum of all the results. expr must return an integer or a floating-point number.

Table

Table name = value, … With format

This collects items into a table and formats that table as a string. This can be useful for creating HTML or Markdown tables for inserting into JIRA. The name, which must evaluate to a string, will be the name of the column, and value, which must also produce a string, will be the contents of that column for every item. The format determines how the text is laid out. It is an object with the following properties:

data_start: the leader for each row
data_separator: the text to place in between inner columns
data_end: the trailer for each row
header_start: the leader for the first row
header_separator: the text to place in between inner columns of the first row
header_end: the trailer for the first row
header_underline: optional text to add on the second line for each column

For a few common formats, this object would be defined as:

html = {
  data_start = "<tr><td>",
  data_separator = "</td><td>",
  data_end = "</td></tr>",
  header_start = "<tr><th>",
  header_separator = "</th><th>",
  header_end = "</th></tr>",
  header_underline = ``
}

markdown = {
  data_start = "|",
  data_separator = "|",
  data_end = "|",
  header_start = "|",
  header_separator = "|",
  header_end = "|",
  header_underline = `"|---"`
}

jira = {
  data_start = "|",
  data_separator = "|",
  data_end = "|",
  header_start = "||",
  header_separator = "||",
  header_end = "||",
  header_underline = ``
}
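
To show how the format properties fit together, here is a Python sketch that lays out a table from such an object. It is a model under assumptions, not Shesmu's implementation: `render_table` is a hypothetical name, rows are joined with newlines, and the underline text is repeated once per column and closed with header_end.

```python
def render_table(columns, rows, fmt):
    """columns is a list of (name, getter) pairs; fmt is a format dict."""
    names = [name for name, _ in columns]
    lines = [fmt["header_start"] + fmt["header_separator"].join(names) + fmt["header_end"]]
    underline = fmt.get("header_underline")
    if underline is not None:
        # Assumption: one copy of the underline text per column.
        lines.append(underline * len(columns) + fmt["header_end"])
    for row in rows:
        cells = [getter(row) for _, getter in columns]
        lines.append(fmt["data_start"] + fmt["data_separator"].join(cells) + fmt["data_end"])
    return "\n".join(lines)


# The markdown and jira formats from above, with None as the empty optional.
MARKDOWN = {
    "data_start": "|", "data_separator": "|", "data_end": "|",
    "header_start": "|", "header_separator": "|", "header_end": "|",
    "header_underline": "|---",
}
JIRA = {
    "data_start": "|", "data_separator": "|", "data_end": "|",
    "header_start": "||", "header_separator": "||", "header_end": "||",
    "header_underline": None,
}
```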

Univalued

Univalued expr

Evaluates expr for each item in the list and returns the value if all the results are the same.

If they are different or there are no items, an empty optional is returned.
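
A minimal Python model of this behaviour, with `None` standing in for the empty optional and a hypothetical function name:

```python
def univalued(values):
    # Collapse the results to their distinct values.
    distinct = set(values)
    # Exactly one distinct value: return it. Zero or several: missing.
    if len(distinct) == 1:
        return next(iter(distinct))
    return None
```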

Since this returns optional, it may be useful to chain with Default.

For details on optional values, see the Mandatory Guide to Optional Values.

Types

There are a small number of types in the language, listed below. Each has syntax as it appears in the language and a descriptor that is used for machine-to-machine communication.

Name	Syntax	Descriptor
Integer	`integer`	`i`
Float	`float`	`f`
String	`string`	`s`
Boolean	`boolean`	`b`
Date	`date`	`d`
List	`[`inner`]`	`a`inner
Empty List	`[]`	`A`
Tuple	`{`t1`,`t2`,` …`}`	`t` n t1 t2 Where n is the number of elements in the tuple.
Object	`{`field1 `=` t1`,`field2 `=` t2`,` …`}`	`o` n field1`$`t1 field2`$`t2 Where n is the number of fields in the object.
Optional	inner`?`	`q`inner or `Q`
Path	`path`	`p`
JSON	`json`	`j`
Algebraic	NAME	`u1`NAME`$t01`
Algebraic	NAME `{`t1`,` t2`,` …`}`	`u1`NAME`$t`n t1 t2 Where n is the number of elements in the tuple.
Algebraic	NAME `{`field1 `=` t1`,`field2 `=` t2`,` …`}`	`u1`NAME`$o` n field1`$`t1 field2`$`t2 Where n is the number of fields in the object.

The type of every variable in the current input format is already available as variable_type.

For details on optional values, see the Mandatory Guide to Optional Values. For details on algebraic values, see Algebraic Values without Algebra.

ArgumentType name(number)

Provides the type of an argument to a function. The number is the zero-based index of the argument.

In type

Provides the inner type of a list or optional.

InputType format variable

Provides the type of variable from the input format format. Variables from the current input format selected with Input are also available as variable_type.

ReturnType name

Provides the return type of the function name.

type[number]

Provides the type of an element in a tuple.

type.field

Provides the type of a field in an object.

Descriptors are a machine-friendly form Shesmu uses to communicate type information between systems. Most of this does not involve human interaction, but some plugin configuration files require type information in descriptor form. For JSON configuration files, there is a JSON-enhanced descriptor. Any string is treated as a normal descriptor, but composite types can be expanded to a more readable form:

{ "is": "optional", "inner": X } // X?
{ "is": "list", "inner": X } // [X]
{ "is": "dictionary", "key": K, "value": V } //  K -> V
{ "is": "object", "fields": { "f1": F1, "f2": F2 } } // { f1 = F1, f2 = F2 }
[ E1, E2 ] // {E1, E2}

Mixing the two representations is fine (e.g., ["qb", "s"] is equivalent to [{"is": "optional", "inner": "b"}, "s"] or t2qbs).
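
The equivalence can be checked mechanically. The following Python sketch flattens a JSON-enhanced descriptor into its string form for the optional, list, and tuple cases only, using the descriptor letters from the types table. `to_descriptor` is a hypothetical helper; dictionaries and objects are left out because their string encoding is not fully specified here.

```python
def to_descriptor(t):
    # A plain string is already a normal descriptor.
    if isinstance(t, str):
        return t
    # A JSON array is a tuple: "t", the element count, then each element.
    if isinstance(t, list):
        return "t" + str(len(t)) + "".join(to_descriptor(e) for e in t)
    if t["is"] == "optional":
        return "q" + to_descriptor(t["inner"])
    if t["is"] == "list":
        return "a" + to_descriptor(t["inner"])
    raise ValueError(f"not handled in this sketch: {t['is']}")
```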

Regular Expression Flags

Regular expressions can have modified behaviour. Any combination of the following flags can be used after a regular expression:

i: perform a case-insensitive match. This only works on ASCII characters unless u or e are also set.
m: perform a multi-line match. This makes ^ and $ work on lines in the text rather than on the text as a whole.
s: perform a single-line match. This makes . match the end of line.
u: use Unicode case in matching instead of ASCII.
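
For readers more familiar with other regex engines, the effect of m and s can be seen with the analogous flags in Python's `re` module. This is an analogy only: Shesmu's engine is not Python's, the `search` helper is hypothetical, and Python's case-insensitive matching on text is already Unicode-aware, so it does not model the distinction between i and u.

```python
import re

# Closest Python analogues of the flags above (u has no separate analogue).
FLAG_ANALOGUES = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def search(pattern: str, flags: str, text: str) -> bool:
    combined = 0
    for flag in flags:
        combined |= FLAG_ANALOGUES[flag]
    return re.search(pattern, text, combined) is not None
```

With m, `^b$` finds the second line of a two-line text; with s, `a.b` matches across the line break.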
