Shesmu Decision-Action Language Reference

A Shesmu script contains:

The version can determine what language features are available and provides a mechanism to change syntax in the future. Currently, only one version is supported.

Version 1;

The input declaration determines the input format that will be read by olives in the file. This is the only required entry in a file.

Input format;

Pragmas modify the behaviour of the entire script, mostly to do with when it will execute.

Constants and functions are automatically imported from plugins and other files but can also be defined locally.

Olives then process the input data. Define olives are reusable sections of olive logic.

Olives, define olives, functions, and constants may be mixed in any order.

Pragmas

After the Input line, various script modifiers can be added.

Imports

Access any qualified name by the final section (i.e., Import std::string::to_path; will make to_path the same as std::string::to_path).

Access a qualified name by a custom name (i.e., Import std::string::to_path As pathify; will make pathify the same as std::string::to_path).

Perform multiple of the above access patterns at once. That is:

Import std::string::{length As strlen, to_path};

is equivalent to:

Import std::string::length As strlen;
Import std::string::to_path;

Access all children of a qualified name (i.e., Import std::string::*; will make to_path the same as std::string::to_path, length the same as std::string::length and so on).

Timeouts

Stop the script after integer seconds. The usual integer suffixes, especially minutes and hours, can be added here. When a script runs over its time budget, the Prometheus variable shesmu_run_overtime will be set.

Frequency

Run the script every integer seconds. Seconds is the default unit; the usual integer suffixes, such as those for minutes and hours, can also be used.

Required Services

Ensures that the olive will only run if the specified services are not throttled. Multiple services are separated by a comma.

Data Checks

This prevents the script from running based on input data. It takes all the input data from the input format format and collects it into a single group, using collector to aggregate the records. It then evaluates expr with name bound to that aggregate value. If expr is true, the script can run; otherwise, it will be blocked.

This can be useful to require temperamental data sources to have provided input:

Check unix_file Into c = Count Require c > 0;

Type Aliases

Since tuple types can get unwieldy, a type alias can be created:

TypeAlias name type;

This will make name available in all the places where types are permitted. Note that all the variables are already available as variable_type.

Top-level Elements

These are the olives, functions, and constants. Define olives, functions, constants, actions, refillers, and gangs are in separate namespaces, so it is possible to reuse the same name for any of these without an error.

Creates a new constant. The name cannot be used for any other constant. If Export is present, this constant will be available to other scripts as olive::script::name.

Create a new function. The function must take at least one argument. The possible types are defined below. If Export is present, this function will be available to other scripts as olive::script::name.

Create a new define olive. This is a section of olive that can be reused among different olives in the file. It is intended for when olives share similar logic. The define olive cannot be used after reshaping has been done, so it must occur early in the olive. If reshaping is required, write the define olives in a nested way.

Parameters are optional.

Create a new olive that does something. The something is determined by terminal. Olives have optional descriptions and tags which will be displayed in the UI. For olives that produce actions, any tags will be added to the action and can be used for filtering.

Olive Terminals

Terminals determine what an olive will do.

Creates an action to be scheduled and run. action determines which action will be run and what parameters are available. Optional parameters can be conditionally assigned:

param = expr If condition

Tags can also be attached to the action. These tags, unlike the ones at the start of the olive, are dynamically generated. This makes it possible to create tags based on the data; for instance, to tag actions by project or customer. See Dynamic Tags.

Creates a Prometheus alert. According to Prometheus’s design, an alert is defined by its labels, all of which must be strings. For additional data that might change, use Annotations, which are also string-valued. An alert has a finite duration, after which it will expire unless refreshed. timeexpr defines the number of seconds an alert should fire for. Every time the olive is re-run the alert will be refreshed.

This may be used in Reject and Require clauses.

Replace the contents of a database with the output from the olive. Each record the olive emits is another “row” sent to the database. How the refiller interprets the data and its behaviour is defined by the refiller.

Dynamic Tags

Tags can be attached to an action based on the data in the olive. They can be any string. Duplicate tags are removed.

Adds the result of expr, which must be a string, to the tags associated with this action.

Adds the elements in the result of expr, which must be a list of strings, to the tags associated with this action.

Clauses

An olive can have many clauses that filter and reshape the data. All clauses can be preceded with Label "text" to have text appear in the dataflow diagram instead of the name of the clause.

Exports data to a dumper for debugging analysis. The expressions can be of any type. If All is used, all variables are dumped in alphabetical order. In some output formats (e.g., TSV), column order is preserved. A column name can be provided with the Label prefix. If no column name is provided, Shesmu will attempt to infer an “obvious” column name (the variable name if a variable or a simple transformation of a variable). If no column name is obvious and none is provided, Shesmu will create an arbitrary name.

This may be used in Reject and Require clauses. This does not reshape the data.

Creates copies of a row with an additional variable name for each value in the list provided by expr. If there are no items in the list, the row is dropped.

This reshapes the data.
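
The row-copying behaviour described above can be modelled outside Shesmu. The following is a minimal Python sketch of the semantics only, not Shesmu code; the function name flatten_clause and the sample rows are hypothetical.

```python
def flatten_clause(rows, name, expr):
    """Copy each row once per value produced by expr(row), binding it to `name`.

    Rows whose list is empty produce no copies, so they are dropped.
    """
    out = []
    for row in rows:
        for value in expr(row):
            out.append({**row, name: value})
    return out


# Hypothetical input: one row with two lanes, one row with none.
rows = [
    {"run": "r1", "lanes": [1, 2]},
    {"run": "r2", "lanes": []},
]
flattened = flatten_clause(rows, "lane", lambda r: r["lanes"])
for row in flattened:
    print(row)
```

The row for r2 disappears because its list is empty, which is why this clause can also act as a filter.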

Performs a grouping of the data. First, rows are collected in subgroups by their discriminators. If Using is provided, those subgroups are modified by the grouper. Finally, all items in a subgroup are passed through the collectors. The output will have all the discriminators and collectors as variables.

Discriminators come in multiple forms:

Syntax Behaviour
name = expr Compute the value from expr for each row and use it for grouping; assign it to name in the output. name can use destructuring.
name = OnlyIf expr Compute the value from expr, which must be an optional value, for each row and, if it contains a value, use it for grouping; assign it to name in the output. name can use destructuring.
name = Univalued expr Compute the value from expr, which must be a list, for each row and, if it contains a single value, use it for grouping; assign it to name in the output. name can use destructuring.
name Use an existing variable name for grouping and copy it to the output.
@gang Use all variables in a gang for grouping and copy them to the output.

Custom groupers take parameters. Some parameters are per-row, which may use variables, and some are fixed, which must use constants or parameters to a define olive. Custom groupers may also define output variables. These are available in the collectors. They have default names; if those names are a problem, With can be used to rename them.

The collectors aggregate from the values in a group. They are described in another section. Each collector can have Where filters that limit the collected data. Optionally, a Where filter can be applied to all the collectors by providing condition.

Rows which are rejected are passed to the rejection handlers. These are Monitor or Dump clauses or an Alert terminal. Rejection handlers can only access the discriminators.

This reshapes the data.

Does a join where incoming rows are joined against rows from the input data source or the output of the Define olive name. Names between the two data sources must not overlap. Rows are joined if outerkey and innerkey match:

Operation Outer Key Inner Key Behaviour
Join k i Matches if k = i.
IntersectionJoin k i Matches if For x In k: Any x In i
IntersectionJoin [k] i Matches if k In i
IntersectionJoin k [i] Matches if i In k

In Join, keys are values that must match exactly. In IntersectionJoin, the keys are lists of values and the join occurs if any item is found in both the inner and outer key lists. Consider a situation where a process outputs several files and another process ingests a subset of them; an IntersectionJoin could be used to find the output process that used some of the input.

If a set-to-single value join is required, use IntersectionJoin and put the single element in a list.
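
The matching rules in the table above can be expressed as two small predicates. This is a Python sketch of the semantics under the assumption that keys are plain values or lists; the function names are hypothetical and not part of Shesmu.

```python
def join_matches(outer_key, inner_key):
    """Join: the keys must be exactly equal."""
    return outer_key == inner_key


def intersection_join_matches(outer_keys, inner_keys):
    """IntersectionJoin: match if any outer item also appears in the inner list."""
    return any(item in inner_keys for item in outer_keys)


# Files produced by one process versus files ingested by another.
produced = ["a.bam", "b.bam", "c.bam"]
ingested = ["b.bam", "x.bam"]
print(intersection_join_matches(produced, ingested))

# Set-to-single join: wrap the single value in a list.
print(intersection_join_matches(["b.bam"], ingested))
```

Wrapping a single key in a list reduces the intersection test to a membership test, which is the set-to-single case described above.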

This reshapes the data.

Does a left-join operation between the current data and the data from the input data format or the output of the Define olive name. This is done using a merge join where keys are computed for both datasets and then only matching entries are processed. outerkey is the key on the incoming data and innerkey is the key on the data being joined against. A tuple can be used if joining on multiple keys is required. Each row in the outer data is treated as a kind of group and the matching inner keys are processed through the collectors. This means that outer data is used only once but inner data may be reused multiple times if multiple outer rows have the same key. Each collector can have Where filters that limit the collected data. Optionally, a Where filter can be applied to all the collectors by providing condition.

When doing a left join, there will likely be collisions between many variables, including all the signatures. While it is possible to reshape the data to avoid this conflict, the Prefix option allows renaming the joined data rather than the source data.

Operation Outer Key Inner Key Behaviour
LeftJoin k i Matches if k = i.
LeftIntersectionJoin k i Matches if For x In k: Any x In i
LeftIntersectionJoin [k] i Matches if k In i
LeftIntersectionJoin k [i] Matches if i In k

In LeftJoin, keys are values and must match exactly. In LeftIntersectionJoin, the keys are lists of values and the join occurs if any item is found in both the inner and outer key lists. Consider a situation where a process outputs several files and another process ingests a subset of them; a LeftIntersectionJoin could be used to find the output process that used some of the input.

If a set-to-single value join is required, use LeftIntersectionJoin and put the single element in a list.

This reshapes the data.

Reshapes the data by creating new variables from existing expressions. There are several assignments available:

Syntax Behaviour
name = expr Compute the value from expr and assign it to name.
name Copy an existing variable name without modification.
@gang Copy all variables in a gang without modification.
name = OnlyIf expr Compute an optional value from expr; if it contains a value, assign it to name; if empty, discard the row.
name = Univalued expr Compute a list from expr; if it contains exactly one value, assign it to name; otherwise discard the row.
Prefix name,With prefix Copy all the variables, but renaming them by adding the supplied prefix. This is useful for self-joins.

This reshapes the data.

Exports the number of rows as a Prometheus variable. metric must be unique and shesmu_user_metric will be the Prometheus variable name. This variable will have help associated with it as the help text and the names define the keys used. The expressions must return strings.

This may be used in Reject and Require clauses. This does not reshape the data.

Performs a grouping by expr1, expr2, … and then allows a single row with the largest or smallest value of expr to pass and discards the rest. If there is a tie, an arbitrary row with the largest or smallest value of expr will be passed on.

This does not reshape the data.
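
As an illustration of the grouping-then-selection behaviour just described, here is a Python sketch of the semantics. The names pick, key and by are hypothetical; on ties this sketch keeps the first row seen, which is one valid choice since the source only promises an arbitrary row.

```python
def pick(rows, key, by, smallest=True):
    """Keep one row per group: the one with the smallest (or largest) key."""
    best = {}
    for row in rows:
        group = by(row)
        if group not in best:
            best[group] = row
            continue
        current, candidate = key(best[group]), key(row)
        better = candidate < current if smallest else candidate > current
        if better:
            best[group] = row
    return list(best.values())


rows = [
    {"sample": "a", "timestamp": 1},
    {"sample": "a", "timestamp": 3},
    {"sample": "b", "timestamp": 2},
]
# Keep the newest row for each sample.
newest = pick(rows, key=lambda r: r["timestamp"], by=lambda r: r["sample"], smallest=False)
print(newest)
```

The surviving rows keep all of their original variables, which is why this clause does not reshape the data.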

Filter rows from the input. If cond is false, the row will be kept; if true, it will be discarded. This is the opposite of Where. Rows which are rejected are passed to the rejection handlers. These are Monitor or Dump clauses or an Alert terminal.

This does not reshape the data.

Evaluate expr, which must return an optional. If the result is empty, the row will be discarded; if the optional has a value, this value will be assigned to name. The name can use destructuring. Discarded rows are given to the reject clauses which are Monitor or Dump clauses or an Alert terminal.

This reshapes the data.

Filter rows from the input. If expr is true, the row will be kept; if false, it will be discarded. This is the opposite of Reject.

This does not reshape the data.

Call a define olive. If the olive takes any parameters, they must be provided and cannot use any data from the input format. The define olive cannot be called after reshaping the data.

The define olive may reshape the data, so this clause is considered to reshape the data only if the define olive reshapes it.

Grouping Collectors

In a grouping operation, a collector will see all the data and aggregate it into a resulting property.

Check if expr returns true for at least one row. If none are collected, the result is false.

Check if expr is true for all rows. If none are collected, the result is true.

Count the number of matched rows.

Collect all values into a list from existing lists (duplicates are removed).

Collects the results into a dictionary. Duplicate values are resolved arbitrarily.

Concatenate all values, which must be strings, into a single string separated by delimiter, which must also be a string.

Collect all values into a list (duplicates are removed).

Collect the largest value; if none are collected, the group is rejected.

Collect the smallest; if none are collected, the group is rejected.

Check if expr is false for all rows. If none are collected, the result is true.

Collect a counter of the number of times expr was true and the number of times it was false. The resulting value will be an object with two fields: matched_count with the number of rows that satisfied the condition and not_matched_count with the number that failed the provided condition.
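
A Python sketch of this collector's result shape follows. The field names matched_count and not_matched_count come from the description above; the function name and sample data are hypothetical.

```python
def partition_count(rows, predicate):
    """Count how many rows satisfy the predicate and how many do not."""
    rows = list(rows)
    matched = sum(1 for row in rows if predicate(row))
    return {"matched_count": matched, "not_matched_count": len(rows) - matched}


file_sizes = [10, 0, 25, 0, 7]
counts = partition_count(file_sizes, lambda size: size > 0)
print(counts)
```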

Compute the sum of the resulting value from expr, which must be an integer or floating point number.

Take the inputs and put them into a tuple. The values must be ordered. Tuples have a defined number of elements, so there must be exactly the right number of items available. If there are exactly count items, a tuple of that length is produced with all of the items in input order. If there are too few or too many items, an empty optional is returned instead.

Depending on the situation, Skip and Limit can be used to trim the input appropriately.
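
The exact-length rule can be sketched in Python, using None to stand in for the empty optional (an assumption of this model, not Shesmu syntax). The function name is hypothetical.

```python
def tuple_collector(values, count):
    """Return a tuple of exactly `count` items, or None if the length differs."""
    items = list(values)
    if len(items) != count:
        return None  # models the empty optional
    return tuple(items)


reads = ["R1.fastq", "R2.fastq", "I1.fastq"]
print(tuple_collector(reads, 2))      # too many items
print(tuple_collector(reads[:2], 2))  # trimmed first, as a limit would do
```

Trimming the input before collecting, as in the second call, is the role that Skip and Limit play in the text above.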

Collect exactly one value; if none are collected, the group is rejected; if more than one are collected, the group is rejected. It is fine if the same value is collected multiple times.

Performs filtering before collector.

Performs multiple collections at once and converts the results into an object. This can be very useful to share a Where condition while collecting multiple pieces of information.

This allows using optional values in other collectors. For instance, suppose there is an optional number and the minimum is desired, one could write: ` Min x? `.

There are special behaviours for how to handle records with missing data:

Expressions

Shesmu has the following expressions, from lowest precedence to highest precedence.

Flow Control

Compares refexpr to every testexpr for equality and returns the matching valueexpr. If none match, returns altexpr. The altexpr and every valueexpr must have the same type. The refexpr and every testexpr must have the same type, but not necessarily the same type as the altexpr and valueexpr.

Evaluates testexpr and if true, returns trueexpr; if false, returns falseexpr. testexpr must be boolean and both trueexpr and falseexpr must have the same type.

Performs a conditional compilation. The tests are a comma-separated list of constant names or Function + function names. If all of these items are defined, trueexpr is used; otherwise, falseexpr is used. It’s important to note that, unlike If, this is a compile-time decision. Therefore, trueexpr and falseexpr don’t have to return the same type and the unused path can depend on constants and functions that are not defined.

This expression is intended for use with the simulator’s constants. This allows embedding logic like:

 Where IfDefined shesmu::simulator::run
   Then shesmu::simulator::run == run_name
   Else True

to allow the simulator to be used as a diagnostic tool.

Takes the elements in a dictionary, list, JSON blob, or optional and processes them using the supplied modifications and then computes a result using the collector. The modifications and collectors are described below.

expr Type x Type Operation
[t] t Processes each item in the list.
` t ` t If the optional contains a value, process it; otherwise act like the empty list has been provided.
k -> v {k, v} Process each pair of items in a dictionary.
json json If the type is a JSON array, use the elements; if a JSON object use the values. Otherwise, acts as if the empty list.

Any scalar JSON value is treated as an empty collection.

Takes the properties in a JSON object and processes them using the supplied modifications and then computes a result using the collector. The modifications and collectors are described below.

Any scalar or array JSON value is treated as an empty collection. var will be a tuple of the property name and value.

Iterates over the range of numbers from startexpr, inclusive, to endexpr, exclusive, and processes them using the supplied modifications and then computes a result using the collector. The modifications and collectors are described below.

Takes the string expr and splits it into chunks delimited by regex and then processes them using the supplied modifications and then computes a result using the collector. The modifications and collectors are described below.

flags sets the behaviour of the regular expression. For details, see regular expression flags.

This takes two expressions, which must be lists containing tuples. The first elements of each tuple, which must be the same type, are matched up and then the tuples are joined and iterated over. The modifications and collectors are described below.

Since entries might not match, the non-index elements in the tuple are converted to optionals.

So, For {index, left, right} Zipping [ {"a", 1}, {"b", 2} ] With [ {"a", True} ]: will produce:

index left right
"a" `1` `True`
"b" `2` ` `

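The pairing shown in the table above can be modelled in Python for the two-element tuple case. This is a sketch of the semantics only: None stands in for the empty optional, the output order and the handling of duplicate indices are assumptions, and the function name is hypothetical.

```python
def zipping(left, right):
    """Pair up (index, value) tuples from two lists by their index.

    Values missing on either side become None (modelling an empty optional).
    """
    left_map = dict(left)
    right_map = dict(right)
    # Assumed ordering: left indices first, then indices only on the right.
    indices = list(left_map) + [k for k in right_map if k not in left_map]
    return [(i, left_map.get(i), right_map.get(i)) for i in indices]


zipped = zipping([("a", 1), ("b", 2)], [("a", True)])
for row in zipped:
    print(row)
```
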
Allows separating the algebraic or optional value returned by refexpr and accessing its contents. An optional value is treated as an algebraic value with NONE and SOME{x} entries. A When branch can be provided for every possible algebraic type returned by refexpr. If all possible types are matched, the matching is exhaustive. If the matching is not exhaustive, the remaining cases can be handled via Else or Remainder. Else allows an expression to be used in all other cases, much like the Else in a Switch. Remainder provides access to the case being handled.

For example:

Function analysis_for_project(string project)
  Switch project
    When "a" Then CANCER {"hg38"}
    When "b" Then CANCER {"hg19"}
    When "c" Then VIRAL {"hpv", "hg19"}
    When "d" Then VIRAL {"hpv", "hg19"}
    Else SEQUENCING_ONLY;

Function reference_for_analysis(CANCER{string} | VIRAL{string, string} analysis)
  # Match is exhaustive, so no Else/Remainder
  Match analysis
    When CANCER{genome} Then genome
    When VIRAL{_, genome} Then genome;

...
  # Determine if this olive should run on this data; use Else to cover other cases
  Where Match analysis_for_project(project)
      When CANCER {_} Then True
      Else False
...

...
   Let
     project, sample,
     reference = OnlyIf
       # We remove the SEQUENCING_ONLY case and pass the other values to reference_for_analysis
       Match analysis_for_project(project)
         When SEQUENCING_ONLY Then ``
         Remainder (a) `reference_for_analysis(a)`
...

Two special pieces of syntax are allowed in When:

For details on algebraic values, see Algebraic Values without Algebra.

Rearrange the values in a tuple in ascending order. The types of the values must be homogeneous and orderable. For example, Order {"b", "c", "a"} results in {"a", "b", "c"}, while Order {"a", 1} is an error.

This can be useful to ensure ranges provided by user data are in order:

Begin
 {min, max} = Order {min_from_user, max_from_user};
 Return max - min;
End

JSON Conversion

Convert a value to or from JSON. If type is json, then the result from expr will be converted to JSON in the Shesmu-standard way. If type is any other type, then expr must be a json value and it will be converted from JSON to the matching Shesmu type. Since the conversion from JSON to Shesmu cannot be guaranteed, it will return an optional of type. To create a JSON null value, use ` ` As json.

Blocks

Creates the local variable name0 by evaluating expr0. This variable is then accessible in expr1 and so on. Finally, expr is evaluated with all the defined names and its result is used. The names can use destructuring.

Tabulation

Performs an order-sensitive matched assignment. This language construct is useful for generating data for the Vidarr retry types. Suppose a Vidarr job has two fields, memory and time, and the job should be run with an increasing amount of memory and time if it fails. This could be accomplished as follows:

Run vidarr::production::hpc::some_job With
  arguments =
     Begin
        {; memory, timeout} = Tabulate
             memory = 5Gi, 10Gi;
             timeout = 1hours, 6hours;
            End;
        Return {
            foo__memory = memory,
            foo__files = files,
            foo__timeout = timeout,
            foo__modules = "foo/1.2.3"
        };
     End,
   ...

This construct checks that all the names have the same number of values (two in this case) and turns each of them into an ordered dictionary with the keys as strings containing numbers (the format expected by Vidarr).
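
To make the resulting shape concrete, here is a Python sketch of the check and conversion described above. The text does not say whether the numeric string keys start at 0 or 1, so the starting key is a parameter here and its default of 0 is an assumption; the function name is hypothetical.

```python
def tabulate(first_key=0, **columns):
    """Turn equal-length value lists into dictionaries keyed by numeric strings."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all names must have the same number of values")
    return {
        name: {str(first_key + i): value for i, value in enumerate(values)}
        for name, values in columns.items()
    }


GI = 1024 ** 3   # the Gi suffix
HOURS = 3600     # the hours suffix
table = tabulate(memory=[5 * GI, 10 * GI], timeout=[1 * HOURS, 6 * HOURS])
print(table["memory"])
print(table["timeout"])
```

Python dictionaries preserve insertion order, which stands in for the ordered dictionary mentioned above.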

The names can be destructuring:

Run vidarr::production::hpc::some_job With
  arguments =
     Begin
        {; foo_memory, foo_timeout; bar_timeout} = Tabulate
             {foo_memory, foo_timeout} = {5Gi, 1hours}, {10Gi, 6hours};
             bar_timeout = (30mins, 90mins);
            End;
        Return {
            foo__memory = foo_memory,
            foo__files = files,
            foo__timeout = foo_timeout,
            foo__modules = "foo/1.2.3",
            foo__bar__timeout = bar_timeout
        };
     End,
   ...

Optional Coalescence

Computes an optional value using expr; if this value is empty, returns default. default must have the non-optional version of the type of expr.

For details on optional values, see the Mandatory Guide to Optional Values.

Logical Disjunction and Optional Merging

Logical short-circuiting or. If operands are boolean, the result is boolean.

If both operands are optionals of the same type, the result is the first optional if it has a value; otherwise, it is the second.
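
The optional form of this operator can be contrasted with optional coalescence in a short Python sketch. None models the empty optional, and both function names are hypothetical.

```python
def merge_optionals(first, second):
    """Optional merging: the first optional if it has a value, else the second.

    The result is still an optional (it may be None).
    """
    return first if first is not None else second


def coalesce(optional, default):
    """Optional coalescence: unwrap the optional, falling back to a plain default."""
    return optional if optional is not None else default


print(merge_optionals(None, 3))
print(merge_optionals(None, None))
print(coalesce(None, 0))
```

The difference is in the result type: merging two optionals can still be empty, while coalescing against a default always yields a value.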

Logical Conjunction

Logical short-circuiting and. Both operands must be boolean and the result is boolean.

Comparison

Equality

Compare two values for equality. This is supported for all types. For tuples, the values in the tuples must be the same. For lists, the items must be the same, but the order is not considered.

Inequality

Compare two values for inequality. This is the logical complement to ==.

Ordering

Compare two values for order. This is only defined for integers and dates. For dates, the lesser value occurs temporally earlier.

Regular Expression

Check whether expr, which must be a string, matches the provided regular expression.

flags sets the behaviour of the regular expression. For details, see regular expression flags.

Check whether expr, which must be a string, does not match the provided regular expression.

flags sets the behaviour of the regular expression. For details, see regular expression flags.

Matches expr, which must be a string, against the provided regular expression and returns a tuple of the values of each capture group. Since individual capture groups may be missing, this returns an optional tuple of optional strings. If the outer optional is the missing value, then the regular expression failed to match. If any element of the tuple is the missing value, then that capture group did not match.

flags sets the behaviour of the regular expression. For details, see regular expression flags.
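
The optional-tuple-of-optional-strings result can be illustrated with Python's re module. This models the shape of the result only: None stands in for a missing value, whether Shesmu anchors the match to the whole string is not stated above (this sketch uses a full match), and the function name is hypothetical.

```python
import re


def capture_groups(pattern, text):
    """Return a tuple of capture groups, or None if the pattern does not match.

    Groups that did not participate in the match are None.
    """
    match = re.fullmatch(pattern, text)
    if match is None:
        return None
    return match.groups()


pattern = r"(\d+)(?:_(\w+))?"
print(capture_groups(pattern, "123_abc"))
print(capture_groups(pattern, "123"))
print(capture_groups(pattern, "abc"))
```

The second call shows a successful match with a missing inner group; the third shows the outer optional being empty.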

Disjunction

Addition

Adds two values.

Left Right Result Description
integer integer integer Summation
float integer float Summation
integer float float Summation
float float float Summation
date integer date Add seconds to date
path path path Resolve paths (concatenate, unless second path starts with /)
path string path Append component to path
string string string Concatenate two strings
string integer string Concatenate a string and an integer value by first converting it
string float string Concatenate a string and a floating-point value by first converting it
string date string Concatenate a string and a date value by first converting it
string path string Concatenate a string and a path value by first converting it
[x] [x] [x] Union of two lists (removing duplicates)
[x] x [x] Add item to list (removing duplicates)
{a1, …, an} {b1, …, bn} {a1, …, an, b1, …, bn} Concatenate two tuples
{fa1=a1, …, fan=an} {fb1=b1, …, fbn=bn} {fa1=a1, …, fan=an, fb1=b1, …, fbn=bn} Merge two objects (with no duplicate fields)
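
Two rows of the table above are less obvious than the numeric ones: path resolution and list union. The following Python sketch models both; it relies on PurePosixPath happening to have the same resolution rule and is not a statement about how Shesmu implements it. The function names are hypothetical.

```python
from pathlib import PurePosixPath


def add_paths(left, right):
    """Concatenate two paths, unless the second is absolute (starts with /)."""
    return PurePosixPath(left) / PurePosixPath(right)


def add_lists(left, right):
    """Union of two lists with duplicates removed (the order here is incidental)."""
    result = list(left)
    for item in right:
        if item not in result:
            result.append(item)
    return result


print(add_paths("/srv/data", "run1/out.txt"))
print(add_paths("/srv/data", "/tmp/out.txt"))
print(add_lists([1, 2], [2, 3]))
```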

Subtraction

Subtracts two values.

Left Right Result Description
integer integer integer Difference
float integer float Difference
integer float float Difference
float float float Difference
date integer date Subtract seconds from date
date date integer Difference in seconds
[x] [x] [x] Difference of two lists (first list without items from second)
[x] x [x] Remove item from list (if present)

Conjunction

Multiplication

Multiplies two values.

Left Right Result Description
integer integer integer Multiplication
float integer float Multiplication
integer float float Multiplication
float float float Multiplication
string integer string Repeats string a specified number of times

Division

Divides two values.

Left Right Result Description
integer integer integer Division
float integer float Division
integer float float Division
float float float Division

Modulus

Computes the remainder of dividing the first value by the second, both of which must be integers.

Suffix Operators

List Membership

Determines if the expression needle is present in the haystack and returns the result as a boolean. needle may be any type, but haystack must be either a list of the same type or a dictionary with keys of the same type.

Optional Use

This must be used inside optional creation. Evaluates expr, which must have an optional type, and provides the inner (non-optional) value inside. If the expression has an empty optional, the entire optional creation will be the empty optional.

These may be nested for function calls on optional values. For example:

x = `foo(x?)? + 3`

For details on optional values, see the Mandatory Guide to Optional Values.

Unary Operators

Boolean Not

Compute the logical complement of the expression, which must be a boolean.

Integer Negation

Computes the arithmetic additive inverse of the expression, which must be an integer.

List Size

Counts the number of elements in expr, which must be a list.

WDL Pair Conversion

WDL has a pair type, Pair[X, Y], which can be represented in Shesmu in two ways: as a tuple, {X, Y}; or as an object, {left = X, right = Y}. The tuple form better matches how pairs are written in WDL, while the object better matches how pairs are encoded as JSON. This function converts between the two representations.

Optional Creation

Puts the value of expr in an optional. In expr, the ? suffix may be used to apply changes to the entire optional.

For example:

Begin
  x = `3`;
  Return `x? * 2`;
End

In this example, the x? will get the value inside the variable x, which may be missing. If it is missing, the block will return an empty optional; otherwise it will return an optional containing the original value multiplied by 2.
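
The interaction between the backticks and the ? suffix can be modelled in Python with an exception that aborts the enclosing computation. This is a sketch of the semantics only; None stands in for the empty optional and all names are hypothetical.

```python
class EmptyOptional(Exception):
    """Raised when the ? suffix is applied to an empty optional."""


def use(optional):
    """Models `expr?`: unwrap the value or abort the enclosing optional creation."""
    if optional is None:
        raise EmptyOptional()
    return optional


def optional_of(compute):
    """Models the backticks: wrap the result, or give an empty optional on abort."""
    try:
        return compute()
    except EmptyOptional:
        return None


def double(x):
    # Corresponds to `x? * 2` in the example above.
    return optional_of(lambda: use(x) * 2)


print(double(3))
print(double(None))
```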

Creates an optional that contains no value.

For details on optional values, see the Mandatory Guide to Optional Values.

Access Operators

Tuple and Dictionary Access

Extracts an element from a tuple (or integer-indexed map). n is an integer that specifies the zero-based index of the item in the tuple. The result type will be based on the type of that position in the tuple. If n is beyond the number of items in the tuple, an error occurs.

The expr can also be an optional of a tuple. If it is, the result will be an optional of the appropriate type.

Extracts the value from a dictionary. The resulting value will always be optional in case the key specified by indexexpr is missing.

The expr can also be an optional of a dictionary.

Named Tuple Access

Extracts a field from a named tuple or JSON object. field is the name of the field. The result type will be based on the type of that field in the named tuple or a JSON blob when accessing a JSON blob. If field is not in the named tuple, an error occurs. If field is not in the JSON blob (or applied to a scalar or array), the result is a JSON null value.

The expr can also be an optional of a named tuple or JSON object. If it is, the result will be an optional of the appropriate type.

Swizzled Named Tuple Access

Extracts multiple fields from a named tuple and constructs a tuple with the results. The result type will be based on the type of that field in the named tuple. If field is not in the named tuple, an error occurs.

The expr can also be an optional of a named tuple. If it is, the result will be an optional of the appropriate type.

Terminals

Action Name Literal

Get the name of the action being executed as a string?. In the case of Refill and Alert olives, this will be the missing optional value.

Algebraic Values

Shesmu supports creating algebraic values. The name of an algebraic type is a combination of uppercase letters, digits, and underscores. It must start with an uppercase letter and be at least two characters long. Algebraic values come in three types: ones which contain no information (and work something like an enum in other languages), ones which are associated with a sequence of values, much like a tuple, and ones which contain named fields, much like a named tuple/object. It is also possible to use a gang to create a tuple-like algebraic value.

For details on algebraic values, see Algebraic Values without Algebra.

Date Literal

Specifies a date and time. If the time is not specified, it is assumed to be midnight UTC.

Tuple Literal

Creates a new tuple with the elements as specified. The type of the tuple is determined based on the elements.

Instead of an expression to create a single element in a tuple, a ...expr can be used to insert all the elements in a tuple inline into the new tuple.

Named Tuple Literal

Creates a new named tuple with the fields as specified. The type of the named tuple is determined based on the elements.

Instead of field = expr, a ...expr can be used and this will copy all the elements in expr, which must be an object. If some fields are to be excluded, use the form: ...expr Without field1 field2

A field can also be created from a variable of the same name by placing the name after a ;. For example, { a = 1; b } is shorthand for { a = 1, b = b }. The named fields can be omitted if there are none (i.e., {; b, c} is shorthand for {b = b, c = c}). If a gang is used here, all the members of the gang will be created as fields.
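The copy-and-exclude behaviour described above can be modelled outside Shesmu. The following is a minimal Python sketch, not Shesmu code: build_object is a hypothetical helper, plain dicts stand in for named tuples, and how Shesmu treats a field that appears both in the spread and explicitly is not covered by the text, so the example avoids that case.

```python
def build_object(spread, without=(), **fields):
    """Model `{ ...spread Without f1 f2, field = expr }` using plain dicts.

    `spread` stands in for the object being copied, `without` for the
    excluded field names, and `fields` for the explicit `field = expr` pairs.
    """
    result = {name: value for name, value in spread.items() if name not in without}
    result.update(fields)
    return result


source = {"a": 1, "b": 2, "c": 3}

# Roughly `{ ...source Without b, d = 4 }`.
trimmed = build_object(source, without=("b",), d=4)

# Roughly `{ a = 1; b }`: the field b takes its value from the variable b.
b = 7
shorthand = {"a": 1, "b": b}
```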

Synthetic Tuple

Creates a new tuple with the elements as specified in the gang name.

List Literal

Creates a new list from the specified elements. All the expressions must be of the same type.

Dictionary Literal

Creates a new dictionary from the specified elements. All keys must be the same type and all values must be the same type. If duplicate keys are present, one will be selected arbitrarily.

Instead of keyexpr = valueexpr, a ...expr can be used and this will copy all the elements in expr, which must be a dictionary. If some entries are to be excluded or transformed, use a For ... Dict to preprocess the dictionary.

Path Literals

Paths are UNIX-like paths that can be manipulated. They may contain \' if necessary.

String Literal

Specifies a new string literal. A string may contain the following special items in addition to text:

Sub-expression

A subexpression.

Integer Literal

An integer literal. An integer may be suffixed by one of the following multipliers:

Unit Multiplier
G 1000^3
Gi 1024^3
M 1000^2
Mi 1024^2
k 1000
ki 1024
mins 60
hours 3600
days 86400
weeks 604800
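To make the arithmetic concrete, here is a small Python sketch that expands a suffixed literal using the multipliers in the table above. It is an illustration only: the expand function is hypothetical, and it assumes the suffix is written directly after the digits (for example 5Gi).

```python
import re

# Multipliers copied from the table above.
MULTIPLIERS = {
    "G": 1000**3,
    "Gi": 1024**3,
    "M": 1000**2,
    "Mi": 1024**2,
    "k": 1000,
    "ki": 1024,
    "mins": 60,
    "hours": 3600,
    "days": 86400,
    "weeks": 604800,
}


def expand(literal):
    """Return the integer value of a literal such as '5Gi' or '3hours'."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", literal)
    if match is None:
        raise ValueError(f"not an integer literal: {literal!r}")
    digits, suffix = match.groups()
    if suffix and suffix not in MULTIPLIERS:
        raise ValueError(f"unknown suffix: {suffix!r}")
    return int(digits) * MULTIPLIERS.get(suffix, 1)
```

Under these assumptions, expand("3hours") gives 10800, the number of seconds in three hours, which is why the time suffixes are convenient for settings measured in seconds.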

Boolean Literals

The boolean true and false values, respectively.

Source Location String

This creates a string containing the script's source path, line, column, and hash. This is meant to help locate the originating olive in alerts and other output.

Function Call

Call a function. Functions are provided by external services to Shesmu and some are provided as tables of values.

Variables

The value of a variable. There are different kinds of variables in Shesmu:

Only stream variables may be used as discriminators in Group clauses.

List Modifiers

Distinct

Discards any duplicate items in the list.

Map

Replaces each item in the list with the value computed by expr. The values will be named x in the downstream operations.

Flatten

Performs nested iteration in the same way as For. The variable name available in the downstream operations is x. Additional list modifiers can also be applied inside the brackets, and those operations can also see the outer variable.

Filter

Eliminates any item in the list where expr evaluates to false.

Limit

Truncates the list after the number of items specified by expr, which must return an integer. The list must already be sorted.

Skip

Discards the number of items specified by expr, which must return an integer, from the beginning of the list. The list must already be sorted.

Sort

Sorts the items in a list based on an integer or date returned by expr.

Reverse

Reverses the items in a list. The list must already be sorted.

Subsample

Perform sampling on items in a list based on the given subsamplers (the order matters). The list must already be sorted. For example: Subsample(Fixed 1, Squish 5) will first select the first item and then randomly select five more items in the rest of the list.

Subsamplers

Fixed

Select the first integer items in a sorted list.

FixedWithCondition

Select the first integer items in a sorted list while condition evaluates to true.

Squish

Randomly select integer items from a sorted list.
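The way subsamplers chain can be sketched in Python. This is a model under stated assumptions, not Shesmu's implementation: each subsampler is assumed to consume items from whatever the previous one left behind, Squish is assumed to keep the sorted order of the items it picks, and FixedWithCondition is assumed to stop at the first item that fails the condition. All function names are hypothetical.

```python
import random


def fixed(n):
    """Take the first n remaining items."""
    def take(items, rng):
        return items[:n], items[n:]
    return take


def fixed_with_condition(n, condition):
    """Take up to n leading items, stopping at the first that fails condition."""
    def take(items, rng):
        chosen = []
        for item in items[:n]:
            if not condition(item):
                break
            chosen.append(item)
        return chosen, items[len(chosen):]
    return take


def squish(n):
    """Randomly pick n of the remaining items (order kept: an assumption)."""
    def take(items, rng):
        picked = set(rng.sample(range(len(items)), min(n, len(items))))
        chosen = [item for i, item in enumerate(items) if i in picked]
        rest = [item for i, item in enumerate(items) if i not in picked]
        return chosen, rest
    return take


def subsample(items, samplers, seed=None):
    """Apply each subsampler in order to the items the previous one left."""
    rng = random.Random(seed)
    remaining = list(items)
    output = []
    for sampler in samplers:
        chosen, remaining = sampler(remaining, rng)
        output.extend(chosen)
    return output


# Models Subsample(Fixed 1, Squish 5) over an already sorted list.
sample = subsample(list(range(20)), [fixed(1), squish(5)], seed=0)
```

Here sample always starts with the first item, followed by five distinct items drawn from the other nineteen.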

Collectors

Count

Returns the number of items in the list.

First Item

Returns the first expr in the list or an empty optional if no items are present.

Since this returns optional, it may be useful to chain with Default.

For details on optional values, see the Mandatory Guide to Optional Values.

Concatenate Strings

Creates a string from expr, which must return a string, for each item in the list separated by the value of delimexpr, which must also be a string.

List

Evaluates expr for every item and collects all the unique results into a list.

Dictionary

Evaluates keyexpr and valueexpr for every item and collects all the results into a dictionary. Duplicate keys are resolved arbitrarily.

Optima

Finds the minimum or maximum item in a list, based on the sortexpr, which must be an integer or date. If the list is empty, an empty optional is returned.

Since this returns optional, it may be useful to chain with Default.

For details on optional values, see the Mandatory Guide to Optional Values.

Item Matches

Checks whether none, all, or any (some) of the items in the list meet the condition specified in expr, which must return a Boolean.

Object Collector

Produces an object. Each field in the object is made by sending the same items through individual collectors. Consider something like:

For x In xs: Where x > 5 { count = Count, sum = Sum x }
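As a rough Python analogue of the expression above (a sketch only, with collect_object as a hypothetical helper), the filtered items are materialised once and then handed to every collector:

```python
def collect_object(items, **collectors):
    """Send the same items through each collector and return one dict."""
    materialised = list(items)
    return {name: collector(materialised) for name, collector in collectors.items()}


xs = [3, 6, 9, 12]

# Mirrors: For x In xs: Where x > 5 { count = Count, sum = Sum x }
result = collect_object((x for x in xs if x > 5), count=len, sum=sum)
```

With this input, result is {"count": 3, "sum": 27}: both fields are computed from the same three items that passed the filter.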

Partitioned Counter

Produces an object with two fields: matched_count is the number of items for which expr was true, and not_matched_count is the number of items for which expr was false.
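The result shape can be illustrated with a short Python sketch. The field names come from the description above; partition_count itself is a hypothetical helper and a dict stands in for the Shesmu object.

```python
def partition_count(items, predicate):
    """Count how many items satisfy predicate and how many do not."""
    items = list(items)
    matched = sum(1 for item in items if predicate(item))
    return {"matched_count": matched, "not_matched_count": len(items) - matched}


counts = partition_count([1, 8, 3, 10, 12], lambda x: x > 5)
```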

Reduce

Performs a reduction operation on all the items in the list. a is the accumulator, which is initially set to initialexpr. For every item, expr is evaluated with a set to the previously returned value, and the result becomes the new value of a. The final value of a is returned.
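This is the same fold that Python's functools.reduce performs, which makes for a compact illustration. The sketch below is an analogy rather than Shesmu code; shesmu_reduce is a hypothetical wrapper, and step plays the role of expr with a as its first argument.

```python
from functools import reduce


def shesmu_reduce(items, initial, step):
    """Fold items into an accumulator, starting from initial."""
    return reduce(step, items, initial)


# Accumulate the total length of several strings; a starts at 0.
total_length = shesmu_reduce(["a", "bcd", "ef"], 0, lambda a, x: a + len(x))
```

In this model, an empty list simply returns the initial value.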

Sum

Evaluates expr for every item and computes the sum of all the results. expr must return an integer or a floating-point number.

Table

This collects items into a table and formats that table as a string. This can be useful for creating HTML or Markdown tables for inserting into JIRA. The name, which must evaluate to a string, will be the name of the column, and value, which must also produce a string, will be the contents of that column for every item. The format determines how the text is laid out. It is an object with the following properties:

For a few common formats, this object would be defined as:

html = {
  data_start = "<tr><td>",
  data_separator = "</td><td>",
  data_end = "</td></tr>",
  header_start = "<tr><th>",
  header_separator = "</th><th>",
  header_end = "</th></tr>",
  header_underline = ``
}

markdown = {
  data_start = "|",
  data_separator = "|",
  data_end = "|",
  header_start = "|",
  header_separator = "|",
  header_end = "|",
  header_underline = `"|---"`
}

jira = {
  data_start = "|",
  data_separator = "|",
  data_end = "|",
  header_start = "||",
  header_separator = "||",
  header_end = "||",
  header_underline = ``
}
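To show how such a format object might drive the layout, here is a Python sketch of a table renderer. Several details are assumptions rather than documented behaviour: rows are joined with newlines, None stands in for an empty header_underline, and a present header_underline is repeated once per column and closed with data_end. The render_table function and the sample data are hypothetical.

```python
def render_table(rows, columns, fmt):
    """Render rows as text.

    columns is a list of (name, value_function) pairs, where each
    value_function maps a row to the string shown in that column.
    """
    names = [name for name, _ in columns]
    lines = [fmt["header_start"] + fmt["header_separator"].join(names) + fmt["header_end"]]
    if fmt["header_underline"] is not None:
        # Assumed layout: one underline segment per column.
        lines.append(fmt["header_underline"] * len(columns) + fmt["data_end"])
    for row in rows:
        cells = [value(row) for _, value in columns]
        lines.append(fmt["data_start"] + fmt["data_separator"].join(cells) + fmt["data_end"])
    return "\n".join(lines)


markdown = {
    "data_start": "|",
    "data_separator": "|",
    "data_end": "|",
    "header_start": "|",
    "header_separator": "|",
    "header_end": "|",
    "header_underline": "|---",
}

files = [("a.txt", 10), ("b.txt", 20)]
table = render_table(
    files,
    [("name", lambda f: f[0]), ("size", lambda f: str(f[1]))],
    markdown,
)
```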

Univalued

Evaluates expr for each item in the list and returns the result if it is the same for every item.

If they are different or there are no items, an empty optional is returned.

Since this returns optional, it may be useful to chain with Default.

For details on optional values, see the Mandatory Guide to Optional Values.
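The Univalued rule can be sketched in Python, with None standing in for the empty optional. This is an illustration of the behaviour described above, not Shesmu's implementation; univalued is a hypothetical helper and the computed values are assumed to be hashable.

```python
def univalued(items, expr):
    """Return the single shared value of expr, or None if there is not exactly one."""
    values = {expr(item) for item in items}
    if len(values) == 1:
        return next(iter(values))
    return None


same = univalued(["run_1", "run_1"], str.upper)
mixed = univalued(["run_1", "run_2"], str.upper)
empty = univalued([], str.upper)
```

Because both the mixed and empty cases give None here, a caller that needs a definite value would supply a fallback, which is the role Default plays in Shesmu.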

Types

There are a small number of types in the language, listed below. Each has syntax as it appears in the language and a descriptor that is used for machine-to-machine communication.

Name Syntax Descriptor
Integer integer i
Float float f
String string s
Boolean boolean b
Date date d
List [inner] ainner
Empty List [] A
Tuple {t1,t2,} t n t1 t2 Where n is the number of elements in the tuple.
Object {field1 = t1,field2 = t2,} o n field1$t1 field2$t2 Where n is the number of fields in the object.
Optional inner? qinner or Q
Path path p
JSON json j
Algebraic NAME u1NAME$t01
Algebraic NAME {t1, t2, } u1NAME$tn t1 t2 Where n is the number of elements in the tuple.
Algebraic NAME {field1 = t1,field2 = t2,} u1NAME$o n field1$t1 field2$t2 Where n is the number of fields in the object.

All the variables are already available as variable_type.

For details on optional values, see the Mandatory Guide to Optional Values. For details on algebraic values, see Algebraic Values without Algebra.

Provides the type of an argument to a function. The number is the zero-based index of the argument.

Provides the inner type of a list or optional.

Provides the type of variable from the input format format. Variables from the current input format selected with Input are also available as variable_type.

Provides the return type of function.

Provides the type of an element in a tuple.

Provides the type of a field in an object.

Descriptors are a machine-friendly form Shesmu uses to communicate type information between systems. Most of this does not involve human interaction, but some plugin configuration files require type information in descriptor form. For JSON configuration files, there is a JSON-enhanced descriptor. Any string is treated as a normal descriptor, but composite types can be expanded to a more readable form:

{ "is": "optional", "inner": X } // X?
{ "is": "list", "inner": X } // [X]
{ "is": "dictionary", "key": K, "value": V } //  K -> V
{ "is": "object", "fields": { "f1": F1, "f2": F2 } } // { f1 = F1, f2 = F2 }
[ E1, E2 ] // {E1, E2}

Mixing the two representations is fine (e.g., ["qb", "s"] is equivalent to [{"is": "optional", "inner": "b"}, "s"] or t2qbs).
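The equivalence between the two forms can be checked with a small Python sketch that collapses a JSON-enhanced descriptor into its compact string. It only covers optionals, lists, and tuples, using the q, a, and t n prefixes from the type table; objects and dictionaries are deliberately left out because their exact compact encoding is not spelled out here. The to_descriptor function is hypothetical.

```python
def to_descriptor(value):
    """Collapse a JSON-enhanced descriptor into a compact descriptor string."""
    if isinstance(value, str):
        # A plain string is already a normal descriptor.
        return value
    if isinstance(value, list):
        # A JSON array is a tuple: t, the element count, then each element.
        return "t" + str(len(value)) + "".join(to_descriptor(e) for e in value)
    if isinstance(value, dict):
        kind = value.get("is")
        if kind == "optional":
            return "q" + to_descriptor(value["inner"])
        if kind == "list":
            return "a" + to_descriptor(value["inner"])
    raise ValueError(f"unsupported descriptor in this sketch: {value!r}")


compact = to_descriptor([{"is": "optional", "inner": "b"}, "s"])
```

Under these assumptions, compact is t2qbs, matching the mixed form ["qb", "s"].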

Regular Expression Flags

Regular expressions can have modified behaviour. Any combination of the following flags can be used after a regular expression: