The Shesmu Demo Server

Since Shesmu is heavily dependent on its configuration, a blank Shesmu server is really pointless. To explore the operation of a Shesmu server, this demo server configuration contains real olives from OICR GSI’s production environment.

Getting it Running

To launch the demo server, run a Shesmu server with this directory as the configuration path. For the Docker instance:

docker run -p 8081:8081 \
  --mount type=bind,source=/this/repo/shesmu/demo,target=/srv/shesmu \
  oicrgsi/shesmu:latest

Or using the build directory:

mvn package -DskipTests=true && SHESMU_DATA=demo java \
  -cp shesmu-server/target/shesmu.jar:shesmu-pluginapi/target/shesmu-pluginapi.jar:$(ls plugin-*/target/shesmu-plugin-*.jar|tr '\n' :) \
  ca.on.oicr.gsi.shesmu.Server

The server will then be available at http://localhost:8081/
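Because the server can take a while to settle after startup, it can be handy to wait for it from a script before opening the browser. The following is only a sketch: `wait_for_http` is a hypothetical helper, not part of Shesmu, and it assumes that `curl` is installed and that the root path answers with a success status once the server is up. The retry count and delay are arbitrary defaults.

```shell
# Hypothetical helper (not part of Shesmu): poll a URL until it answers
# successfully or the attempts run out. Assumes curl is on the PATH.
wait_for_http() {
  url=$1
  attempts=${2:-30}   # how many times to try (illustrative default)
  delay=${3:-2}       # seconds to sleep between tries (illustrative default)
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP error responses
    if curl --silent --fail --output /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up: $url" >&2
  return 1
}
```

For the demo server started as above, this might be called as `wait_for_http http://localhost:8081/ 60 5`.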

Understanding OICR’s Data and Pipeline

OICR GSI starts data analysis at the end of sequencing. Our LIMS, MISO, tracks sample preparation through to sequencing and monitors sequencing instruments. It exports this information via Pinery, a common LIMS interface we developed.

For this demo, we have included a subset of data from Pinery in demo.pinery_ius-local, which contains information about each barcode in an Illumina sequencing instrument, plus a record to represent the lane itself with the special barcode NoIndex. We call each of these an individual unit of sequencing (IUS), and you will see ius in various places.

This data will be ingested by vidarr-bcl2fastq3.shesmu, which will run BCL2FASTQ for each sample IUS. We frequently run mixed-length barcodes, so we have to do a lot of complicated grouping and processing to handle mismatches between barcodes and base masks. Additionally, MISO only provides some information about Illumina’s flowcell architecture, so Shesmu is responsible for determining which lanes are loaded by a single port.

If there are no reads for a particular IUS, we send a JIRA ticket to the lab to determine the source of the problem. This olive is in ticket-missingreads.shesmu.

Once a workflow runs, its output is captured by our file provenance system and those files are associated with the original LIMS metadata. That interface is called Cerberus, and demo data is provided in demo.cerberus_fp-local. The remaining workflows each ingest files that are the output of another workflow, starting with BCL2FASTQ. At this point, we start making customer-specific decisions about how the data is processed. project_info.jsonconfig describes the customer configuration that our systems need. We also need to make decisions based on the lab work, and kit_info.jsonconfig holds information about the different processing kits our lab uses and how to use them.

For demonstration, we have included vidarr-bwamem.shesmu, which does alignment on the FASTQs generated by BCL2FASTQ, and vidarr-bmpp.shesmu, which collects related BAMs from the same donor for merging and performs co-cleaning on a per-project basis.

We have not provided the workflows for any of these, merely the minimal information Shesmu needs as a .fakeaction file. The full workflows are available from OICR GSI, but we have not included our workflow engine server and its configuration.

Exploring the Demo

When Shesmu first starts up, you might see errors that the scripts cannot be compiled. This is because plugins must first discover and present the actions and functions used by the olives. The olive compiler will keep retrying to compile the scripts, so the server will settle after a few minutes.

On the Olives page, the installed olives appear in the menu. The dashboard shows a summary of the number of actions and alerts associated with each olive. There is a view for each file and for each olive in that file. For individual olives, there is a data flow diagram. After Shesmu loads an olive, it schedules it for execution. Before the olive has run, the status will note that it hasn’t yet run and the data flow diagram will be blank on the left-hand side. Once it has finished, the actions or alerts associated with the olive are displayed, along with numbers indicating how many records are associated with each clause in the olive.

There is an olive simulator available from the Tools menu. On the Olives dashboard, it is possible to preload the simulator with an existing olive script using the Edit in Simulator link.

On the Actions page, a view similar to the actions on the Olives page is shown, but the output of all olives is mixed together. It is possible to filter the displayed actions using the Add Filter button. Since the demo uses fake actions, all actions will be in a ZOMBIE state, but multiple states would be present in a server with real actions.

The actions display, on both the Olives and Actions pages, offers a variety of filters that can limit the actions shown. The current filter is saved in the URL, so the URL can be shared easily. On the Olives page, it is also possible to save the current filter with a name. The search interface is limited in what it can display, but a more sophisticated text filter search is available by clicking Advanced. The overview also provides a way to filter the actions. Clicking on table headers or cells will filter to only include actions that match the label of the rows and/or columns. There may also be histograms, in which case clicking and dragging a time range will filter for that range.