How simple-action-pipeline works
simple-action-pipeline (sap) provides a generic command-line tool for creating configuration-based, reproducible pipelines.
Behind the scenes, sap uses some features of the workflow manager Jug to both create and manage the pipeline.
Firstly, sap treats a single directory as a pipeline, containing all of the required configuration and code for the pipeline. This is referred to as the <pipeline-directory> or <target-directory>.
Secondly, sap builds the pipeline from pipeline.yaml, application.yaml and a scripts subdirectory. Building the pipeline is done by invoking the 'build' action. Once built, there are a number of other actions available, as detailed in the Usage section.
Minimum configuration
As a bare minimum, pipeline.yaml and application.yaml must contain the following:
pipeline.yaml
pipeline:
  name: pipeline_name
  description: pipeline configuration file
env:
  description: environment variables for the pipeline
  create-env-file: true
  env-filename: pipeline.env
  variables:
    - PIPELINE_DIRECTORY: "./"
    - PIPELINE_MAXWORKERS: 1
application.yaml
application:
  name: application_name
  description: application configuration file
env:
  description: environment variables for the application
  create-env-file: true
  env-filename: application.env
  variables:
    - PIPELINE_DIRECTORY: "./"
    - SCRIPTS_DIRECTORY: "./scripts"
sequence:
  description: sequence of actions for the application
  sequence:
    - script:
        name: first_script.sh
        depends: ''
As you can see, this pipeline runs just a single script, which depends on no other scripts. The maximum number of workers is set to 1 (this is how many scripts can run in parallel at any one time).
PIPELINE_DIRECTORY must be defined in both yaml files, as both are checked during the build process. PIPELINE_MAXWORKERS (in pipeline.yaml) and SCRIPTS_DIRECTORY (in application.yaml) must also be defined. In addition, any scripts listed under the 'sequence' section must exist in the scripts directory for the build to succeed.
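The checks described above can be sketched in Python. This is only an illustration of the rules, not sap's actual build code: the function name `check_minimum_config` is hypothetical, the variables are passed in as the lists found under env:variables:, and resolving SCRIPTS_DIRECTORY relative to the pipeline directory is an assumption.

```python
from pathlib import Path


def check_minimum_config(pipeline_vars, application_vars, script_names, pipeline_dir):
    """Return a list of problems; an empty list means every check passed.

    pipeline_vars / application_vars are the lists of single-key mappings
    found under env:variables: in each yaml file (hypothetical interface).
    """
    problems = []
    # Collapse [{"A": 1}, {"B": 2}] into {"A": 1, "B": 2}.
    p_vars = {k: v for entry in pipeline_vars for k, v in entry.items()}
    a_vars = {k: v for entry in application_vars for k, v in entry.items()}

    required = (
        ("pipeline.yaml", p_vars, ("PIPELINE_DIRECTORY", "PIPELINE_MAXWORKERS")),
        ("application.yaml", a_vars, ("PIPELINE_DIRECTORY", "SCRIPTS_DIRECTORY")),
    )
    for label, found, names in required:
        for name in names:
            if name not in found:
                problems.append(f"{label}: missing variable {name}")

    # Assumption: SCRIPTS_DIRECTORY is relative to the pipeline directory.
    scripts_dir = Path(pipeline_dir) / a_vars.get("SCRIPTS_DIRECTORY", "./scripts")
    for script in script_names:
        if not (scripts_dir / script).is_file():
            problems.append(f"missing script: {script}")
    return problems
```

Returning a list of problems rather than raising on the first one lets every missing variable and script be reported in a single pass.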
The workflow manager
The pipeline 'build' command creates (or re-creates) a Python script from application.yaml, pipeline.yaml and the scripts directory. This script is used by the Jug parallelisation package. Inspecting the auto-generated Python script shows how all the dependencies are set up.
sap invokes Jug with this Python script for each WORKER up to PIPELINE_MAXWORKERS, creating one or more parallel processes that can complete multiple tasks whilst being monitored. This collection of Python script, Jug and WORKERS is referred to as the 'workflow-manager'.
Everything related to the workflow manager's operation is contained within the <pipeline-directory>/workflow-manager/ directory, which is created by the 'build' command.
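As a rough sketch of what "one Jug invocation per worker" could look like, the snippet below builds one `jug execute` command per worker and starts them as parallel processes. `jug execute <script>` is Jug's standard way to run a task script, and several such processes can share the work because Jug coordinates them through its on-disk store. How sap itself launches Jug is not shown in this document, so the function names and the script filename `workflow.py` are hypothetical.

```python
import subprocess
from pathlib import Path


def worker_commands(pipeline_dir, max_workers, script_name="workflow.py"):
    """Build one `jug execute` command per worker.

    script_name is a placeholder; the real name of the generated
    script inside workflow-manager/ may differ.
    """
    script = Path(pipeline_dir) / "workflow-manager" / script_name
    return [["jug", "execute", str(script)] for _ in range(max_workers)]


def start_workers(pipeline_dir, max_workers):
    """Start every worker as its own process and return the process handles."""
    return [
        subprocess.Popen(command)
        for command in worker_commands(pipeline_dir, max_workers)
    ]
```

Calling `start_workers("./", 2)` would start two Jug processes against the same script, provided Jug is installed and the pipeline has been built.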
Optimising PIPELINE_MAXWORKERS
The pipeline.yaml file contains the PIPELINE_MAXWORKERS definition. The workflow manager will attempt to allocate up to this many workers to the pipeline.
It can be important to think carefully about setting the maximum number of workers as described in the example below.
Example:
| You have 10 tasks that could all execute in parallel.
| You are using a platform that has 6 CPU threads.
* If you set MAXWORKERS to 2, the workflow manager will invoke 2 workers, meaning that 2 CPU threads can complete all 10 tasks twice as quickly as if there was only 1 worker (i.e. 1 task done at a time).
* If you set MAXWORKERS to 10, the workflow manager will invoke 10 workers, but because this is more than the 6 available CPU threads there will be a significant amount of CPU context switching to achieve the effect of 10 CPU threads running. This results in slower performance.
* If you set MAXWORKERS to 5, the workflow manager will invoke 5 workers, meaning that 5 CPU threads can complete all 10 tasks five times as quickly as if there was only 1 worker (i.e. 1 task done at a time). This also avoids CPU context switching and leaves 1 CPU thread free for the underlying platform.
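The arithmetic behind this example can be checked with a few lines of Python. This is a simplified model, not part of sap: it assumes every task takes the same time, and the helper names `rounds_needed` and `suggested_workers` are hypothetical.

```python
import math


def rounds_needed(tasks, workers):
    """Number of back-to-back rounds needed when each worker runs one task at a time."""
    return math.ceil(tasks / workers)


def suggested_workers(tasks, cpu_threads, reserve=1):
    """Pick a worker count that leaves `reserve` threads free for the platform."""
    return max(1, min(tasks, cpu_threads - reserve))


# 10 equal tasks: 1 worker needs 10 rounds, 2 workers need 5, 5 workers need 2.
for workers in (1, 2, 5):
    print(workers, "worker(s):", rounds_needed(10, workers), "round(s)")
```

Under this model, `suggested_workers(10, 6)` gives 5. Using all 6 threads would still take 2 rounds for 10 tasks, so keeping one thread free costs nothing here.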
Environment variables
If your pipeline relies upon constants held within environment variables, these can be pre-defined under the env:variables: section of either yaml config file. sap will make sure these environment variables are available whenever the pipeline runs.
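To illustrate how the variables lists might be turned into something a running script can use, the sketch below merges the lists of single-key mappings and renders them as `NAME=value` lines. The helper names are hypothetical, and the `NAME=value` layout is an assumption about what an env file such as pipeline.env contains, not a description of sap's actual output.

```python
def merge_variables(*variable_lists):
    """Merge lists of single-key mappings into one dict.

    Later lists win when the same variable name appears more than once.
    """
    merged = {}
    for variables in variable_lists:
        for entry in variables:
            merged.update(entry)
    return merged


def render_env_file(variables):
    """Render variables as NAME=value lines (assumed env-file layout)."""
    return "".join(f"{name}={value}\n" for name, value in variables.items())


pipeline_vars = [{"PIPELINE_DIRECTORY": "./"}, {"PIPELINE_MAXWORKERS": 1}]
application_vars = [{"PIPELINE_DIRECTORY": "./"}, {"SCRIPTS_DIRECTORY": "./scripts"}]
print(render_env_file(merge_variables(pipeline_vars, application_vars)), end="")
```

The same merged dict could also be applied to the current process with `os.environ.update(...)` after converting the values to strings.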
Task sequence
The sequence order and dependencies of the tasks (scripts) are defined under the sequence: section of the application.yaml config file.
Each task (script) in the sequence has a name: and a depends: field. The name is the name of the script to be found in the scripts directory. The depends field can be either a single script name or a list of script names if there are multiple dependencies. If a script has no dependency, the depends: field should contain an empty string ''.
Currently, shell scripts (.sh) and Python scripts (.py) are the only supported task (script) types.
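The rules for name: and depends: can be sketched as a small dependency resolver. This is not how sap or Jug order tasks internally; it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) purely to show how the three forms of depends: map onto a dependency graph. The function names and the second and third script names are made up for the example.

```python
from graphlib import TopologicalSorter

SUPPORTED_SUFFIXES = (".sh", ".py")


def normalise_depends(depends):
    """Turn '', a single name, or a list of names into a list of names."""
    if not depends:
        return []
    if isinstance(depends, str):
        return [depends]
    return list(depends)


def execution_order(sequence):
    """Return script names in an order that respects every depends: entry."""
    graph = {}
    for item in sequence:
        script = item["script"]
        name = script["name"]
        if not name.endswith(SUPPORTED_SUFFIXES):
            raise ValueError(f"unsupported script type: {name}")
        # Map each script to the scripts that must finish before it.
        graph[name] = normalise_depends(script.get("depends", ""))
    return list(TopologicalSorter(graph).static_order())


sequence = [
    {"script": {"name": "first_script.sh", "depends": ""}},
    {"script": {"name": "second_script.py", "depends": "first_script.sh"}},
    {"script": {"name": "third_script.sh",
                "depends": ["first_script.sh", "second_script.py"]}},
]
print(execution_order(sequence))
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, which is a convenient way to reject an invalid sequence early.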
Further detail
For more detail on the inner workings of Jug or sap, please refer to the documentation for:
- Jug
- simple-action-pipeline repo