Configuration
The pipeline reads its configuration from the ENVS file. To create one, copy ENVS.example to a new file and symbolically link that file to ENVS. Comments in ENVS.example explain each setting.
cp ENVS.example ENVS.myconfig
ln -sf ENVS.myconfig ENVS
# Edit ENVS.myconfig to customise parameters for the pipeline
Please inspect this file before running the pipeline for the first time. In particular, setting $FIRST_RUN to false on the first run might prevent the pipeline from running successfully.
ENVS Parameters
The ENVS file contains the parameters that control the pipeline. An example is provided in ENVS.example.
Project Setup
These parameters are provided to set up the project's folders and environment:
- PIPELINE_DIRECTORY: The directory you are running the pipeline in.
- DATA_DIR: The directory ERA5 monthly data should be downloaded to.
- OUTPUT_DIR: The directory ASLI calculation output should be written to.
- ASLI_VENV: The location of the project's virtual environment.
- CURRENT_DATE: Current date, no need to change.
- CURRENT_YEAR: Current year, no need to change.
Run Configuration
These parameters are provided to configure how the pipeline should run:
- EXPORT_ROCRATE: Whether the pipeline should generate an RO-Crate object. Should be one of true/false.
- NUM_CORES: Number of parallel jobs to run the pipeline with.
- FIRST_RUN: Is this the first run? Should be one of true/false. Setting this to true will prevent the pipeline from running verification checks against a file that does not exist yet.
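The effect of FIRST_RUN can be sketched as a small guard. This is a minimal illustration rather than the pipeline's actual code: it assumes ENVS is sourced by a POSIX shell, and `should_verify` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sketch: decide whether to run verification checks.
FIRST_RUN=true   # example value; normally set in ENVS

# Returns 0 when verification should run, 1 when it should be skipped,
# and 2 when FIRST_RUN is not one of true/false.
should_verify() {
    case "${FIRST_RUN:-}" in
        false) return 0 ;;
        true)  return 1 ;;
        *)
            echo "FIRST_RUN must be true or false, got '${FIRST_RUN:-}'" >&2
            return 2
            ;;
    esac
}

# Any non-zero status skips verification here; a real script would
# probably abort on status 2 instead of continuing.
if should_verify; then
    echo "Running verification checks against the existing output file"
else
    echo "Skipping verification checks"
fi
```

With FIRST_RUN=true there is no previous output file to compare against, so the guard skips verification.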
File Export
These parameters are provided to determine file export:
- FILE_DESTINATION: What type of file storage should the ASLI calculation be exported to? One of OBJECT_STORAGE, FILE_SYSTEM or BOTH.
- VALID_DESTINATIONS: A list of valid destinations (OBJECT_STORAGE, FILE_SYSTEM, BOTH). No need to change.
- S3_BUCKET: If FILE_DESTINATION is OBJECT_STORAGE or BOTH, the S3 bucket endpoint to export ASLI calculation results to.
- RSYNC_LOCATION: If FILE_DESTINATION is FILE_SYSTEM or BOTH, the file path to export ASLI calculation results to.
- FILE_IDENTIFIER: A unique identifier for the ASLI calculation output file.
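The relationship between FILE_DESTINATION, VALID_DESTINATIONS and the two export targets can be sketched as follows. This is an illustration under assumptions, not the pipeline's own logic: it assumes VALID_DESTINATIONS is a space-separated string, and `is_valid_destination` and `export_steps` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical sketch: validate FILE_DESTINATION and list the export steps.
# Assumes VALID_DESTINATIONS is a space-separated string.
VALID_DESTINATIONS="OBJECT_STORAGE FILE_SYSTEM BOTH"
FILE_DESTINATION="BOTH"   # example value; normally set in ENVS

# Returns 0 if $1 is one of the words in VALID_DESTINATIONS.
is_valid_destination() {
    for candidate in $VALID_DESTINATIONS; do
        [ "$1" = "$candidate" ] && return 0
    done
    return 1
}

# Prints the export steps implied by the destination in $1, one per line.
export_steps() {
    case "$1" in
        OBJECT_STORAGE) echo "s3" ;;
        FILE_SYSTEM)    echo "rsync" ;;
        BOTH)           echo "s3"; echo "rsync" ;;
    esac
}

if is_valid_destination "$FILE_DESTINATION"; then
    export_steps "$FILE_DESTINATION"
else
    echo "Invalid FILE_DESTINATION: $FILE_DESTINATION" >&2
fi
```

S3_BUCKET only matters when the "s3" step is selected, and RSYNC_LOCATION only when the "rsync" step is selected.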
Data Request
These parameters control the data request that is submitted to the CDS API:
- START_YEAR: The first year to request.
- END_YEAR: The last year to request.
- DATA_ARGS_ERA5: The full request submitted to the CDS API (e.g. "-s ${START_YEAR} -n ${CURRENT_YEAR}"). Additional arguments can be provided.
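Because the example value of DATA_ARGS_ERA5 refers to other parameters, the order of assignment matters. The sketch below assumes ENVS is sourced by a POSIX shell; the start year of 1979 and the way CURRENT_YEAR is derived are illustrative only, and the meaning of the -s and -n flags is defined by the pipeline's download step, not by this example.

```shell
#!/bin/sh
# Hypothetical values; in the pipeline these come from ENVS.
CURRENT_YEAR=$(date +%Y)
START_YEAR=1979

# The variables are expanded when this line is evaluated, so START_YEAR and
# CURRENT_YEAR must already be set at this point.
DATA_ARGS_ERA5="-s ${START_YEAR} -n ${CURRENT_YEAR}"

echo "$DATA_ARGS_ERA5"
```

If START_YEAR were assigned after DATA_ARGS_ERA5, the string would contain an empty value in its place.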
Quality Control
These parameters control quality control values:
- SD_FROM_MEAN: Number of standard deviations from the mean, used to check that no values lie outwith SD_FROM_MEAN standard deviations.
- ACTCENPRES_BOUNDS_MIN: Minimum value (in hPa) we expect actual_central_pressure to be above.
- ACTCENPRES_BOUNDS_MAX: Maximum value (in hPa) we expect actual_central_pressure to be below.
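The two kinds of check described above can be illustrated with a short sketch. It is not the pipeline's quality control code: the parameter values are made-up examples, `in_bounds` and `outliers` are hypothetical helper names, and the use of the population standard deviation is an assumption.

```shell
#!/bin/sh
# Hypothetical example values; the real ones are set in ENVS.
SD_FROM_MEAN=3
ACTCENPRES_BOUNDS_MIN=900
ACTCENPRES_BOUNDS_MAX=1100

# Returns 0 if the pressure in $1 (hPa) lies strictly between the bounds.
in_bounds() {
    awk -v v="$1" -v lo="$ACTCENPRES_BOUNDS_MIN" -v hi="$ACTCENPRES_BOUNDS_MAX" \
        'BEGIN { exit !(v + 0 > lo + 0 && v + 0 < hi + 0) }'
}

# Reads one value per line on stdin and prints every value that lies more
# than SD_FROM_MEAN (population) standard deviations from the mean.
outliers() {
    awk -v k="$SD_FROM_MEAN" '
        { x[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (x[i] - mean) * (x[i] - mean)
            sd = sqrt(ss / NR)
            for (i = 1; i <= NR; i++) {
                d = x[i] - mean
                if (d < 0) d = -d
                if (d > k * sd) print x[i]
            }
        }'
}

in_bounds 985.2 && echo "985.2 hPa is within bounds"
```

Piping a column of actual_central_pressure values into `outliers` prints only those that fail the standard deviation check; an empty result means every value passed.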
Data Output
The pipeline allows data output to the JASMIN Object Store, a local file system, or both - depending on where you are running this pipeline and which output file formats you would like to use.
Data Output to JASMIN Object Store
The pipeline uses s3cmd to interact with S3-compatible Object Storage. If you configure your data to be written out to the JASMIN Object Store, you will need to configure s3cmd to access your object storage tenancy and bucket.
You will need to generate an access key, and store it in a ~/.s3cfg file. Full instructions on how to generate an access key on JASMIN and an s3cfg file to use s3cmd are in the JASMIN documentation.
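Before a run that exports to object storage, it can help to confirm that the s3cmd configuration file is in place. The sketch below is a hypothetical pre-flight check, not part of the pipeline: `check_s3cfg` is an illustrative name, and the list of keys is an assumption about what a minimal configuration for an S3-compatible endpoint contains. The JASMIN documentation remains the authority on the values themselves.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm an s3cmd config file exists and
# defines the settings needed to reach an S3-compatible endpoint.
# Usage: check_s3cfg [path]   (defaults to ~/.s3cfg)
check_s3cfg() {
    cfg="${1:-$HOME/.s3cfg}"
    if [ ! -r "$cfg" ]; then
        echo "Cannot read $cfg" >&2
        return 1
    fi
    for key in access_key secret_key host_base host_bucket; do
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$cfg"; then
            echo "$cfg is missing '${key}'" >&2
            return 1
        fi
    done
    return 0
}
```

Calling `check_s3cfg` with no argument inspects ~/.s3cfg and returns a non-zero status if the file is unreadable or one of the keys is absent.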
Data Output to local file system
If you require data to be copied to a different location (e.g. the BAS SAN, for archival into the Polar Data Centre) you can configure this destination in ENVS. The pipeline will then rsync your output to that location.
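As a rough illustration of what such a copy looks like, the sketch below prints an rsync command built from OUTPUT_DIR and RSYNC_LOCATION. The paths are placeholders, `rsync_command` is a hypothetical helper, and the -av flags are an assumption; the pipeline may invoke rsync with different options.

```shell
#!/bin/sh
# Hypothetical sketch of the copy step; the paths below are placeholders.
OUTPUT_DIR="/path/to/output"
RSYNC_LOCATION="/path/to/archive"

# Prints the rsync command that would copy the contents of OUTPUT_DIR into
# RSYNC_LOCATION. The trailing slash on the source copies the directory's
# contents rather than the directory itself.
rsync_command() {
    printf 'rsync -av %s/ %s/\n' "$OUTPUT_DIR" "$RSYNC_LOCATION"
}

rsync_command
```

Printing the command first makes it easy to confirm the source and destination before any files are copied.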