Abstract Scalar Dataloader

class meshiphi.dataloaders.scalar.abstract_scalar.ScalarDataLoader(bounds, params)

Abstract class for all scalar Datasets.

__init__(bounds, params)

This is where large-scale operations are performed, such as importing data, downsampling, reprojecting, and renaming variables.

  • bounds (Boundary) – Initial mesh boundary to limit scope of data ingest

  • params (dict) – Values needed by dataloader to initialise. Unique to each dataloader


Data stored by the dataloader for use when called upon by the mesh. Must be saved in WGS 84 geographic coordinates (EPSG:4326), with coordinate names ‘lat’, ‘long’, and ‘time’ (if applicable).


pd.DataFrame or xr.Dataset


Name of the scalar variable. Must be the column name if self.data is a pd.DataFrame, or the variable name if self.data is an xr.Dataset.




ValueError – If no data lies within the supplied boundary


Sets default values for all scalar dataloaders. This function should be overridden to include any extra params for a specific dataloader.


params (dict) – Dictionary containing attributes that are required for each dataloader.


Dictionary of attributes the dataloader will require, completed with default values if not provided in config.
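
The default-filling behaviour described above can be sketched as a plain dictionary merge. This is an illustration only, not the library's implementation: the function name `fill_defaults` and the default values shown are hypothetical.

```python
def fill_defaults(params, defaults):
    """Return a copy of params with any missing keys filled from defaults."""
    completed = dict(defaults)   # start from the default values
    completed.update(params)     # values supplied in the config take precedence
    return completed


# Hypothetical defaults, chosen only for illustration
DEFAULTS = {"downsample_factors": (1, 1), "data_name": None}

user_params = {"downsample_factors": (2, 3)}
completed = fill_defaults(user_params, DEFAULTS)
```

A dataloader that needs extra params could extend the defaults dictionary before merging, so that the caller's dictionary is never modified in place.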

Return type:


calculate_coverage(bounds, data=None)

Calculates the fraction of the boundary covered by the dataset

  • bounds (Boundary) – Boundary being compared against

  • data (pd.DataFrame or xr.Dataset) – Dataset with ‘lat’ and ‘long’ coordinates. Extent calculated from min/max of these coordinates. Defaults to the object’s internal dataset.


Decimal fraction of boundary covered by the dataset
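
One way to picture the calculation is as the overlap between the boundary rectangle and the min/max extent of the data, divided by the boundary's area. The sketch below assumes a hypothetical `Box` stand-in for the Boundary object and measures area in degree space; the library's own calculation may differ.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for a Boundary: a lat/long rectangle."""
    lat_min: float
    lat_max: float
    long_min: float
    long_max: float


def coverage_fraction(bounds, lats, longs):
    """Fraction of `bounds` overlapped by the min/max extent of the points."""
    if not lats or not longs:
        return 0.0
    # Length of the overlap between data extent and boundary along each axis
    lat_overlap = min(bounds.lat_max, max(lats)) - max(bounds.lat_min, min(lats))
    long_overlap = min(bounds.long_max, max(longs)) - max(bounds.long_min, min(longs))
    if lat_overlap <= 0 or long_overlap <= 0:
        return 0.0  # data extent does not intersect the boundary
    area = (bounds.lat_max - bounds.lat_min) * (bounds.long_max - bounds.long_min)
    return lat_overlap * long_overlap / area


# Data spans half of the boundary in latitude and all of it in longitude
frac = coverage_fraction(Box(0.0, 10.0, 0.0, 10.0), [0.0, 5.0], [0.0, 10.0])
```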

Return type:



Downsamples imported data so that it is easier to manipulate. The data size should be reduced by a factor of m*n, where (m, n) are the downsample_factors defined in the params. self.data can be a pd.DataFrame or an xr.Dataset.


agg_type (str) – Aggregation method used to bin the data when downsampling. Defaults to the method used for the homogeneity condition.


Downsampled data

Return type:

xr.Dataset or pd.DataFrame
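
To illustrate the m*n reduction, the sketch below bins a small gridded array into blocks of m rows by n columns and aggregates each block. It uses nested lists rather than a pd.DataFrame or xr.Dataset, and the function name `downsample_grid` is hypothetical.

```python
from statistics import mean


def downsample_grid(grid, factors, agg=mean):
    """Aggregate each (m x n) block of a 2D grid into a single value."""
    m, n = factors
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for i in range(0, rows, m):
        coarse_row = []
        for j in range(0, cols, n):
            # Collect the block; edge blocks may be smaller than m x n
            block = [grid[r][c]
                     for r in range(i, min(i + m, rows))
                     for c in range(j, min(j + n, cols))]
            coarse_row.append(agg(block))
        coarse.append(coarse_row)
    return coarse


grid = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
coarse = downsample_grid(grid, (2, 2))        # mean of each 2x2 block
coarse_max = downsample_grid(grid, (2, 2), agg=max)
```

Here eight values are reduced to two, a factor of m*n = 4.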


Retrieves the name of the data column (for pd.DataFrame) or variable (for xr.Dataset). Used when data_name is not defined in params.


Name of data column

Return type:



ValueError – If multiple possible data columns found, can’t retrieve data name
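
A minimal sketch of this lookup, assuming the data name is whatever remains once the coordinate names ‘lat’, ‘long’ and ‘time’ are excluded. The function name `infer_data_name` is hypothetical, and raising when no candidate is found is an assumption of this sketch rather than documented behaviour.

```python
COORD_NAMES = {"lat", "long", "time"}


def infer_data_name(columns):
    """Return the single non-coordinate name in `columns`."""
    candidates = [name for name in columns if name not in COORD_NAMES]
    if len(candidates) != 1:
        # Multiple candidates is the documented error case;
        # treating zero candidates the same way is an assumption
        raise ValueError(f"Cannot determine data name from {candidates}")
    return candidates[0]


name = infer_data_name(["lat", "long", "time", "SIC"])
```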

get_hom_condition(bounds, splitting_conds, data=None)

Retrieves homogeneity condition of data within boundary.

  • bounds (Boundary) – Boundary object with limits of datarange to analyse

  • splitting_conds (dict) –

    Containing the following keys:


    (float) The threshold against which data points of type ‘value’ within this CellBox are checked to be either above or below


    (float) The lower bound of the acceptable percentage of data points of type ‘value’ within this boundary that are above ‘threshold’


    (float) The upper bound of the acceptable percentage of data points of type ‘value’ within this boundary that are above ‘threshold’


    (bool) If True, a cellbox will not be split by other splitting conditions if it is deemed homogeneous. Default = False.


The homogeneity condition returned is one of the following:

‘CLR’ = the proportion of data points within this cellbox over a given threshold is lower than the lower bound

‘HOM’ = the proportion of data points within this cellbox over a given threshold is higher than the upper bound

‘MIN’ = the cellbox contains fewer than a minimum number of data points

‘HET’ = the proportion of data points within this cellbox over a given threshold is between the upper and lower bound
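
The four outcomes can be sketched as a short decision function. Everything specific here is an assumption made for illustration: the dictionary keys `threshold`, `lower_bound` and `upper_bound`, the `min_dp` argument, the use of fractions between 0 and 1 for the bounds, and the order in which the checks are made.

```python
def hom_condition(values, splitting_conds, min_dp=10):
    """Classify a set of data points as 'MIN', 'CLR', 'HOM' or 'HET'."""
    # Too few points to make a meaningful judgement
    if not values or len(values) < min_dp:
        return "MIN"
    # Proportion of points strictly above the threshold
    over = sum(1 for v in values if v > splitting_conds["threshold"])
    proportion = over / len(values)
    if proportion < splitting_conds["lower_bound"]:
        return "CLR"
    if proportion > splitting_conds["upper_bound"]:
        return "HOM"
    return "HET"


# Assumed key names and fractional bounds, for illustration only
conds = {"threshold": 0.5, "lower_bound": 0.1, "upper_bound": 0.9}

mostly_low = hom_condition([0.0] * 20, conds)
mostly_high = hom_condition([1.0] * 20, conds)
mixed = hom_condition([0.0] * 10 + [1.0] * 10, conds)
sparse = hom_condition([1.0] * 3, conds)
```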

Return type:


get_value(bounds, data=None, agg_type=None, skipna=True)

Retrieve aggregated value from within bounds

  • agg_type (str) – Method used to aggregate the datapoints within bounds. Can be upper or lower case. Accepts ‘MIN’, ‘MAX’, ‘MEAN’, ‘MEDIAN’, ‘STD’, ‘COUNT’

  • bounds (Boundary) – Boundary object with limits of lat/long

  • skipna (bool) – Whether to skip NaNs rather than propagate them. Default = True (NaNs are ignored)


{variable (str): aggregated_value (float)} Aggregated value within bounds, computed with the requested aggregation method

Return type:



ValueError – aggregation type not in list of available methods
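
The aggregation and NaN handling can be sketched with the standard library alone. This is not the library's code: the function name `aggregate` is hypothetical, ‘STD’ is taken to be the population standard deviation, and ‘COUNT’ is assumed to count only non-NaN points.

```python
import math
import statistics

AGGREGATORS = {
    "MIN": min,
    "MAX": max,
    "MEAN": statistics.mean,
    "MEDIAN": statistics.median,
    "STD": statistics.pstdev,   # population std; an arbitrary choice here
    "COUNT": len,
}


def aggregate(values, name, agg_type="MEAN", skipna=True):
    """Return {name: aggregated value} for a list of floats."""
    key = agg_type.upper()          # accept upper or lower case
    if key not in AGGREGATORS:
        raise ValueError(f"Unknown aggregation type: {agg_type!r}")
    clean = [v for v in values if not math.isnan(v)]
    if key == "COUNT":
        return {name: float(len(clean))}
    has_nan = len(clean) != len(values)
    if (has_nan and not skipna) or not clean:
        return {name: math.nan}     # propagate NaN, or nothing to aggregate
    return {name: float(AGGREGATORS[key](clean))}


points = [1.0, 2.0, float("nan")]
skipped = aggregate(points, "SIC", "mean")
propagated = aggregate(points, "SIC", "MEAN", skipna=False)
```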

abstract import_data(bounds)

User-defined method for importing data from files, or for generating data from scratch


Coordinates and data being imported from file

if xr.Dataset,
  • Must have coordinates ‘lat’ and ‘long’

  • Must have single data variable

if pd.DataFrame,
  • Must have columns ‘lat’ and ‘long’

  • Must have single data column

Downsampling and reprojecting happen in the __init__() method.

Return type:

xr.Dataset or pd.DataFrame
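
The division of labour between __init__() and import_data() can be sketched with a stand-in base class. `SketchScalarLoader` and `GradientLoader` are hypothetical and do not use the library; bounds is simplified to a tuple, the `n_points` param is invented for the example, and records are plain dictionaries rather than a pd.DataFrame or xr.Dataset.

```python
from abc import ABC, abstractmethod


class SketchScalarLoader(ABC):
    """Hypothetical stand-in for the abstract base class described above."""

    def __init__(self, bounds, params):
        self.params = params
        # The base class drives the pipeline; subclasses only supply the data.
        # Downsampling and reprojecting would follow here.
        self.data = self.import_data(bounds)

    @abstractmethod
    def import_data(self, bounds):
        """Return records with 'lat', 'long' and exactly one data field."""


class GradientLoader(SketchScalarLoader):
    """Generates data from scratch: the value increases with latitude."""

    def import_data(self, bounds):
        lat_min, lat_max, long_min, long_max = bounds
        n = self.params["n_points"]          # hypothetical param, must be >= 2
        lat_step = (lat_max - lat_min) / (n - 1)
        long_step = (long_max - long_min) / (n - 1)
        return [
            {"lat": lat_min + i * lat_step,
             "long": long_min + j * long_step,
             "gradient": float(i)}           # the single data field
            for i in range(n)
            for j in range(n)
        ]


loader = GradientLoader((0.0, 10.0, 0.0, 20.0), {"n_points": 3})
```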

reproject(in_proj='EPSG:4326', out_proj='EPSG:4326', x_col='lat', y_col='long')

Reprojects data using pyproj.Transformer. self.data can be a pd.DataFrame or an xr.Dataset.

  • in_proj (str) – Projection that the imported dataset is in. Must be allowed by pyproj.CRS (Coordinate Reference System).

  • out_proj (str) – Projection required for final data output. Must be allowed by pyproj.CRS (Coordinate Reference System). Shouldn’t change from the default value (EPSG:4326).

  • x_col (str) – Name of coordinate column 1.

  • y_col (str) – Name of coordinate column 2. x_col and y_col will be cast into lat and long by the reprojection.


Reprojected data with ‘lat’, ‘long’ columns replacing ‘x_col’ and ‘y_col’

Return type:



Sets name of data column/data variable


name (str) – Name to replace currently stored name with


Data with variable name changed

Return type:

xr.Dataset or pd.DataFrame

trim_datapoints(bounds, data=None)

Trims self.data to the datapoints that lie within the boundary defined by ‘bounds’. self.data can be a pd.DataFrame or an xr.Dataset.


bounds (Boundary) – Limits of lat/long/time to select data from


Trimmed dataset in same format as self.data

Return type:

pd.DataFrame or xr.Dataset
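
Trimming amounts to keeping only the records whose coordinates fall inside the boundary. The sketch below is a simplified illustration: `trim_records` is a hypothetical name, bounds is a plain tuple, the limits are treated as inclusive, and the time dimension is left out.

```python
def trim_records(records, bounds):
    """Keep only records whose lat/long lie within bounds (inclusive)."""
    lat_min, lat_max, long_min, long_max = bounds
    return [
        r for r in records
        if lat_min <= r["lat"] <= lat_max and long_min <= r["long"] <= long_max
    ]


records = [
    {"lat": -65.0, "long": -70.0, "SIC": 80.0},
    {"lat": -60.0, "long": -50.0, "SIC": 40.0},   # outside in longitude
    {"lat": -55.0, "long": -68.0, "SIC": 10.0},   # outside in latitude
]
trimmed = trim_records(records, (-70.0, -60.0, -75.0, -65.0))
```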