7.5.1.1. Abstract Look Up Table Dataloader
- class meshiphi.dataloaders.lut.abstract_lut.LutDataLoader(bounds, params)
Abstract class for all LookUp Table Datasets.
- __init__(bounds, params)
This is where large-scale operations are performed, such as importing data, downsampling, reprojecting, and renaming variables
- Parameters:
bounds (Boundary) – Initial mesh boundary to limit scope of data ingest
params (dict) – Values needed by dataloader to initialise. Unique to each dataloader
- self.data
Data stored by the dataloader for use when called upon by the mesh. Must be saved in the WGS 84 geographic coordinate system (EPSG:4326), with columns ‘geometry’ and data_name.
- Type:
gpd.GeoDataFrame
- self.data_name
Name of scalar variable. Must be the column name in the dataframe
- Type:
str
- Raises:
ValueError – If no data lies within the supplied boundary
- add_default_params(params)
Set default values for all LUT dataloaders. This function should be overridden to include any extra params for a specific dataloader.
- Parameters:
params (dict) – Dictionary containing attributes that are required for each dataloader.
- Returns:
Dictionary of attributes the dataloader will require, completed with default values if not provided in config.
- Return type:
(dict)
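The override pattern described for add_default_params can be sketched as follows. This is a minimal illustration rather than meshiphi's own code: the classes BaseLutLoader and ThicknessLutLoader, and the default keys shown, are hypothetical.

```python
class BaseLutLoader:
    """Hypothetical stand-in for a LUT dataloader base class."""

    def add_default_params(self, params):
        # Shared defaults apply only where the config supplied nothing.
        defaults = {"data_name": None, "aggregate_type": "MEAN"}  # assumed keys
        return {**defaults, **params}


class ThicknessLutLoader(BaseLutLoader):
    """Hypothetical dataloader that needs one extra parameter."""

    def add_default_params(self, params):
        # Start from the shared defaults, then add loader-specific ones.
        params = super().add_default_params(params)
        params.setdefault("units", "m")  # assumed extra parameter
        return params


completed = ThicknessLutLoader().add_default_params({"data_name": "thickness"})
```

Values supplied in the config always take precedence; defaults only fill the gaps.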
- calculate_coverage(bounds, data=None)
Calculates the fraction of the boundary covered by the dataset
- Parameters:
bounds (Boundary) – Boundary being compared against
data (pd.DataFrame) – Dataset with shapely polygons in the ‘geometry’ column. Defaults to the object’s internal dataset.
- Returns:
Decimal fraction of boundary covered by the dataset
- Return type:
float
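As a rough illustration of the returned quantity, the covered fraction is the area of the dataset that falls inside the boundary divided by the area of the boundary. The sketch below is a simplification under stated assumptions: it uses axis-aligned rectangles (min_x, min_y, max_x, max_y) in place of shapely polygons, assumes the regions do not overlap one another, and its function names are hypothetical.

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle; empty rectangles have zero area."""
    min_x, min_y, max_x, max_y = rect
    return max(0.0, max_x - min_x) * max(0.0, max_y - min_y)


def intersection(a, b):
    """Overlap of two rectangles (may be empty, i.e. max < min)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def coverage_fraction(bounds, regions):
    """Decimal fraction of `bounds` covered by non-overlapping `regions`."""
    covered = sum(rect_area(intersection(bounds, region)) for region in regions)
    return covered / rect_area(bounds)


bounds = (0, 0, 10, 10)
regions = [(0, 0, 5, 10), (5, 0, 8, 4), (20, 20, 30, 30)]
fraction = coverage_fraction(bounds, regions)  # third region lies outside
```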
- downsample()
Downsampling not supported by LookUpTable Dataloader
- get_data_col_name()
Retrieve the name of the data column. Used when data_name is not defined in params.
- Returns:
Name of data column
- Return type:
str
- Raises:
AssertionError – If multiple possible data columns are found, so the data name cannot be determined
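The inference described above can be sketched on a plain list of column names. The function name and the choice to treat only ‘geometry’ as a non-data column are assumptions made for illustration; this sketch also fails when no candidate column exists, which the documentation above does not specify.

```python
def infer_data_col_name(columns, reserved=("geometry",)):
    """Return the single column that is not reserved for geometry."""
    candidates = [col for col in columns if col not in reserved]
    # Ambiguity (or no candidate at all) is reported as an AssertionError.
    assert len(candidates) == 1, f"Cannot infer data column from {candidates}"
    return candidates[0]


name = infer_data_col_name(["geometry", "thickness"])
```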
- get_hom_condition(bounds, splitting_conds, data=None)
Retrieves homogeneity condition of data within boundary.
- Parameters:
bounds (Boundary) – Boundary object with limits of datarange to analyse
splitting_conds (dict) – Dictionary containing the following keys:
- ‘boundary’ (bool): True if the user wants to split when a polygon boundary passes through bounds
- Returns:
The homogeneity condition returned is one of:
‘CLR’ = the boundary is completely contained within the LUT regions, no need to split
‘MIN’ = the boundary contains no LUT data, can’t split
‘HET’ = the boundary contains an edge within the LUT data, should split
- Return type:
str
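The three outcomes can be illustrated with a simplified classifier. This sketch is not the library's implementation: it assumes splitting_conds[‘boundary’] is True, represents both the boundary and the LUT regions as axis-aligned rectangles (min_x, min_y, max_x, max_y) rather than shapely polygons, and uses hypothetical function names.

```python
def contains(outer, inner):
    """True if rectangle `inner` lies entirely within rectangle `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def overlaps(a, b):
    """True if rectangles `a` and `b` share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def hom_condition(bounds, regions):
    """Classify `bounds` against LUT `regions` as 'CLR', 'MIN' or 'HET'."""
    if any(contains(region, bounds) for region in regions):
        return "CLR"  # wholly inside one region: nothing to split
    if not any(overlaps(region, bounds) for region in regions):
        return "MIN"  # no LUT data inside the boundary: cannot split
    return "HET"      # a region edge passes through the boundary: split


regions = [(0, 0, 10, 10)]
inside = hom_condition((2, 2, 4, 4), regions)
outside = hom_condition((20, 20, 30, 30), regions)
straddling = hom_condition((8, 8, 12, 12), regions)
```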
- get_value(bounds, agg_type=None, skipna=False, data=None)
Retrieve aggregated value from within bounds
- Parameters:
agg_type (str) – Method of aggregation of datapoints within bounds. Can be upper or lower case. Accepts ‘MIN’, ‘MAX’, ‘MEAN’, ‘MEDIAN’, ‘STD’, ‘COUNT’
bounds (Boundary) – Boundary object with limits of lat/long
skipna (bool) – Defines whether to propagate NaNs or not. Default = False (includes NaNs)
- Returns:
{variable (str): aggregated_value (float)} Aggregated value within bounds following agg_type
- Return type:
dict
- Raises:
ValueError – If the aggregation type is not in the list of available methods
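The aggregation contract (case-insensitive method name, optional NaN skipping, a single-entry dict as the result) can be sketched with the standard library. This is an illustrative stand-in operating on a plain list of floats, not the dataloader's code. The function name is hypothetical, and two behaviours are assumptions: ‘STD’ is taken to be the population standard deviation, and with skipna=False any NaN makes every aggregate, including ‘COUNT’, NaN.

```python
import math
import statistics

AGGREGATORS = {
    "MIN": min,
    "MAX": max,
    "MEAN": statistics.fmean,
    "MEDIAN": statistics.median,
    "STD": statistics.pstdev,  # assumption: population standard deviation
    "COUNT": len,
}


def aggregate(values, variable, agg_type="MEAN", skipna=False):
    """Return {variable: aggregated_value} for a list of floats."""
    key = agg_type.upper()  # method names are case-insensitive
    if key not in AGGREGATORS:
        raise ValueError(f"Unknown aggregation type: {agg_type!r}")
    if skipna:
        values = [v for v in values if not math.isnan(v)]
    elif any(math.isnan(v) for v in values):
        return {variable: float("nan")}  # NaN propagates to the result
    if not values:
        return {variable: float("nan")}  # nothing left to aggregate
    return {variable: float(AGGREGATORS[key](values))}


samples = [1.0, 2.0, float("nan")]
skipped = aggregate(samples, "thickness", "mean", skipna=True)
propagated = aggregate(samples, "thickness", "MAX")
```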
- abstract import_data(bounds)
User-defined method for importing data from files, or for generating data from scratch
- Returns:
Coordinates and data being imported from file
- If pd.DataFrame:
Must have columns ‘geometry’ and data_name
Must have a single data column
- Return type:
pd.DataFrame
- reproject()
Reprojection not supported by LookUpTable Dataloader
- set_data_col_name(new_name)
Sets name of data column/data variable
- Parameters:
new_name (str) – Name to replace the currently stored name with
- Returns:
Data with variable name changed
- Return type:
pd.DataFrame
- trim_datapoints(bounds, data=None)
Trims datapoints from self.data within boundary defined by ‘bounds’. self.data can be pd.DataFrame or xr.Dataset
- Parameters:
bounds (Boundary) – Limits of lat/long/time to select data from
- Returns:
Trimmed dataset in same format as self.data
- Return type:
pd.DataFrame
- verify_data(data=None)
Verifies that all geometries read in are Polygons or MultiPolygons. Any MultiPolygon is split out into multiple Polygons.
- Parameters:
data (pd.DataFrame, optional) – DataFrame with at least columns ‘geometry’ and a variable. Defaults to dataloader’s data attribute.
- Raises:
ValueError – If a geometry is read in that is not a Polygon or MultiPolygon
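The check-and-split behaviour of verify_data can be sketched on GeoJSON-style geometry dicts. This is an assumption-laden illustration: the dataloader itself works on shapely geometries in a DataFrame, and the function name split_polygons is hypothetical.

```python
def split_polygons(geometries):
    """Return a flat list of Polygon geometries.

    Each geometry is a GeoJSON-style dict with 'type' and 'coordinates'.
    """
    polygons = []
    for geom in geometries:
        if geom["type"] == "Polygon":
            polygons.append(geom)
        elif geom["type"] == "MultiPolygon":
            # A MultiPolygon's coordinates are a list of Polygon coordinates.
            polygons.extend(
                {"type": "Polygon", "coordinates": coords}
                for coords in geom["coordinates"]
            )
        else:
            raise ValueError(f"Unsupported geometry type: {geom['type']}")
    return polygons


square_a = [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
square_b = [[[2, 2], [3, 2], [3, 3], [2, 3], [2, 2]]]
geometries = [
    {"type": "Polygon", "coordinates": square_a},
    {"type": "MultiPolygon", "coordinates": [square_a, square_b]},
]
flat = split_polygons(geometries)
```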