7.6. Adding New Dataloaders

7.6.1. Adding to the repository

Each dataloader is to be implemented as a separate object for the environmental mesh to interface with. The general workflow for creating a new dataloader is as follows:

  1. Choose an appropriate dataloader type (see Dataloader Types).

  2. Create a new file under meshiphi/dataloaders/{dataloader-type} with an appropriate name.

  3. Create import_data() and (optionally) add_default_params() methods. Examples of how to do this are shown on the abstractScalar and abstractVector pages.

  4. Add a new entry to the dataloader factory object, within meshiphi/dataloaders/factory.py. Instructions on how to do so are shown in Dataloader Factory.

After these steps, the dataloader should be ready to use. For debugging, it is useful to create the dataloader object from within meshiphi/dataloaders/factory.py (e.g. inside an if __name__ == '__main__': block) and test its functionality before deploying it.
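
The registration and debugging steps above can be pictured with a toy factory. This is only a minimal sketch of the general pattern: ToyScalarLoader, ToyFactory and the 'points' parameter are hypothetical stand-ins and are not part of MeshiPhi, whose real factory is described on the Dataloader Factory page.

```python
class ToyScalarLoader:
    """Hypothetical stand-in for a scalar dataloader (not a MeshiPhi class)."""

    def __init__(self, bounds, params):
        self.bounds = bounds
        # Copy the params so the caller's dictionary is left untouched
        self.params = self.add_default_params(dict(params))
        self.data = self.import_data(bounds)

    def add_default_params(self, params):
        # Fill in values the user did not supply
        params.setdefault('data_name', 'toy_data')
        return params

    def import_data(self, bounds):
        # Keep only the points that fall inside the (lower, upper) bounds
        lower, upper = bounds
        return [p for p in self.params.get('points', []) if lower <= p <= upper]


class ToyFactory:
    """Hypothetical factory mapping a dataloader name to its class."""

    _registry = {
        'toy_scalar': ToyScalarLoader,  # the "new entry" for the new dataloader
    }

    @classmethod
    def get_dataloader(cls, name, bounds, params):
        try:
            loader_cls = cls._registry[name]
        except KeyError:
            raise ValueError(f"'{name}' is not a registered dataloader") from None
        return loader_cls(bounds, params)


if __name__ == '__main__':
    # Quick smoke test before deploying, as suggested above
    loader = ToyFactory.get_dataloader('toy_scalar', (0, 10), {'points': [-5, 3, 7, 12]})
    print(loader.params['data_name'], loader.data)
```

Keeping the smoke test under the __main__ guard means it runs only when the file is executed directly, not when the factory is imported by the rest of the package.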

7.6.2. Adding within IPython Notebooks

If you do not wish to modify the repo to add a dataloader, you may add one into the mesh by calling the add_dataloader() method of MeshBuilder.

An example of how to do this is detailed below. Assuming you’re working out of a Jupyter notebook, the basic steps are:

  1. Create a dataloader

    # Import xarray to read the data files, and the abstract dataloader as the base class
    import xarray as xr
    from meshiphi.dataloaders.scalar.abstract_scalar import ScalarDataLoader
    
    # Set up dataloader in the same way as the existing dataloaders
    class MyDataLoader(ScalarDataLoader):
       # Only user defined function required
       def import_data(self, bounds):
          # Read in data: a single file is opened directly, several are combined
          if len(self.files) == 1:
             data = xr.open_dataset(self.files[0])
          else:
             data = xr.open_mfdataset(self.files)
          # Trim data to boundary
          data = self.trim_datapoints(bounds, data=data)
    
          return data
    
  2. Create a dictionary of parameters to initialise the dataloader

    # Params formatted same way as dataloaders in config
    params = {
       'files': [
          'PATH_TO_FILE_1',
          'PATH_TO_FILE_2',
          ... # Populate with as many files as you need
       ],
       'data_name': 'my_data',
       'splitting_conditions':[
          {
          'my_data':{
             'threshold': 0.5,
             'upper_bound': 0.9,
             'lower_bound': 0.1
             }
          }
       ]
    }
    
  3. Initialise an Environmental Mesh

    import json
    from meshiphi import MeshBuilder
    
    # Config to initialise mesh from
    with open('config.json', 'r') as fp:
       config = json.load(fp)
    
    # Build a mesh from the config
    mesh_builder = MeshBuilder(config)
    env_mesh = mesh_builder.build_environmental_mesh()
    
  4. Add dataloader to mesh

    # Set up bounds of data in dataloader
    from meshiphi import Boundary
    bounds = Boundary.from_json(config)
    
    # Add dataloader to mesh builder and regenerate mesh
    modified_builder = mesh_builder.add_dataloader(MyDataLoader, params, bounds)
    modified_mesh = modified_builder.build_environmental_mesh()
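
When the list of files in step 2 is long, the parameter dictionary can be assembled programmatically instead of typed by hand. The sketch below is one possible approach. build_params is a hypothetical helper, not a MeshiPhi function, and the '*.nc' pattern assumes the data is stored as NetCDF files. The splitting condition values are simply copied from the example in step 2.

```python
from pathlib import Path


def build_params(data_dir, data_name, pattern='*.nc'):
    """Hypothetical helper: collect the files in data_dir into a params dict."""
    # Sort the paths so the file order is reproducible between runs
    files = sorted(str(path) for path in Path(data_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files matching {pattern!r} in {data_dir}")

    return {
        'files': files,
        'data_name': data_name,
        'splitting_conditions': [
            {
                # Same values as the hand-written example in step 2
                data_name: {
                    'threshold': 0.5,
                    'upper_bound': 0.9,
                    'lower_bound': 0.1,
                }
            }
        ],
    }
```

The returned dictionary has the same layout as the hand-written params in step 2, so it could be passed to add_dataloader() in the same way.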