Checking Airborne Gravity data¶
Author: Alice Fremand (@almand)
Date: 12/11/2021
Aim¶
The goal of this tutorial is to easily check the airborne gravity data provided in XYZ format.
Virtual environment¶
For the code to run, it is important to install the correct dependencies and libraries. In particular, the following libraries are crucial:
pandas module to check CSV and text data in python
geopandas module to check data geospatially in python
To set up the virtual environment with Conda:¶
>conda create -n aerogeophysics_env
>conda activate aerogeophysics_env
>conda config --env --add channels conda-forge
>conda config --env --set channel_priority strict
>conda install python=3 geopandas
To set up the virtual environment on UNIX:¶
Load your python module:
module load python/conda3
Then in the folder where you have your code, you need to launch:
python3 -m venv aerogeophysics_env
This will create a folder containing the Python environment. To activate the virtual environment, launch it:
source aerogeophysics_env/bin/activate.csh
Make sure that [aerogeophysics_env] appears before your prompt on the machine; this means you are using the virtual environment. Then upgrade pip, the command that installs packages:
python3 -m pip install --upgrade pip
Then install the other libraries:
python3 -m pip install geopandas
In this tutorial, the virtual environment is already set up. The currently installed libraries are listed below.
pip list
Package Version
----------------------------- -------------------
alabaster 0.7.12
anyio 3.3.4
argon2-cffi 21.1.0
async-generator 1.10
attrs 21.2.0
Babel 2.9.1
backcall 0.2.0
backports.functools-lru-cache 1.6.4
beautifulsoup4 4.10.0
bleach 4.1.0
brotlipy 0.7.0
certifi 2021.10.8
cffi 1.15.0
cftime 1.5.1.1
chardet 4.0.0
charset-normalizer 2.0.0
click 7.1.2
click-completion 0.5.2
click-log 0.3.2
click-plugins 1.1.1
cligj 0.7.1
cloudpickle 1.6.0
colorama 0.4.4
cryptography 35.0.0
cycler 0.10.0
dataclasses 0.8
decorator 5.0.9
defusedxml 0.7.1
docutils 0.16
entrypoints 0.3
Fiona 1.8.18
future 0.18.2
GDAL 3.1.4
geopandas 0.9.0
gitdb 4.0.9
GitPython 3.1.24
greenlet 1.1.2
idna 3.1
imagesize 1.3.0
importlib-metadata 4.8.2
importlib-resources 3.3.1
ipykernel 5.5.5
ipython 7.23.1
ipython-genutils 0.2.0
ipywidgets 7.6.5
jedi 0.18.0
Jinja2 3.0.3
jsonschema 3.2.0
jupyter 1.0.0
jupyter-book 0.12.0
jupyter-cache 0.4.3
jupyter-client 6.1.12
jupyter-console 6.4.0
jupyter-core 4.7.1
jupyter-server 1.11.2
jupyter-server-mathjax 0.2.3
jupyter-sphinx 0.3.2
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.2
jupytext 1.11.5
kiwisolver 1.3.2
latexcodec 2.0.1
linkify-it-py 1.0.2
lxml 4.6.4
markdown-it-py 1.1.0
MarkupSafe 2.0.1
matplotlib 3.4.3
matplotlib-inline 0.1.2
mdit-py-plugins 0.2.8
mistune 0.8.4
munch 2.5.0
myst-nb 0.13.1
myst-parser 0.15.2
nbclient 0.5.5
nbconvert 6.2.0
nbdime 3.1.1
nbformat 5.1.3
nest-asyncio 1.5.1
netCDF4 1.5.7
notebook 6.4.5
numpy 1.20.3
obspy 1.2.2
olefile 0.46
packaging 21.2
pandas 1.2.4
pandocfilters 1.5.0
parso 0.8.2
pickleshare 0.7.5
Pillow 8.3.1
pip 21.1.2
prometheus-client 0.12.0
prompt-toolkit 3.0.18
pybtex 0.24.0
pybtex-docutils 1.0.1
pycparser 2.21
pydata-sphinx-theme 0.6.3
Pygments 2.9.0
pyOpenSSL 21.0.0
pyparsing 2.4.7
pyproj 3.1.0
PyQt5 5.12.3
PyQt5-sip 4.19.18
PyQtChart 5.12
PyQtWebEngine 5.12.1
pyrsistent 0.18.0
PySocks 1.7.1
python-dateutil 2.8.1
pytz 2021.1
pywin32 300
pywinpty 1.1.4
PyYAML 6.0
pyzmq 22.1.0
qtconsole 5.1.1
QtPy 1.11.2
requests 2.26.0
Rtree 0.9.7
scipy 1.6.3
Send2Trash 1.8.0
setuptools 49.6.0.post20210108
Shapely 1.7.1
shellingham 1.4.0
six 1.16.0
smmap 3.0.5
sniffio 1.2.0
snowballstemmer 2.1.0
soupsieve 2.3
Sphinx 4.3.0
sphinx-book-theme 0.1.6
sphinx-comments 0.0.3
sphinx-copybutton 0.4.0
sphinx-external-toc 0.2.3
sphinx-jupyterbook-latex 0.4.5
sphinx-multitoc-numbering 0.1.3
sphinx-panels 0.6.0
sphinx-thebe 0.0.10
sphinx-togglebutton 0.2.3
sphinxcontrib-applehelp 1.0.2
sphinxcontrib-bibtex 2.2.1
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.0
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
spyder-kernels 1.10.2
SQLAlchemy 1.4.26
terminado 0.12.1
testpath 0.5.0
toml 0.10.2
tornado 6.1
traitlets 5.0.5
typing-extensions 3.10.0.2
uc-micro-py 1.0.1
urllib3 1.26.7
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 0.57.0
wheel 0.36.2
widgetsnbextension 3.5.2
win-inet-pton 1.1.0
wincertstore 0.2
zipp 3.6.0
Load the relevant modules¶
import os
import glob
import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
#Specific module to plot the graph in the Jupyter Notebook
%matplotlib inline
Checking the XYZ files¶
Example given for AGAP airborne gravity data.
Data available for download here: Jordan, T., Ferraccioli, F., Bell, R., Damaske, D., & Robinson, C. (2020). Antarctica’s Gamburtsev Province (AGAP) Project - Airborne gravity data (2007-2009) [Data set]. UK Polar Data Centre, Natural Environment Research Council, UK Research & Innovation. https://doi.org/10.5285/8E5F910B-11D6-4A9D-BDF7-175C9B98CFB8
Reading the XYZ data¶
aerograv_data = 'E:/UKPDC/JupyterExample/AGAP_BAS_Grav.XYZ'
The XYZ file begins with a large header and empty rows. To read the file, it is recommended to skip these rows. The goal is to keep row 5, which holds the column names, but skip the other rows among the first eleven (rows 0 to 10). We can build the list of rows to skip like this:
skiprow= list(range(0,11))
skiprow.remove(5)
skiprow
[0, 1, 2, 3, 4, 6, 7, 8, 9, 10]
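If you are not sure how many header rows a particular XYZ file has, you can peek at its first lines before choosing what to skip. A minimal sketch on a made-up in-memory sample (the contents are illustrative, not taken from the real survey file):

```python
import io

# Made-up miniature XYZ file: metadata lines and a blank line,
# then the column-name row, then data. Real files are much larger.
sample = io.StringIO(
    "/ Survey: EXAMPLE\n"
    "/ Processed: 2021\n"
    "\n"
    "/ Lon Lat Free_air\n"
    "166.66 -77.88 -47.69\n"
)
preview = [line.rstrip("\n") for line in sample]
for i, line in enumerate(preview):
    print(i, repr(line))
```

The printed indices are exactly the row numbers that skiprows expects, so you can read off which rows to keep.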
Then, we can use pandas to read the data. The values are separated by spaces, which we can specify with the sep parameter, and we pass our skiprow list to the skiprows parameter:
file = pd.read_csv(aerograv_data, skiprows=skiprow, sep= ' +', engine='python' )
file.head()
/ | Line_no | Flight_ID | Lon | Lat | x | y | Height_WGS1984 | Date | Time | ... | EotvosCor | LatCor | FaCor | HaccCor | Free_air | FAA_filt | FAA_clip | Level_cor | FAA_level | Fa_4600m | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | 166.668647 | -77.883648 | 3.357848e+06 | 983220.628748 | 862.8 | 2008/12/06 | * | 13097.7 | ... | 982988.29 | 267.530 | -10.88 | 4190.5 | -47.69 | -47.69 | * | * | * | NaN |
1 | 0 | 3 | 166.671337 | -77.883584 | 3.357861e+06 | 983158.404045 | 863.0 | 2008/12/06 | * | 13097.7 | ... | 982988.28 | 267.594 | -9.84 | 825.0 | -47.93 | -47.93 | * | * | * | NaN |
2 | 0 | 3 | 166.674028 | -77.883518 | 3.357873e+06 | 983096.166514 | 863.1 | 2008/12/06 | * | 13097.7 | ... | 982988.28 | 267.636 | 0.69 | -2295.0 | -48.17 | -48.17 | * | * | * | NaN |
3 | 0 | 3 | 166.676720 | -77.883450 | 3.357886e+06 | 983033.908201 | 863.2 | 2008/12/06 | * | 13097.7 | ... | 982988.28 | 267.659 | 10.87 | 34.9 | -48.41 | -48.41 | * | * | * | NaN |
4 | 0 | 3 | 166.679414 | -77.883381 | 3.357898e+06 | 982971.612286 | 863.1 | 2008/12/06 | * | 13097.7 | ... | 982988.28 | 267.649 | 23.72 | -1154.6 | -48.64 | -48.64 | * | * | * | NaN |
5 rows × 32 columns
As we can see, the header row begins with a stray '/' entry, which shifts all the column names one position to the left. We can realign the names by removing '/' and dropping the resulting extra last column:
column_names = file.columns.tolist()
column_names.remove('/')
column_names.append('toDelete')
file.columns = column_names
file = file.drop(columns='toDelete')
file.head()
Line_no | Flight_ID | Lon | Lat | x | y | Height_WGS1984 | Date | Time | ST | ... | EotvosCor | LatCor | FaCor | HaccCor | Free_air | FAA_filt | FAA_clip | Level_cor | FAA_level | Fa_4600m | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | 166.668647 | -77.883648 | 3.357848e+06 | 983220.628748 | 862.8 | 2008/12/06 | * | 13097.7 | ... | 255.92 | 982988.29 | 267.530 | -10.88 | 4190.5 | -47.69 | -47.69 | * | * | * |
1 | 0 | 3 | 166.671337 | -77.883584 | 3.357861e+06 | 983158.404045 | 863.0 | 2008/12/06 | * | 13097.7 | ... | 256.08 | 982988.28 | 267.594 | -9.84 | 825.0 | -47.93 | -47.93 | * | * | * |
2 | 0 | 3 | 166.674028 | -77.883518 | 3.357873e+06 | 983096.166514 | 863.1 | 2008/12/06 | * | 13097.7 | ... | 256.27 | 982988.28 | 267.636 | 0.69 | -2295.0 | -48.17 | -48.17 | * | * | * |
3 | 0 | 3 | 166.676720 | -77.883450 | 3.357886e+06 | 983033.908201 | 863.2 | 2008/12/06 | * | 13097.7 | ... | 256.53 | 982988.28 | 267.659 | 10.87 | 34.9 | -48.41 | -48.41 | * | * | * |
4 | 0 | 3 | 166.679414 | -77.883381 | 3.357898e+06 | 982971.612286 | 863.1 | 2008/12/06 | * | 13097.7 | ... | 256.84 | 982988.28 | 267.649 | 23.72 | -1154.6 | -48.64 | -48.64 | * | * | * |
5 rows × 31 columns
As you can see, a number of values are marked with a star (*) where no value is available. In this analysis, we are only interested in the longitude, latitude, and free air anomaly, so we will select these specific columns.
Selecting the variables¶
longitude = [variable for variable in file.columns.tolist() if (variable.startswith('Lon'))][0]
latitude = [variable for variable in file.columns.tolist() if (variable.startswith('Lat'))][0]
grav = file.columns.tolist()[-1]
file = file[[longitude,latitude,grav]]
file = file.replace('*', np.nan)
file = file.dropna()
file[longitude] = file[longitude].astype(float) #To make sure lat and Lon are float
file[latitude] = file[latitude].astype(float)
file[grav] = file[grav].astype(float)
file.head()
Lon | Lat | Fa_4600m | |
---|---|---|---|
3962 | 106.821538 | -89.023179 | -56.5 |
3963 | 106.828688 | -89.022523 | -56.4 |
3964 | 106.835703 | -89.021868 | -56.2 |
3965 | 106.842580 | -89.021212 | -56.1 |
3966 | 106.849316 | -89.020555 | -55.9 |
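An alternative is to let pandas treat the star markers as missing values at read time, using the na_values parameter of read_csv; the numeric columns then parse directly as floats and the separate replace/astype steps are not needed. A sketch on an in-memory sample (the values are illustrative):

```python
import io
import pandas as pd

sample = io.StringIO(
    "Lon Lat Fa_4600m\n"
    "166.668647 -77.883648 *\n"
    "106.821538 -89.023179 -56.5\n"
)
# '*' is parsed as NaN straight away, so Fa_4600m comes back as float
df = pd.read_csv(sample, sep=r"\s+", na_values="*").dropna()
print(df.dtypes)
```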
We can also get some specific parameters:
ID = aerograv_data.split('/')[-1].replace('.XYZ', '') #replace, not str.strip: strip removes a set of characters, not a suffix
survey = aerograv_data.split('/')[3]
longitude = [variable for variable in file.columns.tolist() if (variable.startswith('Lon'))][0]
latitude = [variable for variable in file.columns.tolist() if (variable.startswith('Lat'))][0]
grav = file.columns.tolist()[-1]
print('''ID: %s
Survey: %s
Name of gravity variable: %s
Name of longitude variable: %s
Name of latitude variable: %s''' %(ID, survey, grav, longitude, latitude))
ID: AGAP_BAS_Grav
Survey: AGAP_BAS_Grav.XYZ
Name of gravity variable: Fa_4600m
Name of longitude variable: Lon
Name of latitude variable: Lat
Have a look at the data geospatially¶
We can easily convert the data to spatial objects using geopandas
. The geopandas
module is used to convert the data to a geodataframe. It will convert the latitude/longitude to points.
To do that, you will need to identify the specific header used for longitude and latitude in the CSV file.
gdf = gpd.GeoDataFrame(file, geometry=gpd.points_from_xy(file[longitude], file[latitude]))
We can check the conversion:
gdf.head()
Lon | Lat | Fa_4600m | geometry | |
---|---|---|---|---|
3962 | 106.821538 | -89.023179 | -56.5 | POINT (106.82154 -89.02318) |
3963 | 106.828688 | -89.022523 | -56.4 | POINT (106.82869 -89.02252) |
3964 | 106.835703 | -89.021868 | -56.2 | POINT (106.83570 -89.02187) |
3965 | 106.842580 | -89.021212 | -56.1 | POINT (106.84258 -89.02121) |
3966 | 106.849316 | -89.020555 | -55.9 | POINT (106.84932 -89.02056) |
Setting up the coordinate system¶
It is important to then set the coordinate system. Here the WGS84 coordinate system is used, it corresponds to the EPSG: 4326.
gdf = gdf.set_crs("EPSG:4326")
With geopandas, it is also possible to convert the data to another coordinate system and project it. You just need to know the EPSG ID of the output coordinate system. Here is how to convert the data to the Polar Antarctic stereographic geographic system (https://epsg.io/3031).
gdf = gdf.to_crs("EPSG:3031")
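A quick way to confirm the conversion worked is to check the CRS and the projected coordinates on a stand-in point; the values below are illustrative, not survey data. The projected layer can then be exported for GIS software with, for example, gdf.to_file('agap_points.gpkg', driver='GPKG') (the filename is just an example).

```python
import geopandas as gpd
from shapely.geometry import Point

# Stand-in point near the pole; the values are illustrative only
gdf = gpd.GeoDataFrame(
    {"Fa_4600m": [-56.5]},
    geometry=[Point(106.821538, -89.023179)],
    crs="EPSG:4326",
)
projected = gdf.to_crs("EPSG:3031")
print(projected.crs)  # should report EPSG:3031
print(projected.geometry.x[0], projected.geometry.y[0])  # metres, not degrees
```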
Plotting the data¶
plt.rcParams["figure.figsize"] = (100,100)
plt.rcParams.update({'font.size': 75})
fig, ax = plt.subplots(1, 1)
gdf.plot(column=grav, ax=ax, legend=True, legend_kwds={'label': "Free Air anomaly (mGal)",'orientation': "horizontal"})
<AxesSubplot:>
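To keep a copy of the map outside the notebook, the figure can be saved with Figure.savefig; a minimal sketch with dummy data (the filename is just an example):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([0, 1, 2], [0, 1, 4])  # dummy points standing in for the survey
fig.savefig("faa_map_example.png", dpi=150)
```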
Calculate statistics about the data¶
Size of the dataset¶
size = round(os.path.getsize(aerograv_data)/1e+6)
print('size: %s MB' %size)
size: 315 MB
Gravity anomaly statistics¶
mean_grav=round(file[grav].mean())
max_grav=round(file[grav].max())
latlong_max_grav=(file[longitude][file[grav].idxmax()],file[latitude][file[grav].idxmax()])
min_grav=round(file[grav].min())
latlong_min_grav=(file[longitude][file[grav].idxmin()],file[latitude][file[grav].idxmin()])
print('''
Mean gravity anomaly: %s mGal
Max gravity anomaly: %s mGal
Longitude and latitude of maximum gravity anomaly: %s
Minimum gravity anomaly: %s mGal
Longitude/Latitude of minimum gravity anomaly: %s''' %(mean_grav, max_grav, latlong_max_grav, min_grav, latlong_min_grav))
Mean gravity anomaly: 3 mGal
Max gravity anomaly: 122 mGal
Longitude and latitude of maximum gravity anomaly: (79.039578, -79.205101)
Minimum gravity anomaly: -106 mGal
Longitude/Latitude of minimum gravity anomaly: (77.899772, -77.681907)
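pandas can also produce the same summary statistics (plus quartiles) in a single call with Series.describe; a sketch on illustrative anomaly values, not the real survey data:

```python
import pandas as pd

grav_values = pd.Series([-106.0, -47.7, 3.0, 122.0])  # illustrative values in mGal
stats = grav_values.describe()
print(stats[["mean", "min", "max"]])
```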
Calculate distance along the profile¶
file['distance'] = gdf.distance(gdf.shift(1))
distance=round(sum(file.distance[file.distance<15000])/1000)
print('Distance along the profile: %s km' %distance)
Distance along the profile: 36654 km
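The 15000 m threshold above drops the large jumps that occur between separate flight lines, so they do not inflate the along-track total. The idea can be sketched with illustrative segment lengths:

```python
import pandas as pd

# Illustrative point-to-point distances in metres; the 250 km value
# mimics the jump between two flight lines and is excluded
seg = pd.Series([70.0, 68.0, 250000.0, 71.0])
total_km = seg[seg < 15000].sum() / 1000
print(total_km)  # 0.209
```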
Calculate number of points¶
nb_points= len(file)
print('Number of points: %s' %nb_points)
Number of points: 576187
Looking at the gravity anomaly along the profile¶
file['distance'] = file['distance'].cumsum() #To have the cumulative sum of the distance
plt.scatter(file.distance, file[grav])
plt.xlabel('Distance (m)')
plt.ylabel('%s mGal' %grav)
plt.title('Free air anomaly along the profile')
Text(0.5, 1.0, 'Free air anomaly along the profile')