🔧
Working with scientific data formats in GIS
  • 📔Intro notes
  • 😃"Good" HDF
  • 😁"Good" GRIB
  • 🧐"Not-so-good" NetCDF
  • ☕Reading Sentinel-3 data
Powered by GitBook
On this page
  • Data source
  • QGIS
  • Python + Xarray
  • Open dataset
  • Plot data variable
  • Clip data to AOI
  • Resample
  • Save data
  • View in QGIS

"Good" GRIB

GRIB format is widely used for weather reanalysis and forecast.

Previous"Good" HDFNext"Not-so-good" NetCDF

Last updated 2 years ago

GRIB files could contain multiple data layers, and working with them in GIS could be difficult. But tools like xarray could help pre-process them.

Data source

.

The downloaded GRIB file contains hourly Surface pressure and Sea surface temperature for the 13th-17th of March 2023.

QGIS

GRIB files are supported by QGIS and could be easily added to the map. But not all the metadata from the file is displayed properly.

The layer is georeferenced, but sub-dataset names are not read properly, and time in the metadata is read as a Unix time tag.

Python + Xarray

Xarray is a Python library for working with multidimensional data. It provides the capability of reading almost every multidimensional data format along with its metadata, as well as subsetting, resampling, and saving the data. It is convenient to use Jupyter Notebook GUI to study and work with datasets.

Open dataset

import xarray as xr

ds = xr.open_dataset('era5_sst_sp.grib')

It is very useful to study the dataset structure by printing it.

You can see, that dataset has 3 dimensions: time, latitude, and longitude. 2 data variables are listed: sst - sea surface temperature, and sp - surface pressure, they have the same dimensions: 120*721*1440. Every variable line could be expanded to view the details.

Plot data variable

As all the dimensions are labeled, it is enough to name the dimension and pass the value for data selection.

import matplotlib.pyplot as plt
import pandas as pd

To view a separate 1-hour layer

plt.imshow(ds.sel(time = pd.to_datetime('2023-03-15T07:00:00')).sst.values, cmap = 'Spectral')
plt.colorbar()
plt.show()

To view values for a specific point

plt.plot(ds.sel(latitude=-55 , longitude=360-40).time,
         ds.sel(latitude=-55 , longitude=360-40).sp)
plt.ylabel('Pa')
plt.grid()

Clip data to AOI

As it was shown before, subsetting by coordinates is very simple, as all the dimensions are labeled with physical values. To subset to South Georgia's extent we will use vector file.

import geopandas as gpd

extent = gpd.read_file('SouthGeorgiaExtent.shp')
extent.total_bounds
> [-42.0, -58.0, -32.0, -52.0]
sg_ds = ds.sel(latitude=slice(-52, -57) , longitude=slice(360-42, 360-32))

To convert temperature to Celsius

sg_ds['sst'] = sg_ds['sst'] - 273.15

Resample

sg_ds_daily = sg_ds.resample(time="D").mean()
sg_ds_monthly = sg_ds.resample(time="M").mean()

Save data

sg_ds_daily.to_netcdf('export_netcdf_daily.nc')
sg_ds_daily['sst'].to_netcdf('export_netcdf_daily_sst.nc')

View in QGIS

The output file is georeferenced, has variable name and readable time.

Converting NetCDF to GeoTIFF will be covered in the next section.

The full notebook download is below. To run it, , save the notebook to your working directory, and run jupyter notebook from this directory. Data for the notebook is not provided.

😁
ERA5 hourly data on single levels from 1940 to present
set up conda environment
381KB
ERA5_GRIB_exp.ipynb
QGIS doesn't show sub-datasets names
GRIB layer properties
dataset printout by xarray
Note, that longitude hase 0-360 values, not -180 +180
Plot of surface pressure for point (-50, -40)
Resulting dataset has dimensions 120*21*45
Resulting dataset has dimensions 5*21*45, as it was resampled tio daily means