# General Dataset Utilities

Authors:

- [Tom Vo](https://github.com/tomvothecoder/)
- [Stephen Po-Chedley](https://github.com/pochedls/)

Date: 05/26/22


## Overview

This notebook demonstrates the use of general utility methods available in `xcdat`, including
the reorientation of the longitude axis, centering of time coordinates using time bounds, and
adding and getting bounds.


## Notebook Setup

Create an Anaconda environment for this notebook using the command below, then select the
kernel in Jupyter.

```bash
conda create -n xcdat_notebook -c conda-forge python xarray netcdf4 xcdat xesmf matplotlib nc-time-axis jupyter
```

- `xesmf` is required for horizontal regridding with xESMF
- `matplotlib` is an optional dependency required for plotting with xarray
- `nc-time-axis` is an optional dependency required for `matplotlib` to plot `cftime` coordinates


In [1]:
import xcdat

## Open a dataset

Datasets can be opened and read using `open_dataset()` or `open_mfdataset()` (multi-file).

Related APIs:

- [xcdat.open_dataset()](../generated/xcdat.open_dataset.rst)
- [xcdat.open_mfdataset()](../generated/xcdat.open_mfdataset.rst)


In [2]:
dataset_links = [
    "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_187001_189412.nc",
    "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_189501_191912.nc",
]


In [3]:
# NOTE: Opening a multi-file dataset results in the data variables being dask
# arrays.
ds = xcdat.open_mfdataset(dataset_links)

In [4]:
ds

Dask array summaries from the dataset repr:

| Array bytes | Chunk bytes | Array shape | Chunk shape | Tasks | Chunks | Type |
| --- | --- | --- | --- | --- | --- | --- |
| 2.81 kiB | 2.81 kiB | (180, 2) | (180, 2) | 5 | 1 | float64 numpy.ndarray |
| 5.62 kiB | 5.62 kiB | (360, 2) | (360, 2) | 5 | 1 | float64 numpy.ndarray |
| 1.41 kiB | 1.41 kiB | (180,) | (180,) | 5 | 1 | float64 numpy.ndarray |
| 9.38 kiB | 4.69 kiB | (600, 2) | (300, 2) | 6 | 2 | object numpy.ndarray |
| 506.25 kiB | 506.25 kiB | (180, 360) | (180, 360) | 5 | 1 | float64 numpy.ndarray |
| 148.32 MiB | 74.16 MiB | (600, 180, 360) | (300, 180, 360) | 6 | 2 | float32 numpy.ndarray |


## Reorient the longitude axis

Longitude can be represented as 0 to 360 E or as 180 W to 180 E. `xcdat` allows you to convert between these two axis conventions.

- Related API: [xcdat.swap_lon_axis()](../generated/xcdat.swap_lon_axis.rst)
- Alternative solution: `xcdat.open_mfdataset(dataset_links, lon_orient=(-180, 180))`


In [5]:
ds.lon

In [6]:
ds2 = xcdat.swap_lon_axis(ds, to=(-180, 180))

In [7]:
ds2.lon
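
The conversion between the two conventions comes down to modular arithmetic. The sketch below uses plain NumPy on a hypothetical six-point longitude array to show the idea. It is not `xcdat`'s implementation, which also has to keep the dataset's data variables and longitude bounds consistent with the reordered axis.

```python
import numpy as np

# Hypothetical cell-center longitudes on a 0 to 360 E axis (60-degree spacing).
lon_360 = np.array([30.0, 90.0, 150.0, 210.0, 270.0, 330.0])

# Map to the 180 W to 180 E convention: values at or above 180 become negative.
lon_180 = ((lon_360 + 180.0) % 360.0) - 180.0

# Wrapping leaves the axis non-monotonic, so sort it. The same index order
# would also have to be applied to any data defined along this axis.
order = np.argsort(lon_180)
lon_180_sorted = lon_180[order]

# The reverse mapping is a single modulo operation.
lon_back = lon_180_sorted % 360.0

print(lon_180_sorted.tolist())  # [-150.0, -90.0, -30.0, 30.0, 90.0, 150.0]
print(lon_back.tolist())        # [210.0, 270.0, 330.0, 30.0, 90.0, 150.0]
```

Note that the values are reordered as well as relabeled, which is why reorienting an axis by hand is easy to get subtly wrong.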

## Center the time coordinates

A time coordinate often represents a time period (e.g., a monthly average), and data providers may record it as the beginning, middle, or end of that period. `center_times()` places each time coordinate at the midpoint of its interval, using the time bounds to determine that midpoint.

- Related API: [xcdat.center_times()](../generated/xcdat.center_times.rst)
- Alternative solution: `xcdat.open_mfdataset(dataset_links, center_times=True)`


The time bounds used for centering time coordinates:


In [8]:
# We access the values with .values because time_bnds is a dask array.
ds.time_bnds.values

array([[cftime.DatetimeNoLeap(1870, 1, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 4, 1, 0, 0, 0, 0, has_year_zero=True)],
       ...,
       [cftime.DatetimeNoLeap(1919, 10, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1919, 11, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1919, 11, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1919, 12, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1919, 12, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1920, 1, 1, 0, 0, 0, 0, has_year_zero=True)]],
      dtype=object)

Before centering time coordinates:


In [9]:
ds.time

In [10]:
ds3 = xcdat.center_times(ds)

After centering time coordinates:


In [11]:
ds3.time
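
The midpoint calculation itself is simple. The sketch below uses the standard library's `datetime` in place of the `cftime` objects shown above, with two hypothetical monthly intervals. It illustrates the arithmetic only and is not `xcdat`'s implementation.

```python
from datetime import datetime

# Hypothetical monthly time bounds as (lower, upper) pairs.
time_bnds = [
    (datetime(1870, 1, 1), datetime(1870, 2, 1)),
    (datetime(1870, 2, 1), datetime(1870, 3, 1)),
]

# The centered time is the lower bound plus half the interval length.
centered = [lower + (upper - lower) / 2 for lower, upper in time_bnds]

for t in centered:
    print(t.isoformat())
# 1870-01-16T12:00:00
# 1870-02-15T00:00:00
```

The 31-day January interval is centered at noon on the 16th, while the 28-day February interval is centered at midnight on the 15th, so centered monthly times do not all fall on the same day of the month.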

## Add bounds

Bounds are critical to many `xcdat` operations. For example, they are used in determining the weights in spatial or temporal averages and in regridding operations. `add_bounds()` will attempt to produce bounds if they do not exist in the original dataset.

- Related API: [xarray.Dataset.bounds.add_bounds()](../generated/xarray.Dataset.bounds.add_bounds.rst)
- Alternative solution: `xcdat.open_mfdataset(dataset_links, add_bounds=True)`
  - (Assuming the file doesn't already have bounds for your desired axis/axes)


In [12]:
# We are dropping the existing bounds to demonstrate adding bounds.
ds4 = ds.drop_vars("time_bnds")

In [13]:
try:
    ds4.bounds.get_bounds("T")
except KeyError as e:
    print(e)

"Bounds were not found for the coordinate variable 'time'. They must be added (Dataset.bounds.add_bounds)."


In [14]:
# The optional `width` kwarg sets the width of the bounds relative to the
# position of the nearest points. The default value is 0.5.
ds4 = ds4.bounds.add_bounds("T", width=0.5)

In [15]:
ds4.bounds.get_bounds("T")
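
To make the role of `width` concrete, the sketch below generates bounds for a hypothetical 1-D coordinate with plain NumPy. It assumes each lower bound sits `width` times the spacing to the previous point below the coordinate value, and each upper bound sits `1 - width` times the spacing to the next point above it. `make_bounds` is a hypothetical helper written for illustration, and this is one reading of the `width` description, not necessarily how `xcdat` computes bounds.

```python
import numpy as np


def make_bounds(points, width=0.5):
    """Return an (n, 2) array of [lower, upper] bounds for 1-D `points`.

    Hypothetical helper for illustration; requires at least two points.
    """
    points = np.asarray(points, dtype=float)
    diffs = np.diff(points)
    # Reuse the first and last spacing for the two edge points.
    diffs = np.concatenate(([diffs[0]], diffs, [diffs[-1]]))
    lower = points - diffs[:-1] * width
    upper = points + diffs[1:] * (1.0 - width)
    return np.column_stack((lower, upper))


# Hypothetical latitude points with 30-degree spacing.
lat = [-60.0, -30.0, 0.0, 30.0, 60.0]
print(make_bounds(lat).tolist())
# [[-75.0, -45.0], [-45.0, -15.0], [-15.0, 15.0], [15.0, 45.0], [45.0, 75.0]]
```

With `width=0.5`, each point sits in the middle of its cell and neighboring cells share an edge.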

## Add missing bounds for all axes supported by xcdat (X, Y, T, Z)

- Related API: [xarray.Dataset.bounds.add_missing_bounds()](../generated/xarray.Dataset.bounds.add_missing_bounds.rst)


In [16]:
# We drop the dataset axes bounds to demonstrate generating missing bounds.
ds5 = ds.drop_vars(["time_bnds", "lat_bnds", "lon_bnds"])

In [17]:
ds5

Dask array summaries from the dataset repr:

| Array bytes | Chunk bytes | Array shape | Chunk shape | Tasks | Chunks | Type |
| --- | --- | --- | --- | --- | --- | --- |
| 1.41 kiB | 1.41 kiB | (180,) | (180,) | 5 | 1 | float64 numpy.ndarray |
| 506.25 kiB | 506.25 kiB | (180, 360) | (180, 360) | 5 | 1 | float64 numpy.ndarray |
| 148.32 MiB | 74.16 MiB | (600, 180, 360) | (300, 180, 360) | 6 | 2 | float32 numpy.ndarray |


In [18]:
ds5 = ds5.bounds.add_missing_bounds(width=0.5)

In [19]:
ds5

Dask array summaries from the dataset repr:

| Array bytes | Chunk bytes | Array shape | Chunk shape | Tasks | Chunks | Type |
| --- | --- | --- | --- | --- | --- | --- |
| 1.41 kiB | 1.41 kiB | (180,) | (180,) | 5 | 1 | float64 numpy.ndarray |
| 506.25 kiB | 506.25 kiB | (180, 360) | (180, 360) | 5 | 1 | float64 numpy.ndarray |
| 148.32 MiB | 74.16 MiB | (600, 180, 360) | (300, 180, 360) | 6 | 2 | float32 numpy.ndarray |


## Get the dimension coordinates for an axis

In `xarray`, you can get a dimension coordinate by directly referencing its name (e.g., `ds.lat`). `xcdat` provides an alternative, name-agnostic way to get dimension coordinates: pass the CF axis key to the applicable APIs.

- Related API: [xcdat.get_dim_coords()](../generated/xcdat.get_dim_coords.rst)

Helpful knowledge:

- This API uses `cf_xarray` to interpret CF axis names and coordinate names in the xarray object attributes. Refer to [Metadata Interpretation](../getting-started-guide/faqs.rst) for more information.

Xarray documentation on coordinates ([source](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#coordinates)):

- There are two types of coordinates in xarray:

  - **dimension coordinates** are one dimensional coordinates with a name equal to their sole dimension (marked by \* when printing a dataset or data array). They are used for label based indexing and alignment, like the index found on a pandas DataFrame or Series. Indeed, these “dimension” coordinates use a pandas.Index internally to store their values.

  - **non-dimension coordinates** are variables that contain coordinate data, but are not a dimension coordinate. They can be multidimensional (see Working with Multidimensional Coordinates), and there is no relationship between the name of a non-dimension coordinate and the name(s) of its dimension(s). Non-dimension coordinates can be useful for indexing or plotting; otherwise, xarray does not make any direct use of the values associated with them. They are not used for alignment or automatic indexing, nor are they required to match when doing arithmetic (see Coordinates).

- Xarray’s terminology differs from the [CF terminology](https://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#terminology), where the “dimension coordinates” are called “coordinate variables”, and the “non-dimension coordinates” are called “auxiliary coordinate variables” (see [GH1295](https://github.com/pydata/xarray/issues/1295) for more details).


### 1. `axis` attr


In [20]:
ds.lat.attrs["axis"]

'Y'

### 2. `standard_name` attr


In [21]:
ds.lat.attrs["standard_name"]

'latitude'

In [22]:
"lat" in ds.dims

True

In [24]:
xcdat.get_dim_coords(ds, axis="Y")
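
The three checks above (the `axis` attribute, the `standard_name` attribute, and the dimension name) suggest how a coordinate can be matched to a CF axis key. The sketch below models that idea in plain Python, with coordinates represented as a hypothetical dictionary of attribute dictionaries. The lookup order, the mapping tables, and the `find_axis_coord` helper are all illustrative assumptions; the actual interpretation is handled by `cf_xarray` and covers many more cases.

```python
# Illustrative mapping tables; real CF metadata interpretation is broader.
STANDARD_NAME_TO_AXIS = {"longitude": "X", "latitude": "Y", "time": "T"}
DIM_NAME_TO_AXIS = {"lon": "X", "lat": "Y", "time": "T"}


def find_axis_coord(coords, axis):
    """Return the name of the first coordinate that maps to the CF `axis` key.

    `coords` maps coordinate names to their attribute dictionaries.
    Hypothetical helper for illustration only.
    """
    # 1. Explicit `axis` attribute.
    for name, attrs in coords.items():
        if attrs.get("axis") == axis:
            return name
    # 2. `standard_name` attribute.
    for name, attrs in coords.items():
        if STANDARD_NAME_TO_AXIS.get(attrs.get("standard_name")) == axis:
            return name
    # 3. Common dimension names.
    for name in coords:
        if DIM_NAME_TO_AXIS.get(name) == axis:
            return name
    raise KeyError(f"No coordinate found for axis '{axis}'.")


# Hypothetical coordinates, each identifiable by a different piece of metadata.
coords = {
    "lat": {"axis": "Y", "standard_name": "latitude"},
    "longitude_deg": {"standard_name": "longitude"},
    "time": {},
}

print(find_axis_coord(coords, "Y"))  # lat
print(find_axis_coord(coords, "X"))  # longitude_deg
print(find_axis_coord(coords, "T"))  # time
```

Looking coordinates up by axis key rather than by name lets the same code work on datasets that label their axes differently (for example, `lat` versus `latitude`).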