General Dataset Utilities#

Authors:

Date: 05/26/22

Overview#

This notebook demonstrates the use of general utility methods available in xcdat, including the reorientation of the longitude axis, centering of time coordinates using time bounds, and adding and getting bounds.

Notebook Setup#

Create an Anaconda environment for this notebook using the command below, then select the kernel in Jupyter.

conda create -n xcdat_notebook -c conda-forge python xarray netcdf4 xcdat xesmf matplotlib nc-time-axis jupyter
  • xesmf is required for horizontal regridding with xESMF

  • matplotlib is an optional dependency required for plotting with xarray

  • nc-time-axis is an optional dependency required for matplotlib to plot cftime coordinates

[1]:
import xcdat

Open a dataset#

Datasets can be opened and read using open_dataset() or open_mfdataset() (multi-file).

Related APIs:

[2]:
dataset_links = [
    "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_187001_189412.nc",
    "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_189501_191912.nc",
]

[3]:
# NOTE: Opening a multi-file dataset will result in data variables to be dask
# arrays.
ds = xcdat.open_mfdataset(dataset_links)
[4]:
ds
[4]:
<xarray.Dataset>
Dimensions:    (lat: 180, lon: 360, nbnd: 2, time: 600)
Coordinates:
  * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lon        (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * time       (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Dimensions without coordinates: nbnd
Data variables:
    lat_bnds   (lat, nbnd) float64 dask.array<chunksize=(180, 2), meta=np.ndarray>
    lon_bnds   (lon, nbnd) float64 dask.array<chunksize=(360, 2), meta=np.ndarray>
    gw         (lat) float64 dask.array<chunksize=(180,), meta=np.ndarray>
    time_bnds  (time, nbnd) object dask.array<chunksize=(300, 2), meta=np.ndarray>
    area       (lat, lon) float64 dask.array<chunksize=(180, 360), meta=np.ndarray>
    TS         (time, lat, lon) float32 dask.array<chunksize=(300, 180, 360), meta=np.ndarray>
Attributes: (12/21)
    ne:                              30
    np:                              4
    Conventions:                     CF-1.0
    source:                          CAM
    case:                            20180622.DECKv1b_A2_1850aeroF.ne30_oEC.e...
    title:                           UNSET
    ...                              ...
    remap_script:                    ncremap
    remap_hostname:                  acme1
    remap_version:                   4.9.6
    map_file:                        /export/zender1/data/maps/map_ne30np4_to...
    input_file:                      /p/user_pub/e3sm/baldwin32/workshop/amip...
    DODS_EXTRA.Unlimited_Dimension:  time

Reorient the longitude axis#

Longitude can be represented from 0 to 360 E or as 180 W to 180 E. xcdat allows you to convert between these axes systems.

  • Related API: xcdat.swap_lon_axis()

  • Alternative solution: xcdat.open_mfdataset(dataset_links, lon_orient=(-180, 180))

[5]:
ds.lon
[5]:
<xarray.DataArray 'lon' (lon: 360)>
array([  0.5,   1.5,   2.5, ..., 357.5, 358.5, 359.5])
Coordinates:
  * lon      (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5
Attributes:
    long_name:      Longitude of Grid Cell Centers
    standard_name:  longitude
    units:          degrees_east
    axis:           X
    valid_min:      0.0
    valid_max:      360.0
    bounds:         lon_bnds
[6]:
ds2 = xcdat.swap_lon_axis(ds, to=(-180,180))
[7]:
ds2.lon
[7]:
<xarray.DataArray 'lon' (lon: 360)>
array([-179.5, -178.5, -177.5, ...,  177.5,  178.5,  179.5])
Coordinates:
  * lon      (lon) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
Attributes:
    long_name:      Longitude of Grid Cell Centers
    standard_name:  longitude
    units:          degrees_east
    axis:           X
    valid_min:      0.0
    valid_max:      360.0
    bounds:         lon_bnds

Center the time coordinates#

A given point of time often represents some time period (e.g., a monthly average). In this situation, data providers sometimes record the time as the beginning, middle, or end of the period. center_times() places the time coordinate in the center of the time interval (using time bounds to determine the center of the period).

  • Related API: xcdat.center_times()

  • Alternative solution: xcdat.open_mfdataset(dataset_links, center_times=True)

The time bounds used for centering time coordinates:

[8]:
# We access the values with .values because it is a dask array.
ds.time_bnds.values
[8]:
array([[cftime.DatetimeNoLeap(1870, 1, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 4, 1, 0, 0, 0, 0, has_year_zero=True)],
       ...,
       [cftime.DatetimeNoLeap(1919, 10, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1919, 11, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1919, 11, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1919, 12, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1919, 12, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1920, 1, 1, 0, 0, 0, 0, has_year_zero=True)]],
      dtype=object)

Before centering time coordinates:

[9]:
ds.time
[9]:
<xarray.DataArray 'time' (time: 600)>
array([cftime.DatetimeNoLeap(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1870, 4, 1, 0, 0, 0, 0, has_year_zero=True), ...,
       cftime.DatetimeNoLeap(1919, 11, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1919, 12, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1920, 1, 1, 0, 0, 0, 0, has_year_zero=True)],
      dtype=object)
Coordinates:
  * time     (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Attributes:
    long_name:     time
    bounds:        time_bnds
    cell_methods:  time: mean
[10]:
ds3 = xcdat.center_times(ds)

After centering time coordinates:

[11]:
ds3.time
[11]:
<xarray.DataArray 'time' (time: 600)>
array([cftime.DatetimeNoLeap(1870, 1, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1870, 2, 15, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1870, 3, 16, 12, 0, 0, 0, has_year_zero=True),
       ...,
       cftime.DatetimeNoLeap(1919, 10, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1919, 11, 16, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1919, 12, 16, 12, 0, 0, 0, has_year_zero=True)],
      dtype=object)
Coordinates:
  * time     (time) object 1870-01-16 12:00:00 ... 1919-12-16 12:00:00
Attributes:
    long_name:     time
    bounds:        time_bnds
    cell_methods:  time: mean

Add bounds#

Bounds are critical to many xcdat operations. For example, they are used in determining the weights in spatial or temporal averages and in regridding operations. add_bounds() will attempt to produce bounds if they do not exist in the original dataset.

  • Related API: xarray.Dataset.bounds.add_bounds()

  • Alternative solution: xcdat.open_mfdataset(dataset_links, add_bounds=True)

    • (Assuming the file doesn’t already have bounds for your desired axis/axes)

[12]:
# We are dropping the existing bounds to demonstrate adding bounds.
ds4 = ds.drop_vars("time_bnds")
[13]:
try:
    ds4.bounds.get_bounds("T")
except KeyError as e:
    print(e)
"Bounds were not found for the coordinate variable 'time'. They must be added (Dataset.bounds.add_bounds)."
[14]:
# A `width` kwarg can be specified, which is width of the bounds relative to
# the position of the nearest points. The default value is 0.5.
ds4 = ds4.bounds.add_bounds("T", width=0.5)
[15]:
ds4.bounds.get_bounds("T")
[15]:
<xarray.DataArray 'time_bnds' (time: 600, bnds: 2)>
array([[cftime.DatetimeNoLeap(1870, 1, 18, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 2, 15, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1870, 2, 15, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 3, 16, 12, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1870, 3, 16, 12, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 4, 16, 0, 0, 0, 0, has_year_zero=True)],
       ...,
       [cftime.DatetimeNoLeap(1919, 10, 16, 12, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1919, 11, 16, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1919, 11, 16, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1919, 12, 16, 12, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1919, 12, 16, 12, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1920, 1, 16, 12, 0, 0, 0, has_year_zero=True)]],
      dtype=object)
Coordinates:
  * time     (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Dimensions without coordinates: bnds
Attributes:
    xcdat_bounds:  True

Add missing bounds for all axes supported by xcdat (X, Y, T, Z)#

[16]:
# We drop the dataset axes bounds to demonstrate generating missing bounds.
ds5 = ds.drop_vars(["time_bnds", "lat_bnds", "lon_bnds"])
[17]:
ds5
[17]:
<xarray.Dataset>
Dimensions:  (lat: 180, lon: 360, time: 600)
Coordinates:
  * lat      (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
  * lon      (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5
  * time     (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Data variables:
    gw       (lat) float64 dask.array<chunksize=(180,), meta=np.ndarray>
    area     (lat, lon) float64 dask.array<chunksize=(180, 360), meta=np.ndarray>
    TS       (time, lat, lon) float32 dask.array<chunksize=(300, 180, 360), meta=np.ndarray>
Attributes: (12/21)
    ne:                              30
    np:                              4
    Conventions:                     CF-1.0
    source:                          CAM
    case:                            20180622.DECKv1b_A2_1850aeroF.ne30_oEC.e...
    title:                           UNSET
    ...                              ...
    remap_script:                    ncremap
    remap_hostname:                  acme1
    remap_version:                   4.9.6
    map_file:                        /export/zender1/data/maps/map_ne30np4_to...
    input_file:                      /p/user_pub/e3sm/baldwin32/workshop/amip...
    DODS_EXTRA.Unlimited_Dimension:  time
[18]:
ds5 = ds5.bounds.add_missing_bounds(width=0.5)
[19]:
ds5
[19]:
<xarray.Dataset>
Dimensions:    (lat: 180, lon: 360, time: 600, bnds: 2)
Coordinates:
  * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lon        (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * time       (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Dimensions without coordinates: bnds
Data variables:
    gw         (lat) float64 dask.array<chunksize=(180,), meta=np.ndarray>
    area       (lat, lon) float64 dask.array<chunksize=(180, 360), meta=np.ndarray>
    TS         (time, lat, lon) float32 dask.array<chunksize=(300, 180, 360), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 0.0 1.0 1.0 2.0 ... 358.0 359.0 359.0 360.0
    lat_bnds   (lat, bnds) float64 -90.0 -89.0 -89.0 -88.0 ... 89.0 89.0 90.0
    time_bnds  (time, bnds) object 1870-01-18 00:00:00 ... 1920-01-16 12:00:00
Attributes: (12/21)
    ne:                              30
    np:                              4
    Conventions:                     CF-1.0
    source:                          CAM
    case:                            20180622.DECKv1b_A2_1850aeroF.ne30_oEC.e...
    title:                           UNSET
    ...                              ...
    remap_script:                    ncremap
    remap_hostname:                  acme1
    remap_version:                   4.9.6
    map_file:                        /export/zender1/data/maps/map_ne30np4_to...
    input_file:                      /p/user_pub/e3sm/baldwin32/workshop/amip...
    DODS_EXTRA.Unlimited_Dimension:  time

Get the dimension coordinates for an axis.#

In xarray, you can get a dimension coordinates by directly referencing its name (e.g., ds.lat). xcdat provides an alternative way to get dimension coordinates agnostically by simply passing the CF axis key to applicable APIs.

Helpful knowledge:

  • This API uses cf_xarray to interpret CF axis names and coordinate names in the xarray object attributes. Refer to Metadata Interpretation for more information.

Xarray documentation on coordinates (source):

  • There are two types of coordinates in xarray:

    • dimension coordinates are one dimensional coordinates with a name equal to their sole dimension (marked by * when printing a dataset or data array). They are used for label based indexing and alignment, like the index found on a pandas DataFrame or Series. Indeed, these “dimension” coordinates use a pandas.Index internally to store their values.

    • non-dimension coordinates are variables that contain coordinate data, but are not a dimension coordinate. They can be multidimensional (see Working with Multidimensional Coordinates), and there is no relationship between the name of a non-dimension coordinate and the name(s) of its dimension(s). Non-dimension coordinates can be useful for indexing or plotting; otherwise, xarray does not make any direct use of the values associated with them. They are not used for alignment or automatic indexing, nor are they required to match when doing arithmetic (see Coordinates).

  • Xarray’s terminology differs from the CF terminology, where the “dimension coordinates” are called “coordinate variables”, and the “non-dimension coordinates” are called “auxiliary coordinate variables” (see GH1295 for more details).

1. axis attr#

[20]:
ds.lat.attrs["axis"]
[20]:
'Y'

2. standard_name attr#

[21]:
ds.lat.attrs["standard_name"]
[21]:
'latitude'
[22]:
"lat" in ds.dims
[22]:
True
[24]:
xcdat.get_axis_coord(ds, axis="Y")
[24]:
<xarray.DataArray 'lat' (lat: 180)>
array([-89.5, -88.5, -87.5, -86.5, -85.5, -84.5, -83.5, -82.5, -81.5, -80.5,
       -79.5, -78.5, -77.5, -76.5, -75.5, -74.5, -73.5, -72.5, -71.5, -70.5,
       -69.5, -68.5, -67.5, -66.5, -65.5, -64.5, -63.5, -62.5, -61.5, -60.5,
       -59.5, -58.5, -57.5, -56.5, -55.5, -54.5, -53.5, -52.5, -51.5, -50.5,
       -49.5, -48.5, -47.5, -46.5, -45.5, -44.5, -43.5, -42.5, -41.5, -40.5,
       -39.5, -38.5, -37.5, -36.5, -35.5, -34.5, -33.5, -32.5, -31.5, -30.5,
       -29.5, -28.5, -27.5, -26.5, -25.5, -24.5, -23.5, -22.5, -21.5, -20.5,
       -19.5, -18.5, -17.5, -16.5, -15.5, -14.5, -13.5, -12.5, -11.5, -10.5,
        -9.5,  -8.5,  -7.5,  -6.5,  -5.5,  -4.5,  -3.5,  -2.5,  -1.5,  -0.5,
         0.5,   1.5,   2.5,   3.5,   4.5,   5.5,   6.5,   7.5,   8.5,   9.5,
        10.5,  11.5,  12.5,  13.5,  14.5,  15.5,  16.5,  17.5,  18.5,  19.5,
        20.5,  21.5,  22.5,  23.5,  24.5,  25.5,  26.5,  27.5,  28.5,  29.5,
        30.5,  31.5,  32.5,  33.5,  34.5,  35.5,  36.5,  37.5,  38.5,  39.5,
        40.5,  41.5,  42.5,  43.5,  44.5,  45.5,  46.5,  47.5,  48.5,  49.5,
        50.5,  51.5,  52.5,  53.5,  54.5,  55.5,  56.5,  57.5,  58.5,  59.5,
        60.5,  61.5,  62.5,  63.5,  64.5,  65.5,  66.5,  67.5,  68.5,  69.5,
        70.5,  71.5,  72.5,  73.5,  74.5,  75.5,  76.5,  77.5,  78.5,  79.5,
        80.5,  81.5,  82.5,  83.5,  84.5,  85.5,  86.5,  87.5,  88.5,  89.5])
Coordinates:
  * lat      (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
Attributes:
    long_name:      Latitude of Grid Cell Centers
    standard_name:  latitude
    units:          degrees_north
    axis:           Y
    valid_min:      -90.0
    valid_max:      90.0
    bounds:         lat_bnds