General Dataset Utilities#

Authors:

Date: 05/26/22

Overview#

This notebook demonstrates the use of general utility methods available in xcdat, including the reorientation of the longitude axis, centering of time coordinates using time bounds, and adding and getting bounds.

[1]:
import xcdat

Open a dataset#

Datasets can be opened and read using open_dataset() or open_mfdataset() (multi-file).

Related APIs:

[2]:
dataset_links = [
    "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_187001_189412.nc",
    "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_189501_191912.nc",
]

[3]:
# NOTE: Opening a multi-file dataset will result in data variables to be dask
# arrays.
ds = xcdat.open_mfdataset(dataset_links)
[4]:
ds
[4]:
<xarray.Dataset>
Dimensions:    (lat: 180, lon: 360, nbnd: 2, time: 600)
Coordinates:
  * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lon        (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * time       (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Dimensions without coordinates: nbnd
Data variables:
    lat_bnds   (lat, nbnd) float64 dask.array<chunksize=(180, 2), meta=np.ndarray>
    lon_bnds   (lon, nbnd) float64 dask.array<chunksize=(360, 2), meta=np.ndarray>
    gw         (lat) float64 dask.array<chunksize=(180,), meta=np.ndarray>
    time_bnds  (time, nbnd) object dask.array<chunksize=(300, 2), meta=np.ndarray>
    area       (lat, lon) float64 dask.array<chunksize=(180, 360), meta=np.ndarray>
    TS         (time, lat, lon) float32 dask.array<chunksize=(300, 180, 360), meta=np.ndarray>
Attributes: (12/21)
    ne:                              30
    np:                              4
    Conventions:                     CF-1.0
    source:                          CAM
    case:                            20180622.DECKv1b_A2_1850aeroF.ne30_oEC.e...
    title:                           UNSET
    ...                              ...
    remap_script:                    ncremap
    remap_hostname:                  acme1
    remap_version:                   4.9.6
    map_file:                        /export/zender1/data/maps/map_ne30np4_to...
    input_file:                      /p/user_pub/e3sm/baldwin32/workshop/amip...
    DODS_EXTRA.Unlimited_Dimension:  time

Reorient the longitude axis#

Longitude can be represented from 0 to 360 E or as 180 W to 180 E. xcdat allows you to convert between these axes systems.

  • Related API: xcdat.swap_lon_axis()

  • Alternative solution: xcdat.open_mfdataset(dataset_links, lon_orient=(-180, 180))

[5]:
ds.lon
[5]:
<xarray.DataArray 'lon' (lon: 360)>
array([  0.5,   1.5,   2.5, ..., 357.5, 358.5, 359.5])
Coordinates:
  * lon      (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5
Attributes:
    long_name:      Longitude of Grid Cell Centers
    standard_name:  longitude
    units:          degrees_east
    axis:           X
    valid_min:      0.0
    valid_max:      360.0
    bounds:         lon_bnds
[6]:
ds2 = xcdat.swap_lon_axis(ds, to=(-180,180))
[7]:
ds2.lon
[7]:
<xarray.DataArray 'lon' (lon: 360)>
array([-179.5, -178.5, -177.5, ...,  177.5,  178.5,  179.5])
Coordinates:
  * lon      (lon) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
Attributes:
    long_name:      Longitude of Grid Cell Centers
    standard_name:  longitude
    units:          degrees_east
    axis:           X
    valid_min:      0.0
    valid_max:      360.0
    bounds:         lon_bnds

Center the time coordinates#

A given point of time often represents some time period (e.g., a monthly average). In this situation, data providers sometimes record the time as the beginning, middle, or end of the period. center_times() places the time coordinate in the center of the time interval (using time bounds to determine the center of the period).

  • Related API: xcdat.center_times()

  • Alternative solution: xcdat.open_mfdataset(dataset_links, center_times=True)

The time bounds used for centering time coordinates:

[8]:
# We access the values with .values because it is a dask array.
ds.time_bnds.values
[8]:
array([[cftime.DatetimeNoLeap(1870, 1, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 4, 1, 0, 0, 0, 0, has_year_zero=True)],
       ...,
       [cftime.DatetimeNoLeap(1919, 10, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1919, 11, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1919, 11, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1919, 12, 1, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1919, 12, 1, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1920, 1, 1, 0, 0, 0, 0, has_year_zero=True)]],
      dtype=object)

Before centering time coordinates:

[9]:
ds.time
[9]:
<xarray.DataArray 'time' (time: 600)>
array([cftime.DatetimeNoLeap(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1870, 4, 1, 0, 0, 0, 0, has_year_zero=True), ...,
       cftime.DatetimeNoLeap(1919, 11, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1919, 12, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1920, 1, 1, 0, 0, 0, 0, has_year_zero=True)],
      dtype=object)
Coordinates:
  * time     (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Attributes:
    long_name:     time
    bounds:        time_bnds
    cell_methods:  time: mean
[10]:
ds3 = xcdat.center_times(ds)

After centering time coordinates:

[11]:
ds3.time
[11]:
<xarray.DataArray 'time' (time: 600)>
array([cftime.DatetimeNoLeap(1870, 1, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1870, 2, 15, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1870, 3, 16, 12, 0, 0, 0, has_year_zero=True),
       ...,
       cftime.DatetimeNoLeap(1919, 10, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1919, 11, 16, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1919, 12, 16, 12, 0, 0, 0, has_year_zero=True)],
      dtype=object)
Coordinates:
  * time     (time) object 1870-01-16 12:00:00 ... 1919-12-16 12:00:00
Attributes:
    long_name:     time
    bounds:        time_bnds
    cell_methods:  time: mean

Add bounds#

Bounds are critical to many xcdat operations. For example, they are used in determining the weights in spatial or temporal averages and in regridding operations. add_bounds() will attempt to produce bounds if they do not exist in the original dataset.

  • Related API: xarray.Dataset.bounds.add_bounds()

  • Alternative solution: xcdat.open_mfdataset(dataset_links, add_bounds=True)

    • (Assuming the file doesn’t already have bounds for your desired axis/axes)

[12]:
# We are dropping the existing bounds to demonstrate adding bounds.
ds4 = ds.drop_vars("time_bnds")
[13]:
try:
    ds4.bounds.get_bounds("T")
except KeyError as e:
    print(e)
"Bounds were not found for the coordinate variable 'time'. They must be added (Dataset.bounds.add_bounds)."
[14]:
# A `width` kwarg can be specified, which is width of the bounds relative to
# the position of the nearest points. The default value is 0.5.
ds4 = ds4.bounds.add_bounds("T", width=0.5)
[15]:
ds4.bounds.get_bounds("T")
[15]:
<xarray.DataArray 'time_bnds' (time: 600, bnds: 2)>
array([[cftime.DatetimeNoLeap(1870, 1, 18, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 2, 15, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1870, 2, 15, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 3, 16, 12, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1870, 3, 16, 12, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1870, 4, 16, 0, 0, 0, 0, has_year_zero=True)],
       ...,
       [cftime.DatetimeNoLeap(1919, 10, 16, 12, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1919, 11, 16, 0, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1919, 11, 16, 0, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1919, 12, 16, 12, 0, 0, 0, has_year_zero=True)],
       [cftime.DatetimeNoLeap(1919, 12, 16, 12, 0, 0, 0, has_year_zero=True),
        cftime.DatetimeNoLeap(1920, 1, 16, 12, 0, 0, 0, has_year_zero=True)]],
      dtype=object)
Coordinates:
  * time     (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Dimensions without coordinates: bnds
Attributes:
    xcdat_bounds:  True

Add missing bounds for all axes supported by xcdat (X, Y, T, Z)#

[16]:
# We drop the dataset axes bounds to demonstrate generating missing bounds.
ds5 = ds.drop_vars(["time_bnds", "lat_bnds", "lon_bnds"])
[17]:
ds5
[17]:
<xarray.Dataset>
Dimensions:  (lat: 180, lon: 360, time: 600)
Coordinates:
  * lat      (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
  * lon      (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5
  * time     (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Data variables:
    gw       (lat) float64 dask.array<chunksize=(180,), meta=np.ndarray>
    area     (lat, lon) float64 dask.array<chunksize=(180, 360), meta=np.ndarray>
    TS       (time, lat, lon) float32 dask.array<chunksize=(300, 180, 360), meta=np.ndarray>
Attributes: (12/21)
    ne:                              30
    np:                              4
    Conventions:                     CF-1.0
    source:                          CAM
    case:                            20180622.DECKv1b_A2_1850aeroF.ne30_oEC.e...
    title:                           UNSET
    ...                              ...
    remap_script:                    ncremap
    remap_hostname:                  acme1
    remap_version:                   4.9.6
    map_file:                        /export/zender1/data/maps/map_ne30np4_to...
    input_file:                      /p/user_pub/e3sm/baldwin32/workshop/amip...
    DODS_EXTRA.Unlimited_Dimension:  time
[18]:
ds5 = ds5.bounds.add_missing_bounds(width=0.5)
[19]:
ds5
[19]:
<xarray.Dataset>
Dimensions:    (lat: 180, lon: 360, time: 600, bnds: 2)
Coordinates:
  * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lon        (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * time       (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Dimensions without coordinates: bnds
Data variables:
    gw         (lat) float64 dask.array<chunksize=(180,), meta=np.ndarray>
    area       (lat, lon) float64 dask.array<chunksize=(180, 360), meta=np.ndarray>
    TS         (time, lat, lon) float32 dask.array<chunksize=(300, 180, 360), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 0.0 1.0 1.0 2.0 ... 358.0 359.0 359.0 360.0
    lat_bnds   (lat, bnds) float64 -90.0 -89.0 -89.0 -88.0 ... 89.0 89.0 90.0
    time_bnds  (time, bnds) object 1870-01-18 00:00:00 ... 1920-01-16 12:00:00
Attributes: (12/21)
    ne:                              30
    np:                              4
    Conventions:                     CF-1.0
    source:                          CAM
    case:                            20180622.DECKv1b_A2_1850aeroF.ne30_oEC.e...
    title:                           UNSET
    ...                              ...
    remap_script:                    ncremap
    remap_hostname:                  acme1
    remap_version:                   4.9.6
    map_file:                        /export/zender1/data/maps/map_ne30np4_to...
    input_file:                      /p/user_pub/e3sm/baldwin32/workshop/amip...
    DODS_EXTRA.Unlimited_Dimension:  time

Get an axis coordinate variable#

In xarray, you can get a coordinate variable by directly referencing its name (e.g., ds.lat).

xcdat provides an alternative way to get coordinate variable agnostically by simply passing the name of the axis to the related API.

This API requires that either the axis attr or standard_name attr is set, or the name of the dimension follows the valid short-hand convention (e.g., ‘lat’ for latitude).

1. axis attr#

[20]:
ds.lat.attrs["axis"]
[20]:
'Y'

2. standard_name attr#

[21]:
ds.lat.attrs["standard_name"]
[21]:
'latitude'

3. Name of the dimension#

Must be the short name (e.g., “lat” for latitude and “lon” for longitude)

[22]:
"lat" in ds.dims
[22]:
True

Requires at least at one of the three keys above to be set.

[24]:
xcdat.get_coord_var(ds, axis="Y")
[24]:
<xarray.DataArray 'lat' (lat: 180)>
array([-89.5, -88.5, -87.5, -86.5, -85.5, -84.5, -83.5, -82.5, -81.5, -80.5,
       -79.5, -78.5, -77.5, -76.5, -75.5, -74.5, -73.5, -72.5, -71.5, -70.5,
       -69.5, -68.5, -67.5, -66.5, -65.5, -64.5, -63.5, -62.5, -61.5, -60.5,
       -59.5, -58.5, -57.5, -56.5, -55.5, -54.5, -53.5, -52.5, -51.5, -50.5,
       -49.5, -48.5, -47.5, -46.5, -45.5, -44.5, -43.5, -42.5, -41.5, -40.5,
       -39.5, -38.5, -37.5, -36.5, -35.5, -34.5, -33.5, -32.5, -31.5, -30.5,
       -29.5, -28.5, -27.5, -26.5, -25.5, -24.5, -23.5, -22.5, -21.5, -20.5,
       -19.5, -18.5, -17.5, -16.5, -15.5, -14.5, -13.5, -12.5, -11.5, -10.5,
        -9.5,  -8.5,  -7.5,  -6.5,  -5.5,  -4.5,  -3.5,  -2.5,  -1.5,  -0.5,
         0.5,   1.5,   2.5,   3.5,   4.5,   5.5,   6.5,   7.5,   8.5,   9.5,
        10.5,  11.5,  12.5,  13.5,  14.5,  15.5,  16.5,  17.5,  18.5,  19.5,
        20.5,  21.5,  22.5,  23.5,  24.5,  25.5,  26.5,  27.5,  28.5,  29.5,
        30.5,  31.5,  32.5,  33.5,  34.5,  35.5,  36.5,  37.5,  38.5,  39.5,
        40.5,  41.5,  42.5,  43.5,  44.5,  45.5,  46.5,  47.5,  48.5,  49.5,
        50.5,  51.5,  52.5,  53.5,  54.5,  55.5,  56.5,  57.5,  58.5,  59.5,
        60.5,  61.5,  62.5,  63.5,  64.5,  65.5,  66.5,  67.5,  68.5,  69.5,
        70.5,  71.5,  72.5,  73.5,  74.5,  75.5,  76.5,  77.5,  78.5,  79.5,
        80.5,  81.5,  82.5,  83.5,  84.5,  85.5,  86.5,  87.5,  88.5,  89.5])
Coordinates:
  * lat      (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
Attributes:
    long_name:      Latitude of Grid Cell Centers
    standard_name:  latitude
    units:          degrees_north
    axis:           Y
    valid_min:      -90.0
    valid_max:      90.0
    bounds:         lat_bnds