General Dataset Utilities#
Authors: Tom Vo & Stephen Po-Chedley
Updated: 03/14/25 [xcdat v0.8.0]
Overview#
This notebook demonstrates the use of general utility methods available in xcdat, including the reorientation of the longitude axis, centering of time coordinates using time bounds, and adding and getting bounds.
The data used in this example can be found in the xcdat-data repository.
Notebook Kernel Setup#
Users can install their own instance of xcdat and follow these examples using their own environment (e.g., with VS Code, Jupyter, Spyder, iPython) or enable xcdat with existing JupyterHub instances.
First, create the conda environment:
conda create -n xcdat_notebook -c conda-forge xcdat xesmf matplotlib ipython ipykernel cartopy nc-time-axis gsw-xarray jupyter pooch
Then install the kernel from the xcdat_notebook environment using ipykernel and name the kernel with the display name (e.g., xcdat_notebook):
python -m ipykernel install --user --name xcdat_notebook --display-name xcdat_notebook
Then select the xcdat_notebook kernel in Jupyter to use it.
[1]:
import xcdat as xc
/opt/miniconda3/envs/xcdat_notebook/lib/python3.13/site-packages/esmpy/interface/loadESMF.py:94: VersionWarning: ESMF installation version 8.8.0, ESMPy version 8.8.0b0
warnings.warn("ESMF installation version {}, ESMPy version {}".format(
Open a dataset#
Datasets can be opened and read using open_dataset() or open_mfdataset() (multi-file).
Related APIs: xcdat.open_dataset() & xcdat.open_mfdataset()
Let’s use the example dataset and save it to netCDF (.nc), then open it with xCDAT.
[2]:
ds = xc.tutorial.open_dataset("tas_amon_access")
ds.to_netcdf("tas_amon_access.nc")
[3]:
# NOTE: Opening a dataset with open_mfdataset() returns the data variables as
# dask arrays.
ds = xc.open_mfdataset("tas_amon_access.nc")
# print dataset
ds
[3]:
<xarray.Dataset> Size: 7MB
Dimensions: (time: 60, bnds: 2, lat: 145, lon: 192)
Coordinates:
* lat (lat) float64 1kB -90.0 -88.75 -87.5 -86.25 ... 87.5 88.75 90.0
* lon (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 354.4 356.2 358.1
height float64 8B ...
* time (time) object 480B 1870-01-16 12:00:00 ... 1874-12-16 12:00:00
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) object 960B dask.array<chunksize=(1, 2), meta=np.ndarray>
lat_bnds (lat, bnds) float64 2kB dask.array<chunksize=(145, 2), meta=np.ndarray>
lon_bnds (lon, bnds) float64 3kB dask.array<chunksize=(192, 2), meta=np.ndarray>
tas (time, lat, lon) float32 7MB dask.array<chunksize=(1, 145, 192), meta=np.ndarray>
Attributes: (12/48)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
variant_label: r10i1p1f1
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time
Reorient the longitude axis#
Longitude can be represented as 0 to 360 E or as 180 W to 180 E. xcdat allows you to convert between these two axis orientations.
Related API: xcdat.swap_lon_axis()
Alternative solution:
xcdat.open_mfdataset(dataset_links, lon_orient=(-180, 180))
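As a rough illustration of the arithmetic behind reorienting the axis, the sketch below wraps a few 0 to 360 longitudes into the -180 to 180 range and then sorts them. It uses plain Python only; the helper name wrap_to_180 is hypothetical, and this is not xcdat's implementation, which operates on the whole dataset rather than on a bare list of values.

```python
def wrap_to_180(lon):
    """Map a longitude in degrees east onto the half-open range [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


# A few sample longitudes on the 0 to 360 axis (taken from the grid above).
lons_360 = [0.0, 90.0, 178.125, 180.0, 181.875, 358.125]

# Wrap each value, then sort so the axis is monotonically increasing again.
lons_180 = sorted(wrap_to_180(lon) for lon in lons_360)
print(lons_180)
```

With this convention 180 maps to -180, so the reoriented axis starts at -180 and ends just below 180.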
[4]:
ds.lon
[4]:
<xarray.DataArray 'lon' (lon: 192)> Size: 2kB
array([ 0. , 1.875, 3.75 , 5.625, 7.5 , 9.375, 11.25 , 13.125,
15. , 16.875, 18.75 , 20.625, 22.5 , 24.375, 26.25 , 28.125,
30. , 31.875, 33.75 , 35.625, 37.5 , 39.375, 41.25 , 43.125,
45. , 46.875, 48.75 , 50.625, 52.5 , 54.375, 56.25 , 58.125,
60. , 61.875, 63.75 , 65.625, 67.5 , 69.375, 71.25 , 73.125,
75. , 76.875, 78.75 , 80.625, 82.5 , 84.375, 86.25 , 88.125,
90. , 91.875, 93.75 , 95.625, 97.5 , 99.375, 101.25 , 103.125,
105. , 106.875, 108.75 , 110.625, 112.5 , 114.375, 116.25 , 118.125,
120. , 121.875, 123.75 , 125.625, 127.5 , 129.375, 131.25 , 133.125,
135. , 136.875, 138.75 , 140.625, 142.5 , 144.375, 146.25 , 148.125,
150. , 151.875, 153.75 , 155.625, 157.5 , 159.375, 161.25 , 163.125,
165. , 166.875, 168.75 , 170.625, 172.5 , 174.375, 176.25 , 178.125,
180. , 181.875, 183.75 , 185.625, 187.5 , 189.375, 191.25 , 193.125,
195. , 196.875, 198.75 , 200.625, 202.5 , 204.375, 206.25 , 208.125,
210. , 211.875, 213.75 , 215.625, 217.5 , 219.375, 221.25 , 223.125,
225. , 226.875, 228.75 , 230.625, 232.5 , 234.375, 236.25 , 238.125,
240. , 241.875, 243.75 , 245.625, 247.5 , 249.375, 251.25 , 253.125,
255. , 256.875, 258.75 , 260.625, 262.5 , 264.375, 266.25 , 268.125,
270. , 271.875, 273.75 , 275.625, 277.5 , 279.375, 281.25 , 283.125,
285. , 286.875, 288.75 , 290.625, 292.5 , 294.375, 296.25 , 298.125,
300. , 301.875, 303.75 , 305.625, 307.5 , 309.375, 311.25 , 313.125,
315. , 316.875, 318.75 , 320.625, 322.5 , 324.375, 326.25 , 328.125,
330. , 331.875, 333.75 , 335.625, 337.5 , 339.375, 341.25 , 343.125,
345. , 346.875, 348.75 , 350.625, 352.5 , 354.375, 356.25 , 358.125])
Coordinates:
* lon (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 8B ...
Attributes:
bounds: lon_bnds
units: degrees_east
axis: X
long_name: Longitude
standard_name: longitude
[5]:
ds2 = xc.swap_lon_axis(ds, to=(-180, 180))
[6]:
ds2.lon
[6]:
<xarray.DataArray 'lon' (lon: 192)> Size: 2kB
array([-180. , -178.125, -176.25 , -174.375, -172.5 , -170.625, -168.75 ,
-166.875, -165. , -163.125, -161.25 , -159.375, -157.5 , -155.625,
-153.75 , -151.875, -150. , -148.125, -146.25 , -144.375, -142.5 ,
-140.625, -138.75 , -136.875, -135. , -133.125, -131.25 , -129.375,
-127.5 , -125.625, -123.75 , -121.875, -120. , -118.125, -116.25 ,
-114.375, -112.5 , -110.625, -108.75 , -106.875, -105. , -103.125,
-101.25 , -99.375, -97.5 , -95.625, -93.75 , -91.875, -90. ,
-88.125, -86.25 , -84.375, -82.5 , -80.625, -78.75 , -76.875,
-75. , -73.125, -71.25 , -69.375, -67.5 , -65.625, -63.75 ,
-61.875, -60. , -58.125, -56.25 , -54.375, -52.5 , -50.625,
-48.75 , -46.875, -45. , -43.125, -41.25 , -39.375, -37.5 ,
-35.625, -33.75 , -31.875, -30. , -28.125, -26.25 , -24.375,
-22.5 , -20.625, -18.75 , -16.875, -15. , -13.125, -11.25 ,
-9.375, -7.5 , -5.625, -3.75 , -1.875, 0. , 1.875,
3.75 , 5.625, 7.5 , 9.375, 11.25 , 13.125, 15. ,
16.875, 18.75 , 20.625, 22.5 , 24.375, 26.25 , 28.125,
30. , 31.875, 33.75 , 35.625, 37.5 , 39.375, 41.25 ,
43.125, 45. , 46.875, 48.75 , 50.625, 52.5 , 54.375,
56.25 , 58.125, 60. , 61.875, 63.75 , 65.625, 67.5 ,
69.375, 71.25 , 73.125, 75. , 76.875, 78.75 , 80.625,
82.5 , 84.375, 86.25 , 88.125, 90. , 91.875, 93.75 ,
95.625, 97.5 , 99.375, 101.25 , 103.125, 105. , 106.875,
108.75 , 110.625, 112.5 , 114.375, 116.25 , 118.125, 120. ,
121.875, 123.75 , 125.625, 127.5 , 129.375, 131.25 , 133.125,
135. , 136.875, 138.75 , 140.625, 142.5 , 144.375, 146.25 ,
148.125, 150. , 151.875, 153.75 , 155.625, 157.5 , 159.375,
161.25 , 163.125, 165. , 166.875, 168.75 , 170.625, 172.5 ,
174.375, 176.25 , 178.125])
Coordinates:
height float64 8B ...
* lon (lon) float64 2kB -180.0 -178.1 -176.2 -174.4 ... 174.4 176.2 178.1
Attributes:
bounds: lon_bnds
units: degrees_east
axis: X
long_name: Longitude
standard_name: longitude
Center the time coordinates#
A time coordinate often represents a time period (e.g., a monthly average). In this situation, data providers may record the time as the beginning, middle, or end of the period. center_times() places each time coordinate at the center of its time interval, using the time bounds to determine the center of the period.
Related API: xcdat.center_times()
Alternative solution:
xcdat.open_mfdataset(dataset_links, center_times=True)
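The centering rule itself is simple: the center is the lower bound plus half the width of the interval. The sketch below shows that arithmetic with the standard-library datetime type (a proleptic Gregorian calendar, like this dataset) instead of cftime objects; the helper name center_of is hypothetical and the snippet is only an illustration, not xcdat's code.

```python
from datetime import datetime


def center_of(lower, upper):
    """Return the midpoint of the interval [lower, upper]."""
    return lower + (upper - lower) / 2


# Monthly bounds for January and February 1870.
bounds = [
    (datetime(1870, 1, 1), datetime(1870, 2, 1)),
    (datetime(1870, 2, 1), datetime(1870, 3, 1)),
]

centers = [center_of(lower, upper) for lower, upper in bounds]
for center in centers:
    print(center)
```

January has 31 days, so its center falls at noon on the 16th; February 1870 has 28 days, so its center falls at midnight on the 15th.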
The time bounds used for centering time coordinates:
[7]:
# We access the values with .values because it is a dask array.
ds.time_bnds.values[0:10]
[7]:
array([[cftime.DatetimeProlepticGregorian(1870, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 4, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 4, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 5, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 5, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 6, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 6, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 7, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 7, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 8, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 8, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 9, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 9, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 10, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 10, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 11, 1, 0, 0, 0, 0, has_year_zero=True)]],
dtype=object)
Before centering time coordinates:
[8]:
ds.time[0:10]
[8]:
<xarray.DataArray 'time' (time: 10)> Size: 80B
array([cftime.DatetimeProlepticGregorian(1870, 1, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 2, 15, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 3, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 4, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 5, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 6, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 7, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 8, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 9, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 10, 16, 12, 0, 0, 0, has_year_zero=True)],
dtype=object)
Coordinates:
height float64 8B ...
* time (time) object 80B 1870-01-16 12:00:00 ... 1870-10-16 12:00:00
Attributes:
bounds: time_bnds
axis: T
long_name: time
standard_name: time
_ChunkSizes: 1
Now center the time coordinates:
[9]:
ds3 = xc.center_times(ds)
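After centering time coordinates (unchanged here, because this dataset's time coordinates were already centered within their bounds):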
[10]:
ds3.time[0:10]
[10]:
<xarray.DataArray 'time' (time: 10)> Size: 80B
array([cftime.DatetimeProlepticGregorian(1870, 1, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 2, 15, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 3, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 4, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 5, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 6, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 7, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 8, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 9, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 10, 16, 12, 0, 0, 0, has_year_zero=True)],
dtype=object)
Coordinates:
height float64 8B ...
* time (time) object 80B 1870-01-16 12:00:00 ... 1870-10-16 12:00:00
Attributes:
bounds: time_bnds
axis: T
long_name: time
standard_name: time
_ChunkSizes: 1
Add bounds#
Bounds are critical to many xcdat operations. For example, they are used in determining the weights in spatial or temporal averages and in regridding operations. add_bounds() will attempt to produce bounds if they do not exist in the original dataset.
Related API: xarray.Dataset.bounds.add_bounds()
Alternative solution:
xcdat.open_mfdataset(dataset_links, add_bounds=["X", "Y", "T"])
(Assuming the file doesn’t already have bounds for your desired axis/axes.)
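To make the role of bounds in averaging concrete, here is a minimal sketch of a time-weighted mean in which each month is weighted by the width of its time bounds. The monthly values are made up for illustration, and the snippet is a plain-Python stand-in rather than xcdat's averaging code.

```python
from datetime import datetime

# Monthly time bounds for January through March 1870.
time_bnds = [
    (datetime(1870, 1, 1), datetime(1870, 2, 1)),
    (datetime(1870, 2, 1), datetime(1870, 3, 1)),
    (datetime(1870, 3, 1), datetime(1870, 4, 1)),
]
# Hypothetical monthly mean values (not taken from the dataset).
values = [1.0, 2.0, 6.0]

# The width of each interval in days determines its weight.
days = [(upper - lower).days for lower, upper in time_bnds]
weights = [d / sum(days) for d in days]

weighted_mean = sum(w * v for w, v in zip(weights, values))
unweighted_mean = sum(values) / len(values)
print(days, weighted_mean, unweighted_mean)
```

Because February is shorter than January and March, it contributes less to the weighted mean, so the two means differ slightly. Without bounds, the interval widths (and therefore the weights) would not be known.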
[11]:
# We drop the existing time bounds to demonstrate adding bounds.
# We start with the dataset that has centered time points.
ds4 = ds3.drop_vars("time_bnds")
[12]:
try:
ds4.bounds.get_bounds("T")
except KeyError as e:
print(e)
"No variable named 'time_bnds'. Did you mean one of ('lat_bnds', 'time')?"
There are two options for adding time bounds. The midpoint method places bounds at the midpoints between adjacent time points, and the frequency method creates bounds based on the timestamp of each time point and the frequency of the data. This is the midpoint method:
[13]:
# midpoint method
ds4 = ds4.bounds.add_time_bounds(method="midpoint")
# print results
ds4.bounds.get_bounds("T")
[13]:
<xarray.DataArray 'time_bnds' (time: 60, bnds: 2)> Size: 960B
array([[cftime.DatetimeProlepticGregorian(1870, 1, 1, 18, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 1, 31, 6, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 1, 31, 6, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 3, 1, 18, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 3, 1, 18, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 3, 31, 18, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 3, 31, 18, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 5, 1, 6, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 5, 1, 6, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 5, 31, 18, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 5, 31, 18, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 7, 1, 6, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 7, 1, 6, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 8, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 8, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 8, 31, 18, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 8, 31, 18, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 10, 1, 6, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 10, 1, 6, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 10, 31, 18, 0, 0, 0, has_year_zero=True)],
...
cftime.DatetimeProlepticGregorian(1874, 3, 31, 18, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 3, 31, 18, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 5, 1, 6, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 5, 1, 6, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 5, 31, 18, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 5, 31, 18, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 7, 1, 6, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 7, 1, 6, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 8, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 8, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 8, 31, 18, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 8, 31, 18, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 10, 1, 6, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 10, 1, 6, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 10, 31, 18, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 10, 31, 18, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 12, 1, 6, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 12, 1, 6, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 12, 31, 18, 0, 0, 0, has_year_zero=True)]],
dtype=object)
Coordinates:
height float64 8B ...
* time (time) object 480B 1870-01-16 12:00:00 ... 1874-12-16 12:00:00
Dimensions without coordinates: bnds
Attributes:
xcdat_bounds: True
Notice that the midpoint method does not place the bounds at the boundary between the last moment of month n and the first moment of month n+1. The frequency method tries to infer the correct bounds by taking into account the timestamps and the frequency of the data. The frequency method (below) is what is used when add_bounds=["T"] is specified in open_dataset() or open_mfdataset().
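The difference between the two approaches can be sketched in plain Python for monthly data. Both helpers below are hypothetical illustrations written with the standard-library datetime type, not xcdat's implementation; in particular, extending the outermost bounds by half the neighboring spacing is an assumption chosen to be consistent with the first bounds printed above.

```python
from datetime import datetime


def midpoint_time_bounds(times):
    """Place bounds halfway between neighboring time points."""
    half_steps = [(b - a) / 2 for a, b in zip(times[:-1], times[1:])]
    # Assumption: the outer edges reuse the nearest half step.
    lowers = [times[0] - half_steps[0]]
    lowers += [t - h for t, h in zip(times[1:], half_steps)]
    uppers = [t + h for t, h in zip(times[:-1], half_steps)]
    uppers += [times[-1] + half_steps[-1]]
    return list(zip(lowers, uppers))


def monthly_freq_bounds(times):
    """Place bounds at the start of each month and the start of the next."""
    bounds = []
    for t in times:
        lower = datetime(t.year, t.month, 1)
        if t.month == 12:
            upper = datetime(t.year + 1, 1, 1)
        else:
            upper = datetime(t.year, t.month + 1, 1)
        bounds.append((lower, upper))
    return bounds


# Centered monthly time points for January through March 1870.
times = [
    datetime(1870, 1, 16, 12),
    datetime(1870, 2, 15, 0),
    datetime(1870, 3, 16, 12),
]
midpoint = midpoint_time_bounds(times)
freq = monthly_freq_bounds(times)
print(midpoint[0])
print(freq[0])
```

Because months differ in length, the midpoints between centered time points drift away from the true month boundaries, while the frequency-style bounds land exactly on the first of each month.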
[14]:
# drop time bounds again
ds5 = ds4.drop_vars("time_bnds")
# timestamp / frequency method
ds5 = ds5.bounds.add_time_bounds(method="freq")
# print results
ds5.bounds.get_bounds("T")
[14]:
<xarray.DataArray 'time_bnds' (time: 60, bnds: 2)> Size: 960B
array([[cftime.DatetimeProlepticGregorian(1870, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 4, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 4, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 5, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 5, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 6, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 6, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 7, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 7, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 8, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 8, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 9, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 9, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 10, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1870, 10, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1870, 11, 1, 0, 0, 0, 0, has_year_zero=True)],
...
cftime.DatetimeProlepticGregorian(1874, 4, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 4, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 5, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 5, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 6, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 6, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 7, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 7, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 8, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 8, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 9, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 9, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 10, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 10, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 11, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 11, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1874, 12, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeProlepticGregorian(1874, 12, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1875, 1, 1, 0, 0, 0, 0, has_year_zero=True)]],
dtype=object)
Coordinates:
height float64 8B ...
* time (time) object 480B 1870-01-16 12:00:00 ... 1874-12-16 12:00:00
Dimensions without coordinates: bnds
Attributes:
xcdat_bounds: True
Note that ds.bounds.add_time_bounds(method="midpoint") is the same as ds.bounds.add_bounds("T"). The latter method can also be used to add bounds to other axes (e.g., latitude), as shown below.
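For a numeric axis the midpoint rule is the same idea applied to plain numbers. The sketch below is a hypothetical helper, not xcdat's code, run on the first few latitudes of this grid. Clipping the result to the range -90 to 90 is an assumption made here so that the bounds do not extend past the poles.

```python
def midpoint_bounds(coords, clip=None):
    """Place bounds halfway between neighboring coordinate values."""
    half_steps = [(b - a) / 2 for a, b in zip(coords[:-1], coords[1:])]
    # The outer edges reuse the nearest half step.
    lowers = [coords[0] - half_steps[0]]
    lowers += [c - h for c, h in zip(coords[1:], half_steps)]
    uppers = [c + h for c, h in zip(coords[:-1], half_steps)]
    uppers += [coords[-1] + half_steps[-1]]
    bounds = list(zip(lowers, uppers))
    if clip is not None:
        low, high = clip
        bounds = [(max(lo, low), min(up, high)) for lo, up in bounds]
    return bounds


# First four latitudes of the 1.25 degree grid used in this notebook.
lat = [-90.0, -88.75, -87.5, -86.25]
lat_bnds = midpoint_bounds(lat, clip=(-90.0, 90.0))
print(lat_bnds)
```

Without the clip argument, the first lower bound would be -90.625, which is not a valid latitude.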
[15]:
ds6 = ds.drop_vars("lat_bnds")
ds6 = ds6.bounds.add_bounds("Y")
ds6.lat_bnds
[15]:
<xarray.DataArray 'lat_bnds' (lat: 145, bnds: 2)> Size: 2kB
array([[-90. , -89.375],
[-89.375, -88.125],
[-88.125, -86.875],
[-86.875, -85.625],
[-85.625, -84.375],
[-84.375, -83.125],
[-83.125, -81.875],
[-81.875, -80.625],
[-80.625, -79.375],
[-79.375, -78.125],
[-78.125, -76.875],
[-76.875, -75.625],
[-75.625, -74.375],
[-74.375, -73.125],
[-73.125, -71.875],
[-71.875, -70.625],
[-70.625, -69.375],
[-69.375, -68.125],
[-68.125, -66.875],
[-66.875, -65.625],
...
[ 65.625, 66.875],
[ 66.875, 68.125],
[ 68.125, 69.375],
[ 69.375, 70.625],
[ 70.625, 71.875],
[ 71.875, 73.125],
[ 73.125, 74.375],
[ 74.375, 75.625],
[ 75.625, 76.875],
[ 76.875, 78.125],
[ 78.125, 79.375],
[ 79.375, 80.625],
[ 80.625, 81.875],
[ 81.875, 83.125],
[ 83.125, 84.375],
[ 84.375, 85.625],
[ 85.625, 86.875],
[ 86.875, 88.125],
[ 88.125, 89.375],
[ 89.375, 90. ]])
Coordinates:
* lat (lat) float64 1kB -90.0 -88.75 -87.5 -86.25 ... 87.5 88.75 90.0
height float64 8B ...
Dimensions without coordinates: bnds
Attributes:
xcdat_bounds: True
Add missing bounds for all axes supported by xcdat (X, Y, T, Z)#
Related API: xarray.Dataset.bounds.add_missing_bounds()
[16]:
# We drop the dataset axes bounds to demonstrate generating missing bounds.
ds7 = ds.drop_vars(["time_bnds", "lat_bnds", "lon_bnds"])
[17]:
ds7
[17]:
<xarray.Dataset> Size: 7MB
Dimensions: (lat: 145, lon: 192, time: 60)
Coordinates:
* lat (lat) float64 1kB -90.0 -88.75 -87.5 -86.25 ... 87.5 88.75 90.0
* lon (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 8B ...
* time (time) object 480B 1870-01-16 12:00:00 ... 1874-12-16 12:00:00
Data variables:
tas (time, lat, lon) float32 7MB dask.array<chunksize=(1, 145, 192), meta=np.ndarray>
Attributes: (12/48)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
variant_label: r10i1p1f1
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time
[18]:
# add now-missing bounds
ds7 = ds7.bounds.add_missing_bounds(["X", "Y", "T"])
# print dataset
ds7
[18]:
<xarray.Dataset> Size: 7MB
Dimensions: (lat: 145, lon: 192, time: 60, bnds: 2)
Coordinates:
* lat (lat) float64 1kB -90.0 -88.75 -87.5 -86.25 ... 87.5 88.75 90.0
* lon (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 354.4 356.2 358.1
height float64 8B ...
* time (time) object 480B 1870-01-16 12:00:00 ... 1874-12-16 12:00:00
Dimensions without coordinates: bnds
Data variables:
tas (time, lat, lon) float32 7MB dask.array<chunksize=(1, 145, 192), meta=np.ndarray>
lon_bnds (lon, bnds) float64 3kB -0.9375 0.9375 0.9375 ... 357.2 359.1
lat_bnds (lat, bnds) float64 2kB -90.0 -89.38 -89.38 ... 89.38 89.38 90.0
time_bnds (time, bnds) object 960B 1870-01-01 00:00:00 ... 1875-01-01 00...
Attributes: (12/48)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
variant_label: r10i1p1f1
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time
Note that ds.bounds.add_missing_bounds() uses ds.bounds.add_bounds() for the latitude and longitude axes and, for the time axis, defaults to add_time_bounds() with the frequency method. If you click on the database symbol for time_bnds above, the bounds are slightly misaligned because the time axis was not centered before adding the time bounds. In this case, the user should call xcdat.center_times() and then ds.bounds.add_missing_bounds() (as shown earlier).
Get the dimension coordinates for an axis#
In xarray, you can get a dimension coordinate by directly referencing its name (e.g., ds.lat). xcdat provides an alternative, name-agnostic way to get dimension coordinates: pass the CF axis key to the applicable APIs.
Related APIs: xcdat.get_dim_coords() & xcdat.get_dim_keys()
Helpful knowledge:
This API uses cf_xarray to interpret CF axis names and coordinate names in the xarray object attributes. Refer to Metadata Interpretation for more information.
Xarray documentation on coordinates (source):
There are two types of coordinates in xarray:
dimension coordinates are one dimensional coordinates with a name equal to their sole dimension (marked by * when printing a dataset or data array). They are used for label based indexing and alignment, like the index found on a pandas DataFrame or Series. Indeed, these “dimension” coordinates use a pandas.Index internally to store their values.
non-dimension coordinates are variables that contain coordinate data, but are not a dimension coordinate. They can be multidimensional (see Working with Multidimensional Coordinates), and there is no relationship between the name of a non-dimension coordinate and the name(s) of its dimension(s). Non-dimension coordinates can be useful for indexing or plotting; otherwise, xarray does not make any direct use of the values associated with them. They are not used for alignment or automatic indexing, nor are they required to match when doing arithmetic (see Coordinates).
Xarray’s terminology differs from the CF terminology, where the “dimension coordinates” are called “coordinate variables”, and the “non-dimension coordinates” are called “auxiliary coordinate variables” (see GH1295 for more details).
1. axis attr#
[19]:
ds.lat.attrs["axis"]
[19]:
'Y'
2. standard_name attr#
[20]:
ds.lat.attrs["standard_name"]
[20]:
'latitude'
[21]:
"lat" in ds.dims
[21]:
True
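The three checks above (the axis attribute, the standard_name attribute, and membership in the dataset's dimensions) suggest how a name-agnostic lookup can work. The sketch below mimics that idea with plain dictionaries; the function find_dim_keys and the AXIS_STANDARD_NAMES table are hypothetical simplifications for illustration and are not the actual cf_xarray or xcdat logic.

```python
# Simplified mapping from CF axis keys to standard names (an assumption
# for this sketch; the real CF criteria cover more cases).
AXIS_STANDARD_NAMES = {
    "X": {"longitude"},
    "Y": {"latitude"},
    "T": {"time"},
}


def find_dim_keys(coord_attrs, dims, axis):
    """Return names of dimension coordinates whose attributes match an axis."""
    matches = []
    for name, attrs in coord_attrs.items():
        if name not in dims:
            continue  # Skip non-dimension coordinates such as "height".
        by_axis = attrs.get("axis") == axis
        by_name = attrs.get("standard_name") in AXIS_STANDARD_NAMES[axis]
        if by_axis or by_name:
            matches.append(name)
    return matches


# Attributes copied from the coordinates shown earlier in this notebook.
coord_attrs = {
    "lat": {"axis": "Y", "standard_name": "latitude"},
    "lon": {"axis": "X", "standard_name": "longitude"},
    "time": {"axis": "T", "standard_name": "time"},
    "height": {},
}
dims = ("time", "lat", "lon")

print(find_dim_keys(coord_attrs, dims, "Y"))
print(find_dim_keys(coord_attrs, dims, "X"))
```

The point of such a lookup is that calling code can ask for the "Y" axis without knowing whether a dataset names it lat, latitude, or something else.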
Utilities to get the coordinate axis and coordinate axis key#
[22]:
xc.get_dim_coords(ds, axis="Y")
[22]:
<xarray.DataArray 'lat' (lat: 145)> Size: 1kB
array([-90. , -88.75, -87.5 , -86.25, -85. , -83.75, -82.5 , -81.25, -80. ,
-78.75, -77.5 , -76.25, -75. , -73.75, -72.5 , -71.25, -70. , -68.75,
-67.5 , -66.25, -65. , -63.75, -62.5 , -61.25, -60. , -58.75, -57.5 ,
-56.25, -55. , -53.75, -52.5 , -51.25, -50. , -48.75, -47.5 , -46.25,
-45. , -43.75, -42.5 , -41.25, -40. , -38.75, -37.5 , -36.25, -35. ,
-33.75, -32.5 , -31.25, -30. , -28.75, -27.5 , -26.25, -25. , -23.75,
-22.5 , -21.25, -20. , -18.75, -17.5 , -16.25, -15. , -13.75, -12.5 ,
-11.25, -10. , -8.75, -7.5 , -6.25, -5. , -3.75, -2.5 , -1.25,
0. , 1.25, 2.5 , 3.75, 5. , 6.25, 7.5 , 8.75, 10. ,
11.25, 12.5 , 13.75, 15. , 16.25, 17.5 , 18.75, 20. , 21.25,
22.5 , 23.75, 25. , 26.25, 27.5 , 28.75, 30. , 31.25, 32.5 ,
33.75, 35. , 36.25, 37.5 , 38.75, 40. , 41.25, 42.5 , 43.75,
45. , 46.25, 47.5 , 48.75, 50. , 51.25, 52.5 , 53.75, 55. ,
56.25, 57.5 , 58.75, 60. , 61.25, 62.5 , 63.75, 65. , 66.25,
67.5 , 68.75, 70. , 71.25, 72.5 , 73.75, 75. , 76.25, 77.5 ,
78.75, 80. , 81.25, 82.5 , 83.75, 85. , 86.25, 87.5 , 88.75,
90. ])
Coordinates:
* lat (lat) float64 1kB -90.0 -88.75 -87.5 -86.25 ... 87.5 88.75 90.0
height float64 8B ...
Attributes:
bounds: lat_bnds
units: degrees_north
axis: Y
long_name: Latitude
standard_name: latitude
[23]:
xc.get_dim_keys(ds, axis="X")
[23]:
'lon'