xcdat.bounds.BoundsAccessor#

class xcdat.bounds.BoundsAccessor(dataset)[source]#

An accessor class that provides bounds attributes and methods on xarray Datasets through the .bounds attribute.

Examples

Import BoundsAccessor class:

>>> import xcdat  # or from xcdat import bounds

Use BoundsAccessor class:

>>> ds = xcdat.open_dataset("/path/to/file")
>>>
>>> ds.bounds.<attribute>
>>> ds.bounds.<method>
>>> ds.bounds.<property>
Parameters:

dataset (xr.Dataset) – A Dataset object.

Examples

Import:

>>> from xcdat import bounds

Return dictionary of axis and coordinate keys mapped to bounds:

>>> ds.bounds.map

Return list of keys for bounds data variables:

>>> ds.bounds.keys

Add missing coordinate bounds for supported axes in the Dataset:

>>> ds = ds.bounds.add_missing_bounds(axes=["X", "Y", "T"])

Get coordinate bounds if they exist:

>>> lat_bounds = ds.bounds.get_bounds("Y")
>>> lon_bounds = ds.bounds.get_bounds("X")
>>> time_bounds = ds.bounds.get_bounds("T")

Add coordinate bounds for a specific axis if they don’t exist:

>>> ds = ds.bounds.add_bounds("Y")
__init__(dataset)[source]#

Methods

__init__(dataset)

add_bounds(axis)

Add bounds for an axis using its coordinates as midpoints.

add_missing_bounds([axes])

Adds missing coordinate bounds for supported axes in the Dataset.

add_time_bounds(method[, freq, ...])

Add bounds for an axis using its coordinate points.

get_bounds(axis[, var_key])

Gets coordinate bounds.

Attributes

keys

Returns a list of keys for the bounds data variables in the Dataset.

map

Returns a map of axis and coordinates keys to their bounds.

_dataset#
property map#

Returns a map of axis and coordinates keys to their bounds.

The dictionary provides all valid CF-compliant keys for axes and coordinates. For example, latitude includes keys for “lat”, “latitude”, and “Y”.

Returns:

Dict[str, Optional[xr.DataArray]] – Dictionary mapping axis and coordinate keys to their bounds.

property keys#

Returns a list of keys for the bounds data variables in the Dataset.

Returns:

List[str] – A list of sorted bounds data variable keys.

add_missing_bounds(axes=['X', 'Y', 'T'])[source]#

Adds missing coordinate bounds for supported axes in the Dataset.

This function loops through the Dataset’s axes and attempts to add bounds to their coordinates if they don’t exist. For the “X”, “Y”, and “Z” axes, bounds are generated using the coordinates as midpoints. “T” axis bounds are based on the time frequency of the coordinates.

An axis must meet the following criteria for bounds to be added; otherwise it is ignored:

  1. Axis is either “X”, “Y”, “T”, or “Z”

  2. Coordinates are a single dimension, not multidimensional

  3. Coordinates have a length > 1 (not singleton)

  4. Bounds must not already exist

    • Coordinates are mapped to bounds using the “bounds” attr. For example, bounds exist if ds.time.attrs["bounds"] is set to "time_bnds" and ds.time_bnds is present in the dataset.

  5. For the “T” axis, its coordinates must be composed of datetime-like objects (np.datetime64 or cftime). This method is designed to operate on time axes that have constant temporal resolution with annual, monthly, daily, or sub-daily time frequencies. Alternate frequencies (e.g., pentad) are not supported.

Parameters:

axes (List[str]) – List of CF axes that the function should operate on, by default [“X”, “Y”, “T”]. Options include “X”, “Y”, “T”, or “Z”.

Returns:

xr.Dataset
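
The eligibility criteria listed above can be sketched as a small predicate. This is a toy model rather than xCDAT's implementation: the `CoordInfo` record and the `can_add_bounds` helper are hypothetical names introduced here purely for illustration.

```python
from dataclasses import dataclass

SUPPORTED_AXES = {"X", "Y", "T", "Z"}


@dataclass
class CoordInfo:
    """Hypothetical summary of one coordinate variable (illustration only)."""

    axis: str  # CF axis key, e.g. "Y"
    ndim: int  # number of dimensions of the coordinate
    size: int  # number of coordinate points
    has_bounds: bool  # True if the "bounds" attr points at an existing variable
    is_datetime_like: bool = False  # only relevant for the "T" axis


def can_add_bounds(coord: CoordInfo) -> bool:
    """Return True only if the coordinate meets every listed criterion."""
    if coord.axis not in SUPPORTED_AXES:
        return False
    if coord.ndim != 1:  # multidimensional coordinates are skipped
        return False
    if coord.size <= 1:  # singleton coordinates are skipped
        return False
    if coord.has_bounds:  # bounds must not already exist
        return False
    if coord.axis == "T" and not coord.is_datetime_like:
        return False
    return True


lat = CoordInfo(axis="Y", ndim=1, size=4, has_bounds=False)
height = CoordInfo(axis="Z", ndim=1, size=1, has_bounds=False)
print(can_add_bounds(lat), can_add_bounds(height))
```

In this sketch the singleton `height` coordinate is rejected, which corresponds to an axis being ignored rather than raising an error.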

get_bounds(axis, var_key=None)[source]#

Gets coordinate bounds.

Parameters:
  • axis (CFAxisKey) – The CF axis key (“X”, “Y”, “T”, “Z”).

  • var_key (Optional[str]) – The key of the coordinate or data variable to get axis bounds for. This parameter is useful if you only want the single bounds DataArray related to the axis on the variable (e.g., “tas” has a “lat” dimension and you want “lat_bnds”).

Returns:

Union[xr.Dataset, xr.DataArray] – A Dataset of N bounds variables, or a single bounds variable DataArray.

Raises:
  • ValueError – If an incorrect axis argument is passed.

  • KeyError – If bounds were not found for the specific axis.
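
The way a coordinate is linked to its bounds through the “bounds” attr can be illustrated with plain dictionaries. The `toy_ds` structure and `find_bounds` helper below are hypothetical stand-ins, not xCDAT or xarray objects, and the error message is invented for the example.

```python
# Toy stand-in for a dataset: variable name -> attrs and values.
toy_ds = {
    "time": {"attrs": {"bounds": "time_bnds"}, "values": [0, 1, 2]},
    "time_bnds": {"attrs": {}, "values": [[-0.5, 0.5], [0.5, 1.5], [1.5, 2.5]]},
    "lat": {"attrs": {}, "values": [-45.0, 45.0]},
}


def find_bounds(ds, coord_key):
    """Follow the coordinate's "bounds" attr to the bounds variable."""
    bounds_key = ds[coord_key]["attrs"].get("bounds")
    # Bounds only "exist" if the attr is set AND the variable is present.
    if bounds_key is None or bounds_key not in ds:
        raise KeyError(f"No bounds were found for coordinate '{coord_key}'.")
    return ds[bounds_key]["values"]


print(find_bounds(toy_ds, "time"))
```

Calling `find_bounds(toy_ds, "lat")` raises `KeyError` in this sketch, since “lat” has no “bounds” attr.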

add_bounds(axis)[source]#

Add bounds for an axis using its coordinates as midpoints.

This method loops over the axis’s coordinate variables and attempts to add bounds for each of them if they don’t exist. Each coordinate point is the midpoint between its lower and upper bounds.

To add bounds for an axis, its coordinates must meet the following criteria; otherwise an error is raised:

  1. Axis is either “X”, “Y”, “T”, or “Z”

  2. Coordinates are single dimensional, not multidimensional

  3. Coordinates have a length > 1 (not singleton)

  4. Bounds must not already exist

    • Coordinates are mapped to bounds using the “bounds” attr. For example, bounds exist if ds.time.attrs["bounds"] is set to "time_bnds" and ds.time_bnds is present in the dataset.

Parameters:

axis (CFAxisKey) – The CF axis key (“X”, “Y”, “T”, “Z”).

Returns:

xr.Dataset – The dataset with bounds added.
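
The idea of treating coordinate points as midpoints can be sketched in plain Python. This is not xCDAT's code: `midpoint_bounds` is a hypothetical helper, and the handling of the two outer edges (mirroring the adjacent half-spacing) is an assumption made for the example.

```python
def midpoint_bounds(points):
    """Build [lower, upper] pairs so each point lies between its bounds."""
    if len(points) < 2:
        raise ValueError("Need more than one coordinate point to infer bounds.")
    # Interior edges sit halfway between neighbouring points.
    interior = [(a + b) / 2 for a, b in zip(points[:-1], points[1:])]
    # Assumption: outer edges mirror the adjacent half-spacing.
    first = points[0] - (interior[0] - points[0])
    last = points[-1] + (points[-1] - interior[-1])
    edges = [first, *interior, last]
    return [[lo, hi] for lo, hi in zip(edges[:-1], edges[1:])]


lat = [-60.0, -20.0, 20.0, 60.0]
print(midpoint_bounds(lat))
```

With evenly spaced points, as here, every point is exactly the midpoint of its pair. With uneven spacing, the edges still fall halfway between neighbours, but a point is no longer centered in its cell.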

add_time_bounds(method, freq=None, daily_subfreq=None, end_of_month=False)[source]#

Add bounds for an axis using its coordinate points.

This method is designed to operate on time axes that have constant temporal resolution with annual, monthly, daily, or sub-daily time frequencies. Alternate frequencies (e.g., pentad) are not supported. It loops over the time axis coordinate variables and attempts to add bounds for each of them if they don’t exist.

To add time bounds for the time axis, its coordinates must meet the following criteria:

  1. Coordinates are single dimensional, not multidimensional

  2. Coordinates have a length > 1 (not singleton)

  3. Bounds must not already exist

    • Coordinates are mapped to bounds using the “bounds” attr. For example, bounds exist if ds.time.attrs["bounds"] is set to "time_bnds" and ds.time_bnds is present in the dataset.

  4. If method="freq", coordinates must be composed of datetime-like objects (np.datetime64 or cftime)

Parameters:
  • method ({"freq", "midpoint"}) – The method for creating time bounds for time coordinates, either “freq” or “midpoint”.

    • “freq”: Create time bounds as the start and end of each timestep’s period using either the inferred or specified time frequency (freq parameter). For example, the time bounds will be the start and end of each month for each monthly coordinate point.

    • “midpoint”: Create time bounds using time coordinates as the midpoint between their upper and lower bounds.

  • freq ({"year", "month", "day", "hour"}, optional) – If method="freq", this parameter specifies the time frequency for creating time bounds. By default None, which infers the frequency using the time coordinates.

  • daily_subfreq ({1, 2, 3, 4, 6, 8, 12, 24}, optional) – If freq=="hour", this parameter sets the number of timepoints per day for time bounds, by default None.

    • daily_subfreq=None infers the daily time frequency from the time coordinates.

    • daily_subfreq=1 is daily

    • daily_subfreq=2 is twice daily

    • daily_subfreq=4 is 6-hourly

    • daily_subfreq=8 is 3-hourly

    • daily_subfreq=12 is 2-hourly

    • daily_subfreq=24 is hourly

  • end_of_month (bool, optional) – If freq=="month", this flag notes that the timepoint is saved at the end of the monthly interval (see Note), by default False.

    • Some timepoints are saved at the end of the interval, e.g., Feb. 1 00:00 for the time interval Jan. 1 00:00 - Feb. 1 00:00. Since this method determines the month and year from the time vector, the bounds will be set incorrectly if the timepoint is set to the end of the time interval. For these cases, set end_of_month=True.

Returns:

xr.Dataset – The dataset with time bounds added.
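
The daily_subfreq values listed above follow from dividing 24 hours by the number of timepoints per day. The short sketch below makes that relationship explicit, including the values 3 and 6 that are accepted but not spelled out in the list. `hours_per_step` is a hypothetical helper, not part of xCDAT.

```python
VALID_DAILY_SUBFREQS = (1, 2, 3, 4, 6, 8, 12, 24)


def hours_per_step(daily_subfreq: int) -> int:
    """Length in hours of one interval for a given number of points per day."""
    if daily_subfreq not in VALID_DAILY_SUBFREQS:
        raise ValueError(
            f"daily_subfreq must be one of {VALID_DAILY_SUBFREQS}, got {daily_subfreq}."
        )
    return 24 // daily_subfreq


for n in VALID_DAILY_SUBFREQS:
    print(f"daily_subfreq={n}: one timepoint every {hours_per_step(n)} hour(s)")
```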

_drop_ancillary_singleton_coords(coord_vars)[source]#

Drop ancillary singleton coordinates from dimension coordinates.

Xarray coordinate variables retain all coordinates from the parent object. This means if singleton coordinates exist, they are attached to dimension coordinates as ancillary coordinates. For example, the “height” singleton coordinate will be attached to “time” coordinates even though “height” is related to the “Z” axis, not the “T” axis. Refer to [1] for more info on this Xarray behavior.

This is an undesirable behavior in xCDAT because the add bounds methods loop over coordinates related to an axis and attempt to add bounds if they don’t exist. If ancillary coordinates are present, “ValueError: Cannot generate bounds for coordinate variable ‘height’ which has a length <= 1 (singleton)” is raised. For the purpose of adding bounds, we temporarily drop any ancillary singletons from dimension coordinates before looping over those coordinates. Ancillary singletons will still be present in the final Dataset object to maintain the Dataset’s integrity.

Parameters:

coord_vars (Union[xr.Dataset, xr.DataArray]) – The dimension coordinate variables with ancillary coordinates (if they exist).

Returns:

Union[xr.Dataset, xr.DataArray] – The dimension coordinate variables with ancillary coordinates dropped (if they exist).
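
The xarray behavior described above can be reproduced with a small dataset. The sketch below uses only public xarray calls (`xr.Dataset`, `.coords`, `.drop_vars`). The variable names and the scalar-coordinate filter are choices made for this example and do not necessarily match how xCDAT implements the method.

```python
import xarray as xr

# A dataset with a "time" dimension coordinate and a scalar "height" coordinate.
ds = xr.Dataset(
    {"tas": (("time",), [280.0, 281.0, 282.0])},
    coords={"time": [0, 1, 2], "height": 2.0},
)

# Pulling out "time" carries the scalar "height" along as an ancillary coordinate.
time = ds["time"]
print(list(time.coords))

# Drop scalar coordinates other than the dimension coordinate itself.
ancillary = [
    name
    for name, coord in time.coords.items()
    if name != time.name and coord.ndim == 0
]
time_clean = time.drop_vars(ancillary)
print(list(time_clean.coords))
```

The original `ds` is left untouched, so “height” remains available on the Dataset after the temporary drop.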

References

_get_bounds_keys(axis)[source]#

Get bounds keys for an axis’s coordinate variables in the dataset.

This function attempts to map bounds to an axis using cf_xarray and its interpretation of the CF “bounds” attribute.

Parameters:

axis (CFAxisKey) – The CF axis key (“X”, “Y”, “T”, or “Z”).

Returns:

List[str] – The axis bounds key(s).

_create_time_bounds(time, freq=None, daily_subfreq=None, end_of_month=False)[source]#

Creates time bounds for each timestep of the time coordinate axis.

This method creates time bounds as the start and end of each timestep’s period using either the inferred or specified time frequency (freq parameter). For example, the time bounds will be the start and end of each month for each monthly coordinate point.

Parameters:
  • time (xr.DataArray) – The temporal coordinate variable for the axis.

  • freq ({"year", "month", "day", "hour"}, optional) – The time frequency for creating time bounds, by default None (infer the frequency).

  • daily_subfreq ({1, 2, 3, 4, 6, 8, 12, 24}, optional) – If freq=="hour", this parameter sets the number of timepoints per day for bounds, by default None. If greater than 1, sub-daily bounds are created.

    • daily_subfreq=None infers the freq from the time coords (default)

    • daily_subfreq=1 is daily

    • daily_subfreq=2 is twice daily

    • daily_subfreq=4 is 6-hourly

    • daily_subfreq=8 is 3-hourly

    • daily_subfreq=12 is 2-hourly

    • daily_subfreq=24 is hourly

  • end_of_month (bool, optional) – If freq=="month", this flag notes that the timepoint is saved at the end of the monthly interval (see Note), by default False.

Returns:

xr.DataArray – A DataArray storing bounds for the time axis.

Raises:
  • ValueError – If coordinates are a singleton.

  • TypeError – If time coordinates are not composed of datetime-like objects.

Note

Some timepoints are saved at the end of the interval, e.g., Feb. 1 00:00 for the time interval Jan. 1 00:00 - Feb. 1 00:00. Since this function determines the month and year from the time vector, the bounds will be set incorrectly if the timepoint is set to the end of the time interval. For these cases, set end_of_month=True.

_create_yearly_time_bounds(timesteps, obj_type)[source]#

Creates time bounds for each timestep with the start and end of the year.

Bounds for each timestep correspond to Jan. 1 00:00:00 of the year of the timestep and Jan. 1 00:00:00 of the subsequent year.

Parameters:
  • timesteps (np.ndarray) – An array of timesteps, represented as either cftime.datetime or pd.Timestamp (cast from np.datetime64[ns] to support pandas time/date components).

  • obj_type (Union[cftime.datetime, pd.Timestamp]) – The object type for time bounds based on the dtype of time_values.

Returns:

List[Union[cftime.datetime, pd.Timestamp]] – A list of time bound values.
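
As a minimal sketch of the yearly rule described above, the example below uses the standard library's `datetime` in place of cftime.datetime or pd.Timestamp. `yearly_bounds` is a hypothetical helper, not the xCDAT method itself.

```python
from datetime import datetime


def yearly_bounds(timesteps):
    """Pair each timestep with Jan. 1 of its year and Jan. 1 of the next year."""
    return [
        [datetime(ts.year, 1, 1), datetime(ts.year + 1, 1, 1)]
        for ts in timesteps
    ]


steps = [datetime(2000, 7, 1, 12), datetime(2001, 7, 1, 12)]
for lower, upper in yearly_bounds(steps):
    print(lower.isoformat(), upper.isoformat())
```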

_create_monthly_time_bounds(timesteps, obj_type, end_of_month=False)[source]#

Creates time bounds for each timestep with the start and end of the month.

Bounds for each timestep correspond to 00:00:00 on the first of the month and 00:00:00 on the first of the subsequent month.

Parameters:
  • timesteps (np.ndarray) – An array of timesteps, represented as either cftime.datetime or pd.Timestamp (cast from np.datetime64[ns] to support pandas time/date components).

  • obj_type (Union[cftime.datetime, pd.Timestamp]) – The object type for time bounds based on the dtype of time_values.

  • end_of_month (bool, optional) – Flag to note that the timepoint is saved at the end of the monthly interval (see Note), by default False.

Returns:

List[Union[cftime.datetime, pd.Timestamp]] – A list of time bound values.

Note

Some timepoints are saved at the end of the interval, e.g., Feb. 1 00:00 for the time interval Jan. 1 00:00 - Feb. 1 00:00. Since this function determines the month and year from the time vector, the bounds will be set incorrectly if the timepoint is set to the end of the time interval. For these cases, set end_of_month=True.
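
The effect of end_of_month can be sketched with standard-library datetimes. This is one plausible reading of the Note, assuming that an end-of-interval timepoint is shifted back by one month before its bounds are computed. `shift_month` and `monthly_bounds` are hypothetical helpers, and xCDAT's actual logic may differ.

```python
from datetime import datetime


def shift_month(year, month, delta):
    """Move (year, month) by delta months, rolling over year boundaries."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def monthly_bounds(timesteps, end_of_month=False):
    """Bounds run from the first of the month to the first of the next month."""
    bounds = []
    for ts in timesteps:
        year, month = ts.year, ts.month
        if end_of_month:
            # Assumption: the timepoint marks the END of its interval,
            # so the interval belongs to the previous month.
            year, month = shift_month(year, month, -1)
        next_year, next_month = shift_month(year, month, 1)
        bounds.append([datetime(year, month, 1), datetime(next_year, next_month, 1)])
    return bounds


feb_first = [datetime(2000, 2, 1)]
print(monthly_bounds(feb_first))
print(monthly_bounds(feb_first, end_of_month=True))
```

For the Feb. 1 00:00 timepoint from the Note, the default call yields the February interval, while end_of_month=True yields Jan. 1 00:00 to Feb. 1 00:00.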

_add_months_to_timestep(timestep, obj_type, delta)[source]#

Adds delta month(s) to a timestep.

The delta value can be positive or negative (for subtraction). Refer to [4] for logic.

Parameters:
  • timestep (Union[cftime.datetime, pd.Timestamp]) – A timestep represented as cftime.datetime or pd.Timestamp.

  • obj_type (Union[cftime.datetime, pd.Timestamp]) – The object type for time bounds based on the dtype of timestep.

  • delta (int) – Integer months to be added to times (can be positive or negative)

Returns:

Union[cftime.datetime, pd.Timestamp]
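
A common way to add or subtract whole months is sketched below with the standard library. It is not necessarily the logic the cited reference describes: clamping the day to the length of the target month is an assumption of this example, and `add_months` is a hypothetical name.

```python
import calendar
from datetime import datetime


def add_months(timestep, delta):
    """Return timestep shifted by delta months (delta may be negative)."""
    month_index = timestep.month - 1 + delta  # zero-based month offset
    year = timestep.year + month_index // 12  # floor division handles negatives
    month = month_index % 12 + 1
    # Assumption: clamp the day so that e.g. Jan. 31 + 1 month stays valid.
    day = min(timestep.day, calendar.monthrange(year, month)[1])
    return timestep.replace(year=year, month=month, day=day)


print(add_months(datetime(2000, 1, 31), 1))
print(add_months(datetime(2000, 1, 15), -1))
```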

References

_create_daily_time_bounds(timesteps, obj_type, freq=1)[source]#

Creates time bounds for each timestep with the start and end of the day.

Bounds for each timestep correspond to 00:00:00 on the current day and 00:00:00 on the subsequent day.

If time steps are sub-daily, then the bounds will begin at 00:00 and end at 00:00 of the following day. For example, for 3-hourly data, the bounds would be:

[
    ["01/01/2000 00:00", "01/01/2000 03:00"],
    ["01/01/2000 03:00", "01/01/2000 06:00"],
    ...
    ["01/01/2000 21:00", "02/01/2000 00:00"],
]
Parameters:
  • timesteps (np.ndarray) – An array of timesteps, represented as either cftime.datetime or pd.Timestamp (cast from np.datetime64[ns] to support pandas time/date components).

  • obj_type (Union[cftime.datetime, pd.Timestamp]) – The object type for time bounds based on the dtype of time_values.

  • freq ({1, 2, 3, 4, 6, 8, 12, 24}, optional) – Number of timepoints per day, by default 1. If greater than 1, sub-daily bounds are created.

    • freq=1 is daily (default)

    • freq=2 is twice daily

    • freq=4 is 6-hourly

    • freq=8 is 3-hourly

    • freq=12 is 2-hourly

    • freq=24 is hourly

Returns:

List[Union[cftime.datetime, pd.Timestamp]] – A list of time bound values.

Raises:

ValueError – If an incorrect freq argument is passed. Should be 1, 2, 3, 4, 6, 8, 12, or 24.
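
The daily and sub-daily rule can be sketched with standard-library datetimes: each timestep is snapped down to the start of its interval, and the interval length is 24 hours divided by freq. `daily_bounds` is a hypothetical helper written for illustration, not xCDAT's or CDAT's implementation.

```python
from datetime import datetime, timedelta

VALID_FREQS = (1, 2, 3, 4, 6, 8, 12, 24)


def daily_bounds(timesteps, freq=1):
    """Return [lower, upper] pairs for daily (freq=1) or sub-daily timesteps."""
    if freq not in VALID_FREQS:
        raise ValueError(
            "Incorrect freq argument. Should be 1, 2, 3, 4, 6, 8, 12, or 24."
        )
    step = timedelta(hours=24 // freq)
    bounds = []
    for ts in timesteps:
        midnight = datetime(ts.year, ts.month, ts.day)
        # Whole intervals elapsed since midnight (timedelta // timedelta -> int).
        n_steps = (ts - midnight) // step
        lower = midnight + n_steps * step
        bounds.append([lower, lower + step])
    return bounds


three_hourly = [datetime(2000, 1, 1, 0), datetime(2000, 1, 1, 21)]
for lower, upper in daily_bounds(three_hourly, freq=8):
    print(lower.isoformat(), upper.isoformat())
```

For the 21:00 timestep, the upper bound rolls over to 00:00 on the following day, as in the 3-hourly example above.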

Notes

This function is intended to reproduce CDAT’s setAxisTimeBoundsDaily method [5].

References

_validate_axis_arg(axis)[source]#