xcdat.temporal.TemporalAccessor#

class xcdat.temporal.TemporalAccessor(dataset)[source]#

An accessor class that provides temporal attributes and methods on xarray Datasets through the .temporal attribute.

This accessor class requires the dataset’s time coordinates to be decoded as np.datetime64 or cftime.datetime objects. The dataset must also have time bounds to generate weights for weighted calculations and to infer the grouping time frequency in average() (single-snapshot average).

Examples

Import TemporalAccessor class:

>>> import xcdat  # or from xcdat import temporal

Use TemporalAccessor class:

>>> ds = xcdat.open_dataset("/path/to/file")
>>>
>>> ds.temporal.<attribute>
>>> ds.temporal.<method>
>>> ds.temporal.<property>

Check that the ‘axis’ attribute is set on the time coordinates:

>>> ds.time.attrs["axis"]
'T'

Set the ‘axis’ attribute for the time coordinates if it isn’t:

>>> ds.time.attrs["axis"] = "T"
Parameters:

dataset (xr.Dataset) – A Dataset object.

__init__(dataset)[source]#

Methods

__init__(dataset)

average(data_var[, weighted, keep_weights])

Returns a Dataset with the average of a data variable and the time dimension removed.

climatology(data_var, freq[, weighted, ...])

Returns a Dataset with the climatology of a data variable.

departures(data_var, freq[, weighted, ...])

Returns a Dataset with the climatological departures (anomalies) for a data variable.

group_average(data_var, freq[, weighted, ...])

Returns a Dataset with average of a data variable by time group.

average(data_var, weighted=True, keep_weights=False)[source]#

Returns a Dataset with the average of a data variable and the time dimension removed.

This method infers the time grouping frequency by checking the distance between a set of upper and lower time bounds. It is particularly useful for calculating the weighted averages of monthly or yearly time series data, because the number of days per month or year can vary with the calendar type, which affects the weighting. For other frequencies, the weights are evenly distributed, so weighted=True produces the same result as weighted=False.

Time bounds are used for inferring the time series frequency and for generating weights (refer to the weighted parameter documentation below).
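The inference step can be sketched roughly as follows. This is a simplified illustration with a hypothetical `infer_freq_from_bounds` helper, not xcdat’s actual implementation:

```python
import numpy as np

def infer_freq_from_bounds(lower, upper):
    """Hypothetical helper: guess a grouping frequency from the
    median span between lower and upper time bounds (in days)."""
    days = np.median((upper - lower) / np.timedelta64(1, "D"))
    if days >= 365:
        return "year"
    if days >= 28:
        return "month"
    if days >= 1:
        return "day"
    return "hour"

# Monthly bounds: each time step spans one calendar month.
lower = np.array(["2000-01-01", "2000-02-01"], dtype="datetime64[D]")
upper = np.array(["2000-02-01", "2000-03-01"], dtype="datetime64[D]")
print(infer_freq_from_bounds(lower, upper))  # month
```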

Parameters:
  • data_var (str) – The key of the data variable for calculating averages.

  • weighted (bool, optional) – Calculate averages using weights, by default True.

    Weights are calculated by first determining the length of time for each coordinate point using the difference of its upper and lower bounds. The time lengths are grouped, then each time length is divided by the total sum of the time lengths to get the weight of each coordinate point.

    The weight of masked (missing) data is excluded when averages are taken. This is the same as giving them a weight of 0.

    Note that weights are assigned by the labeled time point. If the dataset includes timepoints that span across typical boundaries (e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020 and end in June 2020), the weights will not be assigned properly. See explanation in the Notes section below.

  • keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the final dataset output, by default False.

Returns:

xr.Dataset – Dataset with the average of the data variable and the time dimension removed.

Notes

When using weighted averages, the weights are assigned based on the timepoint value. For example, a time point of 2020-06-15 with bounds (2020-06-01, 2020-06-30) has 30 days of weight assigned to June, 2020 (e.g., for an annual average calculation). This would be expected behavior, but it’s possible that data could span across typical temporal boundaries. For example, a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15) would have 30 days of weight, but this weight would be assigned to June, 2020, which would be incorrect (15 days of weight should be assigned to May and 15 days of weight should be assigned to June). This issue could plausibly arise when using pentad data.
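The weighting described above can be sketched with plain NumPy. This is an illustration of the arithmetic only, not xcdat’s internals:

```python
import numpy as np

# Lower/upper time bounds for each month of 2000 (a leap year).
lower = np.arange("2000-01", "2001-01", dtype="datetime64[M]").astype("datetime64[D]")
upper = np.arange("2000-02", "2001-02", dtype="datetime64[M]").astype("datetime64[D]")

# Length of each time step in days, from the difference of its bounds.
lengths = (upper - lower) / np.timedelta64(1, "D")

# Divide each length by the total of its group (here the whole year,
# as for an annual average) to get the weight of each time point.
weights = lengths / lengths.sum()

print(lengths[:3])  # [31. 29. 31.] -- February 2000 has 29 days
```

Note that each weight is attributed entirely to the labeled time point, which is why bounds that straddle a month boundary are mis-weighted.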

Examples

Get weighted averages for a monthly time series data variable:

>>> ds_month = ds.temporal.average("ts")
>>> ds_month.ts
group_average(data_var, freq, weighted=True, keep_weights=False, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False, 'drop_incomplete_seasons': False})[source]#

Returns a Dataset with average of a data variable by time group.

Data is grouped into the labeled time point for the averaging operation. Time bounds are used for generating weights to calculate weighted group averages (refer to the weighted parameter documentation below).

Deprecated since version v0.8.0: The season_config dictionary argument "drop_incomplete_djf" is being deprecated. Please use "drop_incomplete_seasons" instead.

Parameters:
  • data_var (str) – The key of the data variable for calculating time series averages.

  • freq (Frequency) – The time frequency to group by.

    • “year”: groups by year for yearly averages.

    • “season”: groups by (year, season) for seasonal averages.

    • “month”: groups by (year, month) for monthly averages.

    • “day”: groups by (year, month, day) for daily averages.

    • “hour”: groups by (year, month, day, hour) for hourly averages.

  • weighted (bool, optional) – Calculate averages using weights, by default True.

    Weights are calculated by first determining the length of time for each coordinate point using the difference of its upper and lower bounds. The time lengths are grouped, then each time length is divided by the total sum of the time lengths to get the weight of each coordinate point.

    The weight of masked (missing) data is excluded when averages are calculated. This is the same as giving them a weight of 0.

    Note that weights are assigned by the labeled time point. If the dataset includes timepoints that span across typical boundaries (e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020 and end in June 2020), the weights will not be assigned properly. See explanation in the Notes section below.

  • keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the final dataset output, by default False.

  • season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency configurations. If configs for predefined seasons are passed, configs for custom seasons are ignored and vice versa.

    • “drop_incomplete_seasons” (bool, by default False)

      Seasons are considered incomplete if they do not have all of the required months to form the season. This argument supersedes “drop_incomplete_djf”. For example, suppose we have the time coordinates [“2000-11-16”, “2000-12-16”, “2001-01-16”, “2001-02-16”] and we want to group seasons by “ND” (“Nov”, “Dec”) and “JFM” (“Jan”, “Feb”, “Mar”):

      • [“2000-11-16”, “2000-12-16”] is considered a complete “ND” season since both “Nov” and “Dec” are present.

      • [“2001-01-16”, “2001-02-16”] is considered an incomplete “JFM” season because it only has “Jan” and “Feb”. Therefore, these time coordinates are dropped.

    • “drop_incomplete_djf” (bool, by default False)

      If the “dec_mode” is “DJF”, this flag drops (True) or keeps (False) time coordinates that fall under incomplete DJF seasons. Incomplete DJF seasons include the start year’s Jan/Feb and the end year’s Dec. This argument is superseded by “drop_incomplete_seasons” and will be deprecated in a future release.

    • “dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)

      The mode for the season that includes December in the list of pre-defined seasons (“DJF”/“JFD”, “MAM”, “JJA”, “SON”). This config is ignored if the custom_seasons config is set.

      • “DJF”: season includes the previous year December.

      • “JFD”: season includes the same year December.

        Xarray labels the season with December as “DJF”, but it is actually “JFD”.

    • “custom_seasons” ([List[List[str]]], by default None)

      List of sublists containing month strings, with each sublist representing a custom season.

      • Month strings must be in the three letter format (e.g., ‘Jan’)

      • Order of the months in each custom season does not matter

      • Custom seasons can vary in length

      >>> # Example of custom seasons in a three month format:
      >>> custom_seasons = [
      >>>     ["Jan", "Feb", "Mar"],  # "JanFebMar"
      >>>     ["Apr", "May", "Jun"],  # "AprMayJun"
      >>>     ["Jul", "Aug", "Sep"],  # "JulAugSep"
      >>>     ["Oct", "Nov", "Dec"],  # "OctNovDec"
      >>> ]
      
Returns:

xr.Dataset – Dataset with the average of a data variable by time group.

Notes

When using weighted averages, the weights are assigned based on the timepoint value. For example, a time point of 2020-06-15 with bounds (2020-06-01, 2020-06-30) has 30 days of weight assigned to June, 2020 (e.g., for an annual average calculation). This would be expected behavior, but it’s possible that data could span across typical temporal boundaries. For example, a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15) would have 30 days of weight, but this weight would be assigned to June, 2020, which would be incorrect (15 days of weight should be assigned to May and 15 days of weight should be assigned to June). This issue could plausibly arise when using pentad data.

Examples

Get seasonal averages for a data variable:

>>> ds_season = ds.temporal.group_average(
>>>     "ts",
>>>     "season",
>>>     season_config={
>>>         "dec_mode": "DJF",
>>>         "drop_incomplete_seasons": True
>>>     }
>>> )
>>> ds_season.ts
>>>
>>> ds_season_with_jfd = ds.temporal.group_average(
>>>     "ts",
>>>     "season",
>>>     season_config={"dec_mode": "JFD"}
>>> )
>>> ds_season_with_jfd.ts

Get seasonal averages with custom seasons for a data variable:

>>> custom_seasons = [
>>>     ["Jan", "Feb", "Mar"],  # "JanFebMar"
>>>     ["Apr", "May", "Jun"],  # "AprMayJun"
>>>     ["Jul", "Aug", "Sep"],  # "JulAugSep"
>>>     ["Oct", "Nov", "Dec"],  # "OctNovDec"
>>> ]
>>>
>>> ds_season_custom = ds.temporal.group_average(
>>>     "ts",
>>>     "season",
>>>     season_config={"custom_seasons": custom_seasons}
>>> )

Get the group_average() operation attributes:

>>> ds_season.ts.attrs
{
    'operation': 'temporal_avg',
    'mode': 'group_average',
    'freq': 'season',
    'weighted': 'True',
    'dec_mode': 'DJF',
    'drop_incomplete_seasons': 'True'
}
climatology(data_var, freq, weighted=True, keep_weights=False, reference_period=None, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False, 'drop_incomplete_seasons': False})[source]#

Returns a Dataset with the climatology of a data variable.

Data is grouped into the labeled time point for the averaging operation. Time bounds are used for generating weights to calculate weighted climatology (refer to the weighted parameter documentation below).

Deprecated since version v0.8.0: The season_config dictionary argument "drop_incomplete_djf" is being deprecated. Please use "drop_incomplete_seasons" instead.

Parameters:
  • data_var (str) – The key of the data variable for calculating climatology.

  • freq (Frequency) – The time frequency to group by.

    • “season”: groups by season for the seasonal cycle climatology.

    • “month”: groups by month for the annual cycle climatology.

    • “day”: groups by (month, day) for the daily cycle climatology. If the CF calendar type is "gregorian", "proleptic_gregorian", or "standard", leap days (if present) are dropped to avoid inconsistencies when calculating climatologies. Refer to [1] for more details on this implementation decision.

  • weighted (bool, optional) – Calculate averages using weights, by default True.

    Weights are calculated by first determining the length of time for each coordinate point using the difference of its upper and lower bounds. The time lengths are grouped, then each time length is divided by the total sum of the time lengths to get the weight of each coordinate point.

    The weight of masked (missing) data is excluded when averages are taken. This is the same as giving them a weight of 0.

    Note that weights are assigned by the labeled time point. If the dataset includes timepoints that span across typical boundaries (e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020 and end in June 2020), the weights will not be assigned properly. See explanation in the Notes section below.

  • keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the final dataset output, by default False.

  • reference_period (Optional[Tuple[str, str]], optional) – The climatological reference period, which is a subset of the entire time series. This parameter accepts a tuple of strings in the format ‘yyyy-mm-dd’. For example, ('1850-01-01', '1899-12-31'). If no value is provided, the climatological reference period will be the full period covered by the dataset.

  • season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency configurations. If configs for predefined seasons are passed, configs for custom seasons are ignored and vice versa.

    • “drop_incomplete_seasons” (bool, by default False)

      Seasons are considered incomplete if they do not have all of the required months to form the season. This argument supersedes “drop_incomplete_djf”. For example, suppose we have the time coordinates [“2000-11-16”, “2000-12-16”, “2001-01-16”, “2001-02-16”] and we want to group seasons by “ND” (“Nov”, “Dec”) and “JFM” (“Jan”, “Feb”, “Mar”):

      • [“2000-11-16”, “2000-12-16”] is considered a complete “ND” season since both “Nov” and “Dec” are present.

      • [“2001-01-16”, “2001-02-16”] is considered an incomplete “JFM” season because it only has “Jan” and “Feb”. Therefore, these time coordinates are dropped.

    • “drop_incomplete_djf” (bool, by default False)

      If the “dec_mode” is “DJF”, this flag drops (True) or keeps (False) time coordinates that fall under incomplete DJF seasons. Incomplete DJF seasons include the start year’s Jan/Feb and the end year’s Dec. This argument is superseded by “drop_incomplete_seasons” and will be deprecated in a future release.

    • “dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)

      The mode for the season that includes December in the list of pre-defined seasons (“DJF”/“JFD”, “MAM”, “JJA”, “SON”). This config is ignored if the custom_seasons config is set.

      • “DJF”: season includes the previous year December.

      • “JFD”: season includes the same year December.

        Xarray labels the season with December as “DJF”, but it is actually “JFD”.

    • “custom_seasons” ([List[List[str]]], by default None)

      List of sublists containing month strings, with each sublist representing a custom season.

      • Month strings must be in the three letter format (e.g., ‘Jan’)

      • Order of the months in each custom season does not matter

      • Custom seasons can vary in length

      >>> # Example of custom seasons in a three month format:
      >>> custom_seasons = [
      >>>     ["Jan", "Feb", "Mar"],  # "JanFebMar"
      >>>     ["Apr", "May", "Jun"],  # "AprMayJun"
      >>>     ["Jul", "Aug", "Sep"],  # "JulAugSep"
      >>>     ["Oct", "Nov", "Dec"],  # "OctNovDec"
      >>> ]
      
Returns:

xr.Dataset – Dataset with the climatology of a data variable.

References

Notes

When using weighted averages, the weights are assigned based on the timepoint value. For example, a time point of 2020-06-15 with bounds (2020-06-01, 2020-06-30) has 30 days of weight assigned to June, 2020 (e.g., for an annual average calculation). This would be expected behavior, but it’s possible that data could span across typical temporal boundaries. For example, a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15) would have 30 days of weight, but this weight would be assigned to June, 2020, which would be incorrect (15 days of weight should be assigned to May and 15 days of weight should be assigned to June). This issue could plausibly arise when using pentad data.

Examples

Get a data variable’s seasonal climatology:

>>> ds_season = ds.temporal.climatology(
>>>     "ts",
>>>     "season",
>>>     season_config={
>>>         "dec_mode": "DJF",
>>>         "drop_incomplete_seasons": True
>>>     }
>>> )
>>> ds_season.ts
>>>
>>> ds_season_with_jfd = ds.temporal.climatology(
>>>     "ts",
>>>     "season",
>>>     season_config={"dec_mode": "JFD"}
>>> )
>>> ds_season_with_jfd.ts

Get a data variable’s seasonal climatology with custom seasons:

>>> custom_seasons = [
>>>     ["Jan", "Feb", "Mar"],  # "JanFebMar"
>>>     ["Apr", "May", "Jun"],  # "AprMayJun"
>>>     ["Jul", "Aug", "Sep"],  # "JulAugSep"
>>>     ["Oct", "Nov", "Dec"],  # "OctNovDec"
>>> ]
>>>
>>> ds_season_custom = ds.temporal.climatology(
>>>     "ts",
>>>     "season",
>>>     season_config={"custom_seasons": custom_seasons}
>>> )

Get the climatology() operation attributes:

>>> ds_season.ts.attrs
{
    'operation': 'temporal_avg',
    'mode': 'climatology',
    'freq': 'season',
    'weighted': 'True',
    'dec_mode': 'DJF',
    'drop_incomplete_seasons': 'True'
}
departures(data_var, freq, weighted=True, keep_weights=False, reference_period=None, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False, 'drop_incomplete_seasons': False})[source]#

Returns a Dataset with the climatological departures (anomalies) for a data variable.

In climatology, “anomalies” refer to the difference between the value during a given time interval (e.g., the January average surface air temperature) and the long-term average value for that time interval (e.g., the average surface temperature over the last 30 Januaries).

Time bounds are used for generating weights to calculate weighted climatology (refer to the weighted parameter documentation below).

Deprecated since version v0.8.0: The season_config dictionary argument "drop_incomplete_djf" is being deprecated. Please use "drop_incomplete_seasons" instead.

Parameters:
  • data_var (str) – The key of the data variable for calculating departures.

  • freq (Frequency) – The frequency of time to group by.

    • “season”: groups by season for the seasonal cycle departures.

    • “month”: groups by month for the annual cycle departures.

    • “day”: groups by (month, day) for the daily cycle departures. If the CF calendar type is "gregorian", "proleptic_gregorian", or "standard", leap days (if present) are dropped to avoid inconsistencies when calculating climatologies. Refer to [2] for more details on this implementation decision.

  • weighted (bool, optional) – Calculate averages using weights, by default True.

    Weights are calculated by first determining the length of time for each coordinate point using the difference of its upper and lower bounds. The time lengths are grouped, then each time length is divided by the total sum of the time lengths to get the weight of each coordinate point.

    The weight of masked (missing) data is excluded when averages are taken. This is the same as giving them a weight of 0.

    Note that weights are assigned by the labeled time point. If the dataset includes timepoints that span across typical boundaries (e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020 and end in June 2020), the weights will not be assigned properly. See explanation in the Notes section below.

  • keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the final dataset output, by default False.

  • reference_period (Optional[Tuple[str, str]], optional) – The climatological reference period, which is a subset of the entire time series and used for calculating departures. This parameter accepts a tuple of strings in the format ‘yyyy-mm-dd’. For example, ('1850-01-01', '1899-12-31'). If no value is provided, the climatological reference period will be the full period covered by the dataset.

  • season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency configurations. If configs for predefined seasons are passed, configs for custom seasons are ignored and vice versa.

    General configs:

    • “drop_incomplete_seasons” (bool, by default False)

      Seasons are considered incomplete if they do not have all of the required months to form the season. This argument supersedes “drop_incomplete_djf”. For example, suppose we have the time coordinates [“2000-11-16”, “2000-12-16”, “2001-01-16”, “2001-02-16”] and we want to group seasons by “ND” (“Nov”, “Dec”) and “JFM” (“Jan”, “Feb”, “Mar”):

      • [“2000-11-16”, “2000-12-16”] is considered a complete “ND” season since both “Nov” and “Dec” are present.

      • [“2001-01-16”, “2001-02-16”] is considered an incomplete “JFM” season because it only has “Jan” and “Feb”. Therefore, these time coordinates are dropped.

    • “drop_incomplete_djf” (bool, by default False)

      If the “dec_mode” is “DJF”, this flag drops (True) or keeps (False) time coordinates that fall under incomplete DJF seasons. Incomplete DJF seasons include the start year’s Jan/Feb and the end year’s Dec. This argument is superseded by “drop_incomplete_seasons” and will be deprecated in a future release.

    Configs for predefined seasons:

    • “dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)

      The mode for the season that includes December.

      • “DJF”: season includes the previous year December.

      • “JFD”: season includes the same year December.

        Xarray labels the season with December as “DJF”, but it is actually “JFD”.

    Configs for custom seasons:

    • “custom_seasons” ([List[List[str]]], by default None)

      List of sublists containing month strings, with each sublist representing a custom season.

      • Month strings must be in the three letter format (e.g., ‘Jan’)

      • Order of the months in each custom season does not matter

      • Custom seasons can vary in length

      >>> # Example of custom seasons in a three month format:
      >>> custom_seasons = [
      >>>     ["Jan", "Feb", "Mar"],  # "JanFebMar"
      >>>     ["Apr", "May", "Jun"],  # "AprMayJun"
      >>>     ["Jul", "Aug", "Sep"],  # "JulAugSep"
      >>>     ["Oct", "Nov", "Dec"],  # "OctNovDec"
      >>> ]
      
Returns:

xr.Dataset – The Dataset containing the departures for a data var’s climatology.

Notes

When using weighted averages, the weights are assigned based on the timepoint value. For example, a time point of 2020-06-15 with bounds (2020-06-01, 2020-06-30) has 30 days of weight assigned to June, 2020 (e.g., for an annual average calculation). This would be expected behavior, but it’s possible that data could span across typical temporal boundaries. For example, a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15) would have 30 days of weight, but this weight would be assigned to June, 2020, which would be incorrect (15 days of weight should be assigned to May and 15 days of weight should be assigned to June). This issue could plausibly arise when using pentad data.

This method uses xarray’s grouped arithmetic as a shortcut for mapping over all unique labels. Grouped arithmetic works by assigning a grouping label to each time coordinate of the observation data based on the averaging mode and frequency. Afterwards, the corresponding climatology is removed from the observation data at each time coordinate based on the matching labels.

Refer to [3] to learn more about how xarray’s grouped arithmetic works.
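As a rough analogy for this label-matching step, here is a minimal sketch in plain NumPy rather than xarray’s actual grouped arithmetic:

```python
import numpy as np

# Monthly observations for two years, with a month label per time point.
obs = np.array([10.0, 12.0, 11.0, 14.0, 12.0, 13.0])
months = np.array([1, 2, 3, 1, 2, 3])

# Climatology: the long-term mean for each month label.
climo = {m: obs[months == m].mean() for m in np.unique(months)}

# Departures: subtract the climatology whose label matches each point.
departures = obs - np.array([climo[m] for m in months])

print(departures)  # [-2.  0. -1.  2.  0.  1.]
```

With xarray, the same matching happens automatically when a grouped object is combined with an array indexed by the group labels.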

References

Examples

Get a data variable’s annual cycle departures:

>>> ds_depart = ds_climo.temporal.departures("ts", "month")

Get the departures() operation attributes:

>>> ds_depart.ts.attrs
{
    'operation': 'temporal_avg',
    'mode': 'departures',
    'freq': 'month',
    'weighted': 'True'
}
_averager(data_var, mode, freq, weighted=True, keep_weights=False, reference_period=None, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False, 'drop_incomplete_seasons': False})[source]#

Averages a data variable based on the averaging mode and frequency.

_set_data_var_attrs(data_var)[source]#

Sets data variable metadata as object attributes and checks whether the time axis is decoded.

This includes the name of the data variable, the time axis dimension name, the calendar type and its corresponding cftime object (date type).

Parameters:

data_var (str) – The key of the data variable.

Raises:
  • TypeError – If the data variable’s time coordinates are not encoded as datetime-like objects.

  • KeyError – If the data variable does not have a “calendar” encoding attribute.

_set_arg_attrs(mode, freq, weighted, reference_period=None, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False, 'drop_incomplete_seasons': False})[source]#

Validates method arguments and sets them as object attributes.

Parameters:
  • mode (Mode) – The mode for temporal averaging.

  • freq (Frequency) – The frequency of time to group by.

  • weighted (bool) – Calculate averages using weights.

  • season_config (Optional[SeasonConfigInput]) – A dictionary for “season” frequency configurations. If configs for predefined seasons are passed, configs for custom seasons are ignored and vice versa, by default DEFAULT_SEASON_CONFIG.

Raises:
  • KeyError – If the Dataset does not have a time dimension.

  • ValueError – If an incorrect freq arg was passed.

  • ValueError – If an incorrect dec_mode arg was passed.

_set_season_config_attr(season_config)[source]#
_is_valid_reference_period(reference_period)[source]#
_form_seasons(custom_seasons)[source]#

Forms custom seasons from a nested list of months.

This method concatenates the strings in each sublist to form a flat list of custom season strings.

Parameters:

custom_seasons (List[List[str]]) – List of sublists containing month strings, with each sublist representing a custom season.

Returns:

Dict[str, List[str]] – A dictionary with the keys being the custom season and the values being the corresponding list of months.

Raises:
  • ValueError – If exactly 12 months are not passed in the list of custom seasons.

  • ValueError – If duplicate months are found in the list of custom seasons.

  • ValueError – If a month string is not supported.
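The concatenation step can be sketched as follows. This is a simplified illustration with a hypothetical `form_seasons` function; the validation steps listed under Raises are omitted:

```python
from typing import Dict, List

def form_seasons(custom_seasons: List[List[str]]) -> Dict[str, List[str]]:
    """Hypothetical sketch: join each sublist of month strings into a
    custom season name, mapping the name to its list of months."""
    return {"".join(months): months for months in custom_seasons}

custom_seasons = [
    ["Jan", "Feb", "Mar"],
    ["Apr", "May", "Jun"],
    ["Jul", "Aug", "Sep"],
    ["Oct", "Nov", "Dec"],
]
print(form_seasons(custom_seasons)["JanFebMar"])  # ['Jan', 'Feb', 'Mar']
```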

_preprocess_dataset(ds)[source]#

Preprocess the dataset based on averaging settings.

Operations include:
  1. Drop leap days for daily climatologies.

  2. Subset the dataset based on the reference period.

  3. Shift years for custom seasons spanning the calendar year.

  4. Shift Decembers for “DJF” mode and drop incomplete “DJF” seasons, if specified.

  5. Drop incomplete seasons if specified.

Parameters:

ds (xr.Dataset) – The dataset.

Returns:

xr.Dataset

_subset_coords_for_custom_seasons(ds, months)[source]#

Subsets time coordinates to the months included in custom seasons.

Parameters:
  • ds (xr.Dataset) – The dataset.

  • months (List[str]) – A list of months included in custom seasons. Example: [“Nov”, “Dec”, “Jan”]

Returns:

xr.Dataset – The dataset with time coordinates subset to the months used in custom seasons.

_shift_custom_season_years(ds)[source]#

Shifts the year for custom seasons spanning the calendar year.

A season spans the calendar year if it includes “Jan” and “Jan” is not the first month. For example, for custom_seasons = ["Nov", "Dec", "Jan", "Feb", "Mar"]:

  • [“Nov”, “Dec”] are from the previous year.

  • [“Jan”, “Feb”, “Mar”] are from the current year.

Therefore, [“Nov”, “Dec”] need to be shifted a year forward for correct grouping.

Parameters:

ds (xr.Dataset) – The Dataset with time coordinates.

Returns:

xr.Dataset – The Dataset with shifted time coordinates.

Examples

Before and after shifting months for “NDJFM” seasons:

>>> # Before shifting months
>>> [(2000, "NDJFM", 11), (2000, "NDJFM", 12), (2001, "NDJFM", 1),
>>>  (2001, "NDJFM", 2), (2001, "NDJFM", 3)]
>>> # After shifting months
>>> [(2001, "NDJFM", 11), (2001, "NDJFM", 12), (2001, "NDJFM", 1),
>>>  (2001, "NDJFM", 2), (2001, "NDJFM", 3)]
_shift_djf_decembers(ds)[source]#

Shifts Decembers to the next year for “DJF” seasons.

This ensures correct grouping for “DJF” seasons by shifting Decembers to the next year. Without this, grouping defaults to “JFD”, which is the native Xarray behavior.

Parameters:

ds (xr.Dataset) – The Dataset with time coordinates.

Returns:

xr.Dataset – The Dataset with shifted time coordinates.

Examples

Comparison of “JFD” and “DJF” seasons:

>>> # "JFD" (native xarray behavior)
>>> [(2000, "DJF", 1), (2000, "DJF", 2), (2000, "DJF", 12),
>>>  (2001, "DJF", 1), (2001, "DJF", 2)]
>>> # "DJF" (shifted Decembers)
>>> [(2000, "DJF", 1), (2000, "DJF", 2), (2001, "DJF", 12),
>>>  (2001, "DJF", 1), (2001, "DJF", 2)]
_drop_incomplete_djf(dataset)[source]#

Drops incomplete DJF seasons within a continuous time series.

This method assumes that the time series is continuous and removes the leading and trailing incomplete seasons (e.g., the first January and February of a time series that are not complete, because the December of the previous year is missing). This method does not account for or remove missing time steps anywhere else.
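The trimming logic can be sketched in plain Python. This is a simplified illustration assuming a continuous monthly series, not the actual implementation:

```python
# (year, month) labels of a continuous monthly time series.
coords = [(2000, m) for m in range(1, 13)] + [(2001, m) for m in range(1, 13)]

first_year, last_year = coords[0][0], coords[-1][0]

# Drop the leading Jan/Feb (their December is missing) and the
# trailing December (its Jan/Feb are missing).
complete = [
    (y, m)
    for y, m in coords
    if not (y == first_year and m in (1, 2)) and not (y == last_year and m == 12)
]

print(complete[0], complete[-1])  # (2000, 3) (2001, 11)
```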

Parameters:

dataset (xr.Dataset) – The dataset with some possibly incomplete DJF seasons.

Returns:

xr.Dataset – The dataset with only complete DJF seasons.

_drop_incomplete_seasons(ds)[source]#

Drops incomplete seasons within a continuous time series.

Seasons are considered incomplete if they do not have all of the required months to form the season. For example, suppose we have the time coordinates [“2000-11-16”, “2000-12-16”, “2001-01-16”, “2001-02-16”] and we want to group seasons by “ND” (“Nov”, “Dec”) and “JFM” (“Jan”, “Feb”, “Mar”):

  • [“2000-11-16”, “2000-12-16”] is considered a complete “ND” season since both “Nov” and “Dec” are present.

  • [“2001-01-16”, “2001-02-16”] is considered an incomplete “JFM” season because it only has “Jan” and “Feb”. Therefore, these time coordinates are dropped.

Parameters:

ds (xr.Dataset) – The dataset with seasonal datetime components and potentially incomplete seasons.

Returns:

xr.Dataset – The dataset with only complete seasons.

Notes

TODO: Refactor this method to use pure Xarray/NumPy operations, rather than Pandas.
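The completeness check can be sketched without pandas. This is a simplified illustration of the filtering logic only:

```python
# Required months per season and observed (year, season, month) rows.
seasons = {"ND": ["Nov", "Dec"], "JFM": ["Jan", "Feb", "Mar"]}
rows = [
    (2000, "ND", "Nov"), (2000, "ND", "Dec"),
    (2001, "JFM", "Jan"), (2001, "JFM", "Feb"),
]

# Keep a (year, season) group only if every required month is present.
complete = []
for year, season in {(y, s) for y, s, _ in rows}:
    months = [m for y, s, m in rows if (y, s) == (year, season)]
    if set(months) == set(seasons[season]):
        complete.extend((year, season, m) for m in months)

print(sorted(complete))  # [(2000, 'ND', 'Dec'), (2000, 'ND', 'Nov')]
```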

_drop_leap_days(ds)[source]#

Drop leap days from time coordinates.

This method is used to drop 2/29 from leap years (if present) before calculating climatology/departures for high frequency time series data to avoid cftime breaking (e.g., ValueError: invalid day number provided in cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)).

Parameters:

ds (xr.Dataset) – The dataset.

Returns:

xr.Dataset

_average(ds, data_var)[source]#

Averages a data variable with the time dimension removed.

Parameters:
  • ds (xr.Dataset) – The dataset.

  • data_var (str) – The key of the data variable.

Returns:

xr.DataArray – The data variable averaged with the time dimension removed.

_group_average(ds, data_var)[source]#

Averages a data variable by time group.

Parameters:
  • ds (xr.Dataset) – The dataset.

  • data_var (str) – The key of the data variable.

Returns:

xr.DataArray – The data variable averaged by time group.

_get_weights(time_bounds)[source]#

Calculates weights for a data variable using time bounds.

This method gets the length of time for each coordinate point by using the difference in the upper and lower time bounds. This approach ensures that the correct time lengths are calculated regardless of how time coordinates are recorded (e.g., monthly, daily, hourly) and the calendar type used.

The time lengths are labeled and grouped, then each time length is divided by the total sum of the time lengths in its group to get its corresponding weight.
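The per-group normalization can be sketched with plain NumPy. This is an illustration only; xcdat operates on xr.DataArray time bounds:

```python
import numpy as np

# Time lengths in days for six monthly steps and their year labels.
lengths = np.array([31.0, 28.0, 31.0, 31.0, 29.0, 31.0])
years = np.array([1999, 1999, 1999, 2000, 2000, 2000])

# Divide each length by the total length of its group so that the
# weights within every group sum to 1.
totals = {y: lengths[years == y].sum() for y in np.unique(years)}
weights = lengths / np.array([totals[y] for y in years])

print(round(weights[years == 1999].sum(), 6))  # 1.0
```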

Parameters:

time_bounds (xr.DataArray) – The time bounds.

Returns:

xr.DataArray – The weights based on a specified frequency.
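
Examples

A minimal sketch of the weighting rule: interval lengths come from the difference of upper and lower bounds and are normalized so each group's weights sum to one (simplified here to a single group; the real method labels and groups the lengths by the inferred frequency):

```python
import numpy as np

# Hypothetical monthly bounds for Jan-Mar 2000 (lengths 31, 29, 31 days).
bounds = np.array(
    ["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01"],
    dtype="datetime64[D]",
)
lower, upper = bounds[:-1], bounds[1:]
lengths = (upper - lower) / np.timedelta64(1, "D")

# Treat all three months as one group and normalize by the group total.
weights = lengths / lengths.sum()
print(weights)  # [31/91, 29/91, 31/91]
```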

Notes

Refer to [4] for the supported CF convention calendar types.

References

_group_data(data_var)[source]#

Groups a data variable.

This method groups a data variable by a single datetime component for the “average” mode or labeled time coordinates for all other modes.

Parameters:

data_var (xr.DataArray) – A data variable.

Returns:

DataArrayGroupBy – A data variable grouped by label.

_label_time_coords(time_coords)[source]#

Labels time coordinates with a group for grouping.

This method labels time coordinates for grouping by first extracting specific xarray datetime components from the time coordinates and storing them in a pandas DataFrame. After any necessary processing is performed on the DataFrame, it is converted to a numpy array of datetime objects. This numpy array serves as the data source for the final DataArray of labeled time coordinates.

Parameters:

time_coords (xr.DataArray) – The time coordinates.

Returns:

xr.DataArray – The DataArray of labeled time coordinates for grouping.

Examples

Original daily time coordinates:

>>> <xarray.DataArray 'time' (time: 4)>
>>> array(['2000-01-01T12:00:00.000000000',
>>>        '2000-01-31T21:00:00.000000000',
>>>        '2000-03-01T21:00:00.000000000',
>>>        '2000-04-01T03:00:00.000000000'],
>>>       dtype='datetime64[ns]')
>>> Coordinates:
>>> * time     (time) datetime64[ns] 2000-01-01T12:00:00 ... 2000-04-01T03:00:00

Daily time coordinates labeled by year and month:

>>> <xarray.DataArray 'time' (time: 3)>
>>> array(['2000-01-01T00:00:00.000000000',
>>>        '2000-03-01T00:00:00.000000000',
>>>        '2000-04-01T00:00:00.000000000'],
>>>       dtype='datetime64[ns]')
>>> Coordinates:
>>> * time     (time) datetime64[ns] 2000-01-01T00:00:00 ... 2000-04-01T00:00:00

_get_df_dt_components(time_coords, drop_obsolete_cols)[source]#

Returns a DataFrame of xarray datetime components.

This method extracts the applicable xarray datetime components from each time coordinate based on the averaging mode and frequency, and stores them in a DataFrame.

Additional processing is performed for the seasonal frequency, including:

  • If custom seasons are used, map them to each time coordinate based on the middle month of the custom season.

  • If the season containing December is “DJF”, shift Decembers to the next year so DJF seasons are correctly grouped using the previous year’s December.

  • Drop obsolete columns after processing is done.

Parameters:
  • time_coords (xr.DataArray) – The time coordinates.

  • drop_obsolete_cols (bool) – Drop obsolete columns after processing the seasonal DataFrame when self._freq="season". Set to False to keep the datetime columns needed for preprocessing the dataset (e.g., removing incomplete seasons), and set to True to remove obsolete columns when grouping time coordinates.

Returns:

pd.DataFrame – A DataFrame of datetime components.
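
Examples

A minimal sketch of the extraction step for a monthly frequency (the column choice is illustrative; the actual columns depend on the mode and frequency):

```python
import pandas as pd
import xarray as xr

times = xr.DataArray(
    pd.date_range("2000-01-01", periods=3, freq="MS"), dims="time", name="time"
)

# Pull the applicable datetime components off the time axis into a DataFrame.
df = pd.DataFrame({"year": times.dt.year.values, "month": times.dt.month.values})
print(df.to_dict("list"))  # {'year': [2000, 2000, 2000], 'month': [1, 2, 3]}
```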

Notes

Refer to [5] for information on xarray datetime accessor components.

References

_map_months_to_custom_seasons(df)[source]#

Maps the month column in the DataFrame to a custom season.

This method maps each integer value in the “month” column to its string representation, which then maps to a custom season that is stored in the “season” column. For example, the month of 1 maps to “Jan”, and “Jan” maps to the “JanFebMar” custom season.

Parameters:

df (pd.DataFrame) – The DataFrame of xarray datetime components.

Returns:

pd.DataFrame – The DataFrame of xarray datetime coordinates, with each row mapped to a custom season.
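
Examples

A minimal sketch of the mapping (the season name and month-to-season mapping here are illustrative, not taken from a real configuration):

```python
import calendar

import pandas as pd

# Hypothetical custom season covering January through March.
month_to_season = {"Jan": "JanFebMar", "Feb": "JanFebMar", "Mar": "JanFebMar"}

df = pd.DataFrame({"year": [2000, 2000, 2000], "month": [1, 2, 3]})

# Map each integer month to its abbreviation, then to its custom season.
df["season"] = df["month"].map(lambda m: month_to_season[calendar.month_abbr[m]])
print(df["season"].tolist())  # ['JanFebMar', 'JanFebMar', 'JanFebMar']
```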

_map_seasons_to_mid_months(df)[source]#

Maps the season column values to the integer of its middle month.

DateTime objects don’t support storing seasons as strings, so the middle month is used to represent the season. For example, for the season “DJF”, the middle month “Jan” is mapped to the integer value 1.

The middle month of a custom season is extracted using the ceiling of the middle index from its list of months. For example, for the custom season “FebMarAprMay” with the list of months [“Feb”, “Mar”, “Apr”, “May”], the index 3 is used to get the month “Apr”. “Apr” is then mapped to the integer value 4.

After mapping the season to its month, the “season” column is renamed to “month”.

Parameters:

df (pd.DataFrame) – The dataframe of datetime components, including a “season” column.

Returns:

pd.DataFrame – The dataframe of datetime components, including a “month” column.
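
Examples

One way to express the ceiling-of-middle-index rule (zero-based indexing here; the indexing convention and helper name are illustrative, not the actual implementation):

```python
import math

# Month abbreviations to integer values (subset for this sketch).
MONTH_INTS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Dec": 12}

def mid_month(months):
    # Ceiling of the middle (zero-based) index of the season's month list.
    return months[math.ceil((len(months) - 1) / 2)]

print(mid_month(["Dec", "Jan", "Feb"]))                      # Jan
print(MONTH_INTS[mid_month(["Feb", "Mar", "Apr", "May"])])   # 4
```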

_drop_obsolete_columns(df_season)[source]#

Drops obsolete columns from the DataFrame of xarray datetime components.

For the “season” frequency, processing is required on the DataFrame of xarray datetime components, such as mapping custom seasons based on the month. Additional datetime component values must be included as DataFrame columns, which become obsolete after processing is done. The obsolete columns are dropped from the DataFrame before grouping time coordinates.

Parameters:

df_season (pd.DataFrame) – The DataFrame of time coordinates for the “season” frequency with obsolete columns.

Returns:

pd.DataFrame – The DataFrame of time coordinates for the “season” frequency with obsolete columns dropped.
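
Examples

A minimal sketch of the drop step (the column names are illustrative):

```python
import pandas as pd

df_season = pd.DataFrame(
    {"year": [2000, 2000], "month": [1, 2], "season": ["DJF", "DJF"]}
)

# "month" was only needed to derive "season", so it is dropped before grouping.
df_season = df_season.drop(columns=["month"])
print(list(df_season.columns))  # ['year', 'season']
```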

_convert_df_to_dt(df)[source]#

Converts a DataFrame of datetime components to cftime datetime objects.

datetime objects require at least a year, month, and day value. However, some modes and time frequencies don’t require the year, month, and/or day for grouping. For these cases, default values of 1 are used to meet this datetime requirement.

Parameters:

df (pd.DataFrame) – The DataFrame of xarray datetime components.

Returns:

np.ndarray – A numpy ndarray of cftime.datetime objects.
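
Examples

A minimal sketch of the default-filling step. xcdat builds cftime.datetime objects; the standard-library datetime is used here only to keep the sketch self-contained:

```python
from datetime import datetime

import pandas as pd

# e.g. a monthly climatology grouping has only a "month" column.
df = pd.DataFrame({"month": [1, 6, 12]})

# Fill missing components with a default of 1 to satisfy the
# year/month/day requirement of datetime objects.
for col in ("year", "day"):
    if col not in df.columns:
        df[col] = 1

dates = [datetime(int(r.year), int(r.month), int(r.day)) for r in df.itertuples()]
print(dates[0].isoformat())  # '0001-01-01T00:00:00'
```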

Notes

Refer to [6] and [7] for more information on Timestamp-valid range. We use cftime.datetime objects to avoid these time range issues.

References

_keep_weights(ds)[source]#

Keep the weights in the dataset.

The labeled time coordinates for the weights are replaced with the original time coordinates and the dimension name is appended with “_original”.

Parameters:

ds (xr.Dataset) – The dataset.

Returns:

xr.Dataset – The dataset with the weights used for averaging.

_add_operation_attrs(data_var)[source]#

Adds attributes to the data variable describing the operation. These attributes distinguish a data variable that has been operated on from its original state. The attributes in netCDF4 files do not support booleans or nested dictionaries, so booleans are converted to strings and nested dictionaries are unpacked.

Parameters:

data_var (xr.DataArray) – The data variable.

Returns:

xr.DataArray – The data variable with temporal averaging attributes.
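
Examples

A minimal sketch of making operation metadata netCDF-safe: booleans become strings before being attached as attributes (the attribute names here are illustrative):

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0], dims="time", name="tas")
operation = {"operation": "temporal_avg", "mode": "average", "weighted": True}

# netCDF attributes cannot store Python booleans, so stringify them.
da = da.assign_attrs(
    {k: str(v) if isinstance(v, bool) else v for k, v in operation.items()}
)
print(da.attrs["weighted"])  # the string "True", not a boolean
```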

_calculate_departures(ds_obs, ds_climo, data_var)[source]#

Calculate the departures for a data variable.

How this method works:

  1. Label the observational data variable’s time coordinates by their appropriate time group. For example, the first two time coordinates 2000-01-01 and 2000-02-01 are replaced with the “01-01-01” and “01-02-01” monthly groups.

  2. Calculate departures by subtracting the climatology from the labeled observational data using Xarray’s grouped arithmetic with automatic broadcasting (departures = obs - climo).

  3. Restore the original time coordinates to the departures variable to preserve the “year” of the time coordinates. For example, the first two time coordinates 01-01-01 and 01-02-01 are reverted back to 2000-01-01 and 2000-02-01.

Parameters:
  • ds_obs (xr.Dataset) – The observational dataset.

  • ds_climo (xr.Dataset) – The climatology dataset.

  • data_var (str) – The key of the data variable for calculating departures.

Returns:

xr.Dataset – The dataset containing the departures for a data variable.
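
Examples

A simplified sketch of steps 1–3 using xarray’s grouped arithmetic, with grouping by "time.month" standing in for the labeled time coordinates described above:

```python
import numpy as np
import pandas as pd
import xarray as xr

# 16 months spanning Jan 2000 - Apr 2001, so months 1-4 each occur twice.
times = pd.date_range("2000-01-01", periods=16, freq="MS")
obs = xr.DataArray(np.arange(16.0), dims="time", coords={"time": times})

# Climatology per month group, then departures = obs - climo with
# automatic broadcasting over the group labels.
climo = obs.groupby("time.month").mean()
departures = obs.groupby("time.month") - climo

# Jan 2000 is 0 and Jan 2001 is 12, so the Jan climatology is 6.
print(departures.values[0], departures.values[12])  # -6.0 6.0
```

The result keeps the original time coordinates, so the “year” of each time step is preserved in the departures variable.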