An accessor class that provides temporal attributes and methods on xarray
Datasets through the .temporal attribute.
This accessor class requires the dataset’s time coordinates to be decoded as
np.datetime64 or cftime.datetime objects. The dataset must also
have time bounds to generate weights for weighted calculations and to infer
the grouping time frequency in average() (single-snapshot average).
Returns a Dataset with the average of a data variable and the time
dimension removed.
This method infers the time grouping frequency by checking the distance
between a set of upper and lower time bounds. This method is
particularly useful for calculating the weighted averages of monthly or
yearly time series data because the number of days per month/year can
vary based on the calendar type, which can affect weighting. For other
frequencies, the weights will be equal, so weighted=True produces the
same result as weighted=False.
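Below is a minimal sketch of frequency inference from time bounds. It assumes the rule is to classify the shortest (upper - lower) interval. The helper name `infer_freq` and its cut-offs of 1, 21 and 300 days are illustrative assumptions, not values stated in this documentation. Plain `datetime` objects stand in for decoded time coordinates.

```python
from datetime import datetime, timedelta


def infer_freq(bounds):
    """Hypothetical helper: guess the grouping frequency from (lower, upper) bounds.

    The cut-offs below are illustrative assumptions.
    """
    # Classify by the shortest interval in the series.
    min_delta = min(upper - lower for lower, upper in bounds)
    if min_delta < timedelta(days=1):
        return "hour"
    if min_delta < timedelta(days=21):
        return "day"
    if min_delta < timedelta(days=300):
        return "month"
    return "year"


monthly = [
    (datetime(2000, 1, 1), datetime(2000, 2, 1)),  # 31 days
    (datetime(2000, 2, 1), datetime(2000, 3, 1)),  # 29 days (leap year)
]
print(infer_freq(monthly))  # month
```

The cut-offs only need to separate the classes: even the shortest month (28 days) falls between 21 and 300 days.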
Time bounds are used for inferring the time series frequency and for
generating weights (refer to the weighted parameter documentation below).
data_var (str) – The key of the data variable for calculating averages.
weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for
each coordinate point using the difference of its upper and lower
bounds. The time lengths are grouped, then each time length is
divided by the total sum of the time lengths to get the weight of
each coordinate point.
The weight of masked (missing) data is excluded when averages are
taken. This is the same as giving them a weight of 0.
Note that weights are assigned by the labeled time point. If the
dataset includes timepoints that span across typical boundaries
(e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020
and end in June 2020), the weights will not be assigned properly.
See explanation in the Notes section below.
keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the
final dataset output, by default False.
xr.Dataset – Dataset with the average of the data variable and the time dimension removed.
When using weighted averages, the weights are assigned based on the
timepoint value. For example, a time point of 2020-06-15 with bounds
(2020-06-01, 2020-07-01) has 30 days of weight assigned to June, 2020
(e.g., for an annual average calculation). This is the expected
behavior, but data could also span typical temporal boundaries. For
example, a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15)
would have 30 days of weight, but all of it would be assigned to June,
2020, which would be incorrect (16 days of weight should be assigned to
May and 14 days of weight should be assigned to June). This issue could
plausibly arise when using pentad data.
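The sketch below checks the arithmetic in this note. It counts whole days of the half-open interval [lower, upper) that fall in each calendar month. `split_by_month` is a hypothetical helper showing what a boundary-aware split would look like. Per the note, the accessor itself assigns the full weight to the labeled time point instead.

```python
from datetime import datetime


def split_by_month(lower, upper):
    """Hypothetical helper: days of [lower, upper) falling in each (year, month)."""
    parts = {}
    start = lower
    while start < upper:
        # First instant of the month after `start`.
        if start.month == 12:
            next_month = datetime(start.year + 1, 1, 1)
        else:
            next_month = datetime(start.year, start.month + 1, 1)
        end = min(next_month, upper)
        parts[(start.year, start.month)] = (end - start).days
        start = end
    return parts


lower, upper = datetime(2020, 5, 16), datetime(2020, 6, 15)
print((upper - lower).days)          # 30 days in total
print(split_by_month(lower, upper))  # {(2020, 5): 16, (2020, 6): 14}
```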
Get weighted averages for a monthly time series data variable:
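The accessor call is not reproduced in this text. As a stand-in, here is a plain-Python sketch of the computation that a weighted single-snapshot average of one year of monthly data amounts to. It assumes whole-month bounds in a standard Gregorian calendar, so each month's time length is its number of days from `calendar.monthrange`. `weighted_mean_monthly` is a hypothetical name.

```python
import calendar


def weighted_mean_monthly(year, values):
    """Mean of 12 monthly values, weighting each month by its length in days.

    Missing values (None) are skipped, which is the same as giving them weight 0.
    """
    pairs = [
        (value, calendar.monthrange(year, month)[1])  # (value, days in month)
        for month, value in enumerate(values, start=1)
        if value is not None
    ]
    total_days = sum(days for _, days in pairs)
    return sum(value * days for value, days in pairs) / total_days


values = [0.0] * 12
values[1] = 29.0  # only February 2000 (29 days, leap year) is nonzero
print(weighted_mean_monthly(2000, values))  # 29 * 29 / 366
print(sum(values) / 12)                     # unweighted mean: 29 / 12
```

The weighted result is smaller because February covers only 29 of the year's 366 days. That is less than the 1/12 share an unweighted mean gives it.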
Returns a Dataset with average of a data variable by time group.
Data is grouped into the labeled time point for the averaging operation.
Time bounds are used for generating weights to calculate weighted group
averages (refer to the weighted parameter documentation below).
Deprecated since version v0.8.0: The season_config dictionary argument "drop_incomplete_djf"
is being deprecated. Please use "drop_incomplete_seasons" instead.
data_var (str) – The key of the data variable for calculating time series averages.
freq (Frequency) – The time frequency to group by.
“year”: groups by year for yearly averages.
“season”: groups by (year, season) for seasonal averages.
“month”: groups by (year, month) for monthly averages.
“day”: groups by (year, month, day) for daily averages.
“hour”: groups by (year, month, day, hour) for hourly averages.
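The grouping keys above can be sketched as tuples of datetime components. `group_key`, `GROUP_KEYS` and `SEASON_OF_MONTH` are hypothetical names. The season mapping keeps December in its own calendar year, which is xarray's native "JFD" behavior described under season_config. The accessor's "DJF" mode shifts December instead.

```python
from datetime import datetime

# Datetime components forming the group key for each frequency listed above.
GROUP_KEYS = {
    "year": ("year",),
    "season": ("year", "season"),
    "month": ("year", "month"),
    "day": ("year", "month", "day"),
    "hour": ("year", "month", "day", "hour"),
}

# Predefined seasons; December stays in its own year here (native "JFD" grouping).
SEASON_OF_MONTH = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def group_key(t, freq):
    """Hypothetical helper: the tuple a time point is grouped under for `freq`."""
    components = {
        "year": t.year,
        "month": t.month,
        "day": t.day,
        "hour": t.hour,
        "season": SEASON_OF_MONTH[t.month],
    }
    return tuple(components[name] for name in GROUP_KEYS[freq])


t = datetime(2000, 7, 16, 12)
print(group_key(t, "month"))   # (2000, 7)
print(group_key(t, "season"))  # (2000, 'JJA')
```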
weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for
each coordinate point using the difference of its upper and lower
bounds. The time lengths are grouped, then each time length is
divided by the total sum of the time lengths to get the weight of
each coordinate point.
The weight of masked (missing) data is excluded when averages are
calculated. This is the same as giving them a weight of 0.
Note that weights are assigned by the labeled time point. If the
dataset includes timepoints that span across typical boundaries
(e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020
and end in June 2020), the weights will not be assigned properly.
See explanation in the Notes section below.
keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the
final dataset output, by default False.
season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency configurations. If configs for
predefined seasons are passed, configs for custom seasons are
ignored and vice versa.
“drop_incomplete_seasons” (bool, by default False)
Seasons are considered incomplete if they do not have all of
the required months to form the season. This argument supersedes
“drop_incomplete_djf”. For example, suppose we have the time
coordinates [“2000-11-16”, “2000-12-16”, “2001-01-16”, “2001-02-16”]
and want to group seasons by “ND” (“Nov”, “Dec”) and “JFM” (“Jan”,
“Feb”, “Mar”):
[“2000-11-16”, “2000-12-16”] is considered a complete “ND”
season since both “Nov” and “Dec” are present.
[“2001-01-16”, “2001-02-16”] is considered an incomplete “JFM”
season because it only has “Jan” and “Feb”. Therefore, these
time coordinates are dropped.
“drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps
(False) time coordinates that fall under incomplete DJF seasons.
Incomplete DJF seasons include the start year Jan/Feb and the
end year Dec. This argument is superseded by
“drop_incomplete_seasons” and will be deprecated in a future release.
“dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December in the list of
pre-defined seasons (“DJF”/“JFD”, “MAM”, “JJA”, “SON”).
This config is ignored if the custom_seasons config is set.
“DJF”: season includes the previous year December.
“JFD”: season includes the same year December.
Xarray labels the season with December as “DJF”, but it is
actually “JFD”.
“custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist
representing a custom season.
Month strings must be in the three-letter format (e.g., ‘Jan’).
Order of the months in each custom season does not matter.
Custom seasons can vary in length.
>>> # Example of custom seasons in a three-month format:
>>> custom_seasons = [
...     ["Jan", "Feb", "Mar"],  # "JanFebMar"
...     ["Apr", "May", "Jun"],  # "AprMayJun"
...     ["Jul", "Aug", "Sep"],  # "JulAugSep"
...     ["Oct", "Nov", "Dec"],  # "OctNovDec"
... ]
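The sketch below shows how a custom_seasons list could be checked and turned into labels. The three-letter rule and the concatenated labels (e.g., "JanFebMar") come from the text. `build_custom_seasons` is hypothetical. Rejecting a month that appears in two seasons is an added assumption: a month must map to a single season for grouping to be unambiguous.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def build_custom_seasons(custom_seasons):
    """Hypothetical helper: map each custom season label to its months.

    The label is the concatenation of the month strings, e.g. "JanFebMar".
    """
    seasons = {}
    seen = set()
    for months in custom_seasons:
        for month in months:
            if month not in MONTHS:
                raise ValueError(f"{month!r} is not a three-letter month string")
            # Assumption: each month may belong to at most one custom season.
            if month in seen:
                raise ValueError(f"{month!r} appears in more than one season")
            seen.add(month)
        seasons["".join(months)] = list(months)
    return seasons


custom_seasons = [["Jan", "Feb", "Mar"], ["Apr", "May", "Jun"],
                  ["Jul", "Aug", "Sep"], ["Oct", "Nov", "Dec"]]
print(list(build_custom_seasons(custom_seasons)))
# ['JanFebMar', 'AprMayJun', 'JulAugSep', 'OctNovDec']
```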
xr.Dataset – Dataset with the average of a data variable by time group.
When using weighted averages, the weights are assigned based on the
timepoint value. For example, a time point of 2020-06-15 with bounds
(2020-06-01, 2020-07-01) has 30 days of weight assigned to June, 2020
(e.g., for an annual average calculation). This is the expected
behavior, but data could also span typical temporal boundaries. For
example, a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15)
would have 30 days of weight, but all of it would be assigned to June,
2020, which would be incorrect (16 days of weight should be assigned to
May and 14 days of weight should be assigned to June). This issue could
plausibly arise when using pentad data.
Returns a Dataset with the climatology of a data variable.
Data is grouped into the labeled time point for the averaging operation.
Time bounds are used for generating weights to calculate weighted
climatology (refer to the weighted parameter documentation below).
Deprecated since version v0.8.0: The season_config dictionary argument "drop_incomplete_djf"
is being deprecated. Please use "drop_incomplete_seasons" instead.
data_var (str) – The key of the data variable for calculating climatology.
freq (Frequency) – The time frequency to group by.
“season”: groups by season for the seasonal cycle climatology.
“month”: groups by month for the annual cycle climatology.
“day”: groups by (month, day) for the daily cycle climatology.
If the CF calendar type is "gregorian",
"proleptic_gregorian", or "standard", leap days (if
present) are dropped to avoid inconsistencies when calculating
climatologies. Refer to [1] for more details on this
implementation decision.
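Here is a sketch of the daily cycle climatology with leap days dropped, using plain `date` objects. It takes unweighted means per (month, day). That matches a weighted mean when every daily bound has the same length. `daily_climatology` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date


def daily_climatology(times, values):
    """Hypothetical helper: mean value per (month, day), with Feb 29 dropped."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in zip(times, values):
        if (t.month, t.day) == (2, 29):
            continue  # leap day dropped before grouping
        sums[(t.month, t.day)] += v
        counts[(t.month, t.day)] += 1
    return {key: sums[key] / counts[key] for key in sums}


times = [date(2000, 2, 28), date(2000, 2, 29), date(2001, 2, 28)]
values = [1.0, 100.0, 3.0]
print(daily_climatology(times, values))  # {(2, 28): 2.0}
```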
weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for
each coordinate point using the difference of its upper and lower
bounds. The time lengths are grouped, then each time length is
divided by the total sum of the time lengths to get the weight of
each coordinate point.
The weight of masked (missing) data is excluded when averages are
taken. This is the same as giving them a weight of 0.
Note that weights are assigned by the labeled time point. If the
dataset includes timepoints that span across typical boundaries
(e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020
and end in June 2020), the weights will not be assigned properly.
See explanation in the Notes section below.
keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the
final dataset output, by default False.
reference_period (Optional[Tuple[str, str]], optional) – The climatological reference period, which is a subset of the entire
time series. This parameter accepts a tuple of strings in the format
‘yyyy-mm-dd’. For example, ('1850-01-01','1899-12-31'). If no
value is provided, the climatological reference period will be the
full period covered by the dataset.
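The sketch below applies reference_period by subsetting before the climatology is computed. Treating both endpoints as inclusive and comparing whole dates are assumptions of this example. `subset_reference_period` is hypothetical.

```python
from datetime import date


def subset_reference_period(times, values, reference_period):
    """Keep (time, value) pairs whose date lies within reference_period (inclusive)."""
    # reference_period is a ('yyyy-mm-dd', 'yyyy-mm-dd') tuple, as documented above.
    start, end = (date.fromisoformat(s) for s in reference_period)
    kept = [(t, v) for t, v in zip(times, values) if start <= t <= end]
    return [t for t, _ in kept], [v for _, v in kept]


times = [date(1849, 12, 16), date(1850, 1, 16), date(1899, 12, 16), date(1900, 1, 16)]
values = [0.0, 1.0, 2.0, 3.0]
sub_times, sub_values = subset_reference_period(times, values, ("1850-01-01", "1899-12-31"))
print(sub_values)  # [1.0, 2.0]
```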
season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency configurations. If configs for
predefined seasons are passed, configs for custom seasons are
ignored and vice versa.
“drop_incomplete_seasons” (bool, by default False)
Seasons are considered incomplete if they do not have all of
the required months to form the season. This argument supersedes
“drop_incomplete_djf”. For example, suppose we have the time
coordinates [“2000-11-16”, “2000-12-16”, “2001-01-16”, “2001-02-16”]
and want to group seasons by “ND” (“Nov”, “Dec”) and “JFM” (“Jan”,
“Feb”, “Mar”):
[“2000-11-16”, “2000-12-16”] is considered a complete “ND”
season since both “Nov” and “Dec” are present.
[“2001-01-16”, “2001-02-16”] is considered an incomplete “JFM”
season because it only has “Jan” and “Feb”. Therefore, these
time coordinates are dropped.
“drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps
(False) time coordinates that fall under incomplete DJF seasons.
Incomplete DJF seasons include the start year Jan/Feb and the
end year Dec. This argument is superseded by
“drop_incomplete_seasons” and will be deprecated in a future release.
“dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December in the list of
pre-defined seasons (“DJF”/“JFD”, “MAM”, “JJA”, “SON”).
This config is ignored if the custom_seasons config is set.
“DJF”: season includes the previous year December.
“JFD”: season includes the same year December.
Xarray labels the season with December as “DJF”, but it is
actually “JFD”.
“custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist
representing a custom season.
Month strings must be in the three-letter format (e.g., ‘Jan’).
Order of the months in each custom season does not matter.
Custom seasons can vary in length.
>>> # Example of custom seasons in a three-month format:
>>> custom_seasons = [
...     ["Jan", "Feb", "Mar"],  # "JanFebMar"
...     ["Apr", "May", "Jun"],  # "AprMayJun"
...     ["Jul", "Aug", "Sep"],  # "JulAugSep"
...     ["Oct", "Nov", "Dec"],  # "OctNovDec"
... ]
xr.Dataset – Dataset with the climatology of a data variable.
When using weighted averages, the weights are assigned based on the
timepoint value. For example, a time point of 2020-06-15 with bounds
(2020-06-01, 2020-07-01) has 30 days of weight assigned to June, 2020
(e.g., for an annual average calculation). This is the expected
behavior, but data could also span typical temporal boundaries. For
example, a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15)
would have 30 days of weight, but all of it would be assigned to June,
2020, which would be incorrect (16 days of weight should be assigned to
May and 14 days of weight should be assigned to June). This issue could
plausibly arise when using pentad data.
Returns a Dataset with the climatological departures (anomalies) for a
data variable.
In climatology, “anomalies” refer to the difference between the value
during a given time interval (e.g., the January average surface air
temperature) and the long-term average value for that time interval
(e.g., the average surface air temperature over the last 30 Januaries).
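Written as a formula, the definition reads as follows. This is a sketch using notation introduced here, not taken from the original. For month m and year y, let R be the set of reference-period years, x the data values, and w the time-length weights (w = 1 for unweighted averages). Missing values are left out of both sums.

```latex
\bar{x}_{m} = \frac{\sum_{y \in R} w_{y,m}\, x_{y,m}}{\sum_{y \in R} w_{y,m}},
\qquad
x'_{y,m} = x_{y,m} - \bar{x}_{m}
```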
Time bounds are used for generating weights to calculate weighted
climatology (refer to the weighted parameter documentation below).
Deprecated since version v0.8.0: The season_config dictionary argument "drop_incomplete_djf"
is being deprecated. Please use "drop_incomplete_seasons" instead.
data_var (str) – The key of the data variable for calculating departures.
freq (Frequency) – The frequency of time to group by.
“season”: groups by season for the seasonal cycle departures.
“month”: groups by month for the annual cycle departures.
“day”: groups by (month, day) for the daily cycle departures.
If the CF calendar type is "gregorian",
"proleptic_gregorian", or "standard", leap days (if
present) are dropped to avoid inconsistencies when calculating
climatologies. Refer to [2] for more details on this
implementation decision.
weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for
each coordinate point using the difference of its upper and lower
bounds. The time lengths are grouped, then each time length is
divided by the total sum of the time lengths to get the weight of
each coordinate point.
The weight of masked (missing) data is excluded when averages are
taken. This is the same as giving them a weight of 0.
Note that weights are assigned by the labeled time point. If the
dataset includes timepoints that span across typical boundaries
(e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020
and end in June 2020), the weights will not be assigned properly.
See explanation in the Notes section below.
keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the
final dataset output, by default False.
reference_period (Optional[Tuple[str, str]], optional) – The climatological reference period, which is a subset of the entire
time series and used for calculating departures. This parameter
accepts a tuple of strings in the format ‘yyyy-mm-dd’. For example,
('1850-01-01','1899-12-31'). If no value is provided, the
climatological reference period will be the full period covered by
the dataset.
season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency configurations. If configs for
predefined seasons are passed, configs for custom seasons are
ignored and vice versa.
General configs:
“drop_incomplete_seasons” (bool, by default False)
Seasons are considered incomplete if they do not have all of
the required months to form the season. This argument supersedes
“drop_incomplete_djf”. For example, suppose we have the time
coordinates [“2000-11-16”, “2000-12-16”, “2001-01-16”, “2001-02-16”]
and want to group seasons by “ND” (“Nov”, “Dec”) and “JFM” (“Jan”,
“Feb”, “Mar”):
[“2000-11-16”, “2000-12-16”] is considered a complete “ND”
season since both “Nov” and “Dec” are present.
[“2001-01-16”, “2001-02-16”] is considered an incomplete “JFM”
season because it only has “Jan” and “Feb”. Therefore, these
time coordinates are dropped.
“drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps
(False) time coordinates that fall under incomplete DJF seasons.
Incomplete DJF seasons include the start year Jan/Feb and the
end year Dec. This argument is superseded by
“drop_incomplete_seasons” and will be deprecated in a future release.
Configs for predefined seasons:
“dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December.
“DJF”: season includes the previous year December.
“JFD”: season includes the same year December.
Xarray labels the season with December as “DJF”, but it is
actually “JFD”.
Configs for custom seasons:
“custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist
representing a custom season.
Month strings must be in the three-letter format (e.g., ‘Jan’).
Order of the months in each custom season does not matter.
Custom seasons can vary in length.
>>> # Example of custom seasons in a three-month format:
>>> custom_seasons = [
...     ["Jan", "Feb", "Mar"],  # "JanFebMar"
...     ["Apr", "May", "Jun"],  # "AprMayJun"
...     ["Jul", "Aug", "Sep"],  # "JulAugSep"
...     ["Oct", "Nov", "Dec"],  # "OctNovDec"
... ]
xr.Dataset – The Dataset containing the departures from a data variable’s climatology.
When using weighted averages, the weights are assigned based on the
timepoint value. For example, a time point of 2020-06-15 with bounds
(2020-06-01, 2020-07-01) has 30 days of weight assigned to June, 2020
(e.g., for an annual average calculation). This is the expected
behavior, but data could also span typical temporal boundaries. For
example, a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15)
would have 30 days of weight, but all of it would be assigned to June,
2020, which would be incorrect (16 days of weight should be assigned to
May and 14 days of weight should be assigned to June). This issue could
plausibly arise when using pentad data.
This method uses xarray’s grouped arithmetic as a shortcut for mapping
over all unique labels. Grouped arithmetic works by assigning a grouping
label to each time coordinate of the observation data based on the
averaging mode and frequency. Afterwards, the corresponding climatology
is removed from the observation data at each time coordinate based on
the matching labels.
Refer to [3] to learn more about how xarray’s grouped arithmetic works.
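The grouped arithmetic described above can be sketched without xarray. Each time point is labeled (here by month), values are averaged per label, and the matching group mean is subtracted while the original time points are kept. `monthly_departures` is hypothetical and uses unweighted means for brevity.

```python
from collections import defaultdict
from datetime import date


def monthly_departures(times, values):
    """Sketch of grouped arithmetic: subtract each month's climatology from its values."""
    # 1. Label each time point with its group (here: the month).
    labels = [t.month for t in times]
    # 2. Climatology per label (unweighted mean).
    sums, counts = defaultdict(float), defaultdict(int)
    for label, v in zip(labels, values):
        sums[label] += v
        counts[label] += 1
    climo = {label: sums[label] / counts[label] for label in sums}
    # 3. Subtract the matching climatology; original times (and years) are kept.
    return [(t, v - climo[label]) for t, label, v in zip(times, labels, values)]


times = [date(2000, 1, 16), date(2000, 2, 15), date(2001, 1, 16), date(2001, 2, 15)]
values = [1.0, 10.0, 3.0, 14.0]
print(monthly_departures(times, values))
```

In xarray itself, the same pattern is commonly written as `ds.groupby("time.month") - climatology`, where `climatology` is indexed by `month`.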
Validates method arguments and sets them as object attributes.
mode (Mode) – The mode for temporal averaging.
freq (Frequency) – The frequency of time to group by.
weighted (bool) – Calculate averages using weights.
season_config (Optional[SeasonConfigInput]) – A dictionary for “season” frequency configurations. If configs for
predefined seasons are passed, configs for custom seasons are
ignored and vice versa, by default DEFAULT_SEASON_CONFIG.
KeyError – If the Dataset does not have a time dimension.
ValueError – If an incorrect freq arg was passed.
ValueError – If an incorrect dec_mode arg was passed.
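Below is a hypothetical validator in the spirit of the ValueErrors listed above. The allowed frequencies per method come from the freq documentation of group_average(), climatology() and departures(). The mode strings used as dictionary keys are assumptions. So is defaulting dec_mode to "DJF" when season_config omits it.

```python
# Allowed frequencies per mode (from the method docs); the mode names are assumptions.
FREQS_BY_MODE = {
    "group_average": {"year", "season", "month", "day", "hour"},
    "climatology": {"season", "month", "day"},
    "departures": {"season", "month", "day"},
}
DEC_MODES = {"DJF", "JFD"}


def validate_args(mode, freq, season_config=None):
    """Hypothetical validator: raise ValueError for an unsupported freq or dec_mode."""
    allowed = FREQS_BY_MODE[mode]
    if freq not in allowed:
        raise ValueError(
            f"Incorrect freq {freq!r} for mode {mode!r}; expected one of {sorted(allowed)}."
        )
    # Assumption: "DJF" is used when season_config does not set dec_mode.
    dec_mode = (season_config or {}).get("dec_mode", "DJF")
    if dec_mode not in DEC_MODES:
        raise ValueError(f"Incorrect dec_mode {dec_mode!r}; expected 'DJF' or 'JFD'.")


validate_args("climatology", "month")  # valid: returns None
try:
    validate_args("climatology", "hour")
except ValueError as err:
    print(err)
```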
Shifts the year for custom seasons spanning the calendar year.
A season spans the calendar year if it includes “Jan” and “Jan” is not
the first month. For example, for the “NDJFM” season:
[“Nov”, “Dec”] are from the previous year.
[“Jan”, “Feb”, “Mar”] are from the current year.
Therefore, [“Nov”, “Dec”] need to be shifted a year forward for correct
grouping.
ds (xr.Dataset) – The Dataset with time coordinates.
xr.Dataset – The Dataset with shifted time coordinates.
Before and after shifting months for “NDJFM” seasons:
>>> # Before shifting months
>>> [(2000, "NDJFM", 11), (2000, "NDJFM", 12), (2001, "NDJFM", 1),
...  (2001, "NDJFM", 2), (2001, "NDJFM", 3)]
>>> # After shifting months
>>> [(2001, "NDJFM", 11), (2001, "NDJFM", 12), (2001, "NDJFM", 1),
...  (2001, "NDJFM", 2), (2001, "NDJFM", 3)]
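Here is a sketch of the year shift, operating on (year, season, month) tuples like those in the example. `shift_custom_season_years` is hypothetical. It applies the rule stated above (the season contains "Jan" and "Jan" is not its first month) and moves every month listed before "Jan" forward one year.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def shift_custom_season_years(rows, custom_seasons):
    """Shift months listed before "Jan" in a year-spanning season forward one year.

    rows: (year, season_label, month_number) tuples.
    custom_seasons: mapping from season label to its ordered month strings.
    """
    shifted = []
    for year, season, month in rows:
        months = custom_seasons[season]
        # A season spans the calendar year if it contains "Jan" but does not start with it.
        spans_year = "Jan" in months and months[0] != "Jan"
        if spans_year and months.index(MONTHS[month - 1]) < months.index("Jan"):
            year += 1
        shifted.append((year, season, month))
    return shifted


seasons = {"NDJFM": ["Nov", "Dec", "Jan", "Feb", "Mar"]}
before = [(2000, "NDJFM", 11), (2000, "NDJFM", 12), (2001, "NDJFM", 1),
          (2001, "NDJFM", 2), (2001, "NDJFM", 3)]
print(shift_custom_season_years(before, seasons))
```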
Shifts Decembers to the next year for “DJF” seasons.
This ensures correct grouping for “DJF” seasons by shifting Decembers
to the next year. Without this, grouping defaults to “JFD”, which
is the native Xarray behavior.
ds (xr.Dataset) – The Dataset with time coordinates.
xr.Dataset – The Dataset with shifted time coordinates.
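The effect of shifting Decembers can be sketched with a hypothetical `season_group` helper. It returns the (year, season) group for a time point and moves December to the next year only in "DJF" mode.

```python
SEASONS = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def season_group(year, month, dec_mode="DJF"):
    """Hypothetical helper: (year, season) group for a predefined-season time point."""
    # In "DJF" mode, December joins the following year's January and February.
    if dec_mode == "DJF" and month == 12:
        year += 1
    return (year, SEASONS[month])


points = [(2000, 12), (2001, 1), (2001, 2)]
print([season_group(y, m) for y, m in points])                  # all (2001, 'DJF')
print([season_group(y, m, dec_mode="JFD") for y, m in points])  # December stays in 2000
```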
Drops incomplete DJF seasons within a continuous time series.
This method assumes that the time series is continuous and removes the
leading and trailing incomplete seasons (e.g., the first January and
February of a time series that are not complete, because the December of
the previous year is missing). This method does not account for or
remove missing time steps anywhere else.
dataset (xr.Dataset) – The dataset with some possibly incomplete DJF seasons.
xr.Dataset – The dataset with only complete DJF seasons.
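The sketch below shows this trimming on a sorted, continuous monthly series of (year, month) tuples. `drop_incomplete_djf` here is a hypothetical stand-in. Like the method described, it only looks at the first and last years, so gaps in the middle of the series are not detected.

```python
def drop_incomplete_djf(times):
    """Drop leading Jan/Feb and trailing Dec from a sorted, continuous monthly series.

    times: list of (year, month) tuples.
    """
    first_year, last_year = times[0][0], times[-1][0]
    return [
        (year, month)
        for year, month in times
        # Jan/Feb of the first year are missing the previous December.
        if not (year == first_year and month in (1, 2))
        # December of the last year is missing the following Jan/Feb.
        and not (year == last_year and month == 12)
    ]


series = [(year, month) for year in (2000, 2001) for month in range(1, 13)]
kept = drop_incomplete_djf(series)
print(len(series), len(kept))  # 24 21
print(kept[0], kept[-1])       # (2000, 3) (2001, 11)
```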
Drops incomplete seasons within a continuous time series.
Seasons are considered incomplete if they do not have all of the
required months to form the season. For example, suppose we have the time
coordinates [“2000-11-16”, “2000-12-16”, “2001-01-16”, “2001-02-16”]
and want to group seasons by “ND” (“Nov”, “Dec”) and “JFM” (“Jan”,
“Feb”, “Mar”):
[“2000-11-16”, “2000-12-16”] is considered a complete “ND” season
since both “Nov” and “Dec” are present.
[“2001-01-16”, “2001-02-16”] is considered an incomplete “JFM”
season because it only has “Jan” and “Feb”. Therefore, these
time coordinates are dropped.
df (pd.DataFrame) – A DataFrame of seasonal datetime components with potentially
incomplete seasons.
pd.DataFrame – A DataFrame of seasonal datetime components with only complete seasons.
TODO: Refactor this method to use pure Xarray/NumPy operations, rather
than Pandas.
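Here is a sketch of the completeness check using plain tuples rather than a DataFrame. It assumes years have already been aligned so that every month of one season occurrence shares a (year, season) key. `drop_incomplete_seasons` and the `season_months` mapping are hypothetical.

```python
from collections import defaultdict


def drop_incomplete_seasons(rows, season_months):
    """Keep only rows whose (year, season) group contains every required month.

    rows: (year, season_label, month_number) tuples.
    season_months: mapping from season label to its required month numbers.
    """
    present = defaultdict(set)
    for year, season, month in rows:
        present[(year, season)].add(month)
    return [
        (year, season, month)
        for year, season, month in rows
        if present[(year, season)] == set(season_months[season])
    ]


season_months = {"ND": [11, 12], "JFM": [1, 2, 3]}
rows = [(2000, "ND", 11), (2000, "ND", 12), (2001, "JFM", 1), (2001, "JFM", 2)]
print(drop_incomplete_seasons(rows, season_months))
# [(2000, 'ND', 11), (2000, 'ND', 12)]
```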
This method is used to drop February 29 from leap years (if present) before
calculating climatology/departures for high-frequency time series data,
to avoid cftime errors (ValueError: invalid day number provided
in cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0,
Calculates weights for a data variable using time bounds.
This method gets the length of time for each coordinate point by using
the difference in the upper and lower time bounds. This approach ensures
that the correct time lengths are calculated regardless of how time
coordinates are recorded (e.g., monthly, daily, hourly) and the calendar
type used.
The time lengths are labeled and grouped, then each time length is
divided by the total sum of the time lengths in its group to get its
corresponding weight.
time_bounds (xr.DataArray) – The time bounds.
xr.DataArray – The weights based on a specified frequency.
Refer to [4] for the supported CF convention calendar types.
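Below is a minimal sketch of the grouped weighting. It uses `datetime` bounds and a precomputed group label per time point; both are assumptions of this example, as is the name `group_weights`. Each weight is the point's time length divided by the summed length of its group, so the weights within a group add up to 1.

```python
from collections import defaultdict
from datetime import datetime


def group_weights(time_bounds, labels):
    """Weight = time length of a point / total time length of its group.

    time_bounds: (lower, upper) datetime pairs; labels: one group label per point.
    """
    lengths = [(upper - lower).total_seconds() for lower, upper in time_bounds]
    totals = defaultdict(float)
    for label, length in zip(labels, lengths):
        totals[label] += length
    return [length / totals[label] for label, length in zip(labels, lengths)]


# January (31 days) and February (29 days) of 2000, grouped into one "year" group.
bounds = [(datetime(2000, 1, 1), datetime(2000, 2, 1)),
          (datetime(2000, 2, 1), datetime(2000, 3, 1))]
weights = group_weights(bounds, labels=[2000, 2000])
print(weights)  # roughly [0.517, 0.483], i.e. 31/60 and 29/60
```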
Labels time coordinates with a group for grouping.
This method labels time coordinates for grouping by first extracting
specific xarray datetime components from time coordinates and storing
them in a pandas DataFrame. After processing (if necessary) is performed
on the DataFrame, it is converted to a numpy array of datetime objects.
This numpy array serves as the data source for the final DataArray of
labeled time coordinates.
time_coords (xr.DataArray) – The time coordinates.
xr.DataArray – The DataArray of labeled time coordinates for grouping.
Returns a DataFrame of xarray datetime components.
This method extracts the applicable xarray datetime components from each
time coordinate based on the averaging mode and frequency, and stores
them in a DataFrame.
Additional processing is performed for the seasonal frequency:
If custom seasons are used, map them to each time coordinate based
on the middle month of the custom season.
If the season with December is “DJF”, shift Decembers over to the next
year so DJF seasons are correctly grouped using the previous year's
December.
Drop obsolete columns after processing is done.
time_coords (xr.DataArray) – The time coordinates.
drop_obsolete_cols (bool) – Drop obsolete columns after processing seasonal DataFrame when
self._freq="season". Set to False to keep datetime columns
needed for preprocessing the dataset (e.g., removing incomplete
seasons), and set to True to remove obsolete columns when needing
to group time coordinates.
pd.DataFrame – A DataFrame of datetime components.
Refer to [5] for information on xarray datetime accessor components.
Maps the month column in the DataFrame to a custom season.
This method maps each integer value in the “month” column to its string
representation, which then maps to a custom season that is stored in the
“season” column. For example, the month of 1 maps to “Jan” and “Jan”
maps to the “JanFebMar” custom season.
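Here is a sketch of the month-to-custom-season mapping, using a plain list instead of a DataFrame column. `map_months_to_custom_seasons` is a hypothetical name.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def map_months_to_custom_seasons(month_numbers, custom_seasons):
    """Map integer months (1-12) to the label of the custom season containing them."""
    # Month string -> season label, e.g. "Jan" -> "JanFebMar".
    month_to_season = {
        month: "".join(months) for months in custom_seasons for month in months
    }
    # Integer -> month string -> season label, e.g. 1 -> "Jan" -> "JanFebMar".
    return [month_to_season[MONTHS[number - 1]] for number in month_numbers]


custom_seasons = [["Jan", "Feb", "Mar"], ["Apr", "May", "Jun"],
                  ["Jul", "Aug", "Sep"], ["Oct", "Nov", "Dec"]]
print(map_months_to_custom_seasons([1, 5, 12], custom_seasons))
# ['JanFebMar', 'AprMayJun', 'OctNovDec']
```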
df (pd.DataFrame) – The DataFrame of xarray datetime components.
pd.DataFrame – The DataFrame of xarray datetime coordinates, with each row mapped
to a custom season.
Maps the season column values to the integer of its middle month.
DateTime objects don’t support storing seasons as strings, so the middle
months are used to represent the season. For example, for the season
“DJF”, the middle month “J” is mapped to the integer value 1.
The middle month of a custom season is extracted using the ceiling of
the middle index from its list of months. For example, for the custom
season “FebMarAprMay” with the list of months [“Feb”, “Mar”, “Apr”,
“May”], the zero-based middle index is 1.5, whose ceiling, 2, selects the month “Apr”. “Apr” is then mapped
to the integer value 4.
After mapping the season to its month, the “season” column is renamed to “month”.
df (pd.DataFrame) – The dataframe of datetime components, including a “season” column.
pd.DataFrame – The dataframe of datetime components, including a “month” column.
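The sketch below implements the middle-month rule. It reads "the middle index" as (n - 1) / 2 over zero-based positions and takes its ceiling. That reproduces both examples given in the text (“DJF” to 1, “FebMarAprMay” to 4). `middle_month` is hypothetical.

```python
import math

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def middle_month(months):
    """Return the integer (1-12) of a season's middle month."""
    # Ceiling of the zero-based middle index: 3 months -> 1, 4 months -> ceil(1.5) = 2.
    middle_index = math.ceil((len(months) - 1) / 2)
    return MONTHS.index(months[middle_index]) + 1


print(middle_month(["Dec", "Jan", "Feb"]))         # 1
print(middle_month(["Feb", "Mar", "Apr", "May"]))  # 4
```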
Drops obsolete columns from the DataFrame of xarray datetime components.
For the “season” frequency, processing is required on the DataFrame of
xarray datetime components, such as mapping custom seasons based on the
month. Additional datetime component values must be included as
DataFrame columns, which become obsolete after processing is done. The
obsolete columns are dropped from the DataFrame before grouping
time coordinates.
df_season (pd.DataFrame) – The DataFrame of time coordinates for the “season” frequency with
obsolete columns.
pd.DataFrame – The DataFrame of time coordinates for the “season” frequency with
obsolete columns dropped.
Converts a DataFrame of datetime components to cftime datetime objects.
datetime objects require at least a year, month, and day value. However,
some modes and time frequencies don’t require year, month, and/or day
for grouping. For these cases, use default values of 1 in order to
meet this datetime requirement.
df (pd.DataFrame) – The DataFrame of xarray datetime components.
np.ndarray – A numpy ndarray of cftime.datetime objects.
Refer to [6] and [7] for more information on Timestamp-valid range.
We use cftime.datetime objects to avoid these time range issues.
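Here is a sketch of the default-filling step. Python's `datetime` stands in for `cftime.datetime`; it also covers year 1, which falls outside the default nanosecond-resolution pandas Timestamp range. `components_to_datetimes` is hypothetical, and defaulting a missing hour to 0 is an assumption.

```python
from datetime import datetime


def components_to_datetimes(rows):
    """Build datetimes from dicts of components, defaulting year/month/day to 1."""
    return [
        datetime(
            row.get("year", 1),
            row.get("month", 1),
            row.get("day", 1),
            row.get("hour", 0),  # assumption: a missing hour defaults to 0
        )
        for row in rows
    ]


# Monthly climatology groups carry only a month, so year and day fall back to 1.
labels = components_to_datetimes([{"month": 1}, {"month": 2}])
print([label.isoformat() for label in labels])
# ['0001-01-01T00:00:00', '0001-02-01T00:00:00']
```

The same construction fails for February 29. `datetime(1, 2, 29)` raises ValueError because year 1 is not a leap year, which mirrors the cftime error that leap days are dropped to avoid.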
Adds attributes to the data variable describing the operation.
These attributes distinguish a data variable that has been operated on
from its original state. The attributes in netCDF4 files do not support
booleans or nested dictionaries, so booleans are converted to strings
and nested dictionaries are unpacked.
data_var (xr.DataArray) – The data variable.
xr.DataArray – The data variable with temporal averaging attributes.
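The sketch below makes the attributes netCDF4-friendly as described: booleans become strings and nested dictionaries are merged into the top level. The attribute names and values in the example, the recursive merge, and the name `flatten_attrs` are all assumptions for illustration.

```python
def flatten_attrs(attrs):
    """Stringify booleans and merge nested dictionaries into top-level keys."""
    flat = {}
    for key, value in attrs.items():
        if isinstance(value, dict):
            # Nested keys are promoted to the top level (later keys win on collision).
            flat.update(flatten_attrs(value))
        elif isinstance(value, bool):
            flat[key] = str(value)
        else:
            flat[key] = value
    return flat


attrs = {
    "operation": "temporal_avg",
    "mode": "climatology",
    "freq": "season",
    "weighted": True,
    "season_config": {"dec_mode": "DJF", "drop_incomplete_seasons": False},
}
print(flatten_attrs(attrs))
```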
Label the observational data variable’s time coordinates by their
appropriate time group. For example, the first two time
coordinates 2000-01-01 and 2000-02-01 are replaced with the
“01-01-01” and “01-02-01” monthly groups.
Calculate departures by subtracting the climatology from the
labeled observational data using Xarray’s grouped arithmetic with
automatic broadcasting (departures = obs - climo).
Restore the original time coordinates to the departures variable
to preserve the “year” of the time coordinates. For example,
the first two time coordinates 01-01-01 and 01-02-01 are reverted
back to 2000-01-01 and 2000-02-01.
ds_obs (xr.Dataset) – The observational dataset.
dv_climo (xr.Dataset) – The climatology dataset.
data_var (str) – The key of the data variable for calculating departures.
xr.Dataset – The dataset containing the departures for a data variable.