An accessor class that provides temporal attributes and methods on xarray
Datasets through the .temporal attribute.
This accessor class requires the dataset’s time coordinates to be decoded as
np.datetime64 or cftime.datetime objects. The dataset must also
have time bounds to generate weights for weighted calculations and to infer
the grouping time frequency in average() (single-snapshot average).
Returns a Dataset with the average of a data variable and the time
dimension removed.
This method infers the time grouping frequency by checking the distance
between a set of upper and lower time bounds. This method is
particularly useful for calculating the weighted averages of monthly or
yearly time series data because the number of days per month/year can
vary based on the calendar type, which can affect weighting. For other
frequencies, the distribution of weights will be equal, so
weighted=True is the same as weighted=False.
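The frequency inference can be sketched in plain Python, with `datetime` objects standing in for decoded time bounds. The function name `infer_freq` and its 1-, 21-, and 300-day cutoffs are assumptions for illustration, not values taken from the library. Only the idea of measuring upper minus lower bounds comes from the text above.

```python
from datetime import datetime
from typing import List, Tuple


def infer_freq(bounds: List[Tuple[datetime, datetime]]) -> str:
    """Guess a grouping frequency from the narrowest (lower, upper) bound pair.

    Hypothetical helper: the day cutoffs below are illustrative assumptions.
    """
    # Width of each interval in fractional days; the narrowest one decides.
    min_days = min((upper - lower).total_seconds() / 86400 for lower, upper in bounds)
    if min_days < 1:
        return "hour"
    if min_days < 21:
        return "day"
    if min_days < 300:
        return "month"
    return "year"


monthly_bounds = [
    (datetime(2020, 1, 1), datetime(2020, 2, 1)),
    (datetime(2020, 2, 1), datetime(2020, 3, 1)),
]
print(infer_freq(monthly_bounds))  # month
```

Using the narrowest interval means a single short month such as February still resolves to "month" rather than being misread as a coarser frequency.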
Time bounds are used for inferring the time series frequency and for
generating weights (refer to the weighted parameter documentation
below).
Parameters:
data_var (str) – The key of the data variable for calculating averages.
weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for
each coordinate point using the difference of its upper and lower
bounds. The time lengths are grouped, then each time length is
divided by the total sum of the time lengths to get the weight of
each coordinate point.
The weight of masked (missing) data is excluded when averages are
taken. This is the same as giving them a weight of 0.
Note that weights are assigned by the labeled time point. If the
dataset includes timepoints that span across typical boundaries
(e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020
and end in June 2020), the weights will not be assigned properly.
See explanation in the Notes section below.
keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the
final dataset output, by default False.
Returns:
xr.Dataset – Dataset with the average of the data variable and the time dimension
removed.
Notes
When using weighted averages, the weights are assigned based on the
time point value. For example, a time point of 2020-06-15 with bounds
(2020-06-01, 2020-07-01) has 30 days of weight assigned to June 2020
(e.g., for an annual average calculation). This is the expected
behavior, but data can span typical temporal boundaries. For example,
a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15) has
30 days of weight, all of which would be assigned to June 2020. This
is incorrect: 16 days of weight should be assigned to May and 14 days
of weight should be assigned to June. This issue could plausibly arise
when using pentad data.
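The boundary problem above can be reproduced with plain `datetime.date` arithmetic. The helper `days_per_month` is hypothetical and is not part of the accessor. It counts how many days of a bounded interval fall in each calendar month, which is what an interval-aware assignment would use, and compares that with assigning the full bound length to the labeled time point.

```python
from collections import Counter
from datetime import date, timedelta


def days_per_month(lower: date, upper: date) -> Counter:
    """Count the days of the half-open interval [lower, upper) in each (year, month).

    Hypothetical helper for illustration only.
    """
    counts: Counter = Counter()
    day = lower
    while day < upper:
        counts[(day.year, day.month)] += 1
        day += timedelta(days=1)
    return counts


time_point = date(2020, 6, 1)
lower, upper = date(2020, 5, 16), date(2020, 6, 15)

# Labeled-time-point assignment: the whole bound length goes to the label's month.
labeled = {(time_point.year, time_point.month): (upper - lower).days}
# Interval-aware assignment: each day counts toward the month it falls in.
split = dict(days_per_month(lower, upper))

print(labeled)  # {(2020, 6): 30}
print(split)  # {(2020, 5): 16, (2020, 6): 14}
```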
Examples
Get weighted averages for a monthly time series data variable:
Returns a Dataset with the average of a data variable by time group.
Data is grouped into the labeled time point for the averaging operation.
Time bounds are used for generating weights to calculate weighted group
averages (refer to the weighted parameter documentation below).
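As a rough sketch of the labeling step, each time point can be reduced to a tuple key made of the datetime components its freq uses, and time points sharing a key form one group. `group_key` and both lookup tables are assumptions for illustration. No December shift is applied here, so the December of year N falls in the season labeled DJF of year N. This is the xarray behavior that the season_config documentation describes as actually being "JFD".

```python
from datetime import datetime
from typing import Tuple

# Hypothetical table: datetime components that form the group key for each freq.
GROUP_COMPONENTS = {
    "year": ("year",),
    "season": ("year", "season"),
    "month": ("year", "month"),
    "day": ("year", "month", "day"),
    "hour": ("year", "month", "day", "hour"),
}

# Xarray-style season labels; December stays in its own year (no shift applied).
MONTH_TO_SEASON = {
    12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON",
}


def group_key(t: datetime, freq: str) -> Tuple:
    """Return the group key of time point ``t`` for a group_average-style ``freq``."""
    values = {
        "year": t.year,
        "season": MONTH_TO_SEASON[t.month],
        "month": t.month,
        "day": t.day,
        "hour": t.hour,
    }
    return tuple(values[name] for name in GROUP_COMPONENTS[freq])


t = datetime(2020, 7, 15, 12)
print(group_key(t, "season"))  # (2020, 'JJA')
print(group_key(t, "day"))  # (2020, 7, 15)
```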
Parameters:
data_var (str) – The key of the data variable for calculating time series averages.
freq (Frequency) – The time frequency to group by.
“year”: groups by year for yearly averages.
“season”: groups by (year, season) for seasonal averages.
“month”: groups by (year, month) for monthly averages.
“day”: groups by (year, month, day) for daily averages.
“hour”: groups by (year, month, day, hour) for hourly averages.
weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for
each coordinate point using the difference of its upper and lower
bounds. The time lengths are grouped, then each time length is
divided by the total sum of the time lengths to get the weight of
each coordinate point.
The weight of masked (missing) data is excluded when averages are
calculated. This is the same as giving them a weight of 0.
Note that weights are assigned by the labeled time point. If the
dataset includes timepoints that span across typical boundaries
(e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020
and end in June 2020), the weights will not be assigned properly.
See explanation in the Notes section below.
keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the
final dataset output, by default False.
season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency configurations. If configs for
predefined seasons are passed, configs for custom seasons are
ignored and vice versa.
Configs for predefined seasons:
“dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December.
“DJF”: season includes the previous year December.
“JFD”: season includes the same year December.
Xarray labels the season with December as “DJF”, but it is
actually “JFD”.
“drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps
(False) time coordinates that fall under incomplete DJF seasons.
Incomplete DJF seasons include the start year Jan/Feb and the
end year Dec.
Configs for custom seasons:
“custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist
representing a custom season.
Month strings must be in the three-letter format (e.g., ‘Jan’).
Each month must be included in exactly one custom season.
Order of the months in each custom season does not matter.
Custom seasons can vary in length.
>>> # Example of custom seasons in a three month format:
>>> custom_seasons = [
...     ["Jan", "Feb", "Mar"],  # "JanFebMar"
...     ["Apr", "May", "Jun"],  # "AprMayJun"
...     ["Jul", "Aug", "Sep"],  # "JulAugSep"
...     ["Oct", "Nov", "Dec"],  # "OctNovDec"
... ]
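The custom-season rules can be checked before any grouping happens. This is a sketch, assuming "each month must be included once" means every one of the twelve months appears in exactly one custom season. `validate_custom_seasons` is a hypothetical helper, not the accessor's own validation code.

```python
from typing import List

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def validate_custom_seasons(custom_seasons: List[List[str]]) -> List[str]:
    """Check the custom-season rules and return the season names (hypothetical helper)."""
    flat = [month for season in custom_seasons for month in season]
    # Month strings must use the three-letter format.
    invalid = [month for month in flat if month not in MONTHS]
    if invalid:
        raise ValueError(f"Invalid month strings: {invalid}")
    # Assumption: every month appears in exactly one custom season (no gaps, no repeats).
    if sorted(flat, key=MONTHS.index) != MONTHS:
        raise ValueError("Each month must be included in exactly one custom season.")
    # Seasons may differ in length; names concatenate the months in the given order.
    return ["".join(season) for season in custom_seasons]


print(validate_custom_seasons([["Jan", "Feb", "Mar"], ["Apr", "May", "Jun"],
                               ["Jul", "Aug", "Sep"], ["Oct", "Nov", "Dec"]]))
# ['JanFebMar', 'AprMayJun', 'JulAugSep', 'OctNovDec']
```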
Returns:
xr.Dataset – Dataset with the average of a data variable by time group.
Notes
When using weighted averages, the weights are assigned based on the
time point value. For example, a time point of 2020-06-15 with bounds
(2020-06-01, 2020-07-01) has 30 days of weight assigned to June 2020
(e.g., for an annual average calculation). This is the expected
behavior, but data can span typical temporal boundaries. For example,
a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15) has
30 days of weight, all of which would be assigned to June 2020. This
is incorrect: 16 days of weight should be assigned to May and 14 days
of weight should be assigned to June. This issue could plausibly arise
when using pentad data.
Returns a Dataset with the climatology of a data variable.
Data is grouped into the labeled time point for the averaging operation.
Time bounds are used for generating weights to calculate weighted
climatology (refer to the weighted parameter documentation below).
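A month-frequency climatology reduces to a weighted mean per calendar month, optionally restricted to the reference period. The sketch below uses plain lists in place of a Dataset. The function `monthly_climatology` and its signature are assumptions; only the ('yyyy-mm-dd', 'yyyy-mm-dd') tuple format follows the reference_period description.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional, Tuple


def monthly_climatology(
    times: List[date],
    values: List[float],
    weights: List[float],
    reference_period: Optional[Tuple[str, str]] = None,
) -> Dict[int, float]:
    """Weighted annual-cycle climatology keyed by month number (hypothetical helper)."""
    if reference_period is not None:
        start, end = (date.fromisoformat(s) for s in reference_period)
    sums: Dict[int, float] = defaultdict(float)
    totals: Dict[int, float] = defaultdict(float)
    for t, v, w in zip(times, values, weights):
        # Skip time points outside the climatological reference period, if one is set.
        if reference_period is not None and not (start <= t <= end):
            continue
        # Group by month only, so every January shares one label.
        sums[t.month] += v * w
        totals[t.month] += w
    return {month: sums[month] / totals[month] for month in sorted(sums)}


times = [date(2000, 1, 15), date(2001, 1, 15), date(2002, 1, 15)]
values = [1.0, 2.0, 3.0]
weights = [31.0, 31.0, 31.0]  # January bound lengths in days
print(monthly_climatology(times, values, weights))  # {1: 2.0}
print(monthly_climatology(times, values, weights, ("2000-01-01", "2001-12-31")))  # {1: 1.5}
```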
Parameters:
data_var (str) – The key of the data variable for calculating climatology.
freq (Frequency) – The time frequency to group by.
“season”: groups by season for the seasonal cycle climatology.
“month”: groups by month for the annual cycle climatology.
“day”: groups by (month, day) for the daily cycle climatology.
If the CF calendar type is "gregorian",
"proleptic_gregorian", or "standard", leap days (if
present) are dropped to avoid inconsistencies when calculating
climatologies. Refer to [1] for more details on this
implementation decision.
weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for
each coordinate point using the difference of its upper and lower
bounds. The time lengths are grouped, then each time length is
divided by the total sum of the time lengths to get the weight of
each coordinate point.
The weight of masked (missing) data is excluded when averages are
taken. This is the same as giving them a weight of 0.
Note that weights are assigned by the labeled time point. If the
dataset includes timepoints that span across typical boundaries
(e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020
and end in June 2020), the weights will not be assigned properly.
See explanation in the Notes section below.
keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the
final dataset output, by default False.
reference_period (Optional[Tuple[str, str]], optional) – The climatological reference period, which is a subset of the entire
time series. This parameter accepts a tuple of strings in the format
‘yyyy-mm-dd’. For example, ('1850-01-01','1899-12-31'). If no
value is provided, the climatological reference period will be the
full period covered by the dataset.
season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency configurations. If configs for
predefined seasons are passed, configs for custom seasons are
ignored and vice versa.
Configs for predefined seasons:
“dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December.
“DJF”: season includes the previous year December.
“JFD”: season includes the same year December.
Xarray labels the season with December as “DJF”, but it is
actually “JFD”.
“drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps
(False) time coordinates that fall under incomplete DJF seasons.
Incomplete DJF seasons include the start year Jan/Feb and the
end year Dec.
Configs for custom seasons:
“custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist
representing a custom season.
Month strings must be in the three-letter format (e.g., ‘Jan’).
Each month must be included in exactly one custom season.
Order of the months in each custom season does not matter.
Custom seasons can vary in length.
>>> # Example of custom seasons in a three month format:
>>> custom_seasons = [
...     ["Jan", "Feb", "Mar"],  # "JanFebMar"
...     ["Apr", "May", "Jun"],  # "AprMayJun"
...     ["Jul", "Aug", "Sep"],  # "JulAugSep"
...     ["Oct", "Nov", "Dec"],  # "OctNovDec"
... ]
Returns:
xr.Dataset – Dataset with the climatology of a data variable.
References
Notes
When using weighted averages, the weights are assigned based on the
time point value. For example, a time point of 2020-06-15 with bounds
(2020-06-01, 2020-07-01) has 30 days of weight assigned to June 2020
(e.g., for an annual average calculation). This is the expected
behavior, but data can span typical temporal boundaries. For example,
a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15) has
30 days of weight, all of which would be assigned to June 2020. This
is incorrect: 16 days of weight should be assigned to May and 14 days
of weight should be assigned to June. This issue could plausibly arise
when using pentad data.
Returns a Dataset with the climatological departures (anomalies) for a
data variable.
In climatology, “anomalies” refer to the difference between the value
during a given time interval (e.g., the January average surface air
temperature) and the long-term average value for that time interval
(e.g., the average surface temperature over the last 30 Januaries).
Time bounds are used for generating weights to calculate weighted
climatology (refer to the weighted parameter documentation below).
Parameters:
data_var (str) – The key of the data variable for calculating departures.
freq (Frequency) – The frequency of time to group by.
“season”: groups by season for the seasonal cycle departures.
“month”: groups by month for the annual cycle departures.
“day”: groups by (month, day) for the daily cycle departures.
If the CF calendar type is "gregorian",
"proleptic_gregorian", or "standard", leap days (if
present) are dropped to avoid inconsistencies when calculating
climatologies. Refer to [2] for more details on this
implementation decision.
weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for
each coordinate point using the difference of its upper and lower
bounds. The time lengths are grouped, then each time length is
divided by the total sum of the time lengths to get the weight of
each coordinate point.
The weight of masked (missing) data is excluded when averages are
taken. This is the same as giving them a weight of 0.
Note that weights are assigned by the labeled time point. If the
dataset includes timepoints that span across typical boundaries
(e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020
and end in June 2020), the weights will not be assigned properly.
See explanation in the Notes section below.
keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the
final dataset output, by default False.
reference_period (Optional[Tuple[str, str]], optional) – The climatological reference period, which is a subset of the entire
time series and used for calculating departures. This parameter
accepts a tuple of strings in the format ‘yyyy-mm-dd’. For example,
('1850-01-01','1899-12-31'). If no value is provided, the
climatological reference period will be the full period covered by
the dataset.
season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency configurations. If configs for
predefined seasons are passed, configs for custom seasons are
ignored and vice versa.
Configs for predefined seasons:
“dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December.
“DJF”: season includes the previous year December.
“JFD”: season includes the same year December.
Xarray labels the season with December as “DJF”, but it is
actually “JFD”.
“drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps
(False) time coordinates that fall under incomplete DJF seasons.
Incomplete DJF seasons include the start year Jan/Feb and the
end year Dec.
Configs for custom seasons:
“custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist
representing a custom season.
Month strings must be in the three-letter format (e.g., ‘Jan’).
Each month must be included in exactly one custom season.
Order of the months in each custom season does not matter.
Custom seasons can vary in length.
>>> # Example of custom seasons in a three month format:
>>> custom_seasons = [
...     ["Jan", "Feb", "Mar"],  # "JanFebMar"
...     ["Apr", "May", "Jun"],  # "AprMayJun"
...     ["Jul", "Aug", "Sep"],  # "JulAugSep"
...     ["Oct", "Nov", "Dec"],  # "OctNovDec"
... ]
Returns:
xr.Dataset – The Dataset containing the departures of a data variable from its climatology.
Notes
When using weighted averages, the weights are assigned based on the
time point value. For example, a time point of 2020-06-15 with bounds
(2020-06-01, 2020-07-01) has 30 days of weight assigned to June 2020
(e.g., for an annual average calculation). This is the expected
behavior, but data can span typical temporal boundaries. For example,
a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15) has
30 days of weight, all of which would be assigned to June 2020. This
is incorrect: 16 days of weight should be assigned to May and 14 days
of weight should be assigned to June. This issue could plausibly arise
when using pentad data.
This method uses xarray’s grouped arithmetic as a shortcut for mapping
over all unique labels. Grouped arithmetic works by assigning a grouping
label to each time coordinate of the observation data based on the
averaging mode and frequency. Afterwards, the corresponding climatology
is removed from the observation data at each time coordinate based on
the matching labels.
Refer to [3] to learn more about how xarray’s grouped arithmetic works.
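In xarray, this shortcut is typically written as `da.groupby("time.month") - climatology`, where `climatology` has a `month` dimension. The label-matching idea behind it can be sketched without xarray. `departures` is a hypothetical helper that labels each time point by month (the "month" frequency), and the climatology values are arbitrary example inputs.

```python
from datetime import date
from typing import Dict, List


def departures(times: List[date], values: List[float], climatology: Dict[int, float]) -> List[float]:
    """Subtract from each value the climatology entry whose label matches its month."""
    # Label each time coordinate by month, then look up the matching climatology value.
    return [value - climatology[t.month] for t, value in zip(times, values)]


climatology = {1: 2.0, 7: 20.0}  # arbitrary example climatology keyed by month
times = [date(2001, 1, 15), date(2001, 7, 15), date(2002, 1, 15)]
print(departures(times, [2.5, 19.0, 1.0], climatology))  # [0.5, -1.0, -1.0]
```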
Validates method arguments and sets them as object attributes.
Parameters:
mode (Mode) – The mode for temporal averaging.
freq (Frequency) – The frequency of time to group by.
weighted (bool) – Calculate averages using weights.
season_config (Optional[SeasonConfigInput]) – A dictionary for “season” frequency configurations. If configs for
predefined seasons are passed, configs for custom seasons are
ignored and vice versa, by default DEFAULT_SEASON_CONFIG.
Raises:
KeyError – If the Dataset does not have a time dimension.
ValueError – If an incorrect freq arg was passed.
ValueError – If an incorrect dec_mode arg was passed.
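The checks listed under "Raises" can be sketched as a small validator. The table of allowed frequencies per mode is read off the method documentation above. The function name `validate_args`, the "time" dimension name, and checking dec_mode only for the season frequency are assumptions. The "average" mode is omitted because its frequency is inferred rather than passed.

```python
from typing import Iterable, Optional

# Hypothetical table of allowed freq values per mode, based on the docs above.
FREQS_BY_MODE = {
    "group_average": {"year", "season", "month", "day", "hour"},
    "climatology": {"season", "month", "day"},
    "departures": {"season", "month", "day"},
}
DEC_MODES = {"DJF", "JFD"}


def validate_args(dims: Iterable[str], mode: str, freq: str, dec_mode: Optional[str] = "DJF") -> None:
    """Raise the documented errors for a missing time dimension or invalid arguments."""
    if "time" not in dims:
        raise KeyError("The Dataset does not have a time dimension.")
    if freq not in FREQS_BY_MODE[mode]:
        raise ValueError(f"Incorrect freq {freq!r} for mode {mode!r}.")
    # In this sketch, dec_mode is only checked when seasons are requested.
    if freq == "season" and dec_mode not in DEC_MODES:
        raise ValueError(f"Incorrect dec_mode {dec_mode!r}; expected one of {sorted(DEC_MODES)}.")


validate_args(("time", "lat", "lon"), "climatology", "month")  # passes silently
```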
Drops incomplete DJF seasons within a continuous time series.
This method assumes that the time series is continuous and removes the
leading and trailing incomplete seasons (e.g., the first January and
February of a time series that are not complete, because the December of
the previous year is missing). This method does not account for or
remove missing time steps anywhere else.
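Under the same continuity assumption, dropping the incomplete DJF seasons only touches the two ends of the series. This is a minimal sketch on a sorted list of dates rather than a Dataset, with `drop_incomplete_djf` as a hypothetical name.

```python
from datetime import date
from typing import List


def drop_incomplete_djf(times: List[date]) -> List[date]:
    """Drop the start-year Jan/Feb and end-year Dec of a sorted, continuous series."""
    start_year, end_year = times[0].year, times[-1].year
    return [
        t
        for t in times
        # Start-year Jan/Feb lack the previous December.
        if not (t.year == start_year and t.month in (1, 2))
        # End-year December lacks the following Jan/Feb.
        and not (t.year == end_year and t.month == 12)
    ]


times = [date(year, month, 1) for year in (2000, 2001) for month in range(1, 13)]
kept = drop_incomplete_djf(times)
print(len(times), len(kept))  # 24 21
print(kept[0], kept[-1])  # 2000-03-01 2001-11-01
```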
Parameters:
dataset (xr.Dataset) – The dataset with some possibly incomplete DJF seasons.
Returns:
xr.Dataset – The dataset with only complete DJF seasons.
This method is used to drop 2/29 from leap years (if present) before
calculating climatology/departures for high-frequency time series data.
Climatology labels use the placeholder year 1, which is not a leap year,
so a 2/29 label would make cftime raise an error (ValueError: invalid
day number provided in cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0,
0, 0, has_year_zero=True)).
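The filter itself is simple. This sketch works on plain lists, with `drop_leap_days` as a hypothetical name. It uses Python's `datetime.date` rather than cftime, which shows the same failure mode for a year-1 Feb 29 label.

```python
from datetime import date
from typing import List, Tuple


def drop_leap_days(times: List[date], values: List[float]) -> Tuple[List[date], List[float]]:
    """Remove every Feb 29 time point together with its value (hypothetical helper)."""
    kept = [(t, v) for t, v in zip(times, values) if not (t.month == 2 and t.day == 29)]
    return [t for t, _ in kept], [v for _, v in kept]


times = [date(2000, 2, 28), date(2000, 2, 29), date(2000, 3, 1)]
new_times, new_values = drop_leap_days(times, [1.0, 2.0, 3.0])
print(new_values)  # [1.0, 3.0]

# Why the drop is needed: a (month, day) label placed in year 1 cannot be Feb 29,
# because year 1 is not a leap year. The standard library also raises ValueError.
try:
    date(1, 2, 29)
except ValueError as err:
    print("ValueError:", err)
```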
Calculates weights for a data variable using time bounds.
This method gets the length of time for each coordinate point by using
the difference in the upper and lower time bounds. This approach ensures
that the correct time lengths are calculated regardless of how time
coordinates are recorded (e.g., monthly, daily, hourly) and the calendar
type used.
The time lengths are labeled and grouped, then each time length is
divided by the total sum of the time lengths in its group to get its
corresponding weight.
The sum of the weights for each group is validated to ensure it equals
1.0.
Parameters:
time_bounds (xr.DataArray) – The time bounds.
Returns:
xr.DataArray – The weights based on a specified frequency.
Notes
Refer to [4] for the supported CF convention calendar types.
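The weighting and validation steps described above can be sketched with plain lists. `get_weights` here is a stand-in that takes precomputed group labels; the label format `("DJF", 2001)` is an assumption for illustration.

```python
import math
from collections import defaultdict
from datetime import datetime
from typing import Dict, Hashable, List, Tuple


def get_weights(bounds: List[Tuple[datetime, datetime]], labels: List[Hashable]) -> List[float]:
    """Weight each time point by its bound length divided by its group's total length."""
    # Length of each time interval in days, from upper minus lower bound.
    lengths = [(upper - lower).total_seconds() / 86400 for lower, upper in bounds]
    totals: Dict[Hashable, float] = defaultdict(float)
    for label, length in zip(labels, lengths):
        totals[label] += length
    weights = [length / totals[label] for label, length in zip(labels, lengths)]
    # Validate that the weights of every group sum to 1.0 (within float tolerance).
    sums: Dict[Hashable, float] = defaultdict(float)
    for label, weight in zip(labels, weights):
        sums[label] += weight
    if not all(math.isclose(total, 1.0) for total in sums.values()):
        raise ValueError("Weights in each group must sum to 1.0.")
    return weights


# One DJF season (hypothetical label format): Dec 2000, Jan 2001, Feb 2001.
djf_bounds = [
    (datetime(2000, 12, 1), datetime(2001, 1, 1)),
    (datetime(2001, 1, 1), datetime(2001, 2, 1)),
    (datetime(2001, 2, 1), datetime(2001, 3, 1)),
]
print(get_weights(djf_bounds, [("DJF", 2001)] * 3))  # proportional to 31, 31, and 28 days
```

Because the lengths come from the bounds rather than from a fixed month length, the same code handles a 29-day February or non-standard calendars without changes.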
Labels time coordinates with a group for grouping.
This method labels time coordinates for grouping by first extracting
specific xarray datetime components from time coordinates and storing
them in a pandas DataFrame. After processing (if necessary) is performed
on the DataFrame, it is converted to a numpy array of datetime
objects. This numpy array serves as the data source for the final
DataArray of labeled time coordinates.
Parameters:
time_coords (xr.DataArray) – The time coordinates.
Returns:
xr.DataArray – The DataArray of labeled time coordinates for grouping.
Returns a DataFrame of xarray datetime components.
This method extracts the applicable xarray datetime components from each
time coordinate based on the averaging mode and frequency, and stores
them in a DataFrame.
Additional processing is performed for the seasonal frequency,
including:
If custom seasons are used, map them to each time coordinate based
on the middle month of the custom season.
If the season with December is “DJF”, shift Decembers over to the next
year so DJF seasons are correctly grouped using the previous year’s
December.
Drop obsolete columns after processing is done.
Parameters:
time_coords (xr.DataArray) – The time coordinates.
Returns:
pd.DataFrame – A DataFrame of datetime components.
Notes
Refer to [5] for information on xarray datetime accessor components.
Maps the month column in the DataFrame to a custom season.
This method maps each integer value in the “month” column to its string
representation, which then maps to a custom season that is stored in the
“season” column. For example, the month of 1 maps to “Jan” and “Jan”
maps to the “JanFebMar” custom season.
Parameters:
df (pd.DataFrame) – The DataFrame of xarray datetime components.
Returns:
pd.DataFrame – The DataFrame of xarray datetime coordinates, with each row mapped
to a custom season.
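The two-step mapping (integer month to month string, month string to season) can be sketched with dictionaries. Plain lists stand in for the DataFrame's “month” column, and `map_months_to_custom_seasons` is a hypothetical name.

```python
from typing import Dict, List

MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
MONTH_INT_TO_STR = {i: name for i, name in enumerate(MONTH_NAMES, start=1)}


def map_months_to_custom_seasons(months: List[int], custom_seasons: List[List[str]]) -> List[str]:
    """Map integer months (1-12) to the name of the custom season containing them."""
    # Build a lookup such as {"Jan": "JanFebMar", "Feb": "JanFebMar", ...}.
    month_to_season: Dict[str, str] = {}
    for season in custom_seasons:
        name = "".join(season)
        for month in season:
            month_to_season[month] = name
    # 1 -> "Jan" -> "JanFebMar", and so on for every row's month.
    return [month_to_season[MONTH_INT_TO_STR[m]] for m in months]


quarters = [["Jan", "Feb", "Mar"], ["Apr", "May", "Jun"], ["Jul", "Aug", "Sep"], ["Oct", "Nov", "Dec"]]
print(map_months_to_custom_seasons([1, 5, 12], quarters))  # ['JanFebMar', 'AprMayJun', 'OctNovDec']
```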
Shifts Decembers over to the next year for “DJF” seasons in-place.
For “DJF” seasons, Decembers must be shifted over to the next year in
order for the xarray groupby operation to correctly label and group the
corresponding time coordinates. If they aren’t shifted over, grouping is
incorrectly performed with the native xarray “DJF” season (which is
actually “JFD”).
Parameters:
df_season (pd.DataFrame) – The DataFrame of xarray datetime components produced using the
“season” frequency.
Returns:
pd.DataFrame – The DataFrame of xarray datetime components with Decembers shifted
over to the next year.
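The shift can be sketched on (year, month) pairs standing in for DataFrame rows. Unlike the in-place method described above, this hypothetical `shift_decembers` returns a new list.

```python
from typing import List, Tuple


def shift_decembers(rows: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Move each (year, 12) row to year + 1 so it groups with the following Jan/Feb."""
    return [(year + 1, month) if month == 12 else (year, month) for year, month in rows]


rows = [(2000, 12), (2001, 1), (2001, 2)]
print(shift_decembers(rows))  # [(2001, 12), (2001, 1), (2001, 2)]
```

After the shift, all three months share the year 2001, so grouping by (year, season) puts them in one DJF season.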
Maps the season column values to the integer of its middle month.
DateTime objects don’t support storing seasons as strings, so the middle
months are used to represent the season. For example, for the season
“DJF”, the middle month “J” is mapped to the integer value 1.
The middle month of a custom season is extracted using the ceiling of
the middle index from its list of months. For example, for the custom
season “FebMarAprMay” with the list of months [“Feb”, “Mar”, “Apr”,
“May”], the zero-based index 2 is used to get the month “Apr”. “Apr” is
then mapped to the integer value 4.
After mapping the season to its month, the “season” column is renamed to
“month”.
Parameters:
df (pd.DataFrame) – The dataframe of datetime components, including a “season” column.
Returns:
pd.DataFrame – The dataframe of datetime components, including a “month” column.
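The middle-month rule can be written with integer division. For a list of n months, the zero-based index n // 2 equals the ceiling of (n - 1) / 2. This is a minimal sketch with a hypothetical `middle_month` helper that takes a season as a list of three-letter month strings.

```python
from typing import List

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def middle_month(season_months: List[str]) -> int:
    """Return the integer (1-12) of a season's middle month (hypothetical helper)."""
    # Zero-based len // 2 equals the ceiling of the middle index (len - 1) / 2.
    middle = season_months[len(season_months) // 2]
    return MONTHS.index(middle) + 1


print(middle_month(["Dec", "Jan", "Feb"]))  # 1
print(middle_month(["Feb", "Mar", "Apr", "May"]))  # 4
```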
Drops obsolete columns from the DataFrame of xarray datetime components.
For the “season” frequency, processing is required on the DataFrame of
xarray datetime components, such as mapping custom seasons based on the
month. Additional datetime component values must be included as
DataFrame columns, which become obsolete after processing is done. The
obsolete columns are dropped from the DataFrame before grouping
time coordinates.
Parameters:
df_season (pd.DataFrame) – The DataFrame of time coordinates for the “season” frequency with
obsolete columns.
Returns:
pd.DataFrame – The DataFrame of time coordinates for the “season” frequency with
obsolete columns dropped.
Converts a DataFrame of datetime components to cftime datetime
objects.
datetime objects require at least a year, month, and day value. However,
some modes and time frequencies don’t require year, month, and/or day
for grouping. For these cases, use default values of 1 in order to
meet this datetime requirement.
Parameters:
df (pd.DataFrame) – The DataFrame of xarray datetime components.
Returns:
np.ndarray – A numpy ndarray of cftime.datetime objects.
Notes
Refer to [6] and [7] for more information on the valid range of pandas
Timestamp objects. cftime.datetime objects are used to avoid these time
range issues.
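The default-value rule can be sketched with Python's `datetime.datetime` as a stand-in for `cftime.datetime`. It also covers years 1 through 9999, so year 1 is a valid placeholder. The dictionary rows, the helper name `components_to_datetimes`, and the hour default of 0 are assumptions for illustration.

```python
from datetime import datetime
from typing import Dict, List


def components_to_datetimes(rows: List[Dict[str, int]]) -> List[datetime]:
    """Build datetime labels from component rows, filling missing year/month/day with 1."""
    # Assumption: a missing hour defaults to 0, since hours start at 0.
    return [
        datetime(row.get("year", 1), row.get("month", 1), row.get("day", 1), row.get("hour", 0))
        for row in rows
    ]


# Climatology with freq="month" only carries a month component.
print(components_to_datetimes([{"month": 7}]))  # [datetime.datetime(1, 7, 1, 0, 0)]
```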
Adds attributes to the data variable describing the operation.
These attributes distinguish a data variable that has been operated on
from its original state. The attributes in netCDF4 files do not support
booleans or nested dictionaries, so booleans are converted to strings
and nested dictionaries are unpacked.
Parameters:
data_var (xr.DataArray) – The data variable.
Returns:
xr.DataArray – The data variable with temporal averaging attributes.
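The attribute cleanup can be sketched as a recursive flatten that also stringifies booleans. The helper `to_netcdf_attrs` and the attribute keys in the example are hypothetical; only the two rules (no booleans, no nested dictionaries) come from the description above.

```python
from typing import Any, Dict


def to_netcdf_attrs(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten nested dictionaries and stringify booleans for netCDF4 attributes."""
    flat: Dict[str, Any] = {}
    for key, value in attrs.items():
        if isinstance(value, dict):
            flat.update(to_netcdf_attrs(value))  # unpack nested dictionaries
        elif isinstance(value, bool):
            flat[key] = str(value)  # netCDF4 attributes do not support booleans
        else:
            flat[key] = value
    return flat


# Hypothetical attribute names, for illustration only.
attrs = {
    "operation": "temporal_avg",
    "mode": "climatology",
    "freq": "season",
    "weighted": True,
    "season_config": {"dec_mode": "DJF", "drop_incomplete_djf": False},
}
print(to_netcdf_attrs(attrs))
```

The bool check comes after the dict check and before the fallback, and `isinstance(value, bool)` is used rather than `int` because `bool` is a subclass of `int` in Python.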