xarray.Dataset.temporal.average
- Dataset.temporal.average(data_var, weighted=True, keep_weights=False)
Returns a Dataset with the average of a data variable and the time dimension removed.
This method infers the time grouping frequency by checking the distance between a set of upper and lower time bounds. It is particularly useful for calculating weighted averages of monthly or yearly time series data, because the number of days per month or year varies with the calendar type, which affects the weighting. For other frequencies, the weights are distributed equally, so weighted=True is the same as weighted=False.
Time bounds are used for inferring the time series frequency and for generating weights (refer to the weighted parameter documentation below).
- Parameters:
data_var (str) – The key of the data variable for calculating averages.
weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for each coordinate point using the difference of its upper and lower bounds. The time lengths are grouped, then each time length is divided by the total sum of the time lengths to get the weight of each coordinate point.
The weight of masked (missing) data is excluded when averages are taken. This is the same as giving them a weight of 0.
Note that weights are assigned by the labeled time point. If the dataset includes time points that span across typical boundaries (e.g., a time point on 2020-06-01 with bounds that begin in May 2020 and end in June 2020), the weights will not be assigned properly. See the explanation in the Notes section below.
keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the final dataset output, by default False.
- Returns:
xr.Dataset – Dataset with the average of the data variable and the time dimension removed.
Notes
When using weighted averages, the weights are assigned based on the time point value. For example, a time point of 2020-06-15 with bounds (2020-06-01, 2020-06-30) has 30 days of weight assigned to June 2020 (e.g., for an annual average calculation). This is the expected behavior, but it’s possible for data to span across typical temporal boundaries. For example, a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15) would have 30 days of weight, but all of this weight would be assigned to June 2020, which would be incorrect (roughly 15 days of weight should be assigned to May and roughly 15 days to June). This issue could plausibly arise when using pentad data.
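The difference between the two assignments can be sketched in plain Python. This is only an illustration of the note above, not the library's implementation: split_by_month is a hypothetical helper, and the upper bound is assumed to be exclusive. Under that assumption the exact split is 16 days in May and 14 in June, close to the even split described above.

```python
from datetime import date

# Time point and bounds taken from the note above.
# Assumption: the upper bound is exclusive.
time_point = date(2020, 6, 1)
lower, upper = date(2020, 5, 16), date(2020, 6, 15)

# Label-based assignment: the full length goes to the month of the time point.
label_based = {(time_point.year, time_point.month): (upper - lower).days}


def split_by_month(lower, upper):
    """Hypothetical helper: days of [lower, upper) falling in each calendar month."""
    days = {}
    start = lower
    while start < upper:
        # First day of the month after `start`.
        if start.month == 12:
            next_month = date(start.year + 1, 1, 1)
        else:
            next_month = date(start.year, start.month + 1, 1)
        end = min(next_month, upper)
        days[(start.year, start.month)] = (end - start).days
        start = end
    return days


# Overlap-based assignment: the length is split where the bounds cross a month.
overlap_based = split_by_month(lower, upper)

print(label_based)    # all 30 days land in June 2020
print(overlap_based)  # days split between May and June 2020
```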
Examples
Get weighted averages for a monthly time series data variable:
>>> ds_month = ds.temporal.average("ts")
>>> ds_month.ts
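The weighting described for the weighted parameter can also be sketched without the library. The following is a simplified illustration under stated assumptions, not the method's actual code: the bounds and values are made up, the upper bounds are treated as exclusive, and weighted_average is a hypothetical helper.

```python
import math
from datetime import date

# Hypothetical monthly bounds for January to March 2021 (upper bound exclusive).
bounds = [
    (date(2021, 1, 1), date(2021, 2, 1)),
    (date(2021, 2, 1), date(2021, 3, 1)),
    (date(2021, 3, 1), date(2021, 4, 1)),
]
values = [1.0, 2.0, float("nan")]  # NaN marks a missing (masked) value

# Length of time covered by each coordinate point, in days.
lengths = [(upper - lower).days for lower, upper in bounds]

# Each weight is the point's length divided by the total length.
total = sum(lengths)
weights = [length / total for length in lengths]


def weighted_average(values, weights):
    """Weighted mean that skips missing values, i.e. gives them a weight of 0."""
    pairs = [(v, w) for v, w in zip(values, weights) if not math.isnan(v)]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


result = weighted_average(values, weights)
print(lengths, result)
```

Because January (31 days) outweighs February (28 days), the weighted result is about 1.47 rather than the unweighted mean of 1.5 for the two valid values.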