Calculate Time Averages from Time Series Data#

Updated: 04/01/24 [xcdat v0.6.1]

Related APIs:

The data used in this example can be found through the Earth System Grid Federation (ESGF) search portal.

Overview#

Suppose we have netCDF4 files for air temperature data (tas) with monthly, daily, and 3hr frequencies.

We want to calculate averages using these files with the time dimension removed (a single time snapshot), and averages by time group (yearly, seasonal, monthly, and daily).

Notebook Setup#

Create an Anaconda environment for this notebook using the command below, then select the kernel in Jupyter.

conda create -n xcdat_notebook -c conda-forge python xarray netcdf4 xcdat xesmf matplotlib nc-time-axis jupyter

- xesmf is required for horizontal regridding with xESMF
- matplotlib is an optional dependency required for plotting with xarray
- nc-time-axis is an optional dependency required for matplotlib to plot cftime coordinates

[1]:

%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt
import xcdat

2. Calculate grouped averages#

Helpful knowledge:

Each specified frequency has predefined groups for grouping time coordinates.

The table below maps each type of average to its API frequency and grouping convention.

Type of Averages	API Frequency	Group By
Yearly	`freq="year"`	year
Monthly	`freq="month"`	year, month
Seasonal	`freq="season"`	year, season
Custom seasonal	`freq="season"` and `season_config={"custom_seasons": <2D ARRAY>}`	year, season
Daily	`freq="day"`	year, month, day
Hourly	`freq="hour"`	year, month, day, hour

The grouping conventions are based on CDAT/cdutil, except for daily and hourly means which aren’t implemented in CDAT/cdutil.
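To make the Group By column concrete, here is a minimal pure-Python sketch (not xcdat's implementation) that maps a single timestamp to the group it would fall into for each frequency. The season labels built from month initials (e.g. "DJF", "OND") and the rule that moves December into the following year's DJF season are assumptions modeled on xcdat's default `season_config` setting `dec_mode="DJF"`; `group_key` and `season_key` are hypothetical helper names.

```python
from datetime import datetime

MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Standard seasons; DJF wraps across the calendar year boundary.
STANDARD_SEASONS = [["Dec", "Jan", "Feb"], ["Mar", "Apr", "May"],
                    ["Jun", "Jul", "Aug"], ["Sep", "Oct", "Nov"]]


def season_key(dt, seasons=STANDARD_SEASONS):
    """Return (year, label) for the season that contains dt's month.

    Assumption: months listed before "Jan" in a wrapping season (e.g. Dec in
    DJF) belong to the following year, modeled on xcdat's dec_mode="DJF".
    """
    name = MONTH_NAMES[dt.month - 1]
    for season in seasons:
        if name in season:
            label = "".join(month[0] for month in season)  # e.g. "DJF"
            wraps = "Jan" in season and season.index(name) < season.index("Jan")
            return (dt.year + 1 if wraps else dt.year, label)
    raise ValueError(f"{name} is not in any season")


def group_key(dt, freq, seasons=STANDARD_SEASONS):
    """Map a timestamp to its group, following the table's Group By column."""
    if freq == "year":
        return (dt.year,)
    if freq == "month":
        return (dt.year, dt.month)
    if freq == "season":
        return season_key(dt, seasons)
    if freq == "day":
        return (dt.year, dt.month, dt.day)
    if freq == "hour":
        return (dt.year, dt.month, dt.day, dt.hour)
    raise ValueError(f"unsupported freq: {freq}")


print(group_key(datetime(2005, 12, 15), "season"))  # (2006, 'DJF')
print(group_key(datetime(2006, 1, 15), "season"))   # (2006, 'DJF')

# Hypothetical custom seasons (calendar quarters): nothing wraps past
# January, so December stays in its own year.
quarters = [["Jan", "Feb", "Mar"], ["Apr", "May", "Jun"],
            ["Jul", "Aug", "Sep"], ["Oct", "Nov", "Dec"]]
print(group_key(datetime(2006, 12, 1), "season", quarters))  # (2006, 'OND')
```

The key point of the sketch is that "year, season" grouping is not simply (calendar year, season): under the assumed DJF mode, December 2005 is averaged together with January and February 2006.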

Masked (missing) data is handled automatically.
- The weights of masked (missing) data are excluded when averages are calculated, which is equivalent to giving them a weight of 0.
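The masking rule can be illustrated with a small stdlib sketch (not xcdat's code). Each value carries a weight, for example the length of its time interval in days, which is how we understand xcdat's weighted averages to be derived from time bounds. Missing values (`None` or NaN) are skipped, which removes their weight from both the numerator and the denominator. `weighted_mean` and the sample numbers are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean that skips missing values (None or NaN).

    Skipping a value drops its weight from the numerator and the denominator,
    which is the same as giving that value a weight of 0.
    """
    numerator = 0.0
    denominator = 0.0
    for value, weight in zip(values, weights):
        if value is None or math.isnan(value):
            continue  # masked value: contributes nothing
        numerator += value * weight
        denominator += weight
    # A group with no valid values has no defined mean.
    return numerator / denominator if denominator else math.nan


# Hypothetical January to March values weighted by days per month,
# with February missing: (10 * 31 + 20 * 31) / (31 + 31)
print(weighted_mean([10.0, None, 20.0], [31, 28, 31]))  # 15.0
```

Note that the remaining weights are renormalized over the valid values only; the missing month does not pull the average toward zero.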

Visualize averages derived from monthly data on a specific point#

[12]:

# plot time series of temporal averages for a specific grid point: seasonal and yearly averages derived from monthly time series
lat_point = 30
lon_point = 30

start_year = "2005-01-01"
end_year = "2014-12-31"

plt.figure(figsize=(10, 3))
ax = plt.subplot()

ds.tas.sel(lat=lat_point, lon=lon_point, time=slice(start_year, end_year)).plot(
    ax=ax, label="monthly (RAW DATA)", alpha=0.5
)
ds_season.tas.sel(lat=lat_point, lon=lon_point, time=slice(start_year, end_year)).plot(
    ax=ax, label="season", alpha=0.5
)
ds_yearly.tas.sel(lat=lat_point, lon=lon_point, time=slice(start_year, end_year)).plot(
    ax=ax, label="yearly", alpha=0.5
)

plt.title("Seasonal and yearly averages derived from monthly time series")

plt.legend()
plt.tight_layout()

[Figure: Seasonal and yearly averages derived from monthly time series]

Monthly Averages#

Group time coordinates by year and month

For this example, we will be loading a subset of 3-hourly time series data for tas using OPeNDAP.

NOTE:

For OPeNDAP servers, the default request size limit in the THREDDS Data Server (TDS) configuration is 500MB. Opening a dataset over OPeNDAP also adds overhead compared to direct file access.

The workaround is to use Dask to request the data in manageable chunks, which overcomes file size limitations and can improve performance.

We have a few ways to chunk our request:

- Specify chunks as "auto" to let Dask determine the chunk size.
- Specify a target chunk size in bytes (e.g., "100MB"), or an integer chunk length along a dimension (e.g., `{"time": 100}` for chunks of 100 time steps each).
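As a rough worked example of the "auto" option, the arithmetic below estimates how many 3-hourly time steps fit in one chunk. It assumes Dask's default target chunk size of 128 MiB (the `array.chunk-size` setting), float32 values, and the 145 x 192 lat/lon grid that appears in the dataset output, and it reads the 500MB TDS limit as 500 x 10^6 bytes. Dask's real auto-chunking applies more rules, so treat `time_chunk_length` and `chunk_count` as hypothetical helpers.

```python
import math

# Assumptions: Dask's default target chunk size of 128 MiB, float32 data
# (4 bytes per value), the 145 x 192 lat/lon grid from the dataset output,
# and the 500MB TDS limit read as 500 * 10**6 bytes.
TARGET_BYTES = 128 * 1024**2
TDS_LIMIT_BYTES = 500 * 10**6


def time_chunk_length(nlat, nlon, itemsize, target_bytes=TARGET_BYTES):
    """Largest number of time steps whose bytes fit within target_bytes."""
    bytes_per_step = nlat * nlon * itemsize
    return max(1, target_bytes // bytes_per_step)


def chunk_count(ntime, chunk_length):
    """An integer chunk is a length; the last chunk may be shorter."""
    return math.ceil(ntime / chunk_length)


length = time_chunk_length(145, 192, 4)
print(length)                                     # 1205
print(chunk_count(14608, length))                 # 13
print(length * 145 * 192 * 4 < TDS_LIMIT_BYTES)   # True
print(chunk_count(14608, 100))                    # 147, for chunks={"time": 100}
```

Under these assumptions the estimate lines up with the `chunksize=(1205, 145, 192)` in the dataset output, each chunk stays well under the request limit, and the last line shows that an integer sets the chunk length rather than the number of chunks.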

Visit this page to learn more about chunking and performance: https://docs.xarray.dev/en/stable/user-guide/dask.html#chunking-and-performance

[13]:

# The size of this file is approximately 1.45 GB, so we will be chunking our
# request using Dask to avoid hitting the OPeNDAP file size request limit for
# this ESGF node.
ds2 = xcdat.open_dataset(
    "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/3hr/tas/gn/v20200605/tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000.nc",
    chunks={"time": "auto"},
    add_bounds=["T"],
)

# Unit adjust (-273.15, K to C)
ds2["tas"] = ds2.tas - 273.15

ds2

[13]:

<xarray.Dataset> Size: 2GB
Dimensions:    (lat: 145, bnds: 2, lon: 192, time: 14608)
Coordinates:
  * lat        (lat) float64 1kB -90.0 -88.75 -87.5 -86.25 ... 87.5 88.75 90.0
  * lon        (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 354.4 356.2 358.1
    height     float64 8B ...
  * time       (time) object 117kB 2010-01-01 03:00:00 ... 2015-01-01 00:00:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (lat, bnds) float64 2kB dask.array<chunksize=(145, 2), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 3kB dask.array<chunksize=(192, 2), meta=np.ndarray>
    tas        (time, lat, lon) float32 2GB dask.array<chunksize=(1205, 145, 192), meta=np.ndarray>
    time_bnds  (time, bnds) object 234kB 2010-01-01 03:00:00 ... 2015-01-01 0...
Attributes: (12/48)
    Conventions:                     CF-1.7 CMIP-6.2
    activity_id:                     CMIP
    branch_method:                   standard
    branch_time_in_child:            0.0
    branch_time_in_parent:           87658.0
    creation_date:                   2020-06-05T04:54:56Z
    ...                              ...
    variant_label:                   r10i1p1f1
    version:                         v20200605
    license:                         CMIP6 model data produced by CSIRO is li...
    cmor_version:                    3.4.0
    tracking_id:                     hdl:21.14100/b79e6a05-c482-46cf-b3b8-83b...
    DODS_EXTRA.Unlimited_Dimension:  time

[14]:

ds2_monthly_avg = ds2.temporal.group_average("tas", freq="month", weighted=True)

[15]:

ds2_monthly_avg.tas

[15]:

<xarray.DataArray 'tas' (time: 61, lat: 145, lon: 192)> Size: 14MB
dask.array<truediv, shape=(61, 145, 192), dtype=float64, chunksize=(1, 145, 192), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 1kB -90.0 -88.75 -87.5 -86.25 ... 87.5 88.75 90.0
  * lon      (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
    height   float64 8B ...
  * time     (time) object 488B 2010-01-01 00:00:00 ... 2015-01-01 00:00:00
Attributes:
    operation:  temporal_avg
    mode:       group_average
    freq:       month
    weighted:   True
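The 61 time steps in the monthly result follow from the time axis itself. This sketch rebuilds the 3-hourly timestamps listed in the dataset output (2010-01-01 03:00 through 2015-01-01 00:00, assuming a Gregorian calendar with no gaps) and counts the distinct (year, month) groups. The final timestamp falls in January 2015, so it forms a single-sample group on top of the 60 months of 2010 to 2014.

```python
from datetime import datetime, timedelta

# Assumed time axis, taken from the dataset output: every 3 hours from
# 2010-01-01 03:00 through 2015-01-01 00:00 (Gregorian calendar, no gaps).
start = datetime(2010, 1, 1, 3)
end = datetime(2015, 1, 1, 0)
step = timedelta(hours=3)

times = []
current = start
while current <= end:
    times.append(current)
    current += step

# Distinct (year, month) groups, as used by freq="month".
months = sorted({(t.year, t.month) for t in times})

print(len(times))   # 14608
print(len(months))  # 61
print(months[-1])   # (2015, 1)
print(sum(1 for t in times if (t.year, t.month) == (2015, 1)))  # 1
```

If that one-sample month is not wanted, the input could be trimmed before averaging, for example with `time=slice(None, "2014-12-31")` in `.sel`.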

Daily Averages#

Group time coordinates by year, month, and day

For this example, we will be opening a subset of 3hr time series data for tas using OPeNDAP (the same file opened as `ds2` above).

[16]:

# The size of this file is approximately 1.17 GB, so we will be chunking our
# request using Dask to avoid hitting the OPeNDAP file size request limit for
# this ESGF node.
ds3 = xcdat.open_dataset(
    "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/3hr/tas/gn/v20200605/tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000.nc",
    chunks={"time": "auto"},
    add_bounds=["T"],
)

# Unit adjust (-273.15, K to C)
ds3["tas"] = ds3.tas - 273.15

[17]:

ds3.tas

[17]:

<xarray.DataArray 'tas' (time: 14608, lat: 145, lon: 192)> Size: 2GB
dask.array<sub, shape=(14608, 145, 192), dtype=float32, chunksize=(1205, 145, 192), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 1kB -90.0 -88.75 -87.5 -86.25 ... 87.5 88.75 90.0
  * lon      (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
    height   float64 8B ...
  * time     (time) object 117kB 2010-01-01 03:00:00 ... 2015-01-01 00:00:00

[18]:

ds3_day_avg = ds3.temporal.group_average("tas", freq="day", weighted=True)

[19]:

ds3_day_avg.tas

[19]:

<xarray.DataArray 'tas' (time: 1827, lat: 145, lon: 192)> Size: 407MB
dask.array<truediv, shape=(1827, 145, 192), dtype=float64, chunksize=(1, 145, 192), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 1kB -90.0 -88.75 -87.5 -86.25 ... 87.5 88.75 90.0
  * lon      (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
    height   float64 8B ...
  * time     (time) object 15kB 2010-01-01 00:00:00 ... 2015-01-01 00:00:00
Attributes:
    operation:  temporal_avg
    mode:       group_average
    freq:       day
    weighted:   True
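The same reasoning explains the 1827 daily groups: 1826 days from 2010 through 2014 (2012 is a leap year) plus a one-sample group for 2015-01-01 00:00. The sketch below, under the same assumed gap-free Gregorian time axis, also shows that the first day holds only 7 samples because the series starts at 03:00.

```python
from collections import Counter
from datetime import datetime, timedelta

# Same assumed time axis as the dataset output (3-hourly, no gaps).
start = datetime(2010, 1, 1, 3)
end = datetime(2015, 1, 1, 0)
step = timedelta(hours=3)
n_steps = int((end - start) / step) + 1
times = [start + i * step for i in range(n_steps)]

# Number of 3-hourly samples in each (year, month, day) group (freq="day").
samples_per_day = Counter((t.year, t.month, t.day) for t in times)

print(len(samples_per_day))            # 1827
print(samples_per_day[(2010, 1, 1)])   # 7 (03:00 through 21:00)
print(samples_per_day[(2010, 1, 2)])   # 8
print(samples_per_day[(2015, 1, 1)])   # 1 (00:00 only)
```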

Visualize averages derived from 3-hourly data on a specific point#

[20]:

# plot time series of temporal averages for a specific grid point: daily and monthly averages derived from 3-hourly time series
lat_point = 30
lon_point = 30

start_year = "2010-01-01"
end_year = "2014-12-31"

plt.figure(figsize=(10, 3))
ax = plt.subplot()

ds2.tas.sel(lat=lat_point, lon=lon_point, time=slice(start_year, end_year)).plot(
    ax=ax, label="3-hourly (RAW DATA)", alpha=0.5
)
ds3_day_avg.tas.sel(
    lat=lat_point, lon=lon_point, time=slice(start_year, end_year)
).plot(ax=ax, label="daily", alpha=0.5)
ds2_monthly_avg.tas.sel(
    lat=lat_point, lon=lon_point, time=slice(start_year, end_year)
).plot(ax=ax, label="monthly", alpha=0.5)

plt.title("Daily and monthly averages derived from 3-hourly time series")
plt.legend()
plt.tight_layout()

[Figure: Daily and monthly averages derived from 3-hourly time series]
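The 3-hourly plotting cell selects `lat=30, lon=30` with `.sel`, which by default matches coordinate labels exactly. This sketch assumes the 1.25 x 1.875 degree grid inferred from the 3-hourly dataset output (145 latitudes from -90 to 90, 192 longitudes from 0 to 358.125) and checks that (30, 30) is an exact grid point; `nearest` is a hypothetical helper that mimics what xarray's `method="nearest"` option does for off-grid points.

```python
# Assumed grid, inferred from the 3-hourly dataset output: 145 latitudes from
# -90 to 90 in 1.25 degree steps and 192 longitudes from 0 in 1.875 degree steps.
lats = [-90.0 + 1.25 * i for i in range(145)]
lons = [1.875 * i for i in range(192)]


def nearest(coords, target):
    """Return the coordinate closest to target, similar to method="nearest"."""
    return min(coords, key=lambda c: abs(c - target))


# Exact label lookup, as plain .sel(lat=30, lon=30) requires.
print(30.0 in lats, 30.0 in lons)  # True True

# An off-grid point needs a nearest-neighbor lookup instead.
print(nearest(lats, 31.0), nearest(lons, 31.0))  # 31.25 31.875
```

With xarray itself, an off-grid point would be selected with `.sel(lat=31.0, lon=31.0, method="nearest")`.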

Calculate Time Averages from Time Series Data

Contents

Calculate Time Averages from Time Series Data#

Overview#

Notebook Setup#

1. Calculate averages with the time dimension removed (single snapshot)#

Open the `Dataset`#

2. Calculate grouped averages#

Open the `Dataset`#

Yearly Averages#

Seasonal Averages#

Visualize averages derived from monthly data on a specific point#

Monthly Averages#

Daily Averages#

Visualize averages derived from 3-hourly data on a specific point#
