xCDAT logo

A Gentle Introduction to xCDAT (Xarray Climate Data Analysis Tools)#

“A Python package for simple and robust climate data analysis.”

Core Developers: Tom Vo, Stephen Po-Chedley, Jason Boutte, Jill Zhang, Jiwoo Lee

With thanks to Peter Gleckler, Paul Durack, Karl Taylor, and Chris Golaz


This work is performed under the auspices of the U. S. DOE by Lawrence Livermore National Laboratory under contract No. DE-AC52-07NA27344.

Notebook Setup#

Create an Anaconda environment for this notebook using the command below, then select the kernel in Jupyter.

conda create -n xcdat_notebook -c conda-forge python xarray netcdf4 xcdat xesmf matplotlib nc-time-axis jupyter
  • xesmf is required for horizontal regridding with xESMF

  • matplotlib is an optional dependency required for plotting with xarray

  • nc-time-axis is an optional dependency required for matplotlib to plot cftime coordinates

Presentation Overview#

Intended audience: Some or no familiarity with xarray and/or xcdat

  1. Driving force behind xCDAT

  2. Goals and milestones of CDAT’s successor

  3. Introducing xCDAT

  4. Understanding the basics of Xarray

  5. How xCDAT extends Xarray for climate data analysis

  6. Technical design philosophy and APIs

  7. Demo of capabilities

  8. How to get involved

The Driving Force Behind xCDAT#

  • The CDAT (Community Data Analysis Tools) library has provided a suite of robust and comprehensive open-source climate data analysis and visualization packages for over 20 years

  • A driving need for a modern successor

    • Focus on a maintainable and extensible library

    • Serve the needs of the climate community in the long-term

CMIP6 logo CMIP6 logo

Goals and Milestones for CDAT’s Successor#

  1. Offer similar core capabilities

    1. For example geospatial averaging, temporal averaging, and regridding

  2. Use modern technologies in the library’s stack

    1. Support parallelism and lazy operations

  3. Be maintainable, extensible, and easy-to-use

    1. Python Enhancement Proposals (PEPs)

    2. Automate DevOps processes (unit testing, code coverage)

    3. Actively maintain documentation

  4. Cultivate an open-source community that can sustain the project

    1. Encourage GitHub contributions

    2. Community engagement efforts (e.g., Pangeo, ESGF)

Introducing xCDAT#

  • xCDAT is an extension of xarray for climate data analysis on structured grids

  • Goal of providing features and utilities for simple and robust analysis of climate data

  • Jointly developed by scientists and developers from:

    • E3SM Project (Energy Exascale Earth System Model Project)

    • PCMDI (Program for Climate Model Diagnosis and Intercomparison)

    • SEATS Project (Simplifying ESM Analysis Through Standards Project)

    • Users around the world via GitHub

E3SM logo PCMDI logo SEATS logo

Before We Dive Deeper, Let’s Talk About Xarray#

  • Xarray is an evolution of an internal tool developed at The Climate Corporation

  • Released as open source in May 2014

  • NumFocus fiscally sponsored project since August 2018

xarray logo NumFOCUS logo

Key Features and Capabilities in Xarray#

  • “N-D labeled arrays and datasets in Python”

    • Built upon and extends NumPy and pandas

  • Interoperable with scientific Python ecosystem including NumPy, Dask, Pandas, and Matplotlib

  • Supports file I/O, indexing and selecting, interpolating, grouping, aggregating, parallelism (Dask), plotting (matplotlib wrapper)

    • Supported formats include: netCDF, Iris, OPeNDAP, Zarr, and GRIB

Source: https://xarray.dev/#features

Why use Xarray?#

“Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience.”

https://xarray.pydata.org/en/v2022.10.0/getting-started-guide/why-xarray.html

  • Apply operations over dimensions by name

    • x.sum('time')

  • Select values by label (or logical location) instead of integer location

    • x.loc['2014-01-01'] or x.sel(time='2014-01-01')

  • Mathematical operations vectorize across multiple dimensions (array broadcasting) based on dimension names, not shape

    • x - y

  • Easily use the split-apply-combine paradigm with groupby

    • x.groupby('time.dayofyear').mean().

  • Database-like alignment based on coordinate labels that smoothly handles missing values

    • x, y = xr.align(x, y, join='outer')

  • Keep track of arbitrary metadata in the form of a Python dictionary

    • x.attrs

Source: https://docs.xarray.dev/en/v2022.10.0/getting-started-guide/why-xarray.html#what-labels-enable

The Xarray Data Models#

“Xarray data models are borrowed from netCDF file format, which provides xarray with a natural and portable serialization format.”

https://docs.xarray.dev/en/v2022.10.0/getting-started-guide/why-xarray.html

  1. ``xarray.Dataset``

    • A dictionary-like container of DataArray objects with aligned dimensions

      • DataArray objects are classified as “coordinate variables” or “data variables”

      • All data variables have a shared union of coordinates

    • Serves a similar purpose to a pandas.DataFrame


  1. ``xarray.DataArray``

    • A class that attaches dimension names, coordinates, and attributes to multi-dimensional arrays (aka “labeled arrays”)

    • An N-D generalization of a pandas.Series

xarray logo

Exploring the Xarray Data Models#

Example dataset: tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc

  • Open up this real dataset from ESGF using xarray’s OPeNDAP support.

    • Contains the tas variable (near-surface air temperature) recorded on a monthly frequency

  • It is not downloaded until calculations/computations are performed on the Dataset object

    • Example of an xarray lazy operation

[1]:
# This style import is necessary to properly render Xarray's HTML output with
# the Jupyer RISE extension.
# GitHub Issue: https://github.com/damianavila/RISE/issues/594
# Source: https://github.com/smartass101/xarray-pydata-prague-2020/blob/main/rise.css

from IPython.core.display import HTML

style = """
<style>
.reveal pre.xr-text-repr-fallback {
    display: none;
}
.reveal ul.xr-sections {
    display: grid
}

.reveal ul ul.xr-var-list {
    display: contents
}
</style>
"""


HTML(style)
[1]:
[2]:
import xarray as xr

filepath = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

ds = xr.open_dataset(filepath)

The Dataset Model#

[3]:
ds
[3]:
<xarray.Dataset>
Dimensions:    (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
  * time       (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
  * lat        (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
  * lon        (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
    height     float64 ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] ...
    lat_bnds   (lat, bnds) float64 ...
    lon_bnds   (lon, bnds) float64 ...
    tas        (time, lat, lon) float32 ...
Attributes: (12/48)
    Conventions:                     CF-1.7 CMIP-6.2
    activity_id:                     CMIP
    branch_method:                   standard
    branch_time_in_child:            0.0
    branch_time_in_parent:           87658.0
    creation_date:                   2020-06-05T04:06:11Z
    ...                              ...
    variant_label:                   r10i1p1f1
    version:                         v20200605
    license:                         CMIP6 model data produced by CSIRO is li...
    cmor_version:                    3.4.0
    tracking_id:                     hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
    DODS_EXTRA.Unlimited_Dimension:  time

A dictionary-like container of labeled arrays (DataArray objects) with aligned dimensions.

Key properties:

  • dims: a dictionary mapping from dimension names to the fixed length of each dimension (e.g., {‘x’: 6, ‘y’: 6, ‘time’: 8})

  • coords: another dict-like container of DataArrays intended to label points used in data_vars (e.g., arrays of numbers, datetime objects or strings)

  • data_vars: a dict-like container of DataArrays corresponding to variables

  • attrs: dict to hold arbitrary metadata

Source: https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataset

The DataArray Model#

A class that attaches dimension names, coordinates, and attributes to multi-dimensional arrays (aka “labeled arrays”)#

Key properties:

  • values: a numpy.ndarray holding the array’s values

  • dims: dimension names for each axis (e.g., (‘x’, ‘y’, ‘z’))

  • coords: a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings)

  • attrs: dict to hold arbitrary metadata (attributes)

Source: https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataarray

[4]:
ds.tas
[4]:
<xarray.DataArray 'tas' (time: 1980, lat: 145, lon: 192)>
[55123200 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
  * lat      (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
  * lon      (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
    height   float64 ...
Attributes:
    standard_name:  air_temperature
    long_name:      Near-Surface Air Temperature
    comment:        near-surface (usually, 2 meter) air temperature
    units:          K
    cell_methods:   area: time: mean
    cell_measures:  area: areacella
    history:        2020-06-05T04:06:10Z altered by CMOR: Treated scalar dime...
    _ChunkSizes:    [  1 145 192]

Resources for Learning Xarray#

xCDAT Extends Xarray for Climate Data Analysis#

  • Some key xCDAT features are inspired by or ported from the core CDAT library

    • e.g., spatial averaging, temporal averaging, regrid2 for horizontal regridding

  • Other features leverage powerful libraries in the xarray ecosystem

    • xESMF for horizontal regridding

    • xgcm for vertical interpolation

    • CF-xarray for CF convention metadata interpretation

  • xCDAT strives to support datasets CF compliant and common non-CF compliant metadata (time units in “months since …” or “years since …”)

  • Inherent support for lazy operations and parallelism through xarray + dask

esmf logo cf-xarray logo dask logo

The Technical Design Philosophy#

  • Streamline the user experience of developing code to analyze climate data

  • Reduce the complexity and overhead for implementing certain features with xarray (e.g., temporal averaging, spatial averaging)

  • Encourage reusable functionalities through a single library

Leveraging the APIs#

xCDAT provides public APIs in two ways:

  1. Top-level APIs functions

    • e.g., xcdat.open_dataset(), xcdat.center_times()

    • Usually for opening datasets and performing dataset level operations

  2. Accessor classes

    • xcdat provides Dataset accessors, which are implicit namespaces for custom functionality.

    • Accessor namespaces clearly identifies separation from built-in xarray methods.

    • Operate on variables within the xr.Dataset

    • e.g., ds.spatial, ds.temporal, ds.regridder

xcdat accessor

xcdat spatial functionality is exposed by chaining the .spatial accessor attribute to the xr.Dataset object.

Source: https://xcdat.readthedocs.io/en/latest/api.html

Key Features in xCDAT#

Feature

API

Description

Extend xr.open_dataset() and xr.open_mfdataset()

  • open_dataset()

  • open_mfdataset()

  • Bounds generation

  • Time decoding (CF and select non-CF time units)

  • Centering of time coordinates

  • Conversion of longitudinal axis orientation

Temporal averaging

  • ds.temporal.average()

  • ds.temporal.group_average()

  • ds.temporal.climatology()

  • ds.temporal.departures()

  • Single snapshot and group average

  • Climatology and departure

  • Weighted or unweighted

  • Optional seasonal configuration< (e.g., custom seasons)

Geospatial averaging

ds.spatial.average()

  • Rectilinear grids

  • Weighted

  • Optional specification of region domain

Horizontal regridding

ds.regridder.horizontal()

  • Rectilinear and curvilinear grids

  • Extends xESMF horizontal regridding

  • Python implementation of regrid2

Vertical regridding

ds.regridder.vertical()

  • Transform vertical coordinates

  • Extends xgcm vertical interpolation

  • Linear, logarithmic, and conservative interpolation

  • Decode parametric vertical coordinates if required

A Demo of xCDAT Capabilities#

  • Prerequisites

    • Installing xcdat

    • Import xcdat

    • Open a dataset and apply postprocessing operations

  • Scenario 1 - Calculate the spatial averages over the tropical region

  • Scenario 2 - Calculate the annual anomalies

  • Scenario 3 - Horizontal regridding (bilinear, gaussian grid)

Installing xcdat#

xCDAT is available on Anaconda under the conda-forge channel (https://anaconda.org/conda-forge/xcdat)

Two ways to install xcdat with recommended dependencies (xesmf):

  1. Create a conda environment from scratch (conda create)

    conda create -n <ENV_NAME> -c conda-forge xcdat xesmf
    conda activate <ENV_NAME>
    
  2. Install xcdat in an existing conda environment (conda install)

    conda activate <ENV_NAME>
    conda install -c conda-forge xcdat xesmf
    
  3. If you’re working on HPC, we have a guide for that too!

Source: https://xcdat.readthedocs.io/en/latest/getting-started.html

Opening a dataset#

Example dataset: tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc (same as before)

[5]:
# This gives access to all xcdat public top-level APIs and accessor classes.
import xcdat as xc

# We import these packages specifically for plotting. It is not required to use xcdat.
import matplotlib.pyplot as plt
import pandas as pd

filepath = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

ds = xc.open_dataset(
    filepath,
    add_bounds=True,
    decode_times=True,
    center_times=True
)

# Unit adjustment from Kelvin to Celcius.
ds["tas"] = ds.tas - 273.15
[6]:
ds
[6]:
<xarray.Dataset>
Dimensions:    (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
  * time       (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
  * lat        (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
  * lon        (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
    height     float64 2.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object ...
    lat_bnds   (lat, bnds) float64 ...
    lon_bnds   (lon, bnds) float64 ...
    tas        (time, lat, lon) float32 -27.19 -27.19 -27.19 ... -25.29 -25.29
Attributes: (12/48)
    Conventions:                     CF-1.7 CMIP-6.2
    activity_id:                     CMIP
    branch_method:                   standard
    branch_time_in_child:            0.0
    branch_time_in_parent:           87658.0
    creation_date:                   2020-06-05T04:06:11Z
    ...                              ...
    variant_label:                   r10i1p1f1
    version:                         v20200605
    license:                         CMIP6 model data produced by CSIRO is li...
    cmor_version:                    3.4.0
    tracking_id:                     hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
    DODS_EXTRA.Unlimited_Dimension:  time

Scenario 1: Spatial Averaging#

Related accessor: ds.spatial

In this example, we calculate the spatial average of tas over the tropical region and plot the first 100 time steps.

[7]:
ds_trop_avg = ds.spatial.average("tas", axis=["X","Y"], lat_bounds=(-25,25))
ds_trop_avg.tas
[7]:
<xarray.DataArray 'tas' (time: 1980)>
array([25.24722608, 25.61795924, 25.96516235, ..., 26.79536823,
       26.67771602, 26.27182383])
Coordinates:
  * time     (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
    height   float64 2.0
[8]:
ds_trop_avg.tas.isel(time=slice(1, 100)).plot()
[8]:
[<matplotlib.lines.Line2D at 0x14bdf2170>]
../_images/examples_introduction-to-xcdat_34_1.png

Scenario 2: Calculate temporal average#

Related accessor: ds.temporal

In this example, we calculate the temporal average of ``tas`` as a single snapshot (time dimension is collapsed).

[9]:
ds_avg = ds.temporal.average("tas", weighted=True)
ds_avg.tas
[9]:
<xarray.DataArray 'tas' (lat: 145, lon: 192)>
array([[-48.01481628, -48.01481628, -48.01481628, ..., -48.01481628,
        -48.01481628, -48.01481628],
       [-44.94085363, -44.97948214, -45.01815398, ..., -44.82408252,
        -44.86273067, -44.9009281 ],
       [-44.11875274, -44.23060624, -44.33960158, ..., -43.76766492,
        -43.88593717, -44.00303006],
       ...,
       [-18.21076615, -18.17513373, -18.13957458, ..., -18.32720478,
        -18.28428828, -18.2486193 ],
       [-18.50778243, -18.49301854, -18.47902819, ..., -18.55410851,
        -18.5406963 , -18.52413098],
       [-19.07366375, -19.07366375, -19.07366375, ..., -19.07366375,
        -19.07366375, -19.07366375]])
Coordinates:
  * lat      (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
  * lon      (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
    height   float64 2.0
Attributes:
    operation:  temporal_avg
    mode:       average
    freq:       month
    weighted:   True
[10]:
ds_avg.tas.plot(label="weighted")
[10]:
<matplotlib.collections.QuadMesh at 0x10cc641f0>
../_images/examples_introduction-to-xcdat_37_1.png

Scenario 3: Horizontal Regridding#

Related accessor: ds.regridder

In this example, we will generate a gaussian grid with 32 latitudes to regrid our input data to.

Create the output grid#

[11]:
output_grid = xc.create_gaussian_grid(32)
output_grid
[11]:
<xarray.Dataset>
Dimensions:   (lat: 32, bnds: 2, lon: 65)
Coordinates:
  * lat       (lat) float64 85.76 80.27 74.74 69.21 ... -74.74 -80.27 -85.76
  * lon       (lon) float64 0.0 5.625 11.25 16.88 ... 343.1 348.8 354.4 360.0
Dimensions without coordinates: bnds
Data variables:
    lat_bnds  (lat, bnds) float64 90.0 83.21 83.21 77.61 ... -83.21 -83.21 -90.0
    lon_bnds  (lon, bnds) float64 -2.812 2.812 2.812 8.438 ... 357.2 357.2 362.8

Plot the Input vs. Output Grid#

[12]:
fig, axes = plt.subplots(ncols=2, figsize=(16, 6))

input_grid = ds.regridder.grid
input_grid.plot.scatter(x='lon', y='lat', s=5, ax=axes[0], add_colorbar=False, cmap=plt.cm.RdBu)
axes[0].set_title('Input Grid')

output_grid.plot.scatter(x='lon', y='lat', s=5, ax=axes[1], add_colorbar=False, cmap=plt.cm.RdBu)
axes[1].set_title('Output Grid')

plt.tight_layout()
../_images/examples_introduction-to-xcdat_42_0.png

Regrid the data#

xCDAT offers horizontal regridding with xESMF (default) and a Python port of regrid2. We will be using xESMF to regrid.

[13]:
# xesmf supports "bilinear", "conservative", "nearest_s2d", "nearest_d2s", and "patch"
output = ds.regridder.horizontal('tas', output_grid, tool='xesmf', method='bilinear')
[14]:
fig, axes = plt.subplots(ncols=2, figsize=(16, 4))

ds.tas.isel(time=0).plot(ax=axes[0])
axes[0].set_title('Input data')

output.tas.isel(time=0).plot(ax=axes[1])
axes[1].set_title('Output data')

plt.tight_layout()
../_images/examples_introduction-to-xcdat_45_0.png

Parallelism with Dask#

Dask logo

Nearly all existing xarray methods have been extended to work automatically with Dask arrays for parallelism

https://docs.xarray.dev/en/stable/user-guide/dask.html#using-dask-with-xarray

  • Parallelized xarray methods include indexing, computation, concatenating and grouped operations

  • xCDAT APIs that build upon xarray methods inherently support Dask parallelism

    • Dask arrays are loaded into memory only when absolutely required (e.g., decoding time, handling bounds)

High-level Overview of Dask Mechanics#

  • Dask divides arrays into many small pieces, called “chunks” (each presumed to be small enough to fit into memory)

  • Dask arrays operations are lazy

    • Operations queue up a series of tasks mapped over blocks

    • No computation is performed until values need to be computed (lazy)

    • Data is loaded into memory and computation is performed in streaming fashion, block-by-block

  • Computation is controlled by multi-processing or thread pool

Source: https://docs.xarray.dev/en/stable/user-guide/dask.html

How do I activate Dask with Xarray/xCDAT?#

  • The usual way to create a Dataset filled with Dask arrays is to load the data from a netCDF file or files

  • You can do this by supplying a chunks argument to open_dataset() or using the open_mfdataset() function

    • By default, open_mfdataset() will chunk each netCDF file into a single Dask array

    • Supply the chunks argument to control the size of the resulting Dask arrays

    • Xarray maintains a Dask array until it is not possible (raises an exception instead of loading into memory)

Source: https://docs.xarray.dev/en/stable/user-guide/dask.html#reading-and-writing-data

[15]:
filepath = "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

# Use .chunk() to activate Dask arrays
# NOTE: `open_mfdataset()` automatically chunks by the number of files, which
# might not be optimal.
ds = xc.open_dataset(
    filepath,
    chunks={"time": "auto"}
)
[16]:
ds
[16]:
<xarray.Dataset>
Dimensions:    (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
  * time       (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
  * lat        (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
  * lon        (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
    height     float64 ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
    lat_bnds   (lat, bnds) float64 dask.array<chunksize=(145, 2), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
    tas        (time, lat, lon) float32 dask.array<chunksize=(1205, 145, 192), meta=np.ndarray>
Attributes: (12/49)
    Conventions:                     CF-1.7 CMIP-6.2
    activity_id:                     CMIP
    branch_method:                   standard
    branch_time_in_child:            0.0
    branch_time_in_parent:           87658.0
    creation_date:                   2020-06-05T04:06:11Z
    ...                              ...
    version:                         v20200605
    license:                         CMIP6 model data produced by CSIRO is li...
    cmor_version:                    3.4.0
    _NCProperties:                   version=2,netcdf=4.6.2,hdf5=1.10.5
    tracking_id:                     hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
    DODS_EXTRA.Unlimited_Dimension:  time

Example of Parallelism in xCDAT’s Spatial Averager#

This is a demonstration that chunked Dataset objects work with xCDAT APIs.

  • The generation of weights is serial

  • The weighted average operation should be parallelized (uses xarray’s .weighted() API)

  • We intend on doing performance metrics and giving guidance on when to chunk

[17]:
tas_global = ds.spatial.average("tas", axis=["X", "Y"], weights="generate")["tas"]
tas_global
[17]:
<xarray.DataArray 'tas' (time: 1980)>
dask.array<truediv, shape=(1980,), dtype=float64, chunksize=(1205,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
    height   float64 ...
Attributes:
    standard_name:  air_temperature
    long_name:      Near-Surface Air Temperature
    comment:        near-surface (usually, 2 meter) air temperature
    units:          K
    cell_methods:   area: time: mean
    cell_measures:  area: areacella
    history:        2020-06-05T04:06:10Z altered by CMOR: Treated scalar dime...
    _ChunkSizes:    [  1 145 192]

Further Dask Guidance#

Visit these pages for more guidance (e.g., when to parallelize):

Key Takeaways#

  • A driving need for a modern successor to CDAT

  • Serves the climate community in the long-term

  • xCDAT is an extension of xarray for climate data analysis on structured grids

  • Goal of providing features and utilities for simple and robust analysis of climate data

xcdat logo

Get Involved on GitHub!#

GitHub logo