LLNL Climate and Weather Seminar Series (01/25/2023) - A Gentle Introduction to xCDAT

LLNL Climate and Weather Seminar Series (01/25/2023) - A Gentle Introduction to xCDAT#

“A Python package for simple and robust climate data analysis.”

Core Developers: Tom Vo, Stephen Po-Chedley, Jason Boutte, Jill Zhang, Jiwoo Lee

With thanks to Peter Gleckler, Paul Durack, Karl Taylor, and Chris Golaz

Updated: 03/18/25 [v0.8.0]

This work is performed under the auspices of the U.S. DOE by Lawrence Livermore National Laboratory under contract No. DE-AC52-07NA27344.

Presentation Overview#

Intended audience: Some or no familiarity with xarray and/or xcdat

Driving force behind xCDAT
Goals and milestones of CDAT’s successor
Introducing xCDAT
Understanding the basics of Xarray
How xCDAT extends Xarray for climate data analysis
Technical design philosophy and APIs
Demo of capabilities
How to get involved

Notebook Kernel Setup#

Users can install their own instance of xcdat and follow these examples in their own environment (e.g., with VS Code, Jupyter, Spyder, IPython) or enable xcdat in an existing JupyterHub instance.

First, create the conda environment:

conda create -n xcdat_notebook -c conda-forge xcdat xesmf matplotlib ipython ipykernel cartopy nc-time-axis gsw-xarray jupyter pooch

Then install the kernel from the xcdat_notebook environment using ipykernel and name the kernel with the display name (e.g., xcdat_notebook):

python -m ipykernel install --user --name xcdat_notebook --display-name xcdat_notebook

Then select the xcdat_notebook kernel in Jupyter to run the notebook with this environment.
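
Once the kernel is selected, a quick sanity check can confirm that the packages from the conda command are importable. The following is a minimal sketch using only the standard library; the helper name `check_environment` and the list of import names are assumptions for illustration (for example, the `nc-time-axis` conda package is imported as `nc_time_axis`).

```python
from importlib.util import find_spec

# Import names for some of the packages installed above (assumed mapping
# from conda package names to Python module names).
PACKAGES = ["xcdat", "xesmf", "matplotlib", "cartopy", "nc_time_axis", "pooch"]


def check_environment(packages):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: find_spec(name) is not None for name in packages}


status = check_environment(PACKAGES)
for name, available in status.items():
    print(f"{name:<14}{'ok' if available else 'MISSING'}")
```

If any entry is reported as missing, the notebook is most likely running on a different kernel than xcdat_notebook.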

The Driving Force Behind xCDAT#

The CDAT (Community Data Analysis Tools) library has provided a suite of robust and comprehensive open-source climate data analysis and visualization packages for over 20 years
A driving need for a modern successor
- Focus on a maintainable and extensible library
- Serve the needs of the climate community in the long-term

Goals and Milestones for CDAT’s Successor#

Offer similar core capabilities
1. For example, geospatial averaging, temporal averaging, and regridding
Use modern technologies in the library’s stack
1. Support parallelism and lazy operations
Be maintainable, extensible, and easy-to-use
1. Follow Python Enhancement Proposals (PEPs)
2. Automate DevOps processes (unit testing, code coverage)
3. Actively maintain documentation
Cultivate an open-source community that can sustain the project
1. Encourage GitHub contributions
2. Community engagement efforts (e.g., Pangeo, ESGF)

Introducing xCDAT#

xCDAT is an extension of xarray for climate data analysis on structured grids
Goal of providing features and utilities for simple and robust analysis of climate data
Jointly developed by scientists and developers from:
- E3SM Project (Energy Exascale Earth System Model Project)
- PCMDI (Program for Climate Model Diagnosis and Intercomparison)
- SEATS Project (Simplifying ESM Analysis Through Standards Project)
- Users around the world via GitHub

Before We Dive Deeper, Let’s Talk About Xarray#

Xarray is an evolution of an internal tool developed at The Climate Corporation
Released as open source in May 2014
NumFocus fiscally sponsored project since August 2018

Key Features and Capabilities in Xarray#

“N-D labeled arrays and datasets in Python”
- Built upon and extends NumPy and pandas
Interoperable with the scientific Python ecosystem, including NumPy, Dask, pandas, and Matplotlib
Supports file I/O, indexing and selecting, interpolating, grouping, aggregating, parallelism (Dask), plotting (matplotlib wrapper)
- Supported formats include: netCDF, Iris, OPeNDAP, Zarr, and GRIB

Source: https://xarray.dev/#features

Why use Xarray?#

“Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience.”

—https://xarray.pydata.org/en/v2022.10.0/getting-started-guide/why-xarray.html

Apply operations over dimensions by name
- x.sum('time')
Select values by label (or logical location) instead of integer location
- x.loc['2014-01-01'] or x.sel(time='2014-01-01')
Mathematical operations vectorize across multiple dimensions (array broadcasting) based on dimension names, not shape
- x - y
Easily use the split-apply-combine paradigm with groupby
- x.groupby('time.dayofyear').mean()
Database-like alignment based on coordinate labels that smoothly handles missing values
- x, y = xr.align(x, y, join='outer')
Keep track of arbitrary metadata in the form of a Python dictionary
- x.attrs

Source: https://docs.xarray.dev/en/v2022.10.0/getting-started-guide/why-xarray.html#what-labels-enable
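
The operations listed above can be tried on a small in-memory array. This is a minimal sketch on hypothetical toy data (two years of daily values at three made-up sites); the variable names `x` and `y` follow the list, everything else is an assumption for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: daily values for 2014-2015 at three sites.
time = pd.date_range("2014-01-01", "2015-12-31", freq="D")
sites = ["a", "b", "c"]
x = xr.DataArray(
    np.ones((time.size, 3)),
    dims=("time", "site"),
    coords={"time": time, "site": sites},
    attrs={"units": "K"},
)
y = xr.DataArray([1.0, 2.0, 3.0], dims="site", coords={"site": sites})

total = x.sum("time")                          # reduce over a dimension by name
first_day = x.sel(time="2014-01-01")           # select by label, not position
diff = x - y                                   # broadcast by dimension name
doy_mean = x.groupby("time.dayofyear").mean()  # split-apply-combine

# Outer alignment on labels: site "c" is missing from the second array,
# so it is filled with NaN after alignment.
x2, y2 = xr.align(x, y.isel(site=slice(0, 2)), join="outer")

units = x.attrs["units"]                       # metadata is a plain dict
```

Note that `x - y` works even though the arrays have different shapes, because xarray matches them on the shared `site` dimension rather than on axis order.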

Resources for Learning Xarray#

Here are some highly recommended resources:

xCDAT Extends Xarray for Climate Data Analysis#

Some key xCDAT features are inspired by or ported from the core CDAT library
- e.g., spatial averaging, temporal averaging, regrid2 for horizontal regridding
Other features leverage powerful libraries in the xarray ecosystem
- xESMF for horizontal regridding
- xgcm for vertical interpolation
- CF-xarray for CF convention metadata interpretation
xCDAT strives to support CF-compliant datasets as well as datasets with common non-CF-compliant metadata (e.g., time units in “months since …” or “years since …”)
Inherent support for lazy operations and parallelism through xarray + dask

The Technical Design Philosophy#

Streamline the user experience of developing code to analyze climate data
Reduce the complexity and overhead for implementing certain features with xarray (e.g., temporal averaging, spatial averaging)
Encourage reusable functionalities through a single library
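
To illustrate the kind of overhead meant here, the sketch below computes a weighted annual mean from monthly data using plain xarray only. It is not xCDAT's implementation: the toy data are hypothetical, and the weights are assumed to be the number of days in each month, which the user has to build and apply by hand.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series for 2001: the value equals the month number.
time = pd.date_range("2001-01-01", periods=12, freq="MS")
tas = xr.DataArray(
    np.arange(1.0, 13.0),
    dims="time",
    coords={"time": time},
    name="tas",
)

# Step 1: build the weights manually (assumption: weight by days in month).
weights = tas.time.dt.days_in_month

# Step 2: apply them with xarray's weighted reduction.
weighted_mean = float(tas.weighted(weights).mean("time"))

# For comparison, the naive mean treats every month as equally long.
unweighted_mean = float(tas.mean("time"))

print(f"weighted: {weighted_mean:.4f}, unweighted: {unweighted_mean:.4f}")
```

Each analysis script that needs this has to repeat the weight construction and get details such as calendars and month lengths right, which is the repetition a shared library is meant to remove.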

Leveraging the APIs#

xCDAT provides public APIs in two ways:

Top-level API functions
- e.g., xcdat.open_dataset(), xcdat.center_times()
- Usually for opening datasets and performing dataset level operations
Accessor classes
- xcdat provides Dataset accessors, which are implicit namespaces for custom functionality.
- Accessor namespaces clearly separate xcdat functionality from built-in xarray methods.
- Operate on variables within the xr.Dataset
- e.g., ds.spatial, ds.temporal, ds.regridder

Figure: xcdat accessor. xcdat spatial functionality is exposed by chaining the .spatial accessor attribute to the xr.Dataset object.

Source: https://xcdat.readthedocs.io/en/latest/api.html
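
The accessor mechanism itself comes from xarray. The following minimal sketch registers a hypothetical accessor named `demo` with `xr.register_dataset_accessor` to show how a namespace such as `ds.spatial` can be attached to every `xr.Dataset`; the accessor name, class, and method are made up for illustration and are not part of xcdat.

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("demo")  # hypothetical namespace, not an xcdat one
class DemoAccessor:
    """Custom functionality reachable as `ds.demo.<method>()`."""

    def __init__(self, dataset: xr.Dataset):
        # xarray passes in the Dataset the accessor is attached to.
        self._dataset = dataset

    def value_range(self, data_var: str) -> float:
        """Return max - min of one variable in the Dataset."""
        da = self._dataset[data_var]
        return float(da.max() - da.min())


ds = xr.Dataset({"tas": ("time", np.array([280.0, 282.5, 285.0]))})

# The method lives under the `demo` namespace, separate from built-in methods.
tas_range = ds.demo.value_range("tas")
print(tas_range)
```

Because the custom methods sit behind their own attribute, they cannot be confused with, or accidentally override, methods defined by xarray itself.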

Key Features in xCDAT#

Feature	API	Description
Extend `xr.open_dataset()` and `xr.open_mfdataset()`	`open_dataset()`, `open_mfdataset()`	Bounds generation; time decoding (CF and select non-CF time units); centering of time coordinates; conversion of longitudinal axis orientation
Temporal averaging	`ds.temporal.average()`, `ds.temporal.group_average()`, `ds.temporal.climatology()`, `ds.temporal.departures()`	Single snapshot and group average; climatology and departure; weighted or unweighted; optional seasonal configuration (e.g., custom seasons)
Geospatial averaging	`ds.spatial.average()`	Rectilinear grids; weighted; optional specification of region domain
Horizontal regridding	`ds.regridder.horizontal()`	Rectilinear and curvilinear grids; extends xESMF horizontal regridding; Python implementation of regrid2
Vertical regridding	`ds.regridder.vertical()`	Transform vertical coordinates; extends xgcm vertical interpolation; linear, logarithmic, and conservative interpolation; decode parametric vertical coordinates if required

Parallelism with Dask#

Nearly all existing xarray methods have been extended to work automatically with Dask arrays for parallelism

—https://docs.xarray.dev/en/stable/user-guide/dask.html#using-dask-with-xarray

Parallelized xarray methods include indexing, computation, concatenating and grouped operations
xCDAT APIs that build upon xarray methods inherently support Dask parallelism
- Dask arrays are loaded into memory only when absolutely required (e.g., generating weights for averaging)

import xcdat as xc

# Pass `chunks` when opening the dataset (or call `.chunk()` afterwards) so
# that the variables are backed by Dask arrays.
# NOTE: `open_mfdataset()` automatically chunks by the number of files, which
# might not be optimal.
ds = xc.tutorial.open_dataset("tas_amon_access", chunks={"time": "auto"})
ds

<xarray.Dataset> Size: 7MB
Dimensions:    (time: 60, bnds: 2, lat: 145, lon: 192)
Coordinates:
  * lat        (lat) float64 1kB -90.0 -88.75 -87.5 -86.25 ... 87.5 88.75 90.0
  * lon        (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 354.4 356.2 358.1
    height     float64 8B ...
  * time       (time) object 480B 1870-01-16 12:00:00 ... 1874-12-16 12:00:00
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object 960B dask.array<chunksize=(60, 2), meta=np.ndarray>
    lat_bnds   (lat, bnds) float64 2kB dask.array<chunksize=(145, 2), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 3kB dask.array<chunksize=(192, 2), meta=np.ndarray>
    tas        (time, lat, lon) float32 7MB dask.array<chunksize=(60, 145, 192), meta=np.ndarray>
Attributes: (12/48)
    Conventions:                     CF-1.7 CMIP-6.2
    activity_id:                     CMIP
    branch_method:                   standard
    branch_time_in_child:            0.0
    branch_time_in_parent:           87658.0
    creation_date:                   2020-06-05T04:06:11Z
    ...                              ...
    variant_label:                   r10i1p1f1
    version:                         v20200605
    license:                         CMIP6 model data produced by CSIRO is li...
    cmor_version:                    3.4.0
    tracking_id:                     hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
    DODS_EXTRA.Unlimited_Dimension:  time
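
The lazy behavior of Dask-backed arrays can be seen without any data files. The following minimal sketch assumes dask is installed alongside xarray and uses a hypothetical in-memory array: the reduction only builds a task graph until `.compute()` is called.

```python
import numpy as np
import xarray as xr

# Hypothetical in-memory data standing in for a (time, lat, lon) variable.
tas = xr.DataArray(
    np.arange(24 * 4 * 8, dtype="float64").reshape(24, 4, 8),
    dims=("time", "lat", "lon"),
    name="tas",
)

# Split the time axis into two chunks of 12 steps; data become a Dask array.
chunked = tas.chunk({"time": 12})

# This only records the computation; nothing is evaluated yet.
lazy_mean = chunked.mean("time")

# Trigger the computation and load the result into memory as NumPy.
result = lazy_mean.compute()

print(chunked.chunks)
print(type(lazy_mean.data).__name__, "->", type(result.data).__name__)
```

Deferring evaluation this way lets several operations be chained and then executed chunk by chunk, which is what allows datasets larger than memory to be processed.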

Further Dask Guidance#

Visit these pages for more guidance (e.g., when to parallelize):

Parallel computing with Dask (xCDAT): https://xcdat.readthedocs.io/en/latest/examples/parallel-computing-with-dask.html
Parallel computing with Dask (Xarray): https://docs.xarray.dev/en/stable/user-guide/dask.html
Xarray with Dask Arrays: https://examples.dask.org/xarray.html

Key Takeaways#

A driving need for a modern successor to CDAT
Serves the climate community in the long-term
xCDAT is an extension of xarray for climate data analysis on structured grids
Goal of providing features and utilities for simple and robust analysis of climate data

Where to Find xCDAT#

xCDAT is available for installation through Anaconda
- Install command: ``conda install -c conda-forge xcdat xesmf``
Check out xCDAT’s Read the Docs, which we strive to keep up-to-date
- https://xcdat.readthedocs.io/en/stable/

Get Involved on GitHub!#

Code contributions are welcome and appreciated
- GitHub Repository: xCDAT/xcdat
- Contributing Guide: https://xcdat.readthedocs.io/en/latest/contributing.html
Submit and/or address tickets for feature suggestions, bugs, and documentation updates
- GitHub Issues: xCDAT/xcdat#issues
Participate in forum discussions on version releases, architecture, feature suggestions, etc.
- GitHub Discussions: xCDAT/xcdat#discussions
