Frequently Asked Questions#
Metadata Interpretation#
What types of datasets does xcdat primarily focus on?#
xcdat supports datasets with structured grids that follow the CF Metadata Convention.
What structured grids are supported by xcdat?#
xCDAT aims to be a generalizable package that is compatible with structured grids that are CF-compliant (e.g., CMIP6). xCDAT’s spatial averager currently supports rectilinear grids, and the horizontal regridder supports curvilinear and rectilinear grids.
How does xcdat interpret dataset metadata?#
xcdat leverages cf_xarray to interpret CF attributes on xarray objects.
xcdat methods and functions usually accept an axis argument (e.g.,
ds.spatial.average("ts", axis=["X", "Y"])). This argument is internally mapped to cf_xarray
mapping tables that interpret the CF attributes.
xCDAT also includes its own “fall-back” mapping table that maps axes to their commonly accepted names. Refer to this section of code for the mapping table.
What CF attributes are interpreted using cf_xarray mapping tables?#
Axis names – used to map to dimension coordinates
For example, any xr.DataArray that has axis: "X" in its attrs will be identified as the “X” axis (typically longitude) coordinate variable by cf_xarray.
Refer to the cf_xarray Axis Names table for more information.
Coordinate names – used to map to dimension coordinates
For example, any xr.DataArray that has standard_name: "latitude" or _CoordinateAxisType: "Lat" or units: "degrees_north" in its attrs will be identified as the “latitude” coordinate variable by cf_xarray.
Refer to the cf_xarray Coordinate Names table for more information.
Bounds attribute – used to map to bounds data variables
For example, the latitude coordinate variable has bounds: "lat_bnds", which maps its bounds to the lat_bnds data variable.
Refer to the cf_xarray Bounds Variables page for more information.
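The lookup order described above (CF attributes first, then a fall-back table of common names) can be sketched in plain Python. This is only an illustration of the idea: `resolve_axis` and the four tables below are hypothetical names with simplified contents, not the actual cf_xarray or xCDAT mapping tables.

```python
# Hypothetical, simplified lookup tables (NOT the real cf_xarray/xCDAT tables).
STANDARD_NAMES = {"longitude": "X", "latitude": "Y", "time": "T"}
AXIS_TYPES = {"Lon": "X", "Lat": "Y", "Time": "T"}
UNITS = {"degrees_east": "X", "degrees_north": "Y"}
FALLBACK_NAMES = {"X": ["longitude", "lon"], "Y": ["latitude", "lat"], "T": ["time"]}


def resolve_axis(name, attrs):
    """Return "X", "Y", "Z", "T", or None for a coordinate variable."""
    # 1. An explicit CF "axis" attribute is checked first.
    axis = attrs.get("axis")
    if axis in ("X", "Y", "Z", "T"):
        return axis

    # 2. Coordinate-name attributes are checked next.
    for key, table in (
        ("standard_name", STANDARD_NAMES),
        ("_CoordinateAxisType", AXIS_TYPES),
        ("units", UNITS),
    ):
        value = attrs.get(key)
        if value in table:
            return table[value]

    # 3. Fall back to commonly accepted variable names.
    for axis, names in FALLBACK_NAMES.items():
        if name in names:
            return axis
    return None


print(resolve_axis("x", {"axis": "X"}))                     # X
print(resolve_axis("nav_lat", {"units": "degrees_north"}))  # Y
print(resolve_axis("lat", {}))                              # Y
print(resolve_axis("foo", {}))                              # None
```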
Handling Bounds#
How are bounds generated in xCDAT?#
For the X, Y, and Z axes, xCDAT generates bounds by using coordinate points as the midpoint between their lower and upper bounds.
For the T axis, xCDAT can generate bounds either by 1) time frequency (default method) or 2) midpoints.
time frequency: create time bounds as the start and end of each timestep’s period using either the inferred or specified time frequency.
midpoint: create time bounds using time coordinates as the midpoint between their upper and lower bounds.
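The two approaches can be illustrated with a small sketch. It assumes NumPy is available and uses two hypothetical helpers, `midpoint_bounds` and `monthly_bounds`, that mimic the ideas above. It is not xCDAT's implementation, and the time helper handles only a monthly frequency on the standard calendar.

```python
from datetime import datetime

import numpy as np


def midpoint_bounds(points):
    """Bounds for 1-D coordinates, treating each point as a cell midpoint."""
    points = np.asarray(points, dtype=float)
    if points.size < 2:
        raise ValueError("need at least two points to infer cell widths")

    half_widths = np.diff(points) / 2.0
    # Interior edges sit halfway between neighbors; the two outer edges are
    # extrapolated using the nearest spacing.
    edges = np.concatenate(
        (
            [points[0] - half_widths[0]],
            points[:-1] + half_widths,
            [points[-1] + half_widths[-1]],
        )
    )
    return np.column_stack([edges[:-1], edges[1:]])


def monthly_bounds(time_point):
    """Start and end of the month that contains `time_point`."""
    start = datetime(time_point.year, time_point.month, 1)
    if time_point.month == 12:
        end = datetime(time_point.year + 1, 1, 1)
    else:
        end = datetime(time_point.year, time_point.month + 1, 1)
    return start, end


lat_bnds = midpoint_bounds([-60.0, -30.0, 0.0, 30.0, 60.0])
time_bnds = monthly_bounds(datetime(2000, 1, 16, 12))
print(lat_bnds[0])  # first cell: [-75. -45.]
print(time_bnds)    # 2000-01-01 to 2000-02-01
```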
For more information, visit the documentation for these APIs:
Does xCDAT support generating bounds for multiple axis coordinate systems in the same dataset?#
For example, there are two sets of coordinates called “lat” and “latitude” in the dataset.
Yes, xCDAT can generate bounds for axis coordinates if they are “dimension coordinates” and have the required CF metadata. Dimension coordinates are also considered “index” coordinates in Xarray and coordinate variables in CF terminology. “Non-dimension coordinates” (auxiliary coordinate variables in CF terminology) are ignored because they aren’t used as indexes and aren’t mapped to the axes (dimensions) of variables.
Visit Xarray’s documentation page on Coordinates for more info on “dimension coordinates” vs. “non-dimension coordinates”.
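The distinction can be checked on a small in-memory dataset. The sketch below assumes xarray and NumPy are installed; the variable names (`lat`, `lon`, `area`) are made up for illustration.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    coords={
        # Dimension coordinates: 1-D and named after their own dimension.
        "lat": ("lat", [-45.0, 45.0], {"axis": "Y", "standard_name": "latitude"}),
        "lon": ("lon", [0.0, 180.0], {"axis": "X", "standard_name": "longitude"}),
        # Non-dimension (auxiliary) coordinate: not used as an index.
        "area": (("lat", "lon"), np.ones((2, 2))),
    }
)

# Dimension coordinates are the ones backed by an index.
dim_coords = sorted(name for name in ds.coords if name in ds.indexes)
non_dim_coords = sorted(name for name in ds.coords if name not in ds.indexes)

print(dim_coords)      # ['lat', 'lon']
print(non_dim_coords)  # ['area']
```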
Temporal Metadata#
What type of time units are supported?#
The units attribute must be in the CF compliant format
"<units> since <reference_date>". For example, "days since 1990-01-01".
Supported CF compliant units include day, hour, minute, and second,
which are inherited from xarray and cftime. Supported non-CF compliant units
include year and month, which xcdat is able to parse. Note that the plural forms
of these units are also accepted.
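As a rough illustration of the "<units> since <reference_date>" format, the sketch below splits a units string and classifies the unit according to the lists above. `parse_time_units` is a hypothetical helper written for this example; it is not the parser used by xcdat, xarray, or cftime.

```python
CF_UNITS = {"day", "hour", "minute", "second"}
NON_CF_UNITS = {"year", "month"}


def parse_time_units(units_attr):
    """Split "<units> since <reference_date>" into its parts.

    Returns (unit, reference_date, is_cf_unit).
    """
    units, sep, reference_date = units_attr.partition(" since ")
    reference_date = reference_date.strip()
    if not sep or not reference_date:
        raise ValueError(f"expected '<units> since <reference_date>', got {units_attr!r}")

    unit = units.strip().lower()
    # Accept the plural form, e.g. "days" -> "day".
    if unit.endswith("s"):
        unit = unit[:-1]
    if unit not in CF_UNITS | NON_CF_UNITS:
        raise ValueError(f"unsupported time unit: {units.strip()!r}")

    return unit, reference_date, unit in CF_UNITS


print(parse_time_units("days since 1990-01-01"))    # ('day', '1990-01-01', True)
print(parse_time_units("months since 2000-01-01"))  # ('month', '2000-01-01', False)
```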
What type of calendars are supported?#
xcdat supports the same CF Metadata Convention calendars as xarray (based on
the cftime and netCDF4-python packages).
Supported calendars include:
'standard'
'gregorian'
'proleptic_gregorian'
'noleap'
'365_day'
'360_day'
'julian'
'all_leap'
'366_day'
Why does xcdat decode time coordinates as cftime objects instead of datetime64?#
xcdat is designed to work reliably with climate and Earth system model datasets,
which commonly use CF calendars such as noleap, 365_day, and 360_day. These
calendars cannot be represented using NumPy or Pandas datetimes, so cftime is
required to correctly decode and interpret time coordinates.
While recent versions of pandas and xarray support non-nanosecond datetime resolutions,
native datetime types still have limitations for climate data. In particular,
datetime64[ns] is restricted to a limited date range, and non-Gregorian calendars
continue to require cftime. As a result, xarray may decode time coordinates as
either native datetimes or cftime objects depending on the dataset.
To provide consistent, predictable behavior, xcdat always decodes time coordinates
as cftime objects. This ensures full CF calendar support, avoids conditional type
switching, and allows xcdat to apply optimized lazy decoding for improved I/O
performance.
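Both limitations can be checked with a few lines of standard-library Python. The arithmetic relies on the fact that datetime64[ns] stores a signed 64-bit count of nanoseconds since 1970-01-01; the year length of 365.25 days is an approximation used only for this estimate.

```python
from datetime import datetime

# A signed 64-bit nanosecond counter covers about 2**63 ns on each side of 1970.
NS_PER_YEAR = 1_000_000_000 * 86_400 * 365.25
span_years = 2**63 / NS_PER_YEAR
print(f"about {span_years:.0f} years either side of 1970")
print(f"roughly {1970 - span_years:.0f} to {1970 + span_years:.0f}")

# A 360_day calendar has 30 days in every month, so "February 30" is a valid
# date there. The standard-library datetime type cannot represent it.
try:
    datetime(2000, 2, 30)
    feb30_is_valid = True
except ValueError:
    feb30_is_valid = False
print("datetime(2000, 2, 30) is valid:", feb30_is_valid)
```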
xCDAT Does Not Support Model-Specific Data Wrangling#
xcdat aims to implement generalized functionality. This means that data wrangling
functionality to handle model-specific data quality issues is out of scope.
If data quality issues are present, xarray and xcdat might not be able to open
the datasets. For example, there might be cases where conflicting floating point values
exist between files of a multi-file dataset, or the dataset contains non-CF compliant
attributes that cannot be interpreted correctly by xCDAT.
A few workarounds include:
Configuring open_dataset() or open_mfdataset() keyword arguments based on your needs.
Writing a custom preprocess() function to feed into open_mfdataset(). This function preprocesses each dataset file individually before joining them into a single Dataset object.
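As an example of the second workaround, a preprocess function might round bounds values so that tiny floating point differences between files disappear before the files are combined. This is a minimal sketch, assuming xarray and NumPy are installed: `round_bounds` and `open_with_preprocess` are hypothetical helpers, and the bounds names and number of decimals are assumptions to adapt to your data.

```python
import numpy as np
import xarray as xr


def round_bounds(ds, names=("lat_bnds", "lon_bnds"), decimals=6):
    """Round the listed bounds variables, skipping any that are absent."""
    rounded = {name: ds[name].round(decimals) for name in names if name in ds}
    return ds.assign(rounded)


def open_with_preprocess(paths):
    """Open a multi-file dataset, applying round_bounds to each file."""
    import xcdat  # imported here so round_bounds can be tried without xcdat

    return xcdat.open_mfdataset(paths, preprocess=round_bounds)


# Two in-memory "files" whose bounds differ only by floating point noise.
a = xr.Dataset({"lat_bnds": (("lat", "bnds"), np.array([[0.0, 1.0]]))})
b = xr.Dataset({"lat_bnds": (("lat", "bnds"), np.array([[0.0, 1.0 + 1e-9]]))})

same_before = np.array_equal(a["lat_bnds"].values, b["lat_bnds"].values)
same_after = np.array_equal(
    round_bounds(a)["lat_bnds"].values, round_bounds(b)["lat_bnds"].values
)
print(same_before, same_after)
```

Rounding changes the stored values, so it is only appropriate when the differences are known to be numerical noise.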
How do I open a multi-file dataset with bounds values that conflict?#
In xarray, the default setting for checking compatibility across a multi-file dataset
is compat='no_conflicts'. In cases where variable values conflict between files,
xarray raises MergeError: conflicting values for variable <VARIABLE NAME> on objects
to be combined. You can skip this check by specifying compat="override".
If you still intend on working with these datasets and recognize the source of the issue (e.g., minor floating point diffs), follow the workarounds below. Please proceed with caution. You should understand the potential implications of these workarounds.
Pick the first bounds variable and keep dimensions the same as the input files
This option is recommended if you know bounds values should be the same across all files, but one or more files have inconsistent bounds values, which breaks the concatenation of files into a single xr.Dataset object.
>>> ds = xcdat.open_mfdataset(
...     "path/to/files/*.nc",
...     compat="override",
...     data_vars="minimal",
...     coords="minimal",
...     join="override",
... )
compat="override": skip comparing and pick the variable from the first dataset.
  xarray defaults to compat="no_conflicts".
data_vars="minimal": only data variables in which the dimension already appears are included.
  xcdat defaults to data_vars="minimal".
  xarray defaults to data_vars="all".
coords="minimal": only coordinates in which the dimension already appears are included.
  xarray defaults to coords="different".
join="override": if indexes are of the same size, rewrite indexes to be those of the first object with that dimension. Indexes for the same dimension must have the same size in all objects.
  Alternatively, join="left": use indexes from the first object with each dimension.
  xarray defaults to join="outer". This can cause issues where data variable values conflict, because additional coordinate points are concatenated at the point of conflict, which can produce nan values.
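The difference between join="outer" and join="override" can be seen on two small in-memory arrays whose latitude coordinates differ only by floating point noise. The sketch below assumes xarray is installed and uses xr.align directly rather than opening files; the arrays and their values are made up for illustration.

```python
import xarray as xr

a = xr.DataArray([1.0, 2.0], dims="lat", coords={"lat": [0.0, 1.0]})
b = xr.DataArray([3.0, 4.0], dims="lat", coords={"lat": [0.0, 1.0 + 1e-7]})

# join="outer": the union of both indexes is used, so the nearly identical
# latitude becomes an extra point and missing values are filled with nan.
a_outer, b_outer = xr.align(a, b, join="outer")
print(a_outer.sizes["lat"])          # 3
print(int(a_outer.isnull().sum()))   # 1

# join="override": indexes have the same size, so the second object simply
# takes the index of the first and no nan values are introduced.
a_over, b_over = xr.align(a, b, join="override")
print(b_over["lat"].values.tolist())  # [0.0, 1.0]
print(int(b_over.isnull().sum()))     # 0
```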
Drop the conflicting bounds variable(s)
This option is recommended if you don’t mind dropping the bounds variable(s). xcdat will generate and replace the dropped bounds if add_bounds includes the axis for the dropped variable (by default,
add_bounds=["X", "Y"]).
>>> # Drop single variable
>>> xcdat.open_mfdataset("path/to/files/*.nc", drop_variables="lon_bnds")
>>> # Drop multiple variables
>>> xcdat.open_mfdataset("path/to/files/*.nc", drop_variables=["lon_bnds", "lat_bnds"])
For more information on these options, visit the xarray.open_mfdataset documentation.
Regridding#
xcdat extends and provides a uniform interface to xESMF and xgcm. In addition,
xcdat provides a port of the CDAT regrid2 package.
Structured rectilinear and curvilinear grids are supported.
How can I retrieve the grid from a dataset?#
The xcdat.regridder.accessor.RegridderAccessor.grid() property is provided to
extract the grid information from a dataset.
ds = xcdat.open_dataset(...)
grid = ds.regridder.grid
How do I perform horizontal regridding?#
The xcdat.regridder.accessor.RegridderAccessor.horizontal() method provides
access to the xESMF and Regrid2 packages.
The arguments for each regridder can be found:
An example of horizontal regridding can be found in the gallery.
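A minimal sketch of a horizontal regridding call is shown below. It assumes xcdat and xESMF are installed, that the dataset contains a variable named "ts" (a placeholder), and that a 32-latitude Gaussian grid is a suitable target. `regrid_to_gaussian` is a hypothetical wrapper written for this example; check the API documentation for the options each tool accepts.

```python
def regrid_to_gaussian(ds, data_var="ts", nlats=32):
    """Regrid `data_var` onto a Gaussian grid using the xESMF tool."""
    import xcdat  # importing xcdat registers the `regridder` accessor

    # Target grid: an xr.Dataset describing the output lat/lon coordinates.
    output_grid = xcdat.create_gaussian_grid(nlats)

    # "bilinear" is one of the xESMF regridding methods.
    return ds.regridder.horizontal(
        data_var, output_grid, tool="xesmf", method="bilinear"
    )
```

A dataset opened with xcdat.open_dataset() could then be passed in as regrid_to_gaussian(ds).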
How do I perform vertical regridding?#
The xcdat.regridder.accessor.RegridderAccessor.vertical() method provides
access to the xgcm package.
The arguments for each regridder can be found:
An example of vertical regridding can be found in the gallery.
Can xcdat automatically derive Parametric Vertical Coordinates in a dataset?#
Automatically deriving Parametric Vertical Coordinates is a planned feature for xcdat.
Can I regrid data on unstructured grids?#
Regridding data on unstructured grids is a feature we are exploring for xcdat.