Frequently Asked Questions#
Metadata Interpretation#
What types of datasets does xcdat primarily focus on?#
xcdat supports datasets with structured grids that follow the CF convention. It also strives to support datasets with common non-CF compliant metadata (e.g., time units in “months since …” or “years since …”).
What structured grids does xcdat support?#
xCDAT aims to be a generalizable package that is compatible with structured grids that are CF-compliant (e.g., CMIP6). xCDAT’s horizontal regridder supports grids that are supported by Regrid2 and xESMF (curvilinear and rectilinear).
How does xcdat interpret dataset metadata?#
xcdat leverages cf_xarray to interpret CF attributes on xarray objects.
xcdat methods and functions usually accept an axis argument or operate on a specific axis (e.g., ds.temporal.average("ts") operates on the time axis). The axis is internally mapped to cf_xarray mapping tables that interpret the CF attributes.
What CF attributes are interpreted using cf_xarray mapping tables?#
Axis names – used to map to dimension coordinates
For example, any xr.DataArray that has axis: "X" in its attrs will be identified as the “X” axis coordinate variable (typically longitude) by cf_xarray.
Refer to the cf_xarray Axis Names table for more information.
Coordinate names – used to map to dimension coordinates
For example, any xr.DataArray that has standard_name: "latitude" or _CoordinateAxisType: "Lat" or units: "degrees_north" in its attrs will be identified as the “latitude” coordinate variable by cf_xarray.
Refer to the cf_xarray Coordinate Names table for more information.
Bounds attribute – used to map to bounds data variables
For example, the latitude coordinate variable has bounds: "lat_bnds", which maps its bounds to the lat_bnds data variable.
Refer to the cf_xarray Bounds Variables page for more information.
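The attribute-based lookup described above can be illustrated with a small sketch. It uses only xarray and a hypothetical helper, find_coords, which is a simplified stand-in and not cf_xarray's implementation; the real mapping tables cover more attributes and naming patterns than shown here.

```python
import numpy as np
import xarray as xr

# Toy coordinates carrying the CF attributes discussed above.
lat = xr.DataArray(
    np.array([-45.0, 0.0, 45.0]),
    dims="lat",
    attrs={"standard_name": "latitude", "units": "degrees_north", "bounds": "lat_bnds"},
)
lon = xr.DataArray(np.array([0.0, 120.0, 240.0]), dims="lon", attrs={"axis": "X"})
ds = xr.Dataset(coords={"lat": lat, "lon": lon})


def find_coords(ds, attr, value):
    """Hypothetical helper: names of coordinates whose attrs[attr] equals value."""
    return [name for name, coord in ds.coords.items() if coord.attrs.get(attr) == value]


x_coords = find_coords(ds, "axis", "X")                      # matched via the axis attribute
lat_coords = find_coords(ds, "standard_name", "latitude")   # matched via standard_name
bounds_name = ds["lat"].attrs["bounds"]                      # name of the bounds data variable
```

In practice, the same questions are answered through cf_xarray rather than by inspecting attrs by hand.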
Handling Bounds#
How are bounds generated in xCDAT?#
xCDAT generates bounds by using coordinate points as the midpoint between their lower and upper bounds.
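As a rough illustration of the midpoint idea, the sketch below builds bounds for a one-dimensional coordinate with NumPy. The helper name midpoint_bounds and its handling of the outer edges are assumptions made for this example, not xCDAT's actual code; each point is exactly the midpoint of its cell only when the points are evenly spaced.

```python
import numpy as np


def midpoint_bounds(points):
    """Hypothetical helper: return (n, 2) lower/upper bounds for n >= 2 points."""
    points = np.asarray(points, dtype=float)
    # Interior cell edges sit halfway between neighboring points.
    interior = (points[:-1] + points[1:]) / 2
    # Outer edges are extrapolated using the half-spacing of the nearest pair.
    first = points[0] - (interior[0] - points[0])
    last = points[-1] + (points[-1] - interior[-1])
    edges = np.concatenate([[first], interior, [last]])
    # Pair consecutive edges into [lower, upper] rows.
    return np.stack([edges[:-1], edges[1:]], axis=1)


lat_points = [-60.0, -30.0, 0.0, 30.0, 60.0]
lat_bnds = midpoint_bounds(lat_points)
```

For these evenly spaced points, the first cell spans -75 to -45 and the mean of each bounds pair equals the original coordinate value.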
Does xCDAT support generating bounds for multiple axis coordinate systems in the same dataset?#
For example, there are two sets of coordinates called “lat” and “latitude” in the dataset.
Yes, xCDAT can generate bounds for axis coordinates if they are “dimension coordinates” (coordinate variables in CF terminology) and have the required CF metadata. “Non-dimension coordinates” (auxiliary coordinate variables in CF terminology) are ignored.
Visit Xarray’s documentation page on Coordinates for more info on “dimension coordinates” vs. “non-dimension coordinates”.
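The distinction can be checked directly in xarray. The following sketch uses a made-up dataset (the cell_area name is hypothetical) and treats a coordinate as a dimension coordinate when its name matches one of the dataset's dimensions.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    coords={
        "lat": ("lat", np.array([-45.0, 0.0, 45.0])),
        "lon": ("lon", np.array([0.0, 180.0])),
        # Auxiliary coordinate spanning two dimensions, so it is not a dimension coordinate.
        "cell_area": (("lat", "lon"), np.ones((3, 2))),
    }
)

# A dimension coordinate shares its name with a dimension of the dataset.
dimension_coords = [name for name in ds.coords if name in ds.dims]
non_dimension_coords = [name for name in ds.coords if name not in ds.dims]
```

Here only lat and lon would be candidates for bounds generation; cell_area would be ignored.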
Temporal Metadata#
What type of time units are supported?#
The units attribute must be in the CF compliant format "<units> since <reference_date>". For example, "days since 1990-01-01".
Supported CF compliant units include day, hour, minute, and second, which are inherited from xarray and cftime. Supported non-CF compliant units include year and month, which xcdat is able to parse. Note that the plural forms of these units are also accepted.
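To make the expected format concrete, here is a minimal standard-library sketch that splits a units string into its unit and reference date and reports which group the unit falls into. The function parse_time_units and the two constant sets are hypothetical names for this illustration; this is not how xcdat, xarray, or cftime parse units internally.

```python
import re

# Unit groups as listed in the text above (singular forms).
CF_UNITS = {"day", "hour", "minute", "second"}
NON_CF_UNITS = {"year", "month"}


def parse_time_units(units_attr):
    """Hypothetical helper: return (unit, reference_date, is_cf_unit)."""
    match = re.fullmatch(r"\s*(\w+)\s+since\s+(.+?)\s*", units_attr)
    if match is None:
        raise ValueError(f"not in '<units> since <reference_date>' format: {units_attr!r}")
    unit = match.group(1).lower()
    # Accept plural forms such as "days" or "months".
    if unit.endswith("s"):
        unit = unit[:-1]
    if unit not in CF_UNITS | NON_CF_UNITS:
        raise ValueError(f"unsupported time unit: {unit!r}")
    return unit, match.group(2), unit in CF_UNITS


cf_example = parse_time_units("days since 1990-01-01")
non_cf_example = parse_time_units("months since 2000-01-01")
```

The first call reports a unit from the CF compliant group, while the second reports month, which belongs to the group that xcdat parses itself.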
What type of calendars are supported?#
xcdat supports the same CF convention calendars as xarray (based on the cftime and netCDF4-python packages).
Supported calendars include:
'standard'
'gregorian'
'proleptic_gregorian'
'noleap'
'365_day'
'360_day'
'julian'
'all_leap'
'366_day'
Why does xcdat decode time coordinates as cftime objects instead of datetime64[ns]?#
One unfortunate limitation of using datetime64[ns] is that it limits the native representation of dates to those that fall between the years 1678 and 2262. This affects climate modeling datasets that have time coordinates outside of this range.
As a workaround, xarray uses the cftime library when decoding/encoding datetimes for non-standard calendars or for dates before year 1678 or after year 2262.
xcdat opted to decode time coordinates exclusively with cftime because it has no timestamp range limitations, simplifies implementation, and the output object type is deterministic.
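The range limit follows from datetime64[ns] storing time as a signed 64-bit count of nanoseconds since 1970-01-01. The short NumPy sketch below computes the earliest and latest representable timestamps; it only illustrates the limitation and does not involve cftime.

```python
import numpy as np

int64 = np.iinfo(np.int64)

# Largest nanosecond offset from 1970-01-01 that fits in a signed 64-bit integer.
latest = np.datetime64(int64.max, "ns")
# The minimum int64 value is reserved for NaT, so the earliest date is one step above it.
earliest = np.datetime64(int64.min + 1, "ns")

print(earliest, latest)  # 1677-09-21T00:12:43.145224193 2262-04-11T23:47:16.854775807
```

A simulation that starts in year 1 or runs to year 3000 therefore cannot be represented with this dtype, which is why cftime objects are used instead.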
Data Wrangling#
xcdat aims to implement generalized functionality. This means that functionality intended to handle data quality issues is out of scope, especially for limited cases.
If data quality issues are present, xarray and xcdat might not be able to open the datasets. Examples of data quality issues include conflicting floating point values between files or non-CF compliant attributes that are not common.
A few workarounds include:
Configuring open_dataset() or open_mfdataset() keyword arguments based on your needs.
Writing a custom preprocess() function to feed into open_mfdataset(). This function preprocesses each dataset file individually before joining them into a single Dataset object.
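A preprocess function takes one dataset and returns the modified dataset. The sketch below shows a hypothetical one that rounds latitude values so that tiny floating point differences between files no longer prevent joining. To keep the example runnable without files, two in-memory datasets stand in for the files and xr.concat stands in for the joining step; the variable names and the rounding precision are assumptions.

```python
import numpy as np
import xarray as xr


def preprocess(ds):
    """Hypothetical example: round latitude values so they match across files."""
    return ds.assign_coords(lat=np.round(ds["lat"].values, 4))


def make_file(time_values, lat_values):
    """Build an in-memory dataset that stands in for one file on disk."""
    data = np.zeros((len(time_values), len(lat_values)))
    return xr.Dataset(
        {"ts": (("time", "lat"), data)},
        coords={"time": time_values, "lat": lat_values},
    )


file_a = make_file([0, 1], [0.0, 10.0])
file_b = make_file([2, 3], [0.0, 10.00000001])  # slightly different latitude value

# Apply the function to each "file" first, then join along time.
# join="exact" would raise if the latitude indexes still differed.
combined = xr.concat([preprocess(f) for f in (file_a, file_b)], dim="time", join="exact")

# With real files, the same function would be passed as:
#   xcdat.open_mfdataset("path/to/files/*.nc", preprocess=preprocess)
```

Rounding is only appropriate when you know the differences are numerical noise; it silently changes coordinate values otherwise.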
How do I open a multi-file dataset with bounds values that conflict?#
In xarray, the default setting for checking compatibility across a multi-file dataset is compat='no_conflicts'. In cases where variable values conflict between files, xarray raises MergeError: conflicting values for variable <VARIABLE NAME> on objects to be combined. You can skip this check by specifying compat="override".
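The effect of the compat setting can be reproduced in memory. In this sketch, xr.merge is used as a stand-in for the merge step of a multi-file open, and the two datasets (with a made-up lat_bnds mismatch) play the role of two files; it is meant to show the behavior of the option, not the exact code path taken by open_mfdataset().

```python
import numpy as np
import xarray as xr

lat = [0.0, 10.0]
first = xr.Dataset(
    {"lat_bnds": (("lat", "bnds"), np.array([[-5.0, 5.0], [5.0, 15.0]]))},
    coords={"lat": lat},
)
second = xr.Dataset(
    # One bounds value differs slightly from the first dataset.
    {"lat_bnds": (("lat", "bnds"), np.array([[-5.0, 5.0], [5.0, 15.000001]]))},
    coords={"lat": lat},
)

# compat="no_conflicts" compares values and raises when they differ.
try:
    xr.merge([first, second], compat="no_conflicts")
    conflict_detected = False
except xr.MergeError:
    conflict_detected = True

# compat="override" skips the comparison and keeps the variable from the first dataset.
merged = xr.merge([first, second], compat="override")
```

After the override, merged["lat_bnds"] holds the values from the first dataset, so any real disagreement between the inputs is hidden.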
If you still intend on working with these datasets and recognize the source of the issue (e.g., minor floating point diffs), follow the workarounds below. Please proceed with caution. You should understand the potential implications of these workarounds.
Pick the first bounds variable and keep dimensions the same as the input files
This option is recommended if you know bounds values should be the same across all files, but one or more files have inconsistent bounds values that break the concatenation of files into a single xr.Dataset object.
>>> import xcdat
>>> ds = xcdat.open_mfdataset(
...     "path/to/files/*.nc",
...     compat="override",
...     data_vars="minimal",
...     coords="minimal",
...     join="override",
... )
compat="override": skip comparing and pick the variable from the first dataset.
xarray defaults to compat="no_conflicts".
data_vars="minimal": only data variables in which the dimension already appears are included.
xcdat defaults to data_vars="minimal".
xarray defaults to data_vars="all".
coords="minimal": only coordinates in which the dimension already appears are included.
xarray defaults to coords="different".
join="override": if indexes are of the same size, rewrite indexes to be those of the first object with that dimension. Indexes for the same dimension must have the same size in all objects.
Alternatively, join="left": use indexes from the first object with each dimension.
xarray defaults to join="outer". This can cause issues where data variable values conflict, because additional coordinate points are concatenated at the point of conflict, which can produce nan values.
Drop the conflicting bounds variable(s)
This option is recommended if you don’t mind dropping the bounds variable(s). xcdat will generate and replace the dropped bounds if add_bounds includes the axis for the dropped variable (by default, add_bounds=["X", "Y"]).
>>> # Drop a single variable
>>> xcdat.open_mfdataset("path/to/files/*.nc", drop_variables="lon_bnds")
>>> # Drop multiple variables
>>> xcdat.open_mfdataset("path/to/files/*.nc", drop_variables=["lon_bnds", "lat_bnds"])
For more information on these options, visit the xarray.open_mfdataset documentation.