larch.Dataset
class Dataset(*args, caseid=None, alts=None, **kwargs)
Bases: sharrow.dataset.Dataset
An extended xarray.Dataset interface for use with Larch.
A Dataset consists of variables, coordinates, and attributes which together form a self-describing dataset.
Dataset implements the mapping interface with keys given by variable names and values given by DataArray objects for each variable name.
One dimensional variables with name equal to their dimension are index coordinates used for label based indexing.
For Larch, one dimension of each Dataset must typically be named ‘_caseid_’; this dimension identifies the individual discrete choice observations or simulations in the data. The caseid argument can be used to mark an existing dimension as ‘_caseid_’ on Dataset construction.
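As a plain-Python illustration of this case-indexed layout (hypothetical variable names, not using larch itself): every variable shares the same length along the case dimension, and the case ids label the individual observations.

```python
# Hypothetical sketch of case-indexed data: three choice observations,
# each labeled by a case id that would back the '_caseid_' dimension.
caseids = [101, 102, 103]
data_vars = {
    "income": [55_000, 72_000, 38_000],  # one value per case
    "hhsize": [2, 4, 1],
}

# Each variable must have the same length along the case dimension.
assert all(len(v) == len(caseids) for v in data_vars.values())

# Label-based lookup: find the 'income' value for case 102.
income_for_102 = data_vars["income"][caseids.index(102)]
assert income_for_102 == 72_000
```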
- Parameters
data_vars (dict-like, optional) – A mapping from variable names to DataArray objects, Variable objects, or to tuples of the form (dims, data[, attrs]) which can be used as arguments to create a new Variable.
The following notations are accepted:
mapping {var name: DataArray}
mapping {var name: Variable}
mapping {var name: (dimension name, array-like)}
mapping {var name: (tuple of dimension names, array-like)}
mapping {dimension name: array-like} (it will be automatically moved to coords, see below)
Each dimension must have the same length in all variables in which it appears.
coords (dict-like, optional) –
Another mapping in similar form as the data_vars argument, except that each item is saved on the dataset as a “coordinate”. These variables have an associated meaning: they describe constant/fixed/independent quantities, unlike the varying/measured/dependent quantities that belong in variables. Coordinate values may be given by 1-dimensional arrays or scalars, in which case dims do not need to be supplied: 1D arrays will be assumed to give index values along the dimension with the same name.
The following notations are accepted:
mapping {coord name: DataArray}
mapping {coord name: Variable}
mapping {coord name: (dimension name, array-like)}
mapping {coord name: (tuple of dimension names, array-like)}
mapping {dimension name: array-like} (the dimension name is implicitly set to be the same as the coord name)
The last notation implies that the coord name is the same as the dimension name.
attrs (dict-like, optional) – Global attributes to save on this dataset.
caseid (str, optional, keyword only) – This named dimension will be marked as the ‘_caseid_’ dimension.
alts (str or Mapping or array-like, keyword only) – If given as a str, this named dimension will be marked as the ‘_altid_’ dimension. Otherwise, give a Mapping that defines alternative names and (integer) codes or an array of codes.
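The two non-string forms of the alts argument can be sketched as follows (hypothetical alternative names; a minimal illustration rather than a call into larch):

```python
# Mapping form: alternative names mapped to integer codes.
alts_mapping = {"car": 1, "bus": 2, "walk": 3}

# Array-of-codes form: just the integer codes, with no names attached.
alts_codes = [1, 2, 3]

# The mapping form carries the same codes plus human-readable names.
assert list(alts_mapping.values()) == alts_codes
```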
Methods
__init__
(*args, **kwargs)
all
([dim])Reduce this Dataset's data by applying all along some dimension(s).
altids
()Access the altids coordinates as an index.
any
([dim])Reduce this Dataset's data by applying any along some dimension(s).
apply
(func[, keep_attrs, args])Backward compatible implementation of map.
argmax
([dim])Indices of the maxima of the member variables.
argmin
([dim])Indices of the minima of the member variables.
argsort
([axis, kind, order])Returns the indices that would sort this array.
as_numpy
()Coerces wrapped data and coordinates into numpy arrays, returning a Dataset.
as_tree
([label, exclude_dims])Convert this Dataset to a DataTree.
assign
([variables])Assign new data variables to a Dataset, returning a new object with all the original variables in addition to the new ones.
assign_attrs
(*args, **kwargs)Assign new attrs to this object.
assign_coords
([coords])Assign new coordinates to this object.
astype
(dtype, *[, order, casting, subok, ...])Copy of the xarray object, with data cast to a specified type.
at
(*[, _names, _load, _index_name])Multi-dimensional fancy indexing by label.
at_df
(df)Extract values by label on the coordinates indicated by columns of a DataFrame.
bfill
(dim[, limit])Fill NaN values by propagating values backward
broadcast_equals
(other)Two Datasets are broadcast equal if they are equal after broadcasting all variables against each other.
broadcast_like
(other[, exclude])Broadcast this DataArray against another Dataset or DataArray.
caseids
()Access the caseids coordinates as an index.
chunk
([chunks, name_prefix, token, lock])Coerce all arrays in this dataset into dask arrays with the given chunks.
clip
([min, max, keep_attrs])Return an array whose values are limited to [min, max].
close
()Release any resources linked to this object.
coarsen
([dim, boundary, side, coord_func])Coarsen object.
combine_first
(other)Combine two Datasets, default to data_vars of self.
compute
(**kwargs)Manually trigger loading and/or computation of this dataset's data from disk or a remote source into memory and return a new dataset.
conj
()Complex-conjugate all elements.
conjugate
()Return the complex conjugate, element-wise.
construct
(source[, caseid, alts])A generic constructor for creating Datasets from various similar objects.
convert_calendar
(calendar[, dim, align_on, ...])Convert the Dataset to another calendar.
copy
([deep, data])Returns a copy of this dataset.
count
([dim])Reduce this Dataset's data by applying count along some dimension(s).
cumprod
([dim, skipna])Apply cumprod along some dimension of Dataset.
cumsum
([dim, skipna])Apply cumsum along some dimension of Dataset.
cumulative_integrate
(coord[, datetime_unit])Integrate along the given coordinate using the trapezoidal rule.
curvefit
(coords, func[, reduce_dims, ...])Curve fitting optimization for arbitrary functions.
delete_shared_memory_files
(key)
diff
(dim[, n, label])Calculate the n-th order discrete difference along given axis.
differentiate
(coord[, edge_order, datetime_unit])Differentiate with the second order accurate central differences.
dissolve_coords
(dim[, others])
dissolve_zero_variance
([dim, inplace])Dissolve dimension on variables where it has no variance.
drop
([labels, dim, errors])Backward compatible method based on drop_vars and drop_sel
drop_dims
(drop_dims, *[, errors])Drop dimensions and associated variables from this dataset.
drop_isel
([indexers])Drop index positions from this Dataset.
drop_sel
([labels, errors])Drop index labels from this dataset.
drop_vars
(names, *[, errors])Drop variables from this dataset.
dropna
(dim[, how, thresh, subset])Returns a new dataset with dropped labels for missing values along the provided dimension.
dump_to_store
(store, **kwargs)Store dataset contents to a backends.*DataStore object.
ensure_integer
(names[, bitwidth, inplace])Convert dataset variables to integers, if they are not already integers.
equals
(other)Two Datasets are equal if they have matching variables and coordinates, all of which are equal.
expand_dims
([dim, axis])Return a new object with an additional axis (or axes) inserted at the corresponding position in the array shape.
explode
()
ffill
(dim[, limit])Fill NaN values by propagating values forward
fillna
(value)Fill missing values in this object.
filter_by_attrs
(**kwargs)Returns a Dataset with variables that match specific conditions.
from_amx
(amx[, index_names, indexes, renames])
from_dataframe
(dataframe[, sparse])Convert a pandas.DataFrame into an xarray.Dataset
from_dataframe_fast
(dataframe[, sparse])Convert a pandas.DataFrame into an xarray.Dataset
from_dict
(d)Convert a dictionary into an xarray.Dataset.
from_idca
(df[, crack, altnames, avail, ...])Construct a Dataset from an idca-format DataFrame.
from_idce
(df[, crack, altnames, dim_name, ...])Construct a Dataset from a sparse idca-format DataFrame.
from_idco
(df[, alts])Construct a Dataset from an idco-format DataFrame.
from_named_objects
(*args)Create a Dataset by populating it with named objects.
from_omx
(omx[, index_names, indexes, renames])Create a Dataset from an OMX file.
from_omx_3d
(omx[, index_names, indexes, ...])Create a Dataset from an OMX file with an implicit third dimension.
from_shared_memory
(key[, own_data, mode])Connect to an existing Dataset in shared memory.
from_table
(tbl[, index_name, index])Convert a pyarrow.Table into an xarray.Dataset
from_zarr
(store, *args, **kwargs)Load and decode a dataset from a Zarr store.
get
(k[, d])
get_expr
(expression)Access or evaluate an expression.
get_index
(key)Get an index for a dimension, with fall-back to a default RangeIndex
groupby
(group[, squeeze, restore_coord_dims])Returns a GroupBy object for performing grouped operations.
groupby_bins
(group, bins[, right, labels, ...])Returns a GroupBy object for performing grouped operations.
head
([indexers])Returns a new dataset with the first n values of each array for the specified dimension(s).
iat
(*[, _names, _load, _index_name])Multi-dimensional fancy indexing by position.
iat_df
(df)Extract values by position on the coordinates indicated by columns of a DataFrame.
identical
(other)Like equals, but also checks all dataset attributes and the attributes on all variables and coordinates.
idxmax
([dim, skipna, fill_value, keep_attrs])Return the coordinate label of the maximum value along a dimension.
idxmin
([dim, skipna, fill_value, keep_attrs])Return the coordinate label of the minimum value along a dimension.
info
([buf])Concise summary of a Dataset variables and attributes.
integrate
(coord[, datetime_unit])Integrate along the given coordinate using the trapezoidal rule.
interchange_dims
(dim1, dim2)Rename a pair of dimensions by swapping their names.
interp
([coords, method, assume_sorted, ...])Multidimensional interpolation of Dataset.
interp_calendar
(target[, dim])Interpolates the Dataset to another calendar based on decimal year measure.
interp_like
(other[, method, assume_sorted, ...])Interpolate this object onto the coordinates of another object, filling the out of range values with NaN.
interpolate_na
([dim, method, limit, ...])Fill in NaNs by interpolating according to different methods.
isel
([indexers, drop, missing_dims])Returns a new dataset with each array indexed along the specified dimension(s).
isin
(test_elements)Tests each value in the array for whether it is in test_elements.
isnull
([keep_attrs])Test each value in the array for whether it is a missing value.
items
()
keep_dims
(keep_dims, *[, errors])Keep only certain dimensions and associated variables from this dataset.
keys
()
load
(**kwargs)Manually trigger loading and/or computation of this dataset's data from disk or a remote source into memory and return this dataset.
load_store
(store[, decoder])Create a new dataset from the contents of a backends.*DataStore object
map
(func[, keep_attrs, args])Apply a function to each variable in this dataset
map_blocks
(func[, args, kwargs, template])Apply a function to each block of this Dataset.
match_names_on
(key)
max
([dim, skipna])Reduce this Dataset's data by applying max along some dimension(s).
max_float_precision
([p])Set the maximum precision for floating point values.
mean
([dim, skipna])Reduce this Dataset's data by applying mean along some dimension(s).
median
([dim, skipna])Reduce this Dataset's data by applying median along some dimension(s).
merge
(other[, overwrite_vars, compat, join, ...])Merge the arrays of two datasets into a single dataset.
min
([dim, skipna])Reduce this Dataset's data by applying min along some dimension(s).
notnull
([keep_attrs])Test each value in the array for whether it is not a missing value.
pad
([pad_width, mode, stat_length, ...])Pad this dataset along one or more dimensions.
persist
(**kwargs)Trigger computation, keeping data as dask arrays
pipe
(func, *args, **kwargs)Apply func(self, *args, **kwargs).
polyfit
(dim, deg[, skipna, rcond, w, full, cov])Least squares polynomial fit.
preload_shared_memory_size
(key)Compute the size in bytes of a shared Dataset without actually loading it.
prod
([dim, skipna])Reduce this Dataset's data by applying prod along some dimension(s).
quantile
(q[, dim, interpolation, ...])Compute the qth quantile of the data along the specified dimension.
query
([queries, parser, engine, missing_dims])Return a new dataset with each array indexed along the specified dimension(s), where the indexers are given as strings containing Python expressions to be evaluated against the data variables in the dataset.
query_cases
(query[, parser, engine])Return a new dataset with each array indexed along the CASEID dimension.
rank
(dim[, pct, keep_attrs])Ranks the data.
reduce
(func[, dim, keep_attrs, keepdims, ...])Reduce this dataset by applying func along some dimension(s).
reindex
([indexers, method, tolerance, copy, ...])Conform this object onto a new set of indexes, filling in missing values with fill_value.
reindex_like
(other[, method, tolerance, ...])Conform this object onto the indexes of another object, filling in missing values with fill_value.
release_shared_memory
()Release shared memory allocated to this Dataset.
rename
([name_dict])Returns a new object with renamed variables and dimensions.
rename_dims
([dims_dict])Returns a new object with renamed dimensions only.
rename_dims_and_coords
([dims_dict])
rename_or_ignore
([dims_dict])
rename_vars
([name_dict])Returns a new object with renamed variables including coordinates
reorder_levels
([dim_order])Rearrange index levels using input order.
resample
([indexer, skipna, closed, label, ...])Returns a Resample object for performing resampling operations.
reset_coords
([names, drop])Given names of coordinates, reset them to become variables
reset_index
(dims_or_levels[, drop])Reset the specified index(es) or multi-index level(s).
roll
([shifts, roll_coords])Roll this dataset by an offset along one or more dimensions.
rolling
([dim, min_periods, center])Rolling window object.
rolling_exp
([window, window_type])Exponentially-weighted moving window.
round
(*args, **kwargs)Evenly round to the given number of decimals.
sel
([indexers, method, tolerance, drop])Returns a new dataset with each array indexed by tick labels along the specified dimension(s).
select_and_rename
([name_dict])Select and rename variables from this Dataset
set_altnames
(altnames[, inplace])Set the alternative names for this Dataset.
set_close
(close)Register the function that releases any resources linked to this object.
set_coords
(names)Given names of one or more variables, set them as coordinates
set_digital_encoding
(name, *args, **kwargs)
set_dtypes
(dtypes[, inplace, on_error])Set the dtypes for the variables in this Dataset.
set_index
([indexes, append])Set Dataset (multi-)indexes using one or more existing coordinates or variables.
set_match_names
(names)Create a copy of this dataset with the given match_names for flowing.
setup_flow
(*args, **kwargs)Set up a new Flow for analysis using the structure of this DataTree.
shift
([shifts, fill_value])Shift this dataset by an offset along one or more dimensions.
sortby
(variables[, ascending])Sort object by labels or values (along an axis).
squash_index
([indexes_dict, set_match_names])
squeeze
([dim, drop, axis])Return a new object with squeezed data.
stack
([dimensions])Stack any number of existing dimensions into a single new dimension.
std
([dim, skipna])Reduce this Dataset's data by applying std along some dimension(s).
sum
([dim, skipna])Reduce this Dataset's data by applying sum along some dimension(s).
swap_dims
([dims_dict])Returns a new object with swapped dimensions.
tail
([indexers])Returns a new dataset with the last n values of each array for the specified dimension(s).
thin
([indexers])Returns a new dataset with each array indexed along every n-th value for the specified dimension(s)
to_array
([dim, name])Convert this dataset into an xarray.DataArray
to_arrays
(graph[, float_dtype])
to_dask_dataframe
([dim_order, set_index])Convert this dataset into a dask.dataframe.DataFrame.
to_dataframe
([dim_order])Convert this dataset into a pandas.DataFrame.
to_dict
([data])Convert this dataset to a dictionary following xarray naming conventions.
to_netcdf
([path, mode, format, group, ...])Write dataset contents to a netCDF file.
to_pandas
()Convert this dataset into a pandas object without changing the number of dimensions.
to_shared_memory
([key, mode])Load this Dataset into shared memory.
to_stacked_array
(new_dim, sample_dims[, ...])Combine variables of differing dimensionality into a DataArray without broadcasting.
to_zarr
(*args, **kwargs)Write dataset contents to a zarr group.
transpose
(*dims[, missing_dims])Return a new Dataset object with all array dimensions transposed.
unify_chunks
()Unify chunk size along all chunked dimensions of this Dataset.
unstack
([dim, fill_value, sparse])Unstack existing dimensions corresponding to MultiIndexes into multiple new dimensions.
update
(other)Update this dataset's variables with those from another dataset.
validate_format
()
values
()
var
([dim, skipna])Reduce this Dataset's data by applying var along some dimension(s).
weighted
(weights)Weighted operations.
where
(cond[, other, drop])Filter elements from this object according to a condition.
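The from_idco, from_idca, and from_idce constructors listed above ingest the standard discrete-choice table layouts. A plain-Python sketch of the distinction, with hypothetical column names: idco data has one row per case, while idca data has one row per case-alternative pair.

```python
# idco format: one row per case, holding alternative-invariant variables.
idco_rows = [
    {"caseid": 1, "income": 55_000},
    {"caseid": 2, "income": 72_000},
]

# idca format: one row per (case, alternative) pair, holding
# alternative-varying variables such as travel time.
idca_rows = [
    {"caseid": 1, "altid": 1, "travel_time": 12.5},
    {"caseid": 1, "altid": 2, "travel_time": 30.0},
    {"caseid": 2, "altid": 1, "travel_time": 8.0},
    {"caseid": 2, "altid": 2, "travel_time": 22.0},
]

# A dense idca table has one row for every case-alternative combination.
cases = {row["caseid"] for row in idca_rows}
alts = {row["altid"] for row in idca_rows}
assert len(idca_rows) == len(cases) * len(alts)
assert len(idco_rows) == len(cases)
```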
Attributes
ALTIDX
The _alt_idx_ dimension of this Dataset, if defined.
CASEALT
The _casealt_ dimension of this Dataset, if defined.
CASEPTR
The _caseptr_ dimension of this Dataset, if defined.
Mapping of alternative codes to names
attrs
Dictionary of global attributes on this dataset
chunks
Mapping from dimension names to block lengths for this dataset's data, or None if the underlying data is not a dask array.
chunksizes
Mapping from dimension names to block lengths for this dataset's data, or None if the underlying data is not a dask array.
coords
Dictionary of xarray.DataArray objects corresponding to coordinate variables
data_vars
Dictionary of DataArray objects corresponding to data variables
digital_encodings
All digital_encoding attributes from Dataset variables.
dims
Mapping from dimension names to lengths.
encoding
Dictionary of global encoding attributes on this dataset
iloc
Attribute for position based indexing.
imag
indexes
Mapping of pandas.Index objects used for label based indexing.
is_shared_memory
loc
Attribute for location based indexing.
match_names
Mapping[str,str]
real
shared_memory_key
shared_memory_size
sizes
Mapping from dimension names to lengths.
variables
Low level interface to Dataset contents as dict of Variable objects.
xindexes
Mapping of xarray Index objects used for label based indexing.