larch.DataFrames
- class DataFrames(co=None, *args, ca=None, ce=None, av=None, ch=None, wt=None, data_co=None, data_ca=None, data_ce=None, data_av=None, data_ch=None, data_wt=None, alt_names=None, alt_codes=None, crack=False, av_name=None, ch_name=None, wt_name=None, av_as_ce=None, ch_as_ce=None, sys_alts=None, computational=False, caseindex_name='_caseid_', altindex_name='_altid_', autoscale_weights=False)
Bases: object
A structured class to hold multi-format discrete choice data.
- Parameters
co (pandas.DataFrame) – A dataframe containing idco format data, with one row per case. The index contains the caseids.
ca (pandas.DataFrame) – A dataframe containing idca format data, with one row per alternative. The index should be a two-level multi-index, with the first level containing the caseids and the second level containing the altids. Provide either dense ca or sparse ce data, but not both.
ce (pandas.DataFrame) – A dataframe containing sparse idca (idce) format data, with one row per alternative. The index should be a two-level multi-index, with the first level containing the caseids and the second level containing the altids. Provide either dense ca or sparse ce data, but not both.
av (pandas.DataFrame or pandas.Series or True, optional) – Alternative availability data. This can be given as a pandas.DataFrame in idco format, with one row per case and one column per alternative, where the index contains the caseids and the columns contain the altids. Or, it can be given as a pandas.Series in idca format, with one row per alternative, and an index that is a two-level multi-index, with the first level containing the caseids and the second level containing the altids. Or, set to True to make all alternatives available for all cases. If not given, data_av is not defined unless it can be inferred from missing rows in ca.
ch (pandas.DataFrame or pandas.Series or str, optional) – Choice data. This can be given as a pandas.DataFrame in idco format, with one row per case and one column per alternative, where the index contains the caseids and the columns contain the altids. Or, it can be given as a pandas.Series in idca format, with one row per alternative, and an index that is a two-level multi-index, with the first level containing the caseids and the second level containing the altids. Or, if given as a str, the column of that name in the ca dataframe is used as the choice, if it appears there. Otherwise, if the named column is found in the co dataframe, the codes in that column are used to identify the choices. If not given, data_ch is not set.
wt (pandas.DataFrame or pandas.Series or str, optional) – Case weights. This can only be given in idco format, either as a pandas.DataFrame with a single column, or as a pandas.Series. Or, if given as a str, the column of that name in the co or ca dataframe is used as the weight, if it appears there. If not given, data_wt is not set.
alt_names (Sequence[str]) – A sequence of alternative names as str.
alt_codes (Sequence[int]) – A sequence of alternative codes.
crack (bool, default False) – Whether to pre-process ca data to identify variables that do not vary within cases, and move them to a new co dataframe. This can result in more computationally efficient model estimation, but the cracking process can be slow for large data sets.
av_name (str, optional) – A name to use for the availability variable. If not given, it is inferred from the av argument if possible.
ch_name (str, optional) – A name to use for the choice variable. If not given, it is inferred from the ch argument if possible.
wt_name (str, optional) – A name to use for the weight variable. If not given, it is inferred from the wt argument if possible.
autoscale_weights (bool, default False) – Call autoscale_weights on the DataFrames after initialization. Note that this not only scales an explicitly given wt, but also extracts implied weights from the ch.
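The co, ca, ce, av, and ch layouts described above can be sketched with plain pandas. This is only an illustration of the index and column shapes the parameter descriptions call for. All values, and the column names income, cost, avail, and chosen, are made up for the example. The index names reuse the caseindex_name and altindex_name defaults from the signature.

```python
import pandas as pd

# idco: one row per case, indexed by caseid (values are hypothetical).
co = pd.DataFrame(
    {"income": [30.0, 55.0, 42.0]},
    index=pd.Index([1, 2, 3], name="_caseid_"),
)

# idca (dense): one row per case-alternative pair, with a two-level index
# of (caseid, altid).
idx = pd.MultiIndex.from_product(
    [[1, 2, 3], [101, 102]], names=["_caseid_", "_altid_"]
)
ca = pd.DataFrame(
    {
        "cost": [2.5, 4.0, 3.0, 1.0, 2.0, 6.0],
        "avail": [1, 1, 1, 0, 1, 1],   # alternative 102 is unavailable in case 2
        "chosen": [1, 0, 1, 0, 0, 1],  # exactly one chosen alternative per case
    },
    index=idx,
)

# idce (sparse): the same data with rows for unavailable alternatives dropped.
ce = ca[ca["avail"] == 1].drop(columns="avail")

# Availability and choice in idco layout: one row per case and
# one column per altid.
av_wide = ca["avail"].unstack("_altid_")
ch_wide = ca["chosen"].unstack("_altid_")
```

Frames shaped like co and ca (or ce, but not both) correspond to the arguments of the same names. av_wide and ch_wide show the idco form of availability and choice, and the ca["avail"] and ca["chosen"] columns show the idca Series form.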
- __init__(*args, **kwargs)
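The crack parameter is described as finding ca variables that do not vary within cases and moving them to a co dataframe. The idea can be sketched in pandas as follows. This is an illustration of the concept under made-up data, not larch's actual implementation, and the names n_distinct, constant_cols, co_from_ca, and ca_cracked are hypothetical.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [[1, 2], [101, 102]], names=["_caseid_", "_altid_"]
)
ca = pd.DataFrame(
    {
        "cost": [2.5, 4.0, 3.0, 1.0],        # varies within each case
        "income": [30.0, 30.0, 55.0, 55.0],  # constant within each case
    },
    index=idx,
)

# Count the distinct values of each column within each case.
n_distinct = ca.groupby(level="_caseid_").nunique()

# A column with exactly one distinct value in every case does not vary
# within cases, so it carries only case-level information.
constant_cols = [c for c in ca.columns if (n_distinct[c] == 1).all()]

# Move those columns to a one-row-per-case (idco) frame and drop them
# from the idca frame.
co_from_ca = ca[constant_cols].groupby(level="_caseid_").first()
ca_cracked = ca.drop(columns=constant_cols)
```

Storing case-level variables once per case rather than once per alternative is the reason the description above gives for more efficient estimation. The per-case scan is also why the process can be slow on large data sets.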
Methods
__init__(*args, **kwargs)
alternative_codes(self) – The alternative codes.
alternative_names(self) – The alternative names.
array_av(self[, dtype])
array_ca(self[, dtype, force])
array_ce(self[, dtype, force])
array_ch(self[, dtype])
array_ch_as_ce(self[, dtype])
array_co(self[, dtype, force])
array_not_av(self[, dtype])
array_to_ce(self, arr) – Convert a single case-alt array to a sparse (rather than dense) Series.
array_wt(self[, dtype])
array_wt_as_ce(self[, dtype])
autoscale_weights(self) – Scale the weights so the average weight is 1.
check_data_is_sufficient_for_model(self, ...) – Check that probabilities can be found from the attached data.
choice_avail_summary(self[, graph, ...]) – Generate a summary of choice and availability statistics.
compute_d_utility_onecase(self, int c, ...)
compute_utility_onecase(self, int c, ...)
data_av_as_ce(self) – Reformat avail data into idce format.
data_av_cascade(self, graph) – Create an extra wide dataframe with availability rolled up to nests.
data_ca_as_ce(self) – Reformat any idca data into idce format.
data_ca_combined(self) – Return a combined DataFrame in idca format that includes idco and idca data.
data_ce_as_ca(self[, promote]) – Reformat any idce data into idca format.
data_ch_as_ce(self) – Reformat choice data into idce format.
data_ch_cascade(self, graph) – Create an extra wide dataframe with choices rolled up to nests.
data_co_as_ce(self) – Reformat any idco data into idce format.
data_co_combined(self) – Return a combined DataFrame in idco format that includes idco and idca data.
data_wt_as_ce(self)
dump(self, filename, **kwargs) – Persist this DataFrames object into one file.
from_feathers(type cls, filename[, components])
from_idce(type cls, ce[, choice, columns, ...]) – Create DataFrames from a single idce format DataFrame.
get_zero_quantity_ca(self) – Find all alternatives with zeros across all quantity values.
info(self[, verbose, out]) – Print info about this DataFrames.
inject_feathers(self, filename[, components]) – Read data from a collection of Feather files.
is_computational_ready(self, bool activate=False) – Check if this DataFrames is or can be computational with no data conversion.
link_to_model_parameters(self, model[, logger])
load(type cls, filename) – Reconstruct a DataFrames object from a file persisted with DataFrames.dump.
make_dataframes(self, req_data, *[, ...]) – Create a DataFrames object that will satisfy a data request.
make_idca(self, *columns[, selector, ...]) – Extract a set of idca values into a new dataframe.
make_idco(self, *columns[, selector, ...]) – Extract a set of idco values into a new dataframe.
make_mnl(self) – Generate a simple MNL model that uses the entire data_ca and data_co dataframes.
new_systematic_alternatives(self, groupby[, ...]) – Create new systematic alternatives.
read_in_model_parameters(self)
scale_weights(self, scale) – Scale the weights by a fixed exogenous value.
selector_co(self, co_expr) – Filter a DataFrames object based on an idco selector expression.
set_alternative_names(self, names) – Set the alternative names.
set_data_ch_wide(self, df, graph) – Write an extra wide choice dataframe with choices also on nests.
split(self, splits[, method]) – Generate a train/test or similar multi-part split of the data.
standardize(self[, with_mean, with_std]) – Standardize the data in idco, idca, and idce arrays.
statistics(self[, title, header_level, graph])
to_feathers(self, filename[, components]) – Output data to a collection of Feather files.
total_weight(self) – The total weight of cases.
unscale_weights(self)
validate_dataservice(self, req_data)
Attributes
alternative_pairs – A dict mapping the id to the name of each alternative.
alternatives – A dict mapping the id to the name of each alternative.
array_ce_altindexes
array_ce_caseindexes
array_ce_reversemap
caseindex – The indexes of the cases.
computational
data_av
data_ca – A pandas.DataFrame in idca format.
data_ca_or_ce
data_ce
data_ch
data_co – A pandas.DataFrame in idco format.
data_wt
n_alts – The number of alternatives.
n_cases – The number of cases.
n_params
n_vars_ca
n_vars_co
param_names
std_scaler_ca
std_scaler_ce
std_scaler_co
sys_alts – The SystematicAlternatives instance used to create this DataFrames.
weight_normalization
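The autoscale_weights method is described as scaling the weights so the average weight is 1. The arithmetic behind that description can be sketched in pandas as below. The weight values and the names wt, scale, and wt_scaled are hypothetical, and the sketch makes no claim about how larch stores or reverses the scaling internally.

```python
import pandas as pd

# Case weights in idco form: one value per case (values are hypothetical).
wt = pd.Series([0.5, 2.0, 3.5], index=pd.Index([1, 2, 3], name="_caseid_"))

# Dividing every weight by the mean weight makes the average weight 1,
# so the total weight equals the number of cases.
scale = wt.mean()
wt_scaled = wt / scale

# Multiplying by the same factor recovers the original weights.
wt_restored = wt_scaled * scale
```

After this scaling the total weight equals the number of cases.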