larch.DataFrames

class DataFrames(co=None, *args, ca=None, ce=None, av=None, ch=None, wt=None, data_co=None, data_ca=None, data_ce=None, data_av=None, data_ch=None, data_wt=None, alt_names=None, alt_codes=None, crack=False, av_name=None, ch_name=None, wt_name=None, av_as_ce=None, ch_as_ce=None, sys_alts=None, computational=False, caseindex_name='_caseid_', altindex_name='_altid_', autoscale_weights=False)

Bases: object

A structured class to hold multi-format discrete choice data.

Parameters
  • co (pandas.DataFrame) – A dataframe containing idco format data, with one row per case. The index contains the caseids.

  • ca (pandas.DataFrame) – A dataframe containing dense idca format data, with one row per case-alternative pair. The index should be a two-level multi-index, with the first level containing the caseids and the second level containing the altids. Provide either dense ca or sparse ce data, but not both.

  • ce (pandas.DataFrame) – A dataframe containing sparse idce format data. It uses the same two-level multi-index as ca, with the first level containing the caseids and the second level containing the altids, but it need not include a row for every case-alternative pair. Provide either dense ca or sparse ce data, but not both.

  • av (pandas.DataFrame or pandas.Series or True, optional) – Alternative availability data. This can be given as a pandas.DataFrame in idco format, with one row per case and one column per alternative, where the index contains the caseids and the columns contain the altids. Or, it can be given as a pandas.Series in idca format, with one row per case-alternative pair and a two-level multi-index, with the first level containing the caseids and the second level containing the altids. Or, set it to True to make all alternatives available for all cases. If not given, data_av is not defined unless it can be inferred from missing rows in ca.

  • ch (pandas.DataFrame or pandas.Series or str, optional) – Choice data. This can be given as a pandas.DataFrame in idco format, with one row per case and one column per alternative, where the index contains the caseids and the columns contain the altids. Or, it can be given as a pandas.Series in idca format, with one row per case-alternative pair and a two-level multi-index, with the first level containing the caseids and the second level containing the altids. Or, if given as a str, the column with that name is used as the choice if it appears in the ca dataframe; otherwise, if it appears in the co dataframe, the alternative codes in that column identify the chosen alternative for each case. If not given, data_ch is not set.

  • wt (pandas.DataFrame or pandas.Series or str, optional) – Case weights. These can only be given in idco format, either as a pandas.DataFrame with a single column or as a pandas.Series. Or, if given as a str, the column with that name, if it appears in the co or ca dataframe, is used as the weight. If not given, data_wt is not set.

  • alt_names (Sequence[str]) – A sequence of alternative names as str.

  • alt_codes (Sequence[int]) – A sequence of alternative codes.

  • crack (bool, default False) – Whether to pre-process ca data to identify variables that do not vary within cases, and move them to a new co dataframe. This can result in more computationally efficient model estimation, but the cracking process can be slow for large data sets.

  • av_name (str, optional) – A name to use for the availability variable. If not given, it is inferred from the av argument if possible.

  • ch_name (str, optional) – A name to use for the choice variable. If not given, it is inferred from the ch argument if possible.

  • wt_name (str, optional) – A name to use for the weight variable. If not given, it is inferred from the wt argument if possible.

  • autoscale_weights (bool, default False) – Whether to call autoscale_weights on the DataFrames after initialization. Note that this not only scales an explicitly given wt, but also extracts any weights implied by ch.
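The formats described above can be illustrated with plain pandas. The following minimal sketch builds toy inputs in the expected shapes: an idco frame indexed by case, a dense idca frame with a (case, alternative) multi-index, an idco availability frame with one column per alternative code, and a sparse idce frame that keeps only the available rows. The index names caseid and altid, the column names, and all values are made up for illustration.

```python
import pandas as pd

# idco: one row per case, indexed by (hypothetical) case ids
co = pd.DataFrame(
    {"income": [40.0, 75.0, 55.0], "choice": [1, 2, 2]},  # "choice" holds alternative codes
    index=pd.Index([101, 102, 103], name="caseid"),
)

# dense idca: one row for every (case, alternative) pair
alts = [1, 2, 3]
ca_index = pd.MultiIndex.from_product([co.index, alts], names=["caseid", "altid"])
ca = pd.DataFrame({"cost": [5.0, 7.0, 9.0, 4.0, 6.0, 8.0, 3.0, 5.0, 7.0]}, index=ca_index)

# availability in idco format: one row per case, one column per alternative code
av = pd.DataFrame(
    [[1, 1, 0], [1, 1, 1], [0, 1, 1]],
    index=co.index,
    columns=pd.Index(alts, name="altid"),
)

# the same availability as an idca-format Series; ravel() walks rows (cases)
# then columns (alternatives), matching the from_product ordering above
av_ca = pd.Series(av.to_numpy().ravel().astype(bool), index=ca_index, name="avail")

# sparse idce: the idca rows for available alternatives only
ce = ca[av_ca]
```

Per the parameter descriptions, frames like these could be passed as larch.DataFrames(co=co, ca=ca, av=av, ch="choice", alt_codes=[1, 2, 3]), where the string "choice" names the column of alternative codes in co; pass ce=ce instead of ca=ca for the sparse form, but not both. That call is not exercised in the sketch.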

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

alternative_codes(self)

The alternative codes.

alternative_names(self)

The alternative names.

array_av(self[, dtype])

array_ca(self[, dtype, force])

array_ce(self[, dtype, force])

array_ch(self[, dtype])

array_ch_as_ce(self[, dtype])

array_co(self[, dtype, force])

array_not_av(self[, dtype])

array_to_ce(self, arr)

Convert a single case-alt array to a sparse (idce) Series.

array_wt(self[, dtype])

array_wt_as_ce(self[, dtype])

autoscale_weights(self)

Scale the weights so the average weight is 1.
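The arithmetic behind this rescaling is to multiply every weight by 1 divided by the mean weight. A minimal numpy sketch follows; the function name autoscale is hypothetical and this is not larch's implementation (in particular, it ignores the weights implied by ch mentioned under the autoscale_weights parameter).

```python
import numpy as np

def autoscale(weights):
    """Rescale case weights so their mean is 1; also return the factor used."""
    w = np.asarray(weights, dtype=float)
    mean = w.mean()
    if not mean > 0:
        raise ValueError("mean weight must be positive")
    factor = 1.0 / mean
    return w * factor, factor

scaled, factor = autoscale([2.0, 4.0, 6.0])
# scaled is [0.5, 1.0, 1.5] and factor is 0.25
```

After scaling, the weights sum to the number of cases. Returning the factor keeps the scaling reversible.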

check_data_is_sufficient_for_model(self, ...)

Check that probabilities can be found from the attached data.

choice_avail_summary(self[, graph, ...])

Generate a summary of choice and availability statistics.
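A summary of this kind comes down to per-alternative totals. The sketch below, assuming idco-format 0/1 choice and availability frames with one column per alternative code and an optional Series of case weights, computes how often (or with how much weight) each alternative is chosen and available. The function name and the two output columns are illustrative; the statistics larch actually reports, and its graph option for nests, are not reproduced.

```python
import pandas as pd

def choice_avail_counts(ch, av, wt=None):
    """Weighted chosen/available totals per alternative.

    ch, av: idco-format DataFrames of 0/1 values, one column per alternative.
    wt: optional Series of case weights on the same index (defaults to 1).
    """
    if wt is None:
        wt = pd.Series(1.0, index=ch.index)
    return pd.DataFrame({
        "chosen": ch.mul(wt, axis=0).sum(),      # sum over cases, per alternative
        "available": av.mul(wt, axis=0).sum(),
    })

cases = pd.Index([101, 102, 103], name="caseid")
alts = pd.Index([1, 2, 3], name="altid")
ch = pd.DataFrame([[1, 0, 0], [0, 1, 0], [0, 1, 0]], index=cases, columns=alts)
av = pd.DataFrame([[1, 1, 0], [1, 1, 1], [0, 1, 1]], index=cases, columns=alts)
summary = choice_avail_counts(ch, av)
```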

compute_d_utility_onecase(self, int c, ...)

compute_utility_onecase(self, int c, ...)

data_av_as_ce(self)

Reformat availability data into idce format.

data_av_cascade(self, graph)

Create an extra wide dataframe with availability rolled up to nests.

data_ca_as_ce(self)

Reformat any idca data into idce format.

data_ca_combined(self)

Return a combined DataFrame in idca format that includes idco and idca data.
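One way to picture the combined idca layout is to repeat each case's idco values on every idca row for that case. A minimal pandas sketch, assuming ca has a two-level multi-index whose first level holds the caseids; the function name is hypothetical, and this is not necessarily how larch builds the result:

```python
import pandas as pd

def combine_into_idca(co, ca):
    """Broadcast idco columns onto every idca row of the same case."""
    case_ids = ca.index.get_level_values(0)
    co_rows = co.reindex(case_ids)   # one copy of the idco row per idca row
    co_rows.index = ca.index         # align on the (case, alt) index
    return pd.concat([co_rows, ca], axis=1)

co = pd.DataFrame({"income": [40.0, 75.0]}, index=pd.Index([101, 102], name="caseid"))
ca = pd.DataFrame(
    {"cost": [5.0, 7.0, 4.0, 6.0]},
    index=pd.MultiIndex.from_product([[101, 102], [1, 2]], names=["caseid", "altid"]),
)
combined = combine_into_idca(co, ca)
```

If co and ca share a column name, pd.concat keeps both columns under that name, so renaming may be needed first.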

data_ce_as_ca(self[, promote])

Reformat any idce data into idca format.
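Going from sparse idce to dense idca amounts to reindexing on the full product of caseids and alternative codes, which leaves the omitted rows missing. A minimal sketch, assuming a two-level (case, alternative) multi-index and a known list of alternative codes; the function name and the fill_value argument are illustrative, and the promote option of data_ce_as_ca is not modeled:

```python
import numpy as np
import pandas as pd

def ce_to_ca(ce, alt_codes, fill_value=np.nan):
    """Expand sparse idce rows to a dense idca frame over all alternatives."""
    case_ids = ce.index.get_level_values(0).unique()
    full = pd.MultiIndex.from_product([case_ids, alt_codes], names=ce.index.names)
    return ce.reindex(full, fill_value=fill_value)  # omitted pairs get fill_value

ce = pd.DataFrame(
    {"cost": [5.0, 7.0, 6.0]},
    index=pd.MultiIndex.from_tuples(
        [(101, 1), (101, 2), (102, 2)], names=["caseid", "altid"]
    ),
)
ca = ce_to_ca(ce, [1, 2])
```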

data_ch_as_ce(self)

Reformat choice data into idce format.

data_ch_cascade(self, graph)

Create an extra wide dataframe with choices rolled up to nests.

data_co_as_ce(self)

Reformat any idco data into idce format.

data_co_combined(self)

Return a combined DataFrame in idco format that includes idco and idca data.

data_wt_as_ce(self)

dump(self, filename, **kwargs)

Persist this DataFrames object into one file.

from_feathers(type cls, filename[, components])

from_idce(type cls, ce[, choice, columns, ...])

Create DataFrames from a single idce format DataFrame.

get_zero_quantity_ca(self)

Find all alternatives with zeros across all quantity values.

info(self[, verbose, out])

Print info about this DataFrames.

inject_feathers(self, filename[, components])

Read data from a collection of Feather files.

is_computational_ready(self, bool activate=False)

Check if this DataFrames is or can be computational with no data conversion.

link_to_model_parameters(self, model[, logger])

load(type cls, filename)

Reconstruct a DataFrames object from a file persisted with DataFrames.dump.

make_dataframes(self, req_data, *[, ...])

Create a DataFrames object that will satisfy a data request.

make_idca(self, *columns[, selector, ...])

Extract a set of idca values into a new dataframe.

make_idco(self, *columns[, selector, ...])

Extract a set of idco values into a new dataframe.

make_mnl(self)

Generate a simple MNL model that uses the entire data_ca and data_co dataframes.

new_systematic_alternatives(self, groupby[, ...])

Create new systematic alternatives.

read_in_model_parameters(self)

scale_weights(self, scale)

Scale the weights by a fixed exogenous value.

selector_co(self, co_expr)

Filter a DataFrames object based on an idco selector expression.
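Conceptually, an idco selector picks whole cases and then drops the idca rows of the unselected cases as well. A minimal sketch using pandas DataFrame.eval as the expression engine, which is an assumption: the expression syntax larch accepts for co_expr may differ. The function name is hypothetical.

```python
import pandas as pd

def select_cases(co, ca, co_expr):
    """Keep the cases where co_expr is true, in both idco and idca data."""
    mask = co.eval(co_expr).astype(bool)   # one True/False per case
    co_sel = co[mask]
    ca_sel = ca[ca.index.get_level_values(0).isin(co_sel.index)]
    return co_sel, ca_sel

co = pd.DataFrame(
    {"income": [40.0, 75.0, 55.0]}, index=pd.Index([101, 102, 103], name="caseid")
)
ca = pd.DataFrame(
    {"cost": [5.0, 7.0, 4.0, 6.0, 3.0, 5.0]},
    index=pd.MultiIndex.from_product([co.index, [1, 2]], names=["caseid", "altid"]),
)
co_sel, ca_sel = select_cases(co, ca, "income > 50")
```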

set_alternative_names(self, names)

Set the alternative names.

set_data_ch_wide(self, df, graph)

Write an extra wide choice dataframe with choices also on nests.

split(self, splits[, method])

Generate a train/test or similar multi-part split of the data.
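A split of choice data should assign whole cases, not individual rows, to each part, so that all alternatives of a case stay together. The sketch below does a seeded random partition of caseids by fractions; the function name, the seed argument, and the use of a plain random shuffle are assumptions, since the options accepted by the method argument are not described here.

```python
import numpy as np
import pandas as pd

def split_cases(case_ids, fractions, seed=0):
    """Randomly partition case ids into len(fractions) disjoint groups."""
    fractions = np.asarray(fractions, dtype=float)
    fractions = fractions / fractions.sum()          # normalize to sum to 1
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(case_ids))
    # cut points between parts, e.g. [8] for a 0.8/0.2 split of 10 cases
    cuts = np.round(np.cumsum(fractions)[:-1] * len(shuffled)).astype(int)
    return [pd.Index(part) for part in np.split(shuffled, cuts)]

parts = split_cases(range(100, 110), [0.8, 0.2], seed=42)
```

Each part can then select rows with co.loc[part] for idco data and ca[ca.index.get_level_values(0).isin(part)] for idca data.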

standardize(self[, with_mean, with_std])

Standardize the data in idco, idca, and idce arrays.
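Standardizing a column means subtracting its mean and dividing by its standard deviation; the with_mean and with_std flags suggest each step can be switched off. A minimal pandas sketch under that reading; using the population standard deviation (ddof=0) and leaving constant columns unscaled are choices made here, not documented larch behavior:

```python
import pandas as pd

def standardize(df, with_mean=True, with_std=True):
    """Center and/or scale each column of a numeric DataFrame."""
    out = df.astype(float)
    if with_mean:
        out = out - out.mean()
    if with_std:
        # population std (ddof=0); constant columns get a divisor of 1
        std = df.astype(float).std(ddof=0).replace(0.0, 1.0)
        out = out / std
    return out

data = pd.DataFrame({"x": [1.0, 2.0, 3.0], "const": [5.0, 5.0, 5.0]})
z = standardize(data)
```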

statistics(self[, title, header_level, graph])

to_feathers(self, filename[, components])

Output data to a collection of Feather files.

total_weight(self)

The total weight of cases.

unscale_weights(self)

validate_dataservice(self, req_data)

Attributes

alternative_pairs

A dict mapping the id to the name of each alternative.

alternatives

A dict mapping the id to the name of each alternative.

array_ce_altindexes

array_ce_caseindexes

array_ce_reversemap

caseindex

The indexes of the cases.

computational

data_av

data_ca

A pandas.DataFrame in idca format.

data_ca_or_ce

data_ce

data_ch

data_co

A pandas.DataFrame in idco format.

data_wt

n_alts

The number of alternatives.

n_cases

The number of cases.

n_params

n_vars_ca

n_vars_co

param_names

std_scaler_ca

std_scaler_ce

std_scaler_co

sys_alts

The SystematicAlternatives instance used to create this DataFrames.

weight_normalization