DataFrames

A DataFrames object is essentially a collection of related pandas.DataFrame objects that represent data features in idco format and idca format.

class larch.DataFrames(co=None, *args, ca=None, ce=None, av=None, ch=None, wt=None, data_co=None, data_ca=None, data_ce=None, data_av=None, data_ch=None, data_wt=None, alt_names=None, alt_codes=None, crack=False, av_name=None, ch_name=None, wt_name=None, av_as_ce=None, ch_as_ce=None, sys_alts=None, computational=False)

A structured class to hold multi-format discrete choice data.

Parameters:
  • co (pandas.DataFrame) – A dataframe containing idco format data, with one row per case. The index contains the caseids.
  • ca (pandas.DataFrame) – A dataframe containing idca format data, with one row per alternative. The index should be a two-level multi-index, with the first level containing the caseids and the second level containing the altids.
  • ch (pandas.DataFrame) – A dataframe containing choice data, with one row per case and one column per alternative. The index contains the caseids, and the columns contain the alt_codes.
  • wt (pandas.DataFrame or pandas.Series, optional) – A one-column dataframe or series in idco format, with one row per case, containing the case weights.
  • alt_names (Sequence[str]) – A sequence of alternative names as str.
  • alt_codes (Sequence[int]) – A sequence of alternative codes.
  • ch_name (str, optional) – A name to use for the choice variable. If not given, it is inferred from the ch argument if possible. If the ch argument is not given but a name is specified, the named column is looked up in the ca or ce arguments and, if it appears there, used as the choice. Otherwise, if the named column is found in the co argument, the codes in that column are used to identify the choices.
  • wt_name (str, optional) – A name to use for the weight variable. If not given, it is inferred from the wt argument if possible. If the wt argument is not given but a name is specified, the named column is looked up in the co, ca, or ce arguments and used as the weight.
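
The layouts described for co, ca, and ch can be sketched with plain pandas. The example below is a minimal illustration on made-up data, not taken from the larch documentation: the case ids, the column names income, chosen, and travel_time, and the alternative names are all hypothetical. It also shows one way a column of chosen codes in an idco frame relates to a one-column-per-alternative choice frame; this illustrates the data layout only and is not a description of how larch performs the conversion internally.

```python
import pandas as pd

# Hypothetical toy data: three cases, two alternatives coded 1 and 2.
caseids = pd.Index([101, 102, 103], name="caseid")
alt_codes = [1, 2]
alt_names = ["car", "bus"]

# idco format: one row per case, indexed by caseid.
# The "chosen" column holds the code of the chosen alternative.
co = pd.DataFrame(
    {"income": [40.0, 55.0, 30.0], "chosen": [1, 2, 1]},
    index=caseids,
)

# idca format: one row per (case, alternative) pair, with a two-level
# MultiIndex of caseid (first level) and altid (second level).
ca_index = pd.MultiIndex.from_product(
    [caseids, alt_codes], names=["caseid", "altid"]
)
ca = pd.DataFrame(
    {"travel_time": [20.0, 35.0, 15.0, 25.0, 30.0, 28.0]},
    index=ca_index,
)

# Choice data: one row per case, one column per alternative code,
# here derived from the codes in the idco "chosen" column.
ch = pd.DataFrame(
    {code: (co["chosen"] == code).astype(float) for code in alt_codes},
    index=caseids,
)

# With larch installed, frames like these could be passed to the
# constructor documented above, for example:
#   dfs = larch.DataFrames(co=co, ca=ca, ch=ch,
#                          alt_codes=alt_codes, alt_names=alt_names)
```

The constructor call is left as a comment so that the sketch runs with pandas alone.
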

alternative_codes(self)

The alternative codes.

alternative_names(self)

The alternative names.

set_alternative_names(self, names: Union[Mapping, Sequence])

Set the alternative names.

Parameters:
  • names (Mapping or Sequence) – If given as a mapping, the keys are codes that appear in alternative_codes and the values are the corresponding names. Any missing codes will be labeled with the string representation of the code. If given as a sequence, the names must be in the same order as the codes that appear in alternative_codes.
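
The two accepted input forms can be illustrated with a small standalone function. This is a sketch of the rule as described above, not larch's implementation: resolve_alt_names is a hypothetical helper, and the length check on sequences is an added assumption.

```python
from collections.abc import Mapping


def resolve_alt_names(codes, names):
    """Hypothetical helper (not part of larch): return one name per code,
    in the same order as `codes`."""
    if isinstance(names, Mapping):
        # Codes missing from the mapping fall back to str(code).
        return [str(names.get(code, code)) for code in codes]
    # Otherwise treat `names` as a sequence aligned with `codes`.
    names = list(names)
    if len(names) != len(codes):
        # Assumption: a length mismatch is treated as an error here.
        raise ValueError("names must have one entry per alternative code")
    return [str(name) for name in names]


codes = [1, 2, 3]
from_mapping = resolve_alt_names(codes, {1: "car", 3: "walk"})
from_sequence = resolve_alt_names(codes, ["car", "bus", "walk"])
```

In the mapping case, code 2 has no entry and is therefore labeled "2".
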

Attributes

data_co

A pandas.DataFrame in idco format.

This DataFrame should have a simple pandas.Index as the index, where the index values are the caseids.

data_ca

A pandas.DataFrame in idca format.

This DataFrame should have a two-level MultiIndex as the index, where the first level is the caseids and the second level is the alternative codes.
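
The index requirements for data_co and data_ca can be checked on a plain pandas frame before it is assigned. The functions below are hypothetical helpers written for this illustration; they are not part of larch, and larch may apply different or additional checks.

```python
import pandas as pd


def looks_like_idco(df):
    """Hypothetical check: idco data uses a simple (single-level) index."""
    return not isinstance(df.index, pd.MultiIndex)


def looks_like_idca(df):
    """Hypothetical check: idca data uses a two-level MultiIndex
    (caseids first, alternative codes second)."""
    return isinstance(df.index, pd.MultiIndex) and df.index.nlevels == 2


# Made-up example frames.
co = pd.DataFrame(
    {"income": [40.0, 55.0]},
    index=pd.Index([101, 102], name="caseid"),
)
ca = pd.DataFrame(
    {"travel_time": [20.0, 35.0, 15.0, 25.0]},
    index=pd.MultiIndex.from_product(
        [[101, 102], [1, 2]], names=["caseid", "altid"]
    ),
)
```
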

Read-only Attributes

n_alts

The number of alternatives.

n_cases

The number of cases.

caseindex

The indexes of the cases.
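
For intuition, the analogous quantities can be computed directly from an idca-style pandas frame. This sketch assumes made-up data and plain pandas; it does not show how larch derives n_alts, n_cases, or caseindex internally.

```python
import pandas as pd

# Made-up idca frame: three cases, two alternatives each.
ca = pd.DataFrame(
    {"travel_time": [20.0, 35.0, 15.0, 25.0, 30.0, 28.0]},
    index=pd.MultiIndex.from_product(
        [[101, 102, 103], [1, 2]], names=["caseid", "altid"]
    ),
)

# Unique caseids, in order of first appearance (first index level).
caseindex = ca.index.get_level_values(0).unique()

# Number of cases and number of distinct alternative codes.
n_cases = len(caseindex)
n_alts = ca.index.get_level_values(1).nunique()
```
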