Model

A Model is the core object used to represent a discrete choice model.

class larch.Model(utility_ca=None, utility_co=None, quantity_ca=None, **kwargs)

A discrete choice model.

Parameters:
  • parameters (Sequence, optional) – The names of parameters used in this model. It is generally not necessary to define parameter names at initialization, as the names can (and will) be collected from the utility function and nesting components later.
  • utility_ca (LinearFunction_C, optional) – The utility_ca function, which represents the qualitative portion of utility for attributes that vary by alternative.
  • utility_co (DictOfLinearFunction, optional) – The utility_co function, which represents the qualitative portion of utility for attributes that vary by decision maker but not by alternative.
  • quantity_ca (LinearFunction_C, optional) – The quantity_ca function, which represents the quantitative portion of utility for attributes that vary by alternative.
  • quantity_scale (str, optional) – The name of the parameter used to scale the quantitative portion of utility.
  • graph (NestingTree, optional) – The nesting tree for this choice model.
  • dataservice (DataService, optional) – An object that can act as a DataService to generate the data needed for this model.

Utility Function Definition

Note that these function definitions act like properties of the Model object, instead of methods on Model objects.

Model.utility_ca

The portion of the utility function computed from idca data.

Examples

>>> from larch import Model, P, X
>>> m = Model()
>>> m.utility_ca = P.Param1 * X.Data1 + P.Param2 * X.Data2
>>> print(m.utility_ca)
P.Param1 * X.Data1 + P.Param2 * X.Data2
Type: LinearFunction_C

Model.utility_co

The portion of the utility function computed from idco data.

The keys of this mapping are alternative codes for the applicable elemental alternatives, and the values are linear functions to compute for the indicated alternative. Each alternative that has any idco utility components must have a unique linear function given.
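As a rough illustration of the structure described above, the mapping can be pictured as a plain dictionary keyed by alternative code. The sketch below is a plain-Python stand-in, not larch's DictOfLinearFunction_C class; the parameter names, data names, values, and alternative codes are all hypothetical.

```python
# Plain-Python stand-in for an idco utility mapping (not larch's own classes).
# Keys are alternative codes; values are lists of (parameter, data) terms.
utility_co = {
    1: [("ASC_car", "1"), ("B_income_car", "income")],
    2: [("ASC_bus", "1")],
    # Alternative 3 has no idco utility components, so it has no entry.
}

params = {"ASC_car": 0.5, "B_income_car": 0.02, "ASC_bus": -0.3}  # hypothetical values
case = {"1": 1.0, "income": 40.0}  # idco data for a single decision maker


def idco_utility(alt_code):
    """Sum parameter * data over the terms defined for one alternative."""
    terms = utility_co.get(alt_code, [])
    return sum(params[p] * case[x] for p, x in terms)


utilities = {alt: idco_utility(alt) for alt in (1, 2, 3)}
print(utilities)
```

Each alternative gets its own linear function, and an alternative without an entry simply contributes zero idco utility in this toy version.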

Type: DictOfLinearFunction_C

Model.quantity_ca

The portion of the quantity function computed from idca data.

Note that for the quantity function, the actual computed linear function uses the exponential of the parameter value(s), not the raw values. Thus, if the quantity function is given as P.Param1 * X.Data1 + P.Param2 * X.Data2, the computed values will actually be exp(P.Param1) * X.Data1 + exp(P.Param2) * X.Data2. This transformation ensures that the outcome from the quantity function is always positive, so long as all of the data terms in the function are positive. The LinearFunction_C class itself is not intrinsically aware of this implementation detail, but the Model.utility_functions() method is, and will render the complete utility function in a mathematically correct form.
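The effect of the exponential transformation can be checked by hand. The following sketch uses only the standard library and made-up numbers; it mirrors the formula stated above rather than calling into larch, and the parameter and data values are assumptions chosen for illustration.

```python
import math

# Hypothetical parameter values and data for one alternative.
params = {"Param1": 0.0, "Param2": -1.0}
data = {"Data1": 100.0, "Data2": 250.0}

# The quantity function as written: P.Param1 * X.Data1 + P.Param2 * X.Data2
terms = [("Param1", "Data1"), ("Param2", "Data2")]

# Naive linear evaluation (NOT what is computed for the quantity function).
naive = sum(params[p] * data[x] for p, x in terms)

# Evaluation as described above: each parameter enters through exp().
quantity = sum(math.exp(params[p]) * data[x] for p, x in terms)

print(naive)     # negative for these values
print(quantity)  # positive, because exp() > 0 and the data are positive
```

A parameter value of zero therefore corresponds to a multiplier of one on its data term, and negative parameter values shrink the term without ever changing its sign.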

Examples

>>> from larch import Model, P, X
>>> m = Model()
>>> m.quantity_ca = P.Param1 * X.Data1 + P.Param2 * X.Data2
>>> print(m.quantity_ca)
P.Param1 * X.Data1 + P.Param2 * X.Data2
Type: LinearFunction_C

Parameter Manipulation

Model.set_value(self, name, value=None, **kwargs)

Set the value for a model parameter.

This function will set the current value of a parameter. Unless explicitly instructed with an alternate value, the new value will also be saved as the “initial” value of the parameter.

Parameters:
  • name (str) – The name of the parameter to set.
  • value (float) – The numerical value to set for the parameter.
  • initvalue (float, optional) – If given, this value is used to indicate the initial value for the parameter, which may be different from the current value.
  • nullvalue (float, optional) – If given, this will overwrite any existing null value for the parameter. If not given, the null value for the parameter is not changed.

Model.lock_value(self, name, value, note=None, change_check=True)

Set a fixed value for a model parameter.

Parameters with a fixed value (i.e., with “holdfast” set to 1) will not be changed during estimation by the likelihood maximization algorithm.

Parameters:
  • name (str) – The name of the parameter to set to a fixed value.
  • value (float) – The numerical value to set for the parameter.
  • note (str, optional) – A note as to why this parameter is set to a fixed value. This will not affect the mathematical treatment of the parameter in any way, but may be useful for reporting.
  • change_check (bool, default True) – Whether to trigger a check to see if any parameter frame values have changed. Set this to False to skip the check if you know that the values have not changed, or if you want to defer the check until later. Skipping it may cause problems if the check is needed but is not triggered before certain other modeling tasks are performed.
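To illustrate what holding a parameter fixed means in practice, the following toy sketch applies one gradient-ascent style update to a small parameter table and skips any entry whose holdfast flag is set. It is a hypothetical stand-in for illustration only, not larch's estimation code; the table layout, step size, parameter names, and gradient values are all assumptions.

```python
# Toy parameter table: name -> current value and holdfast flag (hypothetical layout).
parameters = {
    "ASC_bus": {"value": 0.0, "holdfast": 0},
    "B_time": {"value": -0.01, "holdfast": 0},
    "B_cost": {"value": -0.5, "holdfast": 1},  # locked: must not change
}

# Made-up gradient of the log likelihood with respect to each parameter.
gradient = {"ASC_bus": 2.0, "B_time": -1.0, "B_cost": 4.0}


def ascent_step(table, grad, step_size=0.1):
    """Return updated values, leaving holdfast parameters untouched."""
    updated = {}
    for name, row in table.items():
        if row["holdfast"]:
            updated[name] = row["value"]  # fixed parameters are skipped
        else:
            updated[name] = row["value"] + step_size * grad[name]
    return updated


new_values = ascent_step(parameters, gradient)
print(new_values)
```

Even though the gradient for B_cost is nonzero, its value is carried through unchanged, which is the behavior a fixed parameter is expected to show during likelihood maximization.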

Scikit-Learn Interface

Model.fit(X, y, sample_weight=None, **kwargs)

Estimate the parameters of this model from the training set (X, y).

Parameters:
  • X (pandas.DataFrame) – This DataFrame can be in idca, idce, or idco formats. If given in idce format, this is a DataFrame with n_casealts rows, and a two-level MultiIndex.
  • y (array-like or str) – The target choice values. If given as a str, use that named column of X.
  • sample_weight (array-like, shape = [n_cases] or [n_casealts], or None) – Sample weights. If None, then samples are equally weighted. If shape is n_casealts, the array is collapsed to n_cases by taking only the first weight in each case.
Returns: self
Return type: Model
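The collapsing rule for sample_weight described in the parameter list for fit can be sketched in plain Python. This is an illustration of the stated rule (keep the first weight in each case), not larch's implementation; the case identifiers and weights are hypothetical.

```python
# Rows are case-alternative pairs; each row carries its case id and a weight.
# All values here are hypothetical.
case_ids = [101, 101, 101, 102, 102, 103, 103, 103]
row_weights = [1.5, 1.5, 1.5, 0.8, 0.8, 2.0, 2.0, 2.0]


def collapse_weights(case_ids, row_weights):
    """Keep only the first weight encountered for each case, in row order."""
    first_weight = {}
    for case, weight in zip(case_ids, row_weights):
        if case not in first_weight:
            first_weight[case] = weight
    return list(first_weight.values())


case_weights = collapse_weights(case_ids, row_weights)
print(case_weights)
```

Because only the first weight per case survives, weights that differ across the alternatives of a single case would be silently discarded under this rule, so it is safest to supply one consistent weight per case.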

Model.predict(X)

Predict choices for X.

This method returns the index of the maximum probability choice, not the probability. To recover the probability, which is probably what you want (pun intended), see predict_proba().

Parameters: X (pandas.DataFrame)
Returns: y – The predicted choices.
Return type: array of shape = [n_cases]

Model.predict_proba(X)

Predict probability for X.

Parameters: X (pandas.DataFrame)
Returns: y – The predicted probabilities.
Return type: array of shape = [n_cases, n_alts]
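The relationship between predict and predict_proba can be sketched without the library: the predicted choice for each case is the position of the largest value in that case's row of probabilities. The probability values below are hypothetical, and the helper function is an illustration rather than part of larch.

```python
# Hypothetical predicted probabilities: one row per case, one column per alternative.
proba = [
    [0.70, 0.20, 0.10],
    [0.25, 0.30, 0.45],
    [0.40, 0.45, 0.15],
]


def argmax_choices(proba):
    """Return the column index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba]


choices = argmax_choices(proba)
print(choices)
```

The third case shows why the probabilities are usually more informative: alternatives 0 and 1 are nearly tied, yet the index-only output records alternative 1 with no indication of how close the call was.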

Reporting and Outputs

Model.parameter_summary(self)

Create an XHTML summary of parameter values.

This will generate a small table of parameter statistics, containing:

  • Parameter Name (and Category, if applicable)
  • Estimated Value
  • Standard Error of the Estimate (if known)
  • t Statistic (if known)
  • Null Value
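For readers unfamiliar with how these columns relate, a t statistic is conventionally the distance between the estimate and its null value, measured in standard errors. The sketch below assumes that conventional definition; whether larch computes it in exactly this way should be confirmed against the library itself, and the numbers are made up.

```python
def t_statistic(estimate, std_err, null_value=0.0):
    """Conventional t statistic of an estimate against its null value."""
    if std_err is None or std_err <= 0:
        return None  # not computable when the standard error is unknown
    return (estimate - null_value) / std_err


# Hypothetical estimates: (name, estimated value, standard error, null value).
rows = [
    ("B_time", -0.045, 0.009, 0.0),
    ("Mu_nest", 0.8, 0.1, 1.0),    # tested against a null value of 1
    ("B_cost", -0.5, None, 0.0),   # standard error not known
]

for name, value, std_err, null_value in rows:
    print(name, t_statistic(value, std_err, null_value))
```

Under this definition, a parameter whose natural null is not zero (such as the second row) is tested against that null rather than against zero.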
Return type: xmle.Elem

Model.utility_functions(subset=None, resolve_parameters=False)

Generate an XHTML output of the utility function(s).

Parameters:
  • subset (Collection, optional) – A collection of alternative codes to include. This only has effect if there are separate utility_co functions set by alternative. It is recommended to use this parameter if there are a very large number of alternatives, and the utility functions of most (or all) of them can be effectively communicated by showing only a few.
  • resolve_parameters (bool, default False) – Whether to resolve the parameters to the current (estimated) value in the output.
Return type: xmle.Elem

Model.estimation_statistics()

Create an XHTML summary of estimation statistics.

This will generate a small table of estimation statistics, containing:

  • Log Likelihood at Convergence
  • Log Likelihood at Null Parameters (if known)
  • Log Likelihood with No Model (if known)
  • Log Likelihood at Constants Only (if known)

Additionally, for each included reference value (i.e. everything except log likelihood at convergence) the rho squared with respect to that value is also given.

Each statistic is reported in aggregate, as well as per case.
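The quantities in this table are related by simple arithmetic. The sketch below assumes the usual definition of rho squared, one minus the ratio of the log likelihood at convergence to the reference log likelihood, and divides by the number of cases for the per-case figure. The log likelihood values and case count are hypothetical, and the function is an illustration rather than larch's code.

```python
def rho_squared(ll_converged, ll_reference):
    """Rho squared of the converged model relative to a reference log likelihood."""
    return 1.0 - ll_converged / ll_reference


n_cases = 500            # hypothetical number of cases
ll_converged = -420.0    # hypothetical log likelihood at convergence
references = {
    "null parameters": -549.3,
    "constants only": -480.0,
}

for label, ll_ref in references.items():
    print(label, round(rho_squared(ll_converged, ll_ref), 3))

# Per-case log likelihood: the aggregate value divided by the number of cases.
ll_per_case = ll_converged / n_cases
print("per case", ll_per_case)
```

Since both log likelihoods scale by the same case count, the rho squared value is the same whether it is computed from aggregate or per-case figures.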

Parameters: compute_loglike_null (bool, default True) – If the log likelihood at null values has not already been computed (i.e., if it is not cached) then compute it, cache its value, and include it in the output.
Return type: xmle.Elem

Visualization Tools

Model.distribution_on_continuous_idca_variable(continuous_variable, xlabel=None, ylabel='Relative Frequency', style='hist', bins=25, range=None, prob_label='Modeled', obs_label='Observed', subselector=None, probability=None, bw_method=None, **kwargs)

Generate a figure of observed and modeled choices over a range of variable values.

Parameters:
  • model (Model) – The discrete choice model to analyze.
  • continuous_variable (str) – The name of an idca variable that is continuous. If this name exactly matches that of an idca column in the model’s loaded dataframes, then those values are used, otherwise the variable is loaded from the model’s dataservice.
  • xlabel (str, optional) – A label to use for the x-axis of the resulting figure. If not given, the value of continuous_variable is used. Set to False to omit the x-axis label.
  • ylabel (str, default "Relative Frequency") – A label to use for the y-axis of the resulting figure.
  • style ({'hist', 'kde'}) – The style of figure to produce, either a histogram or a kernel density plot.
  • bins (int, default 25) – The number of bins to use, only applicable to histogram style.
  • range (2-tuple, optional) – A range to truncate the figure.
  • prob_label (str, default "Modeled") – A label to put in the legend for the modeled probabilities.
  • obs_label (str, default "Observed") – A label to put in the legend for the observed choices.
  • subselector (str or array-like, optional) – A filter to apply to cases. If given as a string, this is loaded from the model’s dataservice as an idco variable.
  • probability (array-like, optional) – The pre-calculated probability array for all cases in this analysis. If not given, the probability array is calculated at the current parameter values.
Other Parameters:
  • header (str, optional) – A header to attach to the figure. The header is not generated using matplotlib, but instead is prepended to the xml output with a header tag before the rendered svg figure.

Return type: Elem
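The comparison this figure draws can be understood as two weighted histograms over the same variable: one weighted by the observed choice indicators and one weighted by the modeled probabilities. The sketch below shows that general idea in plain Python. It is not the method's implementation, and the helper function, variable values, and bin settings are assumptions for illustration.

```python
def weighted_histogram(values, weights, bins, value_range):
    """Sum weights into equal-width bins and normalize to relative frequency."""
    low, high = value_range
    width = (high - low) / bins
    totals = [0.0] * bins
    for v, w in zip(values, weights):
        if v < low or v > high:
            continue  # values outside the range are dropped
        index = min(int((v - low) / width), bins - 1)  # top edge goes in last bin
        totals[index] += w
    grand_total = sum(totals)
    return [t / grand_total for t in totals] if grand_total else totals


# One entry per case-alternative pair (hypothetical data).
travel_time = [5.0, 12.0, 18.0, 7.0, 22.0, 29.0]
chosen = [1, 0, 0, 0, 1, 0]                    # observed choice indicators
probability = [0.6, 0.3, 0.1, 0.5, 0.3, 0.2]   # modeled probabilities

observed = weighted_histogram(travel_time, chosen, bins=3, value_range=(0.0, 30.0))
modeled = weighted_histogram(travel_time, probability, bins=3, value_range=(0.0, 30.0))
print(observed)
print(modeled)
```

Where the two series diverge across bins, the model is over- or under-predicting choices for alternatives in that range of the variable.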