Model

A Model is the core object used to represent a discrete choice model.

class larch.Model(utility_ca=None, utility_co=None, quantity_ca=None, constraints=None, **kwargs)

A discrete choice model.

Parameters:
  • parameters (Sequence, optional) – The names of parameters used in this model. It is generally not necessary to define parameter names at initialization, as the names can (and will) be collected from the utility function and nesting components later.
  • utility_ca (LinearFunction_C, optional) – The utility_ca function, which represents the qualitative portion of utility for attributes that vary by alternative.
  • utility_co (DictOfLinearFunction, optional) – The utility_co function, which represents the qualitative portion of utility for attributes that vary by decision maker but not by alternative.
  • quantity_ca (LinearFunction_C, optional) – The quantity_ca function, which represents the quantitative portion of utility for attributes that vary by alternative.
  • quantity_scale (str, optional) – The name of the parameter used to scale the quantitative portion of utility.
  • graph (NestingTree, optional) – The nesting tree for this choice model.
  • dataservice (DataService, optional) – An object that can act as a DataService to generate the data needed for this model.

Utility Function Definition

Note that these function definitions act like properties of the Model object, instead of methods on Model objects.

Model.utility_ca

The portion of the utility function computed from idca data.

Examples

>>> from larch import Model, P, X
>>> m = Model()
>>> m.utility_ca = P.Param1 * X.Data1 + P.Param2 * X.Data2
>>> print(m.utility_ca)
P.Param1 * X.Data1 + P.Param2 * X.Data2
Type: LinearFunction_C

Model.utility_co

The portion of the utility function computed from idco data.

The keys of this mapping are alternative codes for the applicable elemental alternatives, and the values are linear functions to compute for the indicated alternative. Each alternative that has any idco utility components must have a unique linear function given.

Type: DictOfLinearFunction_C

Model.quantity_ca

The portion of the quantity function computed from idca data.

Note that for the quantity function, the actual computed linear function uses the exponential of the parameter value(s), not the raw values. Thus, if the quantity function is given as P.Param1 * X.Data1 + P.Param2 * X.Data2, the computed values will actually be exp(P.Param1) * X.Data1 + exp(P.Param2) * X.Data2. This transformation ensures that the outcome from the quantity function is always positive, so long as all of the data terms in the function are positive. The LinearFunction_C class itself is not intrinsically aware of this implementation detail, but the Model.utility_functions() method is, and will render the complete utility function in a mathematically correct form.
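The effect of this transformation can be illustrated outside of larch with plain Python. This is only a minimal sketch, not larch's implementation: the helper name quantity_value and all parameter and data values are hypothetical.

```python
import math

def quantity_value(params, terms):
    """Sum exp(parameter) * data over the terms of a quantity function."""
    return sum(math.exp(params[name]) * x for name, x in terms)

# Hypothetical raw parameter values; note that Param2 is negative.
params = {"Param1": 0.0, "Param2": -1.5}
# Each term pairs a parameter name with a positive data value.
terms = [("Param1", 2.0), ("Param2", 4.0)]

q = quantity_value(params, terms)
print(q > 0)  # the total stays positive even though Param2 < 0
```

Because each coefficient is exp(parameter), a negative raw parameter shrinks a term toward zero but can never flip its sign.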

Examples

>>> from larch import Model, P, X
>>> m = Model()
>>> m.quantity_ca = P.Param1 * X.Data1 + P.Param2 * X.Data2
>>> print(m.quantity_ca)
P.Param1 * X.Data1 + P.Param2 * X.Data2
Type: LinearFunction_C

Parameter Manipulation

Model.set_value(self, name, value=None, **kwargs)

Set the value for a single model parameter.

This function will set the current value of a parameter. Unless explicitly instructed with an alternate value, the new value will also be saved as the “initial” value of the parameter.

Parameters:
  • name (str) – The name of the parameter to set.
  • value (float) – The numerical value to set for the parameter.
  • initvalue (float, optional) – If given, this value is used to indicate the initial value for the parameter, which may be different from the current value.
  • nullvalue (float, optional) – If given, this will overwrite any existing null value for the parameter. If not given, the null value for the parameter is not changed.

Model.lock_value(self, name, value, note=None, change_check=True)

Set a fixed value for a model parameter.

Parameters with a fixed value (i.e., with “holdfast” set to 1) will not be changed during estimation by the likelihood maximization algorithm.

Parameters:
  • name (str) – The name of the parameter to set to a fixed value.
  • value (float) – The numerical value to set for the parameter.
  • note (str, optional) – A note as to why this parameter is set to a fixed value. This will not affect the mathematical treatment of the parameter in any way, but may be useful for reporting.
  • change_check (bool, default True) – Whether to trigger a check to see if any parameter frame values have changed. Can be set to False to skip this check if you know that the values have not changed or want to delay this check for later, but this may result in problems if the check is needed but not triggered before certain other modeling tasks are performed.

Model.set_values(self, values=None, **kwargs)

Set the parameter values for one or more parameters.

Parameters:
  • values ({'null', 'init', 'best', array-like, dict, scalar}, optional) – New values to set for the parameters. If ‘null’ or ‘init’, the current values are set equal to the null or initial values given in the ‘nullvalue’ or ‘initvalue’ column of the parameter frame, respectively. If ‘best’, the current values are set equal to the values given in the ‘best’ column of the parameter frame, if that column exists, otherwise a ValueError exception is raised. If given as array-like, the array must be a vector with length equal to the length of the parameter frame, and the given vector will replace the current values. If given as a dictionary, the dictionary is used to update kwargs before they are processed.
  • kwargs (dict) – Any keyword arguments (or if values is a dictionary) are used to update the included named parameters only. A warning will be given if any key of the dictionary is not found among the existing named parameters in the parameter frame, and the value associated with that key is ignored. Any parameters not named by key in this dictionary are not changed.

Notes

Setting parameters both in the values argument and through keyword assignment is not explicitly disallowed, although it is not recommended and may be disallowed in the future.

Raises: ValueError – If setting to ‘best’ but there is no ‘best’ column in the pf parameters DataFrame.
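The update-by-name behavior described for dictionaries and keyword arguments can be sketched in plain Python. This is an illustration of the documented semantics only, not larch's code: update_named is a hypothetical helper, and a plain dict stands in for the parameter frame.

```python
import warnings

def update_named(current, values=None, **kwargs):
    """Return a copy of `current` updated by name; unknown names warn and are ignored."""
    if isinstance(values, dict):
        kwargs.update(values)  # a dict of values is merged into the keyword arguments
    updated = dict(current)
    for name, value in kwargs.items():
        if name in updated:
            updated[name] = value
        else:
            warnings.warn(f"parameter {name!r} not found, value ignored")
    return updated

current = {"Param1": 0.0, "Param2": 0.0}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = update_named(current, {"Param1": 1.25}, NoSuchParam=9.0)

print(result)
print(len(caught))
```

Param2 is not named, so it keeps its value, and the unknown name produces a warning rather than an error.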

Estimation and Application

Model.load_data(self, dataservice=None, autoscale_weights=True, log_warnings=True)

Load dataframes as required from the dataservice.

This method prepares the data for estimation. It is used to pre-process the data, extracting the required values, pre-computing the values of fixed expressions, and assembling the results into contiguous arrays suitable for computing the log likelihood values efficiently.

Parameters:
  • dataservice (DataService, optional) – A dataservice from which to load data. If a dataservice has not been previously defined for this model, this argument is not optional.
  • autoscale_weights (bool, default True) – If True and data_wt is not None, the loaded dataframes will have the weights automatically scaled such that the average value for data_wt is 1.0. See autoscale_weights for more information.
  • log_warnings (bool, default True) – Emit warnings in the logger if choice, avail, or weight is not included in req_data but is set in the dataservice, and thus returned by default even though it was not requested.
Raises:

ValueError – If no dataservice is given nor pre-defined.
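The weight scaling described for autoscale_weights amounts to dividing every weight by the mean weight. A minimal sketch in plain Python, with a hypothetical helper name and invented weights (larch's own routine may differ in detail):

```python
def autoscale(weights):
    """Rescale weights so that their average value is 1.0."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]

raw_weights = [2.0, 4.0, 6.0]
scaled = autoscale(raw_weights)
print(scaled)
```

The ratios between weights are unchanged; only their overall scale is normalized.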

Model.maximize_loglike()

Maximize the log likelihood.

The dataframes for this model should previously have been prepared using the load_data method.

Parameters:
  • method (str, optional) – The optimization method to use. See scipy.optimize for most possibilities, or use ‘BHHH’. Defaults to SLSQP if there are any constraints or finite parameter bounds, otherwise defaults to BHHH.
  • quiet (bool, default False) – Whether to suppress the dashboard.
Returns:

A dictionary of results, including final log likelihood, elapsed time, and other statistics. The exact items included in output will vary by estimation method.

Return type:

dictx

Raises:

ValueError – If the dataframes are not already loaded.

Model.calculate_parameter_covariance()

Compute the parameter covariance matrix.

This function computes the parameter covariance by taking the inverse of the hessian (the 2nd derivative of the log likelihood with respect to the parameters).

It does not return values directly, but rather stores the result in covariance_matrix, and computes the standard error of the estimators (i.e. the square root of the diagonal) and stores those values in pf[‘std_err’].
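The relationship between the hessian, the covariance matrix, and the standard errors can be sketched for a hypothetical two-parameter model. The numbers below are invented, and the sketch assumes the hessian is that of the negative log likelihood, so that it is positive definite at the optimum; this is an illustration, not larch's implementation.

```python
import math

def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical hessian of the negative log likelihood at the optimum.
hessian = [[4.0, 1.0], [1.0, 2.0]]

covariance = invert_2x2(hessian)
# Standard errors are the square roots of the diagonal of the covariance matrix.
std_err = [math.sqrt(covariance[i][i]) for i in range(2)]
print(std_err)
```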

Parameters: like_ratio (bool, default True) – For parameters where the

Model.estimate()

A convenience method to load data, maximize loglike, and get covariance.

This runs the following methods in order:

  • load_data
  • maximize_loglike
  • calculate_parameter_covariance

Parameters:
  • dataservice (DataService, optional) – A dataservice from which to load data. If a dataservice has not been previously defined for this model, this argument is not optional.
  • autoscale_weights (bool, default True) – If True and data_wt is not None, the loaded dataframes will have the weights automatically scaled such that the average value for data_wt is 1.0. See autoscale_weights for more information.
  • **kwargs – All other keyword arguments are passed through to maximize_loglike.
Returns:

Return type:

dictx

Scikit-Learn Interface

Model.fit(X, y, sample_weight=None, **kwargs)

Estimate the parameters of this model from the training set (X, y).

Parameters:
  • X (pandas.DataFrame) – This DataFrame can be in idca, idce, or idco formats. If given in idce format, this is a DataFrame with n_casealts rows, and a two-level MultiIndex.
  • y (array-like or str) – The target choice values. If given as a str, use that named column of X.
  • sample_weight (array-like, shape = [n_cases] or [n_casealts], or None) – Sample weights. If None, then samples are equally weighted. If shape is n_casealts, the array is collapsed to n_cases by taking only the first weight in each case.
Returns:

self

Return type:

Model
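The collapsing of sample_weight from n_casealts to n_cases can be pictured as keeping the first weight encountered for each case. A minimal sketch in plain Python, where collapse_weights and the sample data are hypothetical and not part of larch:

```python
def collapse_weights(case_ids, weights):
    """Keep the first weight seen for each case, in order of first appearance."""
    first = {}
    for case, w in zip(case_ids, weights):
        first.setdefault(case, w)
    return list(first.values())

# Five case-alternative rows: three for case 1 and two for case 2.
case_ids = [1, 1, 1, 2, 2]
casealt_weights = [1.5, 1.5, 1.5, 0.5, 0.5]

case_weights = collapse_weights(case_ids, casealt_weights)
print(case_weights)
```

Since only the first weight per case is used, any differing weights on later rows of the same case would be silently discarded.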

Model.predict(X)

Predict choices for X.

This method returns the index of the maximum probability choice, not the probability. To recover the probability, which is probably what you want (pun intended), see predict_proba().

Parameters: X (pandas.DataFrame)
Returns: y – The predicted choices.
Return type: array of shape = [n_cases]
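The relationship between the two prediction methods can be sketched with a hypothetical probability array of shape [n_cases, n_alts]: the predicted choice for each case is the index of its largest probability. The helper below is illustrative only and assumes that ties go to the first alternative.

```python
def argmax_rows(proba):
    """Return the index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba]

# Hypothetical probabilities for two cases and three alternatives.
proba = [
    [0.2, 0.5, 0.3],
    [0.6, 0.1, 0.3],
]

choices = argmax_rows(proba)
print(choices)
```

The second case shows why the probabilities are usually more informative: alternative 0 is predicted, yet 40 percent of the probability lies elsewhere.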
Model.predict_proba(X)

Predict probability for X.

Parameters: X (pandas.DataFrame)
Returns: y – The predicted probabilities.
Return type: array of shape = [n_cases, n_alts]

Reporting and Outputs

Model.parameter_summary(self, output=u'df')

Create a tabular summary of parameter values.

This will generate a small table of parameter statistics, containing:

  • Parameter Name (and Category, if applicable)
  • Estimated Value
  • Standard Error of the Estimate (if known)
  • t Statistic (if known)
  • Null Value
  • Binding Constraints (if applicable)
Parameters: output ({'df','xml'}) – The format of the output. The default, ‘df’, creates a pandas DataFrame. Alternatively, use ‘xml’ to create a table as an xmle.Elem (this format is no longer preferred).
Returns:
Return type: pandas.DataFrame or xmle.Elem

Model.utility_functions(subset=None, resolve_parameters=False)

Generate an XHTML output of the utility function(s).

Parameters:
  • subset (Collection, optional) – A collection of alternative codes to include. This only has effect if there are separate utility_co functions set by alternative. It is recommended to use this parameter if there are a very large number of alternatives, and the utility functions of most (or all) of them can be effectively communicated by showing only a few.
  • resolve_parameters (bool, default False) – Whether to resolve the parameters to the current (estimated) value in the output.
Returns:

Return type:

xmle.Elem

Model.estimation_statistics()

Create an XHTML summary of estimation statistics.

This will generate a small table of estimation statistics, containing:

  • Log Likelihood at Convergence
  • Log Likelihood at Null Parameters (if known)
  • Log Likelihood with No Model (if known)
  • Log Likelihood at Constants Only (if known)

Additionally, for each included reference value (i.e. everything except log likelihood at convergence) the rho squared with respect to that value is also given.

Each statistic is reported in aggregate, as well as per case.
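As an illustration, the rho squared and per-case figures can be computed from log likelihood values as follows. This sketch assumes the conventional definition, rho squared = 1 - LL(converged) / LL(reference); the log likelihood values and case count are invented.

```python
def rho_squared(ll_converged, ll_reference):
    """Rho squared of a converged log likelihood with respect to a reference value."""
    return 1.0 - ll_converged / ll_reference

n_cases = 500          # hypothetical number of cases
ll_converged = -400.0  # hypothetical log likelihood at convergence
ll_null = -550.0       # hypothetical log likelihood at null parameters

rho_sq_null = rho_squared(ll_converged, ll_null)
ll_per_case = ll_converged / n_cases
print(round(rho_sq_null, 4), ll_per_case)
```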

Parameters: compute_loglike_null (bool, default True) – If the log likelihood at null values has not already been computed (i.e., if it is not cached) then compute it, cache its value, and include it in the output.
Returns:
Return type: xmle.Elem

Visualization Tools

Model.distribution_on_idca_variable(x, xlabel=None, ylabel='Relative Frequency', style='hist', bins=None, pct_bins=20, range=None, xlim=None, prob_label='Modeled', obs_label='Observed', subselector=None, probability=None, bw_method=None, discrete=None, ax=None, format='ax', **kwargs)

Generate a figure of observed and modeled choices over a range of variable values.

Parameters:
  • model (Model) – The discrete choice model to analyze.
  • x (str or array-like) – The name of an idca variable, or an array giving its values. If this name exactly matches that of an idca column in the model’s loaded dataframes, then those values are used, otherwise the variable is loaded from the model’s dataservice.
  • xlabel (str, optional) – A label to use for the x-axis of the resulting figure. If not given, the value of x is used if it is a string. Set to False to omit the x-axis label.
  • ylabel (str, default "Relative Frequency") – A label to use for the y-axis of the resulting figure.
  • style ({'hist', 'kde'}) – The style of figure to produce, either a histogram or a kernel density plot.
  • bins (int, default 25) – The number of bins to use, only applicable to histogram style.
  • range (2-tuple, optional) – A range to truncate the figure. (alias xlim)
  • prob_label (str, default "Modeled") – A label to put in the legend for the modeled probabilities
  • obs_label (str, default "Observed") – A label to put in the legend for the observed choices
  • subselector (str or array-like, optional) – A filter to apply to cases. If given as a string, this is loaded from the model’s dataservice as an idco variable.
  • probability (array-like, optional) – The pre-calculated probability array for all cases in this analysis. If not given, the probability array is calculated at the current parameter values.
  • ax (matplotlib.Axes, optional) – If given, the figure will be drawn on these axes and they will be returned, otherwise new blank axes are used to draw the figure.
  • format ({'ax', 'figure', 'svg'}, default 'ax') – How to return the result if it is a figure. The default is to return the raw matplotlib Axes instance. Change this to svg to get an SVG rendering as an xmle.Elem.
Other Parameters:
  • header (str, optional) – A header to attach to the figure. The header is not generated using matplotlib, but instead is prepended to the xml output with a header tag before the rendered svg figure.

Returns:

Returns ax if given as an argument, otherwise returns a rendering as an Elem

Return type:

Elem or Axes

Model.distribution_on_idco_variable(x, xlabel=None, bins=None, pct_bins=20, figsize=(12, 4), style='stacked', discrete=None, xlim=None, include_nests=False, exclude_alts=None, filter=None, format='figure', **kwargs)

Generate a figure of variables over a range of variable values.

Parameters:
  • model (Model) – The discrete choice model to analyze.
  • x (str or array-like) – The name of an idco variable, or an array giving its values. If this name exactly matches that of an idco column in the model’s loaded dataframes, then those values are used, otherwise the variable is loaded from the model’s dataservice.
  • xlabel (str, optional) – A label to use for the x-axis of the resulting figure. If not given, the value of x is used if it is a string. Set to False to omit the x-axis label.
  • bins (int, optional) – The number of equal-sized bins to use.
  • pct_bins (int or array-like, default 20) – The number of equal-mass bins to use.
  • style ({'stacked', 'dataframe', 'many'}) – The type of output to generate.
  • discrete (bool, default False) – Whether to treat the data values explicitly as discrete (vs continuous) data. This will change the styling and automatic bin generation. If there are very few unique values, the data will be assumed to be discrete anyhow.
  • xlim (2-tuple, optional) – Explicitly set the range of values shown on the x axis of generated figures. This can truncate long tails. The actual histogram bins are not changed.
  • include_nests (bool, default False) – Whether to include nests in the figure.
  • exclude_alts (Collection, optional) – Alternatives to exclude from the figure.
  • filter (str, optional) – A filter that will be used to select only a subset of cases.
  • format ({'figure','svg'}, default 'figure') – How to return the result if it is a figure. The default is to return the raw matplotlib Figure instance, or set to svg to get an SVG rendering as an xmle.Elem.
Returns:

Return type:

Figure, DataFrame, or Elem
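The difference between equal-sized bins (bins) and equal-mass bins (pct_bins) can be sketched in plain Python. The helpers below are hypothetical and use a crude nearest-rank quantile; larch's actual binning rules may differ.

```python
def equal_width_edges(values, n_bins):
    """Bin edges that split the range of the data into equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]

def equal_mass_edges(values, n_bins):
    """Bin edges at evenly spaced ranks, so each bin holds a similar share of the data."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[round(i * last / n_bins)] for i in range(n_bins + 1)]

# Hypothetical data with one extreme value in the tail.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

width_edges = equal_width_edges(values, 2)
mass_edges = equal_mass_edges(values, 2)
print(width_edges)
print(mass_edges)
```

With a long tail, equal-width bins put nearly all observations in the first bin, while equal-mass bins place the interior edge near the median.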