1: MTC MNL Mode Choice
1: MTC MNL Mode Choice¶
import pandas as pd
import larch.numba as lx
This example is a mode choice model built using the MTC example dataset. First we create the Dataset and Model objects:
d = lx.examples.MTC(format='dataset')
d
<xarray.Dataset> Dimensions: (caseid: 5029, altid: 6) Coordinates: * caseid (caseid) int64 1 2 3 4 5 6 7 ... 5024 5025 5026 5027 5028 5029 * altid (altid) int64 1 2 3 4 5 6 alt_names (altid) <U7 'DA' 'SR2' 'SR3+' 'Transit' 'Bike' 'Walk' Data variables: (12/38) hhid (caseid) int64 2 3 5 6 8 8 12 ... 9429 9430 9433 9434 9436 9438 perid (caseid) int64 1 1 1 1 2 3 2 2 1 1 2 2 ... 2 1 1 2 1 1 1 1 1 1 1 numalts (caseid) int64 2 2 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2 2 dist (caseid) float64 7.69 11.62 4.1 14.58 ... 4.34 12.06 2.95 0.73 wkzone (caseid) int64 664 738 696 665 679 ... 1002 1012 1009 1030 1021 hmzone (caseid) int64 726 9 667 8 704 665 ... 978 1023 1019 1030 1021 ... ... ivtt (caseid, altid) float64 13.38 18.38 20.38 25.9 ... 1.59 6.55 0.0 ovtt (caseid, altid) float64 2.0 2.0 2.0 15.2 2.0 ... 4.5 16.0 4.5 0.0 tottime (caseid, altid) float64 15.38 20.38 22.38 ... 17.59 11.05 19.1 totcost (caseid, altid) float64 70.63 35.32 20.18 115.6 ... 75.0 0.0 0.0 altnum (caseid, altid) int64 1 2 3 4 5 0 1 2 3 4 ... 3 4 0 6 1 2 3 4 5 6 avail (caseid, altid) int8 1 1 1 1 1 0 1 1 1 1 ... 1 1 0 1 1 1 1 1 1 1 Attributes: _caseid_: caseid _altid_: altid
m = lx.Model(d)
Then we can build up the utility function. We’ll use some :ref:idco
data first, using
the Model.utility.co
attribute. This attribute is a dict-like object, to which
we can assign :class:LinearFunction
objects for each alternative code.
from larch import P, X, PX
m.utility_co[2] = P("ASC_SR2") + P("hhinc#2") * X("hhinc")
m.utility_co[3] = P("ASC_SR3P") + P("hhinc#3") * X("hhinc")
m.utility_co[4] = P("ASC_TRAN") + P("hhinc#4") * X("hhinc")
m.utility_co[5] = P("ASC_BIKE") + P("hhinc#5") * X("hhinc")
m.utility_co[6] = P("ASC_WALK") + P("hhinc#6") * X("hhinc")
Next we’ll use some idca data, with the utility_ca
attribute. This attribute
is only a single :class:LinearFunction
that is applied across all alternatives
using :ref:idca
data. Because the data is structured to vary across alternatives,
the parameters (and thus the structure of the :class:LinearFunction
) does not need
to vary across alternatives.
m.utility_ca = PX("tottime") + PX("totcost")
Lastly, we need to identify :ref:idca
data that gives the availability for each
alternative, as well as the number of times each alternative is chosen. (In traditional
discrete choice analysis, this is often 0 or 1, but it need not be binary, or even integral.)
m.availability_var = 'avail'
m.choice_ca_var = 'chose'
And let’s give our model a descriptive title.
m.title = "MTC Example 1 (Simple MNL)"
We can view a summary of the choices and alternative availabilities to make sure the model is set up correctly.
m.choice_avail_summary()
name | chosen | available | |
---|---|---|---|
altid | |||
1 | DA | 3637 | 4755 |
2 | SR2 | 517 | 5029 |
3 | SR3+ | 161 | 5029 |
4 | Transit | 498 | 4003 |
5 | Bike | 50 | 1738 |
6 | Walk | 166 | 1479 |
< Total All Alternatives > | 5029 |
Having created this model, we can then estimate it:
m.maximize_loglike()
key | value | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
loglike | -3626.18625551293 | ||||||||||||||||||||||||||
x |
| ||||||||||||||||||||||||||
tolerance | 2.0071751679019565e-06 | ||||||||||||||||||||||||||
steps | array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) | ||||||||||||||||||||||||||
message | 'Optimization terminated successfully.' | ||||||||||||||||||||||||||
elapsed_time | 0:00:00.230719 | ||||||||||||||||||||||||||
method | 'bhhh' | ||||||||||||||||||||||||||
n_cases | 5029 | ||||||||||||||||||||||||||
iteration_number | 0 | ||||||||||||||||||||||||||
logloss | 0.7210551313408093 | ||||||||||||||||||||||||||
__verbose_repr__ | True |
m.calculate_parameter_covariance()
m.parameter_summary()
Value | Std Err | t Stat | Signif | Null Value | |
---|---|---|---|---|---|
ASC_BIKE | -2.38 | 0.305 | -7.80 | *** | 0.00 |
ASC_SR2 | -2.18 | 0.105 | -20.81 | *** | 0.00 |
ASC_SR3P | -3.73 | 0.178 | -20.96 | *** | 0.00 |
ASC_TRAN | -0.671 | 0.133 | -5.06 | *** | 0.00 |
ASC_WALK | -0.207 | 0.194 | -1.07 | 0.00 | |
hhinc#2 | -0.00217 | 0.00155 | -1.40 | 0.00 | |
hhinc#3 | 0.000358 | 0.00254 | 0.14 | 0.00 | |
hhinc#4 | -0.00529 | 0.00183 | -2.89 | ** | 0.00 |
hhinc#5 | -0.0128 | 0.00532 | -2.41 | * | 0.00 |
hhinc#6 | -0.00969 | 0.00303 | -3.19 | ** | 0.00 |
totcost | -0.00492 | 0.000239 | -20.60 | *** | 0.00 |
tottime | -0.0513 | 0.00310 | -16.57 | *** | 0.00 |
It is a little tough to read this report because the parameters can show up in pretty much any order, as they are not sorted when they are automatically discovered by Larch. We can use the reorder method to fix this:
m.ordering = (
("LOS", "totcost", "tottime", ),
("ASCs", "ASC.*", ),
("Income", "hhinc.*", ),
)
m.parameter_summary()
Value | Std Err | t Stat | Signif | Null Value | ||
---|---|---|---|---|---|---|
Category | Parameter | |||||
LOS | totcost | -0.00492 | 0.000239 | -20.60 | *** | 0.00 |
tottime | -0.0513 | 0.00310 | -16.57 | *** | 0.00 | |
ASCs | ASC_BIKE | -2.38 | 0.305 | -7.80 | *** | 0.00 |
ASC_SR2 | -2.18 | 0.105 | -20.81 | *** | 0.00 | |
ASC_SR3P | -3.73 | 0.178 | -20.96 | *** | 0.00 | |
ASC_TRAN | -0.671 | 0.133 | -5.06 | *** | 0.00 | |
ASC_WALK | -0.207 | 0.194 | -1.07 | 0.00 | ||
Income | hhinc#2 | -0.00217 | 0.00155 | -1.40 | 0.00 | |
hhinc#3 | 0.000358 | 0.00254 | 0.14 | 0.00 | ||
hhinc#4 | -0.00529 | 0.00183 | -2.89 | ** | 0.00 | |
hhinc#5 | -0.0128 | 0.00532 | -2.41 | * | 0.00 | |
hhinc#6 | -0.00969 | 0.00303 | -3.19 | ** | 0.00 |
m.estimation_statistics()
Statistic | Aggregate | Per Case |
---|---|---|
Number of Cases | 5029 | |
Log Likelihood at Convergence | -3626.19 | -0.72 |
Log Likelihood at Null Parameters | -7309.60 | -1.45 |
Rho Squared w.r.t. Null Parameters | 0.504 |