109: Swissmetro Nested Logit Mode Choice

import pandas as pd
import larch.numba as lx

This example is a mode choice model built using the Swissmetro example dataset. First we create the Dataset and Model objects.

raw_data = pd.read_csv(lx.example_file('swissmetro.csv.gz')).rename_axis(index='CASEID')
data = lx.Dataset.construct.from_idco(raw_data, alts={1:'Train', 2:'SM', 3:'Car'})
m = lx.Model(data.dc.query_cases("PURPOSE in (1,3) and CHOICE != 0"))
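
The query_cases filter keeps only the commute and business trip purposes (1 and 3) and drops cases where no choice was recorded. A quick check with plain pandas (outside Larch) confirms the resulting sample size, which will reappear later as n_cases:

print(len(raw_data))                                            # 10728 rows in the raw file
print(len(raw_data.query("PURPOSE in (1,3) and CHOICE != 0")))  # 6768 cases remain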

We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output report styles.

m.title = "swissmetro example 09 (nested logit)"

We need to identify the availability and choice variables.

m.availability_co_vars = {
    1: "TRAIN_AV * (SP!=0)",
    2: "SM_AV",
    3: "CAR_AV * (SP!=0)",
}
m.choice_co_code = 'CHOICE'
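
Each availability expression is evaluated against the case data, and a nonzero result marks that alternative as available. To see what the expressions do, you can evaluate them directly with pandas on the estimation sample (an illustrative check, not part of the model itself):

est = raw_data.query("PURPOSE in (1,3) and CHOICE != 0")
for name, expr in [('Train', "TRAIN_AV * (SP!=0)"), ('SM', "SM_AV"), ('Car', "CAR_AV * (SP!=0)")]:
    # share of cases in which this mode is available
    print(name, round((est.eval(expr) != 0).mean(), 3))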

The swissmetro dataset, like all Biogeme data, is only in co (case-oriented) format, so every utility function is defined using utility_co.

from larch.roles import P,X
m.utility_co[1] = ( P.ASC_TRAIN
                  + P.B_TIME * X.TRAIN_TT
                  + P.B_COST * X("TRAIN_CO*(GA==0)") )
m.utility_co[2] = ( P.B_TIME * X.SM_TT
                  + P.B_COST * X("SM_CO*(GA==0)") )
m.utility_co[3] = ( P.ASC_CAR
                  + P.B_TIME * X.CAR_TT
                  + P.B_COST * X("CAR_CO") )
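
Note that B_TIME and B_COST are generic coefficients shared across all three modes, and that cost enters the Train and Swissmetro utilities as, for example, TRAIN_CO*(GA==0): travelers holding a GA annual pass face no marginal cost on those modes. Evaluating the expression with plain pandas makes the effect visible (illustration only):

# Rail cost is zeroed out for GA (annual pass) holders, left intact otherwise.
train_cost = raw_data.eval("TRAIN_CO*(GA==0)")
print(train_cost[raw_data.GA == 1].head())  # all zeros
print(train_cost[raw_data.GA == 0].head())  # actual fares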

To create a new nest, we can use the graph.new_node command. For this example, we want to nest together the Train and Car modes into an “existing” modes nest. Those are modes 1 and 3, so we can use the graph.new_node command like this:

m.graph.new_node(parameter="existing", children=[1,3], name='Existing')
4
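
The returned value is the code number that Larch assigned to the new nest (4, since codes 1 through 3 belong to the elemental alternatives). If you need to refer to the nest later, you can capture that code when creating it (a variant of the call above, not an additional call to run):

existing = m.graph.new_node(parameter="existing", children=[1, 3], name='Existing')
# existing == 4, the code identifying the nest within m.graph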

In a Jupyter notebook, we can verify the nesting structure visually if we like.

m.graph
[Nesting tree: the root node (0) links directly to SM (2) and to the Existing nest (4, parameter existing), which in turn contains Train (1) and Car (3).]

Larch will find all the parameters in the model, but we’d like to output them in a rational order. We can use the ordering attribute to do this:

m.ordering = [
    ("ASCs", 'ASC.*',),
    ("LOS", 'B_.*',),
    ("LogSums", 'existing'),
]
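
Each entry pairs a category label with one or more regular expressions; a parameter is placed in the first category whose pattern matches its name. The grouping can be previewed with plain Python (illustration only):

import re
params = ['ASC_CAR', 'ASC_TRAIN', 'B_COST', 'B_TIME', 'existing']
for label, pattern in [("ASCs", 'ASC.*'), ("LOS", 'B_.*'), ("LogSums", 'existing')]:
    print(label, [p for p in params if re.match(pattern, p)])
# ASCs ['ASC_CAR', 'ASC_TRAIN']
# LOS ['B_COST', 'B_TIME']
# LogSums ['existing']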

We can estimate the model and check that the results match those given by Biogeme:

m.set_cap(15)  # bound each parameter to the range [-15, 15] for numerical stability
m.maximize_loglike()
key               value
x:
    ASC_CAR      -0.167
    ASC_TRAIN    -0.512
    B_COST       -0.009
    B_TIME       -0.009
    existing      0.487
loglike          -5236.900013608125
d_loglike:
    ASC_CAR      -0.004
    ASC_TRAIN    -0.002
    B_COST        0.547
    B_TIME       -0.547
    existing     -0.004
nit               17
nfev              37
njev              17
status            0
message           'Optimization terminated successfully'
success           True
elapsed_time      0:00:00.290973
method            'slsqp'
n_cases           6768
iteration_number  0
logloss           0.7737736426725953
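
The log likelihood at convergence, −5236.90, matches the value reported by Biogeme for the corresponding nested logit example. The result shown above is returned by maximize_loglike as a dictionary-like object, so the comparison can also be scripted (a sketch; calling maximize_loglike again from the converged values simply returns the same solution):

result = m.maximize_loglike()
assert abs(result.loglike - (-5236.900)) < 0.01

Next, we compute the parameter covariance matrix so that standard errors and t statistics can be reported: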
m.calculate_parameter_covariance()
m.parameter_summary()
                      Value   Std Err  t Stat  Signif  Null Value
Category  Parameter
ASCs      ASC_CAR    -0.167    0.0371   -4.50     ***        0.00
          ASC_TRAIN  -0.512    0.0452  -11.33     ***        0.00
LOS       B_COST   -0.00857  0.000463  -18.51     ***        0.00
          B_TIME   -0.00899  0.000570  -15.77     ***        0.00
LogSums   existing    0.487    0.0279  -18.39     ***        1.00

The pfo (parameter frame, ordered) method returns other useful information, including robust standard errors and t statistics.

print(m.pfo()[['value','std_err','t_stat','robust_std_err','robust_t_stat']])
                     value    std_err  t_stat  robust_std_err  robust_t_stat
Category Parameter
ASCs     ASC_CAR    -0.167  3.714e-02  -4.501       5.453e-02         -3.065
         ASC_TRAIN  -0.512  4.518e-02 -11.331       7.911e-02         -6.471
LOS      B_COST     -0.009  4.627e-04 -18.513       6.004e-04        -14.269
         B_TIME     -0.009  5.699e-04 -15.769       1.071e-03         -8.390
LogSums  existing    0.487  2.790e-02 -18.394       3.892e-02        -13.186
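
Because pfo returns an ordinary pandas DataFrame (indexed by category and parameter), the usual pandas tooling applies; for example, the estimates and robust statistics could be written to a file (usage sketch; the filename is arbitrary):

m.pfo()[['value', 'robust_std_err', 'robust_t_stat']].to_csv('swissmetro_nl_results.csv')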