102: Swissmetro Weighted MNL Mode Choice

102: Swissmetro Weighted MNL Mode Choice

import pandas as pd
import larch.numba as lx

This example is a mode choice model built using the Swissmetro example dataset. First we create the Dataset and Model objects:

raw_data = pd.read_csv(lx.example_file('swissmetro.csv.gz')).rename_axis(index='CASEID')
data = lx.Dataset.construct.from_idco(raw_data, alts={1:'Train', 2:'SM', 3:'Car'})
data
<xarray.Dataset>
Dimensions:    (CASEID: 10728, _altid_: 3)
Coordinates:
  * CASEID     (CASEID) int64 0 1 2 3 4 5 ... 10723 10724 10725 10726 10727
  * _altid_    (_altid_) int64 1 2 3
    alt_names  (_altid_) <U5 'Train' 'SM' 'Car'
Data variables: (12/28)
    GROUP      (CASEID) int64 2 2 2 2 2 2 2 2 2 2 2 2 ... 3 3 3 3 3 3 3 3 3 3 3
    SURVEY     (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 1
    SP         (CASEID) int64 1 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1
    ID         (CASEID) int64 1 1 1 1 1 1 1 ... 1192 1192 1192 1192 1192 1192
    PURPOSE    (CASEID) int64 1 1 1 1 1 1 1 1 1 1 1 1 ... 4 4 4 4 4 4 4 4 4 4 4
    FIRST      (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 1
    ...         ...
    SM_CO      (CASEID) int64 52 49 58 52 42 49 58 43 ... 21 21 17 16 16 17 21
    SM_HE      (CASEID) int64 20 10 30 20 20 10 10 30 ... 20 30 30 10 20 30 30
    SM_SEATS   (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
    CAR_TT     (CASEID) int64 117 117 117 72 90 90 72 ... 96 156 96 96 96 120
    CAR_CO     (CASEID) int64 65 84 52 52 84 52 65 65 ... 70 91 56 70 56 91 70
    CHOICE     (CASEID) int64 2 2 2 2 2 2 2 1 2 2 2 2 ... 3 3 3 2 2 3 2 3 3 2 3
Attributes:
    _caseid_:  CASEID
    _altid_:   _altid_

The swissmetro example models exclude some observations. We can use the Dataset.query_cases method to identify the observations we would like to keep.

m = lx.Model(data.dc.query_cases("PURPOSE in (1,3) and CHOICE != 0"))

We can attach a title to the model. The title does not affect the calculations as all; it is merely used in various output report styles.

m.title = "swissmetro example 02 (weighted logit)"

We need to identify the availability and choice variables.

m.availability_co_vars = {
    1: "TRAIN_AV * (SP!=0)",
    2: "SM_AV",
    3: "CAR_AV * (SP!=0)",
}
m.choice_co_code = 'CHOICE'

This model adds a weighting factor.

m.weight_co_var = "1.0*(GROUP==2)+1.2*(GROUP==3)"

The swissmetro dataset, as with all Biogeme data, is only in co format.

from larch.roles import P,X
m.utility_co[1] = P("ASC_TRAIN")
m.utility_co[2] = 0
m.utility_co[3] = P("ASC_CAR")
m.utility_co[1] += X("TRAIN_TT") * P("B_TIME")
m.utility_co[2] += X("SM_TT") * P("B_TIME")
m.utility_co[3] += X("CAR_TT") * P("B_TIME")
m.utility_co[1] += X("TRAIN_CO*(GA==0)") * P("B_COST")
m.utility_co[2] += X("SM_CO*(GA==0)") * P("B_COST")
m.utility_co[3] += X("CAR_CO") * P("B_COST")

Larch will find all the parameters in the model, but we’d like to output them in a rational order. We can use the ordering method to do this:

m.ordering = [
    ("ASCs", 'ASC.*',),
    ("LOS", 'B_.*',),
]

We can estimate the models and check the results match up with those given by Biogeme:

m.set_cap(15)
m.maximize_loglike(method='SLSQP')
keyvalue
x
0
ASC_CAR -0.114
ASC_TRAIN -0.757
B_COST -0.011
B_TIME -0.013
loglike-5931.557677745116
d_loglike
0
ASC_CAR -0.005
ASC_TRAIN -0.003
B_COST 0.478
B_TIME -0.379
nit11
nfev26
njev11
status0
message'Optimization terminated successfully'
successTrue
elapsed_time0:00:00.178762
method'SLSQP'
n_cases6768
iteration_number0
logloss0.7792172667225135
__verbose_repr__True
m.calculate_parameter_covariance()
m.parameter_summary()
    Value Std Err t Stat Signif Null Value
Category Parameter          
ASCs ASC_CAR -0.114  0.0407 -2.81 ** 0.00
ASC_TRAIN -0.757  0.0528 -14.32 *** 0.00
LOS B_COST -0.0112  0.000490 -22.83 *** 0.00
B_TIME -0.0132  0.000537 -24.62 *** 0.00

Looks good!