101: Swissmetro MNL Mode Choice

import larch.numba as lx

This example is a mode choice model built using the Swissmetro example dataset. First we can create a Model object:

m = lx.Model()

We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output report styles.

m.title = "swissmetro example 01 (simple logit)"

We need to identify the availability and choice variables. The Swissmetro dataset, like most Biogeme example data, is in idco (case-only) format, so we must define alternative availability as an expression for each alternative, using a dictionary that maps alternative codes to availability expressions.

m.availability_co_vars = {
    1: "TRAIN_AV * (SP!=0)",
    2: "SM_AV",
    3: "CAR_AV * (SP!=0)",
}

In the Swissmetro example dataset, as in many discrete choice modeling applications, there is one and only one chosen alternative for each case, so the choices can be described as a single expression that evaluates to the code of the chosen alternative.

m.choice_co_code = 'CHOICE'

We will also write utility functions for each alternative. Since the data is in idco format, we must use only the utility_co form for the utility functions.

from larch.roles import P,X
m.utility_co[1] = P("ASC_TRAIN")
m.utility_co[2] = 0
m.utility_co[3] = P("ASC_CAR")
m.utility_co[1] += X("TRAIN_TT") * P("B_TIME")
m.utility_co[2] += X("SM_TT") * P("B_TIME")
m.utility_co[3] += X("CAR_TT") * P("B_TIME")
m.utility_co[1] += X("TRAIN_CO*(GA==0)") * P("B_COST")
m.utility_co[2] += X("SM_CO*(GA==0)") * P("B_COST")
m.utility_co[3] += X("CAR_CO") * P("B_COST")

Larch will find all the parameters in the model, but we’d like to output them in a rational order. We can use the ordering attribute to do this:

m.ordering = [
    ("ASCs", 'ASC.*',),
    ("LOS", 'B_.*',),
]
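
Each entry in the ordering is a category label followed by one or more regular expressions that are matched against parameter names. As a rough sketch of how the grouping works (using Python's re module directly, outside of Larch), the two patterns above would sort the four parameters defined earlier like this:

import re

params = ["ASC_TRAIN", "ASC_CAR", "B_TIME", "B_COST"]
for category, pattern in [("ASCs", "ASC.*"), ("LOS", "B_.*")]:
    print(category, [p for p in params if re.match(pattern, p)])
# ASCs ['ASC_TRAIN', 'ASC_CAR']
# LOS ['B_TIME', 'B_COST']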

Now we can prepare the data, which is available in the data warehouse that comes with Larch.

import pandas as pd
raw_data = pd.read_csv(lx.example_file('swissmetro.csv.gz')).rename_axis(index='CASEID')
raw_data.head()
GROUP SURVEY SP ID PURPOSE FIRST TICKET WHO LUGGAGE AGE MALE INCOME GA ORIGIN DEST TRAIN_AV CAR_AV SM_AV TRAIN_TT TRAIN_CO TRAIN_HE SM_TT SM_CO SM_HE SM_SEATS CAR_TT CAR_CO CHOICE
CASEID
0 2 0 1 1 1 0 1 1 0 3 0 2 0 2 1 1 1 1 112 48 120 63 52 20 0 117 65 2
1 2 0 1 1 1 0 1 1 0 3 0 2 0 2 1 1 1 1 103 48 30 60 49 10 0 117 84 2
2 2 0 1 1 1 0 1 1 0 3 0 2 0 2 1 1 1 1 130 48 60 67 58 30 0 117 52 2
3 2 0 1 1 1 0 1 1 0 3 0 2 0 2 1 1 1 1 103 40 30 63 52 20 0 72 52 2
4 2 0 1 1 1 0 1 1 0 3 0 2 0 2 1 1 1 1 130 36 60 63 42 20 0 90 84 2

The swissmetro example models exclude some observations. We can use pandas to identify the observations we would like to keep.

keep = raw_data.eval("PURPOSE in (1,3) and CHOICE != 0")
selected_data = raw_data[keep]
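
As an optional sanity check (not part of the original example), we can confirm how many cases survive the filter; the count should match the CASEID dimension of the dataset constructed below.

print(keep.sum(), len(selected_data))  # both should be 6768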

When you’ve created the data you need, you can pass the dataframe to the Dataset.construct.from_idco constructor. Since the swissmetro data is in idco format, we’ll need to explicitly identify the alternative codes as well.

ds = lx.Dataset.construct.from_idco(selected_data, alts={1:'Train', 2:'SM', 3:'Car'})
ds
<xarray.Dataset>
Dimensions:    (CASEID: 6768, _altid_: 3)
Coordinates:
  * CASEID     (CASEID) int64 0 1 2 3 4 5 6 ... 8445 8446 8447 8448 8449 8450
  * _altid_    (_altid_) int64 1 2 3
    alt_names  (_altid_) <U5 'Train' 'SM' 'Car'
Data variables: (12/28)
    GROUP      (CASEID) int64 2 2 2 2 2 2 2 2 2 2 2 2 ... 3 3 3 3 3 3 3 3 3 3 3
    SURVEY     (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 1
    SP         (CASEID) int64 1 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1
    ID         (CASEID) int64 1 1 1 1 1 1 1 1 ... 939 939 939 939 939 939 939
    PURPOSE    (CASEID) int64 1 1 1 1 1 1 1 1 1 1 1 1 ... 3 3 3 3 3 3 3 3 3 3 3
    FIRST      (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 1
    ...         ...
    SM_CO      (CASEID) int64 52 49 58 52 42 49 58 43 ... 21 21 17 16 16 17 21
    SM_HE      (CASEID) int64 20 10 30 20 20 10 10 30 ... 20 30 30 10 20 30 30
    SM_SEATS   (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
    CAR_TT     (CASEID) int64 117 117 117 72 90 90 72 ... 80 130 80 80 80 100
    CAR_CO     (CASEID) int64 65 84 52 52 84 52 65 65 ... 80 104 64 80 64 104 80
    CHOICE     (CASEID) int64 2 2 2 2 2 2 2 1 2 2 2 2 ... 2 2 1 1 1 1 1 1 1 1 1
Attributes:
    _caseid_:  CASEID
    _altid_:   _altid_
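
Because the result is an ordinary xarray.Dataset, standard xarray access works on it; for example, the alternative names attached by from_idco can be read back out directly. This is just an illustrative check, not a required step.

print(ds['alt_names'].values)  # array(['Train', 'SM', 'Car'], dtype='<U5')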

You might notice we have not carefully constructed this Dataset to include only the relevant data, nor the various simple transformations used in the utility definitions above. Larch can handle this itself, if you assign the Dataset not as the actual arrays used in model estimation, but rather as the datatree that serves as the source from which those computational arrays are built.

m.datatree = ds

We can estimate the model and check that the results match up with those given by Biogeme. Calling set_cap(15) bounds each parameter to the range -15 to +15 during estimation, which keeps the optimizer from wandering off to extreme values.

m.set_cap(15)
m.maximize_loglike(method='SLSQP')
key                value
x                  ASC_CAR     -0.155
                   ASC_TRAIN   -0.701
                   B_COST      -0.011
                   B_TIME      -0.013
loglike            -5331.252006918114
d_loglike          ASC_CAR      0.011
                   ASC_TRAIN   -0.027
                   B_COST      -1.223
                   B_TIME      -1.843
nit                10
nfev               26
njev               10
status             0
message            'Optimization terminated successfully'
success            True
elapsed_time       0:00:00.170717
method             'SLSQP'
n_cases            6768
iteration_number   0
logloss            0.7877145400292721

m.calculate_parameter_covariance()
m.parameter_summary()
                        Value   Std Err   t Stat  Signif  Null Value
Category  Parameter
ASCs      ASC_CAR     -0.155    0.0432     -3.58     ***        0.00
          ASC_TRAIN   -0.701    0.0549    -12.78     ***        0.00
LOS       B_COST      -0.0108   0.000518  -20.91     ***        0.00
          B_TIME      -0.0128   0.000569  -22.46     ***        0.00
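
If you prefer to work with the estimation results programmatically rather than reading the report above, maximize_loglike returns a result object that you can capture. The attribute-style access sketched here is an assumption about that object's interface, so adjust as needed.

result = m.maximize_loglike(method='SLSQP')
print(result.loglike)  # approximately -5331.25, matching the value reported above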