102: Swissmetro Weighted MNL Mode Choice
102: Swissmetro Weighted MNL Mode Choice¶
import pandas as pd
import larch.numba as lx
This example is a mode choice model built using the Swissmetro example dataset. First we create the Dataset and Model objects:
raw_data = pd.read_csv(lx.example_file('swissmetro.csv.gz')).rename_axis(index='CASEID')
data = lx.Dataset.construct.from_idco(raw_data, alts={1:'Train', 2:'SM', 3:'Car'})
data
<xarray.Dataset>
Dimensions: (CASEID: 10728, _altid_: 3)
Coordinates:
* CASEID (CASEID) int64 0 1 2 3 4 5 ... 10723 10724 10725 10726 10727
* _altid_ (_altid_) int64 1 2 3
alt_names (_altid_) <U5 'Train' 'SM' 'Car'
Data variables: (12/28)
GROUP (CASEID) int64 2 2 2 2 2 2 2 2 2 2 2 2 ... 3 3 3 3 3 3 3 3 3 3 3
SURVEY (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 1
SP (CASEID) int64 1 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1
ID (CASEID) int64 1 1 1 1 1 1 1 ... 1192 1192 1192 1192 1192 1192
PURPOSE (CASEID) int64 1 1 1 1 1 1 1 1 1 1 1 1 ... 4 4 4 4 4 4 4 4 4 4 4
FIRST (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 1
... ...
SM_CO (CASEID) int64 52 49 58 52 42 49 58 43 ... 21 21 17 16 16 17 21
SM_HE (CASEID) int64 20 10 30 20 20 10 10 30 ... 20 30 30 10 20 30 30
SM_SEATS (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
CAR_TT (CASEID) int64 117 117 117 72 90 90 72 ... 96 156 96 96 96 120
CAR_CO (CASEID) int64 65 84 52 52 84 52 65 65 ... 70 91 56 70 56 91 70
CHOICE (CASEID) int64 2 2 2 2 2 2 2 1 2 2 2 2 ... 3 3 3 2 2 3 2 3 3 2 3
Attributes:
_caseid_: CASEID
_altid_: _altid_The swissmetro example models exclude some observations. We can use the
Dataset.query_cases method to identify the observations we would like to keep.
m = lx.Model(data.dc.query_cases("PURPOSE in (1,3) and CHOICE != 0"))
We can attach a title to the model. The title does not affect the calculations as all; it is merely used in various output report styles.
m.title = "swissmetro example 02 (weighted logit)"
We need to identify the availability and choice variables.
m.availability_co_vars = {
1: "TRAIN_AV * (SP!=0)",
2: "SM_AV",
3: "CAR_AV * (SP!=0)",
}
m.choice_co_code = 'CHOICE'
This model adds a weighting factor.
m.weight_co_var = "1.0*(GROUP==2)+1.2*(GROUP==3)"
The swissmetro dataset, as with all Biogeme data, is only in co format.
from larch.roles import P,X
m.utility_co[1] = P("ASC_TRAIN")
m.utility_co[2] = 0
m.utility_co[3] = P("ASC_CAR")
m.utility_co[1] += X("TRAIN_TT") * P("B_TIME")
m.utility_co[2] += X("SM_TT") * P("B_TIME")
m.utility_co[3] += X("CAR_TT") * P("B_TIME")
m.utility_co[1] += X("TRAIN_CO*(GA==0)") * P("B_COST")
m.utility_co[2] += X("SM_CO*(GA==0)") * P("B_COST")
m.utility_co[3] += X("CAR_CO") * P("B_COST")
Larch will find all the parameters in the model, but we’d like to output them in a rational order. We can use the ordering method to do this:
m.ordering = [
("ASCs", 'ASC.*',),
("LOS", 'B_.*',),
]
We can estimate the models and check the results match up with those given by Biogeme:
m.set_cap(15)
m.maximize_loglike(method='SLSQP')
| key | value | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| x |
| ||||||||||
| loglike | -5931.557677745116 | ||||||||||
| d_loglike |
| ||||||||||
| nit | 11 | ||||||||||
| nfev | 26 | ||||||||||
| njev | 11 | ||||||||||
| status | 0 | ||||||||||
| message | 'Optimization terminated successfully' | ||||||||||
| success | True | ||||||||||
| elapsed_time | 0:00:00.178762 | ||||||||||
| method | 'SLSQP' | ||||||||||
| n_cases | 6768 | ||||||||||
| iteration_number | 0 | ||||||||||
| logloss | 0.7792172667225135 | ||||||||||
| __verbose_repr__ | True |
m.calculate_parameter_covariance()
m.parameter_summary()
| Value | Std Err | t Stat | Signif | Null Value | ||
|---|---|---|---|---|---|---|
| Category | Parameter | |||||
| ASCs | ASC_CAR | -0.114 | 0.0407 | -2.81 | ** | 0.00 |
| ASC_TRAIN | -0.757 | 0.0528 | -14.32 | *** | 0.00 | |
| LOS | B_COST | -0.0112 | 0.000490 | -22.83 | *** | 0.00 |
| B_TIME | -0.0132 | 0.000537 | -24.62 | *** | 0.00 |
Looks good!