# 102: Swissmetro Weighted MNL Mode Choice

In [None]:
# TEST
import larch
import os
import pandas as pd
pd.set_option("display.max_columns", 999)
pd.set_option('expand_frame_repr', False)
pd.set_option('display.precision', 3)
larch._doctest_mode_ = True
import larch.numba as lx

In [None]:
import pandas as pd
import larch.numba as lx

This example is a mode choice model built using the Swissmetro example dataset.
First we create the Dataset and Model objects:

In [None]:
raw_data = pd.read_csv(lx.example_file('swissmetro.csv.gz')).rename_axis(index='CASEID')
data = lx.Dataset.construct.from_idco(raw_data, alts={1:'Train', 2:'SM', 3:'Car'})
data

The swissmetro example models exclude some observations.  We can use the
`query_cases` method, reached here through the dataset's `dc` accessor, to keep
only the observations we want: cases with `PURPOSE` equal to 1 or 3 and a
nonzero `CHOICE`.

In [None]:
m = lx.Model(data.dc.query_cases("PURPOSE in (1,3) and CHOICE != 0"))

We can attach a title to the model. The title does not affect the calculations
at all; it is merely used in various output report styles.

In [None]:
m.title = "swissmetro example 02 (weighted logit)"

We need to identify the availability and choice variables.

In [None]:
m.availability_co_vars = {
    1: "TRAIN_AV * (SP!=0)",
    2: "SM_AV",
    3: "CAR_AV * (SP!=0)",
}
m.choice_co_code = 'CHOICE'
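
To make the availability expressions concrete, here is a minimal plain-Python sketch of what they evaluate to. The rows are invented for illustration and are not taken from the dataset, and the `availability` helper is hypothetical; it only mirrors the arithmetic of the three expressions, not how Larch evaluates them internally.

```python
# Hypothetical rows: the column names match the expressions above,
# but the values are invented for illustration.
rows = [
    {"TRAIN_AV": 1, "SM_AV": 1, "CAR_AV": 1, "SP": 1},
    {"TRAIN_AV": 1, "SM_AV": 1, "CAR_AV": 0, "SP": 1},
    {"TRAIN_AV": 1, "SM_AV": 1, "CAR_AV": 1, "SP": 0},
]


def availability(row):
    """Return {alternative code: available?} for one case."""
    sp_nonzero = row["SP"] != 0  # the (SP!=0) term, True or False
    return {
        1: bool(row["TRAIN_AV"] * sp_nonzero),  # "TRAIN_AV * (SP!=0)"
        2: bool(row["SM_AV"]),                  # "SM_AV"
        3: bool(row["CAR_AV"] * sp_nonzero),    # "CAR_AV * (SP!=0)"
    }


avail = [availability(r) for r in rows]
for a in avail:
    print(a)
```

Multiplying by `(SP!=0)` acts as a logical "and": train and car count as available only when their own flag is set and `SP` is nonzero.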

This model adds a weighting factor: cases with `GROUP == 2` get a weight of 1.0
and cases with `GROUP == 3` get a weight of 1.2.

In [None]:
m.weight_co_var = "1.0*(GROUP==2)+1.2*(GROUP==3)"
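
As a rough illustration of what a case weight does, the sketch below computes the usual weighted log-likelihood, in which each case's log-probability of its chosen alternative is multiplied by its weight before summing. The helper names, group codes, and probabilities are hypothetical; this is not Larch's implementation.

```python
import math


def case_weight(group):
    # Mirrors the expression "1.0*(GROUP==2)+1.2*(GROUP==3)".
    # Under this expression any other group code yields a weight of 0.
    return 1.0 * (group == 2) + 1.2 * (group == 3)


def weighted_loglike(groups, chosen_probs):
    """Sum of weight * log(probability of the chosen alternative)."""
    return sum(
        case_weight(g) * math.log(p) for g, p in zip(groups, chosen_probs)
    )


# Hypothetical cases: group codes and model probabilities of the chosen mode.
groups = [2, 3, 3]
chosen_probs = [0.5, 0.25, 0.5]

ll = weighted_loglike(groups, chosen_probs)
print(ll)
```

With all weights equal to 1.0 this reduces to the ordinary (unweighted) log-likelihood.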

The swissmetro dataset, like all Biogeme data, is only in `co` format, so every
utility function below is defined through `utility_co`.

In [None]:
from larch.roles import P,X
m.utility_co[1] = P("ASC_TRAIN")
m.utility_co[2] = 0
m.utility_co[3] = P("ASC_CAR")
m.utility_co[1] += X("TRAIN_TT") * P("B_TIME")
m.utility_co[2] += X("SM_TT") * P("B_TIME")
m.utility_co[3] += X("CAR_TT") * P("B_TIME")
m.utility_co[1] += X("TRAIN_CO*(GA==0)") * P("B_COST")
m.utility_co[2] += X("SM_CO*(GA==0)") * P("B_COST")
m.utility_co[3] += X("CAR_CO") * P("B_COST")
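
For readers who want to see what these utility functions imply, here is a minimal sketch of the multinomial logit probability calculation for a single case. The parameter values and the traveller's data are invented, and `mnl_probabilities` is a hypothetical helper, not part of Larch.

```python
import math


def mnl_probabilities(utilities, available):
    """Logit probabilities over the available alternatives of one case."""
    exp_u = {
        alt: math.exp(u) for alt, u in utilities.items() if available[alt]
    }
    total = sum(exp_u.values())
    # Unavailable alternatives get probability zero.
    return {alt: exp_u.get(alt, 0.0) / total for alt in utilities}


# Hypothetical parameter values (not estimates) and one invented traveller.
p = {"ASC_TRAIN": -0.7, "ASC_CAR": -0.1, "B_TIME": -0.01, "B_COST": -0.01}
x = {"TRAIN_TT": 120, "SM_TT": 60, "CAR_TT": 100,
     "TRAIN_CO": 50, "SM_CO": 60, "CAR_CO": 80, "GA": 0}

no_ga = x["GA"] == 0  # train and SM cost only count when GA == 0
utilities = {
    1: p["ASC_TRAIN"] + p["B_TIME"] * x["TRAIN_TT"]
       + p["B_COST"] * x["TRAIN_CO"] * no_ga,
    2: 0.0 + p["B_TIME"] * x["SM_TT"] + p["B_COST"] * x["SM_CO"] * no_ga,
    3: p["ASC_CAR"] + p["B_TIME"] * x["CAR_TT"] + p["B_COST"] * x["CAR_CO"],
}

probs = mnl_probabilities(utilities, {1: True, 2: True, 3: True})
probs_no_car = mnl_probabilities(utilities, {1: True, 2: True, 3: False})
print(probs)
print(probs_no_car)
```

Because alternative 2 (SM) has no alternative-specific constant, the two ASC parameters are measured relative to it.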

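Larch will find all the parameters in the model, but we'd like to output them in
a sensible order.  We can set the `ordering` attribute to do this, giving each
group a label followed by one or more regular expressions that match parameter
names: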

In [None]:
m.ordering = [
    ("ASCs", 'ASC.*',),
    ("LOS", 'B_.*',),
]
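
The grouping idea can be sketched with the standard `re` module. This is only an illustration under assumptions: the `group_parameters` helper is hypothetical, and whether Larch anchors its patterns the way `re.match` does, or sorts names within a group, is assumed here rather than confirmed.

```python
import re


def group_parameters(names, ordering):
    """Assign each name to the first group with a matching pattern."""
    groups = {label: [] for label, *_ in ordering}
    leftovers = []
    for name in sorted(names):
        for label, *patterns in ordering:
            if any(re.match(pattern, name) for pattern in patterns):
                groups[label].append(name)
                break
        else:
            # No pattern matched this name.
            leftovers.append(name)
    return groups, leftovers


ordering = [
    ("ASCs", 'ASC.*',),
    ("LOS", 'B_.*',),
]
names = ["B_TIME", "ASC_TRAIN", "B_COST", "ASC_CAR"]

groups, leftovers = group_parameters(names, ordering)
print(groups)
print(leftovers)
```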

In [None]:
# TEST
from pytest import approx
assert m.loglike() == approx(-7892.111473285806)

We can estimate the model and check that the results match those given by Biogeme:

In [None]:
m.set_cap(15)
m.maximize_loglike(method='SLSQP')

In [None]:
# TEST
r = _
from pytest import approx
assert r.loglike == approx(-5931.557677709527)

In [None]:
m.calculate_parameter_covariance()
m.parameter_summary()
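
The summary columns relate to the covariance matrix in the standard way: each standard error is the square root of the matching diagonal entry, and each t statistic is the estimate minus its null value, divided by the standard error. The sketch below shows that arithmetic on invented numbers; `t_statistics` is a hypothetical helper, and the values are not the estimates from this model.

```python
import math


def t_statistics(values, covariance, null_values=None):
    """Return (standard errors, t statistics) from a covariance matrix."""
    if null_values is None:
        null_values = [0.0] * len(values)
    std_errs = [math.sqrt(covariance[i][i]) for i in range(len(values))]
    t_stats = [
        (v - n) / se for v, n, se in zip(values, null_values, std_errs)
    ]
    return std_errs, t_stats


# Hypothetical estimates and covariance matrix, for illustration only.
values = [-0.5, 0.02]
covariance = [
    [0.01, 0.0],
    [0.0, 0.0001],
]

std_errs, t_stats = t_statistics(values, covariance)
print(std_errs)
print(t_stats)
```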

In [None]:
# TEST
assert m.parameter_summary().data.to_markdown() == '''
|                       |   Value |   Std Err |   t Stat | Signif   |   Null Value |
|:----------------------|--------:|----------:|---------:|:---------|-------------:|
| ('ASCs', 'ASC_CAR')   | -0.114  |  0.0407   |    -2.81 | **       |            0 |
| ('ASCs', 'ASC_TRAIN') | -0.757  |  0.0528   |   -14.32 | ***      |            0 |
| ('LOS', 'B_COST')     | -0.0112 |  0.00049  |   -22.83 | ***      |            0 |
| ('LOS', 'B_TIME')     | -0.0132 |  0.000537 |   -24.62 | ***      |            0 |
'''[1:-1]

Looks good!