101: Swissmetro MNL Mode Choice

import larch.numba as lx

This example is a mode choice model built using the Swissmetro example dataset. First we can create a Model object:

m = lx.Model()

We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output report styles.

m.title = "swissmetro example 01 (simple logit)"

We need to identify the availability and choice variables. The Swissmetro dataset, like most Biogeme example data, is in idco (case-only) format, so we must define alternative availability as an expression for each alternative, using a dictionary that maps alternative codes to availability expressions.

m.availability_co_vars = {
    1: "TRAIN_AV * (SP!=0)",
    2: "SM_AV",
    3: "CAR_AV * (SP!=0)",
}

In the Swissmetro example dataset, as in many discrete choice modeling applications, there is one and only one chosen alternative for each case, so the choices can be described as a single expression that evaluates to the code of the chosen alternative.

m.choice_co_code = 'CHOICE'

We will also write utility functions for each alternative. Since the data is in idco format, we must use only the utility_co form for the utility functions.

from larch.roles import P,X
m.utility_co[1] = P("ASC_TRAIN")
m.utility_co[2] = 0
m.utility_co[3] = P("ASC_CAR")
m.utility_co[1] += X("TRAIN_TT") * P("B_TIME")
m.utility_co[2] += X("SM_TT") * P("B_TIME")
m.utility_co[3] += X("CAR_TT") * P("B_TIME")
m.utility_co[1] += X("TRAIN_CO*(GA==0)") * P("B_COST")
m.utility_co[2] += X("SM_CO*(GA==0)") * P("B_COST")
m.utility_co[3] += X("CAR_CO") * P("B_COST")

Larch will find all the parameters in the model, but we’d like to output them in a rational order. We can use the ordering attribute to do this:

m.ordering = [
    ("ASCs", 'ASC.*',),
    ("LOS", 'B_.*',),
]
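
Each entry in the ordering is a category label followed by one or more regular expressions that are matched against parameter names. As a rough sketch of how the grouping works (using Python's re module directly, outside of Larch), the two patterns above would sort the four parameters defined earlier like this:

import re

params = ["ASC_TRAIN", "ASC_CAR", "B_TIME", "B_COST"]
for category, pattern in [("ASCs", "ASC.*"), ("LOS", "B_.*")]:
    print(category, [p for p in params if re.match(pattern, p)])
# ASCs ['ASC_TRAIN', 'ASC_CAR']
# LOS ['B_TIME', 'B_COST']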

Now we can prepare the data, which is available in the data warehouse that comes with Larch.

import pandas as pd
raw_data = pd.read_csv(lx.example_file('swissmetro.csv.gz')).rename_axis(index='CASEID')
raw_data.head()
GROUP SURVEY SP ID PURPOSE FIRST TICKET WHO LUGGAGE AGE MALE INCOME GA ORIGIN DEST TRAIN_AV CAR_AV SM_AV TRAIN_TT TRAIN_CO TRAIN_HE SM_TT SM_CO SM_HE SM_SEATS CAR_TT CAR_CO CHOICE
CASEID
0 2 0 1 1 1 0 1 1 0 3 0 2 0 2 1 1 1 1 112 48 120 63 52 20 0 117 65 2
1 2 0 1 1 1 0 1 1 0 3 0 2 0 2 1 1 1 1 103 48 30 60 49 10 0 117 84 2
2 2 0 1 1 1 0 1 1 0 3 0 2 0 2 1 1 1 1 130 48 60 67 58 30 0 117 52 2
3 2 0 1 1 1 0 1 1 0 3 0 2 0 2 1 1 1 1 103 40 30 63 52 20 0 72 52 2
4 2 0 1 1 1 0 1 1 0 3 0 2 0 2 1 1 1 1 130 36 60 63 42 20 0 90 84 2

The swissmetro example models exclude some observations. We can use pandas to identify the observations we would like to keep.

keep = raw_data.eval("PURPOSE in (1,3) and CHOICE != 0")
selected_data = raw_data[keep]
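
As an optional sanity check (not part of the original example), we can confirm how many cases survive the filter; the count should match the CASEID dimension of the dataset constructed below.

print(keep.sum(), len(selected_data))  # both should be 6768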

When you’ve created the data you need, you can pass the dataframe to the Dataset.construct.from_idco constructor. Since the swissmetro data is in idco format, we’ll need to explicitly identify the alternative codes as well.

ds = lx.Dataset.construct.from_idco(selected_data, alts={1:'Train', 2:'SM', 3:'Car'})
ds
<xarray.Dataset>
Dimensions:    (CASEID: 6768, _altid_: 3)
Coordinates:
  * CASEID     (CASEID) int64 0 1 2 3 4 5 6 ... 8445 8446 8447 8448 8449 8450
  * _altid_    (_altid_) int64 1 2 3
    alt_names  (_altid_) <U5 'Train' 'SM' 'Car'
Data variables: (12/28)
    GROUP      (CASEID) int64 2 2 2 2 2 2 2 2 2 2 2 2 ... 3 3 3 3 3 3 3 3 3 3 3
    SURVEY     (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 1
    SP         (CASEID) int64 1 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1
    ID         (CASEID) int64 1 1 1 1 1 1 1 1 ... 939 939 939 939 939 939 939
    PURPOSE    (CASEID) int64 1 1 1 1 1 1 1 1 1 1 1 1 ... 3 3 3 3 3 3 3 3 3 3 3
    FIRST      (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 1
    ...         ...
    SM_CO      (CASEID) int64 52 49 58 52 42 49 58 43 ... 21 21 17 16 16 17 21
    SM_HE      (CASEID) int64 20 10 30 20 20 10 10 30 ... 20 30 30 10 20 30 30
    SM_SEATS   (CASEID) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
    CAR_TT     (CASEID) int64 117 117 117 72 90 90 72 ... 80 130 80 80 80 100
    CAR_CO     (CASEID) int64 65 84 52 52 84 52 65 65 ... 80 104 64 80 64 104 80
    CHOICE     (CASEID) int64 2 2 2 2 2 2 2 1 2 2 2 2 ... 2 2 1 1 1 1 1 1 1 1 1
Attributes:
    _caseid_:  CASEID
    _altid_:   _altid_
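
Because the result is an ordinary xarray.Dataset, standard xarray access works on it; for example, the alternative names attached by from_idco can be read back out directly. This is just an illustrative check, not a required step.

print(ds['alt_names'].values)  # array(['Train', 'SM', 'Car'], dtype='<U5')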

You might notice we have not carefully constructed this Dataset to include only the relevant data, nor the various simple transformations used in the utility definitions above. Larch can handle this itself, if you assign the Dataset not as the actual arrays used in model estimation, but rather as the datatree that serves as the source from which those computational arrays are built.

m.datatree = ds

We can estimate the model and check that the results match up with those given by Biogeme. Calling set_cap(15) bounds each parameter to the range -15 to +15 during estimation, which keeps the optimizer from wandering off to extreme values.

m.set_cap(15)
m.maximize_loglike(method='SLSQP')
key                value
x                  ASC_CAR     -0.155
                   ASC_TRAIN   -0.701
                   B_COST      -0.011
                   B_TIME      -0.013
loglike            -5331.252006918114
d_loglike          ASC_CAR      0.011
                   ASC_TRAIN   -0.027
                   B_COST      -1.223
                   B_TIME      -1.843
nit                10
nfev               26
njev               10
status             0
message            'Optimization terminated successfully'
success            True
elapsed_time       0:00:00.170717
method             'SLSQP'
n_cases            6768
iteration_number   0
logloss            0.7877145400292721

m.calculate_parameter_covariance()
m.parameter_summary()
                        Value   Std Err   t Stat  Signif  Null Value
Category  Parameter
ASCs      ASC_CAR     -0.155    0.0432     -3.58     ***        0.00
          ASC_TRAIN   -0.701    0.0549    -12.78     ***        0.00
LOS       B_COST      -0.0108   0.000518  -20.91     ***        0.00
          B_TIME      -0.0128   0.000569  -22.46     ***        0.00
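
If you prefer to work with the estimation results programmatically rather than reading the report above, maximize_loglike returns a result object that you can capture. The attribute-style access sketched here is an assumption about that object's interface, so adjust as needed.

result = m.maximize_loglike(method='SLSQP')
print(result.loglike)  # approximately -5331.25, matching the value reported above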