302: Itinerary Choice using Simple Nested Logit
import pandas as pd
import larch
larch.__version__
'5.7.0'
This example is an itinerary choice model built using the example itinerary choice dataset included with Larch. See example 300 for details.
from larch.data_warehouse import example_file
itin = pd.read_csv(example_file("arc"), index_col=['id_case','id_alt'])
d = larch.DataFrames(itin, ch='choice', crack=True, autoscale_weights=True)
rescaled array of weights by a factor of 2239.980952380952
We will build a nested logit model, but to do so we first need to rationalize the alternative numbers. As given, the raw itinerary choice data has many alternatives, but they are not ordered or numbered in any regular way: each elemental alternative has an arbitrary code number, and the code numbers for one case are not comparable to those of another case. We need to renumber the alternatives in a manner better suited to our application, so that from the code number we can programmatically extract the relevant features of the alternative that we will use in building the nested logit model. In this example we want to test a model with nests based on level of service (here, the number of connections, nb_cnxs). To renumber, we first define the relevant categories and values, and establish a numbering system using a special object:
d1 = d.new_systematic_alternatives(
    groupby='nb_cnxs',
    name='alternative_code',
    padding_levels=4,
    groupby_prefixes=['Cnx'],
    overwrite=False,
    complete_features_list={'nb_cnxs':[0,1,2]},
)
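To make the idea of a "systematic" code number concrete, here is a minimal sketch of one way such a scheme could work. It is a hypothetical illustration, not Larch's actual implementation: the functions `encode` and `decode`, the group indexing, and the interpretation of `padding_levels` as the number of digits reserved for a within-group sequence number are all assumptions made for this example.

```python
# Hypothetical numbering scheme, for illustration only; it is not taken
# from Larch's source code.
PADDING_LEVELS = 4  # assumed: digits reserved for the within-group sequence


def encode(group_index, sequence, padding_levels=PADDING_LEVELS):
    """Pack a 1-based group index and a 1-based sequence number into one code."""
    base = 10 ** padding_levels
    if not 1 <= sequence < base:
        raise ValueError("sequence does not fit in the reserved digits")
    return group_index * base + sequence


def decode(code, padding_levels=PADDING_LEVELS):
    """Recover (group_index, sequence) from a code built by encode()."""
    return divmod(code, 10 ** padding_levels)


# Groups follow the complete feature list nb_cnxs = [0, 1, 2].
groups = {0: 1, 1: 2, 2: 3}  # nb_cnxs value -> assumed group index

code = encode(groups[1], 7)  # seventh one-connection itinerary in a case
group_index, sequence = decode(code)
```

With a scheme like this, the leading digits of a code identify the nest an alternative belongs to, which is the property the nesting step relies on.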
If we compare the new data with the old data, we’ll see that we have created a few more alternatives (n_alts rises from 127 to 134).
d.info()
larch.DataFrames: (not computation-ready)
n_cases: 105
n_alts: 127
data_ce: 8 variables, 6023 rows
data_co: 3 variables
data_av: <populated>
data_ch: choice
data_wt: computed_weight (/ 2239.980952380952)
d1.info()
larch.DataFrames: (not computation-ready)
n_cases: 105
n_alts: 134
data_ce: 9 variables, 6023 rows
data_co: 3 variables
data_av: <populated>
data_ch: choice
data_wt: computed_weight (/ 2239.98095703125)
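One plausible reason the number of alternatives grows is that a systematic numbering needs, for each group, as many slots as the busiest case in that group, and different cases may be busiest in different groups. The sketch below illustrates that counting logic on toy data. The data values are hypothetical, and this is an assumed explanation rather than a description of Larch's internals.

```python
from collections import Counter

# Toy data: one (id_case, nb_cnxs) pair per available itinerary.
# These values are hypothetical.
rows = [
    (1, 0), (1, 0), (1, 0), (1, 1), (1, 1),
    (2, 0), (2, 1), (2, 1), (2, 1), (2, 1),
]

# Count itineraries per (case, group).
per_case_group = Counter(rows)

# Each group needs as many slots as its busiest case.
slots_per_group = {}
for (case, group), n in per_case_group.items():
    slots_per_group[group] = max(slots_per_group.get(group, 0), n)

# Total number of systematic alternatives across all groups.
n_systematic_alts = sum(slots_per_group.values())

# For comparison: the largest number of itineraries in any single case.
largest_case = max(Counter(case for case, _ in rows).values())
```

In this toy example no case has more than five itineraries, yet seven systematic alternatives are needed: three non-stop slots (from case 1) plus four one-connection slots (from case 2).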
Now let’s make our model. The utility function we will use is the same as the one we used for the MNL version of the model.
m = larch.Model(dataservice=d1)
v = [
    "timeperiod==2",
    "timeperiod==3",
    "timeperiod==4",
    "timeperiod==5",
    "timeperiod==6",
    "timeperiod==7",
    "timeperiod==8",
    "timeperiod==9",
    "carrier==2",
    "carrier==3",
    "carrier==4",
    "carrier==5",
    "equipment==2",
    "fare_hy",
    "fare_ly",
    "elapsed_time",
    "nb_cnxs",
]
from larch.roles import PX
m.utility_ca = sum(PX(i) for i in v)
m.choice_ca_var = 'choice'
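The utility specification above is linear in parameters: as we understand it, each `PX(i)` term pairs a parameter named `i` with the data expression `i`, and the terms are summed. The following plain-Python sketch shows what that sum amounts to for a single itinerary. The coefficient values and the itinerary are illustrative assumptions, and the `evaluate` helper is hypothetical; it is not how Larch evaluates expressions internally.

```python
# Illustrative coefficients only; these are not estimates from the model.
betas = {
    "carrier==2": 0.1,
    "fare_hy": -0.001,
    "elapsed_time": -0.004,
    "nb_cnxs": -3.0,
}

# One hypothetical itinerary, using column names from the example data.
itinerary = {"carrier": 2, "fare_hy": 500.0, "elapsed_time": 300.0, "nb_cnxs": 1}


def evaluate(term, row):
    """Evaluate a term: 'name==value' is a 0/1 indicator, else a column lookup."""
    if "==" in term:
        name, value = term.split("==")
        return 1.0 if row[name] == int(value) else 0.0
    return float(row[term])


# Systematic utility: sum of (parameter * data) over all terms.
utility = sum(beta * evaluate(term, itinerary) for term, beta in betas.items())
```

Terms such as `"carrier==2"` act as dummy variables, so their parameters measure a shift in utility relative to the omitted reference category.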
If we end our model specification here, we will have a plain MNL model. To change to a nested logit model, all we need to do is add the nests. We can do this easily with the special magic_nesting method, which uses the structure of the data that we defined above.
m.magic_nesting()
m.load_data()
req_data does not request weight_co but it is set and being provided
req_data does not request avail_ca or avail_co but it is set and being provided
converting data_ce to <class 'numpy.float64'>
m.maximize_loglike()
Iteration 042 [Optimization terminated successfully]
Best LL = -777705.7732910335
| | value | initvalue | nullvalue | minimum | maximum | holdfast | note | best |
|---|---|---|---|---|---|---|---|---|
| MU_nb_cnxs | 0.691151 | 1.0 | 1.0 | 0.001 | 1.0 | 0 | | 0.691151 |
| carrier==2 | 0.079567 | 0.0 | 0.0 | -inf | inf | 0 | | 0.079567 |
| carrier==3 | 0.440537 | 0.0 | 0.0 | -inf | inf | 0 | | 0.440537 |
| carrier==4 | 0.397000 | 0.0 | 0.0 | -inf | inf | 0 | | 0.397000 |
| carrier==5 | -0.439005 | 0.0 | 0.0 | -inf | inf | 0 | | -0.439005 |
| elapsed_time | -0.004229 | 0.0 | 0.0 | -inf | inf | 0 | | -0.004229 |
| equipment==2 | 0.326813 | 0.0 | 0.0 | -inf | inf | 0 | | 0.326813 |
| fare_hy | -0.000847 | 0.0 | 0.0 | -inf | inf | 0 | | -0.000847 |
| fare_ly | -0.000857 | 0.0 | 0.0 | -inf | inf | 0 | | -0.000857 |
| nb_cnxs | -3.156922 | 0.0 | 0.0 | -inf | inf | 0 | | -3.156922 |
| timeperiod==2 | 0.065438 | 0.0 | 0.0 | -inf | inf | 0 | | 0.065438 |
| timeperiod==3 | 0.087974 | 0.0 | 0.0 | -inf | inf | 0 | | 0.087974 |
| timeperiod==4 | 0.042816 | 0.0 | 0.0 | -inf | inf | 0 | | 0.042816 |
| timeperiod==5 | 0.096447 | 0.0 | 0.0 | -inf | inf | 0 | | 0.096447 |
| timeperiod==6 | 0.164563 | 0.0 | 0.0 | -inf | inf | 0 | | 0.164563 |
| timeperiod==7 | 0.243778 | 0.0 | 0.0 | -inf | inf | 0 | | 0.243778 |
| timeperiod==8 | 0.245030 | 0.0 | 0.0 | -inf | inf | 0 | | 0.245030 |
| timeperiod==9 | -0.006025 | 0.0 | 0.0 | -inf | inf | 0 | | -0.006025 |
/home/runner/work/larch/larch/larch/larch/model/optimization.py:308: UserWarning: slsqp may not play nicely with unbounded parameters
if you get poor results, consider setting global bounds with model.set_cap()
warnings.warn( # infinite bounds # )
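The single MU_nb_cnxs parameter in the estimation output is a logsum parameter shared by all of the connection-based nests. As a rough illustration of what such a parameter does, the sketch below computes two-level nested logit probabilities in plain Python. The function `nested_logit_probs`, the alternative names, and the utility values are assumptions made for this example, and the normalization (upper-level scale fixed at 1) is one common convention, not necessarily the exact formulation Larch uses internally.

```python
import math


def nested_logit_probs(utilities, nests, mu):
    """Two-level nested logit with one logsum parameter `mu` shared by all nests.

    utilities: dict mapping alternative -> systematic utility
    nests: dict mapping nest name -> list of alternatives in that nest
    """
    logsums = {}
    within = {}
    for nest, alts in nests.items():
        scaled = {a: math.exp(utilities[a] / mu) for a in alts}
        total = sum(scaled.values())
        logsums[nest] = mu * math.log(total)  # nest-level utility
        within[nest] = {a: s / total for a, s in scaled.items()}
    denom = sum(math.exp(v) for v in logsums.values())
    probs = {}
    for nest, alts in nests.items():
        p_nest = math.exp(logsums[nest]) / denom
        for a in alts:
            # P(alt) = P(nest) * P(alt | nest)
            probs[a] = p_nest * within[nest][a]
    return probs


# Hypothetical utilities for four itineraries in two nests.
utilities = {"A": -1.0, "B": -1.2, "C": -3.0, "D": -3.5}
nests = {"Cnx0": ["A", "B"], "Cnx1": ["C", "D"]}

probs_nl = nested_logit_probs(utilities, nests, mu=0.691151)
probs_mnl = nested_logit_probs(utilities, nests, mu=1.0)
```

When `mu` equals 1 the formula collapses to the plain MNL model, which is why 1.0 appears as the null value for MU_nb_cnxs. Values below 1 imply that alternatives within a nest are closer substitutes for one another than for alternatives in other nests.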
| key | value |
|---|---|
| x | |
| loglike | -777705.7732910335 |
| d_loglike | |
| nit | 42 |
| nfev | 137 |
| njev | 42 |
| status | 0 |
| message | 'Optimization terminated successfully' |
| success | True |
| elapsed_time | 0:00:00.453657 |
| method | 'slsqp' |
| n_cases | 105 |
| iteration_number | 42 |
| logloss | 3.3066002758377295 |
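The reported logloss appears to be the negative log likelihood divided by the total (unscaled) weight of the observations. The short check below reproduces it from numbers printed earlier in this example. It assumes that the weight rescaling factor times the number of cases recovers the total weight, which is an inference from the output rather than something documented here.

```python
loglike = -777705.7732910335      # reported "loglike"
n_cases = 105                     # reported "n_cases"
weight_scale = 2239.980952380952  # reported weight rescaling factor

# Assumed: the rescaling normalized weights to average 1.0 per case,
# so the original total weight is n_cases * weight_scale.
total_weight = n_cases * weight_scale

# Average negative log likelihood per unit of weight.
logloss = -loglike / total_weight
```

The result agrees with the reported value of roughly 3.3066 to about eight significant digits, which suggests the log likelihood itself is reported on the original, unscaled weight basis.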