302: Itinerary Choice using Simple Nested Logit

import pandas as pd
import larch
larch.__version__
'5.7.0'

This example is an itinerary choice model built using the example itinerary choice dataset included with Larch. See example 300 for details.

from larch.data_warehouse import example_file
itin = pd.read_csv(example_file("arc"), index_col=['id_case','id_alt'])
d = larch.DataFrames(itin, ch='choice', crack=True, autoscale_weights=True)
rescaled array of weights by a factor of 2239.980952380952

We will be building a nested logit model, but to do so we first need to rationalize the alternative numbers. As given, the raw itinerary choice data has many alternatives, but they are not ordered or numbered in a regular way: each elemental alternative has an arbitrary code number, and the code numbers for one case are not comparable to those of another case. We need to renumber the alternatives in a way better suited to our application, so that we can programmatically extract from the code number the relevant features of each alternative that we want to use in building the nested logit model. In this example we want to test a model with nests based on level of service. To renumber, we first define the relevant categories and values, and establish a numbering system using a special object:

d1 = d.new_systematic_alternatives(
    groupby='nb_cnxs',
    name='alternative_code',
    padding_levels=4,
    groupby_prefixes=['Cnx'],
    overwrite=False,
    complete_features_list={'nb_cnxs':[0,1,2]},
)
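To make the idea of a systematic code concrete, here is a minimal sketch in plain pandas. It does not reproduce Larch's internal numbering, which this example does not show. The scheme (group number times 10**4, plus a sequence number within each case and group), the toy data, and the helper nb_cnxs_from_code are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data: two cases, each with a few itineraries.
toy = pd.DataFrame({
    "id_case": [1, 1, 1, 1, 2, 2, 2],
    "nb_cnxs": [0, 1, 1, 2, 0, 0, 1],
})

PADDING = 4  # assumed number of digits reserved for the sequence number

# Group number: 1 for nonstop, 2 for one connection, 3 for two connections.
group_number = toy["nb_cnxs"] + 1

# Sequence number of each itinerary within its case and group, starting at 1.
sequence = toy.groupby(["id_case", "nb_cnxs"]).cumcount() + 1

# Systematic code: leading digit identifies the group, trailing digits the item.
toy["alternative_code"] = group_number * 10**PADDING + sequence


def nb_cnxs_from_code(code):
    """Recover the number of connections from a systematic code (hypothetical helper)."""
    return code // 10**PADDING - 1


print(toy)
```

With a scheme like this, the same code means the same kind of alternative in every case, and the nest an alternative belongs to can be read directly off its code.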

If we compare the new data with the old data, we’ll see that we have created a few more alternatives (n_alts rises from 127 to 134).

d.info()
larch.DataFrames:  (not computation-ready)
  n_cases: 105
  n_alts: 127
  data_ce: 8 variables, 6023 rows
  data_co: 3 variables
  data_av: <populated>
  data_ch: choice
  data_wt: computed_weight (/ 2239.980952380952)
d1.info()
larch.DataFrames:  (not computation-ready)
  n_cases: 105
  n_alts: 134
  data_ce: 9 variables, 6023 rows
  data_co: 3 variables
  data_av: <populated>
  data_ch: choice
  data_wt: computed_weight (/ 2239.98095703125)

Now let’s make our model. The utility function we will use is the same as the one we used for the MNL version of the model.

m = larch.Model(dataservice=d1)

v = [
    "timeperiod==2",
    "timeperiod==3",
    "timeperiod==4",
    "timeperiod==5",
    "timeperiod==6",
    "timeperiod==7",
    "timeperiod==8",
    "timeperiod==9",
    "carrier==2",
    "carrier==3",
    "carrier==4",
    "carrier==5",
    "equipment==2",
    "fare_hy",
    "fare_ly",    
    "elapsed_time",  
    "nb_cnxs",       
]
from larch.roles import PX
m.utility_ca = sum(PX(i) for i in v)

m.choice_ca_var = 'choice'
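The utility specification above is linear in parameters: as far as we understand it, each PX(i) term pairs a parameter named i with the data expression i, and the terms are summed. The sketch below mimics that arithmetic with plain pandas on a hypothetical three-row table. The toy data and the beta values are made up for illustration, and the code does not use Larch itself.

```python
import pandas as pd

# Hypothetical toy alternatives (not taken from the example dataset).
toy = pd.DataFrame({
    "carrier": [1, 2, 3],
    "elapsed_time": [300, 420, 360],
    "nb_cnxs": [0, 1, 1],
})

# A subset of the terms used in the model, with made-up parameter values.
terms = ["carrier==2", "elapsed_time", "nb_cnxs"]
beta = {"carrier==2": 0.5, "elapsed_time": -0.01, "nb_cnxs": -1.0}

# Evaluate each expression against the data, multiply by the parameter
# that shares its name, and add the terms up.
utility = sum(beta[t] * toy.eval(t).astype(float) for t in terms)

print(utility)
```

An expression such as "carrier==2" evaluates to a 0/1 indicator, so its parameter acts as a carrier-specific constant relative to the omitted base category.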

If we just end our model specification here, we will have a plain MNL model. To change to a nested logit model, all we need to do is add the nests. We can do this easily using the special magic_nesting method, which uses the structure of the data that we defined above.

m.magic_nesting()
m.load_data()
req_data does not request weight_co but it is set and being provided
req_data does not request avail_ca or avail_co but it is set and being provided
converting data_ce to <class 'numpy.float64'>
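For readers who want to see what the nesting changes, the following is a minimal numpy sketch of two-level nested logit probabilities, with nests defined by the number of connections and a single logsum parameter mu shared by all nests. It is a textbook formulation written for illustration, not Larch's implementation. The function name, the toy utilities, and the value of mu are assumptions.

```python
import numpy as np


def nested_logit_probs(utility, nest_of, mu):
    """Two-level nested logit probabilities with one shared logsum parameter mu."""
    utility = np.asarray(utility, dtype=float)
    nest_of = np.asarray(nest_of)
    nests = np.unique(nest_of)
    conditional = {}
    logsum = {}
    for n in nests:
        scaled = utility[nest_of == n] / mu
        shift = scaled.max()  # subtract the max for numerical stability
        expd = np.exp(scaled - shift)
        conditional[n] = expd / expd.sum()             # P(alternative | nest)
        logsum[n] = mu * (shift + np.log(expd.sum()))  # nest-level utility
    nest_utility = np.array([logsum[n] for n in nests])
    nest_prob = np.exp(nest_utility - nest_utility.max())
    nest_prob /= nest_prob.sum()                       # P(nest)
    probs = np.zeros_like(utility)
    for n, p_n in zip(nests, nest_prob):
        probs[nest_of == n] = p_n * conditional[n]
    return probs


# Hypothetical utilities for five itineraries, nested by number of connections.
toy_utility = np.array([-1.0, -1.2, -2.5, -2.7, -4.0])
toy_nest = np.array([0, 0, 1, 1, 2])

p_nl = nested_logit_probs(toy_utility, toy_nest, mu=0.7)
p_mnl = nested_logit_probs(toy_utility, toy_nest, mu=1.0)
print(p_nl, p_mnl)
```

With mu equal to 1 the formula collapses to the plain MNL probabilities, which is why a logsum parameter estimated below 1 is the signal that the nests capture correlation among alternatives with the same number of connections.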
m.maximize_loglike()

Iteration 042 [Optimization terminated successfully]

Best LL = -777705.7732910335

                    value  initvalue  nullvalue  minimum  maximum  holdfast  note       best
MU_nb_cnxs       0.691151        1.0        1.0    0.001      1.0         0         0.691151
carrier==2       0.079567        0.0        0.0     -inf      inf         0         0.079567
carrier==3       0.440537        0.0        0.0     -inf      inf         0         0.440537
carrier==4       0.397000        0.0        0.0     -inf      inf         0         0.397000
carrier==5      -0.439005        0.0        0.0     -inf      inf         0        -0.439005
elapsed_time    -0.004229        0.0        0.0     -inf      inf         0        -0.004229
equipment==2     0.326813        0.0        0.0     -inf      inf         0         0.326813
fare_hy         -0.000847        0.0        0.0     -inf      inf         0        -0.000847
fare_ly         -0.000857        0.0        0.0     -inf      inf         0        -0.000857
nb_cnxs         -3.156922        0.0        0.0     -inf      inf         0        -3.156922
timeperiod==2    0.065438        0.0        0.0     -inf      inf         0         0.065438
timeperiod==3    0.087974        0.0        0.0     -inf      inf         0         0.087974
timeperiod==4    0.042816        0.0        0.0     -inf      inf         0         0.042816
timeperiod==5    0.096447        0.0        0.0     -inf      inf         0         0.096447
timeperiod==6    0.164563        0.0        0.0     -inf      inf         0         0.164563
timeperiod==7    0.243778        0.0        0.0     -inf      inf         0         0.243778
timeperiod==8    0.245030        0.0        0.0     -inf      inf         0         0.245030
timeperiod==9   -0.006025        0.0        0.0     -inf      inf         0        -0.006025
/home/runner/work/larch/larch/larch/larch/model/optimization.py:308: UserWarning: slsqp may not play nicely with unbounded parameters
if you get poor results, consider setting global bounds with model.set_cap()
  warnings.warn( # infinite bounds # )
key               value
x                 MU_nb_cnxs       0.691151
                  carrier==2       0.079567
                  carrier==3       0.440537
                  carrier==4       0.397000
                  carrier==5      -0.439005
                  elapsed_time    -0.004229
                  equipment==2     0.326813
                  fare_hy         -0.000847
                  fare_ly         -0.000857
                  nb_cnxs         -3.156922
                  timeperiod==2    0.065438
                  timeperiod==3    0.087974
                  timeperiod==4    0.042816
                  timeperiod==5    0.096447
                  timeperiod==6    0.164563
                  timeperiod==7    0.243778
                  timeperiod==8    0.245030
                  timeperiod==9   -0.006025
loglike           -777705.7732910335
d_loglike         MU_nb_cnxs       0.019040
                  carrier==2      -0.099388
                  carrier==3      -0.042668
                  carrier==4       0.079734
                  carrier==5       0.092789
                  elapsed_time     4.260928
                  equipment==2     0.067728
                  fare_hy         -9.444235
                  fare_ly         -3.831194
                  nb_cnxs          0.001616
                  timeperiod==2    0.036795
                  timeperiod==3    0.021406
                  timeperiod==4    0.049108
                  timeperiod==5   -0.011497
                  timeperiod==6   -0.020032
                  timeperiod==7    0.012699
                  timeperiod==8    0.000712
                  timeperiod==9   -0.016040
nit               42
nfev              137
njev              42
status            0
message           'Optimization terminated successfully'
success           True
elapsed_time      0:00:00.453657
method            'slsqp'
n_cases           105
iteration_number  42
logloss           3.3066002758377295
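
As a quick arithmetic check on the reported figures, the logloss appears to equal the negative log likelihood divided by the total unscaled weight, that is, n_cases times the weight scaling factor printed when the data was loaded. This relationship is inferred from the numbers shown in this example rather than taken from the Larch documentation, so treat it as an assumption.

```python
# Values copied from the output shown in this example.
best_ll = -777705.7732910335
n_cases = 105
weight_scale = 2239.980952380952   # "rescaled array of weights by a factor of ..."
reported_logloss = 3.3066002758377295

# Assumed relationship: logloss = -LL / (total unscaled weight).
total_weight = n_cases * weight_scale
implied_logloss = -best_ll / total_weight

print(implied_logloss, reported_logloss)
```

The two numbers agree closely, which suggests the reported log likelihood is expressed on the original weight scale while the logloss is a per-unit-weight average.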