201: Exampville Mode Choice
Contents
201: Exampville Mode Choice¶
Welcome to Exampville, the best simulated town in this here part of the internet!
Exampville is a demonstration provided with Larch that walks through some of the data and tools that a transportation planner might use when building a travel model.
import os
import numpy as np
import pandas as pd
import larch.numba as lx
from larch import P, X
In this example notebook, we will walk through the creation of a tour mode choice model.
To begin, we’ll re-load the tours and skims data from the data setup example.
hh, pp, tour, skims = lx.example(200, ['hh', 'pp', 'tour', 'skims'])
Preprocessing¶
The Exampville data output contains a set of files similar to what we might find for a real travel survey: network skims, and tables of households, persons, and tours. We’ll need to connect these tables together to create a composite dataset for mode choice model estimation, using the DataTree structure.
from addicty import Dict
Mode = Dict(
DA = 1,
SR = 2,
Walk = 3,
Bike = 4,
Transit = 5,
).freeze()
tour_dataset = lx.Dataset.construct.from_idco(tour.set_index('TOURID'), alts=Mode)
od_skims = lx.Dataset.construct.from_omx(skims)
dt = lx.DataTree(
tour=tour_dataset,
hh=hh.set_index('HHID'),
person=pp.set_index('PERSONID'),
od=od_skims,
do=od_skims,
relationships=(
"tours.HHID @ hh.HHID",
"tours.PERSONID @ person.PERSONID",
"hh.HOMETAZ @ od.otaz",
"tours.DTAZ @ od.dtaz",
"hh.HOMETAZ @ do.dtaz",
"tours.DTAZ @ do.otaz",
),
)
In Exampville, there are only two kinds of trips:
work (purpose=1) and
non-work (purpose=2).
We want to estimate a mode choice model for work trips, so we’ll begin by excluding all the other trips:
dt_work = dt.query_cases("TOURPURP == 1")
Model Definition¶
And then we are ready to create our model.
m = lx.Model(datatree = dt_work)
m.title = "Exampville Work Tour Mode Choice v1"
We will explicitly define the set of utility functions
we want to use. Because the DataFrames we are using to
serve data to this model contains exclusively idco
format
data, we’ll use only the utility_co
mapping to define
a unique utility function for each alternative.
m.utility_co[Mode.DA] = (
+ P.InVehTime * X.AUTO_TIME
+ P.Cost * X.AUTO_COST # dollars per mile
)
m.utility_co[Mode.SR] = (
+ P.ASC_SR
+ P.InVehTime * X.AUTO_TIME
+ P.Cost * (X.AUTO_COST * 0.5) # dollars per mile, half share
+ P("LogIncome:SR") * X("log(INCOME)")
)
m.utility_co[Mode.Walk] = (
+ P.ASC_Walk
+ P.NonMotorTime * X.WALK_TIME
+ P("LogIncome:Walk") * X("log(INCOME)")
)
m.utility_co[Mode.Bike] = (
+ P.ASC_Bike
+ P.NonMotorTime * X.BIKE_TIME
+ P("LogIncome:Bike") * X("log(INCOME)")
)
m.utility_co[Mode.Transit] = (
+ P.ASC_Transit
+ P.InVehTime * X.TRANSIT_IVTT
+ P.OutVehTime * X.TRANSIT_OVTT
+ P.Cost * X.TRANSIT_FARE
+ P("LogIncome:Transit") * X('log(INCOME)')
)
To write a nested logit mode, we’ll attach some nesting nodes to the
model’s graph
. Each new_node
allows us to define the set of
codes for the child nodes (elemental alternatives, or lower level nests)
as well as giving the new nest a name and assigning a logsum parameter.
The return value of this method is the node code for the newly created
nest, which then can potenially be used as a child code when creating
a higher level nest. We do this below, adding the ‘Car’ nest into the
‘Motor’ nest.
Car = m.graph.new_node(parameter='Mu:Car', children=[Mode.DA, Mode.SR], name='Car')
NonMotor = m.graph.new_node(parameter='Mu:NonMotor', children=[Mode.Walk, Mode.Bike], name='NonMotor')
Motor = m.graph.new_node(parameter='Mu:Motor', children=[Car, Mode.Transit], name='Motor')
Let’s visually check on the nesting structure.
m.graph
The tour mode choice model’s choice variable is indicated by
the code value in ‘TOURMODE’, and this can be
defined for the model using choice_co_code
.
m.choice_co_code = 'TOURMODE'
We can also give a dictionary of availability conditions based
on values in the idco
data, using the availability_co_vars
attribute. Alternatives that are always available can be indicated
by setting the criterion to 1.
m.availability_co_vars = {
Mode.DA: 'AGE >= 16',
Mode.SR: 1,
Mode.Walk: 'WALK_TIME < 60',
Mode.Bike: 'BIKE_TIME < 60',
Mode.Transit: 'TRANSIT_FARE>0',
}
Then let’s prepare this data for estimation. Even though the
data is already in memory, the load_data
method is used to
pre-process the data, extracting the required values, pre-computing
the values of fixed expressions, and assembling the results into
contiguous arrays suitable for computing the log likelihood values
efficiently.
Model Estimation¶
We can check on some important statistics of this loaded data even before we estimate the model.
m.choice_avail_summary()
name | chosen | available | availability condition | |
---|---|---|---|---|
1 | DA | 6052 | 7564 | AGE >= 16 |
2 | SR | 810 | 7564 | 1 |
3 | Walk | 196 | 4179 | WALK_TIME < 60 |
4 | Bike | 72 | 7564 | BIKE_TIME < 60 |
5 | Transit | 434 | 4199 | TRANSIT_FARE>0 |
6 | Car | 6862 | 7564 | |
7 | NonMotor | 268 | 7564 | |
8 | Motor | 7296 | 7564 | |
< Total All Alternatives > | 7564 |
If we are satisfied with the statistics we see above, we can go ahead and estimate the model.
m.set_cap(20) # improves optimization stability
result = m.maximize_loglike()
Iteration 038 [Optimization terminated successfully]
Best LL = -3493.0397299489387
value | initvalue | nullvalue | minimum | maximum | holdfast | note | best | |
---|---|---|---|---|---|---|---|---|
ASC_Bike | -0.258018 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | -0.258018 | |
ASC_SR | 1.422911 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | 1.422911 | |
ASC_Transit | 6.754511 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | 6.754511 | |
ASC_Walk | 8.621508 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | 8.621508 | |
Cost | -0.175693 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | -0.175693 | |
InVehTime | -0.123722 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | -0.123722 | |
LogIncome:Bike | -0.196983 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | -0.196983 | |
LogIncome:SR | -0.193812 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | -0.193812 | |
LogIncome:Transit | -0.557151 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | -0.557151 | |
LogIncome:Walk | -0.522787 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | -0.522787 | |
Mu:Car | 0.259301 | 1.0 | 1.0 | 0.001 | 1.0 | 0 | 0.259301 | |
Mu:Motor | 0.801653 | 1.0 | 1.0 | 0.001 | 1.0 | 0 | 0.801653 | |
Mu:NonMotor | 0.853783 | 1.0 | 1.0 | 0.001 | 1.0 | 0 | 0.853783 | |
NonMotorTime | -0.265584 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | -0.265584 | |
OutVehTime | -0.254813 | 0.0 | 0.0 | -20.000 | 20.0 | 0 | -0.254813 |
After we find the best fitting parameters, we can compute
some variance-covariance statistics, incuding standard errors of
the estimates and t statistics, using calculate_parameter_covariance
.
m.calculate_parameter_covariance()
Then we can review the results in a variety of report tables.
m.parameter_summary()
Value | Std Err | t Stat | Signif | Null Value | |
---|---|---|---|---|---|
ASC_Bike | -0.258 | 1.34 | -0.19 | 0.00 | |
ASC_SR | 1.42 | 1.00 | 1.42 | 0.00 | |
ASC_Transit | 6.75 | 2.06 | 3.27 | ** | 0.00 |
ASC_Walk | 8.62 | 1.14 | 7.57 | *** | 0.00 |
Cost | -0.176 | 0.120 | -1.47 | 0.00 | |
InVehTime | -0.124 | 0.0292 | -4.24 | *** | 0.00 |
LogIncome:Bike | -0.197 | 0.124 | -1.59 | 0.00 | |
LogIncome:SR | -0.194 | 0.135 | -1.43 | 0.00 | |
LogIncome:Transit | -0.557 | 0.169 | -3.29 | *** | 0.00 |
LogIncome:Walk | -0.523 | 0.100 | -5.21 | *** | 0.00 |
Mu:Car | 0.259 | 0.181 | -4.10 | *** | 1.00 |
Mu:Motor | 0.802 | 0.201 | -0.99 | 1.00 | |
Mu:NonMotor | 0.854 | 0.112 | -1.30 | 1.00 | |
NonMotorTime | -0.266 | 0.0163 | -16.29 | *** | 0.00 |
OutVehTime | -0.255 | 0.0646 | -3.95 | *** | 0.00 |
m.estimation_statistics()
Statistic | Aggregate | Per Case |
---|---|---|
Number of Cases | 7564 | |
Log Likelihood at Convergence | -3493.04 | -0.46 |
Log Likelihood at Null Parameters | -10644.66 | -1.41 |
Rho Squared w.r.t. Null Parameters | 0.672 |
Save and Report Model¶
report = lx.Reporter(title=m.title)
report.append('# Parameter Summary')
report.append(m.parameter_summary())
report
Parameter Summary
Value | Std Err | t Stat | Signif | Null Value | |
---|---|---|---|---|---|
ASC_Bike | -0.258 | 1.34 | -0.19 | 0.00 | |
ASC_SR | 1.42 | 1.00 | 1.42 | 0.00 | |
ASC_Transit | 6.75 | 2.06 | 3.27 | ** | 0.00 |
ASC_Walk | 8.62 | 1.14 | 7.57 | *** | 0.00 |
Cost | -0.176 | 0.120 | -1.47 | 0.00 | |
InVehTime | -0.124 | 0.0292 | -4.24 | *** | 0.00 |
LogIncome:Bike | -0.197 | 0.124 | -1.59 | 0.00 | |
LogIncome:SR | -0.194 | 0.135 | -1.43 | 0.00 | |
LogIncome:Transit | -0.557 | 0.169 | -3.29 | *** | 0.00 |
LogIncome:Walk | -0.523 | 0.100 | -5.21 | *** | 0.00 |
Mu:Car | 0.259 | 0.181 | -4.10 | *** | 1.00 |
Mu:Motor | 0.802 | 0.201 | -0.99 | 1.00 | |
Mu:NonMotor | 0.854 | 0.112 | -1.30 | 1.00 | |
NonMotorTime | -0.266 | 0.0163 | -16.29 | *** | 0.00 |
OutVehTime | -0.255 | 0.0646 | -3.95 | *** | 0.00 |
report << "# Estimation Statistics" << m.estimation_statistics()
report << "# Utility Functions" << m.utility_functions()
Utility Functions
alt | formula |
---|---|
1 | + P.InVehTime * X.AUTO_TIME + P.Cost * X.AUTO_COST |
2 | + P.ASC_SR + P.InVehTime * X.AUTO_TIME + P.Cost * 0.5 * X.AUTO_COST + P('LogIncome:SR') * X('log(INCOME)') |
3 | + P.ASC_Walk + P.NonMotorTime * X.WALK_TIME + P('LogIncome:Walk') * X('log(INCOME)') |
4 | + P.ASC_Bike + P.NonMotorTime * X.BIKE_TIME + P('LogIncome:Bike') * X('log(INCOME)') |
5 | + P.ASC_Transit + P.InVehTime * X.TRANSIT_IVTT + P.OutVehTime * X.TRANSIT_OVTT + P.Cost * X.TRANSIT_FARE + P('LogIncome:Transit') * X('log(INCOME)') |
report.save(
'exampville_mode_choice.html',
overwrite=True,
metadata=m,
)
'exampville_mode_choice.html'