201: Exampville Mode Choice

Welcome to Exampville, the best simulated town in this here part of the internet!

Exampville is a demonstration provided with Larch that walks through some of the data and tools that a transportation planner might use when building a travel model.

import os
import numpy as np
import pandas as pd 
import larch.numba as lx
from larch import P, X

In this example notebook, we will walk through the creation of a tour mode choice model.

To begin, we’ll re-load the tours and skims data from the data setup example.

hh, pp, tour, skims = lx.example(200, ['hh', 'pp', 'tour', 'skims'])

Preprocessing

The Exampville data output contains a set of files similar to what we might find for a real travel survey: network skims, and tables of households, persons, and tours. We’ll need to connect these tables together to create a composite dataset for mode choice model estimation, using the DataTree structure.

from addicty import Dict

Mode = Dict(
    DA = 1,
    SR = 2,
    Walk = 3,
    Bike = 4,
    Transit = 5,
).freeze()
tour_dataset = lx.Dataset.construct.from_idco(tour.set_index('TOURID'), alts=Mode)
od_skims = lx.Dataset.construct.from_omx(skims)
dt = lx.DataTree(
    tour=tour_dataset,
    hh=hh.set_index('HHID'),
    person=pp.set_index('PERSONID'),
    od=od_skims,
    do=od_skims,
    relationships=(
        "tours.HHID @ hh.HHID",
        "tours.PERSONID @ person.PERSONID",
        "hh.HOMETAZ @ od.otaz",
        "tours.DTAZ @ od.dtaz",
        "hh.HOMETAZ @ do.dtaz",
        "tours.DTAZ @ do.otaz",
    ),
)

In Exampville, there are only two kinds of trips:

  • work (purpose=1) and

  • non-work (purpose=2).

We want to estimate a mode choice model for work trips, so we’ll begin by excluding all the other trips:

dt_work = dt.query_cases("TOURPURP == 1")

Model Definition

And then we are ready to create our model.

m = lx.Model(datatree = dt_work)
m.title = "Exampville Work Tour Mode Choice v1"

We will explicitly define the set of utility functions we want to use. Because the DataFrames we are using to serve data to this model contains exclusively idco format data, we’ll use only the utility_co mapping to define a unique utility function for each alternative.

m.utility_co[Mode.DA] = (
        + P.InVehTime * X.AUTO_TIME
        + P.Cost * X.AUTO_COST # dollars per mile
)

m.utility_co[Mode.SR] = (
        + P.ASC_SR
        + P.InVehTime * X.AUTO_TIME
        + P.Cost * (X.AUTO_COST * 0.5) # dollars per mile, half share
        + P("LogIncome:SR") * X("log(INCOME)")
)

m.utility_co[Mode.Walk] = (
        + P.ASC_Walk
        + P.NonMotorTime * X.WALK_TIME
        + P("LogIncome:Walk") * X("log(INCOME)")
)

m.utility_co[Mode.Bike] = (
        + P.ASC_Bike
        + P.NonMotorTime * X.BIKE_TIME
        + P("LogIncome:Bike") * X("log(INCOME)")
)

m.utility_co[Mode.Transit] = (
        + P.ASC_Transit
        + P.InVehTime * X.TRANSIT_IVTT
        + P.OutVehTime * X.TRANSIT_OVTT
        + P.Cost * X.TRANSIT_FARE
        + P("LogIncome:Transit") * X('log(INCOME)')
)

To write a nested logit mode, we’ll attach some nesting nodes to the model’s graph. Each new_node allows us to define the set of codes for the child nodes (elemental alternatives, or lower level nests) as well as giving the new nest a name and assigning a logsum parameter. The return value of this method is the node code for the newly created nest, which then can potenially be used as a child code when creating a higher level nest. We do this below, adding the ‘Car’ nest into the ‘Motor’ nest.

Car = m.graph.new_node(parameter='Mu:Car', children=[Mode.DA, Mode.SR], name='Car')
NonMotor = m.graph.new_node(parameter='Mu:NonMotor', children=[Mode.Walk, Mode.Bike], name='NonMotor')
Motor = m.graph.new_node(parameter='Mu:Motor', children=[Car, Mode.Transit], name='Motor')

Let’s visually check on the nesting structure.

m.graph
cluster_elemental 0 0 7 (7) NonMotor Mu:NonMotor 0->7 8 (8) Motor Mu:Motor 0->8 1 (1) DA 2 (2) SR 3 (3) Walk 4 (4) Bike 5 (5) Transit 6 (6) Car Mu:Car 6->1 6->2 7->3 7->4 8->5 8->6

The tour mode choice model’s choice variable is indicated by the code value in ‘TOURMODE’, and this can be defined for the model using choice_co_code.

m.choice_co_code = 'TOURMODE'

We can also give a dictionary of availability conditions based on values in the idco data, using the availability_co_vars attribute. Alternatives that are always available can be indicated by setting the criterion to 1.

m.availability_co_vars = {
    Mode.DA: 'AGE >= 16',
    Mode.SR: 1,
    Mode.Walk: 'WALK_TIME < 60',
    Mode.Bike: 'BIKE_TIME < 60',
    Mode.Transit: 'TRANSIT_FARE>0',
}

Then let’s prepare this data for estimation. Even though the data is already in memory, the load_data method is used to pre-process the data, extracting the required values, pre-computing the values of fixed expressions, and assembling the results into contiguous arrays suitable for computing the log likelihood values efficiently.

Model Estimation

We can check on some important statistics of this loaded data even before we estimate the model.

m.choice_avail_summary()
name chosen available availability condition
1 DA 6052 7564 AGE >= 16
2 SR 810 7564 1
3 Walk 196 4179 WALK_TIME < 60
4 Bike 72 7564 BIKE_TIME < 60
5 Transit 434 4199 TRANSIT_FARE>0
6 Car 6862 7564
7 NonMotor 268 7564
8 Motor 7296 7564
< Total All Alternatives > 7564

If we are satisfied with the statistics we see above, we can go ahead and estimate the model.

m.set_cap(20) # improves optimization stability
result = m.maximize_loglike()

Iteration 038 [Optimization terminated successfully]

Best LL = -3493.0397299489387

value initvalue nullvalue minimum maximum holdfast note best
ASC_Bike -0.258018 0.0 0.0 -20.000 20.0 0 -0.258018
ASC_SR 1.422911 0.0 0.0 -20.000 20.0 0 1.422911
ASC_Transit 6.754511 0.0 0.0 -20.000 20.0 0 6.754511
ASC_Walk 8.621508 0.0 0.0 -20.000 20.0 0 8.621508
Cost -0.175693 0.0 0.0 -20.000 20.0 0 -0.175693
InVehTime -0.123722 0.0 0.0 -20.000 20.0 0 -0.123722
LogIncome:Bike -0.196983 0.0 0.0 -20.000 20.0 0 -0.196983
LogIncome:SR -0.193812 0.0 0.0 -20.000 20.0 0 -0.193812
LogIncome:Transit -0.557151 0.0 0.0 -20.000 20.0 0 -0.557151
LogIncome:Walk -0.522787 0.0 0.0 -20.000 20.0 0 -0.522787
Mu:Car 0.259301 1.0 1.0 0.001 1.0 0 0.259301
Mu:Motor 0.801653 1.0 1.0 0.001 1.0 0 0.801653
Mu:NonMotor 0.853783 1.0 1.0 0.001 1.0 0 0.853783
NonMotorTime -0.265584 0.0 0.0 -20.000 20.0 0 -0.265584
OutVehTime -0.254813 0.0 0.0 -20.000 20.0 0 -0.254813

After we find the best fitting parameters, we can compute some variance-covariance statistics, incuding standard errors of the estimates and t statistics, using calculate_parameter_covariance.

m.calculate_parameter_covariance()

Then we can review the results in a variety of report tables.

m.parameter_summary()
  Value Std Err t Stat Signif Null Value
ASC_Bike -0.258  1.34 -0.19 0.00
ASC_SR  1.42  1.00  1.42 0.00
ASC_Transit  6.75  2.06  3.27 ** 0.00
ASC_Walk  8.62  1.14  7.57 *** 0.00
Cost -0.176  0.120 -1.47 0.00
InVehTime -0.124  0.0292 -4.24 *** 0.00
LogIncome:Bike -0.197  0.124 -1.59 0.00
LogIncome:SR -0.194  0.135 -1.43 0.00
LogIncome:Transit -0.557  0.169 -3.29 *** 0.00
LogIncome:Walk -0.523  0.100 -5.21 *** 0.00
Mu:Car  0.259  0.181 -4.10 *** 1.00
Mu:Motor  0.802  0.201 -0.99 1.00
Mu:NonMotor  0.854  0.112 -1.30 1.00
NonMotorTime -0.266  0.0163 -16.29 *** 0.00
OutVehTime -0.255  0.0646 -3.95 *** 0.00
m.estimation_statistics()
StatisticAggregatePer Case
Number of Cases7564
Log Likelihood at Convergence-3493.04-0.46
Log Likelihood at Null Parameters-10644.66-1.41
Rho Squared w.r.t. Null Parameters0.672

Save and Report Model

report = lx.Reporter(title=m.title)
report.append('# Parameter Summary')
report.append(m.parameter_summary())
report

Parameter Summary

  Value Std Err t Stat Signif Null Value
ASC_Bike -0.258  1.34 -0.19 0.00
ASC_SR  1.42  1.00  1.42 0.00
ASC_Transit  6.75  2.06  3.27 ** 0.00
ASC_Walk  8.62  1.14  7.57 *** 0.00
Cost -0.176  0.120 -1.47 0.00
InVehTime -0.124  0.0292 -4.24 *** 0.00
LogIncome:Bike -0.197  0.124 -1.59 0.00
LogIncome:SR -0.194  0.135 -1.43 0.00
LogIncome:Transit -0.557  0.169 -3.29 *** 0.00
LogIncome:Walk -0.523  0.100 -5.21 *** 0.00
Mu:Car  0.259  0.181 -4.10 *** 1.00
Mu:Motor  0.802  0.201 -0.99 1.00
Mu:NonMotor  0.854  0.112 -1.30 1.00
NonMotorTime -0.266  0.0163 -16.29 *** 0.00
OutVehTime -0.255  0.0646 -3.95 *** 0.00
report << "# Estimation Statistics" << m.estimation_statistics()

Estimation Statistics

StatisticAggregatePer Case
Number of Cases7564
Log Likelihood at Convergence-3493.04-0.46
Log Likelihood at Null Parameters-10644.66-1.41
Rho Squared w.r.t. Null Parameters0.672
report << "# Utility Functions" << m.utility_functions()

Utility Functions

altformula
1
+ P.InVehTime * X.AUTO_TIME
+ P.Cost * X.AUTO_COST
2
+ P.ASC_SR
+ P.InVehTime * X.AUTO_TIME
+ P.Cost * 0.5 * X.AUTO_COST
+ P('LogIncome:SR') * X('log(INCOME)')
3
+ P.ASC_Walk
+ P.NonMotorTime * X.WALK_TIME
+ P('LogIncome:Walk') * X('log(INCOME)')
4
+ P.ASC_Bike
+ P.NonMotorTime * X.BIKE_TIME
+ P('LogIncome:Bike') * X('log(INCOME)')
5
+ P.ASC_Transit
+ P.InVehTime * X.TRANSIT_IVTT
+ P.OutVehTime * X.TRANSIT_OVTT
+ P.Cost * X.TRANSIT_FARE
+ P('LogIncome:Transit') * X('log(INCOME)')
report.save(
    'exampville_mode_choice.html',
    overwrite=True, 
    metadata=m,
)
'exampville_mode_choice.html'