203: Exampville Destination Choice

Welcome to Exampville, the best simulated town in this here part of the internet!

Exampville is a demonstration provided with Larch that walks through some of the data and tools that a transportation planner might use when building a travel model.

import larch.numba as lx
from larch import P, X

In this example notebook, we will walk through the estimation of a tour destination choice model. First, let’s load the data files from our example.

hh, pp, tour, skims, emp = lx.example(200, ['hh', 'pp', 'tour', 'skims', 'emp'])

For this destination choice model, we’ll want to use the mode choice logsums we calculated previously from the mode choice estimation, but we’ll use these values as fixed input data instead of a modeled value. We can load these logsums from the file in which they were saved. For this example, we can indentify that file using the larch.example function, which will automatically rebuild the file if it doesn’t exists. In typical applications, a user would generally just give the filename as a string and ensure manually that the file exists before loading it.

logsums_file = lx.example(202, output_file='/tmp/logsums.zarr.zip')
logsums = lx.DataArray.from_zarr('/tmp/logsums.zarr.zip')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [4], in <module>
----> 1 logsums_file = lx.example(202, output_file='/tmp/logsums.zarr.zip')
      2 logsums = lx.DataArray.from_zarr('/tmp/logsums.zarr.zip')

File ~/work/larch/larch/larch/larch/numba/__init__.py:15, in example(*args, **kwargs)
     13 import importlib
     14 kwargs['larch'] = importlib.import_module(__name__)
---> 15 return _example(*args, **kwargs)

File ~/work/larch/larch/larch/larch/examples/__init__.py:163, in example(n, extract, echo, output_file, larch, legacy)
    161 if os.path.exists(output_file):
    162 	return output_file
--> 163 _exec_example_n(n, extract=None, echo=echo, larch=larch, legacy=legacy)
    164 if os.path.exists(output_file):
    165 	return output_file

File ~/work/larch/larch/larch/larch/examples/__init__.py:135, in _exec_example_n(n, echo, larch, legacy, *arg, **kwarg)
    133 def _exec_example_n(n, *arg, echo=False, larch=None, legacy=False, **kwarg):
    134 	if not legacy and n in ex:
--> 135 		return ex[n](*arg, **kwarg)
    136 	f = os.path.join(exampledocdir, get_examplefile(n))
    137 	return _exec_example(f, *arg, echo=echo, larch=larch, **kwarg)

File ~/work/larch/larch/larch/larch/examples/generated/_202_exville_mc_logsums.py:11, in example(extract, estimate)
      7 import larch.numba as lx
      9 hh, pp, tour, skims = lx.example(200, ['hh', 'pp', 'tour', 'skims'])
---> 11 exampville_mode_choice_file = lx.example(201, output_file='/tmp/exampville_mode_choice.html')
     12 m = lx.read_metadata(exampville_mode_choice_file)
     14 Mode = Dict(
     15     DA = 1,
     16     SR = 2,
   (...)
     19     Transit = 5,
     20 ).freeze()

File ~/work/larch/larch/larch/larch/numba/__init__.py:15, in example(*args, **kwargs)
     13 import importlib
     14 kwargs['larch'] = importlib.import_module(__name__)
---> 15 return _example(*args, **kwargs)

File ~/work/larch/larch/larch/larch/examples/__init__.py:163, in example(n, extract, echo, output_file, larch, legacy)
    161 if os.path.exists(output_file):
    162 	return output_file
--> 163 _exec_example_n(n, extract=None, echo=echo, larch=larch, legacy=legacy)
    164 if os.path.exists(output_file):
    165 	return output_file

File ~/work/larch/larch/larch/larch/examples/__init__.py:135, in _exec_example_n(n, echo, larch, legacy, *arg, **kwarg)
    133 def _exec_example_n(n, *arg, echo=False, larch=None, legacy=False, **kwarg):
    134 	if not legacy and n in ex:
--> 135 		return ex[n](*arg, **kwarg)
    136 	f = os.path.join(exampledocdir, get_examplefile(n))
    137 	return _exec_example(f, *arg, echo=echo, larch=larch, **kwarg)

File ~/work/larch/larch/larch/larch/examples/generated/_201_exville_mode_choice.py:44, in example(extract, estimate)
     41 m = lx.Model(datatree = dt_work)
     42 m.title = "Exampville Work Tour Mode Choice v1"
---> 44 m.utility_co[Mode.DA] = (
     45         + P.InVehTime * X.AUTO_TIME
     46         + P.Cost * X.AUTO_COST # dollars per mile
     47 )
     49 m.utility_co[Mode.SR] = (
     50         + P.ASC_SR
     51         + P.InVehTime * X.AUTO_TIME
     52         + P.Cost * (X.AUTO_COST * 0.5) # dollars per mile, half share
     53         + P("LogIncome:SR") * X("log(INCOME)")
     54 )
     56 m.utility_co[Mode.Walk] = (
     57         + P.ASC_Walk
     58         + P.NonMotorTime * X.WALK_TIME
     59         + P("LogIncome:Walk") * X("log(INCOME)")
     60 )

AttributeError: 'NoneType' object has no attribute 'DA'

Preprocessing

The alternatives in the destinations model are much more regular than in the mode choice model, as every observation will have a similar set of alternatives and the utility function for each of those alternatives will share a common functional form. We’ll leverage this by using idca format arrays in our DataTree to make data management simpler.

The base array we’ll start with will have two dimensions, cases and alternatives, anc be formed from the logsums we loaded above.

ca = lx.Dataset(
    {'logsum': logsums},
    caseid='TOURID',
    alts=skims.TAZ_ID,
)
ca
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [5], in <module>
      1 ca = lx.Dataset(
----> 2     {'logsum': logsums},
      3     caseid='TOURID',
      4     alts=skims.TAZ_ID,
      5 )
      6 ca

NameError: name 'logsums' is not defined

For our destination choice model, we’ll also want to use employment data. This data, as included in our example, has unique values only by alternative and not by caseid, so there are only 40 unique rows. (This kind of structure is common for destination choice models.)

emp.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 40 entries, 1 to 40
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   NONRETAIL_EMP  40 non-null     int64
 1   RETAIL_EMP     40 non-null     int64
 2   TOTAL_EMP      40 non-null     int64
dtypes: int64(3)
memory usage: 1.2 KB

Then we bundle all our raw data into a DataTree structure, which is used to collect the right data for estimation. The Larch DataTree is a slightly augmented version of the regular sharrow.DataTree.

tree = lx.DataTree(
    base=ca,
    tour=tour.rename_axis(index='TOUR_ID'),
    hh=hh.set_index("HHID"),
    person=pp.set_index('PERSONID'),
    emp=emp,
    skims=lx.Dataset.from_omx(skims),
    relationships=(
        "base.TAZ_ID @ emp.TAZ",
        "base.TOURID -> tour.TOUR_ID",
        "tour.HHID @ hh.HHID",
        "tour.PERSONID @ person.PERSONID",
        "hh.HOMETAZ @ skims.otaz",
        "base.TAZ_ID @ skims.dtaz"
    ),
)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [7], in <module>
      1 tree = lx.DataTree(
----> 2     base=ca,
      3     tour=tour.rename_axis(index='TOUR_ID'),
      4     hh=hh.set_index("HHID"),
      5     person=pp.set_index('PERSONID'),
      6     emp=emp,
      7     skims=lx.Dataset.from_omx(skims),
      8     relationships=(
      9         "base.TAZ_ID @ emp.TAZ",
     10         "base.TOURID -> tour.TOUR_ID",
     11         "tour.HHID @ hh.HHID",
     12         "tour.PERSONID @ person.PERSONID",
     13         "hh.HOMETAZ @ skims.otaz",
     14         "base.TAZ_ID @ skims.dtaz"
     15     ),
     16 )

NameError: name 'ca' is not defined

Model Definition

Now we can define our choice model, using data from the tree as appropriate.

m = lx.Model(datatree=tree)
m.title = "Exampville Work Tour Destination Choice v1"
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [8], in <module>
----> 1 m = lx.Model(datatree=tree)
      2 m.title = "Exampville Work Tour Destination Choice v1"

NameError: name 'tree' is not defined
m.quantity_ca = (
        + P.EmpRetail_HighInc * X('RETAIL_EMP * (INCOME>50000)')
        + P.EmpNonRetail_HighInc * X('NONRETAIL_EMP') * X("INCOME>50000")
        + P.EmpRetail_LowInc * X('RETAIL_EMP') * X("INCOME<=50000")
        + P.EmpNonRetail_LowInc * X('NONRETAIL_EMP') * X("INCOME<=50000")
)

m.quantity_scale = P.Theta
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [9], in <module>
----> 1 m.quantity_ca = (
      2         + P.EmpRetail_HighInc * X('RETAIL_EMP * (INCOME>50000)')
      3         + P.EmpNonRetail_HighInc * X('NONRETAIL_EMP') * X("INCOME>50000")
      4         + P.EmpRetail_LowInc * X('RETAIL_EMP') * X("INCOME<=50000")
      5         + P.EmpNonRetail_LowInc * X('NONRETAIL_EMP') * X("INCOME<=50000")
      6 )
      8 m.quantity_scale = P.Theta

NameError: name 'm' is not defined
m.utility_ca = (
    + P.logsum * X.logsum
    + P.distance * X.AUTO_DIST
)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [10], in <module>
----> 1 m.utility_ca = (
      2     + P.logsum * X.logsum
      3     + P.distance * X.AUTO_DIST
      4 )

NameError: name 'm' is not defined
m.choice_co_code = "tour.DTAZ"
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [11], in <module>
----> 1 m.choice_co_code = "tour.DTAZ"

NameError: name 'm' is not defined
m.lock_values(
    EmpRetail_HighInc=0,
    EmpRetail_LowInc=0,
)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [12], in <module>
----> 1 m.lock_values(
      2     EmpRetail_HighInc=0,
      3     EmpRetail_LowInc=0,
      4 )

NameError: name 'm' is not defined

Model Estimation

m.loglike()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [13], in <module>
----> 1 m.loglike()

NameError: name 'm' is not defined
m.maximize_loglike()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [15], in <module>
----> 1 m.maximize_loglike()

NameError: name 'm' is not defined
m.calculate_parameter_covariance()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [17], in <module>
----> 1 m.calculate_parameter_covariance()

NameError: name 'm' is not defined

Model Visualization

For destination choice and similar type models, it might be beneficial to review the observed and modeled choices, and the relative distribution of these choices across different factors. For example, we would probably want to see the distribution of travel distance. The Model object includes a built-in method to create this kind of visualization.

m.distribution_on_idca_variable('AUTO_DIST')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [18], in <module>
----> 1 m.distribution_on_idca_variable('AUTO_DIST')

NameError: name 'm' is not defined

The distribution_on_idca_variable has a variety of options, for example to control the number and range of the histogram bins:

m.distribution_on_idca_variable('AUTO_DIST', bins=40, range=(0,10))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [19], in <module>
----> 1 m.distribution_on_idca_variable('AUTO_DIST', bins=40, range=(0,10))

NameError: name 'm' is not defined

Alternatively, the histogram style can be swapped out for a smoothed kernel density function:

m.distribution_on_idca_variable(
    'AUTO_DIST',
    style='kde',
)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [20], in <module>
----> 1 m.distribution_on_idca_variable(
      2     'AUTO_DIST',
      3     style='kde',
      4 )

NameError: name 'm' is not defined

Subsets of the observations can be pulled out, to observe the distribution conditional on other idco factors, like income.

m.distribution_on_idca_variable(
    'AUTO_DIST',
    xlabel="Distance (miles)",
    bins=26,
    subselector='INCOME<10000',
    range=(0,13),
    header='Destination Distance, Very Low Income (<$10k) Households',
)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [21], in <module>
----> 1 m.distribution_on_idca_variable(
      2     'AUTO_DIST',
      3     xlabel="Distance (miles)",
      4     bins=26,
      5     subselector='INCOME<10000',
      6     range=(0,13),
      7     header='Destination Distance, Very Low Income (<$10k) Households',
      8 )

NameError: name 'm' is not defined

Save and Report Model

report = lx.Reporter(title=m.title)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [22], in <module>
----> 1 report = lx.Reporter(title=m.title)

NameError: name 'm' is not defined
report << '# Parameter Summary' << m.parameter_summary()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [23], in <module>
----> 1 report << '# Parameter Summary' << m.parameter_summary()

NameError: name 'report' is not defined
report << "# Estimation Statistics" << m.estimation_statistics()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [25], in <module>
----> 1 report << "# Estimation Statistics" << m.estimation_statistics()

NameError: name 'report' is not defined
report << "# Utility Functions" << m.utility_functions()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [26], in <module>
----> 1 report << "# Utility Functions" << m.utility_functions()

NameError: name 'report' is not defined

The figures shown above can also be inserted directly into reports.

figure = m.distribution_on_idca_variable(
    'AUTO_DIST', 
    xlabel="Distance (miles)",
    style='kde',
    header='Destination Distance',
)
report << "# Visualization"
report << figure
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [27], in <module>
----> 1 figure = m.distribution_on_idca_variable(
      2     'AUTO_DIST', 
      3     xlabel="Distance (miles)",
      4     style='kde',
      5     header='Destination Distance',
      6 )
      7 report << "# Visualization"
      8 report << figure

NameError: name 'm' is not defined
report.save(
    '/tmp/exampville_dest_choice.html',
    overwrite=True,
    metadata=m,
)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [28], in <module>
----> 1 report.save(
      2     '/tmp/exampville_dest_choice.html',
      3     overwrite=True,
      4     metadata=m,
      5 )

NameError: name 'report' is not defined