# MTC Work Mode Choice Data

In [None]:
import larch, pandas, os, gzip
larch.__version__

The MTC sample dataset is the same data used in the Self Instructing Manual for discrete choice modeling:

> The San Francisco Bay Area work mode choice data set comprises 5029 home-to-work commute trips in the
> San Francisco Bay Area. The data is drawn from the San Francisco Bay Area Household Travel Survey
> conducted by the Metropolitan Transportation Commission (MTC) in the spring and fall of 1990. This
> survey included a one day travel diary for each household member older than five years and detailed
> individual and household socio-demographic information.

In [None]:
from larch.data_warehouse import example_file

In [None]:
with gzip.open(example_file("MTCwork.csv.gz"), 'rt') as previewfile:
    # Print the first 10 raw lines; sep='' avoids a stray leading space on each line
    print(*(next(previewfile) for _ in range(10)), sep='')

The first line of the file contains column headers. After that, each line represents
an alternative available to a decision maker. In our sample data, we see the first 5
lines of data share a ``casenum`` of 1, indicating that they are 5 different alternatives
available to the first decision maker. The identity of the alternatives is given by the
number in the column ``altnum``. The observed choice of the decision maker is
indicated in the column ``chose`` with a 1 in the appropriate row.
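To make this layout concrete, here is a minimal standard-library sketch. The CSV text is made up for illustration (it is not rows from ``MTCwork.csv.gz``) and uses only the ``casenum``, ``altnum`` and ``chose`` columns. ``chosen_alternatives`` is a hypothetical helper, not part of larch. It groups rows by case, records which alternatives each case lists, and checks that exactly one row per case is marked as chosen.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the same layout: one row per (case, alternative).
# These values are illustrative only, not rows from MTCwork.csv.gz.
SAMPLE = """casenum,altnum,chose
1,1,1
1,2,0
1,3,0
2,1,0
2,2,0
2,4,1
"""

def chosen_alternatives(text):
    """Hypothetical helper: map each case id to the alternative it chose.

    Also returns the alternatives listed (i.e. available) for each case,
    and raises ValueError unless exactly one row per case has chose == 1.
    """
    available = defaultdict(list)
    chosen = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        case, alt = int(row["casenum"]), int(row["altnum"])
        available[case].append(alt)
        if float(row["chose"]) == 1:
            chosen[case].append(alt)
    result = {}
    for case in available:
        if len(chosen[case]) != 1:
            raise ValueError(f"case {case} has {len(chosen[case])} chosen rows")
        result[case] = chosen[case][0]
    return result, dict(available)

choices, avail = chosen_alternatives(SAMPLE)
print(choices)  # {1: 1, 2: 4}
print(avail)    # {1: [1, 2, 3], 2: [1, 2, 4]}
```

Because only available alternatives get a row, different cases can list different sets of codes, as cases 1 and 2 do here.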

We can load this data easily using pandas. We'll also set the index of the resulting DataFrame to
be the case and alternative identifiers (``casenum`` and ``altnum``).
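A rough sketch of what the two-level index makes convenient, using a small made-up table read from an in-memory string so no data file is needed. The ``cost`` column and all values are hypothetical. With ``casenum`` as the outer level, ``.loc`` on a single case id returns every alternative row for that case, and ``groupby(level=...)`` aggregates per case.

```python
import io
import pandas

# Made-up data in the same layout; `cost` is a hypothetical attribute column.
text = """casenum,altnum,chose,cost
1,1,1,2.5
1,2,0,1.2
1,4,0,0.8
2,1,0,3.0
2,4,1,0.9
"""

# Index on (case, alternative), as done with the real file.
df = pandas.read_csv(io.StringIO(text), index_col=["casenum", "altnum"])

# All alternatives listed for case 1; the outer level is dropped,
# leaving a frame indexed by altnum alone.
case1 = df.loc[1]
print(case1.index.tolist())  # [1, 2, 4]

# Number of alternatives listed per case.
n_alts = df.groupby(level="casenum").size()
print(n_alts.to_dict())  # {1: 3, 2: 2}

# The chosen alternative code for each case.
chosen = df[df["chose"] == 1].reset_index().set_index("casenum")["altnum"]
print(chosen.to_dict())  # {1: 1, 2: 4}
```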

In [None]:
df = pandas.read_csv(example_file("MTCwork.csv.gz"), index_col=['casenum','altnum'])

In [None]:
df.head(15)

In [None]:
df.info()

In [None]:
d = larch.DataFrames(df, ch='chose', crack=True)
d.info()

In [None]:
d.alternative_codes()

In [None]:
d.alternative_names()

The set of all possible alternative codes is deduced automatically from all the values
in the ``altnum`` column. However, alternative codes set this way are not very descriptive,
as the CSV data file does not have enough information to tell what each alternative
code number means.
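The deduction described above amounts to collecting the distinct values of the ``altnum`` index level. A rough pandas equivalent is sketched below on a made-up frame; it is an illustrative assumption, not how ``larch.DataFrames`` is implemented internally. It also shows why an external code-to-name mapping, like the one passed to ``set_alternative_names`` below, is needed: nothing in the numeric codes says which mode they stand for.

```python
import pandas

# Made-up (casenum, altnum) index; not real MTC rows.
index = pandas.MultiIndex.from_tuples(
    [(1, 1), (1, 2), (1, 4), (2, 1), (2, 3), (2, 6)],
    names=["casenum", "altnum"],
)
df = pandas.DataFrame({"chose": [1, 0, 0, 0, 0, 1]}, index=index)

# Every code that appears anywhere in the altnum level, sorted.
# Code 5 never appears here, so it is not deduced.
codes = sorted(df.index.get_level_values("altnum").unique())
print(codes)  # [1, 2, 3, 4, 6]

# The numbers alone carry no labels; names must come from outside the file.
# This mapping mirrors the one used with set_alternative_names.
names = {1: 'DA', 2: 'SR2', 3: 'SR3+', 4: 'Transit', 5: 'Bike', 6: 'Walk'}
missing = [c for c in codes if c not in names]
assert not missing, f"no name for codes {missing}"
print([names[c] for c in codes])  # ['DA', 'SR2', 'SR3+', 'Transit', 'Walk']
```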

In [None]:
d.set_alternative_names({
    1: 'DA',
    2: 'SR2',
    3: 'SR3+',
    4: 'Transit',
    5: 'Bike',
    6: 'Walk',
})

In [None]:
d.alternative_names()