200: Exampville Simulated Data

import pandas as pd
import geopandas as gpd
import larch
larch.__version__

'5.7.0'

Welcome to Exampville, the best simulated town in this here part of the internet!

Exampville is provided with Larch to illustrate the kind of data that a transportation planner might have available when building a travel model. This data is almost entirely fictional.

import larch.exampville

TAZ Shapefile

The shapefile that includes a map of the travel analysis zones in Exampville is stored in a zip file.

larch.exampville.files.shapefile

'../../larch/data_warehouse/exampville_taz.zip'

GeoPandas can open and read this data directly, without unzipping it on disk first, by prefixing the filename with the “zip://” protocol.

taz_shape = gpd.read_file("zip://"+larch.exampville.files.shapefile)

taz_shape.plot(edgecolor='k');

[Figure: map of the Exampville TAZ polygons, outlined in black (output of taz_shape.plot)]
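The “zip://” prefix is convenient because a shapefile is not a single file. It is a bundle of component files that share a base name: at minimum .shp (geometry), .shx (index) and .dbf (attributes), often with a .prj projection file as well. The sketch below builds a small in-memory archive with the standard-library zipfile module to show that structure. The member names are hypothetical and were not read from the actual exampville_taz.zip.

```python
import io
import zipfile

# Hypothetical member names, for illustration only; the real archive
# contents are not inspected here.
members = ["exampville_taz.shp", "exampville_taz.shx",
           "exampville_taz.dbf", "exampville_taz.prj"]

# Build a small zip archive in memory so the sketch runs without Larch.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in members:
        zf.writestr(name, b"")  # empty placeholder contents

# Reopen the archive and list its members and their extensions.
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
extensions = sorted(name.rsplit(".", 1)[1] for name in names)
print(extensions)  # ['dbf', 'prj', 'shp', 'shx']

# The string handed to GeoPandas is just the archive path with a prefix.
archive_path = "../../larch/data_warehouse/exampville_taz.zip"
zip_url = "zip://" + archive_path
print(zip_url)
```

Reading through the archive keeps the component files together and avoids leaving unzipped copies on disk.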

TAZ Employment Data

larch.exampville.files.employment

'../../larch/data_warehouse/exampville_employment.csv.gz'

emp = pd.read_csv(larch.exampville.files.employment, index_col='TAZ')

emp.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 40 entries, 1 to 40
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   NONRETAIL_EMP  40 non-null     int64
 1   RETAIL_EMP     40 non-null     int64
 2   TOTAL_EMP      40 non-null     int64
dtypes: int64(3)
memory usage: 1.2 KB

emp.head()

	NONRETAIL_EMP	RETAIL_EMP	TOTAL_EMP
TAZ
1	118	340	458
2	91	0	91
3	197	2	199
4	277	8	285
5	44	13	57
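In the five rows shown, TOTAL_EMP equals NONRETAIL_EMP plus RETAIL_EMP. The sketch below checks that identity on those rows, copied by hand from the output above, and derives a retail share per zone as an example of a variable a modeler might compute. It does not test the full 40-zone file.

```python
# Rows copied from emp.head() above:
# TAZ -> (NONRETAIL_EMP, RETAIL_EMP, TOTAL_EMP)
head_rows = {
    1: (118, 340, 458),
    2: (91, 0, 91),
    3: (197, 2, 199),
    4: (277, 8, 285),
    5: (44, 13, 57),
}

# Zones where total employment is NOT the sum of its two components.
mismatches = [taz for taz, (nonretail, retail, total) in head_rows.items()
              if nonretail + retail != total]
print(mismatches)  # []

# Retail share of total employment, per zone (no zero totals in these rows).
retail_share = {taz: retail / total
                for taz, (_, retail, total) in head_rows.items()}
print(round(retail_share[1], 3))  # 0.742
```

With the full table already loaded in pandas, a vectorized version of the same check would be `(emp.NONRETAIL_EMP + emp.RETAIL_EMP).eq(emp.TOTAL_EMP).all()`.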

Skims

The network skims for Exampville are stored in Open Matrix (OMX) format. Larch includes an OMX reader, which also provides a number of handy tools for converting OMX data into formats useful for discrete choice analysis.

larch.exampville.files.skims

'../../larch/data_warehouse/exampville_skims.omx'

skims = larch.OMX( larch.exampville.files.skims, mode='r' )
skims

<larch.OMX> ⋯/exampville_skims.omx
 |  shape:(40, 40)
 |  data:
 |    AUTO_COST    (float64)
 |    AUTO_DIST    (float64)
 |    AUTO_TIME    (float64)
 |    BIKE_TIME    (float64)
 |    TRANSIT_FARE (float64)
 |    TRANSIT_IVTT (float64)
 |    TRANSIT_OVTT (float64)
 |    WALK_DIST    (float64)
 |    WALK_TIME    (float64)
 |  lookup:
 |    TAZ_AREA_TYPE (40 |S3)
 |    TAZ_ID        (40 int64)

Households

larch.exampville.files.hh

'../../larch/data_warehouse/exampville_households.csv.gz'

hh = pd.read_csv( larch.exampville.files.hh )

hh.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   X            5000 non-null   float64
 1   Y            5000 non-null   float64
 2   INCOME       5000 non-null   float64
 3   N_VEHICLES   5000 non-null   int64  
 4   HHSIZE       5000 non-null   int64  
 5   geometry     5000 non-null   object 
 6   HOMETAZ      5000 non-null   int64  
 7   HHID         5000 non-null   int64  
 8   N_TRIPS      5000 non-null   int64  
 9   N_TRIPS_HBW  5000 non-null   int64  
 10  N_TRIPS_HBO  5000 non-null   int64  
 11  N_TRIPS_NHB  5000 non-null   int64  
 12  N_WORKERS    5000 non-null   int64  
dtypes: float64(3), int64(9), object(1)
memory usage: 507.9+ KB

hh.head()

	X	Y	INCOME	N_VEHICLES	HHSIZE	geometry	HOMETAZ	HHID	N_TRIPS	N_TRIPS_HBW	N_TRIPS_HBO	N_TRIPS_NHB	N_WORKERS
0	361.35	33.34	6026.0	1	4	POINT (361.35 33.34)	22	50000	15	6	8	1	3
1	272.08	30.77	30538.0	0	1	POINT (272.08 30.77)	22	50001	4	1	1	2	1
2	424.18	72.55	24131.0	1	1	POINT (424.18 72.55)	22	50002	6	0	4	2	0
3	464.48	61.30	34838.0	1	2	POINT (464.48 61.3)	22	50003	16	3	7	6	2
4	328.95	129.36	29337.0	0	1	POINT (328.95 129.36)	22	50004	0	0	0	0	0
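CSV has no geometry type, so pandas reads the geometry column as plain text (the object dtype in hh.info() above). The values are well-known text (WKT) points that repeat the X and Y columns. The sketch below parses 2-D WKT points with a regular expression and confirms that they agree with X and Y for the five rows shown. It is deliberately minimal and handles only POINT geometries.

```python
import re

# (X, Y, geometry) values copied from hh.head() above.
rows = [
    (361.35, 33.34, "POINT (361.35 33.34)"),
    (272.08, 30.77, "POINT (272.08 30.77)"),
    (424.18, 72.55, "POINT (424.18 72.55)"),
    (464.48, 61.30, "POINT (464.48 61.3)"),
    (328.95, 129.36, "POINT (328.95 129.36)"),
]

# Matches only 2-D WKT points such as "POINT (1.5 2)".
POINT_RE = re.compile(r"^POINT \(\s*([-+0-9.eE]+)\s+([-+0-9.eE]+)\s*\)$")

def parse_point(wkt):
    """Return (x, y) from a 2-D WKT POINT string."""
    match = POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2-D WKT POINT: {wkt!r}")
    return float(match.group(1)), float(match.group(2))

# Confirm the text geometry agrees with the numeric X and Y columns.
agree = all(parse_point(g) == (x, y) for x, y, g in rows)
print(agree)  # True
```

With GeoPandas available, the text column can instead be converted with `gpd.GeoSeries.from_wkt(hh["geometry"])`, or rebuilt from the numeric columns with `gpd.points_from_xy(hh.X, hh.Y)`.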

Persons

larch.exampville.files.person

'../../larch/data_warehouse/exampville_persons.csv.gz'

pp = pd.read_csv( larch.exampville.files.person )

pp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12349 entries, 0 to 12348
Data columns (total 12 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   PERSONID       12349 non-null  int64
 1   HHID           12349 non-null  int64
 2   HHIDX          12349 non-null  int64
 3   AGE            12349 non-null  int64
 4   WORKS          12349 non-null  int64
 5   N_WORK_TOURS   12349 non-null  int64
 6   N_OTHER_TOURS  12349 non-null  int64
 7   N_TOURS        12349 non-null  int64
 8   N_TRIPS        12349 non-null  int64
 9   N_TRIPS_HBW    12349 non-null  int64
 10  N_TRIPS_HBO    12349 non-null  int64
 11  N_TRIPS_NHB    12349 non-null  int64
dtypes: int64(12)
memory usage: 1.1 MB

pp.head()

	PERSONID	HHID	HHIDX	AGE	WORKS	N_WORK_TOURS	N_OTHER_TOURS	N_TOURS	N_TRIPS	N_TRIPS_HBW	N_TRIPS_HBO	N_TRIPS_NHB
0	60000	50000	0	33	1	1	0	1	2	2	0	0
1	60001	50000	0	27	1	1	1	2	4	2	2	0
2	60002	50000	0	31	1	1	1	2	5	2	2	1
3	60003	50000	0	81	0	0	2	2	4	0	4	0
4	60004	50001	1	39	1	1	0	1	4	1	1	2
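Persons link to households through HHID. In the rows shown, the person file is consistent with the household file in two ways. First, each person's tours and trips decompose by type (N_WORK_TOURS + N_OTHER_TOURS = N_TOURS, and N_TRIPS_HBW + N_TRIPS_HBO + N_TRIPS_NHB = N_TRIPS). Second, rolling persons up by HHID reproduces HHSIZE, N_WORKERS and N_TRIPS for households 50000 and 50001. The sketch below checks both properties using only the values printed above. It is a spot check, not a validation of the whole file.

```python
from collections import defaultdict

# Person rows copied from pp.head() above: (PERSONID, HHID, WORKS,
# N_WORK_TOURS, N_OTHER_TOURS, N_TOURS, N_TRIPS, HBW, HBO, NHB)
person_rows = [
    (60000, 50000, 1, 1, 0, 1, 2, 2, 0, 0),
    (60001, 50000, 1, 1, 1, 2, 4, 2, 2, 0),
    (60002, 50000, 1, 1, 1, 2, 5, 2, 2, 1),
    (60003, 50000, 0, 0, 2, 2, 4, 0, 4, 0),
    (60004, 50001, 1, 1, 0, 1, 4, 1, 1, 2),
]

# Household values copied from hh.head() above.
households = {
    50000: {"HHSIZE": 4, "N_WORKERS": 3, "N_TRIPS": 15},
    50001: {"HHSIZE": 1, "N_WORKERS": 1, "N_TRIPS": 4},
}

# Within each person, tours and trips decompose by type and purpose.
decomposes = all(
    work + other == tours and hbw + hbo + nhb == trips
    for _, _, _, work, other, tours, trips, hbw, hbo, nhb in person_rows
)

# Roll persons up to households and compare with the household file.
rollup = defaultdict(lambda: {"HHSIZE": 0, "N_WORKERS": 0, "N_TRIPS": 0})
for _, hhid, works, _, _, _, trips, _, _, _ in person_rows:
    rollup[hhid]["HHSIZE"] += 1
    rollup[hhid]["N_WORKERS"] += works
    rollup[hhid]["N_TRIPS"] += trips

print(decomposes, dict(rollup) == households)  # True True
```

On the full DataFrames, the equivalent rollup would be something like `pp.groupby("HHID")[["WORKS", "N_TRIPS"]].sum()`, compared against `hh.set_index("HHID")`.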

Tours

larch.exampville.files.tour

'../../larch/data_warehouse/exampville_tours.csv.gz'

tour = pd.read_csv( larch.exampville.files.tour )

tour.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20739 entries, 0 to 20738
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   TOURID       20739 non-null  int64
 1   HHID         20739 non-null  int64
 2   PERSONID     20739 non-null  int64
 3   DTAZ         20739 non-null  int64
 4   TOURMODE     20739 non-null  int64
 5   TOURPURP     20739 non-null  int64
 6   N_STOPS      20739 non-null  int64
 7   N_TRIPS      20739 non-null  int64
 8   N_TRIPS_HBW  20739 non-null  int64
 9   N_TRIPS_HBO  20739 non-null  int64
 10  N_TRIPS_NHB  20739 non-null  int64
dtypes: int64(11)
memory usage: 1.7 MB

tour.head()

	TOURID	HHID	PERSONID	DTAZ	TOURMODE	TOURPURP	N_STOPS	N_TRIPS	N_TRIPS_HBW	N_TRIPS_HBO	N_TRIPS_NHB
0	0	50000	60000	22	1	1	0	2	2	0	0
1	1	50000	60001	4	1	1	0	2	2	0	0
2	2	50000	60001	10	2	2	0	2	0	2	0
3	3	50000	60002	20	1	1	0	2	2	0	0
4	4	50000	60002	20	1	2	1	3	0	2	1
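A tour with no intermediate stops consists of two trips, one out and one back, and each stop adds one more trip. The rows shown follow that pattern (N_TRIPS = N_STOPS + 2). Grouping them by PERSONID also reproduces N_TOURS and the trip counts by purpose in the person rows for persons 60000 through 60002. The sketch below checks both properties using only the values printed in this notebook.

```python
from collections import Counter, defaultdict

# Tour rows copied from tour.head() above: (TOURID, PERSONID, N_STOPS,
# N_TRIPS, N_TRIPS_HBW, N_TRIPS_HBO, N_TRIPS_NHB)
tours = [
    (0, 60000, 0, 2, 2, 0, 0),
    (1, 60001, 0, 2, 2, 0, 0),
    (2, 60001, 0, 2, 0, 2, 0),
    (3, 60002, 0, 2, 2, 0, 0),
    (4, 60002, 1, 3, 0, 2, 1),
]

# Matching values from pp.head(): PERSONID -> (N_TOURS, N_TRIPS, HBW, HBO, NHB)
persons = {
    60000: (1, 2, 2, 0, 0),
    60001: (2, 4, 2, 2, 0),
    60002: (2, 5, 2, 2, 1),
}

# Two trips per tour, plus one per intermediate stop.
stop_rule_holds = all(n_trips == n_stops + 2
                      for _, _, n_stops, n_trips, *_ in tours)

# Roll tours up to persons: count tours and sum trips by purpose.
tour_counts = Counter(person for _, person, *_ in tours)
trip_sums = defaultdict(lambda: [0, 0, 0, 0])
for _, person, _, n_trips, hbw, hbo, nhb in tours:
    for i, value in enumerate((n_trips, hbw, hbo, nhb)):
        trip_sums[person][i] += value

rolled = {p: (tour_counts[p], *trip_sums[p]) for p in tour_counts}
print(stop_rule_holds, rolled == persons)  # True True
```

With pandas, `tour.groupby("PERSONID").size()` gives the same tour counts for the full file.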
