17a: Market Segmentatation
Contents
17a: Market Segmentatation¶
For this example, we’re going to re-create the market segmentation versions of model 17 from the Self Instructing Manual. (pp. 133)
Market segmentation can be used to determine whether the impact of other variables is different among population groups. The most common approach to market segmentation is for the analyst to consider sample segments which are mutually exclusive and collectively exhaustive (that is, each case is included in one and only one segment). Models are estimated for the sample associated with each segment and compared to the pooled model (all segments represented by a single model) to determine if there are statistically significant and important differences among the market segments.
In these models, we will segment the market by automobile ownership for households that have one or fewer cars. It is appealing to include a distinct segment for households with no cars since the mode choice behavior of this segment is very different from the rest of the population due to their dependence on non-automobile modes. However, the size of this segment in the dataset is too small, so it is joined with the one car group.
import larch.numba as lx
d = lx.examples.MTC(format='dataset')
We can use the query_cases
method to create two separate datasets for this work,
and then attach these datasets to two models.
d1 = d.dc.query_cases("numveh <= 1")
d2 = d.dc.query_cases("numveh > 1")
m1 = lx.Model(d1, title="Cars<=1")
m2 = lx.Model(d2, title="Cars>=2")
We will then construct the same model stucture in each, using the design worked up for Model 17.
from larch import P, X
for m in (m1,m2):
m.availability_var = 'avail'
m.choice_ca_var = 'chose'
m.utility_ca = (
+ X("totcost/hhinc") * P("costbyincome")
+ X("tottime * (altnum <= 4)") * P("motorized_time")
+ X("tottime * (altnum >= 5)") * P("nonmotorized_time")
+ X("ovtt/dist * (altnum <= 4)") * P("motorized_ovtbydist")
)
for a in [4,5,6]:
m.utility_co[a] += X("hhinc") * P("hhinc#{}".format(a))
for i in d['alt_names'][1:3]:
name = str(i.values)
a = int(i.altid)
m.utility_co[a] += (
+ X("vehbywrk") * P("vehbywrk_SR")
+ X("wkccbd+wknccbd") * P("wkcbd_"+name)
+ X("wkempden") * P("wkempden_"+name)
+ P("ASC_"+name)
)
for i in d['alt_names'][3:]:
name = str(i.values)
a = int(i.altid)
m.utility_co[a] += (
+ X("vehbywrk") * P("vehbywrk_"+name)
+ X("wkccbd+wknccbd") * P("wkcbd_"+name)
+ X("wkempden") * P("wkempden_"+name)
+ P("ASC_"+name)
)
m.ordering = (
('LOS', ".*cost.*", ".*time.*", ".*dist.*",),
('Zonal', "wkcbd.*", "wkempden.*",),
('Household', "hhinc.*", "vehbywrk.*",),
('ASCs', "ASC.*",),
)
Independent Estimation¶
We can estimate these models completely independently.
r1 = m1.maximize_loglike()
r2 = m2.maximize_loglike()
To have t-statistics and \(\rho^2_{\oslash}\) values, we’ll also need to prepare those for each model.
for m in (m1,m2):
m.calculate_parameter_covariance()
m.loglike_null()
We can generate a side-by-side summary of the
two models using the joint_parameter_summary
function.
from larch.util.summary import joint_parameter_summary
joint_parameter_summary([m1,m2])
Cars<=1 | Cars>=2 | ||||
---|---|---|---|---|---|
Param | t-Stat | Param | t-Stat | ||
Category | Parameter | ||||
Parameters | ASC_Bike | 0.977 | 1.39 | -3.22 | -4.39 |
ASC_SR2 | 0.593 | 1.96 | -1.98 | -15.38 | |
ASC_SR3+ | -0.785 | -2.22 | -3.72 | -20.03 | |
ASC_Transit | 2.26 | 5.08 | -2.16 | -5.63 | |
ASC_Walk | 2.91 | 5.17 | -1.53 | -2.67 | |
costbyincome | -0.0227 | -1.64 | -0.0981 | -6.07 | |
hhinc#4 | -0.00645 | -1.81 | 0.000363 | 0.14 | |
hhinc#5 | -0.0116 | -1.23 | -0.00192 | -0.30 | |
hhinc#6 | -0.0120 | -2.01 | 0.000670 | 0.16 | |
motorized_ovtbydist | -0.113 | -4.36 | -0.194 | -5.88 | |
motorized_time | -0.0211 | -3.49 | -0.0187 | -3.64 | |
nonmotorized_time | -0.0440 | -5.43 | -0.0450 | -5.16 | |
vehbywrk_Bike | -2.66 | -4.02 | -0.190 | -0.58 | |
vehbywrk_SR | -3.01 | -8.64 | -0.241 | -3.27 | |
vehbywrk_Transit | -3.96 | -10.55 | -0.240 | -1.77 | |
vehbywrk_Walk | -3.34 | -7.52 | -0.0976 | -0.46 | |
wkcbd_Bike | 0.396 | 0.74 | 0.487 | 0.97 | |
wkcbd_SR2 | 0.372 | 1.55 | 0.163 | 1.09 | |
wkcbd_SR3+ | 0.229 | 0.56 | 1.33 | 6.02 | |
wkcbd_Transit | 1.11 | 4.27 | 1.28 | 5.23 | |
wkcbd_Walk | 0.0297 | 0.08 | 0.111 | 0.29 | |
wkempden_Bike | 0.00151 | 0.80 | 0.00161 | 0.96 | |
wkempden_SR2 | 0.00204 | 2.80 | 0.00107 | 2.22 | |
wkempden_SR3+ | 0.00353 | 3.89 | 0.00134 | 2.45 | |
wkempden_Transit | 0.00316 | 4.73 | 0.00288 | 6.42 | |
wkempden_Walk | 0.00379 | 3.91 | -0.000893 | -0.42 | |
---- | ---- | ---- | ---- | ---- | ---- |
Log Likelihood | Converged | -1,049.28 | -2,296.67 | ||
Null | -1,775.42 | -5,534.18 | |||
Rho Squared | vs Null | 0.4090 | 0.5850 |
Joint Estimation¶
Suppose we think completely independent estimation is inappropriate, and that while these two market segments behave differently with respect to some attributes of the choice, they are also the same on some attributes. In that case, we’ll need to use joint estimation – that is, estimating all the parameters of both models simultaneously.
We’ll assume that any parameter that appears in both models with
the same name is actually the same parameter (with the same value).
As configured so far, that’s all the parameters. But we can edit
one of the models so that the parameter names differ where we
want them to, using reformat_param
.
Suppose we want different parameters for all the ca
level of
service variables. We can change every parameter in the entire
utility_ca
linear function with a simple formatted replacement.
m2.utility_ca = m2.utility_ca.reformat_param('{}_2Cars')
To change just a selection of parameters, there’s a regular
expression approach using different arguments to reformat_param
.
Here, we’ll replace all the parameters with ‘ASC’ in them to
be a segment-specific value.
for a in m2.utility_co:
m2.utility_co[a] = m2.utility_co[a].reformat_param(
pattern='(ASC)', repl='ASC_2Cars'
)
Then, we can put our models together into a ModelGroup
for
joint estimation.
mg = lx.ModelGroup([m1,m2])
The estimation interface for a ModelGroup
is the
same as that for a regular model: first maximize_loglike
,
then calculate_parameter_covariance
if desired. We can also
use many other Model commands, e.g. set_cap
to bound our
parameters to finite values.
mg.set_cap()
rg = mg.maximize_loglike()
mg.calculate_parameter_covariance()
rg
key | value | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
loglike | -3406.7232213013217 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
d_loglike |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
nit | 51 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
nfev | 151 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
njev | 51 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
status | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
message | 'Optimization terminated successfully' | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
success | True | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
elapsed_time | 0:00:03.875481 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
method | 'slsqp' | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
n_cases | 5029 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
iteration_number | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
logloss | 0.6774156335854686 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
__verbose_repr__ | True |
To review the estimation results, we can use ordering
,
parameter_summary
, and estimation_statistics
as usual.
mg.ordering = (
('LOS', ".*cost.*", ".*time.*", ".*dist.*",),
('Zonal', "wkcbd.*", "wkempden.*",),
('Household', "hhinc.*", "vehbywrk.*",),
('ASCs', "ASC.*",),
)
mg.parameter_summary()
Value | Std Err | t Stat | Signif | Null Value | ||
---|---|---|---|---|---|---|
Category | Parameter | |||||
LOS | costbyincome | -0.0174 | 0.0117 | -1.49 | 0.00 | |
costbyincome_2Cars | -0.102 | 0.0154 | -6.60 | *** | 0.00 | |
motorized_time | -0.0196 | 0.00567 | -3.46 | *** | 0.00 | |
motorized_time_2Cars | -0.0195 | 0.00496 | -3.94 | *** | 0.00 | |
nonmotorized_time | -0.0442 | 0.00776 | -5.69 | *** | 0.00 | |
nonmotorized_time_2Cars | -0.0471 | 0.00868 | -5.43 | *** | 0.00 | |
motorized_ovtbydist | -0.100 | 0.0237 | -4.23 | *** | 0.00 | |
motorized_ovtbydist_2Cars | -0.187 | 0.0309 | -6.05 | *** | 0.00 | |
Zonal | wkcbd_Bike | 0.460 | 0.364 | 1.27 | 0.00 | |
wkcbd_SR2 | 0.233 | 0.124 | 1.88 | 0.00 | ||
wkcbd_SR3+ | 1.04 | 0.192 | 5.42 | *** | 0.00 | |
wkcbd_Transit | 1.27 | 0.168 | 7.53 | *** | 0.00 | |
wkcbd_Walk | 0.100 | 0.250 | 0.40 | 0.00 | ||
wkempden_Bike | 0.00137 | 0.00124 | 1.11 | 0.00 | ||
wkempden_SR2 | 0.00132 | 0.000392 | 3.36 | *** | 0.00 | |
wkempden_SR3+ | 0.00184 | 0.000461 | 3.99 | *** | 0.00 | |
wkempden_Transit | 0.00274 | 0.000358 | 7.64 | *** | 0.00 | |
wkempden_Walk | 0.00248 | 0.000730 | 3.39 | *** | 0.00 | |
Household | hhinc#4 | -0.00101 | 0.00208 | -0.48 | 0.00 | |
hhinc#5 | -0.00366 | 0.00527 | -0.70 | 0.00 | ||
hhinc#6 | -0.00232 | 0.00331 | -0.70 | 0.00 | ||
vehbywrk_Bike | -0.255 | 0.290 | -0.88 | 0.00 | ||
vehbywrk_SR | -0.314 | 0.0740 | -4.24 | *** | 0.00 | |
vehbywrk_Transit | -0.638 | 0.132 | -4.82 | *** | 0.00 | |
vehbywrk_Walk | -0.395 | 0.198 | -1.99 | * | 0.00 | |
ASCs | ASC_2Cars_Bike | -2.94 | 0.629 | -4.67 | *** | 0.00 |
ASC_2Cars_SR2 | -1.91 | 0.127 | -15.06 | *** | 0.00 | |
ASC_2Cars_SR3+ | -3.53 | 0.173 | -20.43 | *** | 0.00 | |
ASC_2Cars_Transit | -1.47 | 0.320 | -4.60 | *** | 0.00 | |
ASC_2Cars_Walk | -1.00 | 0.517 | -1.94 | 0.00 | ||
ASC_Bike | -1.39 | 0.439 | -3.17 | ** | 0.00 | |
ASC_SR2 | -1.58 | 0.126 | -12.57 | *** | 0.00 | |
ASC_SR3+ | -3.26 | 0.206 | -15.84 | *** | 0.00 | |
ASC_Transit | -0.747 | 0.271 | -2.76 | ** | 0.00 | |
ASC_Walk | 0.314 | 0.416 | 0.75 | 0.00 |
mg.estimation_statistics()
Statistic | Aggregate | Per Case |
---|---|---|
Number of Cases | 5029 | |
Log Likelihood at Convergence | -3406.72 | -0.68 |
Log Likelihood at Null Parameters | -7309.60 | -1.45 |
Rho Squared w.r.t. Null Parameters | 0.534 |