17a: Market Segmentatation

17a: Market Segmentatation

For this example, we’re going to re-create the market segmentation versions of model 17 from the Self Instructing Manual. (pp. 133)

Market segmentation can be used to determine whether the impact of other variables is different among population groups. The most common approach to market segmentation is for the analyst to consider sample segments which are mutually exclusive and collectively exhaustive (that is, each case is included in one and only one segment). Models are estimated for the sample associated with each segment and compared to the pooled model (all segments represented by a single model) to determine if there are statistically significant and important differences among the market segments.

In these models, we will segment the market by automobile ownership for households that have one or fewer cars. It is appealing to include a distinct segment for households with no cars since the mode choice behavior of this segment is very different from the rest of the population due to their dependence on non-automobile modes. However, the size of this segment in the dataset is too small, so it is joined with the one car group.

import larch.numba as lx
d = lx.examples.MTC(format='dataset')

We can use the query_cases method to create two separate datasets for this work, and then attach these datasets to two models.

d1 = d.dc.query_cases("numveh <= 1")
d2 = d.dc.query_cases("numveh > 1")
m1 = lx.Model(d1, title="Cars<=1")
m2 = lx.Model(d2, title="Cars>=2")

We will then construct the same model stucture in each, using the design worked up for Model 17.

from larch import P, X

for m in (m1,m2):
    m.availability_var = 'avail'
    m.choice_ca_var = 'chose'
    m.utility_ca = (
        + X("totcost/hhinc") * P("costbyincome")
        + X("tottime * (altnum <= 4)") * P("motorized_time")
        + X("tottime * (altnum >= 5)") * P("nonmotorized_time")
        + X("ovtt/dist * (altnum <= 4)") * P("motorized_ovtbydist")
    )
    for a in [4,5,6]:
        m.utility_co[a] += X("hhinc") * P("hhinc#{}".format(a))
    for i in d['alt_names'][1:3]:
        name = str(i.values)
        a = int(i.altid)
        m.utility_co[a] += (
            + X("vehbywrk") * P("vehbywrk_SR")
            + X("wkccbd+wknccbd") * P("wkcbd_"+name)
            + X("wkempden") * P("wkempden_"+name)
            + P("ASC_"+name)
        )
    for i in d['alt_names'][3:]:
        name = str(i.values)
        a = int(i.altid)
        m.utility_co[a] += (
            + X("vehbywrk") * P("vehbywrk_"+name)
            + X("wkccbd+wknccbd") * P("wkcbd_"+name)
            + X("wkempden") * P("wkempden_"+name)
            + P("ASC_"+name)
        ) 
    m.ordering = (
        ('LOS', ".*cost.*", ".*time.*", ".*dist.*",),
        ('Zonal', "wkcbd.*", "wkempden.*",),
        ('Household', "hhinc.*", "vehbywrk.*",),
        ('ASCs', "ASC.*",),
    )

Independent Estimation

We can estimate these models completely independently.

r1 = m1.maximize_loglike()
r2 = m2.maximize_loglike()

To have t-statistics and \(\rho^2_{\oslash}\) values, we’ll also need to prepare those for each model.

for m in (m1,m2):
    m.calculate_parameter_covariance()
    m.loglike_null()

We can generate a side-by-side summary of the two models using the joint_parameter_summary function.

from larch.util.summary import joint_parameter_summary
joint_parameter_summary([m1,m2])
Cars<=1 Cars>=2
Param t-Stat Param t-Stat
Category Parameter
Parameters ASC_Bike 0.977 1.39 -3.22 -4.39
ASC_SR2 0.593 1.96 -1.98 -15.38
ASC_SR3+ -0.785 -2.22 -3.72 -20.03
ASC_Transit 2.26 5.08 -2.16 -5.63
ASC_Walk 2.91 5.17 -1.53 -2.67
costbyincome -0.0227 -1.64 -0.0981 -6.07
hhinc#4 -0.00645 -1.81 0.000363 0.14
hhinc#5 -0.0116 -1.23 -0.00192 -0.30
hhinc#6 -0.0120 -2.01 0.000670 0.16
motorized_ovtbydist -0.113 -4.36 -0.194 -5.88
motorized_time -0.0211 -3.49 -0.0187 -3.64
nonmotorized_time -0.0440 -5.43 -0.0450 -5.16
vehbywrk_Bike -2.66 -4.02 -0.190 -0.58
vehbywrk_SR -3.01 -8.64 -0.241 -3.27
vehbywrk_Transit -3.96 -10.55 -0.240 -1.77
vehbywrk_Walk -3.34 -7.52 -0.0976 -0.46
wkcbd_Bike 0.396 0.74 0.487 0.97
wkcbd_SR2 0.372 1.55 0.163 1.09
wkcbd_SR3+ 0.229 0.56 1.33 6.02
wkcbd_Transit 1.11 4.27 1.28 5.23
wkcbd_Walk 0.0297 0.08 0.111 0.29
wkempden_Bike 0.00151 0.80 0.00161 0.96
wkempden_SR2 0.00204 2.80 0.00107 2.22
wkempden_SR3+ 0.00353 3.89 0.00134 2.45
wkempden_Transit 0.00316 4.73 0.00288 6.42
wkempden_Walk 0.00379 3.91 -0.000893 -0.42
---- ---- ---- ---- ---- ----
Log Likelihood Converged -1,049.28 -2,296.67
Null -1,775.42 -5,534.18
Rho Squared vs Null 0.4090 0.5850

Joint Estimation

Suppose we think completely independent estimation is inappropriate, and that while these two market segments behave differently with respect to some attributes of the choice, they are also the same on some attributes. In that case, we’ll need to use joint estimation – that is, estimating all the parameters of both models simultaneously.

We’ll assume that any parameter that appears in both models with the same name is actually the same parameter (with the same value). As configured so far, that’s all the parameters. But we can edit one of the models so that the parameter names differ where we want them to, using reformat_param.

Suppose we want different parameters for all the ca level of service variables. We can change every parameter in the entire utility_ca linear function with a simple formatted replacement.

m2.utility_ca = m2.utility_ca.reformat_param('{}_2Cars')

To change just a selection of parameters, there’s a regular expression approach using different arguments to reformat_param. Here, we’ll replace all the parameters with ‘ASC’ in them to be a segment-specific value.

for a in m2.utility_co:
    m2.utility_co[a] = m2.utility_co[a].reformat_param(
        pattern='(ASC)', repl='ASC_2Cars'
    )

Then, we can put our models together into a ModelGroup for joint estimation.

mg = lx.ModelGroup([m1,m2])

The estimation interface for a ModelGroup is the same as that for a regular model: first maximize_loglike, then calculate_parameter_covariance if desired. We can also use many other Model commands, e.g. set_cap to bound our parameters to finite values.

mg.set_cap()
rg = mg.maximize_loglike()
mg.calculate_parameter_covariance()
rg
keyvalue
x
0
ASC_Bike -1.393
ASC_SR2 -1.582
ASC_SR3+ -3.261
ASC_Transit -0.747
ASC_Walk 0.314
costbyincome -0.017
hhinc#4 -0.001
hhinc#5 -0.004
hhinc#6 -0.002
motorized_ovtbydist -0.100
motorized_time -0.020
nonmotorized_time -0.044
vehbywrk_Bike -0.255
vehbywrk_SR -0.314
vehbywrk_Transit -0.638
vehbywrk_Walk -0.395
wkcbd_Bike 0.460
wkcbd_SR2 0.233
wkcbd_SR3+ 1.043
wkcbd_Transit 1.265
wkcbd_Walk 0.100
wkempden_Bike 0.001
wkempden_SR2 0.001
wkempden_SR3+ 0.002
wkempden_Transit 0.003
wkempden_Walk 0.002
ASC_2Cars_Bike -2.937
ASC_2Cars_SR2 -1.908
ASC_2Cars_SR3+ -3.528
ASC_2Cars_Transit -1.471
ASC_2Cars_Walk -1.005
costbyincome_2Cars -0.102
motorized_ovtbydist_2Cars -0.187
motorized_time_2Cars -0.020
nonmotorized_time_2Cars -0.047
loglike-3406.7232213013217
d_loglike
0
ASC_Bike 6.991e-04
ASC_SR2 6.265e-04
ASC_SR3+ -1.153e-03
ASC_Transit -4.816e-04
ASC_Walk 8.153e-04
costbyincome 2.108e-02
hhinc#4 3.037e-01
hhinc#5 -9.055e-02
hhinc#6 -2.147e-01
motorized_ovtbydist -6.585e-03
motorized_time 4.493e-02
nonmotorized_time 8.670e-02
vehbywrk_Bike -1.012e-03
vehbywrk_SR 4.290e-03
vehbywrk_Transit 7.795e-03
vehbywrk_Walk -3.664e-03
wkcbd_Bike 3.149e-04
wkcbd_SR2 2.240e-03
wkcbd_SR3+ -1.559e-03
wkcbd_Transit -2.616e-04
wkcbd_Walk 1.341e-03
wkempden_Bike 1.339e-01
wkempden_SR2 1.158e+00
wkempden_SR3+ -1.707e-01
wkempden_Transit -1.702e+00
wkempden_Walk 7.263e-01
ASC_2Cars_Bike -1.243e-03
ASC_2Cars_SR2 9.230e-04
ASC_2Cars_SR3+ 4.287e-03
ASC_2Cars_Transit 4.326e-03
ASC_2Cars_Walk -2.746e-03
costbyincome_2Cars -9.706e-03
motorized_ovtbydist_2Cars 3.633e-02
motorized_time_2Cars 1.871e-01
nonmotorized_time_2Cars -1.089e-01
nit51
nfev151
njev51
status0
message'Optimization terminated successfully'
successTrue
elapsed_time0:00:03.875481
method'slsqp'
n_cases5029
iteration_number0
logloss0.6774156335854686
__verbose_repr__True

To review the estimation results, we can use ordering, parameter_summary, and estimation_statistics as usual.

mg.ordering = (
    ('LOS', ".*cost.*", ".*time.*", ".*dist.*",),
    ('Zonal', "wkcbd.*", "wkempden.*",),
    ('Household', "hhinc.*", "vehbywrk.*",),
    ('ASCs', "ASC.*",),
)
mg.parameter_summary()
    Value Std Err t Stat Signif Null Value
Category Parameter          
LOS costbyincome -0.0174  0.0117 -1.49 0.00
costbyincome_2Cars -0.102  0.0154 -6.60 *** 0.00
motorized_time -0.0196  0.00567 -3.46 *** 0.00
motorized_time_2Cars -0.0195  0.00496 -3.94 *** 0.00
nonmotorized_time -0.0442  0.00776 -5.69 *** 0.00
nonmotorized_time_2Cars -0.0471  0.00868 -5.43 *** 0.00
motorized_ovtbydist -0.100  0.0237 -4.23 *** 0.00
motorized_ovtbydist_2Cars -0.187  0.0309 -6.05 *** 0.00
Zonal wkcbd_Bike  0.460  0.364  1.27 0.00
wkcbd_SR2  0.233  0.124  1.88 0.00
wkcbd_SR3+  1.04  0.192  5.42 *** 0.00
wkcbd_Transit  1.27  0.168  7.53 *** 0.00
wkcbd_Walk  0.100  0.250  0.40 0.00
wkempden_Bike  0.00137  0.00124  1.11 0.00
wkempden_SR2  0.00132  0.000392  3.36 *** 0.00
wkempden_SR3+  0.00184  0.000461  3.99 *** 0.00
wkempden_Transit  0.00274  0.000358  7.64 *** 0.00
wkempden_Walk  0.00248  0.000730  3.39 *** 0.00
Household hhinc#4 -0.00101  0.00208 -0.48 0.00
hhinc#5 -0.00366  0.00527 -0.70 0.00
hhinc#6 -0.00232  0.00331 -0.70 0.00
vehbywrk_Bike -0.255  0.290 -0.88 0.00
vehbywrk_SR -0.314  0.0740 -4.24 *** 0.00
vehbywrk_Transit -0.638  0.132 -4.82 *** 0.00
vehbywrk_Walk -0.395  0.198 -1.99 * 0.00
ASCs ASC_2Cars_Bike -2.94  0.629 -4.67 *** 0.00
ASC_2Cars_SR2 -1.91  0.127 -15.06 *** 0.00
ASC_2Cars_SR3+ -3.53  0.173 -20.43 *** 0.00
ASC_2Cars_Transit -1.47  0.320 -4.60 *** 0.00
ASC_2Cars_Walk -1.00  0.517 -1.94 0.00
ASC_Bike -1.39  0.439 -3.17 ** 0.00
ASC_SR2 -1.58  0.126 -12.57 *** 0.00
ASC_SR3+ -3.26  0.206 -15.84 *** 0.00
ASC_Transit -0.747  0.271 -2.76 ** 0.00
ASC_Walk  0.314  0.416  0.75 0.00
mg.estimation_statistics()
StatisticAggregatePer Case
Number of Cases5029
Log Likelihood at Convergence-3406.72-0.68
Log Likelihood at Null Parameters-7309.60-1.45
Rho Squared w.r.t. Null Parameters0.534