Using Larch within Scikit-Learn

Larch is (mostly) compatible with the scikit-learn structure for machine learning. Within this structure, the larch.Model object can be used as both an estimator and a predictor.

[1]:
import larch
import pandas

from larch import PX, P, X
[2]:
from larch.data_warehouse import example_file
df = pandas.read_csv(example_file("MTCwork.csv.gz"))
df.set_index(['casenum','altnum'], inplace=True, drop=False)
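The data here is arranged in a case-alternative format: one row per (casenum, altnum) pair, with drop=False keeping the identifier columns available as regular data columns for use in utility expressions. A tiny synthetic sketch of the same layout (the values here are hypothetical, not from the MTC data):

```python
import pandas as pd

# Synthetic case-alternative data: one row per (casenum, altnum) pair.
toy = pd.DataFrame({
    'casenum': [1, 1, 2, 2],
    'altnum':  [1, 2, 1, 2],
    'tottime': [10.0, 15.0, 5.0, 30.0],
    'chose':   [1, 0, 0, 1],
})
# drop=False keeps casenum and altnum as data columns as well as index levels,
# so expressions like X('altnum==2') can still reference them by name.
toy = toy.set_index(['casenum', 'altnum'], drop=False)
```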
[3]:
m = larch.Model()

m.utility_ca = (
    PX('tottime')
    + PX('totcost')
    + sum(P(f'ASC_{i}') * X(f'altnum=={i}') for i in [2,3,4,5,6])
    + sum(P(f'HHINC#{i}') * X(f'(altnum=={i})*hhinc') for i in [2,3,4,5,6])
)

Because the larch.Model object is an estimator, it offers a fit method to estimate the fitted (likelihood-maximizing) parameters. This method takes a plain pandas.DataFrame as the X input. Because this is a regular DataFrame, the data does not internally identify which column(s) contain the observed choice values, so that data must be explicitly identified in the method call:

[4]:
m.fit(df, y=df.chose)

Iteration 010 [Converged]

LL = -3626.186255512929

             value  initvalue  nullvalue  minimum  maximum  holdfast  note       best
ASC_2    -2.178014        0.0        0.0     -inf      inf         0       -2.178014
ASC_3    -3.725078        0.0        0.0     -inf      inf         0       -3.725078
ASC_4    -0.670861        0.0        0.0     -inf      inf         0       -0.670861
ASC_5    -2.376328        0.0        0.0     -inf      inf         0       -2.376328
ASC_6    -0.206775        0.0        0.0     -inf      inf         0       -0.206775
HHINC#2  -0.002170        0.0        0.0     -inf      inf         0       -0.002170
HHINC#3   0.000358        0.0        0.0     -inf      inf         0        0.000358
HHINC#4  -0.005286        0.0        0.0     -inf      inf         0       -0.005286
HHINC#5  -0.012808        0.0        0.0     -inf      inf         0       -0.012808
HHINC#6  -0.009686        0.0        0.0     -inf      inf         0       -0.009686
totcost  -0.004920        0.0        0.0     -inf      inf         0       -0.004920
tottime  -0.051342        0.0        0.0     -inf      inf         0       -0.051342
[4]:
<larch.Model (MNL)>

Unlike most scikit-learn estimators, the fit method cannot accept a numpy ndarray, because Larch needs the column names to match the data up to the pre-defined utility function.

[5]:
proba = m.predict_proba(df)
proba.head(10)
[5]:
   altnum
0  1         0.817458
   2         0.077710
   3         0.017906
   4         0.071428
   5         0.015497
1  1         0.336928
   2         0.074339
   3         0.052072
   4         0.498117
   5         0.038545
dtype: float64
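Since predict_proba returns choice probabilities, the values within each case should sum to one. Checking that against the first case shown in the output above:

```python
import pandas as pd

# Probabilities for case 0, copied from the predict_proba output above.
idx = pd.MultiIndex.from_product(
    [[0], [1, 2, 3, 4, 5]], names=['casenum', 'altnum']
)
proba = pd.Series(
    [0.817458, 0.077710, 0.017906, 0.071428, 0.015497], index=idx
)

# Sum across alternatives within each case; should equal 1 up to rounding.
case_totals = proba.groupby('casenum').sum()
```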
[6]:
score = m.score(df, y=df.chose)
score
[6]:
-0.7210551313408093
[7]:
score * m.dataframes.n_cases
[7]:
-3626.18625551293
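The score method returns the mean log-likelihood per case, so multiplying by the number of cases recovers the total log-likelihood reported during estimation. Verifying with the numbers above (the case count of 5029 is implied by the ratio of the two values, and matches the MTC dataset):

```python
score = -0.7210551313408093   # mean log-likelihood per case, from score() above
n_cases = 5029                # cases in the MTC dataset

# Scaling the per-case score by the case count recovers the total
# log-likelihood reported at convergence of fit().
total_ll = score * n_cases
```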