# Machine Learning

Larch is (mostly) compatible with the [scikit-learn](https://scikit-learn.org) structure for machine learning.
Within this structure, the `larch.Model` object can be used both as an `estimator`
and as a `predictor`.

Note that this page applies to the legacy interface for Larch.
Support for these features in the numba-based version is planned but not yet available.

## Using Larch within Scikit-Learn

In [None]:
# TEST
from pytest import approx
import numpy as np

In [None]:
import larch
import pandas as pd
from larch import PX, P, X

In [None]:
from larch.data_warehouse import example_file
df = pd.read_csv(example_file("MTCwork.csv.gz"))
df.set_index(['casenum','altnum'], inplace=True, drop=False)

To use the scikit-learn interface, we'll need to define our model
based exclusively on idca or idco format data.  We do so here,
although we don't need to actually connect the model to the data yet.

In [None]:
m = larch.Model()

m.utility_ca = (
    PX('tottime') 
    + PX('totcost') 
    + sum(P(f'ASC_{i}') * X(f'altnum=={i}') for i in [2,3,4,5,6])
    + sum(P(f'HHINC#{i}') * X(f'(altnum=={i})*hhinc') for i in [2,3,4,5,6])
)
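Each `X(...)` term above is an expression evaluated row by row against the idca data. The following plain-pandas example is an illustration only and does not show how Larch evaluates expressions internally. It computes what the `altnum==i` and `(altnum==i)*hhinc` terms amount to on a tiny made-up table, where `hhinc` is a case-level value repeated on every row of the case:

```python
import pandas as pd

# Tiny invented idca-style table: one row per (case, alternative).
# hhinc is a case-level (idco) value repeated on every row of its case.
df = pd.DataFrame({
    "casenum": [1, 1, 1, 2, 2, 2],
    "altnum":  [1, 2, 3, 1, 2, 3],
    "hhinc":   [42.5, 42.5, 42.5, 17.5, 17.5, 17.5],
}).set_index(["casenum", "altnum"], drop=False)

for i in [2, 3]:
    # ASC-style term: 1.0 on rows for alternative i, 0.0 elsewhere
    df[f"altnum=={i}"] = (df["altnum"] == i).astype(float)
    # income interaction: hhinc on rows for alternative i, 0.0 elsewhere
    df[f"(altnum=={i})*hhinc"] = (df["altnum"] == i) * df["hhinc"]

print(df)
```

Because the indicator is zero on every other row, each `HHINC#{i}` parameter affects only the utility of alternative `i`. This lets an income value that is constant within a case still have a different effect on each alternative.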

Because the `larch.Model` object is an estimator, it offers a `fit`
method that estimates the likelihood-maximizing parameters. This method
takes a plain pandas.DataFrame as the `X` input. A regular DataFrame does
not internally identify which column(s) contain the observed choice values,
so the choices must be passed explicitly in the method call:

In [None]:
m.fit(df, y=df.chose)
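A model like this one has only linear-in-parameters `utility_ca` terms and no nesting, so fitting it means maximizing a multinomial logit log likelihood. The standalone numpy sketch below shows that idea using Newton's method on synthetic data. It is not Larch's estimation code, and the `(cases, alternatives, features)` array layout and the function names are assumptions made for this illustration:

```python
import numpy as np

def mnl_probabilities(beta, X):
    """MNL choice probabilities; X has shape (cases, alternatives, features)."""
    u = X @ beta                              # utilities, shape (cases, alternatives)
    u = u - u.max(axis=1, keepdims=True)      # shift for numerical stability
    e = np.exp(u)
    return e / e.sum(axis=1, keepdims=True)

def mnl_loglike(beta, X, choice):
    """Sum over cases of the log probability of the chosen alternative."""
    p = mnl_probabilities(beta, X)
    return np.log(p[np.arange(len(choice)), choice]).sum()

def fit_mnl(X, choice, iterations=25):
    """Maximize the MNL log likelihood with Newton's method (it is concave in beta)."""
    rows = np.arange(len(choice))
    beta = np.zeros(X.shape[2])
    for _ in range(iterations):
        p = mnl_probabilities(beta, X)
        xbar = np.einsum("ij,ijk->ik", p, X)          # probability-weighted features per case
        grad = (X[rows, choice] - xbar).sum(axis=0)   # gradient of the log likelihood
        dev = X - xbar[:, None, :]
        hess = -np.einsum("ij,ijk,ijl->kl", p, dev, dev)
        beta = beta - np.linalg.solve(hess, grad)     # Newton step (hess is negative definite)
    return beta

# Synthetic data: 5000 cases, 3 alternatives, 2 attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3, 2))
true_beta = np.array([-1.0, 0.5])
p_true = mnl_probabilities(true_beta, X)
draws = rng.random(5000)
# Inverse-CDF sampling of one chosen alternative per case
choice = np.minimum((p_true.cumsum(axis=1) < draws[:, None]).sum(axis=1), 2)

beta_hat = fit_mnl(X, choice)
print(beta_hat)  # should land near true_beta
```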

In [None]:
# TEST
assert m.pvals == approx(np.array([
       -2.178014e+00, -3.725078e+00, -6.708610e-01, 
       -2.376328e+00, -2.067752e-01, -2.169938e-03,  
       3.577067e-04,  -5.286324e-03, -1.280798e-02, 
       -9.686303e-03, -4.920235e-03, -5.134209e-02]))

Unlike most scikit-learn estimators, the [fit](larch.Model.fit) method cannot
accept a numpy ndarray, because Larch needs the column names to match
the data to the pre-defined utility function. The `predict`, `predict_proba`,
and `score` methods can likewise be used with DataFrame inputs.

In [None]:
m.predict(df)

In [None]:
proba = m.predict_proba(df)
proba.head(10)
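For a multinomial logit, the probabilities come from exponentiating each alternative's utility and normalizing within each case. The pandas sketch below uses invented long-format utilities. Larch's own output layout may differ from the wide table built here:

```python
import numpy as np
import pandas as pd

# Invented utilities in long (idca) format: one row per (case, alternative).
idx = pd.MultiIndex.from_product([[1, 2], [1, 2, 3]], names=["casenum", "altnum"])
util = pd.Series([0.0, -1.0, 0.5, 1.2, 0.3, -0.4], index=idx, name="utility")

# Subtract each case's maximum before exponentiating, for numerical stability.
shifted = util - util.groupby(level="casenum").transform("max")
expu = np.exp(shifted)
# Normalize within each case so the probabilities of its alternatives sum to 1.
proba = expu / expu.groupby(level="casenum").transform("sum")

# One row per case, one column per alternative.
wide = proba.unstack("altnum")
print(wide.round(3))
```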

In [None]:
score = m.score(df, y=df.chose)
score

In [None]:
score * m.dataframes.n_cases
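Multiplying `score` by `m.dataframes.n_cases` recovers the model's total log likelihood, which indicates that `score` is the average log likelihood per case. The arithmetic below uses invented probabilities and choices, not Larch output:

```python
import numpy as np
import pandas as pd

# Invented probabilities: one row per case, one column per alternative.
proba = pd.DataFrame(
    [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]],
    index=pd.Index([1, 2, 3], name="casenum"), columns=[1, 2, 3],
)
# Observed choices in the same layout: 1 marks the chosen alternative.
chosen = pd.DataFrame(
    [[0, 1, 0], [1, 0, 0], [0, 0, 1]], index=proba.index, columns=proba.columns,
)

# Total log likelihood: sum of log probabilities of the chosen alternatives.
loglike_total = float((chosen * np.log(proba)).to_numpy().sum())
n_cases = len(proba)
score = loglike_total / n_cases   # average log likelihood per case
print(score, score * n_cases)
```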

In [None]:
# TEST
assert score * m.dataframes.n_cases == approx(-3626.1862555129305)

## Using Scikit-Learn within Larch

It is also possible to use machine learning methods in a chained model with Larch.
This can be implemented through a "prelearning" step, which builds a predictor
using some other machine learning method and then adds the result of that
prediction as an input to the discrete choice model.
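The prelearning idea can be sketched without XGBoost. In the example below, a small binary logistic regression stands in for the XGBoost learner (`fit_logistic` is a hypothetical helper, not part of Larch). It is trained on long-format rows to predict whether each alternative is chosen, and its output is appended as a `prelearned_utility` column. Storing the log of the predicted probability in that column is an assumption of this sketch, not necessarily what Larch's prelearner stores:

```python
import numpy as np
import pandas as pd

def fit_logistic(F, y, iterations=25):
    """Binary logistic regression by Newton's method (hypothetical stand-in learner)."""
    w = np.zeros(F.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-F @ w))
        grad = F.T @ (y - p)
        # Hessian of the log likelihood, with a tiny ridge term for stability
        hess = -(F * (p * (1 - p))[:, None]).T @ F - 1e-8 * np.eye(F.shape[1])
        w = w - np.linalg.solve(hess, grad)
    return w

# Synthetic long-format data: 400 cases x 3 alternatives.
rng = np.random.default_rng(1)
idx = pd.MultiIndex.from_product([range(400), range(1, 4)], names=["casenum", "altnum"])
df = pd.DataFrame({"totcost": rng.uniform(1, 10, len(idx)),
                   "tottime": rng.uniform(5, 60, len(idx))}, index=idx)
# Simulate exactly one choice per case from a random-utility model.
util = -0.2 * df["totcost"] - 0.05 * df["tottime"] + rng.gumbel(size=len(idx))
df["chose"] = (util == util.groupby(level="casenum").transform("max")).astype(float)

# Step 1: train the learner to predict "chosen" from row-level features.
features = np.column_stack([np.ones(len(df)), df["totcost"], df["tottime"]])
w = fit_logistic(features, df["chose"].to_numpy())

# Step 2: append the learner's output (log predicted probability) as a new column,
# which a choice model could then use like any other idca variable.
df["prelearned_utility"] = -np.log1p(np.exp(-features @ w))
print(df.head())
```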

**Use this power with great care!** Applying a prelearner can result in over-fitting,
spoil the interpretability of some or all of the model parameters, and create
other challenging problems. Achieving an amazingly good log likelihood is not
necessarily a sign that you have a good model.
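One source of over-fitting: if the learner is trained on the same choices that the discrete choice model is then estimated on, the prelearned column already "knows" those choices. A common mitigation is out-of-fold prediction. This technique is not claimed here to be something Larch's prelearner does. It splits cases into folds and fills each fold's column using a learner trained only on the other folds. The self-contained sketch below uses the same kind of hypothetical logistic-regression stand-in learner:

```python
import numpy as np
import pandas as pd

def fit_logistic(F, y, iterations=25):
    """Binary logistic regression by Newton's method (hypothetical stand-in learner)."""
    w = np.zeros(F.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-F @ w))
        grad = F.T @ (y - p)
        hess = -(F * (p * (1 - p))[:, None]).T @ F - 1e-8 * np.eye(F.shape[1])
        w = w - np.linalg.solve(hess, grad)
    return w

# Synthetic long-format data: 400 cases x 3 alternatives, one choice per case.
rng = np.random.default_rng(2)
idx = pd.MultiIndex.from_product([range(400), range(1, 4)], names=["casenum", "altnum"])
df = pd.DataFrame({"totcost": rng.uniform(1, 10, len(idx)),
                   "tottime": rng.uniform(5, 60, len(idx))}, index=idx)
util = -0.2 * df["totcost"] - 0.05 * df["tottime"] + rng.gumbel(size=len(idx))
df["chose"] = (util == util.groupby(level="casenum").transform("max")).astype(float)

features = np.column_stack([np.ones(len(df)), df["totcost"], df["tottime"]])
y = df["chose"].to_numpy()

# Assign folds by case, so all alternatives of a case land in the same fold.
k = 5
fold = df.index.get_level_values("casenum").to_numpy() % k
oof = np.empty(len(df))
for fold_id in range(k):
    train, held_out = fold != fold_id, fold == fold_id
    w = fit_logistic(features[train], y[train])  # never sees the held-out choices
    oof[held_out] = -np.log1p(np.exp(-features[held_out] @ w))
df["prelearned_utility"] = oof
```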

In [None]:
import larch.prelearning

In [None]:
dfs = larch.DataFrames(df.drop(columns=['casenum','altnum']), ch='chose', crack=True)
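As we understand it, `crack=True` asks `DataFrames` to split the single long-format table into two kinds of columns: idca columns, which vary across alternatives within a case, and idco columns, which do not. The plain-pandas sketch below applies that rule to an invented table. It is not Larch's implementation:

```python
import pandas as pd

# Invented long-format table: hhinc is constant within each case, totcost varies.
df = pd.DataFrame({
    "casenum": [1, 1, 2, 2],
    "altnum":  [1, 2, 1, 2],
    "totcost": [3.0, 5.5, 2.0, 4.0],
    "hhinc":   [42.5, 42.5, 17.5, 17.5],
}).set_index(["casenum", "altnum"])

# A column counts as idco-like if it takes a single value within every case.
varies = df.groupby(level="casenum").nunique().gt(1).any()
co_cols = [col for col, v in varies.items() if not v]
ca_cols = [col for col, v in varies.items() if v]

data_co = df[co_cols].groupby(level="casenum").first()  # one row per case
data_ca = df[ca_cols]                                   # one row per (case, alternative)
print(data_co, data_ca, sep="\n\n")
```

Under this rule, a column that happens to be constant within every case in a particular sample is treated as idco, even if it is conceptually alternative-specific.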

In [None]:
prelearned = larch.prelearning.XGBoostPrelearner(
    dfs,
    ca_columns=['totcost', 'tottime'],
    co_columns=['numveh', 'hhsize', 'hhinc', 'famtype', 'age'],
    eval_metric='logloss',
)

In [None]:
dfs1 = prelearned.apply(dfs)

In [None]:
m = larch.Model(dfs1)

m.utility_ca = (
    PX('tottime') 
    + PX('totcost') 
    + PX('prelearned_utility') 
)
m.utility_co[2] = P("ASC_SR2")  + P("hhinc#2") * X("hhinc")
m.utility_co[3] = P("ASC_SR3P") + P("hhinc#3") * X("hhinc")
m.utility_co[4] = P("ASC_TRAN") + P("hhinc#4") * X("hhinc")
m.utility_co[5] = P("ASC_BIKE") + P("hhinc#5") * X("hhinc")
m.utility_co[6] = P("ASC_WALK") + P("hhinc#6") * X("hhinc")
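The `utility_co` terms above attach the income effect to each alternative through idco data. The earlier model expressed the same effect with idca expressions such as `X('(altnum==2)*hhinc')`. For the income term, both forms give identical utilities, as this numpy check illustrates. The parameter values are arbitrary and hypothetical:

```python
import numpy as np

hhinc = np.array([42.5, 17.5, 80.0])      # idco: one income value per case
altnums = np.arange(1, 7)                 # alternatives 1..6; alternative 1 is the reference
beta = {2: -0.1, 3: 0.2, 4: -0.3, 5: 0.05, 6: -0.02}  # hypothetical income parameters

# idco style: utility_co[i] adds beta[i] * hhinc to alternative i only.
u_co = np.zeros((len(hhinc), len(altnums)))
for i, b in beta.items():
    u_co[:, i - 1] = b * hhinc

# idca style: sum over i of beta[i] * (altnum == i) * hhinc, per (case, alternative).
u_ca = sum(b * (altnums[None, :] == i) * hhinc[:, None] for i, b in beta.items())

print(np.allclose(u_co, u_ca))
```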

In [None]:
m.load_data()
m.loglike()

In [None]:
m.maximize_loglike()