Data Fundamentals

Larch requires data to be structured in one of two basic formats: the case-only (“idco”) format or the case-alternative (“idca”) format. These are commonly referred to as IDCase (each record contains all the information for mode choice over alternatives for a single trip) and IDCase-IDAlt (each record contains all the information for a single alternative available to a decision maker, so there is one record for each alternative for each choice).

idco Format

In the idco case-only format, each record provides all the relevant information about an individual choice: the variables related to the decision maker or the choice itself, the alternative-related variables for all available alternatives, and a variable indicating which alternative was chosen.

Example of data in idco format
caseid  Income  Alt 1 Time  Alt 1 Cost  Alt 2 Time  Alt 2 Cost  Alt 3 Time  Alt 3 Cost  Chosen Alt
1       30,000  30          150         40          100         20          200         1
2       30,000  25          125         35          100         0           0           2
3       40,000  40          125         50          75          30          175         3
4       50,000  15          225         20          150         10          250         3

idca Format

In the idca case-alternative format, each record can include information on the variables related to the decision maker or the choice itself, the attributes of that particular alternative, and a choice variable that indicates whether the alternative was or was not chosen.

Example of data in idca format
caseid  Alt Number  Number Of Alts  Income  Time  Cost  Chosen
1       1           3               30,000  30    150   1
1       2           3               30,000  40    100   0
1       3           3               30,000  20    200   0
2       1           2               30,000  25    125   0
2       2           2               30,000  35    100   1
3       1           3               40,000  40    125   0
3       2           3               40,000  50    75    0
3       3           3               40,000  30    175   1
4       1           3               50,000  15    225   0
4       2           3               50,000  20    150   0
4       3           3               50,000  10    250   1
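
As a rough illustration, the idca example can be held in a pandas DataFrame indexed by case and alternative. This is only a sketch: the column names (caseid, altid, income, time, cost, chosen) are illustrative choices and not names that Larch requires. It also checks the two properties the table implies, namely that each case has exactly one chosen alternative and that the number of rows per case matches the "Number Of Alts" column.

```python
import pandas as pd

# One row per (case, alternative) pair, copied from the idca example.
# Column names are illustrative assumptions, not a Larch requirement.
rows = [
    # (caseid, altid, income, time, cost, chosen)
    (1, 1, 30000, 30, 150, 1),
    (1, 2, 30000, 40, 100, 0),
    (1, 3, 30000, 20, 200, 0),
    (2, 1, 30000, 25, 125, 0),
    (2, 2, 30000, 35, 100, 1),
    (3, 1, 40000, 40, 125, 0),
    (3, 2, 40000, 50, 75, 0),
    (3, 3, 40000, 30, 175, 1),
    (4, 1, 50000, 15, 225, 0),
    (4, 2, 50000, 20, 150, 0),
    (4, 3, 50000, 10, 250, 1),
]
idca = pd.DataFrame(
    rows, columns=["caseid", "altid", "income", "time", "cost", "chosen"]
).set_index(["caseid", "altid"])

# Each case should contain exactly one chosen alternative.
chosen_per_case = idca["chosen"].groupby(level="caseid").sum()

# Rows per case; this reproduces the "Number Of Alts" column: 3, 2, 3, 3.
n_alts = idca.groupby(level="caseid").size()
```

Note that case-level variables such as income are repeated on every row of the case in this layout.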

Internally, Larch uses two distinct sub-types for data in this format: a dense version labeled as idca and a sparse version labeled as idce. The dense idca version enforces a constraint that every case must have a row for every alternative, even when the alternative is not available and has no meaningful data. In this case, the data storage is no more space efficient than idco format, but the utility function can be expressed in a simpler fashion. The sparse idce version allows such unavailable alternatives to be omitted entirely, but it must maintain a separate but associated lookup table to find the first alternative in each case efficiently.
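
The difference between the two sub-types can be sketched with pandas. This is a conceptual illustration only, not Larch's internal implementation: the offsets list is a hypothetical stand-in for the lookup table described above, the available column is an illustrative name, and the sketch assumes rows are sorted by case.

```python
import pandas as pd

# Sparse layout: only available alternatives have rows (case 2 lacks alt 3).
sparse = pd.DataFrame(
    {
        "caseid": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4],
        "altid": [1, 2, 3, 1, 2, 1, 2, 3, 1, 2, 3],
        "time": [30, 40, 20, 25, 35, 40, 50, 30, 15, 20, 10],
    }
)

# Hypothetical lookup table: the row position where each case starts,
# plus a final end marker. Assumes rows are sorted by caseid.
case_sizes = sparse.groupby("caseid").size()
offsets = [0] + case_sizes.cumsum().tolist()  # [0, 3, 5, 8, 11]

# The rows of the second case are found without scanning the whole table.
case2 = sparse.iloc[offsets[1]:offsets[2]]

# Dense layout: every case gets a row for every alternative. Rows for
# unavailable alternatives hold no meaningful data (NaN here).
full_index = pd.MultiIndex.from_product(
    [[1, 2, 3, 4], [1, 2, 3]], names=["caseid", "altid"]
)
dense = sparse.set_index(["caseid", "altid"]).reindex(full_index)
dense["available"] = dense["time"].notna()
```

The dense frame has 12 rows against 11 in the sparse one; the gap grows with the share of unavailable alternatives, which is where the sparse layout saves space.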

In the abstract, these two data formats are completely interchangeable: any data that you can express in idco format, you can also express in idca format, and vice versa. Choosing one format over the other is generally a matter of convenience.
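
The interchangeability can be demonstrated with a round trip in pandas. The sketch below is one possible way to do the conversion, not a Larch routine; the column naming scheme (alt1_time, alt1_cost, and so on) is an assumption made for the example. The resulting idca table is dense, so the unavailable third alternative of case 2 is kept as a placeholder row with zero time and cost.

```python
import pandas as pd

# idco example data; column names are illustrative assumptions.
idco = pd.DataFrame(
    {
        "income": [30000, 30000, 40000, 50000],
        "alt1_time": [30, 25, 40, 15],
        "alt1_cost": [150, 125, 125, 225],
        "alt2_time": [40, 35, 50, 20],
        "alt2_cost": [100, 100, 75, 150],
        "alt3_time": [20, 0, 30, 10],
        "alt3_cost": [200, 0, 175, 250],
        "chosen_alt": [1, 2, 3, 3],
    },
    index=pd.Index([1, 2, 3, 4], name="caseid"),
)

# idco -> idca: build one block of rows per alternative, then stack them.
pieces = []
for alt in (1, 2, 3):
    pieces.append(
        pd.DataFrame(
            {
                "caseid": idco.index.to_numpy(),
                "altid": alt,
                "income": idco["income"].to_numpy(),
                "time": idco[f"alt{alt}_time"].to_numpy(),
                "cost": idco[f"alt{alt}_cost"].to_numpy(),
                "chosen": (idco["chosen_alt"] == alt).astype(int).to_numpy(),
            }
        )
    )
idca = pd.concat(pieces).set_index(["caseid", "altid"]).sort_index()

# idca -> idco: pivot the alternative-specific columns back out to one
# column per (variable, alternative) pair, then restore case-level data.
wide = idca[["time", "cost"]].unstack("altid")
wide.columns = [f"alt{alt}_{name}" for name, alt in wide.columns]
wide["income"] = idca["income"].groupby(level="caseid").first()
wide["chosen_alt"] = idca.loc[idca["chosen"] == 1].reset_index("altid")["altid"]

# Same columns in the same order as the original idco table.
round_trip = wide[list(idco.columns)]
```

After the round trip, round_trip holds the same values as idco, which is the sense in which neither format carries more information than the other.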

Unlike most other tools for discrete choice analysis, Larch does not demand that you employ one or the other of these data formats. You can use either, or both simultaneously, to express utility function data.