predictive analytics for transportation in a high dimensional heterogeneous data world

37
Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World Dr. Chandra Bhat Center for Transportation Research The University of Texas at Austin Acknowledgments: D-STOP, Humboldt Award, Dr. Ram Pendyala, Dr. Kostas Goulias, all my graduate/undergraduate students

Upload: center-for-transportation-research-ut-austin

Post on 21-Feb-2017

281 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Predictive Analytics for Transportation in a High Dimensional

Heterogeneous Data World

Dr. Chandra BhatCenter for Transportation Research

The University of Texas at Austin

Acknowledgments: D-STOP, Humboldt Award, Dr. Ram Pendyala, Dr. Kostas Goulias, all my graduate/undergraduate students

Page 2: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

World of high dimensional heterogeneous data

• Providing accurate traffic information is becoming an imperative

• Cameras, GPS, cell phone tracking, and probe vehicles are used to supplement the information provided by conventional measurement systems.

• Methodologies to combine and aggregate high dimensional heterogeneous data are needed

Page 3: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Connected/Automated Vehicles (CAVs) and big data

• The car of the near future will be a part of a gigantic data-collection engine.

• Vehicles have embedded computers, GPS receivers, short-range wireless network interfaces, in-car sensors, cameras, and internet.

• Vehicles interact with Roadside wireless sensor networks, passenger’s wireless devices, and other cars.

Page 4: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Data required to keep a CAV safely on the road

• Highly detailed maps information: Shape and elevation of roadways, lane lines, intersections, crosswalks, speed limits, and traffic signals.

• Position, speed and intentions of other vehicles and pedestrians.

• Position, speed and intentions of other road users, such as… jaywalking pedestrians, cars coming out of hidden driveways, a stop sign held up by a crossing guard, and cyclist making signs of riding intent.

Page 5: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

What can be inferred from CAVs and smartphones data

• Where people drive,

• when people drive,

• what route people take,

• where people stop,

• what people put in their car,

• why, how and when people take decisions on the fly and change their activity plan route, and

• detailed crashes data (speed, position, and intention at the moment of the accident).

Page 6: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Data Science

• Not enough humans to process

• Machine learning, visualization, and advanced computation techniques

• Statistics, social sciences, and domain knowledge

• High-dimensional heterogeneous data

Page 7: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

DOMAIN KNOWLEDGE IN THE CONTEXT OF TRAVEL DEMAND

MODELING

Page 8: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Exogenous Variable Vector X

Model

• Conceptual/Theoretical/ Methods/Tools and Techniques

• Specification and Definition of Alternatives

Activity-based ParadigmTrip-based Paradigm

Outputs

Page 9: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Five Pillars of ABM Design

• Based on sound behavioral theory/paradigm

• Computationally feasible and tractable

– Model estimation

– Model implementation

• Optimal use of available data (present and future)

• ABM should be both an Activity-Based Model and an Agent-Based Model

• Sensitive to policy issues and planning applications of interest

Page 10: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Behavioral Basis of ABM

• Decision hierarchies and choice processes– A variety of behavioral decision structures possible– Virtually all models assume a sequential decision structure

similar to traditional four-step models for computational convenience

• Considerable evidence of simultaneity in behavioral choice mechanisms– Several choices made simultaneously as a lifestyle package

Page 11: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Behavioral Basis of ABM

• Examples of simultaneous choice packages – Residential location, vehicle ownership, mode to pre-

planned activities (e.g., work)– Activity type, activity duration, and activity timing

(scheduling)

• Behavioral heterogeneity– Differences in choice processes across market segments– Identify market segments both exogenously and

endogenously (latent market segments)

Page 12: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Time-Space Interactions

Home Work

Activity 1 (Fixed)

Activity 2 (Fixed)Ti

me

Urban Space

1

vHome Activity

A

Activity atLocation A

Activity 1

Activity 2

Page 13: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Agent Interactions

I have a client meeting today; so I will take the

car

I have to pick up Jane from School and go shopping later; I need the

car.

My meeting is in the morning. I can pick up

Jane from school today. And we can go shopping together in the evening. OK, that sounds

good. I’ll go ahead and take

light rail today to work. See you

later.

Hey, Mom and Dad, don’t forget; you have to drop

me off at Johnny’s house in the

evening today

Don’t worry Jane; we’ll drop you off on the way to the store and pick you up later. Run along now,

you’ll miss the bus.

Page 14: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Definition of an Activity

• Disaggregate activity purpose definition– Challenge traditional notion of mandatory and discretionary

activities/trips– Movie, ball game, and child’s tennis lesson or soccer game often

have spatial and/or temporal fixity– Characterize activities and trips by level of spatial and temporal

fixity/constraints (besides purpose)– Can be accomplished using concepts of time-space

geography– Automated method to add attributes describing degrees of

freedom according to set of spatial/temporal fixity criteria to activity records in data set

Page 15: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Central Role of Time Use• Notion of time is central to activity-based modeling

– Explicit modeling of activity durations (daily activity time allocation and individual episode duration)

– Treat time as “continuous” and not as “discrete choice” blocks

• Activity engagement is the focus of attention– Travel patterns are inferred as an outcome of activity participation and

time use decisions– Continuous treatment of time dimension allows explicit consideration of

time constraints on human activities

• Reconcile activity durations with network travel durations (feedback

processes)

Page 16: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

In Summary

• ABM should…– Capture the central role of activities, time, and space

in a continuum – Explicitly recognize constraints and interactions– Represent simultaneity in behavioral choice processes– Account for heterogeneity in behavioral decision

hierarchies – Incorporate feedback processes to facilitate

integration with land use and network models

• SimAGENT does it all and more…

Page 17: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

SIMAGENT (SIMULATOR OF ACTIVITIES, GREENHOUSE GAS EMISSIONS, ENERGY,

NETWORKS, AND TRAVEL):AN OVERVIEW

Page 18: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

SimAGENT

Activity-travel environment

characteristics (base year)

Detailed individual-level socio-

demographics (base year)

Activity-travel

simulator (CEMDAP)

Individual activity-travel

patterns

Link volumes and

speeds

Dynamic Traffic

Assignment (DTA)

Socio-economics,

land-use and transportation

system characteristics

simulator (CEMSELTS)

Socio-demographics and activity-

travel environmentSimAGEN

T

Aggregate socio-

demographics (base year)

Synthetic populatio

n generator (PopGen)

Base Year Inputs

Forecast Year Outputs and GHG Emissions

Prediction

Page 19: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

CEMDAPA COMPREHENSIVE

ECONOMETRIC MICROSIMULATOR OF DAILY ACTIVITY-TRAVEL

PATTERNS

Page 20: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

CEMDAP – The Core ABM in SimAGENT

Socio-Economic Data

PopGen

CEMSELTS

CEMDAP

• Simulates activity schedule and travel characteristics for each individual of the region

• Core module of SimAGENT

• 52 sub-models.

• Developed by UT Austin

Page 21: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Features of CEMDAP (continued)

• Changes in the activity-travel pattern of one individual in a household may bring about changes in activity-travel patterns of other household members

• MDCEV approach facilitates modeling activity participation at a household level with joint activity participation incorporated in a simple fashion– MDCEV – Multiple Discrete Continuous Extreme Value

econometric choice modeling method

• Includes a model of household vehicle ownership by type and make/model, and primary driver assignment

Page 22: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

INNOVATION: COMPREHENSIVE REPRESENTATION

OF INTRA-HOUSEHOLD INTERACTIONS

Page 23: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Joint Activities and Household Interactions MDCEV Model

• Most activity based models accommodate activity type choice as a series of models for each individual in the household

• These approaches do not explicitly recognize that activity participation is a collective decision of household members

• MDCEV approach – simple and relatively inexpensive for modeling activity participation at a household level

• SimAGENT now features MDCEV modeling methodology to capture household-level activity participation

Page 24: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Joint Activities and Interactions MDCEV Model

• Conventional discrete choice frameworks need to generate mutually exclusive alternatives results in an explosion in the number of alternatives

• MDCEV allows us to tackle the problem by considering activity participation as a household decision

• MDCEV offers substantial computational and behavioral advantages– Employ one model to generate activities

– Accommodate substitution/complementarity in activity participation and household member dimensions

Page 25: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

MDCEV ModelP1 P2 P1 P2

None None None

A1 None None

A2 None None

A1 A2 None None

P1 P2 P1 P2

None None A1

A1 None A1

A2 None A1

A1 A2 None A1

P1 P2 P1 P2

None None A2

A1 None A2

A2 None A2

A1 A2 None A2

P1 P2 P1 P2

None None A1 A2

A1 None A1 A2

A2 None A1 A2

A1 A2 None A1 A2

P1 P2 P1 P2

None A1 None

A1 A1 None

A2 A1 None

A1 A2 A1 None

P1 P2 P1 P2

None A1 A1

A1 A1 A1

A2 A1 A1

A1 A2 A1 A1

P1 P2 P1 P2

None A1 A2

A1 A1 A2

A2 A1 A2

A1 A2 A1 A2

P1 P2 P1 P2

None A1 A1 A2

A1 A1 A1 A2

A2 A1 A1 A2

A1 A2 A1 A1 A2

P1 P2 P1 P2

None A2 None

A1 A2 None

A2 A2 None

A1 A2 A2 None

P1 P2 P1 P2

None A2 A1

A1 A2 A1

A2 A2 A1

A1 A2 A2 A1

P1 P2 P1 P2

None A2 A2

A1 A2 A2

A2 A2 A2

A1 A2 A2 A2

P1 P2 P1 P2

None A2 A1 A2

A1 A2 A1 A2

A2 A2 A1 A2

A1 A2 A2 A1 A2

P1 P2 P1 P2

None A1 A2 None

A1 A1 A2 None

A2 A1 A2 None

A1 A2 A1 A2 None

P1 P2 P1 P2

None A1 A2 A1

A1 A1 A2 A1

A2 A1 A2 A1

A1 A2 A1 A2 A1

P1 P2 P1 P2

None A1 A2 A2

A1 A1 A2 A2

A2 A1 A2 A2

A1 A2 A1 A2 A2

P1 P2 P1 P2

None A1 A2 A1 A2

A1 A1 A2 A1 A2

A2 A1 A2 A1 A2

A1 A2 A1 A2 A1 A2

Each box represents an alternative

Page 26: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

MDCEV Model

A1 P1 A1 P2 A1 P1P2

A2 P1 A2 P2 A2 P1P2Each box represents an alternative

None+

Alternatives - Total 7 alternatives versus 64 in traditional case

Page 27: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

INNOVATION: HOUSEHOLD VEHICLE COMPOSITION

AND DRIVER ASSIGNMENT

Page 28: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Vehicle Type Choice Simulation Component

• Vehicle type choice determines vehicle fleet mix; critical to energy and emissions analysis

• SimAGENT incorporates joint vehicle type choice and primary driver allocation model which jointly determines:– Multiple vehicle holdings

– Body type (Sub-compact, Compact car, Mid-sized car, Large car, Small SUV, Mid-sized SUV, Large SUV, Van, and Pickup)

– Age (Less than 2 years old, 2 to 3 years old, 4 to 5 years old, 6 to 9 years old, 10 to 12 years old, Older than 12 years)

– Make/model and use (miles)

– Primary driver of each vehicle

Page 29: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Vehicle Holdings and Use

Vehicle Type/Vintage

33 makes/models

21 makes/models

24 makes/models

25 makes/models

7 makes/models

10 makes/models

23 makes/models

19 makes/models

16 makes/models

12 makes/models

13 makes/models

13 makes/models

23 makes/models

15 makes/models

12 makes/models

23 makes/models

12 makes/models

5 makes/models

6 makes/models

15 makes/models

Coupe Old

Sedan Mid-size New

Sedan Mid-size Old

Sedan Compact Old

Sedan Mini/Subcompact New

Sedan Mini/Subcompact Old

Coupe New

Sedan Compact New

Sedan Large Old

Sedan Large New

Minivan Old

Pickup Truck New

SUV New

SUV Old

Hatchback/Station Wagon New

Hatchback/Station Wagon Old

Pickup Truck Old

Van New

Van Old

Minivan New

Non-motorized vehicles

Page 30: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

COMPUTATIONAL TECHNIQUES AND INTEGRATION POTENTIAL

Page 31: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Portable & Flexible Software Architecture

ODBC

Run-Time Data ObjectsHousehol

dPersonZone Data

LOS Data

Pattern

TourStop

Output Files

Simulation Coordinator

Modeling Modules

…...

Decision to Work Model

Work Start/End Time model

InputDatabas

e

Application Driver

Data Queries

Zone to Zone

Data Coordinator

Page 32: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Ability to Integrate and Enhance

• Successfully interfaced with

– Multi-period static assignment (the current four-step approach of SCAG)

– TRANSIMS and MATSim (second by second assignment of people and vehicles on networks), and

• Continuous-time evolutionary framework facilitates real-time dynamic integration of ABM and DTA models

Page 33: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

• SimAGENT is successfully implemented in the LA region

• Existing SimAGENT code (CEMDAP, PopGen, CEMSELTS) is open source

• Being implemented currently in the New York region; selected based on behavioral realism and ability to accommodate CAVs

• Elements of system being used for long distance travel modeling by CDOT; UT-Austin working with CDOT

Page 34: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

HIGH DIMENSIONAL HETEROGENEOUS DATA

Page 35: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Why joint modeling of data is important?• Borrows information on other outcomes

• Able to answer intrinsically multivariate questions, such as the effect of a covariate on a multidimensional outcome

• Obviates the need for multiple tests and facilitates global tests, offering superior testing power and better control of Type 1 error rates

• If some endogenous outcomes are used to explain other endogenous outcomes, and if the outcomes are not modeled jointly, the result can be inconsistent estimation of the effects of one endogenous outcome on another.

• Problem? Mixed data, high-dimensional data

Page 36: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

A Way-Out• The new Spatial Generalized Heterogeneous Data Model (GHDM);

Bhat (2015)

• Correlation across various dimensions (of the dependent variables) are captured using latent constructs.

• Accommodates all possible types of data (dependent variables).

• Dimension of integration is independent of number of latent constructs.

• Bhat’s Maximum Approximate Composite Marginal Likelihood (MACML) estimation approach is used for estimation of GHDM.

Page 37: Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World

Conceptual diagram of structural relationships in the empirical model