Post on 06-Apr-2017

@L_Tron_CTA: A Friendly Bot with an Eye on Chicago’s ‘L’

Ben Carls

The Chicago Transit Authority (CTA) operates the ‘L’ (elevated)

• Overwhelming amount of data exists for describing the system

• CTA Twitter account is still operated by a person in a control room

• Could we do better?

A Twitter bot sends out timely information

Pulls data from sources

Analyzes data, finds what’s important

Creates sentence and posts it to Twitter

Famous examples

Wealth of structured data exists for the ‘L’ amongst other things in Chicago

Okay! Okay! Most of this is irrelevant! How do I quickly find out what actually matters?

What kinds of events impact train travel and are worth mentioning? Chicago Cubs’ games?

Daily ridership for Addison Stop (Red), right where the Chicago Cubs play

A random forest model of ridership showed that baseball matters, so the bot tweets about it

Trained on 2011-2013, tested on 2014-2015

Here the features were day of the week and day of the year

Here the features were day of the week, day of the year, and whether there was a Cubs game that day
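The feature sets named on these slides can be sketched as plain rows, one per calendar day. This is a minimal illustration, not the author's code: the function names, the `cubs_home_dates` set, and the sample date are hypothetical placeholders, and only the feature list and the 2011-2013 / 2014-2015 split come from the slides.

```python
from datetime import date


def ridership_features(day, cubs_home_dates):
    """Return [day of week, day of year, Cubs game flag] for one date."""
    return [
        day.weekday(),                # 0 = Monday ... 6 = Sunday
        day.timetuple().tm_yday,      # 1 ... 366
        int(day in cubs_home_dates),  # 1 if a game is listed for that day
    ]


def split_for(day):
    """Assign a date to the train or test period described on the slide."""
    if 2011 <= day.year <= 2013:
        return "train"
    if 2014 <= day.year <= 2015:
        return "test"
    return None


# Placeholder game list for illustration only, not a real schedule.
cubs_home_dates = {date(2014, 4, 4)}

row = ridership_features(date(2014, 4, 4), cubs_home_dates)
print(row, split_for(date(2014, 4, 4)))
```

Rows like these, paired with the daily ridership counts for the Addison stop, could then be handed to a random forest implementation such as scikit-learn's `RandomForestRegressor`; which library the talk actually used is not stated.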

‘L’ Tron works 24/7 on an EC2 instance

Thread 1: Every 5 minutes

Query CTA server for data via API

Compare data to timetable and look for delays

Search for other events (e.g. baseball), compare to ridership model

Thread 2: Someone talks to ‘L’ Tron

Find what the person wants

Look for data from Thread 1 to respond with
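One way the two-thread layout could look is sketched here with Python's standard `threading` module. Everything named in it (`SharedState`, `poll_forever`, `answer_mention`, `fake_fetch`, the reply wording) is an assumption made for illustration; the real bot's code and the CTA API call are not shown on the slides.

```python
import threading


class SharedState:
    """Latest snapshot from the polling thread, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshot = None

    def update(self, snapshot):
        with self._lock:
            self._snapshot = snapshot

    def latest(self):
        with self._lock:
            return self._snapshot


def poll_forever(fetch, state, stop_event, interval_seconds=300):
    """Thread 1: store fetch() results every interval until asked to stop."""
    while True:
        state.update(fetch())
        # wait() returns True as soon as stop_event is set, ending the loop.
        if stop_event.wait(interval_seconds):
            break


def answer_mention(text, state):
    """Thread 2: reply using whatever Thread 1 stored most recently."""
    snapshot = state.latest()
    if snapshot is None:
        return "No data yet, please try again shortly."
    route = next((r for r in snapshot if r.lower() in text.lower()), None)
    if route is None:
        return "Which line are you asking about?"
    return f"{route} line delay: {snapshot[route]} min"


def fake_fetch():
    # Hypothetical stand-in for the CTA API query: delay minutes per route.
    return {"Blue": 12, "Red": 0}


state = SharedState()
stop = threading.Event()
stop.set()  # demo only: let the poller run a single cycle and then exit
poller = threading.Thread(target=poll_forever, args=(fake_fetch, state, stop))
poller.start()
poller.join()
print(answer_mention("How is the Blue line?", state))
```

In a long-running deployment the stop event would stay unset so the poller keeps refreshing every 300 seconds, while the reply handler only ever reads the shared snapshot.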

What should I tweet to my audience?

Find a line delay > 5 minutes? Yes: Tweet it out! No: check for a baseball game.

Is there a baseball game? Yes: Tweet it out! No: check the system as a whole.

Does the system look okay? Yes or No: Tweet it out!
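Read as a chain of checks, the flowchart might be expressed as the small function below. The ordering follows the slide as far as it can be recovered from the transcript, and the function name and the returned labels are hypothetical.

```python
def choose_tweet_topic(max_delay_minutes, cubs_game_today, system_ok):
    """Pick what to tweet about, checking the most urgent condition first."""
    if max_delay_minutes > 5:
        return "delay"         # a line is more than 5 minutes behind
    if cubs_game_today:
        return "baseball"      # an event likely to affect ridership
    if system_ok:
        return "all_clear"     # nothing notable, report normal service
    return "system_issue"      # something looks off without a clear delay


print(choose_tweet_topic(12, False, True))
print(choose_tweet_topic(0, True, True))
```

Every branch ends in a tweet, which matches the four "Tweet it out!" boxes on the slide.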

Following from Thread 1:

Language generation starts with a large, human-written corpus

"[route_name] line trains on their way toward [destination] are running roughly [delay_minutes] [minute_s] late."

"[route_name] line trains on their way toward [destination] have fallen roughly [delay_minutes] [minute_s] behind schedule."

"[route_name] line trains on their way to [destination] are running roughly [delay_minutes] [minute_s] late."

"[destination] headed [route_name] line trains have fallen roughly [delay_minutes] [minute_s] behind schedule."

"[destination] bound [route_name] line trains are running roughly [delay_minutes] [minute_s] behind schedule."

"[destination] bound [route_name] line trains have fallen roughly [delay_minutes] [minute_s] behind schedule."

Each tweet template is categorized for a particular use case

A template is chosen at random and filled in as needed

"[destination] bound [route_name] line trains are running about [delay_minutes] [minute_s] behind schedule."

"O’Hare bound Blue line trains are running about 12 minutes behind schedule."

If a delay of 12 minutes is found on the O’Hare bound Blue line, those details are inserted into the template
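The template step can be sketched in a few lines. This is an illustration under assumptions: the function names are hypothetical, only three of the templates are copied in, and `[minute_s]` is interpreted as a singular/plural switch ("minute" versus "minutes"), which the slides imply but do not spell out.

```python
import random
import re

# A few of the delay templates shown on the slides.
DELAY_TEMPLATES = [
    "[route_name] line trains on their way toward [destination] are "
    "running roughly [delay_minutes] [minute_s] late.",
    "[destination] headed [route_name] line trains have fallen roughly "
    "[delay_minutes] [minute_s] behind schedule.",
    "[destination] bound [route_name] line trains are running roughly "
    "[delay_minutes] [minute_s] behind schedule.",
]


def fill_template(template, values):
    """Replace each [placeholder] with the matching entry in values."""
    return re.sub(r"\[(\w+)\]", lambda m: str(values[m.group(1)]), template)


def delay_tweet(route_name, destination, delay_minutes, rng=random):
    """Choose a delay template at random and fill in the details."""
    values = {
        "route_name": route_name,
        "destination": destination,
        "delay_minutes": delay_minutes,
        # Assumed meaning of [minute_s]: pluralize unless the delay is 1.
        "minute_s": "minute" if delay_minutes == 1 else "minutes",
    }
    return fill_template(rng.choice(DELAY_TEMPLATES), values)


print(delay_tweet("Blue", "O'Hare", 12))
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable, which is handy when checking the wording of each template.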

‘L’ Tron - CTA is alive and tweeting!

I lived here

I worked here

High-resolution imaging detectors for particles and 3D data visualizations

Looking for the Higgs boson at Fermilab
