MINES ParisTech – PSL Research University [email protected]

UNSUPERVISED MACHINE LEARNING TO ANALYSE CITY LOGISTICS THROUGH TWITTER

Simon Tamayo 1, François Combes 2, Arthur Gaudron 1

1 Mines ParisTech – PSL Research University, Paris, France
2 IFSTTAR/AME/SPLOTT, Paris, France

11th International Conference on City Logistics
12th – 14th June 2019, Dubrovnik, Croatia
- Vision, Technology and Policy -

To cite this work: Simon Tamayo, François Combes, Arthur Gaudron. Unsupervised machine learning to analyse city logistics through Twitter. 11th International Conference on City Logistics, Jun 2019, Dubrovnik, Croatia. hal-02156076



OUTLINE

•  1/ Introduction
–  Context
–  Motivation

•  2/ Methodology
–  Data collection
–  Dimensionality reduction & clustering
–  Sentiment analysis
–  Proposed methodology

•  3/ Results
–  Evolution in time and most tweeted n-grams
–  Interest map (demo and analysis)
–  Sentiment analysis (overall)
–  Focus on some specific concepts

•  4/ Conclusion and perspectives

SOCIAL MEDIA MINING

Image credit: Daria Nepriakhina - https://unsplash.com

MACHINE LEARNING

Image credit: Franck V - https://unsplash.com

MOTIVATION

•  Observation of City Logistics traditionally relies on:
–  qualitative observation, typically drawing on professional, technical or academic communities
–  logistics observatories (quantitative surveys)
–  hybrid approaches (statistics on opinions)

•  These approaches have qualities but also limitations…
–  they provide little insight outside their domain of validity
–  academic and professional groups have limited information processing capabilities
–  they can be subject to significant biases

•  Social media mining is an opportunity to complement these protocols. This paper’s motivation is to explore to what extent…

2/ METHODOLOGY


DATA COLLECTION

•  Web scraping (111 265 tweets related to City Logistics)

•  Filtering (removing repeated entries)
–  Raw dataset = 111 265 tweets; filtered dataset = 101 349 tweets

•  Cleaning and lemmatization
–  Removing undesired content (such as links, symbols and linking words)
–  Lemmatizing the text inputs (grouping several forms of a word together so they can be analyzed as a single item)

Key-term | Nb. of tweets
City Logistics | 73 802 (~66%)
Last-mile Logistics | 21 219 (~19%)
Urban Logistics | 9 721 (~9%)
Urban Freight | 6 523 (~6%)
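The filtering and cleaning steps of the data collection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word set and the lemma dictionary below are tiny hypothetical stand-ins (the slides do not say which lemmatizer or word lists were used), and the sample tweets are invented.

```python
import re

# Hypothetical, deliberately tiny resources: a real pipeline would rely on a
# full stop-word list and a proper lemmatizer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "are", "due"}
LEMMAS = {"cities": "city", "orders": "order", "deliveries": "delivery"}

URL_RE = re.compile(r"https?://\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def deduplicate(tweets):
    """Filtering step: drop repeated entries, keeping first occurrences in order."""
    return list(dict.fromkeys(tweets))


def clean(tweet):
    """Cleaning and lemmatization step: return the tweet as a list of lemmas."""
    text = URL_RE.sub(" ", tweet.lower())        # remove links
    text = NON_LETTER_RE.sub(" ", text)          # remove symbols, digits, punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # remove linking words
    return [LEMMAS.get(t, t) for t in tokens]    # map word forms to a single lemma


# Invented sample input (not taken from the corpus).
raw = [
    "New deliveries in the cities! https://example.com #logistics",
    "New deliveries in the cities! https://example.com #logistics",
    "Orders delayed due to unrest",
]
filtered = deduplicate(raw)
cleaned = [clean(t) for t in filtered]
print(cleaned)
```

Applied to the real corpus, the same two steps would take the 111 265 scraped tweets down to the 101 349 filtered entries before lemmatization.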

INTEREST MAP: DIMENSIONALITY REDUCTION AND CLUSTERING

Image credit: https://ubique.americangeo.org

Intuition

We assume that concepts that are close in terms of interest will occur in similar entries of text. In the resulting visualization, nearby points therefore represent similar concepts (i.e. they are often present in the same entries) and distant points represent dissimilar concepts (i.e. they are rarely seen together).
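The intuition that concepts are "close" when they occur in the same entries can be made concrete with a simple co-occurrence measure. The sketch below uses Jaccard similarity purely as an illustration: it is an assumption of this example, not the distance used to build the interest map, and the entries are invented.

```python
def entries_containing(term, entries):
    """Return the set of entry indices whose token list contains `term`."""
    return {i for i, tokens in enumerate(entries) if term in tokens}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented, already-lemmatized entries.
entries = [
    ["cdl", "job", "trucking"],
    ["cdl", "job", "flatbed"],
    ["blockchain", "token", "delivery"],
    ["job", "trucking"],
]

cdl = entries_containing("cdl", entries)
sim_cdl_job = jaccard(cdl, entries_containing("job", entries))
sim_cdl_blockchain = jaccard(cdl, entries_containing("blockchain", entries))
print(sim_cdl_job, sim_cdl_blockchain)
```

Here "cdl" and "job" share most of their entries and would sit near each other, whereas "cdl" and "blockchain" never co-occur and would sit far apart.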

Vectorization (sparse matrix)
Input content is transformed into a features vector in which the lemmas are grouped by n-grams. This vector is used to build a sparse matrix that indicates whether each feature is present in each entry.

Dimensionality reduction (SVD)
We used Truncated Singular Value Decomposition to reduce the number of dimensions. The resulting matrix is denser and has continuous values.

Clustering (K-means)
K-Means clustering is applied to the data in order to group features that are “close” in terms of user interest. At this point we can color-code the observations into K groups.

Manifold learning (T-SNE)
T-Distributed Stochastic Neighbour Embedding is applied to the data, which reveals data lying on different manifolds in a two-dimensional space. The resulting interest map is a 2D scatter plot.

Dimensions shown on the slide: input 101 349 entries × 1; sparse matrix 7 115 features × 101 349 entries; after SVD 7 115 × 500; after K-means 7 115 × 500 (color-coded); after T-SNE 7 115 points with coordinates (x, y).
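The first and third steps of this pipeline (binary n-gram vectorization and K-means) are small enough to sketch in plain Python. This is a toy illustration, not the authors' implementation: the entries are invented, the farthest-point initialisation is an assumption made to keep the example deterministic, and the truncated SVD and T-SNE steps are left out because they would normally come from a numerical library.

```python
def ngrams(tokens, n_max=2):
    """Return every n-gram of length 1..n_max as a space-joined string."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def presence_matrix(entries, n_max=2):
    """Build a binary features x entries matrix (1.0 = feature present in entry)."""
    per_entry = [set(ngrams(tokens, n_max)) for tokens in entries]
    features = sorted(set().union(*per_entry))
    matrix = [[1.0 if f in grams else 0.0 for grams in per_entry] for f in features]
    return features, matrix


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Lloyd's algorithm; returns one cluster label per row."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centres = [list(rows[0])]
    while len(centres) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centres))
        centres.append(list(farthest))
    labels = []
    for _ in range(iterations):
        # Assignment step: each row goes to its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(r, centres[c])) for r in rows]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented, already-lemmatized entries.
entries = [
    ["cdl", "job"],
    ["cdl", "job", "flatbed"],
    ["last", "mile"],
    ["last", "mile", "delivery"],
]
features, matrix = presence_matrix(entries)
cluster_of = dict(zip(features, kmeans(matrix, k=2)))
print(cluster_of)
```

Each row of `matrix` describes one feature by the entries it appears in, so features that share entries (for example "cdl" and "flatbed") end up in the same cluster. In the full pipeline the rows would first be compressed to 500 dimensions by truncated SVD, and the clustered points would then be projected to 2D with T-SNE.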

SENTIMENT ANALYSIS: CLASSIFICATION

Image credit: http://datameetsmedia.com/

•  Sentiment analysis is the procedure in which information is extracted from the opinions, appraisals and emotions of people.
–  Goal: determine whether City Logistics tweets have positive, negative or neutral sentiments.

•  The polarity score (negative vs. positive) of each input is calculated with VADER (Valence Aware Dictionary and sEntiment Reasoner).
–  VADER returns a score in the range -1 to 1, from most negative to most positive.

Text | Score | Sentiment
Uber for logistics startup Lalamove raises $30M to expand beyond 100 cities in Asia | 0.3182 | Positive
Mumbai unrest affecting smooth functioning of the city. Logistics and our orders delayed due to same #MumbaiBandh #kandivali | -0.2263 | Negative
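Turning a polarity score into a sentiment label only requires two cut-offs. The sketch below assumes the ±0.05 thresholds conventionally suggested for VADER's compound score; the slides do not state which thresholds were actually used, and the function name `sentiment_label` is hypothetical. The scores are passed in directly rather than computed, so the example has no external dependency.

```python
from collections import Counter


def sentiment_label(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a label (threshold is an assumption)."""
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


# The first two scores are the examples from the table above; 0.0 is an
# invented neutral case added for illustration.
scores = [0.3182, -0.2263, 0.0]
labels = [sentiment_label(s) for s in scores]
distribution = Counter(labels)
print(labels, distribution)
```

Counting the labels over the whole corpus, or per year, gives the kind of sentiment distribution reported in the results.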

PROPOSED METHODOLOGY

•  Data collection
–  Web scraping
–  Text cleaning and lemmatization

•  Interest Map*
–  Creation of features vector and sparse matrix
–  Dimensionality reduction (SVD)
–  Clustering (K-means)
–  Manifold learning (T-SNE)

•  Sentiment Analysis
–  Polarity scores calculation
–  Statistical analysis

* Adapted from the works of (Olson & Neal 2015) and (Kruchten 2014)

3/ RESULTS

NUMBER OF TWEETS PER YEAR

N-GRAMS RANKING

Rank | Top 10 unigrams | Top 10 bigrams | Top 10 trigrams
1 | job (32489) | last mile (14867) | last mile logistics (7077)
2 | urban (15874) | mile logistics (7182) | salt lake city (5143)
3 | mile (15601) | kansa city (6909) | job cdl logistics (4432)
4 | last (15382) | city logistics (5864) | cdl logistics needed (4432)
5 | cdl (10386) | oklahoma city (5666) | lake city ut (4087)
6 | delivery (9813) | lake city (5406) | oklahoma city ok (3616)
7 | needed (9344) | salt lake (5183) | kansa city mo (3329)
8 | freight (8756) | job cdl (4598) | cdl trucking logistics (3254)
9 | new (8753) | logistics needed (4464) | last mile delivery (2685)
10 | trucking (8665) | trucking trucker (4439) | logistics needed flatbed (2385)
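A ranking like the one above can be produced by counting n-grams over the lemmatized entries. The sketch below is a minimal version using only the standard library: the entries are invented, the function name `top_ngrams` is hypothetical, and it counts every occurrence of an n-gram (the slides do not say whether occurrences or distinct tweets were counted).

```python
from collections import Counter


def top_ngrams(entries, n, k=10):
    """Return the k most frequent n-grams of length n with their counts."""
    counts = Counter(
        " ".join(tokens[i:i + n])
        for tokens in entries
        for i in range(len(tokens) - n + 1)
    )
    return counts.most_common(k)


# Invented, already-lemmatized entries.
entries = [
    ["last", "mile", "logistics"],
    ["last", "mile", "delivery"],
    ["salt", "lake", "city"],
]
top_bigrams = top_ngrams(entries, 2, k=1)
all_trigrams = top_ngrams(entries, 3)
print(top_bigrams, all_trigrams)
```

Forms such as "kansa city" in the ranking are what this kind of counting yields when the lemmatizer has already stripped the final "s" from "Kansas".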

INTEREST MAP

Cluster labels shown on the map: BLOCKCHAIN & TOKEN; SAME DAY DELIVERY; GREEN; AUSTRALIAN COALITION’S NAURU « TENT SOLUTION »; TRIP CONTROVERSY; CARDINAL LOGISTICS; SIOUX & MEMPHIS; JOBS IN KANSAS; COMMERCIAL DRIVING LICENSE (CDL), FLATBED JOBS; INTERMODAL JOBS; JOBS IN UTAH; OLD NAVY; JOBS IN CALIFORNIA; LOGISTICS MGMT. JOBS; OKLAHOMA; WAREHOUSE; XPO; NEW YORK; PICK UP; BLOCK CHAIN, LAST MILE DELIVERY; URBAN TRANSIT; SELF DRIVING CARS, DELIVERY ROBOTICS, AI; KEY TECH, IOT; IBM; WALMART; TRUCKING JOBS UTM; VISA GIFT; GAMEINSIGHT’S GAME ‘AIRPORT CITY’; JOBS UPS; RAIL LOGISTICS JOBS; LOGISTICS FORUM INDONESIA; COLLECTION LOGISTICS; HOGAN; UBER LOGISTICS, LALAMOVE; COURIER LOGISTICS; MILE PER GALLON, RAIL FREIGHT; LETSTRANSPORT, INDIA; INDIA START-UPS, HYPER LOCAL LOGISTICS; CHINESE CITY REGULATION; MBA JOBS; ALIBABA; HELLOFRESH; URBAN FREIGHT; LAST MILE LOGISTICS; SOUTH CHINA; LA POSTE; CLARK FREE ZONE, PHL GATEWAY; DELIVERY OFFICER; JOB; DLVR IT; LOGISTICS MANAGER; URBAN LOGISTICS CITY LOGISTICS; NEWS; JOBS IN LONDON; ENGLAND; INDUSTRY; (REGULAR WORDS); PORT LOGISTICS; TRANSPORTATION; INTRA CITY LOGISTICS; ROAD SAFETY; AIR QUALITY; RIDE HAILING COMPANIES, URBAN LOGISTICS FABRIC; HEAVY HAUL; PARCEL; NEXPAKK APP; VAN JOBS; FLATBED JOBS; USA; SUPPLY CHAIN; SMART CITY; AMAZON BUFF; INTER CITY; START UP; SEEK JOB; TRUCK DRIVER JOBS; JOBS IN FLORIDA; LOGISTICS PARK; SALT LAKE CITY; ANALYST JOBS; CUSTOMER SERVICE; VIETNAM; EARTH CITY; ZERO EMISSIONS, ELECTRIC; CITY LIFE.

Interactive version available at http://chairelogistiqueurbaine.fr/2018/10/15/1072

INTEREST MAP

Regions highlighted on the map: CORE TRENDS AND ISSUES; JOB-RELATED; NEW TECHNOLOGIES; START-UPS; ASIA.

UNDER-REPRESENTED ISSUES AND/OR BLIND SPOTS

•  Regulation and policy issues are present, but not easily visible.
–  One can find a rather large range of issues (e.g. road safety, fuel consumption, sustainability, urban fabric, etc.) and solutions (e.g. training, ICT, urban consolidation centres, clean vehicles, cargo-bikes, etc.).

•  In contrast, some concepts widely promoted in academic circles are almost absent from the corpus (e.g. physical internet, off-hour deliveries, synchro-modality).

•  Issues such as labour regulation or the negative local impacts of urban freight (pollution, noise, etc.) are virtually absent.
–  The corresponding stakeholders may be vocal on other forms of social media, or may use keywords other than those in our query.

OVERALL SENTIMENT DISTRIBUTION

SENTIMENT DISTRIBUTION PER YEAR

LOW EMISSIONS ZONE
Keywords: 'LOW EMISSION ZONE', 'ACCESS REGULATION', 'LEZ'

ELECTRIC VEHICLE
Keywords: 'ELECTRIC CAR', 'ELECTRIC VEHICLE', 'ELECTRIC TRUCK', 'ELECTRIC VAN', 'ZERO EMISSION CAR', 'ZERO EMISSION TRUCK'

URBAN CONSOLIDATION CENTRE
Keywords: 'CONSOLIDATION CENTER', 'CONSOLIDATION CENTRE', 'URBAN CONSOLIDATION', 'UCC', '_UCC', 'UCC_'

CARGO BIKE
Keywords: 'CARGO BIKE', 'BICYCLE', 'TRICYCLE', 'FREIGHT BICYCLE', 'CARRIER CYCLE', 'FREIGHT TRICYCLE', 'CYCLETRUCK', 'BOX BIKE', 'CARGO TRIKE'

AUTONOMOUS CAR
Keywords: 'AUTONOMOUS CAR', 'SMART CAR', 'SMART TRUCK', 'AUTONOMOUS TRUCK', 'AUTONOMOUS VEHICLE', 'SELF DRIVING', 'SELF-DRIVING', 'SELFDRIVING', 'DRIVER LESS', 'DRIVERLESS'
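Focusing on a concept amounts to selecting the tweets that mention one of its keywords and summarising their polarity scores. The sketch below is one possible way to do this, not the authors' procedure: it uses a subset of the keywords listed above, simple case-insensitive substring matching (an assumption), and invented tweets with hypothetical scores.

```python
# Subset of the keyword lists from the slides, lower-cased.
CONCEPTS = {
    "electric vehicle": ["electric car", "electric vehicle", "electric truck", "electric van"],
    "cargo bike": ["cargo bike", "freight bicycle", "cargo trike"],
    "autonomous car": ["autonomous vehicle", "self driving", "self-driving", "driverless"],
}


def matching_concepts(text):
    """Return the concepts whose keywords appear in the text (substring match)."""
    lowered = text.lower()
    return [name for name, keywords in CONCEPTS.items()
            if any(keyword in lowered for keyword in keywords)]


def mean_score_by_concept(scored_tweets):
    """Average the polarity scores of the tweets matched by each concept."""
    scores = {}
    for text, score in scored_tweets:
        for concept in matching_concepts(text):
            scores.setdefault(concept, []).append(score)
    return {concept: sum(vals) / len(vals) for concept, vals in scores.items()}


# Invented tweets with hypothetical polarity scores.
scored_tweets = [
    ("New electric van fleet for city deliveries", 0.4),
    ("Electric truck pilot delayed again", -0.2),
    ("Self-driving pods join cargo bike couriers", 0.1),
]
means = mean_score_by_concept(scored_tweets)
print(means)
```

Substring matching is the simplest choice but can over-match short keywords; the '_UCC' and 'UCC_' variants on the slide suggest that word boundaries needed some care for abbreviations, which this sketch does not handle.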

4/ CONCLUSION

CONCLUSION

•  Straightforward methodology to perform social media mining about City Logistics:
–  Web scraping, dimensionality reduction, clustering and classification.

•  The preferred term is “City Logistics” (as opposed to Last-Mile Logistics, Urban Logistics or Urban Freight).

•  The interest map reveals distinctive clusters:
–  employment; new technologies; start-ups; new forms of organization; Asia.
–  Core issues are in the centre of the map (quality of life, zero emissions, regulation).

•  The large number of tweets related to employment reveals that the corpus is biased…
–  This analysis cannot be generalized to the views of the general population about City Logistics.
–  With respect to public policy issues: several topics are present, but they are not prominent, and some of them are virtually non-existent.

•  Sentiment analysis: the overall view of City Logistics is more positive than negative.

•  Social media mining cannot provide a complete and understandable picture of City Logistics.

PERSPECTIVES

•  This exploratory research could only scratch the surface of the topic…

•  A strength of social media analysis is that it can contribute cost-efficiently to business and technological intelligence, at the risk of missing less-advertised topics.

•  Regarding dynamics and sentiment analysis, there seems to be untapped potential; this clearly requires more work!

•  Open questions:
–  How to measure the biases in the expressed subjects?
–  Are some stakeholders more vocal than others?
–  How reliable is this information?

ANY QUESTIONS?

THANKS FOR YOUR ATTENTION

Find us at: [email protected], www.chairelogistiqueurbaine.fr, www.mines-paristech.fr