scilabtec 2015 - university of luxembourg

20
University of Luxembourg SCILABTEC 2015, Friday 22 May. ANALYSIS Of Call Detail Records Based on Scilab Foued Melakessou

Upload: scilab-enterprises

Post on 29-Jul-2015

59 views

Category:

Presentations & Public Speaking


2 download

TRANSCRIPT

Page 1: ScilabTEC 2015 - University of Luxembourg

University of LuxembourgSCILABTEC 2015, Friday 22 May.

ANALYSIS Of Call Detail Records Based on ScilabFoued Melakessou

Page 2: ScilabTEC 2015 - University of Luxembourg

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Outline

n  Project Description

n  Problematic n  Big Data Analysis n  Population Mobility Modeling

n  D4D Challenge (2015 Senegal)

n  Scilab Contribution: NARVAL

n  Conclusion & Perspective

Page 3: ScilabTEC 2015 - University of Luxembourg

n  FNR Core Project (April 2014-March 2017)

n  The MAMBA (MultimodAl MoBility Assistance) project intends to propose and validate a multimodal mobility platform that relies on new Internet technologies to interconnect different mobile services with the aim of providing relevant travel advice based on users’ contexts, so as to optimize overall system performance n  Real time traffic conditions n  Status of existing public transport services (e.g., buses, trains) n  User preferences

n  Analysis of large mobility datasets (e.g., mobile phone traces)

n  Influence the itinerary of the users by suggesting new multimodal routes based on their preferences

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

MAMBA Project

Page 4: ScilabTEC 2015 - University of Luxembourg

n  Transportation Research: human mobility n  Urban planning n  Traffic forecasting n  Resource management

n  The evaluation of mobility models helps to better design and develop future infrastructure in order to better support the actual demand

n  Lack of tools to monitor the time-resolved location of individuals n  Observation studies n  Forms: population census

n  Call Detail Records (CDRs), generated by mobile phone operators can be used to retrieve mobility patterns of the population under study. n  Extraction of realistic mobility models adapted to the Senegal use case

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Problematic

Page 5: ScilabTEC 2015 - University of Luxembourg

n  D4D Senegal is an innovation challenge open on ICT Big Data for the purposes of societal development n  NETMOB’15, Media Lab, MIT, April 8-10, 2015

n  Sonatel and the Orange Group are making anonymous data, extracted from the mobile network in Senegal, available to international research laboratories

n  5 priority subject matters: health, agriculture, transport/urban planning, energy and national statistics n  Advancing research in the field of Big Data (anonymisation, datamining, visualization and

cross-referencing) n  Involving local stakeholders and guaranteeing benefits in education and development of the

ecosystem of local start-ups n  Advancing in anonymisation techniques to allow sharing of data that is relevant for society

while respecting privacy

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Data 4 Development Challenge: Senegal

Page 6: ScilabTEC 2015 - University of Luxembourg

n  Sonatel/Orange (Société nationale des télécommunications) is the main mobile telecommunication operator

n  Anonymous call patterns of Orange’s mobile phone users in Senegal have been released for the 2015 D4D challenge: phone calls and text exchanges between 9M of Orange customers in 2013

n  1666 Base Stations: Antenna ID, Lat, Lon n  SET1 (53.8GB): Voice and Text traffic (#Call,

TCall and #Text) for each month (antenna-to-antenna traffic on an hourly basis) n  2013-01-01 00,1,1,1,54 n  2013-01-01 00,1,2,1,39

n  SET2 (38.4GB): Fine-grained mobility data on a rolling 2-week basis (trajectories of 300000 randomly sampled users) n  1,2013-01-07 13:10:00,461 n  1,2013-01-07 17:20:00,454

n  SET3 (16.5GB): Coarse-grained mobility data, month by month, at the district level (trajectories of 150000 randomly sampled users) n  37509, 2013-01-29 15:00:00,3 n  84009, 2013-01-14 07:00:00,3

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Orange/Sonatel Call Detail Records

14

16

13

15

12.5

13.5

14.5

15.5

16.5

Latit

ude

Longitude

-17 -16 -15 -14 -13 -12

Region Population

3.14M

0.15M

59

Antenna

8

1 2 3

4

5

6 7

9

13

489#

1110

14

N3

N2

N7

N1N1

N2

N3

N4

N3

N5

N5N6

1 Dakar 8 Saint-Louis

2 Thies 9 Kolda

3 Djourbel 10 Sedhiou

4 Fatick 11 Ziguinchor

5 Louga 12 Kedougou

6 Kaolack 13 Tambacounda

7 Kaffrine 14 Matam

Page 7: ScilabTEC 2015 - University of Luxembourg

n  Human activities heavily impact the behavior of calls and messages emitted and received during each period of the day n  Model the daily traffic profile

supported in each base station (level of aggregation=1h) n  Traffic variable: number of calls

#Calls, duration of calls Tcall and number of messages #Text

n  Outliers removal process at 2σ n  Normal traffic behavior n  Detection of traffic anomalies

(cultural or sport events)

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Daily Profile Model

T

0 2 000 4 000 6 000 8 0000

Voice NS

500

1000

1500

T

#Cal

l

0 2 000 4 000 6 000 8 000

100000

70000

30000

0

TCal

l

T

Voice DS

Text S

0 2 000 4 000 6 000 8 000

#Tex

t

0

2000

4000

6000

8000

0 2 000 4 000 6 000 8 0000

1 000

500

1 500

Voice ND

T

#Cal

l

0 2 000 4 000 6 000 8 000

Voice DD

TCal

l

0

20000

40000

60000

80000

T

Text D

0 2 000 4 000 6 000 8 000

T

0

2000

4000

6000

8000

#Tex

t

Page 8: ScilabTEC 2015 - University of Luxembourg

n  PX(a)={X1X2X3…X24} n  Xi is the expected value of the traffic variable X

collected in the antenna a during the ith one hour time slot

n  E(a)={E1E2…Ee} n  List of extrema computed with a gradient algorithm n  Succession of local minimums and maximums

n  NaN, a statistics and machine learning toolbox for Scilab focusing on data with and without missing values encoded as NaN’s n  Tmax_f: time when the first local maximum Nmax_f occurs (beginning of diurnal activities) n  Tmax_l: time when the last local maximum Nmax_l occurs (end of diurnal activities) n  M (respectively S) is the average traffic

(respectively the standard deviation) of the diurnal activities

n  Ne: number of extrema n  Tmin: time when the global minimum Nmin occurs

n  Classification based on k-Means algorithm to find a set of maximally disjoint clusters (k=4) n  Urban (Class 1) n  Suburban (Class 2 and 3) n  Rural (Class 4)

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

K-Means Classification

2 4 6 8 10 12 14 16 18 20 22 240

500

1000

1500

2000

2500

3000

3500

4000

T

N

Tmax_f

First MAX

Page 9: ScilabTEC 2015 - University of Luxembourg

n  First and largest peak centered at 12AM

n  Diurnal activities start at 08AM and finish at 11PM

n  Rural population represents 55% of the total population (14M) n  Highest population density in the area of Dakar, the capital of Senegal

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

#Call

Class 1Class 2Class 3

Class 0

Class 4

Class 1

Class 2

Class 3

Class 4

2 4 6 8 10 12 14 16 18 20 22 240

500

1000

1500

2000

2500

3000

3500#Call

T

Page 10: ScilabTEC 2015 - University of Luxembourg

n  High activity during the night (08PM to 02AM)

n  Young population n  The median age is 18.4 years n  60% of the population are less than 24 years old

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

#Text

#Text

Class 1

Class 2

Class 3

Class 4

2 4 6 8 10 12 14 16 18 20 22 24

T500

0

1000

1500

2000

2500

3000

3500

4000

4500

5000

5500

6000

6500

Class 1Class 2Class 3

Class 0

Class 4

Page 11: ScilabTEC 2015 - University of Luxembourg

n  Mobile phone activity between the first and last peaks n  Mean n  Standard deviation

n  Local vs Distant calls

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Geographical distribution of Caller/Callee activities (#Call)

Page 12: ScilabTEC 2015 - University of Luxembourg

n  Level of aggregation: 1 Day

n  Detection of local anomalies n  Correlation #Call and Tcall

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Local Traffic Anomaly Detection

50 100 150 200 250 300 350

#Call Antenna 1

0

20

40

Day

0

1000

2000

3000TCall

50 100 150 200 250 300 350 Day

#Text

0

10

20

50 100 150 200 250 300 350 Day

σ2σ

Page 13: ScilabTEC 2015 - University of Luxembourg

n  Level of aggregation: 1 Day n  Number of antennas that present an anomaly the same day

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

National Traffic Anomaly Detection

Date (2013) Event 1 January New Year

24 January Mawlid (Prophet birth)

31 March - 1 April Easter

4 April Independence day

9 July - 7 August Ramadan

4 August Laylat Al Qadr

8 August Aid El Fitr

15 August Assomption

15 October Aid El Kebir

13-14 November Tamkharit (Achoura)

15 December to beginning of January Magal of Touba

25 December Christmas

Page 14: ScilabTEC 2015 - University of Luxembourg

n  On the 1st August 2013, the new highway between Dakar and Diamniadio has been inaugurated

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Specific Event Analysis: Highway Inauguration

50 100 150 200 250 300 350 Day0

500

1000#Call Antenna 237

50 100 150 200 250 300 350 Day0

50000

TCall

#Text

50 100 150 200 250 300 350 Day0

500

1000

50 100 150 200 250 300 350 Day0

500

1000#Call Antenna 397

50 100 150 200 250 300 350 Day0

50000

100000TCall

50 100 150 200 250 300 350 Day

#Text2000

1000

0

Page 15: ScilabTEC 2015 - University of Luxembourg

n  67 antennas located near to the new highway

n  Computation of the traffic growth after the inauguration n  #Call: +15% n  TCall: +23% n  #Text: +29%

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Traffic Growth After the Opening of the New Highway

5 10 15 20 25 30 35 40 45 50 55 60-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

TCall#Call

#Text

Antenna (Highway) Id

Gro

wth

Page 16: ScilabTEC 2015 - University of Luxembourg

n  Maps of primary flows n  Mobility graphs are composed by all interconnections between antennas (respectively

districts) where users moved during the duration of the traffic collection n  Congestion map provides for each antenna A, the number of users that have crossed A

during the duration of the traffic measurement n  Each congestion map presents high values near to urban areas n  Mobility graphs are highly correlated with the road network n  Long links: there is a lack of information about intermediate locations where the users

traveled, if the inter-arrival time between two mobile activities is too large

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Mobility Demand: SET2 and SET3

Page 17: ScilabTEC 2015 - University of Luxembourg

n  Other data sets used in this project n  Type of data: OpenStreetMap n  Perl Scripts (automatic extraction of national boundaries, highway/primary/secondary/

trunk roads)

n  New Scilab module (CDRs analysis) added to the NARVAL toolbox n  Complete software environment enabling the understanding of available communication

algorithms, but also the design of new schemes n  Graph Optimization, Topology, Internet Traffic, Routing, Transmission Protocol, Route

Diversity, Mobility, Security, Anonymity, Path Planning, Wireless Sensor Network, etc. n  Target audience: academics, students, engineers and scientists n  http://atoms.scilab.org/toolboxes/NARVAL

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Scilab Contribution

NARVAL

Network Analysis and Routing eVALuation

c

Ver 3.0

Page 18: ScilabTEC 2015 - University of Luxembourg

n  Daily Profile Model n  Characterization of each base station traffic (amount of calls,

duration of calls and amount of text messages) n  Classification: urban, suburban and rural modes n  Correlation between each base station traffic and population

of its covering area

n  Traffic Anomalies Detection n  Analysis of national anomalies n  Analysis of local anomalies, e.g. inauguration of the new

highway between Dakar and Diamniado

n  Mobility graphs n  Computation of mobility flows n  Performance of congestion maps

n  Future Work n  Separation of daily profiles into 7 daily profiles

n  Working day and weekend n  Anomaly detection with an aggregation level of 1h

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Conclusion

Antenna 266: Week ProfilesMondayTuesdayWednesdayThursdayFridaySaturdaySunday

2 4 6 8 10 12 14 16 18 20 22 24Time

0

500

1000

1500

2000

2500

3000

3500

4000

4500

#Ca

ll

Page 19: ScilabTEC 2015 - University of Luxembourg

n  A global open innovation initiative for climate resilience

n  Now is the time to use big data to fight climate change n  World leader will decide on a universal climate

agreement at COP21, potentially the most decisive climate conference in 25 years

n  COP21 (Paris, 2015, Nov 30th to Dec 11th) is critical as the impacts of climate change are accelerating, and time to develop solutions is limited n  Achieve a new international agreement on

the climate, applicable to all countries, with the aim of keeping global warming below 2°C

n  Analyzing anonymised, aggregated data from digital sources such as mobile phones and bank transactions can provide valuable insights on human behavior and climate risks.

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

D4C DATA For CLIMATE ACTION CHALLENGE 2015

Page 20: ScilabTEC 2015 - University of Luxembourg

SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab

Questions

n  Thank You!

n  Acknowledgement n  This Data was made available by ORANGE/SONATEL within the framework of the D4D

Challenge. The author would like to thank Orange and SONATEL for the availability of these CDR’s datasets.

n  The author would like to thank the National Research Fund of Luxembourg (FNR) for providing financial support through the CORE 2013 MAMBA project (C13/IS/5825301)

n  Contact n  [email protected]