Discovering Playing Patterns: Time Series Clustering of Free-To-Play Game Data

Alain Saas, Anna Guitart and África Periáñez (Silicon Studio)

IEEE CIG 2016, Santorini, 21 September 2016




About us

• Who are we?
  ◦ Game studio and graphics middleware company based in Tokyo
  ◦ Research project to provide Game Data Science as a service
  ◦ Goals: predict player behavior, scale to big data, and intuitive result visualization

• Which data?
  ◦ RPG free-to-play games
  ◦ Time series (TS) of two games
  ◦ TS of in-app purchases and activity behavioral data


Challenge

Unsupervised clustering of Time Series of player activity

• Why?
  ◦ discover temporal player patterns
  ◦ evaluation of game events and business diagnosis
  ◦ assess common characteristics of players belonging to the same cluster

• How?
  1. representation techniques: reducing the high dimensionality of TS
  2. similarity measures for free-to-play game data
  3. hierarchical clustering
  4. visual validation of the results


Representation methods

Symbolic Aggregate Approximation

Trend Extraction

Discrete Wavelet Transform
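As an illustration of the first representation method listed above, here is a minimal Python sketch of Symbolic Aggregate Approximation. It assumes the usual formulation (z-normalization, piecewise averaging into `w` segments, then mapping each segment mean to one of `a` letters using equiprobable breakpoints of the standard normal distribution). The function name `sax` and the letter alphabet are choices made for this example, not taken from the slides.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, w, a):
    """Return a SAX word of length w over an alphabet of size a (w <= len(series))."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant series is mapped to all zeros (assumption)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    n = len(z)
    # Piecewise Aggregate Approximation: mean of w nearly equal-width segments
    paa = [mean(z[i * n // w:(i + 1) * n // w]) for i in range(w)]
    # Breakpoints splitting the standard normal into a equiprobable regions
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    # Each segment's letter is the number of breakpoints it exceeds
    return "".join(letters[sum(v > b for b in breakpoints)] for v in paa)


# A 21-day series summarized as one letter per week
word = sax(list(range(21)), w=3, a=3)
```

A larger `w` keeps more temporal detail and a larger `a` keeps more amplitude detail, at the cost of less dimensionality reduction.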


Similarity measures

Dynamic Time Warping

$$\mathrm{DTW}(X, Y) = \min_{r \in M} \left( \sum_{m=1}^{M} |x_{i_m} - y_{j_m}| \right)$$
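The minimization over warping paths in the DTW definition is usually computed by dynamic programming. The sketch below is one such implementation, assuming the absolute difference as local cost (as in the formula) and the standard step pattern with no window constraint; the function name `dtw` is chosen for this example.

```python
def dtw(x, y):
    """Dynamic Time Warping distance with |x_i - y_j| as local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            cost[i][j] = local + min(cost[i - 1][j],      # advance in x only
                                     cost[i][j - 1],      # advance in y only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


# Two profiles with the same shape, one shifted by a day, are at distance zero
shifted = dtw([0, 1, 2, 0], [0, 0, 1, 2, 0])
```

This makes explicit why DTW groups similar profiles that are shifted on the time axis: the shift is absorbed by the warping path.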

Correlation-based measure

$$\mathrm{COR}(X, Y) = \frac{\sum_{n=1}^{N} (x_n - \bar{X})(y_n - \bar{Y})}{\sqrt{\sum_{n=1}^{N} (x_n - \bar{X})^2} \, \sqrt{\sum_{n=1}^{N} (y_n - \bar{Y})^2}}$$
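A direct transcription of the COR formula into Python might look as follows. The conversion of the coefficient into a dissimilarity, sqrt(2(1 - COR)), is a common choice but is an assumption here, since the slide only gives the coefficient; the names `cor` and `cor_distance` are also chosen for this example.

```python
import math


def cor(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mean_x) ** 2 for a in x))
           * math.sqrt(sum((b - mean_y) ** 2 for b in y)))
    # den is zero for a constant series; callers must filter those out
    return num / den


def cor_distance(x, y):
    """Dissimilarity derived from COR (assumed form: sqrt(2 * (1 - COR)))."""
    # max() guards against tiny negative values caused by rounding
    return math.sqrt(max(0.0, 2.0 * (1.0 - cor(x, y))))
```

Under this conversion, perfectly correlated series are at distance 0 and perfectly anti-correlated series at distance 2.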

Temporal Correlation and Raw Values Behaviors measure

$$\mathrm{CORT}(X, Y) = \frac{\sum_{n=1}^{N-1} (x_{n+1} - x_n)(y_{n+1} - y_n)}{\sqrt{\sum_{n=1}^{N-1} (x_{n+1} - x_n)^2} \, \sqrt{\sum_{n=1}^{N-1} (y_{n+1} - y_n)^2}}$$
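The CORT coefficient is a cosine similarity between the first differences of the two series. The sketch below computes only the coefficient shown in the formula; how it is combined with a raw-value distance is not given on the slide and is left out. The function name `cort` is chosen for this example.

```python
import math


def cort(x, y):
    """Temporal correlation: cosine similarity of day-to-day differences."""
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]
    num = sum(p * q for p, q in zip(dx, dy))
    den = math.sqrt(sum(p * p for p in dx)) * math.sqrt(sum(q * q for q in dy))
    # den is zero when either series never changes; callers must handle that
    return num / den


# Same ups and downs at a different scale give a coefficient close to 1
same_dynamics = cort([1, 2, 3, 2], [10, 20, 30, 20])
```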

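Complexity-Invariant Distance measure

$$\mathrm{CID}(X, Y) = \mathrm{dist}(X, Y) \cdot \mathrm{CF}(X, Y)$$

where CF is the complexity correction factor

$$\mathrm{CF}(X, Y) = \frac{\max(\mathrm{CE}(X), \mathrm{CE}(Y))}{\min(\mathrm{CE}(X), \mathrm{CE}(Y))}$$

and CE is the complexity estimation

$$\mathrm{CE}(X) = \sqrt{\sum_{n=1}^{N-1} (x_n - x_{n+1})^2}$$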
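The three CID formulas compose as in the sketch below. Two points are assumptions made for this example: the base distance dist(X, Y) is taken to be Euclidean, and the degenerate cases where a series is completely flat (CE = 0) are handled by returning a factor of 1 when both are flat and infinity when only one is.

```python
import math


def complexity_estimate(x):
    """CE(X): length of the series' difference vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x[1:])))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cid(x, y, dist=euclidean):
    """Complexity-invariant distance: dist(X, Y) * CF(X, Y)."""
    ce_x, ce_y = complexity_estimate(x), complexity_estimate(y)
    hi, lo = max(ce_x, ce_y), min(ce_x, ce_y)
    if hi == 0:
        factor = 1.0            # both series flat (assumed convention)
    elif lo == 0:
        factor = float("inf")   # one flat series (assumed convention)
    else:
        factor = hi / lo
    return dist(x, y) * factor


# y has twice the complexity of x, so the Euclidean distance is doubled
example = cid([0, 1, 0, 1], [0, 2, 0, 2])
```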


Similarity measure comparison

[Figure panels: Euclidean vs. Correlation; Correlation vs. Complexity-Invariant Distance; Dynamic Time Warping vs. Correlation; Correlation vs. Discrete Wavelet Transform]


Comparison of clustering methods

• DTW: Dynamic Time Warping
  ◦ similar player profiles with a shift on the time axis
  ◦ different patterns but at different scale

• DWT: Discrete Wavelet Transform
  ◦ dimensionality reduction
  ◦ frequency of the series

• SAX: Symbolic Aggregate Approximation
  ◦ parameters w, a

• COR: Correlation
  ◦ similar geometric and synchronous profiles
  ◦ sensitive to noisy data and outliers

• CORT: Temporal Correlation
  ◦ similar to COR but with time consideration?

• CID: Complexity-Invariant Distance
  ◦ similar complexity patterns
  ◦ good for sparse time series

• COR+trend: Correlation and trend extraction
  ◦ addresses COR's sensitivity to noise
  ◦ does not work well with sparse time series
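To make the dimensionality-reduction role of the DWT in the list above concrete, here is a small sketch using the Haar wavelet. The choice of Haar and the helper names `haar_step` and `haar_approximation` are assumptions for this example; the slides do not say which wavelet family was used. The input length must be divisible by 2 at every level, so a 21-day series would first need padding or truncation.

```python
import math


def haar_step(x):
    """One Haar level: (approximation, detail), each half the input length."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_approximation(x, levels):
    """Keep only the low-frequency coefficients after `levels` Haar steps."""
    approx = list(x)
    for _ in range(levels):
        approx, _detail = haar_step(approx)  # detail coefficients are discarded
    return approx


# Eight daily values reduced to two coefficients
reduced = haar_approximation([1, 1, 3, 3, 5, 5, 7, 7], levels=2)
```

Each level halves the number of values while keeping the coarse shape of the series; the discarded detail coefficients carry the high-frequency content.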


Hierarchical clustering

Agglomerative Ward method: each merge leads to the minimum increase in total within-cluster variance.

• Single Linkage
• Complete Linkage
• Average Linkage
• Centroid Method
• Ward Method
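The Ward criterion can be applied to a precomputed dissimilarity matrix through the Lance-Williams update. The sketch below is a naive illustration under the assumption that the dissimilarities are treated like Euclidean distances (they are squared before merging); the function name `ward_clusters` and the stopping rule of cutting at `k` clusters are choices made for this example, not the authors' implementation.

```python
def ward_clusters(d, k):
    """Agglomerate items with Ward's criterion until k clusters remain.

    d is a symmetric matrix of pairwise dissimilarities.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    n = len(d)
    # squared dissimilarities between the current clusters, keyed by (low id, high id)
    d2 = {(i, j): d[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}

    def get(a, b):
        return d2[(a, b) if a < b else (b, a)]

    next_id = n
    while len(members) > k:
        i, j = min(d2, key=d2.get)  # closest pair under the Ward criterion
        ni, nj = len(members[i]), len(members[j])
        for c in members:
            if c in (i, j):
                continue
            nc = len(members[c])
            # Lance-Williams update for Ward's method
            d2[(c, next_id)] = ((ni + nc) * get(i, c) + (nj + nc) * get(j, c)
                                - nc * get(i, j)) / (ni + nj + nc)
        for pair in [p for p in d2 if i in p or j in p]:
            del d2[pair]  # drop distances to the two merged clusters
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return sorted(sorted(m) for m in members.values())


# Four items on a line at positions 0, 1, 10, 11
positions = [0, 1, 10, 11]
matrix = [[abs(p - q) for q in positions] for p in positions]
groups = ward_clusters(matrix, k=2)
```

This version scans all pairs at every step, which is fine for a demonstration but too slow for large player populations.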


Our data

Time series measured per user per day.

Game activity (behavioral data)

• Time: the amount of time spent in the game
• Sessions: the total number of playing sessions
• Actions: the total number of actions performed

In-app sales

• Purchase: the total amount of in-app purchases


Data selection, constraints

Time series: multi-dimensional data ⇒ selection of a period P

• weekly game events in our data

• period P of length 21 days

• played time → active users: connected at least 6 out of 7 days a week

• purchases → paying users: at least one purchase in period P

• players alive during period P
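The two user filters above can be expressed as simple predicates over a player's daily series for period P. This sketch reads "6/7 days a week" as at least 6 active days in every 7-day week of the period, which is an interpretation; the function names and the use of a positive value to mean "connected" or "purchased" are assumptions for this example. The "alive during period P" condition is not covered.

```python
def is_active(played, min_days_per_week=6):
    """True if the player connected on enough days in every week of the period.

    played holds one played-time value per day; its length is a multiple of 7.
    """
    weeks = [played[start:start + 7] for start in range(0, len(played), 7)]
    return all(sum(1 for t in week if t > 0) >= min_days_per_week
               for week in weeks)


def is_paying(purchases):
    """True if the player made at least one purchase in the period."""
    return any(p > 0 for p in purchases)


# 21-day period: six active days and one day off in each week
regular = ([30] * 6 + [0]) * 3
selected = is_active(regular)
```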


Datasets and tests

Game             Data               Technique  Clusters  Date range
Age of Ishtaria  Daily played time  COR-trend  8         Oct 2014 - Jan 2016
Age of Ishtaria  Daily purchase     CID        5         Oct 2014 - Jan 2016
Grand Sphere     Daily played time  COR-trend  8         Jun 2015 - Mar 2016


Clustering time series of time played

1. representation method: trend extraction

2. similarity measure: correlation

3. hierarchical clustering: Ward method

4. validation of results: visualization with heatmap (raw data)
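Steps 1 and 2 of this pipeline can be sketched as follows. The slides do not specify how the trend is extracted, so a centered moving average with a 7-day window is used here purely as a stand-in; the sqrt(2(1 - correlation)) dissimilarity and all function names are likewise assumptions. The resulting matrix is what a hierarchical clustering step would take as input.

```python
import math


def moving_average_trend(x, window=7):
    """Centered moving average; the window shrinks near both ends."""
    half = window // 2
    trend = []
    for i in range(len(x)):
        segment = x[max(0, i - half):i + half + 1]
        trend.append(sum(segment) / len(segment))
    return trend


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mx) ** 2 for a in x))
           * math.sqrt(sum((b - my) ** 2 for b in y)))
    return num / den  # undefined (division by zero) for a constant trend


def cor_trend_matrix(series, window=7):
    """Pairwise correlation-based dissimilarities between extracted trends."""
    trends = [moving_average_trend(s, window) for s in series]
    n = len(trends)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rho = pearson(trends[i], trends[j])
            d[i][j] = d[j][i] = math.sqrt(max(0.0, 2.0 * (1.0 - rho)))
    return d


# Three 21-day profiles: rising, rising with day-to-day noise, falling
rising = list(range(21))
noisy = [v + (1 if v % 2 else -1) for v in range(21)]
falling = rising[::-1]
d = cor_trend_matrix([rising, noisy, falling])
```

Smoothing before correlating is what lets the noisy rising profile land next to the clean one, while the falling profile stays far from both.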


Extraction of player characteristics


Clustering time series of time played

The method is also able to extract differentiated patterns, as in Age of Ishtaria


Clustering time series of purchases

1. similarity measure: complexity-invariant distance

2. hierarchical clustering: Ward method

3. validation of results: visualization with heatmap (raw data)


Summary and Next Steps

• Unsupervised clustering of time series data from two free-to-play games

• Evaluate several similarity measures and representation methods

• Extract meaningful behavioral patterns of players

• Assess impact of weekly game events

• Discover hidden playing dynamics regarding purchases and time played

• Feature for churn prediction

• Event recommender

• Cluster level behaviour


http://www.siliconstudio.co.jp/rd/4front/

Thank you!
