
Tuomas Rinta, Development Director Everyplay / Unity Technologies

FROM BIG DATA TO ACTIONABLE ANALYTICS

So what is Everyplay?

 

Everyplay in numbers

•  Live in about 1000 games across iOS and Android
•  Nearly 100 million game sessions recorded daily
•  About 2 billion events of usage data generated every week

 

 

Why do we care about big data?

•  Mobile games, especially free-to-play, live and die by their metrics
•  Providing a service for game developers must have proven value, and each optimization counts

 

 

So let's talk about how we use big data, and how we got started

 

Our goal

"How do we create a metrics-driven product based on big data?"

 

 

This needs to be as quick as possible:

[Cycle diagram: Collect data → Analyze → Create A/B tests → Improve product → back to Collect data]

 

Challenges

•  We ship an SDK, and the normal client update cycle can be as long as 6-12 months, which is not very dynamic
   –  This conflicts with the fast improvement cycle
   –  Technology must adapt to supporting big data
•  The product evolves constantly
   –  Analytics requirements change constantly

 

 

The SDK is instrumented to send everything the user does to the servers.

[Architecture diagram: Scribe, Amazon S3, a real-time production system, and batch data processing with Apache Pig]

Tackling evolving analytics

 

Issues with big data and analytics

•  Analytics requirements change
•  Redshift is based on PostgreSQL, so there needs to be a schema
   –  Schemas are the most restrictive factor with Redshift (see the sketch below)
•  How does that work with evolving analytics?
•  Everything would be easy if there weren't billions of rows of data…
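To illustrate why a fixed schema is restrictive, here is a minimal sketch of what an events table and a later schema change might look like on Redshift; the table layout and column names are hypothetical, not Everyplay's actual schema.

-- Hypothetical fixed-schema events table (illustration only).
CREATE TABLE events (
    event_id    BIGINT       NOT NULL,
    session_id  BIGINT       NOT NULL,
    event_type  VARCHAR(64)  NOT NULL,
    created     TIMESTAMP    NOT NULL
)
DISTKEY (session_id)
SORTKEY (created);

-- Every new analytics question that needs a new field means a schema change,
-- rolled out over billions of existing rows:
ALTER TABLE events ADD COLUMN did_use_facecam BOOLEAN;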

 

 

How should data be reported?

•  Choosing how the end-user instrumentation sends events is crucial
•  A bad event format can make analytics on big data nearly impossible
•  You don't always know beforehand what you need

 

Two possible approaches

Separate events
Example of video sharing:
  openVideoEditor
  trimButtonPressed
  undoTrimPressed
  activateFacecamRecording
  finishFacecamRecording
  shareButtonPressed
•  More flexible with a schema-based database
•  Requires much more data processing
•  Combining events can be a hassle (see the sketch below)

Conversions with properties
Example of video sharing:
  {event: "videoShareComplete",
   properties: [
     {didTrimVideo: true},
     {isVideoTrimmed: false},
     {didUseFacecam: true},
     {isFacecamEnabled: true},
     {totalDuration: 1241}
   ]}
•  Problematic with a schema-based database
•  Easier and faster to process
•  All relevant data is pre-aggregated
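A minimal SQL sketch of the difference, using hypothetical tables (events for separate events, conversions for pre-aggregated conversions) that do not reflect Everyplay's actual data model:

-- Separate events: answering "how many sessions shared a trimmed video?"
-- means combining several rows per session.
SELECT count(*)
FROM (
    SELECT session_id
    FROM events
    WHERE event_type IN ('trimButtonPressed', 'shareButtonPressed')
    GROUP BY session_id
    HAVING count(DISTINCT event_type) = 2
) shares_with_trim;

-- Conversions with properties: one row per conversion, so the same question
-- is a single filtered count (here reading the property from a JSON column).
SELECT count(*)
FROM conversions
WHERE event_type = 'videoShareComplete'
  AND json_extract_path_text(properties_json, 'didTrimVideo') = 'true';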

 

"What about Postgres and JSON?"

•  Yes, Postgres allows parsing of JSON documents, which allows an arbitrary format for event data
•  However, when your data gets big, this comes with a warning…

Comparing querying fields vs. JSON

Normal query:
  select count(*) from events
  where created > '2014-09-01'
    and event_type = 'recordSessionClosed';

Vs. JSON-based:
  select count(*) from events
  where created > '2014-09-01'
    and json_extract_path_text(event_json, 'event_type') = 'recordSessionClosed';

Results

[Bar chart: execution time in seconds for the normal query vs. the JSON-based query, on a 0-1400 second scale]

 

So what's the best solution?

•  Combine single-event sending with extra JSON properties
•  Querying the JSON properties is slow, so we store there only the information that is rarely needed (drill-down information); see the sketch below
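A minimal sketch of this hybrid layout, extending the earlier hypothetical events table with a JSON column; which fields become typed columns is an assumption made for illustration.

-- Hypothetical hybrid table: frequently queried fields as typed columns,
-- drill-down details in a JSON column.
CREATE TABLE events (
    session_id  BIGINT        NOT NULL,
    event_type  VARCHAR(64)   NOT NULL,
    created     TIMESTAMP     NOT NULL,
    event_json  VARCHAR(4096)            -- rarely queried drill-down properties
);

-- Common queries touch only the typed columns and stay fast:
SELECT count(*)
FROM events
WHERE created > '2014-09-01'
  AND event_type = 'videoShareComplete';

-- Occasional drill-down pays the JSON-parsing cost only when needed:
SELECT count(*)
FROM events
WHERE created > '2014-09-01'
  AND event_type = 'videoShareComplete'
  AND json_extract_path_text(event_json, 'didUseFacecam') = 'true';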

 

How do we then analyse the data?

•  Most solutions on the market fell short due to
   –  Pricing
   –  Features
   –  Availability
•  It turned out to be easier to "roll your own"

Solving an actual problem

"What are the worst drop-off points for uploading a replay?"

Tools

•  SQL
•  JavaScript
•  Google Charts visualisation library
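A minimal sketch of how the SQL side of that question might look; the replay-upload event names below are invented for illustration and are not Everyplay's actual instrumentation.

-- Hypothetical funnel query: how many sessions reach each upload step?
SELECT
    count(DISTINCT CASE WHEN event_type = 'replayUploadStarted'  THEN session_id END) AS started,
    count(DISTINCT CASE WHEN event_type = 'replayUploadHalfway'  THEN session_id END) AS halfway,
    count(DISTINCT CASE WHEN event_type = 'replayUploadComplete' THEN session_id END) AS completed
FROM events
WHERE created > '2014-09-01';

The step with the largest relative drop between adjacent counts is the worst drop-off point; the counts can then be handed to JavaScript and Google Charts for visualisation.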

Why JavaScript for processing?

•  Dynamic, fast, relatively well-known
•  Excellent libraries for data visualisation
   –  Highcharts, Google Charts, D3.js, Dygraphs
•  Good for visualising data, but that's it

Keys to a successful data-driven product

•  Plan ahead for analytics and leave room for an evolving product
•  If metrics and analytics are not easily accessible to decision makers, they are worthless – self-updating dashboards are one of the main keys to success
•  Build A/B testing and data-driven behaviour directly into your product; don't hack it on later (see the sketch below)
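As one example of what built-in A/B behaviour can feed into downstream, here is a minimal sketch comparing a conversion rate across test groups; the experiment_group column and the event name are hypothetical.

-- Hypothetical A/B comparison: share rate per experiment group.
SELECT
    experiment_group,
    count(DISTINCT CASE WHEN event_type = 'videoShareComplete' THEN session_id END)::float
        / count(DISTINCT session_id) AS share_rate
FROM events
WHERE created > '2014-09-01'
GROUP BY experiment_group;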

Thank you!

Questions, comments?
Email: tuomas@unity3d.com
Twitter: @trinta
developers.everyplay.com

Q&A

THANK YOU
