App Store Cluster Analysis

Uploaded by: afnan-al-subaihin

Posted on 07-Jul-2018


  • 8/18/2019 App Store Cluster Analysis

    1/41

    Feature Analysis in App Stores Afnan A. AlSubaihin

    Feature Analysis in App Stores

    Afnan A. Al Subaihin

    Supervisors: Mark Harman, Sue Black, Licia Capra, Federica Sarro

    Finding Latent Clustering of Mobile Apps Based on Their Extracted Features

  • 2/41

    Motivation
    Related Work
    Clustering Approach
        Data Acquisition
        Feature Extraction technique
        Feature Representation
        Clustering algorithm and Distance Metric
        Clustering Validation
    Future Directions

  • 3/41

    Developer: select the ideal category

    User: easily discover apps

    Motivation

  • 4/41

    “Sometimes it’s not easy to select a category if your app does more than one thing, for example, social music sharing.”

    What Do Developers Think?

    Motivation

  • 5/41

    Guess the Category

    Motivation

  • 6/41

    Does it matter?

    “Of course it does! If I put it in ‘social’ where all the ‘big players’ are, I have less chance of being on the top lists.”

    What Do Developers Think?

    Motivation

  • 7/41

    “Categories are not useful in discovery, we try to promote using other means.”

    “Categories are too deep and the ‘big players’ are on the top, we don’t have a chance.”

    What Do Developers Think?

    Motivation

  • 8/41

    Methods used to find apps (source: Soo Ling Lim et al., “Investigating Country Differences in Mobile App User Behavior and Challenges for Software Engineering”)

    Motivation

  • 9/41

    App Store App Store

    Motivation

  • 10/41

    Related Work


  • 11/41

    Related Work

  • 12/41

    B. Sanz, I. Santos, C. Laorden, X. Ugarte-Pedrero, and P. G. Bringas, “On the automatic categorisation of android applications,” in 2012 IEEE Consumer Communications and Networking Conference (CCNC), pp. 149-153, IEEE, Jan. 2012.

    S. Vakulenko, O. Muller, and J. Brocke, “Enriching iTunes App Store Categories via Topic Modeling,” in Proceedings of the Thirty Fifth International Conference on Information Systems (ICIS), (Auckland, New Zealand), 2014.

    S. Kawaguchi, P. K. Garg, M. Matsushita, and K. Inoue, “MUDABlue: An automatic categorization system for Open Source repositories,” Journal of Systems and Software, vol. 79, pp. 939-953, July 2006.

    K. Tian, M. Revelle, and D. Poshyvanyk, “Using Latent Dirichlet Allocation for automatic categorization of software,” in 2009 6th IEEE International Working Conference on Mining Software Repositories, pp. 163-166, IEEE, May 2009.

    M. Linares-Vasquez, C. McMillan, D. Poshyvanyk, and M. Grechanik, “On using machine learning to automatically classify software applications into domain categories,” Empirical Software Engineering, vol. 19, pp. 582-618, Oct. 2012.

    A. Shabtai, Y. Fledel, and Y. Elovici, “Automated Static Code Analysis for Classifying Android Applications Using Machine Learning,” in 2010 International Conference on Computational Intelligence and Security, pp. 329-333, IEEE, Dec. 2010.

    T. Wang, H. Wang, G. Yin, C. X. Ling, X. Li, and P. Zou, “Mining Software Profiles across Multiple Repositories for Hierarchical Categorization,” in 2013 IEEE International Conference on Software Maintenance, pp. 240-249, IEEE, Sept. 2013.

  • 13/41

    Feature Conceptualisation + Software Categorisation

  • 14/41

    Domain Analysis
    Feature Location
    Feature Conceptualisation
    App Store Analysis
    Feature Model Synthesis
    Maintenance
    Code Re-use
    Feature-based Feedback
    Feature Requests
    Feature Behaviour Monitoring

  • 15/41

    Developer Benefits
    Software Categorisation
    Anomaly Detection
    User Benefits
    Application Discovery
    Application Comparison
    Facilitating Code Re-Use
    Monitor Technical Trends
    Find Common Bugs

  • 16/41

    Clustering Approach


  • 17/41

    Clustering Approach
        Data Acquisition
        Feature Extraction technique
        Feature Representation
        Clustering algorithm and Distance Metric
        Clustering Validation

  • 18/41

    App Database, 2014

    Chart: BlackBerry Dataset, share of apps by category: Sports, Books, Business, Education & Reference, Entertainment, Finance, Health & Fitness, News & Magazines, Photo & Video, Productivity, Shopping, Social, Utilities, Weather

    Table: Number of Apps per Category: Education & Reference, Entertainment, Finance, Health & Fitness, Music & Audio, Navigation & Travel, News & Magazines

  • 19/41

    Feature Life Cycles in App Stores

    Samsung App Store

  • 20/41

    Features: “choose, photo, automatically”; “link, Google, drive”; “list, making”; “image, select”; “create, list, automatically”

    Apps: Documents to Go, Photo Sketch HD, Shopping List, Radio Superior, Note+

    Total number of features = 23,337

    Clustering Approach

  • 21/41

    Feature Representation

    Feature \ term                choose  select  automatically  link  list  photo  image
    choose, photo, automatically  1       0       1              0     0     1      0
    link, Google, drive           0       0       0              1     0     0      0
    list, making                  0       0       0              0     1     0      0
    image, select                 0       1       0              0     0     0      1
    create, list, automatically   0       0       1              0     1     0      0

    Clustering Approach
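    The binary feature-by-term table above can be sketched in Python as follows. This is a minimal illustration, not the original implementation: it assumes each extracted feature is already a list of lower-cased terms and that the vocabulary is only the seven terms shown, so words such as "google", "drive", "making" and "create" are ignored. The names `VOCABULARY`, `FEATURES` and `binary_matrix` are hypothetical.

```python
# Minimal sketch of the binary feature-by-term representation (illustrative names).
# Assumption: features are pre-tokenised, lower-cased term lists and the vocabulary
# is restricted to the seven terms used in the example table.

VOCABULARY = ["choose", "select", "automatically", "link", "list", "photo", "image"]

FEATURES = [
    ["choose", "photo", "automatically"],
    ["link", "google", "drive"],
    ["list", "making"],
    ["image", "select"],
    ["create", "list", "automatically"],
]


def binary_matrix(features, vocabulary):
    """One row per feature: 1 if the vocabulary term occurs in the feature, else 0."""
    return [[1 if term in feature else 0 for term in vocabulary] for feature in features]


if __name__ == "__main__":
    for feature, row in zip(FEATURES, binary_matrix(FEATURES, VOCABULARY)):
        print(", ".join(feature), row)
```

    Running it prints one row per feature, matching the 0/1 pattern of the table.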

  • 22/41

    choose, photo, automatically

    link, Google, drive

    list, making

    image select

    create list automatically

    Features

    Feature Representation

    Clustering Approach


  • 25/41

    Feature \ term                choose  select  automatically  link  list  photo  image
    choose, photo, automatically  1       0       1              0     0     1      0
    link, Google, drive           0       0       0              1     0     0      0
    list, making                  0       0       0              0     1     0      0
    image, select                 0       1       0              0     0     0      1
    create, list, automatically   0       0       1              0     1     0      0

    $$\mathrm{idf}(t) = \log \frac{\text{number of features}}{\text{number of features containing } t}$$

    Feature Representation

    Clustering Approach

  • 26/41

    Feature \ term                choose  select  automatically  link  list  photo  image
    choose, photo, automatically  1*5     0       1*2.5          0     0     1*5    0
    link, Google, drive           0       0       0              1*5   0     0      0
    list, making                  0       0       0              0     1*2.5 0      0
    image, select                 0       1*5     0              0     0     0      1*5
    create, list, automatically   0       0       1*2.5          0     1*2.5 0      0

    $$\mathrm{idf}(t) = \log \frac{\text{number of features}}{\text{number of features containing } t}$$

    Feature Representation

    Clustering Approach

  • 27/41

    Feature \ term                choose  select  automatically  link  list  photo  image
    choose, photo, automatically  0.7     0       0.4            0     0     0.7    0
    link, Google, drive           0       0       0              0.7   0     0      0
    list, making                  0       0       0              0     0.4   0      0
    image, select                 0       0.7     0              0     0     0      0.7
    create, list, automatically   0       0       0.4            0     0.4   0      0

    $$\mathrm{idf}(t) = \log \frac{\text{number of features}}{\text{number of features containing } t}$$

    Feature Representation

    Clustering Approach
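    The weighting steps above (intermediate N/df factors of 5 and 2.5, then log-scaled weights of roughly 0.7 and 0.4) can be sketched in Python. The logarithm base is not stated on the slides; base 10 is an assumption, chosen because log10(5) ≈ 0.70 and log10(2.5) ≈ 0.40 match the table. Function names are illustrative.

```python
import math

# Same toy data as the slides: five features and a seven-term vocabulary.
VOCABULARY = ["choose", "select", "automatically", "link", "list", "photo", "image"]
FEATURES = [
    ["choose", "photo", "automatically"],
    ["link", "google", "drive"],
    ["list", "making"],
    ["image", "select"],
    ["create", "list", "automatically"],
]


def document_frequency(term, features):
    """Number of features that contain the term."""
    return sum(1 for feature in features if term in feature)


def idf(term, features):
    """idf(t) = log10(number of features / number of features containing t).

    Base 10 is an assumption that reproduces the 0.7 and 0.4 on the slide.
    Terms that occur in no feature get weight 0.
    """
    df = document_frequency(term, features)
    return math.log10(len(features) / df) if df else 0.0


def tfidf_matrix(features, vocabulary):
    """Binary term presence multiplied by the term's idf weight."""
    weights = {term: idf(term, features) for term in vocabulary}
    return [[weights[term] if term in feature else 0.0 for term in vocabulary]
            for feature in features]


if __name__ == "__main__":
    for term in VOCABULARY:
        ratio = len(FEATURES) / document_frequency(term, FEATURES)
        print(f"{term:14s} N/df = {ratio:<4} idf = {idf(term, FEATURES):.1f}")
    for feature, row in zip(FEATURES, tfidf_matrix(FEATURES, VOCABULARY)):
        print(", ".join(feature), [round(w, 1) for w in row])
```

    Rounding each weight to one decimal place gives the 0.7/0.4 table; only the non-zero pattern of the binary matrix and each term's document frequency are needed.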

  • 28/41

    Feature \ term                choose  select  automatically  link  list  photo  image
    choose, photo, automatically  0.7     0       0.4            0     0     0.7    0
    link, Google, drive           0       0       0              0.7   0     0      0
    list, making                  0       0       0              0     0.4   0      0
    image, select                 0       0.7     0              0     0     0      0.7
    create, list, automatically   0       0       0.4            0     0.4   0      0


    Feature Representation

    Clustering Approach

  • 29/41

    sim(t1, t2) = the length of the shortest path between t1 and t2

    Feature Representation

    Clustering Approach
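    The slide defines term similarity through the length of the shortest path between two terms but does not show the lexical hierarchy used. The Python sketch below assumes a small hypothetical term graph and turns a path length d into a similarity of 1 / (1 + d), so that shorter paths give higher similarity (the same convention NLTK uses for WordNet `path_similarity`). Both the graph and that conversion are assumptions, not details taken from the original work.

```python
from collections import deque

# Hypothetical toy term graph (undirected links between related terms).
# It stands in for whatever lexical hierarchy the original work used.
EDGES = [
    ("choose", "select"),
    ("select", "act"),
    ("photo", "image"),
    ("image", "representation"),
    ("list", "document"),
    ("document", "representation"),
]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search: number of edges on the shortest path, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, t1, t2):
    """Similarity 1 / (1 + shortest path length); 0 when the terms are not connected."""
    d = shortest_path_length(graph, t1, t2)
    return 0.0 if d is None else 1.0 / (1 + d)


if __name__ == "__main__":
    g = build_graph(EDGES)
    for pair in [("choose", "select"), ("photo", "list"), ("choose", "photo")]:
        print(pair, path_similarity(g, *pair))
```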

  • 30/41

    Feature \ term                choose  select  automatically  link  list  photo  image
    choose, photo, automatically  0.7     0.7     0.4            0     0     0.7    0.7
    link, Google, drive           0       0       0              0.7   0     0      0
    list, making                  0       0       0              0     0.4   0      0
    image, select                 0.7     0.7     0              0     0     0.7    0.7
    create, list, automatically   0       0       0.4            0     0.4   0      0


    Feature Representation

    Clustering Approach

  • 31/41

    Selecting number of clusters: Can’s Metric

    $$k = \frac{\text{number of features} \times \text{number of terms}}{\text{number of non-zero entries}}$$
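    Can's metric, as written above, is a one-line computation over the feature-by-term matrix. A minimal Python sketch, applied to the five-feature binary toy matrix from the earlier slides rather than the real dataset; rounding the estimate to an integer k is an assumption.

```python
def cans_metric(matrix):
    """Estimate the number of clusters as (rows x columns) / non-zero entries.

    Rows are features and columns are terms, following the slide's formula.
    """
    rows = len(matrix)
    columns = len(matrix[0]) if rows else 0
    non_zero = sum(1 for row in matrix for value in row if value != 0)
    if non_zero == 0:
        raise ValueError("the matrix has no non-zero entries")
    return rows * columns / non_zero


# Binary feature-by-term matrix from the toy example (5 features x 7 terms, 9 non-zeros).
TOY_MATRIX = [
    [1, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 0],
]

if __name__ == "__main__":
    k = cans_metric(TOY_MATRIX)
    print(k, round(k))
```

    Because only the count of non-zero entries matters, the binary and the idf-weighted versions of the same matrix give the same estimate.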

    Spherical K-Means

    Image courtesy of Christian S. Perone. http://blog.christianperone.com/2013/09/machine-learning-cosine-similarity-for-vector-space-models-part-iii/ 

    Clustering algorithm and Distance Metric

    Clustering Approach
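    Spherical k-means is k-means on unit-length vectors: each point is assigned to the centroid with the highest cosine similarity, and each centroid is the mean of its members re-normalised to unit length. The Python sketch below is a generic implementation, not the author's code. The deterministic farthest-first seeding is an assumption chosen so the toy run is reproducible, since the slides do not say how centroids were initialised. The input rows are the five feature vectors with term similarity applied.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def farthest_first(data, k):
    """Deterministic seeding: start with the first vector, then repeatedly add
    the vector whose best similarity to the already chosen seeds is lowest."""
    chosen = [0]
    while len(chosen) < k:
        candidates = [i for i in range(len(data)) if i not in chosen]
        chosen.append(min(candidates,
                          key=lambda i: max(dot(data[i], data[c]) for c in chosen)))
    return [list(data[i]) for i in chosen]


def spherical_kmeans(vectors, k, max_iterations=100):
    """Return (labels, centroids) for k clusters under cosine similarity."""
    data = [normalize(v) for v in vectors]
    centroids = farthest_first(data, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment: the centroid with the largest dot product (cosine similarity).
        new_labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        # Update: mean of each cluster's members, projected back onto the unit sphere.
        for j in range(k):
            members = [x for x, label in zip(data, labels) if label == j]
            if members:
                mean = [sum(column) / len(members) for column in zip(*members)]
                centroids[j] = normalize(mean)
    return labels, centroids


# Feature vectors with term similarity applied (rows of the example table).
VECTORS = [
    [0.7, 0.7, 0.4, 0, 0, 0.7, 0.7],
    [0, 0, 0, 0.7, 0, 0, 0],
    [0, 0, 0, 0, 0.4, 0, 0],
    [0.7, 0.7, 0, 0, 0, 0.7, 0.7],
    [0, 0, 0.4, 0, 0.4, 0, 0],
]

if __name__ == "__main__":
    labels, _ = spherical_kmeans(VECTORS, k=3)
    print(labels)
```

    With k = 3 this groups (choose, photo, automatically) with (image, select), and (list, making) with (create, list, automatically), leaving (link, Google, drive) on its own. Random seeding can instead settle in a worse local optimum, which is one reason to rerun k-means from several starts in practice.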


  • 32/41

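    Cosine distance between pairs of feature vectors:

    choose, photo, automatically  (0.7, 0.7, 0.4, 0, 0, 0.7, 0.7)
    link, Google, drive           (0, 0, 0, 0.7, 0, 0, 0)
    Distance = 1

    list, making                  (0, 0, 0, 0, 0.4, 0, 0)
    create, list, automatically   (0, 0, 0.4, 0, 0.4, 0, 0)
    Distance = 0.23

    choose, photo, automatically  (0.7, 0.7, 0.4, 0, 0, 0.7, 0.7)
    image, select                 (0.7, 0.7, 0, 0, 0, 0.7, 0.7)
    Distance = 0.04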

    Clustering algorithm and Distance Metric

    Clustering Approach
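    Reading the three numbers above as cosine distances (1 minus cosine similarity), a short Python sketch reproduces two of them from the rounded vectors: 1 for the orthogonal pair and about 0.04 for (choose, photo, automatically) versus (image, select). For (list, making) versus (create, list, automatically) it gives about 0.29 rather than 0.23, so that figure may rest on weights other than the rounded values shown. Variable names are illustrative.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; raises ValueError for an all-zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Rounded weights from the slide (terms: choose, select, automatically, link, list, photo, image).
CHOOSE_PHOTO_AUTO = [0.7, 0.7, 0.4, 0, 0, 0.7, 0.7]
LINK_GOOGLE_DRIVE = [0, 0, 0, 0.7, 0, 0, 0]
LIST_MAKING = [0, 0, 0, 0, 0.4, 0, 0]
CREATE_LIST_AUTO = [0, 0, 0.4, 0, 0.4, 0, 0]
IMAGE_SELECT = [0.7, 0.7, 0, 0, 0, 0.7, 0.7]

if __name__ == "__main__":
    pairs = [
        ("choose/photo/automatically vs link/Google/drive", CHOOSE_PHOTO_AUTO, LINK_GOOGLE_DRIVE),
        ("list/making vs create/list/automatically", LIST_MAKING, CREATE_LIST_AUTO),
        ("choose/photo/automatically vs image/select", CHOOSE_PHOTO_AUTO, IMAGE_SELECT),
    ]
    for name, a, b in pairs:
        print(f"{name}: {cosine_distance(a, b):.2f}")
```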

  • 33/41

    TF-IDF weights:

    Feature \ term                choose  select  automatically  link  list  photo  image
    choose, photo, automatically  0.7     0       0.4            0     0     0.7    0
    link, Google, drive           0       0       0              0.7   0     0      0
    list, making                  0       0       0              0     0.4   0      0
    image, select                 0       0.7     0              0     0     0      0.7
    create, list, automatically   0       0       0.4            0     0.4   0      0

    With term similarity:

    Feature \ term                choose  select  automatically  link  list  photo  image
    choose, photo, automatically  0.7     0.7     0.4            0     0     0.7    0.7
    link, Google, drive           0       0       0              0.7   0     0      0
    list, making                  0       0       0              0     0.4   0      0
    image, select                 0.7     0.7     0              0     0     0.7    0.7
    create, list, automatically   0       0       0.4            0     0.4   0      0

    Adjusted Rand Index = 0.12 (1 means the two clusterings are exactly the same, 0 is the agreement expected by chance, and negative values mean less agreement than chance)

    Clustering algorithm and Distance Metric

    Clustering Approach
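    The Adjusted Rand Index compares two partitions of the same items by counting pairs placed together, corrected for the agreement expected by chance. A self-contained Python sketch of the standard pair-counting formula; the label lists in the example are hypothetical, not the clusterings behind the 0.12 above.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two items are needed")
    # Pairs of items placed together in both clusterings (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Only happens when both clusterings are identical (all singletons or one cluster).
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same grouping, relabelled
    print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # partial agreement
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # worse than chance
```

    The index ignores label names, so a relabelled copy of the same grouping still scores 1.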

  • 34/41

    Figure: individual features (f1, f2, f3, ...) grouped into feature clusters (FC1, FC2, FC3, ...)

    Clustering algorithm and Distance Metric

    Clustering Approach

  • 35/41

    App x Feature Vector Space

    App               FC1  FC2  ..
    Documents to Go   1    0    ..
    Photo Sketch HD   0    0    ..
    Shopping List     0    0    ..
    Radio Superior    0    1    ..
    Note+             1    1    ..

    Clustering algorithm and Distance Metric

    Clustering Approach
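    Once every feature has been assigned to a feature cluster, each app becomes a binary vector over feature clusters, as in the table above. A Python sketch follows; the feature identifiers and their cluster assignments are hypothetical, invented only so that the FC1 and FC2 columns match the slide.

```python
# Hypothetical data, invented for illustration: the features each app offers and
# the feature cluster each feature was assigned to.
APP_FEATURES = {
    "Documents to Go": ["f1"],
    "Photo Sketch HD": ["f5"],
    "Shopping List": ["f6"],
    "Radio Superior": ["f3"],
    "Note+": ["f2", "f4"],
}
FEATURE_CLUSTER = {"f1": "FC1", "f2": "FC1", "f3": "FC2", "f4": "FC2",
                   "f5": "FC3", "f6": "FC3"}


def app_cluster_matrix(app_features, feature_cluster, clusters):
    """Map each app to a binary vector: 1 if any of its features is in that cluster."""
    matrix = {}
    for app, features in app_features.items():
        present = {feature_cluster[f] for f in features if f in feature_cluster}
        matrix[app] = [1 if cluster in present else 0 for cluster in clusters]
    return matrix


if __name__ == "__main__":
    columns = ["FC1", "FC2", "FC3"]
    for app, row in app_cluster_matrix(APP_FEATURES, FEATURE_CLUSTER, columns).items():
        print(f"{app:16s} {row}")
```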

  • 36/41

    Hierarchical Clustering

    Clustering algorithm and Distance Metric

    Clustering Approach
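    The slide names hierarchical clustering without giving the linkage or the distance. As an assumption, the Python sketch below uses agglomerative clustering with average linkage over cosine distance, repeatedly merging the two closest clusters until k remain. The four app vectors are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; an all-zero vector is treated as maximally distant (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage(vectors, k):
    """Agglomerative clustering: merge the two clusters with the smallest mean
    pairwise distance until k clusters remain. Returns lists of item indices."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return clusters


# Hypothetical app-by-feature-cluster vectors (four apps, three feature clusters).
APP_VECTORS = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
]

if __name__ == "__main__":
    print(average_linkage(APP_VECTORS, k=2))
```

    Stopping at a fixed k is one way to cut the hierarchy; recording every merge instead yields the full dendrogram.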

  • 37/41

    K-Means, where:

    $$k = \frac{\text{number of features} \times \text{number of terms}}{\text{number of non-zero entries}}$$

    Figure: clustering of apps according to shared feature clusters, K-Means where k = 368, mapped using t-SNE. Shape: original category; color: assigned cluster.

    Clustering algorithm and Distance Metric

    Clustering Approach

  • 38/41

    Internal Validation: cluster cohesion (intra-cluster similarity), coverage, inter-cluster similarity and silhouette

    External Validation:
    Manual: rate the relatedness between every two apps in a cluster
    Do clusters show the same distribution of app metrics as that found in app store categories?
    Do apps in different clusters exhibit different tendencies in terms of app store metrics?

    Clustering Evaluation

    Clustering Approach
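    Of the internal validation measures listed, the silhouette has a standard definition: for each item, a is its mean distance to the other members of its own cluster, b is the lowest mean distance to the members of any other cluster, and its score is (b - a) / max(a, b). A minimal Python sketch averaging this over all items; Euclidean distance and the one-dimensional toy points are assumptions for illustration.

```python
import math


def silhouette(points, labels, distance=math.dist):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    labels = list(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        same = [distance(p, q) for j, (q, other) in enumerate(zip(points, labels))
                if j != i and other == label]
        if not same:
            scores.append(0.0)  # convention: a point in a singleton cluster scores 0
            continue
        a = sum(same) / len(same)
        # Lowest mean distance from p to the members of any other cluster.
        b = min(
            sum(distance(p, q) for q, other in zip(points, labels) if other == c)
            / labels.count(c)
            for c in clusters if c != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(silhouette([[0], [1], [10], [11]], [0, 0, 1, 1]))
```

    Well-separated clusters such as these score close to 1; scores near 0 or below suggest overlapping or misassigned clusters.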

  • 39/41

    Future Directions


  • 40/41

    Better internal and external cluster validation

    Employ different feature extraction techniques

    Compare with regular text clustering using whole description

    Incorporate the source code to enhance clustering

    Further tweak approach variables: clustering methods, feature granularity

    Future Directions
