app store cluster analysis
TRANSCRIPT
-
8/18/2019 App Store Cluster Analysis
1/41
Feature Analysis in App Stores Afnan A. AlSubaihin
Feature Analysis in App Stores
Afnan A. Al Subaihin
Supervisors: Mark Harman, Sue Black, Licia Capra, Federica Sarro
Finding Latent Clustering of Mobile Apps Based on Their Extracted Features
-
Motivation
Related Work
Clustering Approach
  Data Acquisition
  Feature Extraction Technique
  Feature Representation
  Clustering Algorithm and Distance Metric
  Clustering Validation
Future Directions
-
Developer: select the ideal category
User: easily discover apps
Motivation
-
“Sometimes it’s not easy to select a category if you[r] app does more than one thing, for example, social music sharing.”
What Do Developers Think?
Motivation
-
Guess the Category
Motivation
-
Does it matter?
“Of course it does! If I put it in ‘social’ where all the ‘big players’ are, I have less chance of being on the top lists.”
What Do Developers Think?
Motivation
-
“Categories are not useful in discovery, we try to promote using other means.”
“Categories are too deep and the ‘big players’ are on the top, we don’t have a chance.”
What Do Developers Think?
Motivation
-
Methods used to find apps (figure from Soo Ling Lim et al., “Investigating Country Differences in Mobile App User Behavior and Challenges for Software Engineering”)
Motivation
-
App Store App Store
Motivation
-
Related Work
(Outline: Motivation; Related Work; Clustering Approach: Data Acquisition, Feature Extraction Technique, Feature Representation, Clustering Algorithm and Distance Metric, Clustering Validation; Future Directions)
-
Related Work
-
B. Sanz, I. Santos, C. Laorden, X. Ugarte-Pedrero, and P. G. Bringas, “On the automatic categorisation of android applications,” in 2012 IEEE Consumer Communications and Networking Conference (CCNC), pp. 149-153, IEEE, Jan. 2012.
S. Vakulenko, O. Muller, and J. Brocke, “Enriching iTunes App Store Categories via Topic Modeling,” in Proceedings of the Thirty Fifth International Conference on Information Systems (ICIS), (Auckland, New Zealand), 2014.
S. Kawaguchi, P. K. Garg, M. Matsushita, and K. Inoue, “MUDABlue: An automatic categorization system for Open Source repositories,” Journal of Systems and Software, vol. 79, pp. 939-953, July 2006.
K. Tian, M. Revelle, and D. Poshyvanyk, “Using Latent Dirichlet Allocation for automatic categorization of software,” in 2009 6th IEEE International Working Conference on Mining Software Repositories, pp. 163-166, IEEE, May 2009.
M. Linares-Vasquez, C. McMillan, D. Poshyvanyk, and M. Grechanik, “On using machine learning to automatically classify software applications into domain categories,” Empirical Software Engineering, vol. 19, pp. 582-618, Oct. 2012.
A. Shabtai, Y. Fledel, and Y. Elovici, “Automated Static Code Analysis for Classifying Android Applications Using Machine Learning,” in 2010 International Conference on Computational Intelligence and Security, pp. 329-333, IEEE, Dec. 2010.
T. Wang, H. Wang, G. Yin, C. X. Ling, X. Li, and P. Zou, “Mining Software Profiles across Multiple Repositories for Hierarchical Categorization,” in 2013 IEEE International Conference on Software Maintenance, pp. 240-249, IEEE, Sept. 2013.
-
Feature Conceptualisation + Software Categorisation
-
Domain Analysis
Feature Location
Feature Conceptualisation
App Store Analysis
Feature Model Synthesis
Maintenance
Code Re-use
Feature-based Feedback
Feature Requests
Feature Behaviour Monitoring
-
Developer Benefits
Software Categorisation
Anomaly Detection
User Benefits
Application Discovery
Application Comparison
Facilitating Code Re-Use
Monitor Technical Trends
Find Common Bugs
-
Clustering Approach
(Outline: Motivation; Related Work; Clustering Approach: Data Acquisition, Feature Extraction Technique, Feature Representation, Clustering Algorithm and Distance Metric, Clustering Validation; Future Directions)
-
Clustering Approach
Data Acquisition
Feature Extraction Technique
Feature Representation
Clustering Algorithm and Distance Metric
Clustering Validation
-
App Database (2014)
[Figure: BlackBerry dataset, apps by category: Sports, Books, Business, Education & Reference, Entertainment, Finance, Health & Fitness, News & Magazines, Photo & Video, Productivity, Shopping, Social, Utilities, Weather]
[Table: Category and Number of Apps for Education & Reference, Entertainment, Finance, Health & Fitness, Music & Audio, Navigation & Travel, News & Magazines]
-
Feature Life Cycles in App Stores
Samsung App Store
-
Features:
choose, photo, automatically
link, Google, drive
list, making
image select
create list automatically
Apps:
Documents to Go
Photo Sketch HD
Shopping List
Radio Superior
Note+
Total number of features = 23,337
Clustering Approach
-
Feature Representation
Feature \ Term | choose | select | automatically | link | list | photo | image
choose, photo, automatically | 1 | 0 | 1 | 0 | 0 | 1 | 0
link, Google, drive | 0 | 0 | 0 | 1 | 0 | 0 | 0
list, making | 0 | 0 | 0 | 0 | 1 | 0 | 0
image select | 0 | 1 | 0 | 0 | 0 | 0 | 1
create list automatically | 0 | 0 | 1 | 0 | 1 | 0 | 0
Clustering Approach
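One way the binary feature-by-term matrix on this slide could be built, sketched in Python. This is an illustration under stated assumptions, not the author's implementation: features are split into terms on commas and whitespace, and only the seven terms shown as columns are kept (a full vocabulary would also contain "Google", "drive", "making" and "create"). The helper names `tokenize` and `binary_matrix` are hypothetical.

```python
# Example features from the slide, one phrase per extracted feature.
FEATURES = [
    "choose, photo, automatically",
    "link, Google, drive",
    "list, making",
    "image select",
    "create list automatically",
]
# Assumption: restrict the vocabulary to the seven column terms on the slide.
TERMS = ["choose", "select", "automatically", "link", "list", "photo", "image"]


def tokenize(feature):
    """Split a feature phrase on commas and whitespace into a set of terms."""
    return set(feature.replace(",", " ").split())


def binary_matrix(features, terms):
    """Row i, column j is 1 if term j occurs in feature i, else 0."""
    rows = []
    for feature in features:
        tokens = tokenize(feature)
        rows.append([1 if term in tokens else 0 for term in terms])
    return rows


if __name__ == "__main__":
    for feature, row in zip(FEATURES, binary_matrix(FEATURES, TERMS)):
        print(f"{feature:30} {row}")
```

Running it prints one 0/1 row per feature, matching the table above.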
-
choose, photo, automatically
link, Google, drive
list, making
image select
create list automatically
Features
Feature Representation
Clustering Approach
-
Feature \ Term | choose | select | automatically | link | list | photo | image
choose, photo, automatically | 1 | 0 | 1 | 0 | 0 | 1 | 0
link, Google, drive | 0 | 0 | 0 | 1 | 0 | 0 | 0
list, making | 0 | 0 | 0 | 0 | 1 | 0 | 0
image select | 0 | 1 | 0 | 0 | 0 | 0 | 1
create list automatically | 0 | 0 | 1 | 0 | 1 | 0 | 0
$$\mathrm{idf}(t) = \log \frac{\text{number of features}}{\text{number of features containing } t}$$
Feature Representation
Clustering Approach
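Applied to the five example features (N = 5), the idf factor works out as below, where df(t) is the number of features containing t. The base-10 logarithm is an assumption, chosen because it reproduces the 0.7 and 0.4 weights that appear in the weighted tables that follow.

$$
\mathrm{idf}(t) = \log_{10}\frac{N}{\mathrm{df}(t)}, \qquad
\mathrm{idf}(\text{choose}) = \log_{10}\frac{5}{1} \approx 0.70, \qquad
\mathrm{idf}(\text{automatically}) = \log_{10}\frac{5}{2} \approx 0.40
$$

"choose", "select", "link", "photo" and "image" each occur in one feature (factor 5 before the logarithm), while "automatically" and "list" each occur in two (factor 2.5).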
-
Feature \ Term | choose | select | automatically | link | list | photo | image
choose, photo, automatically | 1*5 | 0 | 1*2.5 | 0 | 0 | 1*5 | 0
link, Google, drive | 0 | 0 | 0 | 1*5 | 0 | 0 | 0
list, making | 0 | 0 | 0 | 0 | 1*2.5 | 0 | 0
image select | 0 | 1*5 | 0 | 0 | 0 | 0 | 1*5
create list automatically | 0 | 0 | 1*2.5 | 0 | 1*2.5 | 0 | 0
$$\mathrm{idf}(t) = \log \frac{\text{number of features}}{\text{number of features containing } t}$$
Feature Representation
Clustering Approach
-
Feature \ Term | choose | select | automatically | link | list | photo | image
choose, photo, automatically | 0.7 | 0 | 0.4 | 0 | 0 | 0.7 | 0
link, Google, drive | 0 | 0 | 0 | 0.7 | 0 | 0 | 0
list, making | 0 | 0 | 0 | 0 | 0.4 | 0 | 0
image, select | 0 | 0.7 | 0 | 0 | 0 | 0 | 0.7
create, list, automatically | 0 | 0 | 0.4 | 0 | 0.4 | 0 | 0
$$\mathrm{idf}(t) = \log \frac{\text{number of features}}{\text{number of features containing } t}$$
Feature Representation
Clustering Approach
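The weights in this table can be reproduced with a binary term frequency multiplied by idf(t) = log10(N / df(t)) over the N = 5 features. The sketch below assumes that base-10 logarithm and one-decimal rounding, because together they give exactly the 0.7 and 0.4 values shown; `tokenize` and `tfidf_matrix` are hypothetical names, and the vocabulary is again limited to the seven column terms.

```python
import math

FEATURES = [
    "choose, photo, automatically",
    "link, Google, drive",
    "list, making",
    "image, select",
    "create, list, automatically",
]
TERMS = ["choose", "select", "automatically", "link", "list", "photo", "image"]


def tokenize(feature):
    """Split a feature phrase on commas and whitespace into a set of terms."""
    return set(feature.replace(",", " ").split())


def tfidf_matrix(features, terms, digits=1):
    """Binary tf times idf(t) = log10(N / df(t)), rounded for display."""
    token_sets = [tokenize(f) for f in features]
    n = len(token_sets)
    # df(t): number of features that contain term t
    df = {t: sum(1 for tokens in token_sets if t in tokens) for t in terms}
    matrix = []
    for tokens in token_sets:
        row = []
        for t in terms:
            tf = 1 if t in tokens else 0  # binary term frequency
            idf = math.log10(n / df[t]) if df[t] else 0.0
            row.append(round(tf * idf, digits))
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for feature, row in zip(FEATURES, tfidf_matrix(FEATURES, TERMS)):
        print(f"{feature:30} {row}")
```

Terms found in a single feature get log10(5) ≈ 0.7 and terms found in two features get log10(2.5) ≈ 0.4.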
-
Feature \ Term | choose | select | automatically | link | list | photo | image
choose, photo, automatically | 0.7 | 0 | 0.4 | 0 | 0 | 0.7 | 0
link, Google, drive | 0 | 0 | 0 | 0.7 | 0 | 0 | 0
list, making | 0 | 0 | 0 | 0 | 0.4 | 0 | 0
image, select | 0 | 0.7 | 0 | 0 | 0 | 0 | 0.7
create, list, automatically | 0 | 0 | 0.4 | 0 | 0.4 | 0 | 0
!"# $ % &!' #( $
Feature Representation
Clustering Approach
-
sim(t1, t2) is based on the length of the shortest path between terms t1 and t2
Feature Representation
Clustering Approach
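The slide only states that term similarity is derived from the length of the shortest path between two terms. The sketch below assumes a small, made-up term graph and turns a path length d into a similarity of 1 / (1 + d), the convention NLTK uses for WordNet's `path_similarity`; both the graph and that transform are assumptions for illustration, and the function names are hypothetical.

```python
from collections import deque

# Hypothetical term graph for illustration only (not taken from the slides).
EDGES = [
    ("choose", "select"),
    ("photo", "image"),
    ("image", "representation"),
    ("list", "record"),
    ("record", "representation"),
]


def build_graph(edges):
    """Undirected adjacency sets from a list of (term, term) edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Number of edges on the shortest path (breadth-first search), or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, t1, t2):
    """Assumed transform: 1 / (1 + path length); 0.0 when no path exists."""
    length = shortest_path_length(graph, t1, t2)
    return 0.0 if length is None else 1.0 / (1 + length)


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(path_similarity(g, "choose", "select"))  # terms one edge apart
    print(path_similarity(g, "photo", "list"))     # terms four edges apart
```

Shorter paths give higher similarity, so directly linked terms such as "choose" and "select" score 0.5, while unconnected terms score 0.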
-
Feature \ Term | choose | select | automatically | link | list | photo | image
choose, photo, automatically | 0.7 | 0.7 | 0.4 | 0 | 0 | 0.7 | 0.7
link, Google, drive | 0 | 0 | 0 | 0.7 | 0 | 0 | 0
list, making | 0 | 0 | 0 | 0 | 0.4 | 0 | 0
image, select | 0.7 | 0.7 | 0 | 0 | 0 | 0.7 | 0.7
create, list, automatically | 0 | 0 | 0.4 | 0 | 0.4 | 0 | 0
!"# $ % &!' #( $
Feature Representation
Clustering Approach
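Compared with the TF-IDF matrix on the earlier slides, the weights of "choose" and "photo" appear to be copied onto the similar terms "select" and "image", and vice versa. A minimal sketch of such an expansion step, assuming a hypothetical similarity table in which choose/select and photo/image are the only similar pairs and a hypothetical threshold of 0.5: an absent term inherits the largest weight among the present terms it is similar to. It reproduces the rows above but is not necessarily the rule used in the original work.

```python
TERMS = ["choose", "select", "automatically", "link", "list", "photo", "image"]

# TF-IDF rows before expansion (features in the same order as on the slides).
TFIDF = [
    [0.7, 0, 0.4, 0, 0, 0.7, 0],  # choose, photo, automatically
    [0, 0, 0, 0.7, 0, 0, 0],      # link, Google, drive
    [0, 0, 0, 0, 0.4, 0, 0],      # list, making
    [0, 0.7, 0, 0, 0, 0, 0.7],    # image, select
    [0, 0, 0.4, 0, 0.4, 0, 0],    # create, list, automatically
]

# Hypothetical similarity scores; unlisted pairs count as 0.
SIMILAR = {
    frozenset(("choose", "select")): 1.0,
    frozenset(("photo", "image")): 1.0,
}
THRESHOLD = 0.5  # hypothetical cut-off


def sim(t1, t2):
    return 1.0 if t1 == t2 else SIMILAR.get(frozenset((t1, t2)), 0.0)


def expand(row, terms=TERMS, threshold=THRESHOLD):
    """Give each absent term the largest weight of a present term similar to it."""
    expanded = list(row)
    for j, term in enumerate(terms):
        if row[j] != 0:
            continue  # term already present: keep its own weight
        donors = [
            row[k]
            for k, other in enumerate(terms)
            if row[k] != 0 and sim(term, other) >= threshold
        ]
        if donors:
            expanded[j] = max(donors)
    return expanded


if __name__ == "__main__":
    for row in TFIDF:
        print(expand(row))
```

In practice the similarity table would come from a path-based measure such as the one on the previous slide rather than being written by hand.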
-
Selecting number of clusters: Can’s Metric
$$k = \frac{\text{Number of features} \times \text{Number of terms}}{\text{Number of non-zero entries}}$$
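Can's metric, named on this slide, estimates the number of clusters from a sparse matrix as (rows × columns) / non-zero entries. A small sketch, assuming rows are features and columns are terms as in the matrices on the feature-representation slides; `cans_metric` is a hypothetical name.

```python
def cans_metric(matrix):
    """Estimated number of clusters: (rows x columns) / number of non-zero entries."""
    rows = len(matrix)
    columns = len(matrix[0]) if rows else 0
    non_zero = sum(1 for row in matrix for value in row if value != 0)
    if non_zero == 0:
        raise ValueError("matrix has no non-zero entries")
    return rows * columns / non_zero


# Binary feature x term matrix from the feature-representation slides
# (columns: choose, select, automatically, link, list, photo, image).
BINARY = [
    [1, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 0],
]

if __name__ == "__main__":
    k = cans_metric(BINARY)
    print(round(k, 2), round(k))
```

For the toy matrix (5 features × 7 terms, 9 non-zero entries) this gives 35 / 9 ≈ 3.9, i.e. roughly four clusters; the sparser the matrix, the more clusters the metric suggests.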
Spherical K-Means
Image courtesy of Christian S. Perone. http://blog.christianperone.com/2013/09/machine-learning-cosine-similarity-for-vector-space-models-part-iii/
Clustering algorithm and Distance Metric
Clustering Approach
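Spherical k-means clusters vectors by cosine similarity: every vector is scaled to unit length, each point is assigned to the centroid with the largest dot product, and centroids are recomputed as re-normalised means. A minimal, dependency-free sketch follows; random initialisation from the data, the iteration cap and the function names are assumptions, and it is not the implementation used in this work.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, max_iterations=50, seed=0):
    """Return one cluster label per vector, clustering by cosine similarity."""
    data = [normalize(v) for v in vectors]
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(data, k)]  # random initial centroids
    labels = None
    for _ in range(max_iterations):
        # Assignment: highest cosine similarity = largest dot product of unit vectors.
        new_labels = [max(range(k), key=lambda c: dot(x, centroids[c])) for x in data]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update: mean of each cluster's members, projected back onto the unit sphere.
        for c in range(k):
            members = [x for x, label in zip(data, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


if __name__ == "__main__":
    print(spherical_kmeans([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], k=2))
```

Because only directions matter, two feature vectors with the same terms but different overall weight magnitudes end up in the same cluster.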
-
Pairwise distances:
choose, photo, automatically (0.7 0.7 0.4 0 0 0.7 0.7) vs link, Google, drive (0 0 0 0.7 0 0 0): 1
list, making (0 0 0 0 0.4 0 0) vs create, list, automatically (0 0 0.4 0 0.4 0 0): 0.23
choose, photo, automatically (0.7 0.7 0.4 0 0 0.7 0.7) vs image, select (0.7 0.7 0 0 0 0.7 0.7): 0.04
Clustering algorithm and Distance Metric
Clustering Approach
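A minimal sketch of a distance between two feature vectors, assuming cosine distance (1 minus cosine similarity), which is the usual pairing with spherical k-means. The vector names are hypothetical labels for rows of the term-similarity matrix.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; zero-length vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Rows of the matrix after term-similarity expansion
# (columns: choose, select, automatically, link, list, photo, image).
CHOOSE_PHOTO = [0.7, 0.7, 0.4, 0, 0, 0.7, 0.7]
LINK_GOOGLE = [0, 0, 0, 0.7, 0, 0, 0]
IMAGE_SELECT = [0.7, 0.7, 0, 0, 0, 0.7, 0.7]

if __name__ == "__main__":
    print(round(cosine_distance(CHOOSE_PHOTO, LINK_GOOGLE), 2))
    print(round(cosine_distance(CHOOSE_PHOTO, IMAGE_SELECT), 2))
```

Vectors with no terms in common are at distance 1, and vectors that share most of their weighted terms, such as the photo-related features, are close to 0.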
-
Without term similarity:
Feature \ Term | choose | select | automatically | link | list | photo | image
choose, photo, automatically | 0.7 | 0 | 0.4 | 0 | 0 | 0.7 | 0
link, Google, drive | 0 | 0 | 0 | 0.7 | 0 | 0 | 0
list, making | 0 | 0 | 0 | 0 | 0.4 | 0 | 0
image, select | 0 | 0.7 | 0 | 0 | 0 | 0 | 0.7
create, list, automatically | 0 | 0 | 0.4 | 0 | 0.4 | 0 | 0
With term similarity:
Feature \ Term | choose | select | automatically | link | list | photo | image
choose, photo, automatically | 0.7 | 0.7 | 0.4 | 0 | 0 | 0.7 | 0.7
link, Google, drive | 0 | 0 | 0 | 0.7 | 0 | 0 | 0
list, making | 0 | 0 | 0 | 0 | 0.4 | 0 | 0
image, select | 0.7 | 0.7 | 0 | 0 | 0 | 0.7 | 0.7
create, list, automatically | 0 | 0 | 0.4 | 0 | 0.4 | 0 | 0
Adjusted Rand Index = 0.12
(Scale from -1, exact disagreement, to 1, exactly the same)
Clustering algorithm and Distance Metric
Clustering Approach
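The Adjusted Rand Index measures agreement between two clusterings of the same items: 1 means identical partitions, values around 0 mean chance-level agreement, and negative values mean less agreement than chance, so a value of 0.12 indicates agreement only slightly above chance. A pair-counting sketch in plain Python, assuming each clustering is given as a list of labels in the same item order; `adjusted_rand_index` is a hypothetical name.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    # Pairs of items placed together in both clusterings (contingency table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    expected = together_a * together_b / total_pairs if total_pairs else 0.0
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put everything in one cluster
    return (together_both - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, renamed
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # worse than chance
```

Cluster names do not matter, only which items are grouped together, so relabelled but identical partitions still score 1.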
-
[Figure: extracted features (f1, f2, ..., f204) grouped into clusters C1 to C4, labelled as feature clusters FC1, FC2 and FC3]
Clustering algorithm and Distance Metric
Clustering Approach
-
App x Feature Vector Space
App | FC1 | FC2 | ..
Documents to Go | 1 | 0 | ..
Photo Sketch HD | 0 | 0 | ..
Shopping List | 0 | 0 | ..
Radio Superior | 0 | 1 | ..
Note+ | 1 | 1 | ..
Clustering algorithm and Distance Metric
Clustering Approach
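The app-by-feature-cluster table can be derived by marking, for each app, every feature cluster that contains at least one of its features. In the sketch below the app names, feature IDs and cluster assignments are invented for illustration; `app_by_cluster` is a hypothetical name.

```python
# Invented example data: features extracted per app, and the feature cluster
# (FC) each feature was assigned to.
APP_FEATURES = {
    "app_a": ["f1", "f2"],
    "app_b": ["f3"],
    "app_c": ["f2", "f4"],
    "app_d": [],
}
FEATURE_CLUSTER = {"f1": "FC1", "f2": "FC1", "f3": "FC2", "f4": "FC2"}


def app_by_cluster(app_features, feature_cluster):
    """Binary matrix: 1 if the app has at least one feature in that cluster."""
    clusters = sorted(set(feature_cluster.values()))
    matrix = {}
    for app, features in app_features.items():
        present = {feature_cluster[f] for f in features if f in feature_cluster}
        matrix[app] = [1 if c in present else 0 for c in clusters]
    return clusters, matrix


if __name__ == "__main__":
    clusters, matrix = app_by_cluster(APP_FEATURES, FEATURE_CLUSTER)
    print(clusters)
    for app, row in matrix.items():
        print(app, row)
```

Each row then serves as the app's vector for the app-level clustering that follows.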
-
Hierarchical Clustering
Clustering algorithm and Distance Metric
Clustering Approach
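The slide does not say which linkage or distance was used for hierarchical clustering. As a sketch only, the code below assumes agglomerative clustering with average linkage over cosine distance, merging the closest pair of clusters until a chosen number remains; the function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def average_linkage(vectors, n_clusters, distance=cosine_distance):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until n_clusters clusters remain."""
    dist = [[distance(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]  # start from singletons
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[p][q] for p in clusters[i] for q in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge j into i (j > i, so i stays valid)
        del clusters[j]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


if __name__ == "__main__":
    print(average_linkage([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], n_clusters=2))
```

Recording the merges instead of stopping at a fixed count would yield the full dendrogram, which can then be cut at any level.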
-
$$k = \frac{\text{Number of features} \times \text{Number of terms}}{\text{Number of non-zero entries}}$$
K-Means, where:
Shape: original category
Color: assigned cluster
Clustering of apps according to shared feature clusters using K-Means, where k = 368, mapped using t-SNE
Clustering algorithm and Distance Metric
Clustering Approach
-
Manual: rate the relatedness between every two apps in a cluster
Do clusters show the same distribution of app metrics as that found in app store categories?
Internal Validation
Cluster cohesion (intra-cluster similarity), coverage, inter-cluster similarity and silhouette
External Validation
Do apps in different clusters exhibit different tendencies in terms of app store metrics?
Clustering Evaluation
Clustering Approach
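Of the internal measures listed, the silhouette coefficient is straightforward to compute directly. For each item, a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster, giving s = (b - a) / max(a, b). The overall score is the mean over items. A plain-Python sketch, assuming Euclidean distance by default (any distance function, for example cosine distance, can be passed in) and scoring singleton clusters as 0, the usual convention; `silhouette` is a hypothetical name.

```python
import math


def silhouette(points, labels, distance=math.dist):
    """Mean silhouette coefficient; needs at least two distinct cluster labels."""
    labels = list(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [distance(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(distance(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(round(silhouette([(0,), (1,), (10,), (11,)], [0, 0, 1, 1]), 3))
```

Scores close to 1 indicate compact, well-separated clusters; scores near 0 indicate overlapping ones.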
-
Future Directions
(Outline: Motivation; Related Work; Clustering Approach: Data Acquisition, Feature Extraction Technique, Feature Representation, Clustering Algorithm and Distance Metric, Clustering Validation; Future Directions)
-
Better internal and external cluster validation
Employ different feature extraction techniques
Compare with regular text clustering using the whole description
Incorporate the source code to enhance clustering
Further tweak approach variables: clustering methods, feature granularity
Future Directions