Association Rule Mining in Social Network Data
PRESENTED BY: HOSSEIN MOBASHER
COURSE: DATA MINING
Contents
• Introduction
• Related Works
• The proposed Framework
• Experimental Evaluation
• Conclusion
Introduction
• Over the last decade, social networks have changed the way of life of online communities.
• Social data is used in:
  • Academic applications
  • E-commerce
  • Discovering the habits and interests of users in online communities across different geographical regions
  • Sentiment analysis of users
• Purpose: support analysts in decision-making and optimal resource management in businesses, as well as in web maintenance.
Introduction (continued)
• Social data is a powerful source of data:
  • To gain knowledge about social communities
  • To investigate the behavior and other aspects of online communities
• User-generated content (UGC) helps online organizations enhance their services based on user perspectives.
• Data mining techniques are used to discover hidden, interesting, and meaningful knowledge from social data.
Related Works
• TwitterEcho
  • Collects data with a distributed architecture (Portuguese Twittosphere)
  • Uses micro-blogging as a means to predict political sentiment
• TWICALL
  • Discovers important events, then categorizes and classifies them
• NIF-T
  • Explores data published on micro-blogging websites (e.g. Twitter)
The Proposed Framework
• An environment for association rule mining that discovers hidden patterns in tweets.
Collecting and preprocessing of tweets
• Access tweets using the Twitter API.
• Received tweets are unsuitable for the subsequent processes.
  • They include information that is not required for the problem under consideration
• Remove unnecessary information and transform the tweets into items and related contextual features.
[Pipeline diagram: Access data using Twitter API → Remove unnecessary information → Transform into suitable format → Map into a transactional database]
Collecting and preprocessing of tweets (continued)
• Transformed tweets are then mapped into a transactional database.
  • Each transaction is composed of a set of stems
  • e.g. “Imagination is more important than knowledge” may be mapped to {imagination, important, knowledge}
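The mapping from a tweet to a transaction can be sketched as follows. This is only an illustration: the stop-word list, the regular expressions, and the function name `tweet_to_transaction` are assumptions, not part of the presented framework, and stemming is left out to keep the example short.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"is", "more", "than", "the", "a", "an", "of", "and", "to", "in", "rt"}

def tweet_to_transaction(text):
    """Map raw tweet text to a transaction, i.e. a set of keyword items."""
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}

tweets = [
    "Imagination is more important than knowledge",
    "RT @user: Knowledge is power http://example.com",
]
# The "transactional database" is simply the list of item sets.
transactions = [tweet_to_transaction(t) for t in tweets]
print([sorted(t) for t in transactions])
```

A real pipeline would likely add a stemmer and the contextual features (place, time) mentioned on the previous slide.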
Discovery of Correlations
• Use the Apriori method to mine frequent itemsets.
• An association rule is usually represented as: If Body then Head
  • If Body happens, Head is more likely to happen as well
  • It expresses the relationship between them
• The strength of a rule depends on its support and confidence
• The stronger the rule, the stronger the association between its terms.
• imagination ⇒ knowledge
  • Support = 40%
  • Confidence = 70%
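As a minimal sketch of this step, the code below implements a plain Apriori search and derives rules from the frequent itemsets. Support of an itemset is the fraction of transactions containing it; confidence of Body ⇒ Head is support(Body ∪ Head) / support(Body). The toy transactions and the thresholds are assumptions chosen for illustration and do not reproduce the 40% / 70% figures on the slide.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while level:
        # Keep only the candidates that reach the support threshold.
        current = {c: s for c in level if (s := support(c)) >= min_support}
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent

def mine_rules(frequent, min_confidence):
    """Return (body, head, support, confidence) tuples for strong rules."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for body in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[body]
                if conf >= min_confidence:
                    found.append((body, itemset - body, supp, conf))
    return found

# Toy transactional database (assumed data).
transactions = [
    {"imagination", "important", "knowledge"},
    {"imagination", "knowledge"},
    {"knowledge", "power"},
    {"imagination", "important"},
    {"peace", "world"},
]
frequent = apriori(transactions, min_support=0.4)
for body, head, supp, conf in mine_rules(frequent, min_confidence=0.6):
    print(sorted(body), "=>", sorted(head), f"support={supp:.2f} confidence={conf:.2f}")
```

On this toy data, imagination ⇒ knowledge has support 2/5 and confidence 2/3.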
Taxonomy Generation
• Automatically generates a taxonomy based on tweet attributes (i.e. the frequent keywords generated in the previous phase).
• More generalized, high-level concepts or correlations can then be extracted.
• The taxonomy nodes represent distinct terms extracted from tweet contents
  • Graph extraction
  • Graph partitioning and pruning
Taxonomy Generation (Graph extraction)
• Strong correlations are detected using the result of the previous phase.
• The generated correlations are represented as a graph
  • Edges: the implications present in the rules
  • Vertices: items of tweet contents
• country ⇒ World; society, people ⇒ country; peace ⇒ World; society ⇒ World; society ⇒ country
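Graph extraction can be sketched as below, reading the example above as five rules (country ⇒ World; society, people ⇒ country; peace ⇒ World; society ⇒ World; society ⇒ country). The adjacency-dictionary representation and the name `rules_to_graph` are assumptions made for this sketch; each body item gets a directed edge to the head item.

```python
from collections import defaultdict

# Example rules as (body items, head item); this reading of the slide is an assumption.
rules = [
    ({"country"}, "World"),
    ({"society", "people"}, "country"),
    ({"peace"}, "World"),
    ({"society"}, "World"),
    ({"society"}, "country"),
]

def rules_to_graph(rules):
    """Build a directed graph with one edge body_item -> head per rule."""
    graph = defaultdict(set)
    for body, head in rules:
        for item in body:
            graph[item].add(head)
        graph[head]  # touching the key makes sure head-only vertices exist
    return dict(graph)

graph = rules_to_graph(rules)
for vertex in sorted(graph):
    print(vertex, "->", sorted(graph[vertex]))
```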
Taxonomy Generation (Graph partitioning and pruning)
• Makes the graph compact
• Prunes edges that do not represent a strong, relevant relationship by performing vertex labeling (a label represents a level of the taxonomy).
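The slides do not spell out the labeling procedure, so the following is only one plausible reading, stated as an assumption: label each vertex with the length of its longest path to a top-level concept, then keep only the edges that connect adjacent levels. The graph literal and the function names are hypothetical, and the sketch assumes the graph has no cycles.

```python
# Directed graph from the example rules (edge: item -> more general item).
graph = {
    "country": {"World"},
    "society": {"World", "country"},
    "people": {"country"},
    "peace": {"World"},
    "World": set(),
}

def label_levels(graph):
    """Label each vertex with the length of its longest path to a sink.

    Assumes the graph is acyclic; sinks (top-level concepts) get level 0.
    """
    levels = {}

    def level(v):
        if v not in levels:
            levels[v] = 1 + max((level(w) for w in graph[v]), default=-1)
        return levels[v]

    for v in graph:
        level(v)
    return levels

def prune(graph, levels):
    """Keep only edges that go from a level to the level directly above it."""
    return {v: {w for w in heads if levels[v] == levels[w] + 1}
            for v, heads in graph.items()}

levels = label_levels(graph)
taxonomy = prune(graph, levels)
print(levels)
print(taxonomy)
```

Under this reading, the edge society → World is dropped because the path society → country → World already covers it, which makes the graph more compact.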
Analyzing Correlations
• Selection and ranking of the significant correlations
• The selection is made using
  • A rule schema <Keyword, *> ⇒ <Place, *>
  • Given interesting rule items, e.g. <Keyword, School> ⇒ <Place, London>
• The results are ranked by their support and confidence quality indexes.
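Selection by schema and ranking can be sketched as follows. Items are modeled as (attribute, value) pairs; the rule dictionaries, the numbers in them, and the choice to sort by confidence first and support second are all assumptions for illustration, since the slides only say that both indexes are used.

```python
def attributes(items):
    """Return the set of attribute names used by a collection of items."""
    return {attr for attr, _value in items}

def select_and_rank(rules, body_attrs, head_attrs, required=()):
    """Keep rules matching the schema <body_attrs, *> => <head_attrs, *>.

    `required` optionally lists concrete items the rule must contain.
    """
    selected = [
        r for r in rules
        if attributes(r["body"]) == set(body_attrs)
        and attributes(r["head"]) == set(head_attrs)
        and set(required) <= (r["body"] | r["head"])
    ]
    # Assumed ranking: confidence first, support as tie-breaker.
    return sorted(selected, key=lambda r: (r["confidence"], r["support"]), reverse=True)

# Hypothetical mined rules with made-up quality indexes.
rules = [
    {"body": frozenset({("Keyword", "School")}), "head": frozenset({("Place", "London")}),
     "support": 0.10, "confidence": 0.80},
    {"body": frozenset({("Keyword", "Peace")}), "head": frozenset({("Place", "Paris")}),
     "support": 0.20, "confidence": 0.60},
    {"body": frozenset({("Keyword", "School")}), "head": frozenset({("Keyword", "Teacher")}),
     "support": 0.30, "confidence": 0.90},
]

by_schema = select_and_rank(rules, ["Keyword"], ["Place"])
by_items = select_and_rank(rules, ["Keyword"], ["Place"],
                           required=[("Keyword", "School"), ("Place", "London")])
print(len(by_schema), len(by_items))
```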
Experimental Evaluation
• The proposed framework highlights famous topical subjects (e.g. the European Union)
• The results include 58 transactions with 209 distinct items (i.e. keywords).
• Firstly, the effectiveness is presented in two scenarios:
  • User behavior analysis
  • Topic trend analysis
• Secondly, the effectiveness is presented as the quality of the generated taxonomies.
User Behavior Analysis
• Extracted correlations allow experts to highlight hidden and potentially interesting user behaviors.
• peace ⇒ World, society ⇒ country, country ⇒ World
  • The proposed framework automatically generates the taxonomy from the mined rules.
• The taxonomy clearly highlights people's attitude towards peace.
Topic Trend Analysis
• Discovery and analysis of current matters of contention on Twitter.
• A domain expert wants to discover subjects of topical interest to Twitter users.
• The taxonomy suggests that society in general, and people in particular, are concerned with peace in the World.
Quality of generated taxonomies
• Taxonomy generation is evaluated with
  • Global quality (using the geometric average)
  • Local quality (degree of correlation between non-leaf and leaf nodes)
  • Spread (number of nodes traversed in the graph to move from a node to its root node)
• The results are compared with the approach of
  • “Evolutionary Taxonomy Construction from Dynamic Tag Space”, 2010
Quality of generated taxonomies (continued)
• Global quality is the same for both approaches.
• The proposed approach produces fairly balanced local quality vs. spread measurement indexes.
• The proposed approach takes slightly less time than the approach reported in.
Conclusion
• Presented a mechanism for extracting hidden correlations between contents.
• The generated correlations help in understanding the hidden associations among the textual and contextual features of UGC.
• The proposed approach automatically generates a taxonomy.
• The experimental results validate the efficiency and effectiveness of the proposed framework.
Thanks for your attention
Questions?