citizen sensing, social media analytics, and applications

Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications. Tutorial at Semantic Technology Conference, San Francisco, CA. Karthik Gomadam Accenture Technology Labs, San Jose Amit Sheth Kno.e.sis @ Wright State University Selvam Velmurugan eMoksha, Kiirti Monday, June 6, 2011

Post on 23-Sep-2014


DESCRIPTION

Description: http://semtech2011.semanticweb.com/sessionPop.cfm?confid=62&proposalid=3845

Original version: http://slidesha.re/social-WWW

TRANSCRIPT

  • Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications

    Tutorial at Semantic Technology Conference, San Francisco, CA.

    Karthik Gomadam, Accenture Technology Labs, San Jose

    Amit Sheth, Kno.e.sis @ Wright State University

    Selvam Velmurugan, eMoksha, Kiirti

    Monday, June 6, 2011

  • Lu Chen (Sentiment Analysis)

    Meena Nagarajan (Content Analysis)

    Ashutosh Jadhav (Event Analysis)

    Hemant Purohit (People & Network Analysis)

    Pavan Kapanipathi (Real Time Web)

    Selvam Velmurugan (Kiirti, eMoksha NGOs)

    Pramod Anantharam (Social & Sensor Web)

    Amit Sheth (Semantic Web)

  • A Quick Word

    Much of the work discussed in this tutorial comes primarily from the doctoral research of Dr. Meena Nagarajan, currently at IBM Almaden. It also includes current work done at the Kno.e.sis Center at Wright State University.

  • Outline

    Citizen Sensing: Role, Enablers, Apps

    Systematic Study of Social Media

    Citizen Sensing @ Real-time

    Emerging Research Areas: Spam and Trust in Social Media, Mobile Social Computing

    Research Application: Twitris

    Tutorial part 2

  • Citizen Sensing

    Everyday users of Web 2.0 and social networks: citizens of an Internet- or Web-enabled social community

    Observations and information reported by citizens => Citizen Sensing

    Human-in-the-loop (participatory) sensing + Web 2.0 + mobile computing = emergence of citizen-sensor networks

  • Social Signals

    The activity of observing, reporting, and disseminating information via text, audio, video, and built-in device sensors (and smart devices)

    Creating social signals through aggregation, enhancement, analysis, visualization, and interpretation

    Immense potential to disseminate information quickly and in real time

  • Enablers: Mobile Devices & Ubiquitous Connectivity

    Mobile devices are fast emerging as our primary tool

    Redefines the way we engage with people, information, etc.

    Global, ubiquitous, always available

    Sense where you are, how you are,

    Mobile Platforms Hit Critical Mass

    Over 5 billion users

    1B+ with internet-connected mobile devices (2010)

    Smartphones > Notebooks + Netbooks (2010E)

    500K+ mobile phone applications

    74% of mobile phone users (2.4B) worldwide texted (2007)

  • Enablers: Web 2.0 & Social Media

    500M+ Facebook users

    100M+ Twitter users, 85M+ tweets/day

    Internet users: 1.8 billion

    Content dissemination medium, even for traditional media (@cnn, @nytimes)

    Types of UGC: Twitter (text/microblogs), Facebook (multimedia), YouTube (videos), Flickr (images), Blogs (text), Ping (social network for music)

  • Citizen Sensors in Action

    Iran election

    Haiti earthquake

    US healthcare debate

  • Revolution 2.0: Political/Social Activism

    "If you want to liberate a government, give them the internet." - Wael Ghonim (Egyptian social activist)

    When Blitzer asked "Tunisia, then Egypt, what's next?", Ghonim replied succinctly: "Ask Facebook."

  • Citizen Journalism

    Twitter Journalism

  • Social Media Influence: Intelligence, News & Analysis

    Many media companies use Facebook and Twitter as a news-delivery platform. Many individuals rely on them as a news source. News is increasingly social.

  • Business Intelligence: Trend Spotting, Forecasting, Brand Tracking and Crisis Management

    Sysomos: http://www.sysomos.com/

    Trendspotting: http://trendspotting.com

    Simplify: http://simplify360.com/

    Shoutlet: http://www.shoutlet.com/

    Reputation (Defender): http://www.reputationdefender.com/

  • Development (Education, Health, eGov)

    LiveMocha (http://www.livemocha.com/): online language learning tool with social engagement, bridging the gap!

    Soliya (http://www.soliya.net/): dialogue between students from diverse backgrounds across the globe using the latest multimedia technologies

    Project Einstein (http://digital-democracy.org/what-we-do/programs/): a photography-based digital penpal program connecting youths in refugee camps to the world

    PatientsLikeMe (http://mashable.com/2010/07/13/social-media-health-trends/)

    TrialX (http://trialx.com)

    Image: http://www.dragonsearchmarketing.com/blog/social-media-development-through-visual-aids-tools/

  • Why People-Content-Network metadata?

  • Dimensions of Systematic Study of Social Media

    Spatio - Temporal - Thematic + People - Content - Network

  • Social Information Processing

    "Who says what, to whom, why, to what extent and with what effect?" [Lasswell]

    Network: social structure emerges from the aggregate of relationships (ties)

    People: poster identities, the active effort of accomplishing interaction

    Content: studying the content of communication

  • Studying Online Human Social Dynamics

    How does the (semantics or style of) content fit into the observations made about the network?

    Often, the three-dimensional dynamic of people, content and link structure is what shapes the social dynamic.

    Example: how do the topic of discussion, the emotional charge of a conversation, the presence of an expert, and the connections between participants together explain information propagation in a social network?

  • Metadata/Annotations

    Metadata: an organized way to study types, creation/extraction and storage, and use

  • The Anatomy of a Tweet

  • People Metadata: Variety of Self-expression Modes on Multiple Social Media Platforms

    Explicit information from user profiles: user names, pictures, videos, links, demographic information, group memberships... Often not updated

    Implicit information from user attention metadata: page views, Facebook 'Likes', comments; Twitter 'Follows', retweets, replies...

  • People Metadata: Various Levels

    Demographic

    Network

    Activity

    Interests

  • People Metadata: Continued

    User Demographic Metadata: user ID, screen/display name of user, real name of user, location, profile creation date, user description, user bio, URL

    Interest Level Metadata: author type (trustee/donor, journalist, blogger, scientist, etc.), favorite tweets, types of lists subscribed, style of writing (personality indicator), no. of followees, author type trend of followees

  • People Metadata: Continued

    Web Presence: user affiliations, KLOUT score (influence measure, www.klout.com)

    Activity Level Metadata: age of the profile, frequency of posts, timestamp of last status, no. of posts, no. of lists/groups created, no. of lists/groups subscribed

    Influence Level Metadata (inferring people metadata from network-level information): no. of followers (normal, influential), no. of mentions, no. of retweets/forwards, no. of replies, no. of lists/groups following, no. of people following back, authority & hub scores

  • Content Metadata

    Content Independent metadata: date, location, author, etc.

    Content Dependent metadata

    Direct content-based metadata

    Explicit/Mentioned content metadata: named entities in content

    Implicit/Inferred content metadata: related named entities from knowledge sources

    Indirect content-based metadata (external metadata): context inferred from URLs in content (images, links to articles, FourSquare check-ins, etc.)

  • Content Independent Metadata

    For tweets: published date and time; location (where the tweet was generated from); tweet posting method (smartphone, twitter.com, clients for Twitter); author information

    For text messages: published date and time; origin location; recipient; carrier information

  • Content Dependent Metadata (Tweet)

    Direct content-based metadata

    Indirect content-based metadata (external metadata)

  • Network Metadata

    Connections/relationships (the foundation for the network) matter!

    Structure Level Metadata: community size, community growth rate, largest strongly connected component size, weakly connected components & max. size, average degree of separation, clustering coefficient

    Relationship Level Metadata: type of relationship, relationship strength, user homophily based on certain characteristics (e.g., location, interest, etc.), reciprocity (mutual relationship), active community/ties

    Monday, June 6, 2011
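A minimal sketch of how one relationship-level measure (reciprocity) and one influence-level count (followers) could be derived from a follow graph. The edge-list representation and the function names are assumptions made for illustration; the slides do not prescribe a formula.

```python
from collections import Counter

def reciprocity(edges):
    """Fraction of directed follow edges (u, v) whose reverse (v, u) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

def follower_counts(edges):
    """Number of distinct followers per user; an edge (u, v) means u follows v."""
    return Counter(v for (_, v) in set(edges))

# Hypothetical toy follow graph.
follows = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(reciprocity(follows))
print(follower_counts(follows))
```

Only the a/b pair follows each other here, so two of the four edges are mutual.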

  • Metadata: Creation, Extraction and Storage

  • Metadata Creation & Extraction

    Extracted Metadata: directly visible information from the user profile, tweet content & community structure

    Created Metadata: obtained after processing information in the user profile, content and/or network structure

  • An Example

    Length: 144 characters; General topic: Egypt protest

    This poor {sentiment_expression: {target: Lara Logan, polarity: negative}} woman! RT @THR CBS News' {entity: {type=News Agency}} Lara Logan {entity: {type=Person}} Released From Hospital {entity: {type=Location}} After Egypt {entity: {type=Country}} Assault {type=topic} http://bit.ly/dKWTY0 {external_URL}

    Monday, June 6, 2011
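The annotated tweet above could be held in memory roughly as follows. This is only a sketch: the `Annotation` class, its field names, and the exact text span each annotation attaches to (for example "Hospital" rather than "Released From Hospital") are assumptions, since the slide gives only the inline markup.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    span: str                                   # text the annotation is attached to (assumed)
    kind: str                                   # "entity", "sentiment_expression", "topic", "external_URL"
    attrs: dict = field(default_factory=dict)   # extra attributes such as type or polarity

def annotations_of_kind(annotations, kind):
    """Return only the annotations of the given kind."""
    return [a for a in annotations if a.kind == kind]

tweet_annotations = [
    Annotation("This poor", "sentiment_expression",
               {"target": "Lara Logan", "polarity": "negative"}),
    Annotation("CBS News", "entity", {"type": "News Agency"}),
    Annotation("Lara Logan", "entity", {"type": "Person"}),
    Annotation("Hospital", "entity", {"type": "Location"}),
    Annotation("Egypt", "entity", {"type": "Country"}),
    Annotation("Assault", "topic"),
    Annotation("http://bit.ly/dKWTY0", "external_URL"),
]

# Map each entity mention to its type.
entity_types = {a.span: a.attrs["type"]
                for a in annotations_of_kind(tweet_annotations, "entity")}
print(entity_types)
```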

  • Why is the Semantic Web a standard for social metadata?

    Rich Snippets, RDFa, Open Graph: Semantic Web based social data standards

    Relationships/connections play a central role

    Treating relationships as first-class objects is important

  • Semantic Web: A Very Short Primer

    Representation: RDF (relationships as first-class objects), OWL

    Representing Knowledge and Agreements: nomenclature, taxonomy, folksonomy, ontology

    Annotation: RDFa, XLink, model reference

    Web of Data: Linked Open Data

    Querying: SPARQL; Rules: SWRL, RIF

  • How to save and use metadata?

    Store metadata as data and use standard database techniques

    Use filtering and clustering, summarization, statistics: implicit semantics

    Use explicit semantics and Semantic Web standards and technologies

    semantics = meaning

    richer representation, support for relationships, context

    supports use of background knowledge

    better integration, powerful analysis

    Semantics: the implicit, the formal and the powerful

    Social metadata on the Web

  • Metadata Extraction from Informal Text

    Meena Nagarajan, Understanding User-Generated Content on Social Media, Ph.D. Dissertation, Wright State University, 2010

  • Characteristics of Text on Social Media

  • The Formality of Text

  • Content Analysis: Typical Sub-tasks

    Recognize key entities mentioned in content: Information Extraction (entity recognition, anaphora resolution, entity classification...), discovery of semantic associations between entities

    Topic Classification, Aboutness of content: What is the content about?

    Intention Analysis: Why did they share this content?

    Sentiment Analysis: What opinions are people conveying via the content?

    Author Profiling: What can we infer about the author from the content he posts?

    Context (external to content) extraction: URL extraction, analyzing external content

  • Research Efforts, Contributions in this space

    Examining usefulness of multiple context cues for text mining algorithms

    Compensating for informal, highly variable language and lack of context

    Using context cues: document corpus, syntactic and structural cues, social medium, external domain knowledge

    In this talk, highlighting sample metadata creation tasks: NER, Key Phrase Extraction, Intention, Sentiment/Opinion Mining

  • Part 1. NER, Key Phrase Extraction

    Named Entity Recognition: "I loved the hangover!"

    Key Phrase Extraction

  • Multiple Context Cues Utilized for NER in Blogs and MySpace

  • Multiple Context Cues Utilized for Keyphrase Extraction from Twitter, Facebook and MySpace

  • Focus, Impact

    Techniques focus on relatively less explored content aspects on social media platforms

    Combination of top-down, bottom-up analysis for informal text

    Statistical NLP, ML algorithms over large corpora

    Models and rich knowledge bases in a domain

  • NAMED ENTITY RECOGNITION

    I loved your music Yesterday! It was THE HANGOVER of the year.. lasted forever.. So I went to the movies.. bad choice picking GI Jane worse now

    Identifying and classifying tokens

  • NER in prior work vs. NER for Informal Text

  • Cultural Named Entities

    NER focus in this work: Cultural Named Entities

    Artifacts of culture: names of books, music albums, films, video games, etc.

    Common words in a language: The Lord of the Rings, Lips, Crash, Up, Wanted, Today, Twilight, Dark Knight

  • Characteristics of Cultural Entities

    Varied senses, several poorly documented: Merry Christmas covered by 60+ artists; Star Trek: movies, TV series, media franchise... and cuisines!

    Changing contexts with recent events: The Dark Knight as a reference to Obama, health care reform

    Unrealistic expectations: comprehensive sense definitions, enumeration of contexts, labeled corpora for all senses...

    NER: relaxing the closed-world sense assumptions

  • A Spot and Disambiguate Paradigm

    NER is generally a sequential prediction problem

    An NER system achieves a 90.8 F1 score on the CoNLL-2003 NER shared task (PER, LOC, ORGN entities) [Lev Ratinov, Dan Roth]

    Focus of approach: Spot and Disambiguate paradigm

    Starting off with a dictionary or list of entities we want to spot

    Spot, then disambiguate in context (natural language, domain knowledge cues)

    Binary classification: is this mention of "the hangover" in a sentence referring to a movie?

    Monday, June 6, 2011
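The spot-then-disambiguate idea can be sketched as below. The spotter is a plain dictionary match. The disambiguation step is a keyword heuristic standing in for the supervised binary classifier the slides describe, and `MOVIE_CUES` and the window size are hypothetical choices made purely for illustration.

```python
import re

def spot(text, dictionary):
    """Return (entity, start, end) for every dictionary entry found in text."""
    spots = []
    for entity in dictionary:
        pattern = r"\b" + re.escape(entity) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spots.append((entity, m.start(), m.end()))
    return sorted(spots, key=lambda s: s[1])

# Hypothetical context cues; a real system would learn these from labeled data.
MOVIE_CUES = {"watch", "watched", "watching", "movie", "movies",
              "film", "scene", "scenes"}

def is_movie_mention(text, start, end, window=5):
    """Binary decision: does the context around the spot suggest a movie?"""
    before = re.findall(r"\w+", text[:start].lower())[-window:]
    after = re.findall(r"\w+", text[end:].lower())[:window]
    return any(tok in MOVIE_CUES for tok in before + after)

for text, names in [
    ("I am watching Pattinson scenes in Twilight for the nth time.", ["Twilight"]),
    ("I loved the hangover last night, lasted forever", ["The Hangover"]),
]:
    for entity, start, end in spot(text, names):
        print(entity, is_movie_mention(text, start, end))
```

Both sentences contain a dictionary hit, but only the first has movie-like context around it, which is the distinction the binary classifier is meant to capture.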

  • Algorithmic Contributions: Supervised Algorithms

    Examples:

    I am watching Pattinson scenes in Twilight for the nth time.

    I spent a romantic evening watching the Twilight by the bay..

    I love Lily's song

  • Multiple Senses in the Same Domain

  • Algorithm Preliminaries

    Problem definition: Cultural Entity Identification for music albums and tracks, e.g. Smile (Lily Allen), Celebration (Madonna)

    Corpus: MySpace comments

    Context-poor utterances: "Happy 25th Lilly, Alfie is funny"

    Goal: semantic annotation of music named entities (w.r.t. MusicBrainz)

  • Using a Knowledge Resource for NER is not straightforward..

  • Approach Overview

    Scoped relationship graphs

    Using context cues from the content, webpage title, URL: "new Merry Christmas tune"

    Reduce potential entity spot size: new albums/songs

    Generate candidate entities

    Spot and disambiguate

  • Sample Real-world Constraints

    Career restrictions: "release your third album already.."

    Recent album restrictions: "I loved your new album.."

    Artist age restrictions: "happy 25th rihanna, loved alfie btw.."

    etc.

  • Non-Music Mentions

    Challenge 1: Several senses in the same domain. Scoping relationship graphs narrows possible senses; solves the named entity identification problem partially

    Challenge 2: Non-music mentions. "Got your new album Smile. Loved it!" vs. "Keep your SMILE on!"

  • Using Language Features to eliminate incorrect mentions..

    Syntactic features: POS tags, typed dependencies..

    Word-level features: capitalization, quotes

    Domain-level features

  • Supervised Learners

  • Hand Labeling - Fairly Subjective

    1800+ spots in MySpace user comments from artist pages

    "Keep your SMILE on!": good spot, bad spot, inconclusive?

    4-way annotator agreement: Madonna 90%, Rihanna 84%, Lily Allen 53%

  • Dictionary Spotter + NLP Step

    Daniel Gruhl, Meena Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Context and Domain Knowledge Enhanced Entity Spotting in Informal Text, The 8th International Semantic Web Conference, 2009: 260-276

  • NER on Social Media Text using Domain Knowledge

    Highlights issues with using domain knowledge for an IE task

    Two-stage approach: chaining NL learners over results of domain model based spotters

    Improves accuracy up to a further 50%

    Allows the more time-intensive NLP analytics to run on less than the full set of input data

  • BBC SoundIndex (IBM Almaden): Pulse of the Online Music

    Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: Multimodal Social Intelligence in a Real-Time Dashboard System, special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010

    http://www.almaden.ibm.com/cs/projects/iis/sound/

  • The Vision

    http://www.almaden.ibm.com/cs/projects/iis/sound/

  • Several Insights

    Only 4% negative sentiments; perhaps ignore the Sentiment Annotator on this data source?

    Ignoring spam can change the ordering of popular artists

    Trending popularity of artists

    Trending topics in artist pages

  • Predictive Power of Data

    Billboard's Top 50 Singles chart during the week of Sept 22-28 '07 vs. MySpace popularity charts

    User study indicated 2:1 and up to 7:1 (younger age groups) preference for the MySpace list

    Challenging traditional polling methods!

  • Key Phrase Extraction

  • Key Phrase Extraction: Example

    Key phrases extracted from prominent discussions on Twitter around the 2009 Health Care Reform debate and the 2008 Mumbai Terror Attack on one day

  • Key Phrase Extraction from SM Text

    Different from Information Extraction

    Extracting vs. assigning key phrases; focus: Key Phrase Extraction

    Prior work focus: extracting phrases that summarize a document -- a news article, a web page, a journal article, a book..

    Focus: summarize multiple documents (UGC) around the same event/topic of interest

  • Key Phrase Extraction on SM Content

    Focus: summarizing social perceptions via key phrase extraction

    Preserving/isolating the social behind the social data

    What is said in Egypt vs. the USA should be viewed in isolation

    Accounting for redundancy, variability, off-topic content

    "Met up with mom for lunch, she looks lovely as ever, good genes .. Thanks Nike, I love my new Gladiators .. smooth as a feather. I burnt all the calories of Italian joy in one run.. if you are looking for good Italian food on Main, Buca is the place to go."

  • Social and Cultural Logic in SMC

    Thematic components: similar messages convey similar ideas

    Space, time metadata: role of community and geography in communication

    Poster attributes: age, gender, socio-economic status reflect similar perceptions

  • Feature Space (common to several efforts)

    Focus: n-grams, spatio-temporal metadata (social components)

    Syntactic cues: in quotes, italics, bold; in document headers; phrases collocated with acronyms

    Document and structural cues: two-word phrases, appearing in the beginning of a document, frequency, presence in multiple similar documents, etc.

    Linguistic cues: stemmed form of a phrase, phrases that are simple and compound nouns in sentences, etc.

  • Key Phrase Extraction: Overview

    President Obama in trying to regain control of the health-care debate will likely shift his pitch in September

    1-grams: President, Obama, in, trying, to, regain, ...

    2-grams: President Obama, Obama in, in trying, trying ...

    Monday, June 6, 2011
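The n-gram generation on this slide can be reproduced with a few lines. This sketch assumes simple whitespace tokenization, which the slides do not specify.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token phrases, in order of appearance."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = ("President Obama in trying to regain control of the health-care "
            "debate will likely shift his pitch in September")
tokens = sentence.split()   # whitespace tokenization (an assumption)

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
print(unigrams[:6])
print(bigrams[:3])
```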

  • A descriptor is an n-gram weighted by:

    Thematic Importance: TFIDF, stop words, noun phrases

    Redundancy: statistically discriminatory in nature

    Variability: contextually important

    Spatial Importance (local vs. global popularity)

    Temporal Importance (always popular vs. currently trending)

    Monday, June 6, 2011
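A rough sketch of two of these weights. `tfidf` is the textbook TF-IDF formula. `relative_popularity` is a hypothetical stand-in for the "local vs. global" (or "current vs. always") comparison. The slides do not give the actual formulas or say how the components are combined, so everything below is an assumption.

```python
import math
from collections import Counter

def tfidf(term, doc_tokens, corpus):
    """Textbook TF-IDF: term frequency in the document times log(N / document frequency)."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    return tf * math.log(len(corpus) / df) if df else 0.0

def relative_popularity(local_count, local_total, global_count, global_total):
    """Share of a phrase in one region (or time slice) relative to its overall share.

    Values above 1 mean the phrase is more popular locally (or currently) than overall.
    """
    return (local_count / local_total) / (global_count / global_total)

# Hypothetical toy corpus of tokenized posts.
corpus = [
    ["health", "care", "reform"],
    ["health", "care", "debate"],
    ["mumbai", "attack"],
]
print(tfidf("reform", corpus[0], corpus))   # rarer term, higher weight
print(tfidf("health", corpus[0], corpus))   # common term, lower weight
print(relative_popularity(5, 10, 10, 100))
```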

  • Eliminating Off-topic Content [WISE2009]

    Frequency-based heuristics will not eliminate off-topic content that is ALSO POPULAR

  • Approach Overview

    "Yeah i know this a bit off topic but the other electronics forum is dead right now. im looking for a good camcorder, somethin not to large that can record in full HD only ones so far that ive seen are sonys CanonHV20. Great little cameras under $1000."

    Assume one or more seed words (from a domain knowledge base): C1 - ['camcorder']

    Extracted key words/phrases: C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']

    Gradually expand C1 by adding phrases from C2 that are strongly associated with C1

    Mutual Information based algorithm [WISE2009]

    Monday, June 6, 2011
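The seed-expansion step can be illustrated with pointwise mutual information (PMI) over document co-occurrence. This is a sketch of the general idea only: the scoring function, the threshold, and the toy `docs` collection are assumptions and are not taken from the [WISE2009] algorithm.

```python
import math

def pmi(a, b, docs):
    """Pointwise mutual information of phrases a and b over document co-occurrence."""
    n = len(docs)
    n_ab = sum(1 for d in docs if a in d and b in d)
    if n_ab == 0:
        return float("-inf")        # never seen together
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    return math.log((n_ab * n) / (n_a * n_b))

def expand(seeds, candidates, docs, threshold=0.0):
    """Grow the topical set C1 with candidates from C2 associated with any current member."""
    topical = set(seeds)
    remaining = set(candidates) - topical
    changed = True
    while changed:
        changed = False
        for phrase in sorted(remaining):
            if any(pmi(phrase, t, docs) > threshold for t in topical):
                topical.add(phrase)
                remaining.discard(phrase)
                changed = True
    return topical

c1 = ["camcorder"]
c2 = ["electronics forum", "hd", "camcorder", "somethin", "ive", "canon",
      "little camera", "canon hv20", "cameras", "offtopic"]

# Hypothetical posts, each reduced to the set of phrases it contains.
docs = [
    {"camcorder", "canon hv20", "hd"},
    {"camcorder", "canon hv20"},
    {"camcorder", "hd", "cameras"},
    {"electronics forum", "offtopic"},
    {"somethin", "ive", "offtopic"},
    {"electronics forum", "somethin"},
]
print(sorted(expand(c1, c2, docs)))
```

On this toy data, phrases that co-occur with 'camcorder' are pulled into the topical set, while 'electronics forum' and 'offtopic' stay out even though they are as frequent as the on-topic phrases.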

  • Key Phrases and Aboutness Evaluations

    Are the key phrases we extracted topical and good indicators of what the content is about?

    If they are, they should act as effective index/search phrases and return relevant content

    Evaluation application: Targeted Content Delivery

  • Targeted Content Delivery - Evaluations

    12K posts from MySpace and Facebook Electronics forums

    Baseline phrases: Yahoo Term Extractor

    Our method phrases: key phrase extraction, elimination

    Targeted content from Google AdSense

  • Targeted Content for all content vs. extracted key phrases

  • User Studies and Results

  • Impact and Contributions

    TFIDF + social contextual cues yield more useful phrases that preserve social perceptions

    Corpus + seeds from a domain knowledge base eliminate off-topic phrases effectively

  • Intention Mining

  • Targeted Content Delivery via Intention Mining

    On social networks

    Use case for this talk: targeted content = content-based advertisements; target = user profiles

    Content-based advertisements (CBAs): well-known monetization model for online content

  • Circa 2009: Content-based Ads

  • Circa 2009: Ads on Profiles

  • What is going on here

    Interests do not translate to purchase intents: interests are often outdated; intents are rarely stated on a profile

    Cases that do seem to work: new store openings, sales; highly demographic-targeted ads

  • Intents in User

  • Content Ads Outside Profiles

  • Targeted Content-based Advertising

    Non-trivial

    Non-policed content: brand image, unfavorable sentiments

    People are there to network: user attention to ads is not guaranteed

    Informal, casual nature of content: people are sharing experiences and events; main message overloaded with off-topic content

    "I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :("

    Learning from Multi-topic Web Documents for Contextual Advertisement, Zhang, Y., Surendran, A. C., Platt, J. C., and Narasimhan, M., KDD 2008

  • Preliminary Results in

    Identifying intents behind user posts on social networks: identify content with monetization potential

    Identifying keywords for advertising in user-generated content: considering interpersonal communication & off-topic chatter

  • Investigations

    User studies: hard to compare activity-based ads to s.o.t.a; impressions to clickthroughs

    How well are we able to identify monetizable posts?

    How targeted are ads generated using our keywords vs. entire user-generated content?

  • Scribe Intent not same as Web Search Intent 1B.People write sentences, not keywords or phrasesPresence of a keyword does not imply navigational / transactional intents am thinking of getting X (transactional) I like my new X (information sharing) what do you think about X (information seeking)

    1B. J. Jansen, D. L. Booth, and A. Spink, Determining the informational, navigational, and transactional intent of web queries,Inf. Process. Manage., vol. 44, no. 3, 2008.

    Identifying Monetizable Intents

    Monday, June 6, 2011

  • Action patterns surrounding an entity: how questions are asked, and not topic words that indicate what the question is about. Example: "where can I find a chotto psp cam". The user post also has an entity.

    From X to Action Patterns

    Monday, June 6, 2011

  • Set of user posts from SNSs. Not annotated for presence or absence of any intent.

    Conceptual Overview: Bootstrapping to learn IS (Information Seeking) patterns

    Monday, June 6, 2011

  • Generate a universal set of n-gram patterns; freq > f

    S = set of all 4-grams; freq > 3

    Bootstrapping to learn IS patterns

    Monday, June 6, 2011

  • Generate set of candidate patterns from seed words (why, when, where, how, what)

    Sc = all 4-grams in S that extract seed words

    Bootstrapping to learn IS patterns

    Monday, June 6, 2011

  • User picks 10 seed patterns from Sc

    Sis = does anyone know how, where do I find, someone tell me where

    Bootstrapping to learn IS patterns

    Monday, June 6, 2011

  • Gradually expand Sis by adding Information Seeking patterns from Sc

    Bootstrapping to learn IS patterns

    Monday, June 6, 2011

  • For every pis in Sis generate set of filler patterns

    Bootstrapping to learn IS patterns

    Monday, June 6, 2011

  • Filler patterns for "does anyone know how":

    .* anyone know how; does .* know how; does anyone .* how; does anyone know .*

    Bootstrapping to learn IS patterns

    Monday, June 6, 2011
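
The bootstrapping steps on the preceding slides can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the tokenizer are assumptions, and only the seed words, the 4-gram size, the frequency cut-off, and the `.*` filler notation come from the slides.

```python
import re
from collections import Counter

# Seed words listed on the slide.
SEED_WORDS = {"why", "when", "where", "how", "what"}


def ngrams(tokens, n=4):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def universal_set(posts, n=4, min_freq=3):
    """S: every n-gram whose frequency across the posts exceeds min_freq."""
    counts = Counter()
    for post in posts:
        counts.update(ngrams(re.findall(r"[a-z']+", post.lower()), n))
    return {gram for gram, count in counts.items() if count > min_freq}


def candidate_set(universal):
    """Sc: n-grams in S that contain at least one seed word (assumed reading)."""
    return {gram for gram in universal if SEED_WORDS & set(gram)}


def filler_patterns(pattern):
    """Replace each word of a pattern in turn with the wildcard '.*'."""
    return [
        " ".join(pattern[:i] + (".*",) + pattern[i + 1:])
        for i in range(len(pattern))
    ]
```

Calling `filler_patterns(("does", "anyone", "know", "how"))` yields the four wildcard patterns shown on the slide above. How fillers are then scored and retained is covered on the following slides and is not modeled here.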

  • Extracting and Scoring Patterns

    Filler pattern: does * know how

    does someone know how: Functional Compatibility (impersonal pronouns); Empirical Support 1/3

    does somebody know how: Functional Compatibility (impersonal pronouns); Empirical Support 0; pattern retained

    does john know how: pattern discarded

    Monday, June 6, 2011

  • Sc = {does anyone know how, where do I find, someone tell me where}

    pis = 'does anyone know how'

    Extracting and Scoring Patterns

    Monday, June 6, 2011

  • Functional properties / communicative functions of words

    From a subset of LIWC [1]:

    cognitive mechanical (e.g., if, whether, wondering, find): "I am thinking about getting X"

    adverbs (e.g., how, somehow, where)

    impersonal pronouns (e.g., someone, anybody, whichever)

    "Someone tell me where can I find X"

    [1] Linguistic Inquiry and Word Count, LIWC, http://liwc.net

    Expanding the Pattern Pool

    Monday, June 6, 2011

  • Over iterations, single-word substitutions, functional usage and empirical support conservatively expand Sis

    Infusing new patterns and seed words

    Stopping conditions

    Details in [WISE2009] for..

    Monday, June 6, 2011

  • Sample Extracted Patterns

    Monday, June 6, 2011

  • Information Seeking patterns generated offline

    Information seeking intent score of a post: extract and compare patterns in posts with extracted patterns

    Transactional intent score of a post: LIWC Money dictionary, 173 words and word forms indicative of transactions, e.g., trade, deal, buy, sell, worth, price etc.

    Identifying Monetizable Posts

    Monday, June 6, 2011
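
As a rough sketch of the two scores named on this slide, the snippet below counts learned patterns and money-related words in a post. Everything concrete here is an assumption for illustration: the three patterns are the seed examples from earlier slides, the word set is a tiny stand-in for the 173-entry LIWC Money dictionary, and raw counts are used because the slides do not say how the scores are normalized or combined.

```python
import re

# Hypothetical inputs: a few information-seeking patterns and a small
# stand-in for the LIWC Money dictionary (the real one has 173 entries).
IS_PATTERNS = ["does anyone know how", "where do i find", "someone tell me where"]
MONEY_WORDS = {"trade", "deal", "buy", "sell", "worth", "price"}


def _tokens(post):
    return re.findall(r"[a-z']+", post.lower())


def information_seeking_score(post):
    """Number of known information-seeking patterns that occur in the post."""
    text = " " + " ".join(_tokens(post)) + " "
    return sum(1 for pattern in IS_PATTERNS if f" {pattern} " in text)


def transactional_score(post):
    """Number of tokens in the post that appear in the money word list."""
    return sum(1 for token in _tokens(post) if token in MONEY_WORDS)
```

For the post "Does anyone know how much a used camcorder is worth? Looking to buy." this sketch gives an information-seeking score of 1 and a transactional score of 2.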

  • Identifying keywords in monetizable posts: plethora of work in this space. Off-topic noise removal is our focus.

    I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :(

    Keywords for Advertising

    Monday, June 6, 2011

  • Topical hints: C1 - ['camcorder']

    Keywords in post: C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']

    Move strongly related keywords from C2 to C1 one-by-one

    Relatedness determined using information gain. Using the Web as a corpus, domain independent

    Conceptual Overview (also see slides 88,89)

    Monday, June 6, 2011

  • C1 - ['camcorder']

    C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']

    Informative words: ['camcorder', 'canon hv20', 'little camera', 'hd', 'cameras', 'canon']

    Off-topic Chatter

    Monday, June 6, 2011
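
The "move strongly related keywords from C2 to C1 one-by-one" step can be sketched as a greedy loop. The sketch below is an assumption-laden illustration: the slides compute relatedness with information gain over the Web as a corpus, whereas here `relatedness` is any caller-supplied function, and the `threshold` parameter and the function name are hypothetical.

```python
def expand_topical_keywords(c1, c2, relatedness, threshold=0.5):
    """Greedily move keywords from C2 into C1 while they are strongly related.

    relatedness(a, b) is assumed to return a score in [0, 1]; c1 is assumed
    to hold at least one topical hint. Returns (topical, off_topic).
    """
    topical = list(c1)
    remaining = [k for k in c2 if k not in topical]

    def best_score(keyword):
        return max(relatedness(keyword, t) for t in topical)

    while remaining:
        best = max(remaining, key=best_score)
        if best_score(best) < threshold:
            break  # nothing left is strongly related to the topical set
        topical.append(best)
        remaining.remove(best)
    return topical, remaining
```

With a relatedness function that scores 'canon hv20', 'little camera', 'hd', 'cameras' and 'canon' highly against 'camcorder', the loop reproduces the informative words on the slide and leaves 'electronics forum', 'somethin', 'ive' and 'offtopic' behind as off-topic chatter.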

  • Keywords from 60 monetizable user posts

    Monetizable intent, at least 3 keywords in content. 45 MySpace Forums, 15 Facebook Marketplace, 30 graduate students

    10 sets of 6 posts each. Each set evaluated by 3 randomly selected users. Monetizable intents?

    All 60 posts voted as unambiguously information seeking in intent

    Evaluations - User Study

    Monday, June 6, 2011

  • Google AdSense ads for user post vs. extracted topical keywords

    1. Effectiveness of using topical keywords

    Monday, June 6, 2011

  • Instructions User Study

    Monday, June 6, 2011

  • Users picked ads relevant to the post; at least 50% inter-evaluator agreement. For the 60 posts: total of 144 ad impressions, 17% of ads picked as relevant. For the topical keywords: total of 162 ad impressions, 40% of ads picked as relevant.

    Result - 2X Relevant Impressions

    Monday, June 6, 2011

  • User's profile information: interests, hobbies, TV shows.. (non-demographic information). Submit a post: looking to buy and why (induced noise). Ads that generate interest, captured attention.

    2. Profile Ads vs. Activity Ads

    Monday, June 6, 2011

  • Using profile ads: total of 56 ad impressions, 7% of ads generated interest

    Using authored posts: total of 56 ad impressions, 43% of ads generated interest

    Using topical keywords from authored posts: total of 59 ad impressions, 59% of ads generated interest

    Result - 8X Generated Interest

    Monday, June 6, 2011
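
The "2X" and "8X" headlines follow directly from the percentages reported on the two result slides. The short check below only restates that arithmetic; the variable names are illustrative.

```python
# Share of ad impressions judged relevant (first study).
relevant_full_post = 0.17   # ads generated from the entire post
relevant_keywords = 0.40    # ads generated from extracted topical keywords

# Share of ad impressions that generated interest (second study).
interest_profile = 0.07     # ads generated from profile information
interest_keywords = 0.59    # ads generated from topical keywords in authored posts

relevance_gain = relevant_keywords / relevant_full_post
interest_gain = interest_keywords / interest_profile

print(f"relevant impressions: {relevance_gain:.2f}x")
print(f"generated interest:   {interest_gain:.2f}x")
```

The ratios come out at roughly 2.4 and 8.4, which the slides round down to 2X and 8X.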

  • User studies are small and preliminary, but clearly suggest: monetization potential in user activity; improvement for Ad programs in terms of relevant impressions. Evaluations are based on forum and marketplace posts (verbose content). Status updates, notes, community and event memberships: one size may not fit all.

    To note

    Monday, June 6, 2011

  • A world between relevant impressions and clickthroughs: objectionable content, vocabulary impedance, Ad placement, network behavior. In a pipeline of other community efforts. No profile information taken into account. Cannot custom send information to Google AdSense.

    To note

    Monday, June 6, 2011

  • SENTIMENT / OPINION MINING

    Monday, June 6, 2011

  • Two main types of information we can learn from user-generated content: fact vs. opinion. Much of what we read in social media (e.g., blogs, Twitter, Facebook) is a mix of facts and opinions. For example: "Latest news: Mobile web services not working in #Bahrain and Internet is extremely slow #feb14 {fact}... looks like they "learned" from #Egypt {opinion}"

    Content Analysis: Sentiment Analysis/Opinion Mining

    Monday, June 6, 2011

  • Sentiment Analysis Motivation

    Which movie should I see?

    What do customers complain about?

    Why do people oppose health care reform?

    Monday, June 6, 2011

  • Example: How awful that many #Egyptian artifacts are in danger of being destroyed. What Zahi Hawass must be thinking #jan25 (read in the tone of "what were YOU thinking")

    Sentiment Analysis: Tasks

    Monday, June 6, 2011

  • Sentiment Analysis: Tasks

    Classification: overall sentiment polarity: positive/neutral/negative. Example: "How awful that many #Egyptian artifacts are in danger of being destroyed." Overall polarity is negative. Target-specific sentiment polarity: positive/neutral/negative. Example: for target "egyptian artifacts", polarity is "negative"; for target "Zahi Hawass", polarity is "neutral".

    Monday, June 6, 2011

  • Sentiment Analysis: Tasks

    Identification & Extraction: opinion, opinion holder, opinion target

    Example: opinion="awful", opinion holder="the author", target="egyptian artifacts are in danger"

    Opinion="must be thinking", opinion holder="the author", target="Zahi Hawass"

    Monday, June 6, 2011

  • Classification. Supervised: labeled training data; features differ from traditional topic classification tasks; learning strategies.

    Unsupervised: lexicon-based approach; bootstrapping.

    Sentiment Analysis: Approaches

    Monday, June 6, 2011

  • Sentiment Analysis: Approaches

    Identification & Extraction: utilizing the relations between opinion and opinion target, proximity, syntactic dependency, co-occurrence and prepared patterns/rules

    Monday, June 6, 2011

  • Sentiment Analysis: From Tweets to polls

    Lexicon-based approach for sentiment analysis of tweets: subjective lexicon from OpinionFinder (Wilson et al., 2005). Within topic tweets, count messages containing these positive and negative words defined by the lexicon.

    Corpus: 0.7 billion tweets, Jan 2008 to Oct 2009; 1.5 billion tweets, Jan 2008 to May 2010

    Monday, June 6, 2011
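
A minimal sketch of the counting step described on this slide is shown below. The word lists are toy stand-ins for the OpinionFinder lexicon, the topic filter is a plain keyword test, and turning the two counts into a positive-to-negative ratio is one simple way to obtain a single score; all of these are assumptions made for illustration.

```python
import re

# Toy stand-ins for a subjectivity lexicon (hypothetical word lists).
POSITIVE = {"good", "great", "hope", "love"}
NEGATIVE = {"bad", "awful", "fear", "hate"}


def tokens(text):
    return set(re.findall(r"[a-z']+", text.lower()))


def count_sentiment(tweets, topic):
    """Count on-topic messages containing a positive or a negative word.

    A message that contains both kinds of word is counted on both sides.
    """
    on_topic = [tokens(t) for t in tweets if topic in tokens(t)]
    positive = sum(1 for t in on_topic if t & POSITIVE)
    negative = sum(1 for t in on_topic if t & NEGATIVE)
    return positive, negative


def sentiment_ratio(positive, negative):
    """Ratio of positive to negative message counts (infinite if no negatives)."""
    return positive / negative if negative else float("inf")
```

Computed per day over a large stream, a ratio like this gives a time series that can be compared against poll numbers, which is the idea the slide title refers to.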

  • Sentiment Analysis: From Tweets to polls

    B. O'Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In Intl. AAAI Conference on Weblogs and Social Media, Washington, D.C., 2010.

    Corpus: 0.7 billion tweets, Jan 2008 to Oct 2009; 1.5 billion tweets, Jan 2008 to May 2010

    Monday, June 6, 2011

  • Corpus: 2.89 million tweets referring to 24 movies released over a period of three months

    Sentiment Analysis Classifier: DynamicLMClassifier provided by the LingPipe linguistic analysis package. Thousands of workers from Amazon Mechanical Turk assign sentiments (positive, negative, neutral) for a large random sample of tweets. Train the classifier using an n-gram model.

    Sentiment Analysis: Predicting the Future With Social Media

    S. Asur and B. Huberman. Predicting the Future With Social Media. 2010. http://arxiv.org/abs/1003.5699

    Monday, June 6, 2011
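
To make "train the classifier using an n-gram model" concrete, here is a small character n-gram language-model classifier in Python. It is a simplified stand-in written for illustration and is not LingPipe's DynamicLMClassifier: the class name, the add-one smoothing, and the trigram default are all assumptions.

```python
import math
from collections import Counter, defaultdict


class CharNGramClassifier:
    """One add-one-smoothed character n-gram model per sentiment label."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)    # label -> n-gram counts
        self.context_counts = defaultdict(Counter)  # label -> (n-1)-gram counts
        self.doc_counts = Counter()                 # label -> training examples
        self.alphabet = set()

    def _grams(self, text):
        padded = " " * (self.n - 1) + text.lower()
        return [padded[i:i + self.n] for i in range(len(padded) - self.n + 1)]

    def train(self, text, label):
        self.doc_counts[label] += 1
        for gram in self._grams(text):
            self.ngram_counts[label][gram] += 1
            self.context_counts[label][gram[:-1]] += 1
            self.alphabet.add(gram[-1])

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab = len(self.alphabet) + 1  # one extra slot for unseen characters
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log prior plus the sum of smoothed log n-gram probabilities
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in self._grams(text):
                seen = self.ngram_counts[label][gram] + 1
                context = self.context_counts[label][gram[:-1]] + vocab
                score += math.log(seen / context)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In the setup the slide describes, the training pairs would be the tweets labeled positive, negative, or neutral by Mechanical Turk workers.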

  • Observations: The opinions may not contribute toward the given target (1,2,3,6). The subjectivity and polarity of opinion clues are domain-dependent (5,7). Single words are not enough (4,7,8).

    Simple lexicon-based method doesn't work.

    Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach

    Monday, June 6, 2011

  • General subjective lexicon: commonly used subjective lexicon + popular slang learned from Urban Dictionary

    Domain-dependent sentiment lexicon: learned from domain-specific corpus; bootstrapping

    More than words (word/phrase/pattern): n-gram + statistical model

    Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach

    Monday, June 6, 2011

  • Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach

    Target-specific opinion identification/extraction: shallow syntactic analysis; rules + proximity

    Monday, June 6, 2011
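
A toy version of a "rules + proximity" heuristic is sketched below: opinion words only count toward a target when they fall in the same sentence and within a fixed token window of it. The lexicon, the window size, and the sentence splitter are assumptions chosen for illustration; the slides do not specify the actual rules.

```python
import re

# Hypothetical mini lexicon: word -> polarity.
LEXICON = {"awful": -1, "danger": -1, "destroyed": -1, "great": 1}


def target_polarity(text, target, window=5):
    """Sum the polarity of opinion words near each mention of the target."""
    target_tokens = target.lower().split()
    n = len(target_tokens)
    score = 0
    # Rule: only look inside the sentence that mentions the target.
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z0-9']+", sentence)
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] == target_tokens:
                # Proximity: tokens within `window` positions of the mention.
                lo, hi = max(0, start - window), start + n + window
                score += sum(LEXICON.get(t, 0) for t in tokens[lo:hi])
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

On the example tweet from the Tasks slides, this sketch labels "egyptian artifacts" as negative and "Zahi Hawass" as neutral, matching the target-specific polarities given there.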

  • URL Extraction is for Tweets

    FourSquare in Facebook, Twitter

    What is it in other mediums/SMS?

    Content Analysis: Context Extraction, Utilization

    Monday, June 6, 2011

  • Resolution; Semantic Context; Relevance

    Content Analysis: URL extraction

    Monday, June 6, 2011

  • Personality signals: blogs, style of writing. Psychometric analysis of content. Sample study: gendered writing styles online.

    Author Categorization: Using Content to derive additional People metadata

    Monday, June 6, 2011

  • Interesting questions to ask: Who are the most popular people* in the network? Who are the most influential people in the network? Who are the most active people in the network? What are the types of people in communities of the network? Who are the bridges between communities in the network?

    People Analysis: Using Network to derive People metadata

    Monday, June 6, 2011

  • By link analysis algorithms: HITS [K-99] & variants, PageRank [BP-97] & variants, etc. Links not sufficient! Million Follower Fallacy [C-10]

    People Analysis: Influence

    Source : informing-arts

    Monday, June 6, 2011
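
As a reminder of what a link-analysis score looks like, here is a compact power-iteration PageRank over a follower-style graph. It is a textbook sketch under common default assumptions (damping 0.85, dangling nodes spread their rank evenly) and not a reproduction of any system from the tutorial.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    out_links maps each node to the list of nodes it links to.
    """
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is shared by everyone.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in out_links.items():
            for w in targets:
                new_rank[w] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank
```

A score like this depends only on who links to whom, which is the limitation the slide flags with "Links not sufficient!" and the Million Follower Fallacy.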

  • People Analysis: Influence

    Flavor of context analysis (activity level). Popularity NOT = Influence! Influence & Passivity [RGAH-10]. Interest similarity: TwitterRank, Reciprocity & Homophily [WLJH-10]. Klout Score: True Reach, Amplification [Klout]

    Monday, June 6, 2011

  • Blogger, Scientist, Journalist, Artist, Trustee, Company X in Domain Y.. Multiple types and affiliations! User interest mining: key phrase extraction followed by semantic association on user bio, tweets, lists, favorite posts. Twitter Study [BCDMJNRM-09]

    People Analysis: User types & Affiliation

    Source: kahunainstitute.com

    Monday, June 6, 2011

  • Semantic analysis of profile description. Web presence: use of Web & knowledge bases (Wikipedia, blogs) to build context for user types. Entity spotting & extraction, followed by semantic association and similarity with user-type context.

    People Analysis: User types & Affiliation

    Monday, June 6, 2011

  • People Analysis: Social Engagement

    Frequency Distribution Analysis of user activity posting, retweet, reply, mentions, lists etc.

    Source: http://www.syscomminternational.com/

    Monday, June 6, 2011

  • Network Analysis

    Interesting questions to ask:

    How communities form around topics- growth & evolution

    What are the effects of presence of influential participants in the communities

    What are the effects of content nature (or sentiment, opinions) flowing in network on the community life

    What is the community structure: degree of separation and sub-communities

    Foundation of network: Nodes, Connections/Relationships

    Monday, June 6, 2011

  • Network Analysis: Methods

    Source: http://www.kudos-dynamics.com/

    Network structure metrics: Centrality, Connected Component, Avg. Degree, Clustering Coefficient, Avg. Path Length, Bridge, Cohesion, Prestige, Reciprocity

    Important Literature: [AB-02, WS-98, BW-00; NW-06, WF-92, MW-10]

    Monday, June 6, 2011
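
Three of the structure metrics listed above (degree centrality, clustering coefficient, average path length) can be computed with a few lines of standard-library Python. The sketch assumes an undirected graph stored as a dict of neighbor sets; the function names are illustrative.

```python
from collections import deque


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = list(adj[v])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2 * links / (k * (k - 1))


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

For a triangle a-b-c with a fourth node d attached to c, node c has degree centrality 1.0 and clustering coefficient 1/3, and the average path length is 4/3.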

  • Community discovery, growth, evolution: based on relationship types (e.g., signed network), geography/location based etc. Hierarchical clustering algorithms: top-down, bottom-up. Modularity maximization [NW-06]. Algorithms comparison survey [B-06].

    Network Analysis: Algorithms

    Monday, June 6, 2011
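
Modularity maximization searches for the partition with the highest modularity score Q. The sketch below only evaluates Q for a given partition of an undirected, unweighted graph, using the usual definition Q = Σ_c [ L_c / m − (d_c / 2m)² ], where L_c is the number of edges inside community c, d_c the total degree of its nodes, and m the number of edges. The function name and input format are assumptions.

```python
def modularity(edges, community):
    """Modularity Q of a partition.

    edges: list of (u, v) pairs of an undirected graph without duplicates.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = {}  # community -> number of edges fully inside it
    degree = {}    # community -> sum of degrees of its nodes
    for u, v in edges:
        for node in (u, v):
            c = community[node]
            degree[c] = degree.get(c, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree
    )
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q of about 0.357, while putting every node in one community gives Q = 0. A maximization algorithm would prefer the first partition.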

  • Graph partitioning & traversal: best time-complexity & reachability; follow greedy paths. K-way multilevel partitioning, Bron-Kerbosch, K-plex, K-core or N-cliques, DFS, BFS, MST

    Network Analysis: Algorithms

    "We dream in Graph and We analyze in Matrix" - Barry Wellman, INSNA

    Monday, June 6, 2011

  • Network Analysis: Methods

    Network modeling approaches: random graph model (Erdős-Rényi model); small-world model (Small World Phenomenon); scale-free model (led to Power-Law degree distribution). Social network analysis methods: centrality (Degree, Eigenvector, Betweenness, Closeness); clusters (cliques and extensions, communities)

    Source: http://www.kudos-dynamics.com/

    Monday, June 6, 2011

  • Information flow: diffusion. Maximizing spread (opinion, innovation, recommendation). Outbreak detection (e.g., disease). Social network: no info about user action; understanding dynamics is challenging! Power Law distribution [LAH-07]. Factors impacting flow: sampling strategy, user homophily, content nature [CLSCK-10, NPS-10]

    Network Analysis: Diffusion & Homophily

    Monday, June 6, 2011

  • Querying

    Monday, June 6, 2011

  • NWB (Network WorkBench), Truthy, Graph-tool, Orange, Pajek, Tulip. http://en.wikipedia.org/wiki/social_network_analysis_software

    Analysis & Visualization Tools

    Source: http://truthy.indiana.edu/

    Monday, June 6, 2011

  • Event Detection

    Monday, June 6, 2011

  • Citizen Sensing in Real-time

    Monday, June 6, 2011

  • People can't wait for information

    500 years ago: single lifetime

    20 years ago: next day or two (television, newspapers)

    Presently: minutes are not considered fast enough (digital media, social media)

    Real-Time Motivation

    Monday, June 6, 2011

  • Is Real-Time the future of Web? Social Media for Real-Time Web. Examples: Disaster Management (Ushahidi), Real-Time Markets, Brand Tracking (Twarql), Movie reviews

    Real-Time Social Media

    Monday, June 6, 2011

  • Scenario

    Journalist

    The Guardian, Feb 2010

    Monday, June 6, 2011

  • Information overload: can we aggregate, organize and collectively analyze data?

    Real time: can we deliver the data as it is generated?

    Challenges

    Monday, June 6, 2011

  • Expressive description of information need: using SPARQL (instead of traditional keyword search)

    Flexibility on the point of view: ability to "slice and dice" the data in several dimensions: thematic, spatial, temporal, sentiment etc.

    Streaming data with background knowledge: enables automatic evolution and serendipity

    Scalable real-time delivery: using sparqlPuSH (SFSW'10)

    A Semantic Web Approach

    Monday, June 6, 2011

  • Concept Feed

    Monday, June 6, 2011

  • Architecture

    Monday, June 6, 2011

  • Social Sensor Server

    Monday, June 6, 2011

  • Named Entity Recognition: 2 million entities from DBpedia, loaded as a trie for efficiency; n-grams matched. Example: Obama, Barack Obama

    Metadata Extractions (Social Sensor Server)

    Monday, June 6, 2011
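
A trie keyed on tokens makes it cheap to find the longest entity name starting at each position of a tweet. The sketch below shows the idea on a three-entity dictionary; the function names, the whitespace tokenizer, and the longest-match policy are assumptions, and the real system loads about 2 million DBpedia entities.

```python
END = None  # sentinel key marking the end of an entity name (never a token)


def build_trie(entity_names):
    """Build a token-level trie from a list of entity names."""
    root = {}
    for name in entity_names:
        node = root
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node[END] = name
    return root


def spot_entities(text, trie):
    """Return entity names found in text, preferring the longest match."""
    tokens = text.lower().split()
    found, i = [], 0
    while i < len(tokens):
        node, j = trie, i
        match, match_end = None, i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                match, match_end = node[END], j
        if match is not None:
            found.append(match)
            i = match_end
        else:
            i += 1
    return found
```

With the entities "Obama", "Barack Obama" and "Ohio", the text "barack obama visits ohio" yields "Barack Obama" and "Ohio", while "obama speaks" yields "Obama". Punctuation handling and disambiguation are left out of the sketch.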

  • URL, hashtag extraction: regex extraction. Resolution: URL resolution follows HTTP redirects; hashtag resolution uses Tagdef, Tagal, WTHashTag.com

    Metadata Extractions (Social Sensor Server)

    Monday, June 6, 2011
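
The regex extraction step might look like the following. The two patterns are deliberately simple assumptions and not the ones used in the system; URLs are removed before hashtags are matched so that a URL fragment such as `#section` is not mistaken for a hashtag. Resolution (following HTTP redirects, looking up hashtag definitions) requires network access and is not shown.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_metadata(tweet):
    """Return the URLs and hashtags mentioned in a tweet."""
    urls = URL_RE.findall(tweet)
    text_without_urls = URL_RE.sub(" ", tweet)
    hashtags = HASHTAG_RE.findall(text_without_urls)
    return {"urls": urls, "hashtags": hashtags}
```

For example, `extract_metadata("Fingers crossed for the upcoming #hcrvote http://example.com/a#frag")` returns one URL and the single hashtag `hcrvote` (the URL is a made-up placeholder).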

  • Other metadata provided by Twitter. User profile: user name, location, time etc. Tweet: RT, reply etc.

    Metadata Extractions (Social Sensor Server)

    Monday, June 6, 2011

  • RDF annotation. Common RDF/OWL vocabularies:

    FOAF (foaf-project.org): Friend of a Friend

    SIOC (sioc-project.org): Semantically Interlinked Online Communities

    OPO (online-presence.net): Online Presence Ontology

    MOAT (moat-project.org): Meaning Of A Tag

    Structured Data (Social Sensor Server)

    Monday, June 6, 2011

  • A snippet of the annotation

    rdf:type sioct:MicroblogPost ;
    sioc:content "Fingers crossed for the upcoming #hcrvote" ;
    sioc:has_creator ;
    foaf:maker ;
    moat:taggedWith dbpedia:Healthcare_reform .

    geonames:locatedIn dbpedia:Ohio .

    Structured Data (Social Sensor Server)

    Monday, June 6, 2011

  • Virtuoso to store triples. Queries formulated by the users are stored. SPARQL protocol over HTTP to access RDF from the store. Combine data from tweets with the background knowledge in the RDF store.

    Semantic Publisher

    Monday, June 6, 2011

  • Distribution Hub: PUSH model (PubSubHubbub protocol); pushes the tweets to the Application Server

    Application Server: delivers data to the clients; RSS-enabled Concept Feeds

    Application Server & Distribution Hub

    Monday, June 6, 2011

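  • [Figure: query graph for brand tracking. ?tweet moat:taggedWith ?competitor; ?competitor skos:subject ?category; dbpedia:IPad skos:subject ?category. Background knowledge (e.g. DBpedia) supplies categories such as category:Wi-Fi and category:Touchscreen, and competitors such as HPTabletPC and IPhone. Example tweet: "@anonymized Lorem ipsum bla bla this is an example tweet"]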

    Brand Tracking - Example

    Monday, June 6, 2011
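
The brand-tracking query joins streaming tweet annotations with background knowledge: find tweets tagged with any entity that shares a category with the iPad. The sketch below emulates that join over an in-memory set of triples instead of a SPARQL endpoint. The triples themselves (which product belongs to which category, the tweet identifiers) are invented for illustration, and the SPARQL text in the comment is an assumed reading of the figure.

```python
# Assumed shape of the query in the figure:
#   SELECT ?tweet ?competitor WHERE {
#     dbpedia:IPad  skos:subject    ?category .
#     ?competitor   skos:subject    ?category .
#     ?tweet        moat:taggedWith ?competitor .
#   }

# Hypothetical background knowledge plus two annotated tweets.
TRIPLES = {
    ("dbpedia:IPad", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:IPad", "skos:subject", "category:Touchscreen"),
    ("dbpedia:IPhone", "skos:subject", "category:Touchscreen"),
    ("dbpedia:HPTabletPC", "skos:subject", "category:Touchscreen"),
    ("tweet:1", "moat:taggedWith", "dbpedia:IPhone"),
    ("tweet:2", "moat:taggedWith", "dbpedia:Ohio"),
}


def competitor_tweets(product, triples):
    """Tweets tagged with an entity that shares a category with `product`."""
    categories = {
        o for s, p, o in triples if s == product and p == "skos:subject"
    }
    competitors = {
        s
        for s, p, o in triples
        if p == "skos:subject" and o in categories and s != product
    }
    return sorted(
        (s, o)
        for s, p, o in triples
        if p == "moat:taggedWith" and o in competitors
    )
```

With these toy triples, `competitor_tweets("dbpedia:IPad", TRIPLES)` returns only the tweet tagged with the iPhone. The user never names the competitors; they are discovered through the shared category in the background knowledge.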

  • 1242 articles from NYTimes

    Around 800,000 tweets

    President Obama lays out plan for health care reform in speech to joint session of Congress (10th Sept, Timeline.com)

    Obama taking an active role in health talks in pursuing his proposed overhaul of health care system. (13th Aug, NYTimes)

    Monday, June 6, 2011

  • Twarql on Linked Open Data

    Monday, June 6, 2011

  • Emerging Research Areas

    Monday, June 6, 2011

  • Reasons for spamming include:

    Gaining popularity: use of popular topic-related keywords (e.g. hashtags of trending topics) to propagate something off topic.

    Launching malicious attacks: phishing attacks, virus, malware etc.

    Misleading the masses: propagating false information [MM-10].

    Spam in Social Networks

    Monday, June 6, 2011

  • Spam in Social Networks

    Gaining popularity using trending keywords: this tweet uses #Cairo but refers to a fashion website.

    Egypt Protests

    Monday, June 6, 2011

  • Spam detection

    Content-based features: content size, URL type, spam words

    Metadata-based features: account information, behavior

    Network-based features: provenance (e.g. content from a reliable source)

    Spam in Social Networks

    Monday, June 6, 2011

  • Reputation, policy, evidence, and provenance are used to derive trustworthiness. Illustrative examples of online cues used for trust assessment:

    Wikipedia: article size, number of references, author, edit history, age of the article, edit frequency etc.

    Product reviews: number of helpful / very helpful ratings, author expertise, sentiments in comments received for a review etc.

    Trust in Social Networks

    Monday, June 6, 2011

  • We propose a trust ontology [AHTS-10] that captures the semantics of trust and enables representation and reasoning with trust. Semantics of trust specifies, for a given trustor and trustee, the following features. Type: type of trust relationship. Scope: context of the trust relationship. Value: quantifies the trust relationship.

    Trust in Social Networks

    Monday, June 6, 2011

  • Gleaning primitive (edge) trust: trust value between two nodes is quantified using numbers, e.g., [0,1] or [-1,1], or a partial ordering [TAHS-09]. Gleaning composite (path) trust: propagation via chaining and aggregation (transitivity). Some popular algorithms for trust computation: EigenTrust, Spreading Activation, SUNNY etc.

    Trust in Social Networks

    Monday, June 6, 2011
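
Chaining and aggregation can be illustrated with edge trust values in [0, 1]. In the sketch below, chaining multiplies the values along a path and aggregation takes the best path. These two operators are a common simple choice and are assumptions here; they are not the definitions used by EigenTrust, SUNNY, or the trust ontology cited above.

```python
from math import prod


def path_trust(edge_trust, path):
    """Chaining: multiply edge trust values along a path of nodes."""
    return prod(edge_trust[(a, b)] for a, b in zip(path, path[1:]))


def aggregate_trust(edge_trust, paths):
    """Aggregation: combine alternative paths by taking the most trusted one."""
    return max(path_trust(edge_trust, path) for path in paths)
```

If a trusts b at 0.9 and b trusts c at 0.8, the chained trust of a in c along that path is 0.72. Given a second path through d worth 0.5 × 0.6 = 0.30, aggregation by maximum keeps 0.72.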

  • Machine sensor observations are quantitative in nature, while human observations can be both qualitative and quantitative. Benefits of combining observations from humans and machine sensors: complementary evidence; corroborative evidence.

    Integrating Social And Sensor Networks

    Monday, June 6, 2011

  • Applications of integrating heterogeneous sensor observations: situation awareness by using human observations to interpret machine sensor observations; enhancing trustworthiness using corroborative evidence.

    Integrating Social And Sensor Networks

    Monday, June 6, 2011

  • Instant Discovery: Geo-tagging and location-aware services, in combination with search, have made discovery a two-way street.

    Compressed Expression: Mobile makes social networking even more compelling

    Outsourced Memory: Cloud-based servers to store all of their mobile applications and databases

    Mobile Social Computing

    Monday, June 6, 2011

  • Mobile Social Computing

    Automated Decisions: smart apps help us make faster decisions, or even make decisions for us. Peer Power: mobiles can create social movements based on peer influence.

    Monday, June 6, 2011

  • Personalized Branding: advertising is rapidly becoming personalized based on the individual's needs and preferences. Mobiles in social development are becoming an integral part of development: coordination in disaster situations; health care delivery, especially in developing countries; elections and other forms of political expression.

    Mobile Social Computing (Cont.)

    Monday, June 6, 2011

  • Research Application: Twitris

    Monday, June 6, 2011

  • 1. Information overload: multiple events around us. WHAT to be aware of? Multiple storylines about the same event!!

    Twitris - Motivation

    Monday, June 6, 2011

  • 2. Evolution of Citizen Observation with location and time

    Twitris - Motivation

    Monday, June 6, 2011

  • 3. Semantics of Social perceptions

    What is being said about an event (theme), where (spatial), when (temporal)

    Twitris lets you browse citizen reports using social perceptions as the fulcrum

    Twitris - Motivation
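
The theme/space/time browsing described on this slide can be illustrated with a minimal sketch. It is not the Twitris implementation: the `Report` type, the `build_index` and `browse` helpers, and the sample reports are all hypothetical, made up only to show how reports might be indexed along the three dimensions.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Report(NamedTuple):
    text: str
    theme: str      # what: the event the report is about
    location: str   # where: a coarse place label
    day: date       # when: the day the report was posted


def build_index(reports):
    """Group report texts by (theme, location, day)."""
    index = defaultdict(list)
    for r in reports:
        index[(r.theme, r.location, r.day)].append(r.text)
    return index


def browse(index, theme=None, location=None, day=None):
    """Return texts matching every dimension that is not None."""
    hits = []
    for (t, loc, d), texts in index.items():
        if theme in (None, t) and location in (None, loc) and day in (None, d):
            hits.extend(texts)
    return hits


# Made-up sample reports, for illustration only.
reports = [
    Report("Roads blocked near the port", "earthquake", "Haiti", date(2010, 1, 13)),
    Report("Water needed downtown", "earthquake", "Haiti", date(2010, 1, 14)),
    Report("Aftershock felt this morning", "earthquake", "Chile", date(2010, 2, 28)),
]
index = build_index(reports)
haiti_reports = browse(index, theme="earthquake", location="Haiti")
```

Leaving a dimension as `None` widens the view, so the same index can answer "everything about this event" as well as "this event, in this place, on this day".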

  • Facilitates understanding of multi-dimensional social perceptions over SMS, tweets, multimedia Web content, and electronic news media

    Twitris: Semantic Social Web Mash-up

  • Twitris: Architecture

  • Twitris: Functional Overview

  • Twitris: Event Summarization 1

  • Sentiment Analysis using statistical and machine learning techniques

    Twitris: Event Summarization 2
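
The slide does not say which statistical or machine learning technique is used. As one possible illustration, the sketch below trains a tiny multinomial Naive Bayes classifier with add-one smoothing. The technique, the `train` and `classify` helpers, and the four training sentences are assumptions chosen for brevity, not the method used in Twitris.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect the counts a Naive Bayes model needs from (text, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()             # label -> number of training texts
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training sentences, for illustration only.
training = [
    ("great rescue effort thank you", "positive"),
    ("so proud of the volunteers", "positive"),
    ("terrible delays no water", "negative"),
    ("angry about the slow response", "negative"),
]
model = train(training)
label = classify("thank you volunteers", model)
```

A real system would need far more training data and tokenization that copes with hashtags, mentions, and informal spelling.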

  • Entity-relationship graph

    using semantically annotated DBpedia entities mentioned in the tweets

    Twitris: Event Summarization 3
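
A rough sketch of how a graph over entities mentioned in tweets might be assembled follows. It is a simplification: edges here only count how often two entities appear in the same tweet, whereas the slide refers to relationships between semantically annotated DBpedia entities. The `annotated_tweets` data stands in for the output of a hypothetical entity annotator, and `cooccurrence_graph` is an illustrative helper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotator output: the DBpedia resource names found in each tweet.
annotated_tweets = [
    {"Haiti", "Red_Cross"},
    {"Haiti", "Port-au-Prince", "Red_Cross"},
    {"Haiti", "Port-au-Prince"},
]


def cooccurrence_graph(entity_sets):
    """Count, for every unordered entity pair, the tweets that mention both."""
    edges = Counter()
    for entities in entity_sets:
        # Sorting makes (a, b) and (b, a) land on the same edge key.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


graph = cooccurrence_graph(annotated_tweets)
```

Edge weights from such a graph could be used to decide which entities to show next to each other in an event summary.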

  • http://twitris.knoesis.org/

    http://knoesis1.wright.edu/sidfot/

    Twitris: Demo, Quick Show

  • Twitris: Ongoing Work

  • Domain models to enhance understanding of the content

    Twitris: Knowledge-Enabled Computing

  • Great role in military and NGO rescue operations during emergencies: Haiti and Chile earthquakes

    Twitris: Coordination

  • Coordinating needs and resources in disaster situations

    Analyze SMS and Web reports from disaster location

    Use domain models for efficient and timely coordination

    Twitris: Coordination
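
As a sketch of how a domain model could help route incoming SMS and Web reports, the snippet below tags a report with need categories using a small hand-written vocabulary. The `DOMAIN_MODEL` contents and the `categorize` helper are hypothetical; a real domain model would be far richer than a keyword list.

```python
import re

# Hypothetical, hand-written domain model: need category -> indicative terms.
DOMAIN_MODEL = {
    "water": {"water", "thirsty", "drinking"},
    "medical": {"injured", "doctor", "medicine", "bleeding"},
    "shelter": {"shelter", "tent", "homeless"},
}


def categorize(report):
    """Return the sorted need categories whose terms appear in the report."""
    words = set(re.findall(r"[a-z]+", report.lower()))
    return sorted(cat for cat, terms in DOMAIN_MODEL.items() if words & terms)


needs = categorize("Family of five, two injured, no drinking water.")
```

Reports that match no category come back with an empty list, which could be used to queue them for manual review.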

  • Modeling relationships between social behavior, roles, social and cultural values, etc.

    Twitris: Socio-Cultural-Behavior Model as Lens

  • We simply do not have enough genes to program the brain fully in advance, we must work together, extending and supporting our own intelligence with social prosthetic systems that make up for our missing cognitive and emotional capacities: Evolution has allowed our brains to be configured during development so that we are plug compatible with other humans, so that others can help us extend ourselves.

    - Harvard "Group Brain Project"

    Collaboration

  • Open Source: Linux, Apache, ...

    Social Networks: Facebook, Twitter, ...

    Crowd Sourcing: Wikipedia, Kiva, Ushahidi, Kiirti, SwiftRiver, Sahana, ...

    Collaborative Governance: Peer-to-Patent, ...

    Beginnings

  • http://gomadam.org/tutorial

    @namelessnerd

  • Facebook + Twitter: Iran post-election protests; Tunisia, Egypt, Libya, Bahrain, ...

    Ushahidi: Kenya violence; India, Lebanon, Afghanistan, and Sudan elections; Haiti earthquake; Pakistan floods

    Popular Initiatives

  • Kiirti: BBMP election monitoring; Bangalore AutoWatch

    Popular Initiatives

  • FixOurCity allows citizens to report, view and discuss civic issues in their locality.

    FixOurCity Process Flow

  • Built on top of the FixMyCity open-source codebase

    Stage I:

    Report by Area/Ward and Street

    Integration with Google Maps

    Displays Ward member name/contact details

    Select category of issue, description and severity

    Confirmation through email to avoid misuse

    FixOurCity Backend

  • Stage II/III:

    Normalize incoming reports to official wards and categories

    Integration with Corporation website to allow auto-forwarding and updating of reports

    FixOurCity Backend
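
The normalization step could look roughly like the sketch below, which maps free-text ward and category values onto official names. The lookup tables, the ward and category names, and the `normalize` helper are illustrative assumptions, not taken from the FixOurCity codebase.

```python
def _key(value):
    """Lower-case and collapse whitespace so lookups tolerate sloppy input."""
    return " ".join(value.lower().split())


# Hypothetical lookup tables: official names plus common misspellings.
WARDS = {_key(w): w for w in ("Jayanagar", "Koramangala")}
WARDS.update({"jaya nagar": "Jayanagar", "koramangla": "Koramangala"})
CATEGORIES = {
    "pothole": "Roads",
    "road damage": "Roads",
    "garbage": "Solid Waste",
    "trash": "Solid Waste",
}


def normalize(report):
    """Return a copy of the report with official ward and category names.

    Unrecognized values become None so a moderator can resolve them.
    """
    return {
        **report,
        "ward": WARDS.get(_key(report["ward"])),
        "category": CATEGORIES.get(_key(report["category"])),
    }


clean = normalize({
    "ward": "  Jaya  Nagar ",
    "category": "Pothole",
    "description": "Deep pothole near the bus stop",
})
```

Keeping the original description untouched while normalizing only the structured fields makes it possible to forward the report automatically without losing what the citizen wrote.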

  • Information Collection: SMS (FrontlineSMS, Clickatell), Email, Web

    Visualization/Interactive Mapping: Timeline, Category, Geo-spatial

    Alerts: Geo-spatial

    Admin: User Management, Report Moderation / Creation, Site Statistics

    Ushahidi Features
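
One way a geo-spatial alert could be decided is to check whether a new report falls within a subscriber's radius. The sketch below uses the haversine great-circle distance; the data layout, the `subscribers_to_alert` helper, and the coordinates are assumptions for illustration and do not reflect Ushahidi's actual code.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def subscribers_to_alert(report, subscriptions):
    """Return the emails of subscribers whose alert radius covers the report."""
    return [
        s["email"]
        for s in subscriptions
        if haversine_km(report["lat"], report["lon"], s["lat"], s["lon"]) <= s["radius_km"]
    ]


# Made-up subscriptions and report, for illustration only.
subscriptions = [
    {"email": "a@example.org", "lat": -1.2921, "lon": 36.8219, "radius_km": 20},
    {"email": "b@example.org", "lat": -4.0435, "lon": 39.6682, "radius_km": 20},
]
report = {"lat": -1.30, "lon": 36.80}
alerts = subscribers_to_alert(report, subscriptions)
```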

  • Enables filtering and verification of real-time data from channels like Twitter, SMS, Email and RSS feeds.

    SwiftRiver Architecture - I

  • Kiirti allows you to set up your own instance of the Ushahidi Platform without having to install it on your own web server. It also provides pre-integrated Voice and SMS reporting capabilities within India.

    Kiirti Features

  • Kiirti - Flywheel of Engagement

  • Sahana: a Free and Open Source Disaster Management system. A web based collaboration tool that addresses the common coordination problems during a disaster between Government groups, the civil society (NGOs) and the victims themselves.

    Sahana Features

  • Requests Management: Tracks requests for aid and matches them against donors who have pledged aid.

    Volunteer Management: Manage volunteers by capturing their skills, availability and allocation.

    Sahana Features
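
Matching requests against pledges can be sketched as a simple greedy allocation per item, shown below. This is only an illustration of the idea: the record layout and the `match_requests` helper are hypothetical and are not Sahana's data model or algorithm.

```python
def match_requests(requests, pledges):
    """Greedily allocate pledged quantities to requests for the same item.

    Returns (allocations, unmet): allocations are (site, donor, item, quantity)
    tuples, and unmet lists (site, item, quantity) still outstanding.
    """
    remaining = {}  # item -> list of [donor, quantity left]
    for p in pledges:
        remaining.setdefault(p["item"], []).append([p["donor"], p["quantity"]])

    allocations, unmet = [], []
    for req in requests:
        needed = req["quantity"]
        for entry in remaining.get(req["item"], []):
            if needed == 0:
                break
            give = min(needed, entry[1])
            if give:
                allocations.append((req["site"], entry[0], req["item"], give))
                entry[1] -= give
                needed -= give
        if needed:
            unmet.append((req["site"], req["item"], needed))
    return allocations, unmet


# Made-up requests and pledges, for illustration only.
requests = [
    {"site": "Shelter A", "item": "blankets", "quantity": 100},
    {"site": "Shelter B", "item": "water", "quantity": 50},
]
pledges = [
    {"donor": "Donor X", "item": "blankets", "quantity": 60},
    {"donor": "Donor Y", "item": "blankets", "quantity": 30},
]
allocations, unmet = match_requests(requests, pledges)
```

The `unmet` list is the part a coordinator would act on, since it shows which requests still need a donor.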

  • Missing Persons Registry: Report and search for missing persons.

    Disaster Victim Identification.

    Shelter Registry: Tracks the location, distribution, capacity and breakdown of victims in shelters.

    Sahana Features

  • Hospital Management System: Hospitals can share information on resources & needs.

    Organization Registry: "Who is doing What & Where". Allows relief agencies to coordinate their activities.

    Ticketing: Master Message Log to process incoming reports & requests.

    Delphi Decision Maker: Supports the decision making of large groups of experts.

    Sahana Features

  • Mapping: Situation Awareness & Geospatial Analysis.

    Messaging: Sends & receives alerts via Email & SMS.

    Document Library: A library of digital resources, such as photos & office documents.

    Sahana Features

  • Peer To Patent is a historic initiative by the United States Patent and Trademark Office (USPTO) that opens the patent examination process to public participation for the first time. Peer to Patent is an online system that aims to improve the quality of issued patents by enabling the public to supply the USPTO with information relevant to assessing the claims of pending patent applications.

    Peer to Patent

  • Twitris 2.0 is a Semantic Web application that facilitates understanding of social perceptions through semantics-based processing of massive amounts of event-centric data. It addresses challenges in large-scale processing of social data while preserving spatio-temporal-thematic properties.

    Twitris Architecture

  • Online Dispute Resolution: 30M+ pending cases in India's courts

    Public Policy Reviews

    Crisis Management

    Effective Local Governance

    Future Possibilities

  • http://www.nascio.org/events/2009Midyear/documents/NASCIO-KeynoteNoveck.pdf

    http://citizensensing.posterous.com/

    [MM-10] Eni Mustafaraj and Panagiotis Metaxas. From Obscurity to Prominence in Minutes: Political Speech and Real-Time Search. In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, April 2010.

    [AHTS-10] Pramod Anantharam, Cory A. Henson, Krishnaprasad Thirunarayan, and Amit P. Sheth. Trust Model for Semantic Sensor and Social Networks: A Preliminary Report. National Aerospace & Electronics Conference (NAECON), Dayton, Ohio, July 14-16, 2010.

    [TAHS-09] K. Thirunarayan, Dharan K. Althuru, Cory A. Henson, and Amit P. Sheth. A Local Qualitative Approach to Referral and Functional Trust. In: Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI-09), pp. 574-588, December 2009.

    References

  • B. O'Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith. From Tweets to polls: Linking text sentiment to public opinion time series. In International AAAI Conference on Weblogs and Social Media, Washington, D.C., 2010.

    Sitaram Asur and Bernardo A. Huberman. Predicting the Future With Social Media. 2010. http://arxiv.org/abs/1003.5699

    A. Sheth. Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness. February 17, 2009.

    M. Nagarajan et al. Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences. Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009, Poland.

    Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, and Amit Sheth. Multimodal Social Intelligence in a Real-Time Dashboard System. To appear in a special issue of the VLDB Journal on 'Data Management and Mining for Social Networks and Social Media', 2010.

    References

  • A. Sheth, C. Thomas, and P. Mehra. Continuous Semantics to Analyze Real-Time Data. IEEE Internet Computing, November-December 2010, pp. 80-85.

    [NPS-10] M. Nagarajan, H. Purohit, and A. Sheth. A Qualitative Examination of Topical Tweet and Retweet Practices. 4th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2010.

    [RGAH-10] D. Romero, W. Galuba, S. Asur, and B. Huberman. Influence and Passivity in Social Media. Arxiv preprint, arXiv:1008.1253, 2010.

    [LLDM-10] J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29-123, 2009.

    [CHBG-10] M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi. Measuring user influence in twitter: The million follower fallacy. In ICWSM, 2010.

    [BP-98] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, Vol 30, 1-7, 1998.

    References

  • [K-99] Jon Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM 46 (5): 604-632, 1999.

    [AB-02] R. Albert and A.L. Barabasi. Statistical Mechanics of Complex Networks. Rev. Modern Physics, vol. 74, no. 1, pp. 47-97, 2002.

    [WLJH-10] Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. TwitterRank: finding topic-sensitive influential twitterers. WSDM, 2010.

    [BCDMJNRM-09] N. Banerjee, D. Chakraborty, K. Dasgupta, S. Mittal, A. Joshi, S. Nagar, A. Rai, and S. Madan. User interests in social media sites: an exploration with micro-blogs. CIKM '09.

    [RCD-10] A. Ritter, C. Cherry, and B. Dolan. 2010. Unsupervised modeling of Twitter conversations. In Human Language Technologies: ACL (HLT '10).

    [WS-10] D.J. Watts and S.H. Strogatz. Collective dynamics of 'small-world' networks. Nature 393 (6684): 440-442, 1998.

    References

  • [NW-06] M. E. J. Newman, D. J. Watts. The structure and dynamics of networks. Princeton University Press, 2006.

    [WF-92] Wasserman & Faust. Social Network Analysis, 1992.

    [EK-10] D. Easley, J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.

    [MW-10] A. Marin and B. Wellman. Handbook of Social Network Analysis, 2010.

    [B-06] H. Balakrishnan. Algorithms for Discovering Communities in Complex Networks. Ph.D. Dissertation, University of Central Florida, Orlando, FL, USA. Advisor: Narsingh Deo. 2006.

    [CLSCK-10] M. D. Choudhury, Y-R. Lin, H. Sundaram, K. S. Candan, L. Xie, and A. Kelliher. How Does the Sampling Strategy Impact the Discovery of Information Diffusion in Social Media? ICWSM 2010.

    [LAH-07] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Trans. Web 1, 1, Article 5, May 2007.

    References
