social tagging system combining semantic technology

Post on 28-Aug-2014

Social Tagging System Combined with Semantic Technology

Ms. Mya Thandar

Master's student of Information, Computer and Communication Technology

Sirindhorn International Institute of Technology, Thammasat University, Thailand

Outline

1. About Social Tagging System
1.1 Tag and Social Tagging
1.2 Social Tagging System and Folksonomy
1.3 Characteristics of Tagging Systems
1.4 Usage of Social Tagging System (Folksonomy)
1.5 Problems of Social Tagging System (Folksonomy)

2. Solutions for Folksonomy Combining with Semantics Technology
2.1 Approaches related to introducing semantics in folksonomy
2.1.1 Statistical Approaches
2.1.2 External Knowledge Source Based Approaches
2.1.3 Hybrid Approaches
2.2 Hierarchical Categorization
2.2.1 Classification of tags
2.2.2 Document Clustering
2.2.3 Wikipedia categories and out links
2.3 Other Semantic Supporting Aspects
2.3.1 Tag Spam Removal
2.3.2 Multilingualism

3. Conclusion

1.1) Tags

• Labels used to associate meaning with a resource: photo, video/audio, book, web page or other resource

• User-defined, informal, public

• Used in searching

• Easy to use and understand

• Can be created using words, acronyms or numbers

Example of Tagging …

running

Boy

boy playing dog

dog

brown dog

running dog

1.1) What Is Social Tagging?

• Uses keyword descriptions (called tags) to identify images or text as belonging to a category or topic.

• Allows users to search for similar or related content.

• Categorizes resources in a shared, online environment.

• Tags are keywords generated by internet users to describe and categorize an object, concept or idea.

1.2) Social Tagging System

• A term for the marking, saving and archiving of certain websites.

• Lets users track, organize and access their favorite websites.

• Also known as folksonomy.

1.2) Folksonomy

• A combination of “folks” and “taxonomy”.

• folks = common people of a society

• taxonomy = a hierarchical structure of classification

• A system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content.

• A collective assemblage of tags assigned by many users.

• The system of tagging used in social bookmarking and similar services.

1.3) Characteristics of Tagging Systems

• Tagging rights (who can tag what)

• Tagging support (whether or not you see other tags or if tags are suggested)

• Aggregation (duplicate tags for the same resource)

• Type of object (what is tagged)

• Source of material (from participants, system or Web)

• Resource connectivity (links using tags or not)

• Social connectivity (links between users)

1.4) Usage of Social Tagging System (Folksonomy)

• Freely chosen keywords

• No restrictions

• User-generated and collaborative

• Content is identified using labels called “tags”

• These tags are chosen by the user and used to find the content later

• Available to the public

• Can be added by others

• Provide meaning (metadata) for the content

• Also used to generate communication about the content

• Used for sharing knowledge

Sites that use Folksonomies

• flickr.com – photo sharing

• Delicious.com – social bookmarking

• YouTube.com – video sharing

• librarything.com – book cataloging

• amazon.com – users can tag items

1.5) Problems of Social Tagging System (Folksonomy)

• People interpret different words in different ways

• There is no official standard vocabulary for tagging

• It is difficult to show or describe hierarchical relationships

1.5) Problems of Social Tagging System (Cont.)

1. The organization is unstructured and non-hierarchical, with an unsupervised vocabulary.

2. Tag meanings become ambiguous because of:
   i. spelling mistakes
   ii. different lexical forms of the same word
   iii. polysemy (same word/different sense)
   iv. homography (same spelling/different meaning)
   v. synonymy (same sense/different words)
   vi. incorrect tag-to-resource associations
   vii. different levels of tag precision and abstraction

3. The tag space is inconsistent, inefficient and noisy.

4. Precision and recall in search results are reduced.
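Some of the surface-level problems listed above (spelling variants and different lexical forms of the same word) can be reduced by normalizing tags before they are compared. The sketch below is only an illustration: the `SYNONYMS` table and both function names are hypothetical, and this kind of normalization does not resolve polysemy or homography, which need semantic context.

```python
import re

# Hypothetical hand-made table mapping variant tags to one canonical tag.
SYNONYMS = {"nyc": "newyork", "bigapple": "newyork"}


def normalize_tag(tag):
    """Lowercase a tag, drop separators and punctuation, then map known variants."""
    # Keeps only ASCII letters and digits, for brevity of the example.
    key = re.sub(r"[^a-z0-9]", "", tag.lower())
    return SYNONYMS.get(key, key)


def merge_tags(tags):
    """Count tag assignments after normalization, so variants share one entry."""
    counts = {}
    for tag in tags:
        canonical = normalize_tag(tag)
        counts[canonical] = counts.get(canonical, 0) + 1
    return counts


merged = merge_tags(["New York", "new_york", "NYC", "Dog"])
```

Here the three spellings of the city collapse into a single entry, which makes the tag space less noisy for search.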

2) Solutions for Folksonomy Combining with Semantics Technology

2.1) Approaches related to introducing semantics in folksonomy

2.1.1) Statistical Approaches

2.1.2) External Knowledge Source Based Approaches

2.1.3) Hybrid Approaches

2.1.1) Statistical Approaches

A. the selection of tags

B. user vocabulary of tags

C. tag-resource similarity

D. mutual contextualization of users, tags and resources

A) The selection of tags

• Different statistical metrics are presented for selecting the tags shown in a tag cloud.

• The metrics decide on the selection, cohesiveness, quality, popularity, independence and relevancy of tags for a given query in the tag cloud.
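As a rough illustration of metric-based tag selection, the sketch below uses only one of the metrics named above, popularity (raw frequency), to pick tags and scale their display size. The function name, the parameters and the linear scaling are assumptions made for this example, not the method of any surveyed paper.

```python
from collections import Counter


def build_tag_cloud(tag_assignments, top_n=5, min_size=10, max_size=30):
    """Select the top_n most frequent tags and scale font sizes linearly."""
    top = Counter(tag_assignments).most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]   # count of the most popular selected tag
    lowest = top[-1][1]   # count of the least popular selected tag
    span = highest - lowest
    cloud = {}
    for tag, count in top:
        if span == 0:
            size = max_size  # all selected tags are equally popular
        else:
            size = min_size + (count - lowest) * (max_size - min_size) / span
        cloud[tag] = round(size)
    return cloud


tags = ["dog"] * 5 + ["boy"] * 3 + ["running"]
cloud = build_tag_cloud(tags, top_n=3)
```

A fuller selection scheme would combine popularity with the other metrics listed, such as relevancy to the query and independence between tags.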

B) User vocabulary of tags

• Folksonomy tags are linked, based on collaborative tagging by users, using the co-occurrence of tags, users and resources to form a semantically connected folksonomy network.
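A minimal sketch of the co-occurrence idea, assuming each post is simply the list of tags one user gave one resource: two tags are linked whenever they appear in the same post, and the edge weight counts how often that happens. The function name and the data layout are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(posts):
    """Build weighted tag-tag edges from posts.

    posts: iterable of tag lists, one list per (user, resource) post.
    Returns {(tag_a, tag_b): count}, with tag_a < tag_b alphabetically.
    """
    weights = defaultdict(int)
    for tags in posts:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return dict(weights)


posts = [
    ["dog", "running"],
    ["dog", "boy", "running"],
    ["boy", "dog"],
]
network = cooccurrence_network(posts)
```

Edges with higher weights suggest tags that users tend to apply together, which is the raw signal that statistical approaches build on.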

C) Tag-resource similarity

• Focused on tag-resource similarity.

• Used different methods of aggregation (projection, distribution, incremental, collaborative filtering).

• Evaluated them based on similarity measures such as cosine, overlap and mutual information.
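Two of the similarity measures named above can be sketched as follows. The example assumes each tag is represented by a sparse vector counting how often it was attached to each resource; that representation and the function names are illustrative assumptions. Mutual information is omitted for brevity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def overlap(u, v):
    """Overlap coefficient on the key sets: |A & B| / min(|A|, |B|)."""
    a, b = set(u), set(v)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical data: resource -> number of times the tag was assigned to it.
dog = {"photo1": 2, "photo2": 1}
puppy = {"photo1": 1, "photo3": 1}

cos_score = cosine(dog, puppy)
overlap_score = overlap(dog, puppy)
```

Cosine takes the counts into account, while the overlap coefficient only looks at which resources the two tags share.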

D) Mutual contextualization of users, tags and resources

• Analyzed the semantics emerging from the bipartite graphs of all three elements of a folksonomy (users, tags and resources).

Fig 1. A bipartite graph
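One way to picture these graphs, assuming the folksonomy is stored as (user, tag, resource) triples, is to project the triples onto the three pairwise bipartite graphs. The function name and the edge-weight convention below are assumptions for illustration.

```python
from collections import defaultdict


def bipartite_projections(assignments):
    """Project (user, tag, resource) triples onto three bipartite edge maps.

    Each edge weight is the number of triples that contain that pair.
    """
    user_tag = defaultdict(int)
    tag_resource = defaultdict(int)
    user_resource = defaultdict(int)
    for user, tag, resource in assignments:
        user_tag[(user, tag)] += 1
        tag_resource[(tag, resource)] += 1
        user_resource[(user, resource)] += 1
    return dict(user_tag), dict(tag_resource), dict(user_resource)


# Hypothetical tag assignments.
triples = [
    ("ann", "dog", "p1"),
    ("ann", "running", "p1"),
    ("bob", "dog", "p1"),
    ("bob", "dog", "p2"),
]
user_tag, tag_resource, user_resource = bipartite_projections(triples)
```

Each projection gives a different context for a tag: who uses it, what it is attached to, and which users share resources.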

Brief view of Statistical Approaches

• Represent folks' pragmatic behavior, in the sense of how much they contribute to emerging semantics.

• Researchers identified, and showed experimentally, a specific partition of taggers that adds semantic precision to a folksonomy.

2.1.2) External Knowledge Source Based Approaches

A. DBpedia-Based Approaches

B. Hybrid Approaches

A) DBpedia-Based Approaches

• DBpedia is the Semantic Web version of Wikipedia.

• It can be used to ask sophisticated queries against Wikipedia.

• It extracts structured information from Wikipedia so that Semantic Web techniques can be applied to it.

A) DBpedia-Based Approaches (Cont.)

• Keywords are mapped to DBpedia resources, and DBpedia's ontological structure is used to enrich their meaning; results are shown in the form of a tag cloud.

• Resources are ranked with a hybrid ranking algorithm, based on their relevance to the query and to other connected nodes in the DBpedia graph, rather than by calculating each resource's importance separately as is done in the PageRank algorithm.

2.1.3) Hybrid Approaches

• Achieve better precision than purely statistical approaches with respect to semantics.

• Map tags to WordNet for context similarity.

2.2) Hierarchical Categorization

2.2.1) Classification of tags

2.2.2) Document Clustering

2.2.3) Wikipedia categories and out links

2.2.1) Classification of tags

• Tags are classified into categories that consist of context-based or content-based tags.

• Semantics for them are obtained from Wikipedia and WordNet, with significant accuracy results reported.

2.2.2) Document Clustering

• Uses a hierarchical algorithm, with Wikipedia as an external knowledge source.

• Frequent item sets (sets of words that occur frequently and can be used for making clusters)

• Topic detection within a document, and clustering of that document with other documents.

• Uses tf-idf scores assigned to each document in a cluster.
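For reference, a minimal sketch of tf-idf scoring is shown below. It uses the common textbook form, term frequency times log(N / document frequency); the surveyed work may use a different weighting, and the function name and toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute tf-idf scores for tokenized documents.

    documents: list of token lists.
    Returns one {term: score} dict per document.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [["dog", "run"], ["dog", "cat"]]
scores = tf_idf(docs)
```

A term that appears in every document, such as "dog" here, scores zero, while terms that distinguish a document score higher; these scores are what a clustering step can compare.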

2.2.3) Wikipedia categories and out links

• Each cluster is labeled with the Wikipedia categories whose occurrence frequency is in the top k over all documents in the cluster.

• The evaluation was based on five standard data sets, and the authors claimed that their results outperform the current state-of-the-art methods.
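The top-k labeling step can be sketched as below, assuming each document in a cluster already comes with a list of Wikipedia category names. The function name and the sample categories are hypothetical, and how documents are mapped to categories is outside this example.

```python
from collections import Counter


def label_cluster(doc_categories, k=2):
    """Return the k categories that occur in the most documents of a cluster.

    doc_categories: one list of Wikipedia category names per document.
    """
    counts = Counter()
    for categories in doc_categories:
        counts.update(set(categories))  # count each category once per document
    return [name for name, _ in counts.most_common(k)]


cluster = [
    ["Dogs", "Pets"],
    ["Dogs", "Pets", "Parks"],
    ["Dogs"],
]
labels = label_cluster(cluster, k=2)
```

When several categories have the same frequency, the order among them is not defined in this sketch, so a real system would need a tie-breaking rule.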

Aspects / Approaches / Feature Set

Semantics

Wikipedia: Wiki Text (label, abstract, Wiki page sections); Link Structure; Named Entity Recognition; Mapping tags to Wiki Articles; Term Disambiguation Page in Wikipedia; Semantic relations among tags; Semantic Concept Mapping; Use in ontology construction

DBpedia: Mashups integrating DBpedia and folksonomy data; YAGO; TAGora Sense Repository; Hybrid Approaches; TAGora sense repository (DBpedia: hasDBpediaSenseInfo); dbpprop:disambiguates; Tag mapping to Linked Data Cloud; Faviki

WordNet: Flickr Clusters; Integration of WordNet and Folksonomy; Grounding tags to WordNet synsets; Lin Similarity Measure; Hypernym Discovery and Synonymy problem

Ontology: Folksonomy ontologies; Ontology construction from folksonomy; Ontological enrichment of tags meaning

Statistical: Cosine Similarity; Lin Similarity; Semantic relatedness measures; Mutual contextualization of tags, users and resources; Emerging semantics from folks' pragmatic behaviour

Aspects / Approaches / Feature Set

Tag-to-Resource Association: Resource and concept matching (tag recommendation from Wikipedia); Post level Spam Detection

Multilingualism

Wikipedia: Multilingual Wikipedia

DBpedia: Titles, abstracts and info boxes of Wiki articles available in multiple languages

Categorization / Classification

DMOZ: languages; Hierarchical Ontology; Interest Hierarchy Construction; Resource Mapping

Wikipedia: Resource Classification; Wikipedia categories; Faviki (makes use of Wikipedia categories)

Statistical Approach: Purpose Oriented Tag Classification; Entropy; Co-Occurrence; Agglomerative Clustering Algorithm

Search enrichment

Secondary tags: Yahoo Term Extraction API; Meta tags for Resource Description; Term Frequency

Bursty tags: Bursty Tags and Bursty Events

2.3) Other Semantic Supporting Aspects

2.3.1 Tag Spam Removal

2.3.2 Multilingualism

2.3.1) Tag Spam Removal

• Tagging systems are an easy and cheap target for spammers compared with spamming through online advertising, email systems and search engines.

• Spammers can add any content and generate spam annotations anonymously and at no cost.

• Such users make false associations between tags and resources by assigning popular tags so that their resources rank higher in search results.

• No one is directly harmed by spam tags on the web, but good web information resources become difficult to find among all the content.

• Spam can be introduced at the resource level, in the form of spam posts.

2.3.2) Multilingualism

• Multilingual semantics come from users, by relating the lingual practices of different folks.

• Tags are translated into other languages to support search for multilingual resources.

• Users can then find unexpected information in search that they could not reach using only one language.

3. Conclusion

• This talk reviewed the different approaches for semantic emergence in folksonomies.

• Statistical approaches help to cover the vocabulary that is not present in lexical resources.

• Co-occurrence represents user consensus.

• Comparing the precision ratios of knowledge-source-based and statistical approaches shows that knowledge-source-based approaches perform better in disambiguation.

• When one of the resources being compared is not popular, similarity distance or co-occurrence does not find the similarity correctly.

3. Conclusion (Cont.)

• Folksonomies are a user-driven, non-formal way to categorize data and generate metadata, while ontologies are the formal way to provide metadata for annotations.

• Ontologies can give very high precision.

• Integrating the folksonomic and ontological approaches can give better precision but may suffer from complexity.

Thank you for your attention!

Any questions?