data mining and semantic web

Data Mining and Semantic Web Presented By: Mohammad Aminul Islam (11103812) Muhammad Misbahur Rahman (11101850)

Upload: aminul-daffodil

Posted on 15-Sep-2015


Web Mining

Contents
What is web mining?
Classification of web mining
Web structure mining
HITS algorithm
PageRank algorithm
Web content mining
Web usage mining
Conclusion
References

Web Mining
The Web is a collection of interrelated files on one or more web servers.
Web mining is the application of data mining techniques to extract knowledge from web data.
It discovers global as well as local structure within and between web pages.
It helps transform human-understandable content into machine-understandable semantics.
Example 1

Yes, I am looking for this Obama.
Example 2

Areas of Web Mining
Web content: text, images, records, etc.
Web structure: hyperlinks, tags, etc.
Web usage: HTTP logs, web server logs, etc.

Web Mining Diagram

Web Structure Mining
The structure of the Web can be viewed as a graph, with web pages as nodes and hyperlinks as edges connecting related pages.
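
A minimal Python sketch of this graph view, assuming a small invented site; `SITE_LINKS` and `back_links` are hypothetical names, not from the source. Storing each page's outgoing links is enough to recover, for every page, the pages that link to it, simply by inverting the edges:

```python
# Hypothetical site: each page (node) maps to the pages it links to
# (directed edges). Invented for illustration; not real data.
SITE_LINKS = {
    "home": ["about", "blog"],
    "blog": ["home", "post1"],
    "about": ["home"],
    "post1": [],
}


def back_links(out_links):
    """Invert the edges: for every page, list the pages that link to it."""
    incoming = {page: [] for page in out_links}
    for source, targets in out_links.items():
        for target in targets:
            incoming[target].append(source)
    return incoming


if __name__ == "__main__":
    for page, sources in back_links(SITE_LINKS).items():
        print(page, "is linked from", sources)
```

A plain dictionary of lists is used here because it mirrors the node/edge description directly; a real crawler would store far larger graphs in a database or a dedicated graph library.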

[Figure: web pages connected by hyperlinks]
Web Structure Mining
Web structure mining is the process of discovering structural information from the Web.
This type of mining can be performed within a page (its internal document structure) or at the hyperlink level (the links between pages).
Research at the hyperlink level is also called hyperlink analysis.

Algorithms
Two main algorithms are used for web structure mining:
HITS (Hyperlink-Induced Topic Search)
PageRank algorithm

HITS (Hyperlink-Induced Topic Search)
Hyperlink-Induced Topic Search, also known as Hubs and Authorities, is a link analysis algorithm that rates web pages. It was introduced by Jon Kleinberg.
Hub: a page that points to many other pages, such as Google, Yahoo, Facebook, etc.
Authority: a page that many other pages refer to.

HITS Algorithm

In the HITS algorithm, the set of web pages to be ranked is first selected by matching their textual content against a given query. After collecting these pages, HITS concentrates only on their link structure and ignores the content of the pages.
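
The two phases described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the page texts and links in `PAGES` are made up, and `root_set` and `base_set` are hypothetical helpers. In Kleinberg's formulation, a text-based search yields a root set, which is then expanded with the pages it links to and the pages linking to it; the original also caps how many incoming pages are added per root page, which is omitted here.

```python
# Hypothetical pages: id -> (page text, outgoing links). Invented data.
PAGES = {
    "p1": ("barack obama biography", ["p2", "p3"]),
    "p2": ("white house news", ["p1"]),
    "p3": ("election results", []),
    "p4": ("obama speeches archive", ["p1"]),
    "p5": ("cooking recipes", ["p4"]),
    "p6": ("sports scores", []),
}


def root_set(pages, query):
    """Text phase: keep pages whose text contains every query term."""
    terms = query.lower().split()
    return {pid for pid, (text, _) in pages.items()
            if all(term in text.split() for term in terms)}


def base_set(pages, root):
    """Link phase: add pages the root set links to and pages linking into it."""
    base = set(root)
    for pid, (_, links) in pages.items():
        if pid in root:
            base.update(links)        # pages pointed to by the root set
        elif any(link in root for link in links):
            base.add(pid)             # pages pointing into the root set
    return base


if __name__ == "__main__":
    root = root_set(PAGES, "obama")
    print("root set:", sorted(root))
    print("base set:", sorted(base_set(PAGES, root)))
```

Only the second phase looks at links; from then on, the hub and authority computation ignores the page text entirely.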

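HITS Algorithm
Step 1: Initialize the hub score H(x) and the authority score A(x) of each of the N pages (for example, to 1).
Step 2: Calculate the hub scores. A good hub links to many good authorities, so the hub score of page x is the sum of the authority scores of the pages y it links to: $H(x) = \sum_{x \to y} A(y)$
Step 3: Calculate the authority scores. A good authority is referenced by many good hubs, so the authority score of page x is the sum of the hub scores of the pages y that link to it: $A(x) = \sum_{y \to x} H(y)$
Step 4: Normalize H and A so that $\sum_x H(x)^2 = \sum_x A(x)^2 = 1$.
Repeat steps 2 to 4 until the scores converge.
PageRank is roughly half of HITS: like the authority score, it rates a page by its incoming links, but it computes no separate hub score.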
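
The steps above can be sketched in Python as follows, assuming the graph is given as a dictionary of out-links and that a fixed number of iterations (a hypothetical default of 50) is enough to converge. The small `portal`/`blog` graph is invented for illustration.

```python
import math


def hits(out_links, iterations=50):
    """Compute hub and authority scores with the HITS update rules."""
    pages = list(out_links)
    hub = {p: 1.0 for p in pages}   # Step 1: every page starts at 1
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Step 2: hub score = sum of authority scores of pages it links to
        hub = {p: sum(auth[q] for q in out_links[p]) for p in pages}
        # Step 3: authority score = sum of hub scores of pages linking to it
        auth = {p: sum(hub[q] for q in pages if p in out_links[q]) for p in pages}
        # Step 4: scale each vector so its squared scores sum to 1
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub = {p: v / h_norm for p, v in hub.items()}
        auth = {p: v / a_norm for p, v in auth.items()}
    return hub, auth


if __name__ == "__main__":
    # Invented example: "portal" links to three pages, "blog" to one.
    graph = {"portal": ["x", "y", "z"], "blog": ["x"], "x": [], "y": [], "z": []}
    hub, auth = hits(graph)
    print("best hub:", max(hub, key=hub.get))
    print("best authority:", max(auth, key=auth.get))
```

With this graph, `portal` comes out as the best hub, and `x`, which is linked from both hubs, as the best authority. The `or 1.0` guard only avoids dividing by zero on a graph with no links.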

PageRank Algorithm
PageRank is an algorithm used by Google to rank pages in its search engine results.
It calculates the importance of a page from how many pages refer to it.
The pages linking to a page are called the back links of that page.
A link from one page to another is counted as a vote.
A page's rank depends not only on the number of votes it receives but also on the importance of the pages casting them.
Equation
Suppose pages T1 to Tn point to page A (incoming links). The PageRank of page A can be calculated with the following equation:
$PR(A) = (1 - d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$

Here d is the damping factor, usually set to 0.85 (to stop other pages from having too much influence, the total vote is damped down by multiplying it by 0.85). C(Ti) is the number of outgoing links on page Ti, so each page divides its vote equally among the pages it links to.
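
A minimal Python sketch of the equation, assuming every page starts at 1, each page lists a given link only once, and iteration stops when no value changes by more than a hypothetical tolerance `tol`. As written, the formula does not redistribute the vote of a page with no outgoing links; that vote is simply lost.

```python
def pagerank(out_links, d=0.85, tol=1e-10, max_iter=1000):
    """PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A."""
    pr = {page: 1.0 for page in out_links}   # every page starts at 1
    for _ in range(max_iter):
        new_pr = {}
        for page in out_links:
            # Each back link T passes on PR(T) divided by C(T), its out-link count
            votes = sum(pr[t] / len(links)
                        for t, links in out_links.items() if page in links)
            new_pr[page] = (1 - d) + d * votes
        converged = max(abs(new_pr[p] - pr[p]) for p in pr) < tol
        pr = new_pr
        if converged:
            break
    return pr


if __name__ == "__main__":
    # Invented graph: A and B link to each other; C links to A only.
    print(pagerank({"A": ["B"], "B": ["A"], "C": ["A"]}))
```

In this invented graph, C has no back links, so its value settles at 1 - d = 0.15, while A, which receives votes from both B and C, ends up higher than B.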

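Example
Assume that initially every page has PageRank 1: PR(A) = PR(B) = PR(C) = PR(D) = PR(E) = PR(F) = 1, and d = 0.85.
Page A has no back links, and here its value is kept at 1. Repeating the following calculations until the values stop changing gives (rounded to two decimals):
$PR(B) = (1 - d) + d\left(PR(A) + \frac{PR(D)}{3} + \frac{PR(C)}{3} + \frac{PR(E)}{4}\right) \approx 2.28$
$PR(C) = (1 - d) + d\left(\frac{PR(B)}{3} + \frac{PR(D)}{3} + \frac{PR(E)}{4}\right) \approx 1.62$
$PR(D) = (1 - d) + d\left(\frac{PR(B)}{3} + \frac{PR(C)}{3} + \frac{PR(E)}{4}\right) \approx 1.62$
$PR(E) = (1 - d) + d\left(\frac{PR(B)}{3} + \frac{PR(C)}{3} + \frac{PR(D)}{3}\right) \approx 1.71$
$PR(F) = (1 - d) + d\left(\frac{PR(E)}{4}\right) \approx 0.51$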
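
The example can be checked with a short Python sketch. The link structure is an assumption inferred from the denominators in the equations (A links only to B; B, C and D each have 3 out-links; E has 4; F has none), and page A is held at its initial value of 1, which is what the numbers above require. `EXAMPLE_LINKS` and `example_pagerank` are hypothetical names.

```python
# Out-links inferred from the denominators in the example (assumption).
EXAMPLE_LINKS = {
    "A": ["B"],
    "B": ["C", "D", "E"],
    "C": ["B", "D", "E"],
    "D": ["B", "C", "E"],
    "E": ["B", "C", "D", "F"],
    "F": [],
}


def example_pagerank(out_links, fixed=("A",), d=0.85, tol=1e-12):
    """Repeat the example's equations until no value changes by more than tol."""
    pr = {page: 1.0 for page in out_links}   # every page starts at 1
    while True:
        new_pr = dict(pr)
        for page in out_links:
            if page in fixed:
                continue   # A has no back links; the example keeps it at 1
            votes = sum(pr[t] / len(links)
                        for t, links in out_links.items() if page in links)
            new_pr[page] = (1 - d) + d * votes
        if max(abs(new_pr[p] - pr[p]) for p in pr) < tol:
            return new_pr
        pr = new_pr


if __name__ == "__main__":
    for page, value in example_pagerank(EXAMPLE_LINKS).items():
        print(f"PR({page}) = {value:.2f}")
```

Run as a script, it prints values that round to the figures listed above, with PR(A) staying at 1.00. If A were updated by the equation instead, it would drop to 1 - d = 0.15 and the other values would fall with it.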

Web Content Mining
Web content mining discovers useful information from web content.
Content means text, audio, video, etc.
Content can be structured, semi-structured, or unstructured.
Example
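
As one simple illustration of content mining on unstructured text, the Python sketch below extracts the visible text of an HTML page and counts its most frequent terms. Treating term frequency as the "useful information" is an assumption made for illustration; the stop-word list, the sample HTML, and the names `TextExtractor` and `top_terms` are hypothetical. Only the standard library `html.parser` module is used.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def top_terms(html, n=3, stop_words=frozenset({"the", "a", "of", "and", "is"})):
    """Return the n most common words in the page's visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    return Counter(w for w in words if w not in stop_words).most_common(n)


if __name__ == "__main__":
    sample = ("<html><head><style>p{color:red}</style></head><body>"
              "<h1>Web mining</h1><p>Web mining is the mining of web data.</p>"
              "<script>var web = 1;</script></body></html>")
    print(top_terms(sample))
```

Real content mining would go further (stemming, TF-IDF weighting, handling images and video), but the two stages stay the same: separate the content from the markup, then analyze it.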

Web Usage Mining
Existing tools report the number of hits on web pages and where the hits came from. Although useful, this information is not sufficient to learn user behavior, so tools that analyze it further are valuable.
Example: HTTP logs, web server logs, etc.
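
A minimal Python sketch of the first step of usage mining, assuming the server writes the Common Log Format (`host ident user [time] "request" status size`). It counts successful hits per page and records the sequence of pages each host visited, which is the kind of per-visitor path that deeper behavior analysis starts from. The sample log lines and the names `LOG_PATTERN` and `analyze` are invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status size
LOG_PATTERN = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+')


def analyze(lines):
    """Count successful hits per page and collect each host's page sequence."""
    hits = Counter()
    paths = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue   # skip malformed lines
        host, _time, _method, path, status = match.groups()
        if status.startswith("2"):   # keep only successful (2xx) requests
            hits[path] += 1
            paths[host].append(path)
    return hits, dict(paths)


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [15/Sep/2015:10:00:00 +0600] "GET /index.html HTTP/1.1" 200 512',
        '10.0.0.1 - - [15/Sep/2015:10:00:05 +0600] "GET /hits.html HTTP/1.1" 200 1024',
        '10.0.0.2 - - [15/Sep/2015:10:01:00 +0600] "GET /index.html HTTP/1.1" 200 512',
        '10.0.0.2 - - [15/Sep/2015:10:01:09 +0600] "GET /missing.html HTTP/1.1" 404 0',
    ]
    hits, paths = analyze(sample)
    print(hits.most_common())
    print(paths)
```

The hit counts correspond to what existing tools already report; the per-host paths are the extra information needed to study how users actually move through a site.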

Why do those links show up on the first page?

Answer
Authority page: lots of other important pages refer to this page.
How?
HITS algorithm
PageRank algorithm

Conclusion
Web mining is related to search engine optimization. With good knowledge of content mining, usage mining, and structure mining, we will be able to build good websites.

References
http://en.wikipedia.org/wiki/PageRank
http://www.ijcsit.com/docs/vol1issue3/ijcsit2010010308.pdf
https://mathscinotes.wordpress.com/2012/01/02/worked-pagerank-example/
http://infomesh.net/2001/swintro/
https://www.youtube.com/watch?v=OGg8A2zfWKg
http://kobra.bibliothek.uni-kassel.de/handle/urn:nbn:de:hebis:34-2009022726508
http://www.semantic-web-journal.net/content/inductive-learning-semantic-web-what-does-it-buy
http://blog.seagatesoft.com/wp-content/uploads/2012/03/web_mining_diagram.png
http://www.expertsupdates.com/ArticleAttachments/seo/web-mining/Figure2.gif
http://soltisconsulting1.files.wordpress.com/2013/08/hubs_and_authorities.gif