a sneak peek into the web
Post on 08-Jul-2015
A sneak peek into the web
Another way to see the Internet
Guillaume Lebourgeois - December 2008
Your vision
A browser
Websites
An interface: search engines
Reality
Interconnected websites
Topology of the web
The web is a huge graph
Hyperlinks
A graph is made of nodes linked together
A link can have an orientation
Hyperlinks
A link from website A to website B: A -> B
A reciprocal link: A <-> B
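The two kinds of link can be made concrete with a small sketch. It assumes a toy web stored as a dictionary that maps each website to the set of websites it links to; the site names and both helper functions are hypothetical and exist only for illustration.

```python
# Hypothetical toy web: each key is a website, each value is the set of
# websites it links to (its outgoing hyperlinks).
web = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": set(),
}

def has_link(graph, source, target):
    """Return True if `source` contains a hyperlink to `target`."""
    return target in graph.get(source, set())

def is_reciprocal(graph, a, b):
    """Return True if a links to b and b links back to a."""
    return has_link(graph, a, b) and has_link(graph, b, a)

print(is_reciprocal(web, "A", "B"))  # True: A -> B and B -> A
print(is_reciprocal(web, "B", "C"))  # False: only B -> C exists
```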
Determining website quality
Two ways:
- Text mining, semantic approach
- Topologic approach
It is better to mix both.
Topologic approach
Authorities (A): websites that many others link to
Hubs (H): websites that link out to many others
Topologic approach
An authority is regarded by other websites as a reference.
A hub has a good knowledge of its territory.
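One possible way to turn these two notions into scores is Kleinberg's HITS algorithm. The slides do not name an algorithm, so the sketch below is an illustration, not the author's method. The toy graph and the `hits` function name are assumptions.

```python
import math

def hits(graph, iterations=20):
    """Return (hub, authority) score dictionaries for a directed graph.

    `graph` maps each site to the set of sites it links to.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A site's authority is the sum of the hub scores of sites linking to it.
        auth = {n: 0.0 for n in nodes}
        for source, targets in graph.items():
            for target in targets:
                auth[target] += hub[source]
        # A site's hub score is the sum of the authority scores it links to.
        hub = {n: sum(auth[t] for t in graph.get(n, ())) for n in nodes}
        # Normalise both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Hypothetical territory: H links to three sites, X links to one of them.
territory = {"H": {"A1", "A2", "A3"}, "X": {"A1"}}
hub, auth = hits(territory)
print(max(hub, key=hub.get))    # H
print(max(auth, key=auth.get))  # A1
```

The scores only make sense within the graph they were computed on, which is why the choice of territory matters.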
Topologic approach
These two notions must be understood relative to a specific territory:
a community
Communities
Topology: a community is a subpart of the web with a high link density.
Semantic: a community is a subpart of the web that shares a theme, ideas, ...
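The topological definition can be sketched as a number. The formula below is one common convention and is assumed here, since the slides do not specify one: the number of links between members divided by the number of possible directed links, n(n-1). The graph and the `link_density` name are hypothetical.

```python
def link_density(graph, members):
    """Fraction of possible directed links between `members` that exist."""
    members = set(members)
    n = len(members)
    if n < 2:
        return 0.0
    internal = sum(
        1
        for source in members
        for target in graph.get(source, ())
        if target in members and target != source
    )
    return internal / (n * (n - 1))

# Hypothetical toy web: a, b and c link to each other a lot; x is an outsider.
web = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "x"}, "x": set()}

print(round(link_density(web, {"a", "b", "c"}), 2))  # 0.83
print(link_density(web, {"b", "x"}))                 # 0.0
```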
Communities
[Figure: communities C1, C2 and C3, with two weak links between them]
Let's observe the weak links
Communities
Weak links: they link distant communities together. These links are rare and strategic. They can be considered bridges.
Six degrees: thanks to them, two random websites are said to be separated by at most 6 degrees.
Social : the situation is exactly the same in the social graph.
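Both ideas can be tried on a toy graph. The sketch assumes each site already has a community label and that a "degree" is one hyperlink followed in its own direction. The function names and the data are hypothetical.

```python
from collections import deque

def degrees_of_separation(graph, start, goal):
    """Fewest hyperlinks to follow from start to goal, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        site, distance = queue.popleft()
        if site == goal:
            return distance
        for neighbour in graph.get(site, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None

def weak_links(graph, community_of):
    """Links whose two endpoints belong to different communities."""
    return sorted(
        (source, target)
        for source, targets in graph.items()
        for target in targets
        if community_of[source] != community_of[target]
    )

# Two hypothetical communities joined by a single bridge, a2 -> b1.
web = {"a1": {"a2"}, "a2": {"a1", "b1"}, "b1": {"b2"}, "b2": {"b1"}}
community_of = {"a1": "C1", "a2": "C1", "b1": "C2", "b2": "C2"}

print(weak_links(web, community_of))             # [('a2', 'b1')]
print(degrees_of_separation(web, "a1", "b2"))    # 3
```

Removing the single bridge makes `b2` unreachable from `a1`, which is one way to see why such links are strategic.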
Exploring
To explore these structures we can use a web crawler.
- Extracts links and information
- Stores data and visits the links found
Exploring
[Figure: begin links -> crawl -> depth 1 links -> crawl -> depth 2 links -> ...; each crawl sends data to storage]
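The crawl loop described above can be sketched with the Python standard library. This is a minimal illustration, not the presenter's crawler: `crawl`, `LinkExtractor` and the `fetch` parameter are hypothetical names, and page download is passed in as a function so that the example runs offline on fake pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_urls, fetch, max_depth=2):
    """Breadth-first crawl; returns {url: [absolute outgoing links]}.

    `fetch(url)` must return the page HTML, or None if it is unavailable.
    """
    storage = {}
    seen = set(start_urls)
    queue = deque((url, 0) for url in start_urls)
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        links = [urljoin(url, href) for href in parser.links]
        storage[url] = links  # store the data extracted from this page
        if depth < max_depth:
            for link in links:
                if link not in seen:  # visit each link found only once
                    seen.add(link)
                    queue.append((link, depth + 1))
    return storage

# Fake pages standing in for the network (assumed data).
pages = {
    "http://a.example/": '<a href="http://b.example/">B</a>',
    "http://b.example/": '<a href="/about">About</a>',
    "http://b.example/about": '<a href="http://c.example/">C</a>',
}
result = crawl(["http://a.example/"], pages.get, max_depth=1)
print(sorted(result))  # ['http://a.example/', 'http://b.example/']
```

On the real web, `fetch` could wrap `urllib.request.urlopen`. A polite crawler would also honour robots.txt and rate limits, which this sketch leaves out.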
Using data
Once you’ve collected data you can:
- produce a map of the territory you explored
- create a search engine
- imagine loads of different applications...
End of the presentation
Feel free to ask any questions