cybermetrics - kaznu.kz aguillo... · general term of cybermetrics or the more specific of...
Post on 25-Sep-2018
Presentation: Isidro F. Aguillo
Current position
  Head, Cybermetrics Lab, Spanish National Research Council (CSIC)
Background
  MSc. Biology (Univ. Complutense, Madrid)
  MID (Univ. Carlos III, Madrid)
  DEA (Univ. Granada)
  Doctor Honoris Causa (Univ. Indonesia)
Research topics & other working activities
  Rankings Portal: webometrics.info
  Research projects: QEAVIS (e-humanities), MAVIR (multilingual Web), CARTO (R&D cartography), ICYTnet (Virtual Libraries)
  EU funded projects: ACUMEN (indicators portfolio for individuals), OpenAIRE (EU central repository), WISER (cybermetrics), EICSTES (R&D web indicators), PEKING (knowledge management), IMPACT-INFO2000 (information society)
  Founder and editor of the e-journal “Cybermetrics”
  300 seminars and conferences in over 100 universities from all over the world
3
Agenda
I. Descriptive Cybermetrics
  Methods and tools
  Web indicators
II. Applied Webometrics
  Positioning in search engines
  Optimising web contents
III. Usagemetrics
  Log files and visits analysis
  Popularity
5
Definition
Cybermetrics is the discipline devoted to the quantitative description of the contents and communication processes that take place in cyberspace
Cyberspace is the set of contents accessible in electronic format. Because the Internet is universally accessible, the term can be used as a synonym for the Internet of contents, basically but not exclusively the webspace
Since cyber-scientometrics is the most developed sub-field, for practical reasons it is referred to by the more general term Cybermetrics or the more specific term Webometrics
6
Quantitative disciplines
[Diagram, adapted from Björneborn. Labels: informetrics; bibliometrics; scientometrics; webometrics; cybermetrics; cyberscientometrics]
7
Relationships
[Diagram, source: www.ulb.ac.be/unica/docs/Sch-com-2004-pres-Glanzel.ppt. Labels: Webometrics; Informetrics; Scientometrics (applied, basic); Mathematics/Physics; Librarianship and Documentation; Sociology of science; History of science; Economy; Life sciences; Other sciences/Humanities; Scientific documentation; Services for investigation in libraries; Scientific policy; Investigation management]
8
Advantages of the quantitative approach
Web presence reflects the activities of an institution or individual more fully than traditional publications on paper
  In academia, professors, researchers and students put unpublished material on the Web: first drafts, preliminary versions of papers, course materials, presentation slides or databases
The Web reaches a greater audience than traditional scientific communication media
  Scientific journals have a restricted distribution
The hypertext nature of the Web makes it possible to discover hidden patterns between different institutional sites
  Academic sites link to other sites with a marked economic, industrial, cultural, political or social character
9
New application areas
Webometrics
  Topology of hypertextual networks
  Social networks
  PageRank, HITS
  Comparative analysis of search engines
Cyberscientometrics
  Studies of electronic mails and forums
  “Big Science” & Grid
  Cybergeography and cyberdemography
  New units: institutional Web sites
  New indicators
    Visibility
    Popularity
10
Cybergeography, cyberdemography
Data and sources
  Internet Geography Project www.zooknic.com
  Cybergeography www.cybergeography.org
  Clickz Surveys www.clickz.com/stats
  Blog www.internetworldstats.com/blog.htm
  Demography and Geography of the Internet www.sociosite.org/demography.php www.sociosite.net/topics/webgeography.php
  Internet Demographics Directory internet-demographics.netfirms.com
14
Size of Internet: Infrastructures
Hosts
  Lottor (World) www.isc.org/ds
  RIPE (Europe) www.ripe.net/info/stats/hostcount/
  Asia Web Watch www.ciolek.com/Asia-Web-Watch/main-page.html
Servers
  Netcraft www.netcraft.com
Domains
  World www.norid.no/domenenavnbaser/domreg.html
  Domain worldwide www.domainworldwide.com www.verisign.com/Resources/Naming_Services_Resources/Domain_Name_Industry_Brief/
  Germany (and others) www.denic.de/en/domains/statistiken
  Studies (outdated) www.zooknic.com
18
Web contents
Webspace
  Spireproject: 10,000 million (10/02) spireproject.com/art13.htm
  Present day: 40+40,000 million
Deposits
  Archive www.archive.org
  Google Cache www.google.com
Traffic
  80% of browser sessions on the Web involve the use of a search engine or a directory
  Yahoo and, especially, Google are the most important intermediaries
20
The problem with the gTLD
gTLD
  First ones: .com, .org, .net, .int (.eu.int)
  New ones: .biz, .info, .name, .aero, .coop, .museum, .eu, .cat
  De facto: .cx, .tv, .cc
  Special cases: .edu
Experiments
  Google/Bing/Exalead
  Filter operator “site:”
  Problems with some ccTLD
  Domains and countries
  International domains (gTLD)
IP translators
  IP Locator 1.41
  AW IP Locator 1.8 www.atelierweb.com/iploc
  IP Address Locator www.geobytes.com/IpLocator.htm?GetLocation
  Ip2location www.ip2location.com/free.asp
23
Academic Webspace
Sites
  Institutional domains
  OCLC Web Characterization (1998-2002) http://www.oclc.org/research/projects/archive/wcp/
Sites and institutional sites
  Netcraft, October 2011: 500 million web sites
  Active (50%) * (5-10 institutional sites/site) ~ 2,000 million institutional sites
Academic webspace
  Academic subdomains
  Not every country
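The estimate on this slide can be reproduced as a quick calculation. This is a minimal sketch that assumes the slide's own figures (500 million sites, 50% active, 5 to 10 institutional sites per site); the function name is illustrative only.

```python
def institutional_sites(total_sites, active_share, per_site):
    """Estimate institutional sites as: active sites x institutional sites per site."""
    return total_sites * active_share * per_site

TOTAL_SITES = 500_000_000   # Netcraft, October 2011 (figure from the slide)
ACTIVE_SHARE = 0.5          # share of active sites (figure from the slide)

low = institutional_sites(TOTAL_SITES, ACTIVE_SHARE, 5)
high = institutional_sites(TOTAL_SITES, ACTIVE_SHARE, 10)
print(f"{low / 1e6:.0f} to {high / 1e6:.0f} million institutional sites")
```

The slide's rounded value of about 2,000 million falls inside this range.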
24
Academic subdomains
ac.ae ac.in ac.rw edu.am edu.cn edu.hk edu.mm edu.pk edu.ua
ac.at ac.ir ac.se edu.ar edu.co edu.hn edu.mn edu.pl edu.uy
ac.bd ac.je ac.sg edu.au edu.cu edu.hu edu.mo edu.pr edu.ve
ac.be ac.jp ac.sz edu.az edu.dm edu.jm edu.mp edu.pt edu.vg
ac.bw ac.ke ac.th edu.ba edu.do edu.jo edu.mt edu.py edu.vn
ac.by ac.kr ac.tz edu.bb edu.dz edu.kg edu.mx edu.qa edu.ws
ac.ci ac.lk ac.ug edu.bh edu.ec edu.kh edu.my edu.ru edu.ye
ac.cn ac.lv ac.uk edu.bm edu.ee edu.kn edu.na edu.sa edu.yu
ac.cr ac.ma ac.uz edu.bn edu.eg edu.kw edu.nf edu.sg edu.za
ac.cy ac.mu ac.vn edu.bo edu.gd edu.ky edu.ng edu.sh edu.zm
ac.fj ac.mz ac.yu edu.br edu.ge edu.kz edu.ni edu.st
ac.gg ac.nz ac.za edu.bs edu.gh edu.lb edu.np edu.sv
ac.gs ac.pa ac.zm edu.bt edu.gr edu.lc edu.om edu.to
ac.id ac.pg ac.zw edu.by edu.gs edu.li edu.pa edu.tr
ac.il ac.pl acad.bg edu.bz edu.gt edu.lv edu.pe edu.tt
ac.im ac.ru edu.al edu.ck edu.gu edu.mk edu.ph edu.tw
25
Academic databases
Public Web
  Google Scholar scholar.google.com
  Publish or Perish www.harzing.com/pop.htm
  Citations Gadget code.google.com/p/citations-gadget/
  MS Academic Search academic.research.microsoft.com
  Scirus www.scirus.com
  CiteSeerX citeseerx.ist.psu.edu
  Citebase www.citebase.org
  Paracite paracite.eprints.org
  DBLP dblp.uni-trier.de
  ScienceDirect www.sciencedirect.com
  (US) Science Gov www.science.gov
  In-extenso www.in-extenso.org
26
Context
[Diagram. Labels: Public Web; Private Web; Databases; Repositories; Electronic journals; Visible Web; Invisible Internet]
36
Rich files and media files
Rich files
  Definition and types
    Adobe Acrobat (pdf) and Postscript (ps)
    MS Office: Word (doc, rtf), Excel (xls), Powerpoint (ppt)
  Size
  Filter operators: filetype (Google, Live, Exalead)
Media files
  Definition and types
    FilExt www.filext.com
  Localization in search engines
    Terms
    Filter operators
    Autonomous databases
40
Languages on the Net
Sources and studies
  Users according to language
    Global Reach global-reach.biz/globstats/index.php3
  Composition of the webspace
Experiments with search engines
  Google
  Yahoo!
  Bing (ex-Live) Search
  Ask (Teoma)
  Copernic
43
Languages (Google)
Language | <lr> value | Language | <lr> value
Arabic | lang_ar | Icelandic | lang_is
Chinese (S) | lang_zh-CN | Italian | lang_it
Chinese (T) | lang_zh-TW | Japanese | lang_ja
Czech | lang_cs | Korean | lang_ko
Danish | lang_da | Latvian | lang_lv
Dutch | lang_nl | Lithuanian | lang_lt
English | lang_en | Norwegian | lang_no
Estonian | lang_et | Portuguese | lang_pt
Finnish | lang_fi | Polish | lang_pl
French | lang_fr | Romanian | lang_ro
German | lang_de | Russian | lang_ru
Greek | lang_el | Spanish | lang_es
Hebrew | lang_iw | Swedish | lang_sv
Hungarian | lang_hu | Turkish | lang_tr
44
Countries (Google)
Country and code (five column pairs per row)
Andorra AD | Bhutan BT | Estonia EE | Guinea-Bissau GW | Kazakhstan KZ
United Arab Emirates AE | Bouvet Island BV | Egypt EG | Guyana GY | Lao PDR LA
Afghanistan AF | Botswana BW | Western Sahara EH | Hong Kong HK | Lebanon LB
Antigua and Barbuda AG | Belarus BY | Eritrea ER | Heard and Mc Donald Islands HM | Saint Lucia LC
Anguilla AI | Belize BZ | Spain ES | Honduras HN | Liechtenstein LI
Albania AL | Canada CA | Ethiopia ET | Croatia (Hrvatska) HR | Sri Lanka LK
Armenia AM | Cocos (Keeling) Islands CC | European Union EU | Haiti HT | Liberia LR
Netherlands Antilles AN | Congo, DR CD | Finland FI | Hungary HU | Lesotho LS
Angola AO | Central African Republic CF | Fiji FJ | Indonesia ID | Lithuania LT
Antarctica AQ | Congo CG | Falkland Islands (Malvinas) FK | Ireland IE | Luxembourg LU
Argentina AR | Switzerland CH | Micronesia, FS FM | Israel IL | Latvia LV
American Samoa AS | Cote D'ivoire CI | Faroe Islands FO | India IN | Libya LY
Austria AT | Cook Islands CK | France FR | British Indian Ocean Terr. IO | Morocco MA
Australia AU | Chile CL | France, Metropolitan FX | Iraq IQ | Monaco MC
Aruba AW | Cameroon CM | Gabon GA | Iran IR | Moldova MD
Azerbaijan AZ | China CN | United Kingdom UK | Iceland IS | Madagascar MG
Bosnia and Herzegowina BA | Colombia CO | Grenada GD | Italy IT | Marshall Islands MH
Barbados BB | Costa Rica CR | Georgia GE | Jamaica JM | Macedonia, FYR MK
Bangladesh BD | Cuba CU | French Guiana GF | Jordan JO | Mali ML
Belgium BE | Cape Verde CV | Ghana GH | Japan JP | Myanmar MM
Burkina Faso BF | Christmas Island CX | Gibraltar GI | Kenya KE | Mongolia MN
Bulgaria BG | Cyprus CY | Greenland GL | Kyrgyzstan KG | Macau MO
Bahrain BH | Czech Republic CZ | Gambia GM | Cambodia KH | Northern Mariana Islands MP
Burundi BI | Germany DE | Guinea GN | Kiribati KI | Martinique MQ
Benin BJ | Djibouti DJ | Guadeloupe GP | Comoros KM | Mauritania MR
Bermuda BM | Denmark DK | Equatorial Guinea GQ | Saint Kitts and Nevis KN | Montserrat MS
Brunei Darussalam BN | Dominica DM | Greece GR | Korea, DPR KP | Malta MT
Bolivia BO | Dominican Republic DO | South Georgia/South Sandwich I. GS | Korea, Republic of KR | Mauritius MU
Brazil BR | Algeria DZ | Guatemala GT | Kuwait KW | Maldives MV
Bahamas BS | Ecuador EC | Guam GU | Cayman Islands KY | Malawi MW
45
Countries II (Google)
Country and code (three column pairs per row)
Mexico MX | Qatar QA | Tokelau TK
Malaysia MY | Reunion RE | Turkmenistan TM
Mozambique MZ | Romania RO | Tunisia TN
Namibia NA | Russian Federation RU | Tonga TO
New Caledonia NC | Rwanda RW | East Timor TP
Niger NE | Saudi Arabia SA | Turkey TR
Norfolk Island NF | Solomon Islands SB | Trinidad and Tobago TT
Nigeria NG | Seychelles SC | Tuvalu TV
Nicaragua NI | Sudan SD | Taiwan TW
Netherlands NL | Sweden SE | Tanzania TZ
Norway NO | Singapore SG | Ukraine UA
Nepal NP | St. Helena SH | Uganda UG
Nauru NR | Slovenia SI | United States Minor Outlying I. UM
Niue NU | Svalbard and Jan Mayen Is. SJ | United States US
New Zealand NZ | Slovakia (Slovak Republic) SK | Uruguay UY
Oman OM | Sierra Leone SL | Uzbekistan UZ
Panama PA | San Marino SM | Holy See (Vatican City State) VA
Peru PE | Senegal SN | Saint Vincent and the Grenadines VC
French Polynesia PF | Somalia SO | Venezuela VE
Papua New Guinea PG | Suriname SR | Virgin Islands (British) VG
Philippines PH | Sao Tome and Principe ST | Virgin Islands (U.S.) VI
Pakistan PK | El Salvador SV | Vietnam VN
Poland PL | Syria SY | Vanuatu VU
St. Pierre and Miquelon PM | Swaziland SZ | Wallis and Futuna Islands WF
Pitcairn PN | Turks and Caicos Islands TC | Samoa WS
Puerto Rico PR | Chad TD | Yemen YE
Palestine PS | French Southern Territories TF | Mayotte YT
Portugal PT | Togo TG | Yugoslavia YU
Palau PW | Thailand TH | South Africa ZA
Paraguay PY | Tajikistan TJ | Zambia ZM
46
Lists of universities
  Braintrack www.braintrack.com
  Universities Worldwide univ.cc
  Galilei www.galilei.com.ar
  Webometrics Catalogue www.webometrics.info/university_by_country_select.asp
  HEIR siu.no/heir
  General Education Online www.findaschool.org
  International Colleges and Universities www.4icu.org
  Portal Tecnociencia www.tecnociencia.es
  Universia www.universia.es
  Canadian Universities www.uwaterloo.ca/canu
  U.S. Universities by State www.utexas.edu/world/univ/state
  Top American Research Universities thecenter.ufl.edu
  UK Higher Education Map www.scit.wlv.ac.uk/ukinfo/uk.map.html
  Times World Universities Rankings www.thes.co.uk/worldrankings
  German University Ranking www.university-ranking.org
  Academic Ranking of World Universities ed.sjtu.edu.cn/ranking.htm
  All Universities around the World www.bulter.nl/universities
  Ranking of China Universities rank2005.netbig.com
  Alphabetical Index of Japanese Universities camp.ff.tku.ac.jp/TOOL-BOX/JapanUNIV
47
Personal agents (I)
Website extractors
  AaronWebVacuum 2.9 www.surfwarelabs.com
  JOC WebSpider 5.7 www.jocsoft.com
  Teleport Pro 1.64 www.tenmax.com
  Leech 4.3 www.aeria.com
  WebCopier 5.4 www.maximumsoft.com
  BlackWidow 6.28 www.softbytelabs.com
  MemoWeb 4.0 www.goto.fr
  Offline Commander 2.1 www.zylox.com
  WebReaper 10 www.webreaper.net
  Offline Explorer Pro 5.9 www.metaproducts.com
  Website Extractor 10.0 www.asona.org
  WebWhacker 5.0 www.bluesquirrel.com
  WebZip 7.1 www.spidersoft.com
  Website2PDF 1.0 www.spidersoft.com
  Medusa 1.2 www.candego.com
48
Personal agents (II)
Link checkers
  Alert LinkRunner 6.01 www.alertbookmarks.com/lr
  HTML Link Validator 4.47 www.lithopssoft.com
  HTML Validator Professional 11 www.htmlvalidator.com
  Link Checker Pro 3.3 www.link-checker-pro.com
  LinkScan Workstation 12.1 www.elsop.com
  Web Link Validator 5.5 www.relsoftware.com/wlv
  Xenu's Link Sleuth 1.3 home.snafu.de/tilman/xenulink.html
49
Personal agents (III)
HTML extractors
  WebData Extractor 6.0 www.webextractor.com
Experiments
  Site extraction with the offline browser Teleport Pro
  Mapping of the extracted site with Xenu
    Link checking
  Direct mapping of the site with Xenu
    Link checking
  Size of the site according to the search engines
    Google, Yahoo, Exalead, Ask, Gigablast
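The mapping experiments above rely on dedicated tools (Teleport Pro, Xenu). As a rough illustration of what such tools do on a single page, the sketch below extracts the links from an HTML fragment and separates internal from external ones using only the Python standard library. The sample HTML, the base URL and all names are hypothetical, and this is not how any of the listed products work internally.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page address
                self.links.append(urljoin(self.base_url, href))

def classify_links(html, base_url):
    """Return (internal, external) link lists for one page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    internal = [u for u in collector.links if urlparse(u).netloc == host]
    external = [u for u in collector.links if urlparse(u).netloc != host]
    return internal, external

# Hypothetical page used only for illustration
SAMPLE = '<a href="/about.html">About</a> <a href="http://www.csic.es/">CSIC</a>'
internal, external = classify_links(SAMPLE, "http://www.example.org/index.html")
print(internal, external)
```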
52
Cybermetrics of search engines
Search engines: characteristics and problems
8 “different” big search engines
  Google
  Yahoo Search (now supplied by Bing)
  Bing (ex-Live) Search
  Ask (ex-Teoma)
  Exalead
  Wisenut
  Gigablast
  Alexa
Studies about search engines
  Search Engine Showdown searchengineshowdown.com
  Search Engine Watch searchenginewatch.com
53
Only seven (+one)?
[Table: search engine sites ("Site") and the databases that supply them ("Database") for three periods: 2003, 2004-2005 and 2006-2007. Names appearing: Google, Netscape, Yahoo, AltaVista, AllTheWeb, Lycos, iWon, HotBot, MSN Search, Live, Teoma, Ask Jeeves, Ask, Alexa, A9, Exalead, Wisenut, Gigablast, HereUare, Fast, Inktomi]
54
Cybermetrics of search engines
Operator | GOOGLE | BING (LIVE) | EXALEAD | ASK | GIGABLAST
TLD | site:xx | site:xx | site:xx | site:xx | site:xx
Domain | site:aa.xx | site:aa.xx | site:aa.xx | site:aa.xx | site:aa.xx
Directory | site:aa.xx/bb | site:aa.xx/bb | NO | site:aa.xx/bb | NO
Word in url | inurl:xx | NO | inurl:xx, url:xx | inurl:xx | inurl:xx
Link | link:aa.xx/b.htm | NO | link:www.aa.xx | (NO) | (NO)
Link domain | NO | NO | link:aaa.xx | NO | NO
File type | filetype:yy | filetype:yy | filetype:yy | filetype:yy | filetype:yy
Language | Advanced | Advanced | Advanced | Advanced | NO
Country | Advanced | (Advanced) | Advanced | Advanced | NO
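When the same experiment is repeated across engines, it helps to keep the operator table in one data structure and generate the query strings from it. The sketch below encodes only a few rows of the table above (domain, directory, word in URL, file type); the dictionary layout and the function name are assumptions made for illustration, and the syntax reflects the slide, not necessarily what the engines support today.

```python
# Query templates per engine, taken from the operator table; None means "NO"
SYNTAX = {
    "google":    {"domain": "site:{d}", "directory": "site:{d}/{p}",
                  "word_in_url": "inurl:{w}", "filetype": "filetype:{t}"},
    "bing":      {"domain": "site:{d}", "directory": "site:{d}/{p}",
                  "word_in_url": None, "filetype": "filetype:{t}"},
    "exalead":   {"domain": "site:{d}", "directory": None,
                  "word_in_url": "inurl:{w}", "filetype": "filetype:{t}"},
    "ask":       {"domain": "site:{d}", "directory": "site:{d}/{p}",
                  "word_in_url": "inurl:{w}", "filetype": "filetype:{t}"},
    "gigablast": {"domain": "site:{d}", "directory": None,
                  "word_in_url": "inurl:{w}", "filetype": "filetype:{t}"},
}

def build_query(engine, operator, **values):
    """Return the query string, or None when the engine lacks the operator."""
    template = SYNTAX[engine][operator]
    if template is None:
        return None
    return template.format(**values)

for engine in SYNTAX:
    print(engine, build_query(engine, "directory", d="aa.xx", p="bb"))
```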
57
Quality, visibility and impact
Quantitative evaluation of institutional websites
The Google model
  ToolBar installation (toolbar.google.com)
  Page Rank
    Logarithmic scale
    rankwhere.com/google-page-rank.php
    www.rustybrick.com/pagerank-prediction.php
    Components: visibility + weight
Visibility
  Types of links: inlinks, outlinks, self-links, back-links
  Calculation using search engines
  Web impact (WebIF)
  Link quality: Link inspectors
63
Popularity
Number of visits
  Difficult to obtain for comparative studies
Relative position
  Popularity according to www.alexa.com
    Only domains
    Worldwide coverage
    Some “absolute” values
    Temporal evolution
    Geographic biases (>> Asia)
  Snapshot snapshot.compete.com
    Only USA!!!
  Ranking.com www.ranking.com
  Traffic Estimate www.trafficestimate.com
  Popularity according to Netcraft toolbar.netcraft.com/site_report
Institutional sites and variants
  More restricted coverage
  Not comparable
66
Inequalities in Alexa
Position | % of visits
Top 3 | 23
Top 500 | 45
Number 10 | 5
Number 100 | 0.1
Number 1,000 | 0.06
Number 10,000 | 0.02
70
Working with links
Visibility
  Inlinks (incoming links)
    Yahoo Site Explorer
    Exalead: link: -site:
  Outlinks (outgoing links) = Luminosity
  Link inspectors
Web impact
  Definition of WebIF
    Calculation = Visibility/size
Quality
  Link checkers
71
Basic terminology
B has an outlink to C : ~ reference
B has an inlink from A : ~ citation
B has a selflink : ~ self-citation
E and F are reciprocally linked
A is transitively linked with H via B-D
A has a transversal link to G : short cut
C and D are co-linked from B, i.e. shared inlinks: co-citation
B and E are co-linking to D, i.e. shared outlinks: bibliog. coupling
[Diagram: link graph over nodes A, B, C, D, E, F, G, H, with co-links marked]
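The terminology above can be checked on a small directed graph. The edge list below is an assumption: it is one reading of the statements on the slide (the original figure is not reproduced here), and the helper names are illustrative.

```python
# Directed links (source, target), reconstructed from the statements on the slide
EDGES = {
    ("A", "B"), ("B", "B"), ("B", "C"), ("B", "D"), ("E", "D"),
    ("E", "F"), ("F", "E"), ("D", "H"), ("A", "G"),
}

def outlinks(node):
    """Pages that `node` links to (self-links excluded)."""
    return {t for s, t in EDGES if s == node and t != node}

def inlinks(node):
    """Pages that link to `node` (self-links excluded)."""
    return {s for s, t in EDGES if t == node and s != node}

def has_selflink(node):
    return (node, node) in EDGES

def co_linked_from(x, y):
    """Shared inlinks: sources that link to both x and y (~ co-citation)."""
    return inlinks(x) & inlinks(y)

def co_linking_to(x, y):
    """Shared outlinks: targets that both x and y link to (~ bibliographic coupling)."""
    return outlinks(x) & outlinks(y)

print("C and D co-linked from:", co_linked_from("C", "D"))
print("B and E co-linking to:", co_linking_to("B", "E"))
print("E and F reciprocal:", "F" in outlinks("E") and "E" in outlinks("F"))
```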
72
Cyberscientometrics
Development of R&D indicators in the Web
  Units
    Institutional site
  Models
  Indicators
Co-sitation, social networks and theory of the “small world”
  Small World www.db.dk/lb/2002smallworld.pps
Bibliometrics of e-journals and deposits of documents
  CiteSeerX citeseerx.ist.psu.edu
  CiteBase citebase.eprints.org/cgi-bin/search
  Google Scholar scholar.google.com
  Arxiv arxiv.org
  Scirus www.scirus.com
  DBLP dblp.uni-trier.de
73
Web indicators
[Diagram. Labels: R&D Indicators; Information Society Indicators; Input; Output; Web Indicators; Scientometrics; Bibliometrics; Patentometrics; Webometrics; Cybermetrics]
74
Building Indicators
Experiments
Codification
  Institutional
  Subject (UNESCO)
  Geographic (NUTS)
Indicators calculation
  Visibility (sitations)
    Visibility of the rich files
    Visibility of articles in repositories
    Visibility of electronic journals
  Impact (WebIF)
  Diversity
  Co-citation
75
Web Impact factor (WebIF)
  Visibility (sitations) / Size (No. of pages)
Webometrics (Academic) Rank
  Composite indicators
  Size
    No. of webpages
    No. of files
      Rich files: pdf, ppt, doc, ps
    No. of papers
      Google Scholar
      Other bibliographic databases
  Visibility
    Incoming external links
    Mentions
  Popularity
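The WebIF definition on this slide is a simple ratio, sketched below. The counts used in the example are hypothetical; in practice they would come from search engine queries for inlinks and indexed pages.

```python
def web_impact_factor(inlinks, pages):
    """WebIF = visibility (number of sitations, i.e. inlinks) / size (number of pages)."""
    if pages <= 0:
        raise ValueError("size must be a positive number of pages")
    return inlinks / pages

# Hypothetical counts for one institutional site
print(web_impact_factor(inlinks=1200, pages=400))
```

Because the denominator is the number of pages, a small site with few inlinks can obtain the same WebIF as a large, heavily linked one, which is one reason the slide lists size and visibility separately in the composite rank.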
84
Applied Cybermetrics
The aim is not only to publish on the Web, but to gain visibility
  Getting a great number of visits (real audience close to the potential one)
  Receiving external links
  Being present in directories and portals
A search engine is used in 80% of web sessions
  Web positioning is the key to increasing visibility
Quality influences the chances of getting a good position, but so do...
  The volume of information
  The hypertext structure
  The annotation of contents
85
Positioning
Presence measurements
  Directory indexing
  Pages actually indexed by a search engine / Total pages
Visibility measurements
  Page Rank
  Prominence by terms
Measurements of access and usage
  Popularity
    • Absolute: Number of visits
    • Relative: Alexa Ranking
  Usage
    • Number of downloaded files
    • Average time per visit
    • Most frequent reference terms
87
Problems
Design is irrelevant, or even counterproductive
  Few indexable contents on main page
  Flash animations or Java applets that hinder the robots’ navigation
Invisible Internet
  Databases and dynamic web pages cannot be indexed by search engines
Link quality
  External and internal links need continuous maintenance and updating
Rich files
  Document files are handy for distributing information with added value
    • Formats pdf, ppt, doc, ps
88
Tools
  Webmasters World tools.webmastersworld.org
  SEO Encyclopedia www.seopedia.info
  Webmasters Tools tools.devshed.com
  SEO Online www.seoonline.info
  PageStrength www.seomoz.org/tools/page-strength.php
  Data Centers Tool www.seocritique.com/datacentertool
  SEO Tools www.seochat.com/seo-tools
  SEO Web Directory www.seowebdirectory.com/SEO_Tools
  SEO Company www.seocompany.ca/tool/seo-tools.html
  SEO ToolSet www.webconfs.com
91
Criteria (Google)
Hypertext structure
  Maturity: Depth of the institutional sites
  Visibility: PageRank
  Neighborhood: External and internal links
Number of times that the search terms appear
Relative position of the search terms
  Title and URL
  Metadata
  Headings
  ALT tags and external anchors
Updating periodicity
  Freshness (new contents)
Popularity: Page visits
Local aspects (geographic, languages)
93
Presence of terms in the URL
Very relevant
Preferably in the domain or subdomain
  Recommended no longer than 30 characters
  The order is important
    http://better.good.xx/aceptable
Whole words, not truncated
  http://lib.univ.edu
  http://library.university.edu (YES)
Independent terms/phrases (dash/underscore)
  Universidad-Complutense = +Universidad +Complutense
  Universidad_Complutense = “Universidad Complutense”
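The dash/underscore rule and the length recommendation can be illustrated with a small check. This sketch makes two assumptions that the slide does not state: that the 30-character recommendation applies to host plus path (without the scheme), and that only dashes split path segments into independent terms while underscores keep a phrase together. The host www.example.edu and the function names are hypothetical.

```python
from urllib.parse import urlparse

def url_terms(url):
    """Split a URL into the terms a search engine might see (illustrative rule)."""
    parsed = urlparse(url)
    terms = parsed.netloc.split(".")
    for segment in parsed.path.split("/"):
        if segment:
            # A dash separates independent terms; an underscore keeps the phrase whole
            terms.extend(segment.split("-"))
    return terms

def is_short(url, limit=30):
    """Check the length of host + path against the recommended limit."""
    parsed = urlparse(url)
    return len(parsed.netloc + parsed.path) <= limit

print(url_terms("http://www.example.edu/Universidad-Complutense"))
print(url_terms("http://www.example.edu/Universidad_Complutense"))
print(is_short("http://library.university.edu"))
```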
95
Presence of terms in Title
Very relevant
Contents of the <TITLE> tag!!!
  Keywords, not a title
  The position is important: first words carefully selected
  Long phrase, without stop words (~60 characters)
  Don't repeat terms, bilingual option
  Institutional identification, geographic localization
The contents of the <Hn> tags are also considered
  The <H1> heading gives the title obtained
  Move generic words (“Hello”, “Welcome”, “Page of”) to lower levels, <H2> or <H3>
97
Metatags
They are not so important
Description
  Up to 250 characters
  Reusable tag for versions in other languages
  The position is important: choose the first words wisely
  Don’t repeat words
Keywords
  Up to 20 terms
  Terms SHOULD also appear in the text
  Reusable tag for versions in other languages
  The position is important: choose the first words wisely
  Don’t repeat words
Description pre-cataloging
  Use other tags: Dublin Core model (15 repeatable)
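The limits quoted for the title (~60 characters), the description (up to 250 characters) and the keywords (up to 20 terms) can be checked automatically. The following is a minimal sketch using the Python standard library; the sample page, the class name and the exact thresholds applied (treating ~60 as a hard limit) are assumptions for illustration.

```python
from html.parser import HTMLParser

class HeadAudit(HTMLParser):
    """Collect the <title> text and the named <meta> tags of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def audit(html):
    """Check title, description and keywords against the limits on the slides."""
    parser = HeadAudit()
    parser.feed(html)
    keywords = [k.strip() for k in parser.meta.get("keywords", "").split(",") if k.strip()]
    return {
        "title_ok": 0 < len(parser.title.strip()) <= 60,
        "description_ok": 0 < len(parser.meta.get("description", "")) <= 250,
        "keywords_ok": 0 < len(keywords) <= 20,
    }

# Hypothetical page head used only for illustration
SAMPLE = (
    "<html><head><title>Cybermetrics Lab, CSIC, Madrid</title>"
    '<meta name="description" content="Quantitative analysis of the Web">'
    '<meta name="keywords" content="cybermetrics, webometrics, rankings">'
    "</head></html>"
)
report = audit(SAMPLE)
print(report)
```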
98
Generating META tags
  Meta Builder 2 vancouver-webpages.com/META/mk-metas.html
  Meta Tags Generator www.meta-tags.us
  MetaTags Generator tools.webmastersworld.org/MetatagsGenerator.php
  Meta Tag Generator www.invision-graphics.com/meta-tag-generator.html
  Meta Tag Generator www.submitcorner.com/Tools/Meta
  DC-Dot www.ukoln.ac.uk/metadata/dcdot/
99
Keywords in text
Select them correctly
  Study synonymy, variants, similar terms in other languages
  Analyze usage in search engines
Density
  Total: Up to 25%
  Individual: Up to 5%
Position
  Heading tags <Hn>
  First paragraphs
  Font modifying tags
    Bold <B> <strong>; Italic <I>; Font size
  Promote the proximity of terms (where appropriate)
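The density limits above (up to 5% per keyword, up to 25% in total) can be computed directly from the page text. This is a rough sketch: the tokenisation by `\w+`, the function names and the sample texts are assumptions, and real density tools may count words differently.

```python
import re
from collections import Counter

def keyword_density(text, keywords):
    """Return (density per keyword, total density) as fractions of all words."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return {}, 0.0
    counts = Counter(words)
    individual = {k: counts[k.lower()] / len(words) for k in keywords}
    return individual, sum(individual.values())

def within_limits(individual, total, max_individual=0.05, max_total=0.25):
    """Apply the limits quoted on the slide: 5% per keyword, 25% overall."""
    return total <= max_total and all(d <= max_individual for d in individual.values())

# Hypothetical 20-word text with one occurrence of the keyword
TEXT = "webometrics " + "filler " * 19
individual, total = keyword_density(TEXT, ["webometrics"])
print(individual, within_limits(individual, total))

# A keyword-stuffed text exceeds both limits
stuffed, stuffed_total = keyword_density("web web web page", ["web"])
print(stuffed, within_limits(stuffed, stuffed_total))
```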
100
More about keywords
Alternative text ALT
  Very important
  Used to give meaning to images, graphs and banners
  Specific treatment similar to title
  Up to 250 characters
Anchor terms in the links
  Use keywords
  The pages that link to ours are very important
  Also relevant for the internal navigational links
104
Links to external pages
Link density
  Average of links/page (incl. internal) ~ 20
  Structuring resource lists in hierarchical directories
    Each category, one or more pages
Target pages
  Linking to good pages
    Main page (whenever appropriate)
    Pages with high PR
    Updated pages
    Local > .edu > .org > .info > .com
  Check frequently that links are still active
  Avoid links to link farms
  Select carefully the text on the link (avoid “here”, “page”)
105
Characteristics of the institutional sites
Domain
  Own
  Avoid acronyms, provide content
  Local, .org, .info, .name versus .com
  Subdomain: Inherit PR from site root
  Don’t change domain!!!
Medium-sized and big institutional sites
  Preferably large
Updating
  Frequently
  Increase number of pages (maintain new/old rate)
Promote inlinks
Promote visits
  Keep statistics
106
Characteristics of the pages
Size
  Small or medium-sized
    <100 k
    But 40-50 k can be a great volume of text
    Structure groups of pages correctly through consecutive links (back-next)
  Medium or big-sized
Updating
  Frequent, but not excessive
  Change contents, not the address
    Reduce restructuring to a minimum
Versions
  In different pages
    In other languages
    In other formats (pdf, doc, ps, ppt, ...)
107
Barriers for robots
Links hidden, incomplete or without meaning
  Graphs and entry banners without link in text mode
    Especially Flash files
    The presence of ALT text is also important
  Javascripts in navigational menus
    With hidden links
    With relative, incomplete links (without URL Base declaration)
  Frames (but NOT always!!)
  Orphan pages
Avoid redirection and aliases
  Refresh tags
  Institutional farms (site.es; site.com; site.org)
Dynamic pages
  Reduce length and complexity of the URLs: give them a meaning
108
Robot-friendly
File robots.txt
  Don’t abuse “no index”
Map of the site (html and xml)
Navigational internal links
  Only those that are necessary
Registration in referral sites
  At the search engines (not very important, it only speeds up indexing)
  In directories (in Yahoo it increases visibility)
  In supersites (trick: Wikipedia)
Fight against invisibility
  Static pages
  Support submenus
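A robots.txt file can be tested before publishing to make sure it does not block more than intended. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt contents and the example.org URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everything is open except one private directory
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

home_allowed = parser.can_fetch("*", "http://www.example.org/index.html")
private_allowed = parser.can_fetch("*", "http://www.example.org/private/report.html")
print(home_allowed, private_allowed)
```

Keeping the disallowed area small is in line with the advice above not to abuse exclusions.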
110
Hacking strategies (to avoid)
Invisible texts
Pixel links
Link farms
  Link buying
  Visits buying
Duplicate texts
Cloaking
  Different pages for the search engine than for the user
Hacking mirrors
111
Tools: Words’ Density
  Site Content Analyzer 2.2.15 www.sitecontentanalyzer.com
  Good Keywords 2.0 www.goodkeywords.com
  Keyword Density www.keyworddensity.com
  Keyw. Dens. & Prominence 1.2 www.ranks.nl/tools/spider.html
  Keyword Density Analyzer tool.motoricerca.info/keyword-density.phtml
  KDAnalyzer Version 2.0 www.webjectives.com/keyword.htm
  Google Adwords adwords.google.com/select/KeywordSandbox
  Keyword Density Analyzer 1.3 www.searchengineworld.com/cgi-bin/kwda.cgi
  Keyword Investigator www.keywordster.com/keyword-investigator.htm
  GRKda www.grsoftware.net/search_engines/software/grkda.html
113
Tools: Position
  Accurate Monitor 2.5 www.cleverstat.com
  Advanced Web Ranking 4.7 www.advancedwebranking.com
  AgentWebRanking Pro 2.6 www.agentwebranking.com
  IBP 9 www.axandra.com
  Dynamic Web Ranking 7.0 www.dynamicwebrank.com
  Link Popularity Analysis 2.0 www.link-popularity-analysis.com
  Link Popularity Check 3.0 www.checkyourlinkpopularity.com
  Link Survey 1.5 www.antssoft.com
  RankSpy 1.3 www.searchutilities.com/rankspy
  Trellian SEO Toolkit www.trellian.com/seotoolkit
  Web CEO 6.0 www.webceo.com
117
Evolution and persistence
Volatility Persistence
Changes in web pages are usually minor or cosmetic

The frequency of change varies according to the domain

The magnitude of the change depends largely on the size

Large pages change more, and more frequently
research.microsoft.com/research/sv/sv-pubs/p97-fetterly/p97-fetterly.pdf
118
Generating Contents Personal pages (also research groups or departments)
Access to full texts files (academic publications)
Institutional Repositories
Papers, books and book chapters, dissertations, …
Multimedia repositories
Portal of journals
Local institutional journals
Super-sites
Added value directories of (web) resources
120
Personal pages Current situation
Few scholars with their own personal webpage, most of them with a limited amount of contents
Bad positioning practices, especially regarding the URL
Personal Branding
Increased Impact (global audiences)
Efficient Networking (peers and non-peers)
Complements your formal scholarly communication
Reflects the diversity of your activities (and of yourself)
Not only reactive but also proactive
It is easy, fast and cheap
121
A model
[Page layout sketch with these elements]
  Institutional logo & banner
  Name of the group, department or faculty
  Index: Papers, Conferences, Books, Teaching, Projects, Popular Science, Prizes, Hobbies, Press notes, Blog / Web 2.0, Statistics, CV (pdf)
  Photo
  Contact info
  General comments and presentation
  Links
  News, relevant new info
  Next conferences
  Updated 5-July-2012
thebook.virtualknowledgestudio.nl/author/paul-wouters
http://johnclements.net/home
123
Web Usage Mining
Definitions
  Data mining: Knowledge extraction from databases
  Web Mining: Gathering and analysis of the visit patterns of a Web site
    It is not about searching or retrieving information about that site
Objectives: Aspects to explore
  Joining
  Classification and clustering
  Transversal patterns
  Sequential patterns
  Similarities
Visits
  Web sites analysis
  Log files: Definition and structure
  Software for log analysis
    Practices with WebTrends Analysis Suite (www.netiq.com)
124
Taxonomy of the Web Mining
[Diagram. Labels: Web Mining; Mining of the Web use; Database mining; Mining of Web contents; Mining based on agents; Search engines; Metasearchers; Personal agents; Invisible Internet; Identification; Description; Analysis tools]
125
Log files (logbook)
File that automatically records all data about the visits that a web site receives
  IP address of the visitor
  Visited URLs
  Time of visit
  Time dedicated to the visit
  URL from which the visit came
  Type of request
  Type of response
  Size of response (bytes)
  Browser used
  etc.
Apache web log
205.188.209.10 - - [29/Mar/2002:03:58:06 -0800] "GET /~sophal/whole5.gif HTTP/1.0" 200 9609 "http://www.csua.berkeley.edu/~sophal/whole.html" "Mozilla/4.0 (compatible; MSIE 5.0; AOL 6.0; Windows 98; DigExt)"
216.35.116.26 - - [29/Mar/2002:03:59:40 -0800] "GET /~alexlam/resume.html HTTP/1.0" 200 2674 "-" "Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; http://www.inktomi.com/slurp.html)"
202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/indextop.html HTTP/1.1" 200 3510 "http://www.csua.berkeley.edu/~tahir/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
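The fields listed on this slide map directly onto the entries of the Apache log sample. The sketch below parses one entry with a regular expression written for this combined-style layout; the pattern and field names are an assumption made for illustration, not the parser used by any of the log analysis products mentioned in this deck.

```python
import re

# One entry: IP, two unused fields, [time], "request", status, size, "referrer", "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# First entry of the sample log on the slide
LINE = (
    '205.188.209.10 - - [29/Mar/2002:03:58:06 -0800] '
    '"GET /~sophal/whole5.gif HTTP/1.0" 200 9609 '
    '"http://www.csua.berkeley.edu/~sophal/whole.html" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; AOL 6.0; Windows 98; DigExt)"'
)
record = parse_line(LINE)
print(record["ip"], record["url"], record["status"], record["size"])
```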
126
Utilities
Questions to answer
  How has the information been used?
  How frequently?
  What is the most and the least popular (visited)?
  Where do the visitors come from? Where do they exit?
  Where do they spend more time?
  How much time do they spend?
  Which paths do visitors follow the most?
  Who are the visitors?
  Where do they come from?
  How did they arrive?
127
Visits trackers
  Google Analytics www.google.com/analytics
  Yahoo Web Analytics web.analytics.yahoo.com
  StatCounter www.statcounter.com
  ActiveMeter www.activemeter.com
  123Statmore www.123stat.com
  Counter Central www.countercentral.com
  Digits Web Counter www.digits.com
  Free Hit Counter www.ritecounter.com
  GoStats www.gostats.com
  MyWebStats www.mywebstats.org
  OneStat Free www.onestatfree.com
  OneStat www.onestat.com
  Opentracker www.opentracker.net
  ShinyStat www.shinystat.com
  TDstats www.tdstats.com
  TheCounter www.thecounter.com
  WebSTAT www.webstat.com
  What Counter www.whatcounter.com
132
Log file analysis software
  10-Strike Log-Analyzer 1.53 www.10-strike.com
  123LogAnalyzer 3.3 www.123loganalyzer.com
  Log2Stats 1.5 www.bitstrike.com
  AdvancedLogAnalyzer 2.1 www.abacre.com/ala/index.htm
  Alterwind Log Analyzer 4.0 www.alterwind.com
  Analog 6.0 www.analog.cx
  Analyse Spider 3.01 www.analysespider.com
  Deep Log Analyzer 4.0 www.deep-software.com
  eWebLogAnalyzer 2.3 www.esoftys.com
  FastStats Analyzer 4.1 www.mach5.com/products/analyzer
  Nihuo Web Log Analyzer 4.07 www.nihuo.com
  SawMill 8.5 www.sawmill.net
  SmarterStats 6.5 www.smartertools.com
  Surfstats 2011 www.surfstats.com
  WebLogStorming 2.6 www.datalandsoftware.com/weblog
  WebLogExpert 7.4 www.weblogexpert.com
  WebTrends Analytics 10 www.webtrends.com
136
Exercises
Experiments
  Funnel Web 5.0
Practices with log files
  Total and disaggregated visits
  Most popular pages and directories
  Downloaded files
  Points of entry and exit
  Visitors demography
  Entry referrals (origin, browser and search engine words used)
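Several of the exercises above reduce to counting over parsed log records. The sketch below shows the idea with the standard library; the records are hypothetical, the dictionary keys are an assumed layout for already-parsed log entries, and it counts requests rather than sessions, which the commercial tools handle in more detail.

```python
from collections import Counter

def summarize(requests):
    """Total requests, unique visitor IPs, most requested pages and top referrers."""
    pages = Counter(r["url"] for r in requests)
    referrers = Counter(r["referrer"] for r in requests if r["referrer"] != "-")
    visitors = {r["ip"] for r in requests}
    return {
        "total_requests": len(requests),
        "unique_visitors": len(visitors),
        "top_pages": pages.most_common(3),
        "top_referrers": referrers.most_common(3),
    }

# Hypothetical parsed log records used only for illustration
REQUESTS = [
    {"ip": "10.0.0.1", "url": "/index.html", "referrer": "-"},
    {"ip": "10.0.0.1", "url": "/papers.html", "referrer": "http://www.example.org/index.html"},
    {"ip": "10.0.0.2", "url": "/index.html", "referrer": "http://search.example.com/?q=cybermetrics"},
    {"ip": "10.0.0.3", "url": "/index.html", "referrer": "http://search.example.com/?q=cybermetrics"},
]
summary = summarize(REQUESTS)
print(summary)
```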
140
Bibliography/Webliography
General Bibliography/Webliography www.cindoc.csic.es/cybermetrics/links03.html
Björneborn, L. & Ingwersen, P. (2001). Perspectives of webometrics. Scientometrics, 50(1): 65-82. http://www.db.dk/lb/2001webometrics.pdf
van Raan, A. F. J. (2001). Bibliometrics and internet: Some observations and expectations. Scientometrics, 50(1): 59-63.
Bar-Ilan, J. (2001). Data collection methods on the Web for informetric purposes. A review and analysis. Scientometrics, 50(1): 7-32.
Björneborn, L. (2004). Small-world link structures across an academic web space: a library and information science approach. PhD dissertation. Royal School of Library and Information Science. xxxvi, 399 p. ISBN 87-7415-276-9. http://www.db.dk/lb/phd/phd-thesis.pdf
Jepsen, E.T.; Seiden, P.; Ingwersen, P.; Björneborn, L. & Borlund, P. (2005). Characteristics of scientific web publications: preliminary data gathering and analysis. Journal of the American Society for Information Science and Technology. Special Issue on Webometrics.
Björneborn, L. & Ingwersen, P. (2005). Towards a basic framework for webometrics. Journal of the American Society for Information Science and Technology. Special Issue on Webometrics.
Thelwall, M.; Vaughan, L. & Björneborn, L. (2005). Webometrics. Annual Review of Information Science and Technology, 39.
Ingwersen, P. & Björneborn, L. (2004). Methodological issues of webometric studies. In: Glänzel, W. et al. (eds.). Quantitative Science and Technology Research. Kluwer Academic Publishers.
The Statistical Cybermetrics Research Group. Wolverhampton University. http://cybermetrics.wlv.ac.uk
Alonso Berrocal, J.L.; Figuerola, C.G. & Zazo, A.F. (2004). Cibermetría: nuevas técnicas de estudio aplicables al Web. Ediciones Trea, Gijón. 207 pp.
Faba Perez, C.; Guerrero Bote, V. P. & Moya Anegón, F. (2004). Fundamentos y técnicas cibermétricas: modelos cuantitativos de análisis. Junta de Extremadura, Mérida. Serie Sociedad de la Información, no. 18. 216 pp.
Prime, C.; Bassecoulard, E. & Zitt, M. (2002). Co-citations and co-sitations: A cautionary view on an analogy. Scientometrics, 54(2): 291-308.