Web Log, Text, and Other Data Mining
Wayne Kao
What is Data Mining?
• “Automated extraction of hidden predictive information from large databases” (Kurt Thearling)
• “Quickly and thoroughly explore mountains of data, isolating the valuable, usable information -- the business intelligence” (SPSS site)
Possible Questions (Chi)
• Usage
– How has info been accessed? How frequently? What’s popular?
– How do people enter the site? Where do people spend time? How long do they spend there?
– How do people travel within a site? What are the [un]popular paths?
– Who are the people accessing the site? From what geographical location? From what domains?
Possible Questions (cont)
• Structural
– What information has been added? Modified? Remained the same but moved?
• Usage + Structural
– How is new info accessed? When does it become popular?
– How does introducing new information change navigation patterns? Can people still navigate to the desired info?
– Do people look for deleted information?
Usability Testing
• Common usability testing techniques:
– Interviews
– Ethnographic and/or lab-style observations
– Surveys
– Focus groups
• Strength: good qualitative data
• Problems with these techniques:
– Time and effort are costly
– Small sample sizes: are quantitative results meaningful? (Spool)
• How can usability testing be brought into the design cycle more often, so that problems and potential problems are found earlier?
[Figure: iterative design cycle of Design, Prototype, and Evaluate]
Remote Usability (Waterson)
• Analyze clickstreams in the context of the task and user intentions
• Human observers are not present
• Want methods that are
– Easy to deploy on any website
– Compatible with a range of operating systems and browsers
• Mobile computing adds further usability challenges
– Small screen sizes
– Limited and/or new interaction techniques
– Devices are used in environments beyond the desktop
Apache Web Log
205.188.209.10 - - [29/Mar/2002:03:58:06 -0800] "GET /~sophal/whole5.gif HTTP/1.0" 200 9609 "http://www.csua.berkeley.edu/~sophal/whole.html" "Mozilla/4.0 (compatible; MSIE 5.0; AOL 6.0; Windows 98; DigExt)"
216.35.116.26 - - [29/Mar/2002:03:59:40 -0800] "GET /~alexlam/resume.html HTTP/1.0" 200 2674 "-" "Mozilla/5.0 (Slurp/cat; [email protected]; http://www.inktomi.com/slurp.html)"
202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/indextop.html HTTP/1.1" 200 3510 "http://www.csua.berkeley.edu/~tahir/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/animate.js HTTP/1.1" 200 14261 "http://www.csua.berkeley.edu/~tahir/indextop.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
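The entries above follow Apache's "combined" log format: host, identity, user, timestamp, request line, status, bytes, referrer, and user agent. As a minimal sketch, assuming each entry sits on one line and uses straight double quotes, a regular expression can split an entry into fields. The field names (`host`, `path`, `referer`, and so on) and the helper `parse_line` are illustrative choices, not part of any tool discussed in these slides.

```python
import re
from datetime import datetime

# Apache "combined" format:
# host ident user [time] "method path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # e.g. "29/Mar/2002:03:00:14 -0800" -> timezone-aware datetime
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # "-" means no body was sent; this sketch maps it to 0 bytes.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# Third entry from the sample log above.
line = ('202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] '
        '"GET /~tahir/indextop.html HTTP/1.1" 200 3510 '
        '"http://www.csua.berkeley.edu/~tahir/" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')
entry = parse_line(line)
```

Mapping a missing byte count to 0 is a convenience; an analysis that distinguishes "no body" from "empty body" might keep it as `None` instead.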
Analog - One traditional tool
• Reports number of requests, info about client machines, entry/exit points, and charts (Chi et al.)
• Generated on a daily basis
[Figures: typical stats; prettier stats]
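The slide does not describe how Analog computes its reports. As a rough illustration only (not Analog's code), a daily tally of requests and most-requested paths can be sketched in Python. `daily_report` and `top_n` are hypothetical names, and the sample lines are the log entries above trimmed after the byte count.

```python
import re
from collections import Counter, defaultdict

# Capture the date part of "[29/Mar/2002:03:58:06 -0800]" and the requested path.
REQUEST = re.compile(r'\[(?P<day>[^:]+):[^\]]*\] "\S+ (?P<path>\S+)')


def daily_report(lines, top_n=3):
    """Count requests per day and list the most requested paths for each day."""
    per_day = defaultdict(Counter)
    for line in lines:
        match = REQUEST.search(line)
        if match:
            per_day[match.group("day")][match.group("path")] += 1
    return {
        day: {"requests": sum(paths.values()), "top": paths.most_common(top_n)}
        for day, paths in per_day.items()
    }


LOG_LINES = [
    '205.188.209.10 - - [29/Mar/2002:03:58:06 -0800] "GET /~sophal/whole5.gif HTTP/1.0" 200 9609',
    '216.35.116.26 - - [29/Mar/2002:03:59:40 -0800] "GET /~alexlam/resume.html HTTP/1.0" 200 2674',
    '202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/indextop.html HTTP/1.1" 200 3510',
    '202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/animate.js HTTP/1.1" 200 14261',
]
report = daily_report(LOG_LINES)
```

Ties in `most_common` keep first-seen order, so with every path requested once the report simply lists the first three paths encountered.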
Readings
• “Visualizing the Evolution of Web Ecologies,” Chi et al., Xerox PARC, 1998
• “Visualizing Association Rules for Text Mining,” Wong, Whitney, & Thomas, Pacific Northwest, 1999
• “VISVIP: 3D Visualization of Paths through Web Sites,” Cugini & Scholtz, National Institute of Standards and Technology, 1999
• “Case Study: E-Commerce Clickstream Visualization,” Brainerd & Becker, Blue Martini Software, 2001
• “What Did They Do? Understanding Clickstreams with the WebQuilt Visualization System,” Waterson et al., UC Berkeley, 2002
Evolution of Web Ecologies
• Rather than raw hits, the intermediate representation focuses on (C)ontent, (U)sage, and (T)opology, sorted by URL:
– URL1: {day1: <link> <link> …}, {day2: <link> <link> …}
– URL2: {day1: <link> <link> …}
• Visualize an entire web site in a small amount of space
• Show temporal changes
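The per-URL, per-day structure above can be sketched as a nested dictionary. This is a minimal sketch, not Chi et al.'s implementation: it assumes each day's crawl and log analysis has already produced a snapshot `{url: {"links": [...], "hits": n}}`, and the helpers `build_cut` and `lifecycle` are hypothetical. The lifecycle labels correspond to the new/continued/deleted stages that the Disk Tree colors.

```python
def build_cut(days):
    """Regroup daily snapshots by URL: url -> day -> record.

    `days` maps a day label to a snapshot {url: {"links": [...], "hits": int}},
    where links stand for topology and hits for usage (content fields could be
    added to the same record). URLs are returned in sorted order.
    """
    cut = {}
    for day, snapshot in days.items():
        for url, record in snapshot.items():
            cut.setdefault(url, {})[day] = dict(record)
    return dict(sorted(cut.items()))


def lifecycle(cut, day_order):
    """Label each URL per day as 'new', 'continued', or 'deleted'.

    On the first day every existing URL counts as 'new'.
    """
    stages = {}
    for url, history in cut.items():
        labels, previous = {}, False
        for day in day_order:
            present = day in history
            if present:
                labels[day] = "continued" if previous else "new"
            elif previous:
                labels[day] = "deleted"
            previous = present
        stages[url] = labels
    return stages


# Hypothetical two-day site: /b.html disappears, /c.html appears.
days = {
    "day1": {"/a.html": {"links": ["/b.html"], "hits": 5},
             "/b.html": {"links": [], "hits": 2}},
    "day2": {"/a.html": {"links": ["/c.html"], "hits": 7},
             "/c.html": {"links": [], "hits": 1}},
}
cut = build_cut(days)
stages = lifecycle(cut, ["day1", "day2"])
```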
Disk Tree Visualization
• Breadth-first traversal
• Each ring represents a tree level
• All leaf nodes are guaranteed some angular space (360° / number of leaves)
• Visual encoding:
– Tree links: line marks in X and Y
– Page access frequency: line size/brightness
– Lifecycle stage: color (new, continued, deleted)
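The layout rules above can be sketched as follows, under stated assumptions: the site is given as a tree `{node: [children]}`, the ring index equals breadth-first depth, every leaf gets an equal wedge of 360 / leaves degrees, and an internal node spans the wedges of its leaves. That last rule is consistent with, but not stated on, the slide; `disk_tree_layout` is a hypothetical name.

```python
from collections import deque


def disk_tree_layout(tree, root):
    """Return {node: (ring, (start_deg, end_deg))} for a Disk Tree-style layout.

    tree: {node: [children]}; ring = breadth-first depth; each leaf gets an
    equal wedge of 360 / leaf_count degrees; an internal node spans its leaves.
    """
    # Breadth-first traversal: ring index = depth, and parents precede children.
    depth, order, queue = {root: 0}, [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in tree.get(node, []):
            depth[child] = depth[node] + 1
            queue.append(child)

    # Leaf counts, computed bottom-up by walking the BFS order backwards.
    leaves = {}
    for node in reversed(order):
        children = tree.get(node, [])
        leaves[node] = sum(leaves[c] for c in children) if children else 1

    wedge = 360.0 / leaves[root]
    span = {root: (0.0, 360.0)}
    for node in order:
        start = span[node][0]
        for child in tree.get(node, []):
            span[child] = (start, start + leaves[child] * wedge)
            start = span[child][1]
    return {node: (depth[node], span[node]) for node in order}


# Hypothetical site: "/" links to "a" and "b"; "a" has two pages below it.
layout = disk_tree_layout({"/": ["a", "b"], "a": ["a1", "a2"]}, "/")
```

With three leaves, each leaf gets 120°, so "a" (two leaves) spans 0° to 240° and "b" spans 240° to 360°.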
Disk Tree Visualization (cont)
• Pros
– No occlusion problems, since it lies in a 2D plane
– Can use the 3rd dimension for other info (e.g. time)
– Aesthetically pleasing to the eye (?)
• Cons
– Difficult to see any page-level detail
– Confusing color choices
Time Tube Visualization
• Disk Trees placed along a spatial axis
• Rotated so that each slice gets equal screen area
• Focus+context
• Animation: can fly through the tube, mapping time onto time
Interaction Model
• Can rotate slices with a button click
• Can focus a slice by clicking on it
• Flicking gestures move slices around
• Right-clicking zooms to an area
• Mouseovers display more information about a node in a side window
• Can bring up pages in the browser
• Animation of slices
Real-world Analyses
• Deadwood: shows pages becoming [un]popular
• Shows effects of a redesign
Real-world Analyses (cont)
• Added items are being used
• Deleted items aren’t negatively impacting the rest of the site
Comments
• Gives only a broad view of the data, with no real way to get at the specifics
• Interaction seems very advanced
• Not sure how intuitive the idea of a circular tree is; it seems somewhat gratuitous
Association Rule?
• Quantitative rule that describes associations between sets of items
– Not qualitative, because no domain knowledge is necessary for text mining
• Implication X → Y, where
– X: set of antecedent items
– Y: consequent item
• Example: 80% of people who buy diapers and baby powder also buy baby oil.
Association Rule? (cont)
• Support (also called prevalence or joint probability)
– Percentage of all transactions (for text mining, articles) that contain every antecedent item and the consequent item
• Confidence (also called predictability or conditional probability)
– Of the transactions that contain all the antecedent items, the percentage that also contain the consequent item
– In the diaper example, 80% is the rule's confidence
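Both measures can be computed directly from their definitions. For a rule X → Y over a transaction list D, support is the fraction of transactions containing X ∪ {Y}, and confidence is support(X ∪ {Y}) / support(X). The baskets below are made up so that the diaper example's 80% confidence comes out exactly; the function names are illustrative.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items` (joint probability)."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    base = support(transactions, antecedent)
    joint = support(transactions, set(antecedent) | {consequent})
    return joint / base if base else 0.0


# Made-up shopping baskets chosen to reproduce the diaper example.
baskets = (
    [{"diapers", "baby powder", "baby oil"}] * 4
    + [{"diapers", "baby powder"}, {"diapers"}, {"bread", "milk"}, {"milk"}]
)
s = support(baskets, {"diapers", "baby powder", "baby oil"})
c = confidence(baskets, {"diapers", "baby powder"}, "baby oil")
```

Here 4 of 8 baskets contain all three items (support 0.5), and 4 of the 5 baskets with diapers and baby powder also contain baby oil (confidence 0.8).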
Association Rule Visualization
• Must visualize
– Antecedent items & consequent items
– Associations between antecedent and consequent
– Rules' support
– Confidence
• Traditional ways of visualizing it
– 2D matrix
– Directed graph
2D Matrix (figure 1)
• Antecedent and consequent items on the axes
• Metadata icons in the cells that connect the antecedent to the consequent contain support and confidence values
Association rule: B → C
2D Matrix (cont)
• Pros: good for one-to-one binary relationships
• Cons:
– Hard to see association rules in many-to-one relationships (A + B → C, or A → C and B → C)
– Grouping antecedents adds complexity
– Object occlusion
Directed graph
• Nodes = items
• Edges = associations
• Cons:
– With a dozen or more items, the display becomes tangled
– Selecting edges to display multiple rules requires significant human interaction
Confusing?
“Novel” Technique
• Matrix: rule-to-item
– Rows = topics
– Columns = item associations
– Blue/red = antecedent and consequent
• Bar graph = confidence/support
• Can use queries to filter
• Mouse zooming to support context/focus
“Novel” Technique Advantages
• Handles hundreds of multiple-antecedent association rules
• View topics and associations simultaneously
• Individual items clearly shown
• No antecedent groups
• Few occlusions, because metadata is plotted at the far end and the bar graph is scaled
• No screen swapping, animation, or serious interaction required
“Novel” Technique Demo
• Demo shows scalability
• ~9 MB news article corpus of 100,000+ documents
• Uses word- and concept-based text engines
• Words are judged interesting or not depending on their position in documents
• Suffixes removed; common prepositions, pronouns, adjectives, and gerunds ignored
• Build a table of antecedents, consequents, confidences, and supports, then feed it into the visualization
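The demo's text engines are not described in detail, but the pipeline can be sketched with simplifications that are assumptions, not the paper's method: whitespace tokenization, a tiny hand-written stop-word list, naive suffix stripping (`ing`, `ed`, `s`), and exhaustive counting of itemsets up to `max_antecedents + 1` terms, which only scales to toy inputs (a real miner would use something like Apriori). Each output row has the shape (antecedents, consequent, confidence, support) described above.

```python
from collections import Counter
from itertools import combinations

# Tiny illustrative lists; the paper's real stop words and stemming are not given.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "it", "he", "she", "they"}
SUFFIXES = ("ing", "ed", "s")


def terms(document):
    """Lowercase, drop stop words, and strip at most one common suffix per word."""
    result = set()
    for word in document.lower().split():
        word = word.strip(".,;:!?\"'")
        if not word or word in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                break
        result.add(word)
    return result


def rule_table(documents, min_support=0.5, min_confidence=0.6, max_antecedents=2):
    """Return sorted rows (antecedents, consequent, confidence, support)."""
    transactions = [terms(d) for d in documents]
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for size in range(1, max_antecedents + 2):
            counts.update(combinations(sorted(t), size))
    rows = []
    for itemset, joint in counts.items():
        if len(itemset) < 2 or joint / n < min_support:
            continue
        for consequent in itemset:
            antecedents = tuple(i for i in itemset if i != consequent)
            confidence = joint / counts[antecedents]
            if confidence >= min_confidence:
                rows.append((antecedents, consequent, round(confidence, 3), round(joint / n, 3)))
    return sorted(rows)


# Hypothetical headlines, not the demo's corpus.
docs = ["Oil prices rise", "oil prices fall", "Oil prices rise again", "stock markets rise"]
rows = rule_table(docs)
```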
Conclusions
• Rule-to-item association
• Very clear visualization if limited to a few dozen rules
• Most web log visualizations jump to using a graph; this paper forces you to think twice.
VISVIP
• Captures individual movement between pages rather than aggregates
• Shows paths (sequences of URLs)
Topology
• Directed graph
• Force-directed algorithm
– Spring-like force pulls linked nodes together
– Nodes repel each other with a force inversely proportional to the distance between them
– A final force pulls nodes toward the center
– Net effect: closely linked pages end up close together
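A force-directed layout with the three forces listed above can be sketched in a few dozen lines. The constants (`spring`, `repulsion`, `gravity`, `rest_length`), the per-step move cap, and the function name `force_layout` are assumptions for illustration; VISVIP's actual parameters are not given on the slide.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, spring=0.1, repulsion=1.0,
                 gravity=0.01, rest_length=1.0, max_step=0.5, seed=0):
    """Toy force-directed layout; returns {node: (x, y)}.

    Forces per iteration:
    - every pair of nodes repels with magnitude repulsion / distance,
    - each link acts as a spring pulling its ends toward rest_length,
    - a weak gravity term pulls every node toward the origin (the center).
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}

    def push(force, a, b, magnitude):
        # Positive magnitude moves a away from b (and b away from a).
        dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
        dist = math.hypot(dx, dy) or 1e-9
        fx, fy = magnitude * dx / dist, magnitude * dy / dist
        force[a][0] += fx
        force[a][1] += fy
        force[b][0] -= fx
        force[b][1] -= fy

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dist = math.dist(pos[a], pos[b]) or 1e-9
                push(force, a, b, repulsion / dist)  # repulsion
        for a, b in edges:
            dist = math.dist(pos[a], pos[b])
            push(force, a, b, -spring * (dist - rest_length))  # attraction if stretched
        for n in nodes:
            fx = force[n][0] - gravity * pos[n][0]  # pull toward center
            fy = force[n][1] - gravity * pos[n][1]
            step = math.hypot(fx, fy)
            if step > max_step:  # cap moves so early, large forces cannot blow up
                fx, fy = fx * max_step / step, fy * max_step / step
            pos[n][0] += fx
            pos[n][1] += fy
    return {n: (p[0], p[1]) for n, p in pos.items()}


# Hypothetical graph: two linked pairs of pages with no link between the pairs.
pos = force_layout(["a", "b", "c", "d"], [("a", "b"), ("c", "d")])
```

Capping each move is a common stabilizing trick; without it, two nearly coincident nodes can receive a huge repulsive push on the first iteration.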
Content
• URLs abbreviated
– http://sims.berkeley.edu/~bob/pics/large/abd.gif → ge/abd
• Color-coded by content type
• Mouseover reveals the full, unabbreviated information
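The slide gives a single abbreviation example. One rule that reproduces it, assumed here and possibly not VISVIP's actual scheme, is: the last two characters of the parent directory, a slash, and the file name without its extension. A label-to-URL map then supplies the full URL for the mouseover.

```python
import posixpath
from urllib.parse import urlparse


def abbreviate(url, dir_chars=2):
    """Assumed rule: '<last dir_chars of parent directory>/<file name without extension>'."""
    directory, filename = posixpath.split(urlparse(url).path)
    stem = posixpath.splitext(filename)[0]
    parent = posixpath.basename(directory)
    return f"{parent[-dir_chars:]}/{stem}"


url = "http://sims.berkeley.edu/~bob/pics/large/abd.gif"
labels = {abbreviate(url): url}  # short node label -> full URL shown on mouseover
```

Short labels like these can collide across directories, which is one reason the full URL needs to stay one mouseover away.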
Simplification
• Common problems
– Noise nodes not significant to paths (image and mailto nodes)
– Over-connectivity (links back to the home page or company logo)
• Solutions
– Delete all edges connected to a node
– Make one node the graph root
– Focus on a subset of the graph
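Dropping noise nodes, deleting every edge of an over-connected node, and focusing on a subset can be sketched as a single edge filter. The noise rule (image extensions and `mailto:` targets) and the names `is_noise` and `simplify` are assumptions; making one node the graph root is a layout choice and is not shown.

```python
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def is_noise(url):
    """Assumed noise rule: image files and mailto: links are not part of paths."""
    return url.startswith("mailto:") or url.lower().endswith(IMAGE_EXTENSIONS)


def simplify(edges, cut_nodes=(), keep=None):
    """Return the (source, target) edges that survive simplification.

    cut_nodes: over-connected nodes (e.g. the home page) whose edges are all deleted
    keep:      optional set of nodes to focus on; edges leaving the subset are dropped
    """
    cut_nodes = set(cut_nodes)
    result = []
    for src, dst in edges:
        if is_noise(src) or is_noise(dst):
            continue
        if src in cut_nodes or dst in cut_nodes:
            continue
        if keep is not None and (src not in keep or dst not in keep):
            continue
        result.append((src, dst))
    return result


# Hypothetical link graph.
EDGES = [
    ("/index.html", "/a.html"),
    ("/a.html", "/b.html"),
    ("/a.html", "/logo.gif"),
    ("/b.html", "mailto:someone@example.com"),
    ("/b.html", "/index.html"),
]
```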
Path Sequence
• Showing subject paths as straight lines didn't work
– Hard to follow a single jagged path
– Multiple paths overlapped
• Spline representation
– Each path is a smooth curve overlaid on the graph
– Colors for groups of subjects (e.g. novices)
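The slides do not say which spline VISVIP uses. Catmull-Rom is assumed here because it passes through every control point, so a path's curve still touches each visited node. `catmull_rom` and `samples_per_segment` are illustrative names, and the node positions would come from the graph layout.

```python
def catmull_rom(points, samples_per_segment=8):
    """Sample a smooth curve through `points` (uniform Catmull-Rom spline).

    Returns a list of (x, y) samples that starts at the first point,
    passes through every interior point, and ends at the last point.
    """
    if len(points) < 2:
        return list(points)
    # Repeat the end points so the curve also passes through the first and last node.
    pts = [points[0]] + list(points) + [points[-1]]
    curve = []
    for i in range(1, len(pts) - 2):
        p0, p1, p2, p3 = pts[i - 1], pts[i], pts[i + 1], pts[i + 2]
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            t2, t3 = t * t, t * t * t
            curve.append(tuple(
                0.5 * (2 * p1[k]
                       + (-p0[k] + p2[k]) * t
                       + (2 * p0[k] - 5 * p1[k] + 4 * p2[k] - p3[k]) * t2
                       + (-p0[k] + 3 * p1[k] - 3 * p2[k] + p3[k]) * t3)
                for k in range(2)))
    curve.append(tuple(points[-1]))
    return curve


# Hypothetical node positions along one subject's path.
curve = catmull_rom([(0, 0), (1, 1), (2, 0)], samples_per_segment=4)
```

Each path's samples could then be drawn as a polyline in its subject group's color.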
Path Sequence (cont)
• User-path-oriented layouts
– Simpler structure than when a path is laid over a graph of the entire site
Path Timing
• Vertical bar with its base on the node; its height is proportional to the time spent on the page
• Animation runs through pages at 10-30 times real time
• Select a node to get detailed stats
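Since a log records only when pages are fetched, time on a page is usually estimated from the gap between successive requests. Assuming that heuristic (the slide does not say how VISVIP measures time), bar heights and animation delays can be derived as below; the function names and the default 20x speed-up are hypothetical.

```python
from datetime import datetime


def dwell_times(visits):
    """Estimate seconds spent on each page from consecutive request times.

    visits: chronologically ordered (timestamp, url) pairs for one user path.
    The last page has no following request, so its dwell time is unknown and omitted.
    """
    return [
        (url, (t_next - t).total_seconds())
        for (t, url), (t_next, _) in zip(visits, visits[1:])
    ]


def bar_heights(times, max_height=100.0):
    """Scale dwell times linearly so the longest stay gets `max_height`."""
    if not times:
        return []
    longest = max(seconds for _, seconds in times) or 1.0
    return [(url, max_height * seconds / longest) for url, seconds in times]


def animation_delay(seconds, speedup=20):
    """Screen time for a stay when replaying faster than real time (slide: 10-30x)."""
    return seconds / speedup


# Hypothetical path through three pages.
visits = [
    (datetime(2002, 3, 29, 3, 0, 0), "/a.html"),
    (datetime(2002, 3, 29, 3, 0, 30), "/b.html"),
    (datetime(2002, 3, 29, 3, 2, 0), "/c.html"),
]
times = dwell_times(visits)
heights = bar_heights(times)
```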
Comments
• Capturing individual movements is pretty innovative
• Curved user paths, and reorienting the layout based on user paths
• Overall graph visualization not too clear
• Good tips for creating a web log mining visualization
Clickstream Visualizer
• Aggregate nodes using an icon (e.g. all the checkout pages)
• Edges represent transitions
– Wider means more transitions
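Aggregating pages into icon nodes and weighting edges by transition count can be sketched as below. Everything here is an assumption for illustration rather than Blue Martini's method: sessions are already split into page sequences, pages map to aggregate nodes through a caller-supplied `category` function (the hypothetical `first_path_segment`), moves within one aggregate are not drawn, and edge width scales linearly with count.

```python
from collections import Counter


def aggregate_transitions(sessions, category):
    """Count transitions between aggregate nodes (e.g. all checkout pages).

    sessions: list of page sequences, one per visitor session
    category: function mapping a URL to its aggregate node
    """
    edges = Counter()
    for pages in sessions:
        nodes = [category(p) for p in pages]
        for a, b in zip(nodes, nodes[1:]):
            if a != b:  # moves inside one aggregate node are not drawn
                edges[(a, b)] += 1
    return edges


def edge_width(count, max_count, min_width=1.0, max_width=10.0):
    """Linear width: the busiest edge is drawn `max_width` units wide."""
    return min_width + (max_width - min_width) * count / max_count


def first_path_segment(url):
    """Hypothetical grouping rule: '/checkout/cart' -> 'checkout'."""
    return url.strip("/").split("/")[0]


sessions = [
    ["/home", "/product/1", "/checkout/cart", "/checkout/pay"],
    ["/home", "/product/2", "/product/3"],
    ["/home", "/checkout/cart"],
]
edges = aggregate_transitions(sessions, first_path_segment)
```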
Customer Segments
• Collect
– Clickstream
– Purchase history
– Demographic data
• Associates customer data with their clickstream (scary...)
• Different color for each customer segment
Filtering
• Using the mouse or a table control, can filter by
– Edge weight
– Node selection
• Example: select the checkout nodes and see whether users are exiting from them
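The checkout example can be sketched as an exit-rate calculation over sessions, alongside a simple edge-weight threshold. The names `filter_edges` and `exit_rates`, and the definition of an exit as the last page of a session, are assumptions.

```python
from collections import Counter


def filter_edges(edges, min_weight):
    """Keep only transitions taken at least `min_weight` times."""
    return {edge: weight for edge, weight in edges.items() if weight >= min_weight}


def exit_rates(sessions, selected):
    """For each selected page, the share of its visits that ended the session."""
    visits, exits = Counter(), Counter()
    for pages in sessions:
        for i, page in enumerate(pages):
            if page in selected:
                visits[page] += 1
                if i == len(pages) - 1:
                    exits[page] += 1
    return {page: exits[page] / visits[page] for page in visits}


# Hypothetical sessions through a checkout funnel.
sessions = [
    ["/home", "/cart", "/pay", "/done"],
    ["/home", "/cart"],
    ["/cart", "/pay"],
]
rates = exit_rates(sessions, selected={"/cart", "/pay"})
```

A high exit rate on a checkout page is the kind of signal the slide's example is looking for.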
Layout
Uses the third-party Tom Sawyer package:
1. Hierarchical, from higher out-degree to higher in-degree
– Mirrors the actual flow of site users
– The default
2. Circular
– Puts related nodes into circles
– Shows relationships between groups of pages
Layout (cont)
• Aggregation based on file system path (good idea?)
Initial Findings
• Gender shopping differences (intriguing...)
Initial Findings (cont)
• Checkout process analysis
• Newsletter hurting sales
Comments
• Visualizing clickstreams with demographic data
• Grouping pages by type
• Best use of color
• Icons are an interesting way of reducing complexity
System Design
• Log data with proxy
• Infer actions
• Aggregate data
• Layout graph
• Display interactive visualization
Capturing Interaction
• Typical HTTP request…
[Figure: client browser → web server]
Capturing Interaction (cont)
• WebQuilt captures interaction with a proxy
– Proxies have typically been used for caching and firewalls
[Figure: client browser → WebQuilt proxy (writes log) → web server]
Capturing Interaction (cont)
• If a page says: <A HREF="coolpage.html">
• Change it to: <A HREF="http://webquiltproxy.cs.berkeley.edu/webquilt?replace=http://www.spiffypages.com/coolpage.html&tid=1&linkid=13">
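The rewrite can be sketched with a regular expression over anchor tags. This is a rough illustration, not WebQuilt's code: regex-based HTML rewriting misses unquoted attributes, and both the sequential `linkid` numbering (the slide's example shows `linkid=13`) and the percent-encoding of the target are assumptions. The proxy address is the one shown on the slide; `rewrite_links` is a hypothetical name.

```python
import re
from itertools import count
from urllib.parse import quote, urljoin

PROXY = "http://webquiltproxy.cs.berkeley.edu/webquilt"  # address shown on the slide

# Matches <a ... href="..."> or href='...'; group 3 is the original link target.
HREF = re.compile(r'(<a\s[^>]*?href\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_links(html, page_url, tid):
    """Route every quoted <a href> in `html` through the logging proxy.

    Relative targets are resolved against `page_url`; links are numbered
    1, 2, 3, ... in document order as their `linkid`.
    """
    next_id = count(1)

    def replace(match):
        prefix, q, target = match.groups()
        absolute = urljoin(page_url, target)
        proxied = (f"{PROXY}?replace={quote(absolute, safe=':/')}"
                   f"&tid={tid}&linkid={next(next_id)}")
        return f"{prefix}{q}{proxied}{q}"

    return HREF.sub(replace, html)


page = '<A HREF="coolpage.html">cool</A> <a href="/about.html">about</a>'
rewritten = rewrite_links(page, "http://www.spiffypages.com/index.html", tid=1)
```

Keeping `:` and `/` unescaped leaves simple targets readable, as in the slide's example, while characters such as `&` or `?` inside a target are still percent-encoded so they cannot break the proxy's query string.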
Capturing Interaction (cont)
• Pros:
– Don’t need access to servers
– Can analyze sites without permission from the server
– Can gather clickstreams from a variety of devices, including PDAs, phones, and desktop computers
• Cons:
– No direct access to the client
Visualization
• Interactive, zoomable directed graph
– Nodes = web pages
– Edges = aggregate traffic between pages
• Java-based SATIN toolkit for gesturing & zooming interaction
• Image rendering of web pages:
– JacoZoom: Java-callable wrappers around an ActiveX component
– MSIE window
Directed graph
• Nodes: visited pages
– Color marks entry and exit nodes
• Arrows: traversed links
– Thicker: more heavily traversed
– Color: red/yellow shows time spent before clicking; blue marks the optimal path chosen by the designer
Controls
• Slider: zoom in and out
• Checkboxes: filter paths to display
Pages
• Zooming in shows page thumbnails
• Arrows
– Originate from actual links or the Back button
– Translucent, so they don’t cover details
Layout
The layout system is flexible:
1. Edge-weighted depth-first traversal
– Most visited path along the top
– Less-followed paths placed recursively below
2. Grid positioning
– Organizes distance between nodes
– Avoids overlapping nodes
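Edge-weighted depth-first placement can be sketched as a recursive walk that always continues the heaviest outgoing edge on the current row and opens a new row for each lighter branch. The input format `{page: {next_page: count}}`, the (column, row) grid, and the name `path_layout` are assumptions; grid spacing and overlap avoidance are omitted.

```python
def path_layout(edges, start):
    """Edge-weighted depth-first placement on a (column, row) grid.

    edges: {page: {next_page: traversal_count}}
    Column = step along a path; row 0 holds the most visited path, and each
    less-followed branch opens a new row below, placed recursively.
    """
    position = {}
    rows_used = [0]  # highest row index handed out so far

    def place(page, column, row):
        position[page] = (column, row)
        # Follow outgoing links from most to least traversed (ties keep input order).
        ranked = sorted(edges.get(page, {}).items(), key=lambda kv: kv[1], reverse=True)
        first = True
        for child, _count in ranked:
            if child in position:  # skip back-links and already placed pages
                continue
            if first:
                place(child, column + 1, row)  # heaviest branch continues this row
                first = False
            else:
                rows_used[0] += 1  # lighter branch opens a new row
                place(child, column + 1, rows_used[0])

    place(start, 0, 0)
    return position


# Hypothetical aggregated traversal counts.
edges = {
    "home": {"search": 7, "news": 2},
    "search": {"results": 6},
    "news": {"story": 2},
    "results": {"home": 1},
}
layout = path_layout(edges, "home")
```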
Interaction
• Selecting nodes
• Zooming in and out
• Navigational gestures
Inferring & Aggregating
• Take log files and infer actions, such as when the Back button is pressed
– Can infer Back button presses, but not combinations of Back and Forward
– Extensible framework for adding other inferred actions
• Aggregate information, preserving individual paths
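A simple Back-button inference can be sketched from proxy records that say which page a followed link was on. The assumed rule: if the link's source page is not the current page but appears earlier in the history, the user pressed Back until reaching it. As the slide notes, this cannot recover mixed Back/Forward sequences; `infer_actions` is a hypothetical name, and WebQuilt's actual inference rules are not shown.

```python
def infer_actions(requests, start):
    """Infer link clicks and Back presses from proxy records.

    requests: chronological (from_page, to_page) pairs, where from_page is the
    page containing the followed link. If from_page is not the current page but
    is earlier in the history, assume Back was pressed until it reappeared.
    Forward-button moves cannot be recovered this way.
    """
    history = [start]
    actions = []
    for from_page, to_page in requests:
        while history[-1] != from_page and from_page in history:
            history.pop()
            actions.append(("back", history[-1]))
        actions.append(("link", to_page))
        history.append(to_page)
    return actions


actions = infer_actions([("home", "a"), ("a", "b"), ("a", "c")], start="home")
```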
Running a WebQuilt Remote Usability Test
• Recruit users
• Design and distribute tasks (via email)
• Auto-collect! Watch and wait as users perform tasks and the proxy logs data
• Visualize, analyze
• Use the results to change the design
Pilot Usability Study
• Edmunds.com PDA web site
• Handspring Visor equipped with an OmniSky wireless modem
• 10 users asked to find…
– Anti-lock brake information on the latest Nissan Sentra model
– The Nissan dealer closest to them.
In the Lab vs. Out in the Wild
Comparing in-lab usability testing with WebQuilt remote usability testing
• 5 users were tested in the lab
• 5 were given the device and asked to perform the task at their convenience
• All task directions, demographic data, and follow-up questionnaire data were presented and collected in web forms as part of the WebQuilt testing framework.
Classifying Usability Issues
• Lab: tester observations, participant comments, and questionnaire data
• Remote: WebQuilt visualization and questionnaire data
• Four categories of issues
– Browser
– Device
– Test design
– Site design
• Six severity levels
– 0 indicates a comment
– 1-5, where 1 is a very minor issue and 5 is a critical issue
Findings: issues by category (severity in parentheses)
• Browser
– Interact before load (3)
– No forward button (2)
• Device
– Difficulty with input in questionnaire (3)
– Difficulty scrolling (2)
– Device errors unrelated to testing (1)
– Tried writing on screen (0)
• Site Design
– Falsely completed task (4)
– Long download times (4)
– Ping-pong behavior (3)
– Interact before load (3)
– Too much scrolling (2)
– Save address functionality not clear (1)
– Back button navigation (0)
– Would like more features (0)
– Finds site useful (0)
• Test Design
– Falsely completed task (4)
– Difficulty remembering task description (3)
– Difficulty with input in questionnaire (3)
– Questionnaire wording problems (3)
– Forgot how to end task (1)
– Confusing task description (1)
Findings
• The WebQuilt methodology is promising for uncovering site-design-related issues.
• 1/3 of the issues were device or browser related.
• Browser and device issues cannot be captured automatically with WebQuilt unless they cause an interaction with the server
– They can, however, be revealed via the questionnaire data.
Testing Concerns
• What to do when problems with running the test occur?
• User motivation is still ambiguous: curiosity vs. confusion?
• Gathering qualitative feedback on mobile devices is difficult
– PDA input is difficult
– Phones have potential for audio
Comments
• Zooming/filtering great for showing both the overview and page-level details
– Can put screenshots directly into the visualization
• Layout in relation to the intended path
• Study comparing remote usability tests to traditional tests is promising
• Proxy logging very cool
Future Work
• Expanded capture of mobile device interaction, specifically net-enabled cell phones
• Improve filtering capabilities, integrating questionnaire and demographic data
• Clever algorithms to simplify graph layout
• Improved quantitative reporting
• Improved controls/interaction
• More rigorous evaluation with designers and usability experts
Concluding Comments
• Many incremental improvements in web log/data mining visualization (using a graph, using demographic data, etc.)
• It would be really good to see a study of usability engineers and web developers comparing the tools themselves