Faceted Navigation (LACASIS Fall Workshop 2005)
Faceted Navigation
Presentation to LACASIS 2005 Fall Workshop
Search Forward: Emerging Internet Capabilities
November 18th, 2005
Brad Allen, Founder and CTO
Siderean Software, Inc.
Copyright © 2005 Siderean Software, Inc. All rights reserved. 2
Overview
Problem: Knowing what information is available
Solution: Faceted navigation
How navigation differs from search
Case studies and business applications
Lessons learned
Challenges
Demonstration
Discussion
Problem: Knowing what information is available
Faceted navigation: providing “a bird’s eye view” of available information
How faceted navigation differs from search
Faceted navigation is a new type of software application. It goes beyond search and browsing by providing:
Scope: an overview of all available information
Context: a frame of reference for orienting oneself in a dynamic information space
Repeatability: scope and context serve as cues that lead users back to relevant information
Universality: a unified means of accessing information, independent of type or source
Faceted navigation provides the insight of analytics with the ease of search
Faceted navigation: origins
Library science
Ranganathan and the invention of faceted classification
Digital library efforts
Information retrieval
Parametric search
Query by example
Retrieval by reformulation
Rabbit, Argon
Systems have been moving from academic prototypes into commercial use over the last four years
Marti Hearst as a pioneer in this area
Siderean, Endeca, Vivisimo, FAST driving technology into enterprises
Facets: the basis of navigation
Facets are metadata properties whose ranges form a near-orthogonal set of controlled vocabularies
Creator: Dickens, Charles
Subject: Arsenic, Antimony
Location: World > U.S. > California > Venice
Facets form a frame of reference for information overview, access and discovery
Other properties serve as landmarks and cues
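To make the frame-of-reference idea concrete, here is a minimal Python sketch of facet counts and drill-down. It is not Siderean's implementation. It assumes records are plain dicts mapping facet names to lists of values, treats " > " as the separator for hierarchical values like the Location example, and ANDs selections across facets; the records and the helpers `expand`, `values_of`, `matches` and `facet_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical records; the facet values reuse the slide's examples.
RECORDS = [
    {"id": "doc1", "Creator": ["Dickens, Charles"], "Subject": ["Arsenic"],
     "Location": ["World > U.S. > California > Venice"]},
    {"id": "doc2", "Creator": ["Dickens, Charles"], "Subject": ["Antimony"],
     "Location": ["World > U.S. > California"]},
    {"id": "doc3", "Creator": ["Collins, Wilkie"], "Subject": ["Arsenic", "Antimony"],
     "Location": ["World > U.K."]},
]


def expand(value):
    """Expand 'A > B > C' into ['A', 'A > B', 'A > B > C'] so a record counts at every level."""
    parts = [p.strip() for p in value.split(">")]
    return [" > ".join(parts[:i]) for i in range(1, len(parts) + 1)]


def values_of(record, facet):
    """All values of one facet for one record, including hierarchy ancestors."""
    return {v for raw in record.get(facet, []) for v in expand(raw)}


def matches(record, selections):
    """True if the record has every selected value (AND across facets)."""
    return all(value in values_of(record, facet) for facet, value in selections.items())


def facet_counts(records, selections, facets):
    """For records matching `selections`, count records per value of each facet."""
    hits = [r for r in records if matches(r, selections)]
    return {facet: Counter(v for r in hits for v in values_of(r, facet)) for facet in facets}
```

For example, `facet_counts(RECORDS, {"Subject": "Arsenic"}, ["Location"])` counts two records under World but only one under World > U.S.; showing those counts before the user clicks is what gives the overview its value.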
Building navigation applications
Metadata about data and content is aggregated…
…organized into a unified information architecture…
…analyzed to generate faceted views…
…providing faceted navigation across the data and content
[Diagram labels: Term, Event, Person, Place, Text, View]
Case study: NASA JPL
Delivery to implementation in weeks using 3 internal resources
Brings together SharePoint, DocuShare, and structured trouble-ticketing databases
Provides uniform access to all relevant information about previous projects in one place
Incorporates corporate vocabulary for concept-based search
Allows user community to contribute to organization of information
Metadata in today’s enterprises
From thirty interviews conducted with Fortune 1000 organizations during Fall 2004
Use of metadata not yet widespread but emerging
Understanding varies widely across enterprises
Three basic approaches:
Top down
CEO says “We must be an information-driven company”
“Corporate controlled vocabulary that all divisions will use”
The effort is multi-year, ROI is hard to track, and the result may never be widely implemented or adopted
Bottom up
Groups determine their vocabulary while describing their process
Light tagging of content when it is created or when the content is published to a portal
Give up
Assumption: too difficult to create metadata from existing content
But still feel that metadata would improve matters, particularly within business units
Verticals for faceted navigation
Federal Government
  Strong identified application fit: Search, analyze and monitor complex, dynamic intelligence, project and problem information across organizations and projects (Columbia, Iraq, 9/11)
  Existing metadata: Scads of all types, with unstructured information often preprocessed to boot
  Adopting semantic technologies: Commitment to RDF/OWL as the solution for cross-agency interoperability; actively using RSS
  Business users: Intelligence analysts
E-Commerce
  Strong identified application fit: Search and browse catalogs of products and services, consumer-generated information
  Existing metadata: Product catalogs, customer reviews, customer service data, advertising
  Adopting semantic technologies: Pervasive adoption of XML standards for moving product and customer data across value chains
  Business users: Consumers, marketers
Financial Services
  Strong identified application fit: Search, analyze and monitor dynamic financial and market data
  Existing metadata: News feeds, financial DBs, market data
  Adopting semantic technologies: Emerging adoption of RSS for market news
  Business users: Traders, industry analysts, investment bankers
Navigation requires metadata
Ontologies
Specifications of how to represent classes, instances and their properties
Sometimes called “vocabularies”
Controlled vocabularies
Terms for saying what something is about
Also called “taxonomies” and “thesauri”
Instances
Descriptions of resources
Application profiles
Specifications of which classes and properties are useful and how they are to be used in an application
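An application profile can be as small as a list of property rules. The sketch below uses a hypothetical format invented for this illustration (the `PropertyRule` and `ApplicationProfile` classes are not a standard), with Dublin Core element URIs, to say which properties are facets, which are required, and which must draw on a controlled vocabulary.

```python
from dataclasses import dataclass, field

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


@dataclass(frozen=True)
class PropertyRule:
    """How one property is used in this (hypothetical) application."""
    uri: str
    facet: bool = False                   # expose as a navigation facet?
    required: bool = False                # must every instance carry it?
    vocabulary: frozenset = frozenset()   # allowed terms; empty means free text


@dataclass
class ApplicationProfile:
    name: str
    rules: list = field(default_factory=list)

    def facets(self):
        """URIs of the properties this application navigates by."""
        return [r.uri for r in self.rules if r.facet]

    def validate(self, instance):
        """List problems in one instance, given as {property URI: [values]}."""
        problems = []
        for r in self.rules:
            values = instance.get(r.uri, [])
            if r.required and not values:
                problems.append(f"missing {r.uri}")
            if r.vocabulary:
                problems += [f"{r.uri}: {v!r} not in vocabulary"
                             for v in values if v not in r.vocabulary]
        return problems


PROFILE = ApplicationProfile("project-documents", [
    PropertyRule(DC + "creator", facet=True, required=True),
    PropertyRule(DC + "type", facet=True, vocabulary=frozenset({"Report", "Ticket", "Slides"})),
    PropertyRule(DC + "title", required=True),
])
```

Validating `{DC + "creator": ["Doe, Jane"], DC + "type": ["Memo"]}` reports two problems: "Memo" is outside the type vocabulary and the title is missing. Keeping the profile separate from the ontology lets one vocabulary serve several applications with different facets.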
Lessons learned
Balanced incremental approach
Leverage metadata and indices at hand
Exploit statistics where desirable
But layer a framework on top to structure the statistics
Significant mileage from very simple frameworks
The utility of RDF for commercial metadata
RDF can make metadata use easier and less costly
An open standard for metadata reduces cost and avoids technology and vendor lock-in
A “universal solvent” for data and content
A platform for reuse and sharing
Building navigation systems with RDF
Define/reuse ontologies expressed in RDF(S)
Classes for defining instances and controlled vocabularies
Properties for facets and additional attributes
Import/transform instances into an RDF representation
Resources referred to via URIs
Content and controlled vocabularies
Write application profiles in terms of RDF
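The following Python sketch shows the shape of such an RDF representation without any RDF library: content and controlled-vocabulary terms are URIs, facets are properties, and facet counts come from matching triple patterns. The toy `TripleStore` class and the `http://example.org/` namespace are assumptions for illustration; it stores literals as plain strings alongside URIs, which a real RDF store would keep distinct.

```python
from collections import Counter

# Real vocabulary namespaces; EX is a placeholder namespace for this sketch.
DC = "http://purl.org/dc/elements/1.1/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"


class TripleStore:
    """A toy in-memory store of (subject, predicate, object) tuples."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None is a wildcard."""
        for t in self.triples:
            if all(want is None or want == got for want, got in zip((s, p, o), t)):
                yield t

    def facet_counts(self, prop, subjects=None):
        """Count subjects per value of `prop`, optionally only among `subjects`."""
        return Counter(o for s, _, o in self.match(p=prop)
                       if subjects is None or s in subjects)


store = TripleStore()
# Controlled-vocabulary terms are resources with URIs, described with SKOS.
for term, label in [("arsenic", "Arsenic"), ("antimony", "Antimony")]:
    store.add(EX + "term/" + term, RDF_TYPE, SKOS + "Concept")
    store.add(EX + "term/" + term, SKOS + "prefLabel", label)
# Content instances point at those terms through the dc:subject facet.
store.add(EX + "doc/1", DC + "subject", EX + "term/arsenic")
store.add(EX + "doc/1", DC + "creator", "Dickens, Charles")
store.add(EX + "doc/2", DC + "subject", EX + "term/arsenic")
store.add(EX + "doc/2", DC + "subject", EX + "term/antimony")
```

Because every facet is just a property, `store.facet_counts(DC + "subject")` and `store.facet_counts(DC + "creator")` use the same code path, which is the uniformity that makes RDF attractive here.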
Lessons: ontologies
Don’t do: assume you have to build elaborate OWL ontologies
You don’t have to boil the ocean to get the benefits
OWL DL and OWL Full are overkill for this class of application
Side issue: description logic for navigation is not addressed adequately by OWL
Class/subclass versus arbitrary hierarchical relations
Do: Tiny Ontologies All Stitched Together (TOAST)
RDF Schema with a smattering of RDF/OWL properties (e.g., owl:inverseOf)
Start with DC + SKOS + FOAF
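As an example of the small amount of OWL a TOAST-style schema needs, the sketch below applies only the `owl:inverseOf` rule to a set of triples, repeating until no new triples appear. It is not an OWL reasoner; `apply_inverses` is a hypothetical helper, and the inverse declaration between Dublin Core's `dcterms:hasPart` and `dcterms:isPartOf` is asserted here for the example.

```python
OWL_INVERSE_OF = "http://www.w3.org/2002/07/owl#inverseOf"
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # placeholder namespace


def apply_inverses(triples):
    """Close a triple set under the owl:inverseOf rule only (not a full OWL reasoner)."""
    result = set(triples)
    while True:
        pairs = {(s, o) for s, p, o in result if p == OWL_INVERSE_OF}
        pairs |= {(b, a) for a, b in pairs}          # inverseOf is symmetric
        new = {(o, q, s) for s, p, o in result for p1, q in pairs if p == p1}
        if new <= result:                             # fixpoint reached
            return result
        result |= new


TRIPLES = {
    # Asserted here for the example.
    (DCTERMS + "hasPart", OWL_INVERSE_OF, DCTERMS + "isPartOf"),
    (EX + "report", DCTERMS + "hasPart", EX + "appendix"),
}
```

`apply_inverses(TRIPLES)` adds the triple stating that the appendix `isPartOf` the report, so a facet built on either property finds it without anyone tagging it twice.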
Lessons: controlled vocabularies
Don’t do: huge monolithic taxonomies
Unless they are ready at hand and can be reused largely without modification
Do: bite-sized controlled vocabularies that exploit faceted approaches
4 facets × 10 terms per facet (40 terms, 10⁴ combinations) versus 10⁴ terms in a single taxonomy
Start with flat term lists
Add BT/NT/RT (broader/narrower/related term) relationships over time
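The arithmetic behind bite-sized vocabularies: four facets of ten terms each means maintaining 40 terms, yet their combinations distinguish 10 × 10 × 10 × 10 = 10⁴ cells, roughly the discriminating power of a 10⁴-term monolithic taxonomy. The sketch checks that count and shows how a narrower-term (NT) map added later lets a query on a broader term also match items tagged with its narrower terms. Facet names, terms and helper names are hypothetical.

```python
import math

# Hypothetical bite-sized vocabularies: 4 facets with 10 flat terms each.
FACETS = {name: [f"{name}-{i}" for i in range(10)]
          for name in ("Subject", "Place", "Period", "Genre")}

terms_to_maintain = sum(len(t) for t in FACETS.values())      # 4 * 10 = 40
distinct_cells = math.prod(len(t) for t in FACETS.values())   # 10**4

# Later: narrower-term (NT) links, stored as broader term -> narrower terms.
NARROWER = {
    "Metals": ["Heavy metals", "Metalloids"],
    "Metalloids": ["Arsenic", "Antimony"],
}


def expand_narrower(term):
    """The term plus every transitively narrower term, for matching BT queries."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(NARROWER.get(t, []))
    return seen


def items_under(term, tagged):
    """Items whose tag is the term or any narrower term; `tagged` maps item -> tag."""
    scope = expand_narrower(term)
    return sorted(item for item, tag in tagged.items() if tag in scope)
```

Since the flat lists keep working before any NT links exist, the hierarchy can be added incrementally without retagging content.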
Lessons: instances
Manual creation
Don’t do: exhaustive author creation of metadata
Do: community annotation and tagging
(Semi-)automated creation
Don’t do: assume elaborate information extraction based on NLP, subject tagging and categorization
Do: quick and dirty named entity extraction, or better yet, stick to readily available asset and relational metadata (date, creator, document type/genre)
Much of the benefit at a fraction of the effort
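A minimal sketch of the “quick and dirty” route, assuming a small hand-built gazetteer (dictionary lookup through one regular expression, no NLP) plus metadata the repository already records for each asset. The gazetteer entries, the asset field names and the `extract` helper are hypothetical.

```python
import re
from datetime import date

# Hypothetical gazetteer: surface form -> (facet, controlled-vocabulary term).
GAZETTEER = {
    "Pasadena": ("Place", "Pasadena"),
    "JPL": ("Organization", "Jet Propulsion Laboratory"),
    "Jet Propulsion Laboratory": ("Organization", "Jet Propulsion Laboratory"),
    "SharePoint": ("Product", "SharePoint"),
}

# Longest forms first, so a longer name wins when alternatives start at the same position.
PATTERN = re.compile(r"\b(" + "|".join(
    re.escape(k) for k in sorted(GAZETTEER, key=len, reverse=True)) + r")\b")


def extract(text, asset):
    """Combine cheap gazetteer matches with metadata the asset already carries."""
    facets = {}
    for m in PATTERN.finditer(text):
        facet, term = GAZETTEER[m.group(1)]
        facets.setdefault(facet, set()).add(term)
    # Readily available asset metadata: no extraction needed at all.
    facets["Creator"] = {asset["owner"]}
    facets["Date"] = {asset["modified"].isoformat()}
    facets["Type"] = {asset["extension"].lstrip(".").lower()}
    return facets


SAMPLE_ASSET = {"owner": "jdoe", "modified": date(2005, 11, 18), "extension": ".DOC"}
```

Calling `extract("Trouble ticket filed at JPL in Pasadena; notes moved to SharePoint.", SAMPLE_ASSET)` yields Organization, Place, Product, Creator, Date and Type facets; the last three come for free from the asset record.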
Lessons: application profiles
Metadata is increasingly pervasive
The way to leverage existing information infrastructure
Exploit the “on-demand” information integration feature of RDF
[Diagram: DB + XML, via XSLT, into RDF(S)]
RDF(S): a simple, sloppy framework
Part of Adam Bosworth’s “Web of data”
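The slide's pipeline transforms relational and XML sources into RDF with XSLT. Python's standard library has no XSLT processor, so this sketch swaps in `xml.etree.ElementTree` for the XML side and `sqlite3` for the database side; both map their records to triples about the same ticket URI, and a plain set union then performs the on-demand integration. Table, element and namespace names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"  # placeholder namespace for resource URIs


def rows_to_triples(conn):
    """Relational source: one ticket URI per row of a hypothetical `tickets` table."""
    triples = set()
    for tid, title, creator in conn.execute("SELECT id, title, creator FROM tickets"):
        s = f"{EX}ticket/{tid}"
        triples |= {(s, DC + "title", title), (s, DC + "creator", creator)}
    return triples


def xml_to_triples(xml_text):
    """XML source: <doc ref="..."> elements with <subject> children (ElementTree stands in for XSLT)."""
    triples = set()
    for doc in ET.fromstring(xml_text).iter("doc"):
        s = f"{EX}ticket/{doc.get('ref')}"
        for subject in doc.iter("subject"):
            triples.add((s, DC + "subject", subject.text.strip()))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, title TEXT, creator TEXT)")
conn.execute("INSERT INTO tickets VALUES (42, 'Antenna fault', 'Lee')")
XML = "<docs><doc ref='42'><subject>Antenna</subject><subject>Telemetry</subject></doc></docs>"

# On-demand integration: both sources describe the same URI, so a union merges them.
merged = rows_to_triples(conn) | xml_to_triples(XML)
```

No shared schema had to be agreed in advance; agreeing on the URI for ticket 42 is enough for its title, creator and subjects to land on one resource.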
The big question: statistics vs. knowledge
Statistics can’t deliver everything
Alan Kay’s puppy analogy
Vitányi’s work on “Google learning”
On the other hand, knowledge is dearly won
Cyc
Need a balance that enables adoption without losing the benefits
Lessons from statistics vs. knowledge in NLP
Expert systems
Future directions
User tagging + RDF: the killer Semantic Web application?
The rehabilitation of metadata in the social software community
The re-emergence of RSS 1.0
“Folksonomy”-driven social search
Del.icio.us, Flickr, CiteULike
Towards social navigation: fac.etio.us
fac.etio.us
Aggregated feeds from the del.icio.us social bookmarking site
10⁵ Web pages, 10⁴ tags, 10⁴ contributors, 10⁴ originating sites
Superior user experience with 10 minutes’ effort
“In 3 clicks, I drilled down through 9700+ sites, to a more specific set of 98 things, down to one I found useful.”
Tagging the tags to add semantics
Bootstrapping folksonomies into taxonomies without impacting user creation of metadata
Merging anarchy with governance
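One way to “tag the tags” is a curator-maintained mapping from normalized folksonomy tags to taxonomy terms, applied at navigation time so that users' original tags are never rewritten. The bookmarks, the mapping and the `taxonomy_counts` helper below are hypothetical, and this is not claimed to be how fac.etio.us was built.

```python
from collections import Counter

# Hypothetical aggregated bookmarks: (url, user, free-form tags). Never modified.
BOOKMARKS = [
    ("http://a.example/", "u1", ["python", "Programming"]),
    ("http://b.example/", "u2", ["py", "tutorial"]),
    ("http://c.example/", "u3", ["rdf", "semweb"]),
    ("http://d.example/", "u1", ["semanticweb", "howto"]),
]

# Curators tag the tags: normalized tag -> broader taxonomy term.
TAG_TO_TERM = {
    "python": "Programming languages > Python",
    "py": "Programming languages > Python",
    "rdf": "Semantic Web",
    "semweb": "Semantic Web",
    "semanticweb": "Semantic Web",
    "tutorial": "Genre > Tutorial",
    "howto": "Genre > Tutorial",
}


def normalize(tag):
    return tag.strip().lower()


def taxonomy_counts(bookmarks):
    """Count bookmarks per taxonomy term; unmapped tags pass through in normalized form."""
    counts = Counter()
    for _url, _user, tags in bookmarks:
        # A set, so two synonymous tags on one bookmark count it only once.
        counts.update({TAG_TO_TERM.get(normalize(t), normalize(t)) for t in tags})
    return counts
```

Here "rdf", "semweb" and "semanticweb" collapse into one Semantic Web facet value with two bookmarks, while the unmapped "Programming" stays visible as "programming", so governance is layered over the anarchy rather than replacing it.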
Challenges
Scale
Must be commensurate with expectations and requirements from traditional web and enterprise search
Algorithms
Many alternatives still being explored
Usability
Lots of work to be done to validate benefits
Security, trust and provenance
Just beginning to understand
Challenges: scale
Navigation has to live up to the scaling expectations set by search, while doing a lot more work
Number of objects, feeds: 10⁶ to 10⁹
Ingest rates: ~10³ to 10⁴ triples/sec; how many triples per resource?
Latency: < 0.5 sec user time regardless of application
Implementations exploit RAM to deliver low latency, but this is an impediment to terabyte-scale bodies of information
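A common way to get low-latency facet counts from RAM is an inverted index whose postings are bitsets: drill-down becomes a bitwise AND and counting becomes a popcount. The sketch below uses Python integers as bitsets. It is an assumed design for illustration, not the one any particular product used, and production systems would typically use compressed bitmaps.

```python
from collections import defaultdict


class BitsetIndex:
    """In-RAM inverted index: (facet, value) -> bitset of document ordinals (a Python int)."""

    def __init__(self):
        self.bits = defaultdict(int)
        self.n = 0

    def add(self, doc):
        """Add a document given as {facet: [values]}; returns its ordinal."""
        i = self.n
        self.n += 1
        for facet, values in doc.items():
            for v in values:
                self.bits[(facet, v)] |= 1 << i
        return i

    def select(self, selections):
        """AND together the bitsets of every (facet, value) selection."""
        result = (1 << self.n) - 1          # start with all documents
        for key in selections:
            result &= self.bits.get(key, 0)
        return result

    def counts(self, facet, selections=()):
        """Facet-value counts within the current selection (popcount of each intersection)."""
        base = self.select(selections)
        return {v: bin(b & base).count("1")
                for (f, v), b in self.bits.items() if f == facet and b & base}
```

Memory grows with the number of documents times the number of distinct facet values, which illustrates why purely RAM-resident designs strain at terabyte scale.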
Challenges: algorithms
Federated services vs. centralized servers
Relationship to relevance ranking
Support for aggregate and text search operators in RDF query
Integration of multimedia retrieval algorithms as equal citizens to free text retrieval
Challenges: usability
Navigation interfaces in their infancy
Tagging interfaces even more so
Principled analyses of precision and recall have yet to be done
Visualization beyond “sticks and ovals” is begging to be integrated
Navigate to a small result set, then visualize
Summary
Faceted navigation is a new software product category that addresses the pain associated today with finding and discovering actionable information
The use of Semantic Web standards, principally RDF, enables the development of faceted navigation applications
It is “early days” for faceted navigation applications and challenges remain, but we believe the potential is significant
Siderean Software, Inc.
390 North Sepulveda Blvd., Suite 2070
El Segundo, CA 90245-4475 USA
+1 310 647-4266
http://www.siderean.com