taxonomies and metadata for content management
Post on 02-Jan-2016
Taxonomies and Metadata for Content Management
Michael Huff, Information Resource Officer, U.S. Department of State
E-Government Act of 2002
• The use of computers and the Internet is rapidly transforming societal interactions and the relationships among citizens, private businesses, and the Government.
• The Federal Government has had uneven success in applying advances in information technology to enhance governmental functions and services, achieve more efficient performance, increase access to Government information, and increase citizen participation in Government.
• Most Internet-based services of the Federal Government are developed and presented separately, according to the jurisdictional boundaries of an individual department or agency, rather than being integrated cooperatively according to function or topic.
Which U.S. Government organizations are experienced in using metadata & taxonomy tools?
– Defense Intelligence Agency
– USDA Economic Research Service (ERS)
– Federal Aviation Administration
– FirstGov
– NASA
– Small Business Administration
– Social Security Administration
– Department of State
Terms and definitions
Metadata: Data about data; a label that describes a content object so unstructured content can be managed like structured content.
Taxonomy: The specification and classification of the names of people, places, things, and everything else that is needed to allow search engines and other content applications to work better.
Facet classification: A discrete set of elements (or fields) for labeling content and content components.
Controlled vocabulary: A managed set of terms for which there is an agreed-upon value or definition.
Field | Data type / source
Title | string
Creator | string
Identifier | URL
Date | date
Subject | (~10,000 categories)
Metadata Taxonomy
• Adding metadata to unstructured content allows it to be managed like structured content.
• Enriching content with structured metadata is critical for supporting search and personalized content delivery.
• Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.
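A minimal Python (3.9+) sketch of the first bullet: the body text stays unstructured, but once each object carries structured metadata fields it can be queried like a database record. The `ContentObject` class, its field names, and the sample records are hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """Unstructured content paired with a structured metadata label."""
    body: str                      # the unstructured content itself
    title: str                     # structured metadata fields
    creator: str
    subjects: list[str] = field(default_factory=list)


def find_by_subject(items: list[ContentObject], subject: str) -> list[str]:
    """Return the titles of items tagged with the given subject term."""
    return [item.title for item in items if subject in item.subjects]


library = [
    ContentObject("Free text about renewing a passport.", "Renewing a Passport",
                  "J. Smith", ["Passports", "Travel"]),
    ContentObject("Free text about export statistics.", "Export Figures",
                  "A. Lee", ["Trade"]),
]
print(find_by_subject(library, "Travel"))  # ['Renewing a Passport']
```

The search never inspects `body`; it works only because the subject terms were made explicit.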
Why use metadata?
User experience: How content is presented, and how users experience and interact with it, dictates its perceived and actual value.
Content architecture: A scalable metadata framework to enable content reuse and handle changes in organizational goals, user needs, and retrieval concerns.
Tools and technology: The information supply-chain platform that enables workflows and supports organizational and operational concerns.
Where does metadata fit in the information system architecture?
Content architecture defined
Scalable metadata framework to enable content reuse, and handle changes in organizational goals, user needs, and retrieval concerns.
Business objectives
Metadata specification
Vocabulary specification
Training
Facets: Content Types, Organization, Audience, Location, Function/Process, Market, Product/Services, Topics
Content Types
- Web Content, Images, Code, Rich Media
- Document Type: General Management, Reports & Documentation, Tracking, Control / Policies & Procedures, Legal & Compliance, Personnel, Learning/Training, Templates & Forms, Public Relations, Models, Meeting, Credit
Organization
- Internal: IT, HR/CRE, PASB, International, ESG, USCO, US Consumer Card, Credit Risk Management, Thrift, Auto Finance, Finance, Investor Relations, Legal, Strategy, Brand, Enterprise Risk Management
- Committees: Executive, Cross-Functional
- External: Contractors, Vendors, Affinity Relationships, Partnerships, Board of Directors
Location
- International: Africa, Asia, Europe, Canada
- US: Idaho, Massachusetts, Texas, Virginia, California, Washington, Florida
Function/Process
- Business Processes: Develop Business Strategy, Develop Products & Services Strategy, Market Products, Process Orders, Service Customers, Manage Customer Relationships, Manage Collections and Recoveries, Staff Services
- Operating / Supporting Processes: Analytical Functions, Communications Functions, Financial Functions, Information Handling Functions, Maintenance, Organizational Functions, Sponsoring, Project Management (SDM, PMM)
Market
- Lines of Business: Partnership Implementation, Under-served, Lifestyle, Cross-Sell, Hispanic, Canada, Young Adults, CRS, E-Commerce, Smile, Small Business
- Asset Type: Sub-Prime, Prime, Super Prime
- Life Phases: Marriage, New U.S. Residents, Young Adults, New Parents, Moving, Divorce, Death
Product/Services
- Card: Credit Card (Classic, Premium, Secured, Small Business, Equity, Others), Debit
- Loans: Auto (PeopleFirst), Home Equity (Full Spectrum, Countrywide, LoanCenter), Medical (AmeriFee), Installment, Home Equity
- Insurance: Auto, Life
- Savings Products: CDs, Money Markets, IRAs
- Product Attributes: Annual Fee, Credit Line Levels, APR, Balance Transfer Rate, Other Benefits
Topics
- Contracts, Credits, Credit Line Management, Fee & Charges, Finance, Financial Institutions, Financial Instruments, Management, Market Strategy, Marketing, Mass Media, Public Relations, Purchasing, Rates, Rates and Rankings, Ratios, Research, Risk, Settlement and Damages, Statistics
Audience
- Internal, by tier:
  • Level: Leadership, Associate, Employee, Administrative
  • Associate Type: Phone, Non-Phone
  • Function: Manager, People Manager, Non-Manager
  • Type: Exempt, Non-Exempt
  • Time with Firm: New Employee, Old Employee
- External: Customers, Regulators, Media, Non-Profit, Contractors, Vendors, Affinity Relationships, Partnerships, Board of Directors
Content inventory
Content model
Rules & procedures
What is Dublin Core?
• Dublin Core is a metadata standard for describing Internet resources so they are easy to find.
1995: Original workshop held in Dublin, Ohio.
2003: Dublin Core approved as ISO 15836.
2004: Shanghai meeting.
For more information: http://www.dublincore.org
Asset metadata – Who, Where & When:
Title, Creator, Publisher, Contributor, Date, Type,
Format, Identifier, Source, Language
Subject metadata – What & Why:
Subject, Description, Coverage
Relational metadata – Links between and to:
Relation
Use metadata – How can it be used:
Rights & Permissions
(Chart axes: Enabled Functionality vs. Complexity)
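Together, the four groups cover the 15 elements of the Dublin Core element set, each exactly once. A minimal Python sketch of the grouping as a lookup table; the category keys and the `category_of` helper are illustrative names, not part of any Dublin Core specification.

```python
# The four metadata categories from the slide, mapped to Dublin Core elements.
DC_GROUPS: dict[str, list[str]] = {
    "asset":      ["Title", "Creator", "Publisher", "Contributor", "Date",
                   "Type", "Format", "Identifier", "Source", "Language"],  # who, where & when
    "subject":    ["Subject", "Description", "Coverage"],                  # what & why
    "relational": ["Relation"],                                            # links between and to
    "use":        ["Rights"],                                              # how it can be used
}


def category_of(element: str) -> str:
    """Return the metadata category that a Dublin Core element belongs to."""
    for category, elements in DC_GROUPS.items():
        if element in elements:
            return category
    raise KeyError(f"not a Dublin Core element: {element}")


print(category_of("Coverage"))  # subject
```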
Why is metadata important?
http://dublincore.org/documents/dcmi-terms/
More efficient editorial process
Better navigation & discovery
The specification of the names of people, places, things
What is a taxonomy?
Linnaeus:
Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Carnivora
Family: Canidae
Genus: Canis
Species: C. familiaris
UNSPSC:
Segment: 44 - Office Equipment and Accessories and Supplies
Family: .12 - Office Supplies
Class: .17 - Writing Instruments
Commodity: .05 - Mechanical pencils; .06 - Wooden pencils; .07 - Colored pencils
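UNSPSC codes are eight digits, two per level (segment, family, class, commodity), so the path 44 / .12 / .17 / .05 spells out the commodity code 44121705 for mechanical pencils. A small Python sketch; the helper names are hypothetical, and the code checks only the format, not whether a code exists in the published code set.

```python
def unspsc_code(segment: int, family: int, klass: int, commodity: int) -> str:
    """Concatenate four two-digit levels into an 8-digit UNSPSC-style code."""
    parts = (segment, family, klass, commodity)
    if any(not 0 <= p <= 99 for p in parts):
        raise ValueError("each level must fit in two digits")
    return "".join(f"{p:02d}" for p in parts)


def unspsc_levels(code: str) -> dict[str, str]:
    """Split an 8-digit code back into its named hierarchy levels."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError("expected 8 digits")
    names = ("segment", "family", "class", "commodity")
    return {name: code[i * 2:i * 2 + 2] for i, name in enumerate(names)}


print(unspsc_code(44, 12, 17, 5))  # 44121705 (mechanical pencils)
```

Because each prefix is itself a valid broader code (4412 is Office Supplies), a hierarchical code lets software roll items up to any level by truncation.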
The specification of the names of people, places, things … and everything else that is needed
to allow search engines and other content applications to work better.
Sample Recipe Taxonomy
Facet categories: Main Ingredients, Cooking Methods, Courses, Meal Type, Cuisines
Controlled vocabularies:
Main Ingredients: Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables
Cooking Methods: Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry
Courses: Appetizers, Beverages, Breads, Cheese, Cocktails, Desserts, Fish & Shellfish, Fruit, Hors d'Oeuvres, Meat, Pasta, Salad, Sandwiches, Soup, Vegetables
Meal Type: Breakfast, Brunch, Lunch, Supper, Dinner, Snack
Cuisines: African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
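Because each facet is an independent controlled vocabulary, a recipe can be tagged once per facet and then found by any combination of facet values. A minimal Python sketch of such faceted filtering; the three recipes and their tags are hypothetical, and combining selections with AND is an assumed design choice.

```python
from collections import Counter

# Hypothetical recipes tagged with values from the controlled vocabularies.
RECIPES: dict[str, dict[str, set[str]]] = {
    "Chocolate Mousse": {"Main Ingredients": {"Chocolate", "Dairy"},
                         "Courses": {"Desserts"},
                         "Cooking Methods": {"No Cooking"}},
    "Grilled Salmon":   {"Main Ingredients": {"Meat & Seafood"},
                         "Courses": {"Fish & Shellfish"},
                         "Cooking Methods": {"Grill"}},
    "Pasta Primavera":  {"Main Ingredients": {"Pasta", "Vegetables"},
                         "Courses": {"Pasta"},
                         "Cooking Methods": {"Sauté"}},
}


def filter_recipes(selection: dict[str, str]) -> list[str]:
    """Return recipes that carry every selected value (facets combined with AND)."""
    return sorted(
        name for name, tags in RECIPES.items()
        if all(value in tags.get(facet, set()) for facet, value in selection.items())
    )


def facet_counts(selection: dict[str, str], facet: str) -> Counter:
    """Count how many matching recipes carry each value of `facet`."""
    counts: Counter = Counter()
    for name in filter_recipes(selection):
        counts.update(RECIPES[name].get(facet, set()))
    return counts


print(filter_recipes({"Main Ingredients": "Pasta"}))  # ['Pasta Primavera']
print(facet_counts({}, "Cooking Methods")["Grill"])   # 1
```

Showing only values with a nonzero count is one way a guided-navigation interface can avoid dead ends.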
The power of taxonomy facets
• 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
• Easier to maintain
• Can be easier to navigate
Example: the four recipe facets Main Ingredients, Cooking Methods, Meal Type, and Cuisines
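The 10^4 figure is the product of the facet sizes: choosing one value from each of four 10-value facets yields 10 × 10 × 10 × 10 = 10,000 distinct combinations, while editors maintain only 4 × 10 = 40 terms. A short Python check of that arithmetic, with the facet sizes taken from the bullet above:

```python
from itertools import product
from math import prod

facet_sizes = [10, 10, 10, 10]        # 4 independent facets, 10 values each

combinations = prod(facet_sizes)      # distinct ways to pick one value per facet
terms_to_maintain = sum(facet_sizes)  # labels an editor actually has to manage
print(combinations, terms_to_maintain)  # 10000 40

# Brute-force check: enumerate every one-value-per-facet combination.
assert sum(1 for _ in product(*(range(n) for n in facet_sizes))) == combinations
```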
Common taxonomy facets
Facet | Definition | Example sources
Products and Services | Names of products and services. | ERP system, your products and services, etc.
Organization | Organizational structure. | FIPS 95-2, your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | AGLS Document Type, AAT Information Forms, records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, US Postal Service, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
Audience | Subset of constituents to whom a piece of content is directed, or who are intended to use it. | GEM, ERIC Thesaurus, IEEE LOM, etc.
Topic | Business topics relevant to your mission and goals. | Federal Register Thesaurus, ERIC Thesaurus, ProQuest, etc.
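Re-using existing sources pays off at tagging time: each facet's vocabulary becomes the list of allowed values. A minimal Python sketch of checking tags against per-facet vocabularies; the abbreviated vocabularies are hypothetical stand-ins, and a real system would load the full sources named in the table (for example, the ISO 3166 country codes for Location).

```python
# Hypothetical, abbreviated vocabularies standing in for the real sources.
VOCABULARIES: dict[str, set[str]] = {
    "Location": {"US", "CA", "FR", "JP"},              # subset of ISO 3166-1 alpha-2
    "Audience": {"Citizens", "Businesses", "Government employees"},
    "Content Type": {"Report", "Form", "Press release"},
}


def invalid_tags(tags: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per facet, any tag values not found in that facet's vocabulary."""
    problems: dict[str, list[str]] = {}
    for facet, values in tags.items():
        allowed = VOCABULARIES.get(facet)
        if allowed is None:
            problems[facet] = list(values)     # unknown facet: reject every value
            continue
        bad = [v for v in values if v not in allowed]
        if bad:
            problems[facet] = bad
    return problems


print(invalid_tags({"Location": ["US", "USA"]}))  # {'Location': ['USA']}
```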
Personalized content delivery requires defining taxonomy facets
… and re-use of existing vocabulary sources
High-Level Taxonomy
Facets: Content types, Organization, Audiences, Topics, Functionality and process, Locations, Markets, Products and services
Content types: Document, Rich-Media, Web content, Source code
Organization: Internal, External (Contractors, Vendors, Suppliers, Partners)
Audiences: Internal, External (Customer, Vendors, Regulators, Contractors, Media)
Topics: Contracts, Credit, Fees and charges, Finance, Competition, Financial institutions, Financial instruments, Management, Market strategy, Marketing
Functionality and process: Product design, Customer acquisition, Credit policies, Risk management, Collection practices, Retention process, Cross-selling, Project management, Governance, Testing
Locations: US, International, City, Country, Provinces, States
Markets: LOB, Life events, Demographics, Military
Products and services: Credit cards, Insurance, Loans, Financial Services
Applying the facets to the Dublin Core metadata elements
Dublin Core element | Definition | Vocabulary source
Title | Resource name. | Not applicable
Creator | Content maker. | LDAP
Subject | Content topic. Keyword. | Topic facet
Description | Description of content, summary. | Not applicable
Publisher | Publisher of this manifestation. | Agency facet
Contributor | Content contributor. | LDAP
Date | Content lifecycle event for this manifestation. | Not applicable
Type | Genre. Form. | Type facet
Format | Format of this manifestation. | RFC 2045
Identifier | Reference for this manifestation, e.g., URL. | Not applicable
Source | Source from which this manifestation has been derived. | Not applicable
Language | Language of this manifestation. | ISO 639
Relation | Reference to related resource. | None
Coverage | Space, period, date, jurisdiction, etc. | Jurisdiction facet
Rights | Who has rights to use this manifestation. | Privacy level
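Once values are drawn from the facets, a Dublin Core record can be exchanged in a standard form. A minimal Python sketch that serializes element/value pairs as XML in the Dublin Core element namespace (http://purl.org/dc/elements/1.1/); the `record` wrapper element and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)          # serialize with the familiar dc: prefix


def dc_record(fields: dict[str, list[str]]) -> str:
    """Serialize Dublin Core element/value pairs as a simple XML record."""
    root = ET.Element("record")              # hypothetical wrapper element
    for element, values in fields.items():
        for value in values:                 # elements may repeat
            child = ET.SubElement(root, f"{{{DC_NS}}}{element.lower()}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


print(dc_record({
    "Title": ["Sample Annual Report"],
    "Subject": ["Human rights"],  # value drawn from the Topic facet
    "Format": ["text/html"],      # MIME media type (RFC 2045)
    "Language": ["en"],           # ISO 639 language code
}))
```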
Applied taxonomy metadata facilitates a multi-faceted view of content
Facets at work on FirstGov site
Organization
Content Type
Frequency
Audience
http://www.firstgov.gov
http://www.tesco.com/winestore
Guided navigation: 2-3 clicks to product, no dead ends
http://www.towerrecords.com
http://www.fortunoff.com
Seven practical rules for taxonomies
1. Incremental, extensible process that identifies and enables owners, and engages stakeholders.
2. Quick implementation that provides measurable results as quickly as possible.
3. Not monolithic—has separately maintainable facets.
4. Re-uses existing IP as much as possible.
5. A means to an end, and not the end in itself.
6. Not perfect, but it does the job it is supposed to do—such as improving search and navigation.
7. Improved over time, and maintained.
What is the general purpose of the content you are managing?
What types of content are you handling?
Who is the audience for this content?
What are the core organizational objectives that the content is related to?
• Creating a taxonomy is only part of the job
• How will it be put to use?
• In a new application, or by modifying an existing application?
• What’s the effort around that?
• Additional issues
• Tagging – Who will add the metadata and how?
Link to Bios from Personal Names
Link to info on Countries
Link to company data (quotes, news, ...) from Company names
Alerts on People, Companies, and Topics
Browse by Topic
1 Identify Objectives: Conduct interviews
2 Inventory Content: ID sources, spider assets & extract metadata
3 Specify Metadata: Define fields & purpose
4 Model Content: Define content chunks & XML DTDs
5 Specify Vocabularies: Compile controlled vocabularies
6 Specify Procedures: Develop workflow, rules & procedures
7 Train Staff: Develop materials & train staff
Task 1 – Identify objectives
What do you do? What kinds of digital assets are being produced? For what audiences?
What is the business process for submitting, selecting, editing, maintaining digital assets?
How many digital assets are there? How fast is this growing?
Are there particular industry or other standards that are important?
What types of assets are hard to search for (that should be easier to find)?
What tools would be helpful in locating assets? Acronyms? Abbreviations? Nicknames? Glossary? Thesaurus? Taxonomy?
Who else should we be talking to?
Task 2 – Inventory content
1. Path/URL: Identify target asset file path/URL.
2. Spider-generated: Automatically generate inventory metadata by crawling file stores.
3. Audit process: Audit assets using inventory.
4. New facets: Enhance metadata with new facets.
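Step 2 can be approximated for a file store with a directory walk. A minimal Python sketch, assuming the assets sit in a local or mounted directory tree; the inventory columns (path, format from the file extension, size, last-modified date) are an illustrative choice, and a web site would need an HTTP crawler instead.

```python
import os
from datetime import datetime, timezone


def inventory(root_dir: str) -> list[dict[str, object]]:
    """Walk a file store and record basic inventory metadata for each file."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            rows.append({
                "path": os.path.relpath(path, root_dir),
                # format guessed from the extension; "unknown" if there is none
                "format": os.path.splitext(name)[1].lstrip(".").lower() or "unknown",
                "bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
                                    .date().isoformat(),
            })
    return rows
```

The resulting rows are the raw material for the audit in step 3 and can be extended with new facet columns in step 4.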
Task 3 – Specify metadata
Element | Data type | Length | Req./Repeat | Source | Purpose
Identifier | String | 48 chars | 1 | System supplied | Basic accountability
Author | String | Variable | * | LDAP validated | Credits
Title | String | Variable | ? | User | Text search, results display
Embargo Date | Date | Fixed | ? | System | Obey rights
Description | String | Variable | ? | User | Text search, results display
Asset Type | List | Fixed | 1 | Asset Types vocabulary | Browse or group search results
Subject
Audience | List | Fixed | * | Audience vocabulary | Custom interface for group of users
Location | List | Fixed | * | ISO 3166 | Filter or rank search results
Organization | List | Fixed | * | Organization vocabulary | Key index to retrieve & aggregate assets
...
Legend: ? – 1 or more; * – 0 or more
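The Req./Repeat column can be enforced mechanically. A minimal Python sketch that follows the legend as written ("?" means 1 or more, "*" means 0 or more) and assumes "1" means exactly one; note that in XML DTD syntax "?" instead means 0 or 1, so the check would need adjusting if that was the intent. The Subject row is omitted because its specification is blank, and the function name is hypothetical.

```python
# Cardinality codes: "1" exactly one (assumed), "?" one or more (per the
# legend as written), "*" zero or more.
SPEC: dict[str, str] = {
    "Identifier": "1", "Author": "*", "Title": "?", "Embargo Date": "?",
    "Description": "?", "Asset Type": "1", "Audience": "*",
    "Location": "*", "Organization": "*",
}


def cardinality_errors(record: dict[str, list[str]]) -> list[str]:
    """List elements whose number of values violates their cardinality code."""
    errors = []
    for element, code in SPEC.items():
        n = len(record.get(element, []))
        if code == "1" and n != 1:
            errors.append(f"{element}: expected exactly 1 value, got {n}")
        elif code == "?" and n < 1:
            errors.append(f"{element}: expected 1 or more values, got {n}")
        # "*" (0 or more) accepts any number of values
    return errors
```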
Task 4 – Model content
Factor asset types from inventory into canonical types.
Select examples from inventory (possibly with spider).
Identify useful chunks for each asset type.
Factor chunks into element superset.
Identify relationships between chunks.
Iterate until there is agreement on asset types, elements, and relationships.
(Figure: page template divided into header, left navigation, main content, and footer areas)
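The steps "Identify useful chunks for each asset type" and "Factor chunks into element superset" amount to set operations. A minimal Python sketch with hypothetical asset types and chunk names: the union gives the element superset for the content model, and the intersection shows which elements every asset type shares.

```python
# Hypothetical chunks identified for three canonical asset types.
ASSET_CHUNKS: dict[str, set[str]] = {
    "Press release": {"title", "date", "summary", "body", "contact"},
    "Report":        {"title", "date", "summary", "body", "appendix"},
    "Fact sheet":    {"title", "summary", "body"},
}

element_superset = set().union(*ASSET_CHUNKS.values())      # every chunk seen
shared_elements = set.intersection(*ASSET_CHUNKS.values())  # chunks every type has

print(sorted(element_superset))
# ['appendix', 'body', 'contact', 'date', 'summary', 'title']
print(sorted(shared_elements))
# ['body', 'summary', 'title']
```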
Task 5 – Specify vocabularies
Develop broad taxonomy outline (1-3 levels deep)
Review, revise, and approve taxonomy outline with stakeholders and subject matter experts.
Fill in taxonomy outline
Tag random samples from content inventory
Review, revise, and approve draft taxonomy with stakeholders and subject matter experts.
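"Tag random samples from content inventory" works best when the sample is reproducible, so every reviewer tags the same assets. A minimal Python sketch; the asset ID format and the fixed-seed approach are illustrative assumptions.

```python
import random


def tagging_sample(asset_ids: list[str], k: int, seed: int = 0) -> list[str]:
    """Draw up to k distinct assets at random for a trial tagging round.
    A fixed seed lets reviewers reproduce the same sample."""
    rng = random.Random(seed)
    return rng.sample(asset_ids, k=min(k, len(asset_ids)))


inventory_ids = [f"asset-{i:04d}" for i in range(500)]  # hypothetical IDs
sample = tagging_sample(inventory_ids, 20)
```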
Task 6 – Specify procedures
Develop taxonomy style rules and ensure that the taxonomy follows them.
Develop tagging rules and procedures, along with software to assist in the task.
Specify taxonomy maintenance process and the update procedures to follow.
Task 6 – Governance & Maintenance
Recommendations by Editor
1 Small taxonomy changes (labels, synonyms)
2 Large taxonomy changes (retagging, application changes)
3 New ‘best bets’ content
Committee considerations
1 Business goals
2 Change in user experience
3 Retagging cost
The taxonomy must change over time.
Suggestions for changes can come from users (through query log analysis) and from staff (through a feedback form).
Governance structure needed to make sure changes are justified.
(Diagram components: End User, Application UI, Content Application Logic, Taxonomy, Tagging Logic, Tagging UI, Tagging Staff, Taxonomy Editor, Steering Committee, Firewall; also shown: staff notes on ‘missing’ concepts, query log analysis)
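Query log analysis can surface candidate "missing" concepts: frequent searches that match no existing label or synonym. A minimal Python sketch; the known-terms set and the log are hypothetical, and exact matching after lowercasing is a deliberate simplification (real analysis would also handle stemming, spelling variants, and partial matches).

```python
from collections import Counter

# Hypothetical taxonomy labels plus synonyms, lowercased for matching.
KNOWN_TERMS = {"passports", "passport renewal", "visas", "travel warnings", "adoption"}


def missing_concepts(queries: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Return frequent queries that match no taxonomy label or synonym,
    most frequent first, as candidates for the taxonomy editor to review."""
    counts = Counter(q.strip().lower() for q in queries)
    return [(q, n) for q, n in counts.most_common()
            if n >= min_count and q not in KNOWN_TERMS]


log = ["Passports", "green card", "Green Card ", "visas", "Visas", "embassy jobs"]
print(missing_concepts(log))  # [('green card', 2)]
```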
Task 6 – Steering Committee Roles
Business Lead
Keeps committee on track with larger business objectives
Balances cost/benefit issues to decide appropriate levels of effort
Specialists help in estimating costs
Obtains needed resources if those in committee can’t accomplish a particular task
Technical Specialist
Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
Helps obtain data from various systems
Content Specialist
Committee’s liaison to content creators
Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
Taxonomy Specialist
Suggests potential taxonomy changes based on analysis of query logs, indexer feedback
Makes edits to taxonomy, installs into system with aid of IT specialist
Content Owner
Reality check on process change suggestions
Task 7 – Train staff
Staff will require training on:
• The UI they use to tag the content
• The rules to follow when deciding what codes to apply
• The end-effect of the codes they apply
• The structure of the taxonomy
Tagging examples come from the content inventory
Hard copies of the taxonomy, and yellow highlighters, are helpful during training
Indexing rules
Specificity rule
Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule
All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule
Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule
Anticipate how the asset will be searched for in the future, and how to make it easy to find it. Remember that search engines can only operate on explicit information.
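The specificity rule follows from how hierarchies work: a specific tag can be rolled up to every broader term automatically, but a generic tag cannot be narrowed back down. A minimal Python sketch over a hypothetical fragment of a hierarchy, reusing the office-supply labels from the UNSPSC example.

```python
# Hypothetical hierarchy fragment: narrower term -> broader term.
BROADER = {
    "Mechanical pencils": "Writing instruments",
    "Wooden pencils": "Writing instruments",
    "Colored pencils": "Writing instruments",
    "Writing instruments": "Office supplies",
}


def generalize(term: str) -> list[str]:
    """Walk up the hierarchy: the term itself, then each broader term."""
    path = [term]
    while path[-1] in BROADER:
        path.append(BROADER[path[-1]])
    return path


def narrower(term: str) -> list[str]:
    """All terms directly beneath `term`; a generic tag cannot choose among them."""
    return sorted(child for child, parent in BROADER.items() if parent == term)


print(generalize("Mechanical pencils"))
# ['Mechanical pencils', 'Writing instruments', 'Office supplies']
print(narrower("Writing instruments"))
# ['Colored pencils', 'Mechanical pencils', 'Wooden pencils']
```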
Indexing UI
What about automatic categorization?
• Automatic vs. manual categorization is a cost/benefit tradeoff
– Semi-automated categorization is recommended over pure manual in production situations.
– Automatic performance is not bad, but not equal to trained manual tagging.
• Software is not sane, so its errors look crazy.
– Large backlogs of content can’t justify the investment in high-quality manual tagging.
• Old articles are rarely accessed.
• Recommend automated bulk tagging with an error reporting and correction process.
What about automatically-created taxonomies?
Typically a single hierarchy with no overall plan
Results hard for people to navigate
What about automatic categorization?
Accuracy close to human levels, but errors are very different
Cost/benefit tradeoff
Semi-automation is best practice
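Semi-automation means the software proposes tags and a person reviews the doubtful cases. A minimal Python sketch of that routing; the keyword rules and the 0.5 threshold are hypothetical, and the crude overlap score stands in for whatever classifier a real system would use.

```python
# Hypothetical keyword rules; a production system would use a trained classifier.
RULES: dict[str, set[str]] = {
    "Visas": {"visa", "visas", "consular"},
    "Trade": {"export", "exports", "tariff", "trade"},
}


def auto_tag(text: str, threshold: float = 0.5) -> tuple[list[str], bool]:
    """Suggest topic tags with a crude confidence score and flag
    low-confidence results for a human to review and correct."""
    words = set(text.lower().split())
    scores = {topic: len(words & kw) / len(kw) for topic, kw in RULES.items()}
    tags = sorted(topic for topic, score in scores.items() if score > 0)
    needs_review = not tags or max(scores.values()) < threshold
    return tags, needs_review


print(auto_tag("New export tariff rules affect trade"))  # (['Trade'], False)
print(auto_tag("Consular office hours"))                 # (['Visas'], True)
```

Routing only the flagged items to staff is what keeps the cost of tagging a large backlog manageable.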
Enterprise taxonomy maintenance workflow
Analyst: Suggest new name/category
Editor: Review new name. Problem? Yes / No
Copywriter: Copy edit new name. Problem? Yes / No
Sys Admin: Add to enterprise taxonomy (Taxonomy Tool)
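The workflow can be modeled as a small state machine that a taxonomy tool might enforce. A minimal Python sketch; the stage names follow the roles above, but the slide does not say where a "Yes, there is a problem" answer leads, so sending the suggestion back to the analyst is an assumption.

```python
from enum import Enum


class Stage(Enum):
    ANALYST = "Suggest new name/category"
    EDITOR = "Review new name"
    COPYWRITER = "Copy edit new name"
    SYS_ADMIN = "Add to enterprise taxonomy"
    DONE = "In enterprise taxonomy"


ORDER = [Stage.ANALYST, Stage.EDITOR, Stage.COPYWRITER, Stage.SYS_ADMIN, Stage.DONE]


def next_stage(stage: Stage, problem: bool = False) -> Stage:
    """Advance a proposed term one step. A problem found by the editor or
    copywriter sends it back to the analyst (assumed, not from the slide)."""
    if stage in (Stage.EDITOR, Stage.COPYWRITER) and problem:
        return Stage.ANALYST
    if stage is Stage.DONE:
        return Stage.DONE
    return ORDER[ORDER.index(stage) + 1]
```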
Categorize with a purpose
What is the problem you are trying to solve?
Improve search
Browse for content on an enterprise-wide portal
Enable users to syndicate content
Otherwise provide the basis for content re-use
How will you control the cost of creating and maintaining the metadata needed to solve these problems?
CMS with metadata tagging products
Semi-automated classification
Taxonomy editing tools
Guided navigation tools
How do you sell it?
Don’t sell the taxonomy, sell the vision of what you want to be able to do
Clearly understanding what the problem is and what the opportunities are
Costs and benefits
Design the taxonomy in relation to the value at hand
Internet Resources
U.S. Government Resources
http://www.nasa.gov/home/index.html
http://pub-lib.jpl.nasa.gov/pub-lib/dscgi/ds.py/View/Collection-10
http://www.loc.gov/flicc/wg/taxonomy.html
http://www.loc.gov/lexico/servlet/lexico/
http://www.archives.gov/federal_register/code_of_federal_regulations/thesaurus.html
http://feapmo.gov/
http://www.km.gov/
Other Resources
http://www.educause.edu/asp/taxonomy/show_taxonomy_links.asp?TREE=1&EXPAND=1
http://databases.unesco.org/thesaurus/
http://www.naa.gov.au/recordkeeping/control/functions_thesaur/contents.html
http://www.taxonomystrategies.com/html/bibliography.htm
Summary
Why taxonomies? Why metadata?
Shiyali Ramamrita Ranganathan
Ranganathan’s Five Laws of Library Science
1. Books are for use (They don't belong on the shelf)
2. Books are for all; every reader his book (Every reader is unique)
3. Every book its reader (Every book is unique)
4. Save the time of the reader (Make libraries easy to use)
5. A library is a growing organism (Libraries are constantly changing to meet changing patron needs)
Thank you
Michael Huff, Information Resource Officer
U.S. Department of State, huffmp@state.gov