Taxonomy Lecture 12

Post on 19-Dec-2015

Taxonomy

Lecture 12

Topics

• Tutorial Review
• Classification Frame
• Terminology
• Classical Taxonomy
• Using Classifications
  – In system use
  – In system development
• Review
• Preview

Tutorial Review - Dating System

• Outer join
  – to include unmatched persons as well as matched
    • Select .. from person left join pair
  – to include only unmatched:
    • where partner is null
  – (from Placement visit) use with reference tables
• Do updates before displaying status
• Use tables within tables for layout
• Complex calculation has to be repeated
  – In Oracle / SQL Server, procedure can be stored in DBMS
• Multi-user issues
  – Use single queries or transactions for atomicity
  – Still get problems with ‘dirty’ data – screen allows match for ‘albert’ but albert already matched
    • Refresh shows blank screen – now refresh home screen only
    • An Update would be rejected (but can be ignored)
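The outer join points above can be sketched with Python's built-in sqlite3 module. The slide names only the tables person and pair and the column partner; the join column (name) and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: only `person`, `pair` and `partner` come from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT PRIMARY KEY);
    CREATE TABLE pair (name TEXT, partner TEXT);
    INSERT INTO person VALUES ('albert'), ('beth'), ('carol');
    INSERT INTO pair VALUES ('albert', 'beth'), ('beth', 'albert');
""")

# Left outer join: every person appears, matched or not.
everyone = conn.execute("""
    SELECT person.name, pair.partner
    FROM person LEFT JOIN pair ON person.name = pair.name
    ORDER BY person.name
""").fetchall()

# Adding "partner IS NULL" keeps only the unmatched persons.
unmatched = conn.execute("""
    SELECT person.name
    FROM person LEFT JOIN pair ON person.name = pair.name
    WHERE pair.partner IS NULL
    ORDER BY person.name
""").fetchall()

print(everyone)   # [('albert', 'beth'), ('beth', 'albert'), ('carol', None)]
print(unmatched)  # [('carol',)]
```

An unmatched person comes back with NULL (None in Python) in the partner column, which is why the second query can filter on it.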

Classification Errors (Information Retrieval)

                 Relevant                          Irrelevant
Retrieved        true positive                     false positive (Type I error)
Not retrieved    false negative (Type II error)    true negative

Precision = TP / (TP + FP) = TP / Retrieved

Recall = TP / (TP + FN) = TP / Relevant

Efficiency = (TP + TN) / (TP + TN + FP + FN) = (TP + TN) / Full Collection

Example Calculation: email filtering

            Good Email    Spam
accept      TP = 3        FP = 5
reject      FN = 4        TN = 6
Total       7             11

• Precision = TP / (TP + FP) = 3/8
• Recall = TP / (TP + FN) = 3/7
• Efficiency = (TP + TN) / (TP + TN + FP + FN) = 9/18 = 50%
• Recall > Precision => not quite balanced
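As a quick check of the arithmetic, the three measures can be computed from the counts in the email example (TP = 3, FP = 5, FN = 4, TN = 6). This is a minimal sketch; the function name is made up for illustration, and exact fractions are used so the results can be compared with the slide directly.

```python
from fractions import Fraction

def classification_measures(tp, fp, fn, tn):
    """Return (precision, recall, efficiency) as exact fractions."""
    precision = Fraction(tp, tp + fp)                   # TP / Retrieved
    recall = Fraction(tp, tp + fn)                      # TP / Relevant
    efficiency = Fraction(tp + tn, tp + tn + fp + fn)   # correct / Full Collection
    return precision, recall, efficiency

precision, recall, efficiency = classification_measures(tp=3, fp=5, fn=4, tn=6)
print(precision, recall, efficiency)  # 3/8 3/7 1/2
print(recall > precision)             # True
```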

Classification and Systems Design

• Steps in Classification
  – defining the domain (what kinds of things are to be classified)
  – creating the taxonomy (the set of categories), its purpose and force
  – defining the representation of individuals
  – defining the mapping between individuals and categories
  – coding the categories
  – creating automatic classifiers
  – assisting human classifiers
  – assisting users to interpret categorical information
  – evaluating classification performance
  – supporting evolution of taxonomy and classifiers

“An early step towards understanding any set of Phenomena is to learn what kinds of things there are in the set – to develop a taxonomy”

Herbert Simon

Classification in the News

• Criminal Justice as a Classifier
  – Murder, Manslaughter or Innocent
• Is ‘Munchausen by Proxy’ a real psychological condition?
• Prisoners of war – US invents a new category for the Guantanamo Bay prisoners
• Blood groups:
  – A, B, AB, O
  – RH+, RH-
• Classification of Cloud types (Cumulus, Cirrus…) by Luke Howard 1802
• Hip evaluation to determine priority for replacement
• Text classification to bring sense to the Internet

Categories in Information Systems

• Many systems require the user to classify things in the real world into categories in order to process them:
  – Files and documents on disk
  – Facilities in the University (helpdesk, reception..)
  – Skills in a Placements system
  – Budget headings, Nominal Ledger headings
  – Complaints
  – Fault priority
• On the system, categories can be clearly distinguished:
  – Codes for each category
• But the user typically has the task of mapping the real, complex things into the appropriate categories and interpreting categorical information

Categories in IS theory

• Much of IS theory is based on a taxonomy:
  – Problem / solution
  – Method / methodology / technique..
  – ER model
  – Data Flow Diagram
  – Soft Systems Analysis - CATWOE
  – Logical / Physical
  – SWOT analysis
    • Strengths / Weaknesses / Opportunities / Threats
  – Objective, Goal, Requirement, Constraint

Terminology

• Category / Class
  – A group of similar objects
• Binary Category
  – An object is either in the category or not
• Taxonomy
  – A set of Categories, sometimes organised into a hierarchy, for a common purpose
  – Multiple Taxonomies may be applied to the same population of objects
• Categorisation / Classification
  – The task of placing objects into the appropriate Category / Class
• Clustering
  – The process of identifying similar objects

A dodgy taxonomy

• The Argentinean writer Jorge Luis Borges (‘Imaginary Beasts’, ‘Labyrinths’..) quotes a ‘certain Chinese encyclopedia’ in which animals are divided into:

A) belonging to the Emperor
B) embalmed
C) tame
D) suckling pigs
E) sirens
F) fabulous
G) stray dogs
H) included in the present classification
I) frenzied
J) innumerable
K) drawn with a very fine camel hair brush
L) et cetera
M) having just broken the water pitcher
N) that from a long way off look like flies

[Diagram: Classifier (Machine / Human); Categories/Classes A, B, C; Taxonomy]

Categories not Mutually Exclusive

An object can be put in any of several categories

Categories not Complete

Some objects don’t belong anywhere

Categories not Balanced

Some categories much larger than others

Categories Inconsistent

Categories lack a single organising principle

Taxonomy design

• Categories must be:
  – Mutually exclusive
    • Every object in at most one category
  – Complete (exhaustive)
    • Every object in at least one category
  – Balanced
    • Categories divide objects evenly
  – Consistent
    • Same characteristics used throughout
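The first three design rules can be checked mechanically once objects have been assigned to categories. The sketch below assumes a taxonomy is held as a mapping from category name to a set of objects; the function name and the sample data are invented for illustration. Consistency (a single organising principle) is a design judgement and is not tested here.

```python
from collections import Counter

def check_taxonomy(objects, categories):
    """categories maps a category name to the set of objects placed in it."""
    membership = Counter()
    for members in categories.values():
        membership.update(members)  # count how many categories hold each object
    overlapping = sorted(o for o in objects if membership[o] > 1)
    unplaced = sorted(o for o in objects if membership[o] == 0)
    sizes = {name: len(members) for name, members in categories.items()}
    return {
        "mutually_exclusive": not overlapping,  # every object in at most one category
        "complete": not unplaced,               # every object in at least one category
        "overlapping": overlapping,
        "unplaced": unplaced,
        "sizes": sizes,                         # compare sizes to judge balance
    }

# Hypothetical example data
objects = ["robin", "emu", "bat", "trout"]
categories = {"flies": {"robin", "bat"}, "bird": {"robin", "emu"}}

report = check_taxonomy(objects, categories)
print(report["mutually_exclusive"], report["overlapping"])  # False ['robin']
print(report["complete"], report["unplaced"])               # False ['trout']
```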

Kinds of classification

• Classical
  – Classes defined by presence of features
    • Square: 4 sides, equal length, equal angles
    • Rectangle: 4 sides, equal angles
    • Equilateral triangle: 3 sides, equal length, equal angles
• Probabilistic
  – Classes defined by weighted sum of features
    • ‘bird’: moves, winged, feathered, sings, lays eggs
    • Is a robin a bird? Is an emu a bird?
• Exemplar (prototype)
  – Classes defined by one or more key examples
    • Robin is a central example of ‘bird’
    • Chicken is more remote example
• Which kind is used in IS Theory?
• Which kind is used in IS Use?
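The difference between the classical and probabilistic kinds can be sketched in a few lines. The feature names come from the slide; the integer weights, the threshold and the bat example are invented for illustration and carry no authority.

```python
# Classical: a class is defined by the features every member must have.
CLASSICAL = {
    "square": {"4 sides", "equal length", "equal angles"},
    "rectangle": {"4 sides", "equal angles"},
}

def classical_classes(features):
    """Return every class whose defining features are all present."""
    return sorted(name for name, required in CLASSICAL.items()
                  if required <= features)

# Probabilistic: a weighted sum of features compared with a threshold.
# Weights (out of 10) and threshold are assumptions, not from the lecture.
BIRD_WEIGHTS = {"moves": 1, "winged": 3, "feathered": 3, "sings": 1, "lays eggs": 2}
BIRD_THRESHOLD = 6

def bird_score(features):
    return sum(weight for feature, weight in BIRD_WEIGHTS.items()
               if feature in features)

robin = {"moves", "winged", "feathered", "sings", "lays eggs"}
emu = {"moves", "winged", "feathered", "lays eggs"}
bat = {"moves", "winged"}

print(classical_classes({"4 sides", "equal length", "equal angles"}))
# ['rectangle', 'square']
for name, features in [("robin", robin), ("emu", emu), ("bat", bat)]:
    print(name, bird_score(features), bird_score(features) >= BIRD_THRESHOLD)
# robin 10 True
# emu 9 True
# bat 4 False
```

Under the classical definitions a square also satisfies the rectangle definition, whereas the weighted sum gives a degree of membership: the robin scores higher than the emu even though both pass the threshold.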

Clustering

• Clustering techniques find groups of similar objects

• Used in data mining to identify customer groups with similar buying behaviour…

• Mathematical Techniques
  – k-nearest neighbour
  – ID3 to create decision tree

• Human Techniques
  – Card sorting

Classifying

• Learning Classifiers
  – Based on sample of population
  – Classified by hand
  – Split into two parts
    • The training set used to compute the classifier
    • The test set used to test the ability of the classifier
  – Many kinds of classifiers available, all need good understanding of statistics e.g. Naïve Bayesian, Decision Tree, SVM
  – Threshold set to balance recall and precision
• Rule and example based for human classifier but performance varies with experience and skill
  – E.g. book classification, Yahoo directory classification, medical diagnosis
  – Human classifiers need to be trained too
  – If classification done by end-users, classification is likely to be inconsistent
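The training/test split and the threshold setting can be sketched as follows. This assumes the classifier already produces a numeric score per object and that an object is retrieved when its score reaches the threshold; the scores, labels and candidate thresholds are invented, and a real split would usually be random rather than a simple slice.

```python
def precision_recall(sample, threshold):
    """sample: list of (score, is_relevant); retrieve when score >= threshold."""
    tp = sum(1 for s, rel in sample if s >= threshold and rel)
    fp = sum(1 for s, rel in sample if s >= threshold and not rel)
    fn = sum(1 for s, rel in sample if s < threshold and rel)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def balanced_threshold(training, candidates):
    """Pick the candidate threshold where precision and recall are closest."""
    def gap(t):
        p, r = precision_recall(training, t)
        return abs(p - r)
    return min(candidates, key=gap)

# Hypothetical hand-classified sample: (classifier score, is_relevant)
sample = [(0.9, True), (0.8, True), (0.7, False), (0.6, True),
          (0.4, False), (0.3, True), (0.2, False), (0.1, False),
          (0.85, True), (0.65, False), (0.35, True), (0.15, False)]

# Split into two parts: training set to choose the threshold, test set to check it.
training, test = sample[:8], sample[8:]

threshold = balanced_threshold(training, [0.2, 0.4, 0.6, 0.8])
print(threshold)                          # 0.6
print(precision_recall(test, threshold))  # (0.5, 0.5)
```

On the training set the threshold 0.6 gives precision and recall of 0.75 each; on the unseen test set both fall to 0.5, which is the kind of drop the separate test set exists to reveal.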

Tutorial

• Read ‘Ten Taxonomy Myths’
• Problem:
• A team of consultants has been hired to assist a local voluntary organisation whose aim is to help local people locate organisations which provide relevant services.
• They have a web site and publish a newsletter
  – www.eastbristoladvice.org.uk
• How would you advise them to classify the organisations for ease of recall? What taxonomies would be appropriate? Binary or multi-category?
• What information would you hold about each organisation?
• How would you gather information on the effectiveness of your taxonomies?

Review

• 3-tier, 4-tier web architecture – describe, explain, terminology, typical interactions
• SQL & PHP
  – No exam questions to write SQL or PHP but reading knowledge required – up to outer joins and example scripts
• DBMS comparison and selection
• Entity-Relationship modelling – revision, application
• Data flow – specification of data flows, XML
• Sequence diagrams – construct from description
• Agile Development and Extreme Programming – description, application, comparison with life-cycle
• Frames – rationale, role in IS development, basic recognition in a problem description of simple frames and the following in detail
• Matching Frame – typical applications, fitness function, recognising nominal, ordinal, interval and ratio scales, use of weights
• Classification Frame – typical applications, terminology, calculation of recall and precision, guidelines for constructing a taxonomy

Preview

• Learning Frame

• Business Processes

• Scenarios and Use cases

• Object-Relational DBMS

• Data Quality

• ….

Black board

• Suggest additional topics

• Suggest additional resources

• Ask questions

• Give me feedback