2003 taxonomy of dirty

19

Data Mining and Knowledge Discovery, 7, 81–99, 2003 c 2003 Kluwer Academic Publishers. Manufactured in The Netherlands. A Taxonomy of Dirty Data WON KIM ∗ Cyber Database Solutions, Inc., Austin, Te xas, USA BYOUNG-JU CHOI † Department of Computer Science, Ewha Institute of Science and Tec hnology , Seoul, Korea EUI-KYEONG HONG University of Seoul, AITrc, Seoul, Korea SOO-KYUNG KIM Lucent Tech nologies, Seoul, Kor ea DOHEON LEE Department of Biosystems, Korea Advanced Institute of Science and Tec hnology , Daejon, Korea Editors: Fayyad, Mannila, Ramakrishnan Received November 19, 2001; Revised November 20, 2001 Abstract. To day large corporations are constructing enterprise data warehouses from disparate data sources in order to run enterprise-wide data analysis applications, including decision support systems, multidimensional online analytical applications, data mining, and customer relationship management systems. A major problem that is only beginning to be recognized is that the data in data sources are often “dirty”. Broadly, dirty data include missing data, wrong data, and non-standard representations of the same data. The results of analyzing a database/data warehouse of dirty data can be damaging and at best be unreliable. In this paper, a comprehensive classiﬁcation of dirty data is developed for use as a framework for understanding how dirty data arise, manifest themselves, and may be cleansed to ensure proper construction of data warehouses and accurate data analysis. The impact of dirty data on data mining is also explored. Keywords: dirty data, data quality , data mining, data cleansing, data wareho using 1. Int roduc tion Today, data warehousing systems are becoming a key enabler of corporate information technology infrastructure. Corporations have recognized the value of data at their disposal as an important asset that can make them more competitive in today’s dynamic business environment. By consolidating data from disparate data sources into a “central” data warehouse, corporations may be able to run data analysis applications and obtain information ∗ This research was partially supported by Korea’s Brain Korea-21 grant. † This research was partially supported by Korea’s KISTEP grant.

Upload: alberto-hayek

Post on 03-Jun-2018

218 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 1/19

Page 2: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 2/19

Page 3: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 3/19

Page 4: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 4/19

Page 5: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 5/19

Page 6: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 6/19

Page 7: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 7/19

Page 8: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 8/19

Page 9: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 9/19

Page 10: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 10/19

Page 11: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 11/19

Page 12: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 12/19

Page 13: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 13/19

Page 14: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 14/19

Page 15: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 15/19

Page 16: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 16/19

Page 17: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 17/19

Page 18: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 18/19

Page 19: 2003 Taxonomy of Dirty

8/12/2019 2003 Taxonomy of Dirty

http://slidepdf.com/reader/full/2003-taxonomy-of-dirty 19/19

3-27-2003 Clean Break or Dirty War

Ante el dolor de los demás (2003) - WordPress.com · 2010-01-25 · Ante el dolor de los demás (2003) Susan Sontag Alfaguara, Madrid, 2003...aux vaincus! BAUDELAIRE The dirty nurse,

Justifying Taxonomy Projects: Taxonomy Boot Camp 2009

CRF 450R 2003 Reservdelsbok - Dirty Part

Typhoid Fever, Dirty Food, Dirty Water.pdf

Dirty Books, Dirty Hands

Plant Taxonomy Taxonomy: The science of classification

MDM and Taxonomy - Taxonomy Strategies

Nanomedicine Taxonomy - IFM · Canadian NanoBusiness Alliance Nanomedicine Taxonomy February 2003 Page 2 1.2 This Document This document presents a nanomedicine taxonomy that describes

BLOOM’S TAXONOMY BLOOM’S TAXONOMY Jamaica C. Olazo

Dirty Money, Dirty Water - Center for American Progress · Dirty Money, Dirty Water Keeping Polluters and Other Campaign Donors from Influencing North Carolina Judicial Elections

A Brief Taxonomy and Ranking of Creative Prototype Reduction …people.scs.carleton.ca/~oommen/papers/PaperKimOommenJPR.pdf · 2003-11-05 · A Brief Taxonomy and Ranking of Creative

Taxonomy Classification of Organisms. What is Taxonomy? Taxonomy organizes organisms into groups. Taxonomy organizes organisms into groups. Species that

Taxonomy, Phylogeny, and Biogeography of the Hummingbird ...hss.ulb.uni-bonn.de/2003/0251/0251.pdf · Taxonomy, Phylogeny, and Biogeography of the Hummingbird Genus Thalurania GOULD,

Taxonomy. Taxonomy is the science of classification

DIRTY HANDS ON DIRTY DEALS - corporateeurope.org

Release v0.3-dirty devicetree.org v0.3-dirty

IAS Taxonomy Core FS Narrative Draft of 2003... · Web view2003/07/15 · Title IAS Taxonomy Core FS Narrative Draft Author XBRL International IAS Taxonomy Working Group Last modified

Why Taxonomy?Why Taxonomy? How to determine & classify a ... · Taxonomy Content Why Taxonomy?Why Taxonomy? How to determine & classify a species Domains versus KingdomsDomains versus

DIRTY HANDS ON DIRTY DEALS - Corporate Europe Observatorycorporateeurope.org/sites/default/files/dirtydeals_small.pdf · 5 Dirty hands on dirty deals: TTIP and COP21 shaped by same

Environmental Preferences and Technological Choices : Is ...Dirty dirty + grey dirty+ grey + other dirty dirty + other Observations 17,124 49,482 17,124 49,482 R-squared 0.070 0.051

DIMENSION-ORIENTED TAXONOMY OF DATA QUALITY …DIMENSION-ORIENTED TAXONOMY OF DATA QUALITY PROBLEMS IN ELECTRONIC HEALTH RECORD 101 different sources results in many ‘dirty data’

Keys to Soil Taxonomy - Higher Intellect€¦ · Keys to Soil Taxonomy Ninth Edition, 2003. ... nature are of extreme importance in soil classification. Plants can be grown under

Taxonomy for Personalization: January 6 Taxonomy CoP

Mongolia Green Taxonomy Introduction & Proposed Taxonomy

Dirty Rotten Imbeciles · 2019. 6. 19. · Dirty Rotten Imbeciles

DIRTY PANDA : KEEP YOUR HANDS DIRTY SINCE 2002 || Graffiti

Taxonomy Displays: Bridging UX & Taxonomy Design

RUNNING HEAD: A Dirty Word or a Dirty World€¦ · Web viewRUNNING HEAD: A Dirty Word or a Dirty World. A Dirty Word or a Dirty World? Attribute Framing, Political Affiliation,

Metadata and Taxonomy - Home - Taxonomy Strategies

2003 A comparative sequence analysis to revise the current taxonomy of the familyCoronaviridae

Small, 1989 Systematics or Taxonomy of Taxonomy

Cedarcrest Cross Country The 2003 season. The Dirty Dozen

1 Updated war plans, February 2003 Eos Life-Work Last chance to question US Dirty Bombs for Iraq Last chance to question US Dirty Bombs for Iraq Dai Williams,

Taxonomy Bootcamp 2011 Avoiding the Autobiographical Taxonomy: Creating the Right Taxonomy