Page 1: ICDM 2003 Review Data Analysis - with comparison between 02 and 03 - Xindong Wu and Alex Tuzhilin Analyzed by Shusaku Tsumoto

ICDM 2003 Review Data Analysis

- with comparison between 02 and 03 -

Xindong Wu and Alex Tuzhilin

Analyzed by Shusaku Tsumoto

Page 2

Basic Statistics (Country)

• 37 countries, 486 Submissions

• Regular Papers: 58 (12%)

• Short Papers: 67 (14%)

• High Acceptance Ratio (Regular)
– Israel: 4/11 (36%)
– Hong Kong: 4/12 (33%)

Page 3

Country | Total | Regular | Short | Acceptance Ratio
USA | 189 | 35 | 28 | 33%
China | 45 | 2 | 0 | 4%
Australia | 29 | 3 | 5 | 28%
Canada | 28 | 0 | 6 | 21%
Germany | 19 | 2 | 4 | 32%
Japan | 19 | 4 | 3 | 37%
France | 18 | 1 | 2 | 17%
Taiwan | 16 | 0 | 3 | 19%
Brazil | 15 | 0 | 0 | 0%
Hong Kong | 12 | 4 | 2 | 50%
UK | 12 | 1 | 2 | 25%
Israel | 11 | 4 | 2 | 55%
Italy | 8 | 1 | 1 | 25%
Finland | 7 | 1 | 1 | 29%
India | 7 | 0 | 1 | 14%
Korea | 6 | 0 | 1 | 17%
Top 15 | 441 | 58 | 61 | 27%
Total | 486 | 58 | 67 | 26%
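The Acceptance Ratio column appears to be (Regular + Short) / Total, rounded to a whole percent. The following is a minimal sketch that recomputes it from the counts in the country table; the names `COUNTRY_COUNTS` and `acceptance_ratio` are illustrative and not part of the original analysis.

```python
# (total, regular, short) per country, copied from the country table.
COUNTRY_COUNTS = {
    "USA": (189, 35, 28), "China": (45, 2, 0), "Australia": (29, 3, 5),
    "Canada": (28, 0, 6), "Germany": (19, 2, 4), "Japan": (19, 4, 3),
    "France": (18, 1, 2), "Taiwan": (16, 0, 3), "Brazil": (15, 0, 0),
    "Hong Kong": (12, 4, 2), "UK": (12, 1, 2), "Israel": (11, 4, 2),
    "Italy": (8, 1, 1), "Finland": (7, 1, 1), "India": (7, 0, 1),
    "Korea": (6, 0, 1),
}


def acceptance_ratio(total, regular, short):
    """Fraction of submissions accepted as either a regular or a short paper."""
    return (regular + short) / total


for country, (total, regular, short) in COUNTRY_COUNTS.items():
    pct = 100 * acceptance_ratio(total, regular, short)
    print(f"{country:<10} {total:>4} {regular:>3} {short:>3} {pct:5.1f}%")

# Column sums over the listed countries, for comparison with the summary row.
listed_total = sum(t for t, _, _ in COUNTRY_COUNTS.values())
listed_regular = sum(r for _, r, _ in COUNTRY_COUNTS.values())
listed_short = sum(s for _, _, s in COUNTRY_COUNTS.values())
print(listed_total, listed_regular, listed_short)
```

Summing the listed rows gives 441 submissions, 58 regular papers and 61 short papers, which agrees with the summary row above the grand total.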

Page 4

Comparison with 2002 (Top 5)

Country (2002) | Acceptance Ratio (2002) | Country (2003) | Acceptance Ratio (2003)
Hong Kong | 64.7% | Israel | 55.0%
USA | 47.9% | Hong Kong | 50.0%
Canada | 45.5% | Japan | 37.0%
Finland | 33.3% | USA | 33.0%
France | 33.3% | Germany | 32.0%

Page 5

Basic Statistics (Topics)
• Top 5 of Submissions:
– Mining text and semi-structured data, and mining temporal, spatial and multimedia data
– Data mining and machine learning algorithms and methods in traditional areas and in new areas
– Data mining applications in electronic commerce, bioinformatics, computer security, Web intelligence
– Soft computing and uncertainty management
– Data pre-processing, data reduction, feature selection and feature transformation
• High Acceptance Ratio (Regular)
– Statistics and probability in large-scale data mining
– Security, privacy and social impact of data mining

Page 6

Topic | Total | Regular | Short | Acceptance Ratio
Mining text and semi-structured data, and mining temporal, spatial and multimedia data | 81 | 10 | 12 | 27%
Data mining and machine learning algorithms and methods in traditional areas (such as classification, regression, clustering, probabilistic modeling, and association analysis), and in new areas | 77 | 11 | 8 | 25%
Data mining applications in electronic commerce, bioinformatics, computer security, Web intelligence, intelligent learning database system | 61 | 5 | 6 | 18%
Soft computing (including neural networks, fuzzy logic, evolutionary computation, and rough sets) and uncertainty management for data mining | 46 | 2 | 9 | 24%
Data pre-processing, data reduction, feature selection and feature transformation | 41 | 3 | 5 | 20%
Complexity, efficiency, and scalability issues in data mining | 30 | 4 | 4 | 27%
Others | 21 | 1 | 4 | 24%
Foundations of data mining | 18 | 2 | 1 | 17%
Data and knowledge representation for data mining | 16 | 3 | 1 | 25%
Human-machine interaction and visualization in data mining, and visual data mining | 16 | 3 | 3 | 38%
Quality assessment and interestingness metrics of data mining results | 16 | 2 | 3 | 31%
Statistics and probability in large-scale data mining | 15 | 6 | 1 | 47%
High performance and distributed data mining | 12 | 1 | 2 | 25%
Post-processing of data mining results | 11 | 1 | 3 | 36%
Pattern recognition and scientific discovery | 8 | 1 | 0 | 13%
Security, privacy and social impact of data mining | 7 | 2 | 2 | 57%
Integration of data warehousing, OLAP and data mining | 5 | 0 | 0 | 0%
Process-centric data mining and models of data mining process | 5 | 1 | 3 | 80%
Total | 486 | 58 | 67 | 26%
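Ranking the topics by acceptance ratio shows which ones did best in 2003. The sketch below does this from the counts in the topic table; the shortened topic labels and the names `TOPIC_COUNTS` and `top_by_ratio` are illustrative choices, not taken from the slides.

```python
# (label, total, regular, short); labels are shortened forms of the topic names.
TOPIC_COUNTS = [
    ("Text, semi-structured, temporal, spatial, multimedia", 81, 10, 12),
    ("DM and ML algorithms and methods", 77, 11, 8),
    ("DM applications", 61, 5, 6),
    ("Soft computing and uncertainty management", 46, 2, 9),
    ("Pre-processing, feature selection", 41, 3, 5),
    ("Complexity, efficiency, scalability", 30, 4, 4),
    ("Others", 21, 1, 4),
    ("Foundations of data mining", 18, 2, 1),
    ("Data and knowledge representation", 16, 3, 1),
    ("Human-machine interaction and visualization", 16, 3, 3),
    ("Quality assessment and interestingness", 16, 2, 3),
    ("Statistics and probability", 15, 6, 1),
    ("High performance and distributed DM", 12, 1, 2),
    ("Post-processing", 11, 1, 3),
    ("Pattern recognition and scientific discovery", 8, 1, 0),
    ("Security, privacy", 7, 2, 2),
    ("Integration of data warehousing, OLAP and DM", 5, 0, 0),
    ("Process-centric DM", 5, 1, 3),
]


def top_by_ratio(rows, k):
    """Return the k (label, ratio) pairs with the highest acceptance ratio."""
    scored = [(label, (regular + short) / total)
              for label, total, regular, short in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


top5 = top_by_ratio(TOPIC_COUNTS, 5)
for label, ratio in top5:
    print(f"{label}: {100 * ratio:.1f}%")
```

Note that the highest ratios come from topics with very few submissions (5 to 16 papers), so a single decision shifts them considerably.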

Page 7

Comparison with 2002 (Top 5)

Top 5 in 2002 | Acceptance Ratio | Top 5 in 2003 | Acceptance Ratio
Graph Mining | 75.0% | Process-centric DM | 80.0%
Temporal Data | 52.6% | Security, privacy | 57.0%
Theory | 42.9% | Statistics and Probability | 47.0%
Text Mining | 42.1% | Visual Data Mining | 38.0%
Rule | 41.7% | Post-processing | 36.0%

Page 8

Review Scores

[Figure: histogram of review scores for 2003 (SCORE). x-axis: score, 0.00 to 5.00; y-axis: frequency, 0 to 120. N = 486, mean = 2.32, SD = 0.92]

[Figure: histogram of review scores for 2002 (SCORE2). x-axis: score, 0.00 to 4.50; y-axis: frequency, 0 to 100. N = 347, mean = 2.35, SD = 0.90]

 | 2002 | 2003
N | 347 | 486
Average | 2.39 | 2.32
SD | 0.90 | 0.92
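The slides only place the two years side by side. One way to gauge whether the small drop in the average score is meaningful is Welch's t statistic computed from the summary figures alone. This is a sketch under assumptions: the slides do not report any test, the function name `welch_t` is illustrative, and the 2002 average of 2.39 is taken from the summary table (the histogram annotation gives 2.35).

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for two independent samples, from summary statistics."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Values from the summary table: 2002 first, then 2003.
t_stat = welch_t(2.39, 0.90, 347, 2.32, 0.92, 486)
print(f"Welch t = {t_stat:.2f}")
```

With these inputs the statistic comes out near 1.1, below the value of roughly 2 usually required for significance at the 5% level, which suggests the score distributions of the two years are broadly similar.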

Page 9

Box Plot

[Figure: box plots of total review score (TOTAL_SC) by year (TOTAL_YE). 2002: N = 347; 2003: N = 486. Score axis from -1 to 6.]

Page 10

Comparison with 2002 (2002 => 2003)
• Country vs Final Decision
– Regular: Hong Kong => Hong Kong, Israel
– Short: USA => ?
– Reject: Japan, Taiwan => Most of the countries
• Topics vs Final Decision
– Regular: Temporal, Text => Statistics and Probability, Visualization
– Short: Similarity => Postprocessing
– Reject: Bayesian => Feature Selection

Page 11

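Correspondence Analysis (Country vs Final Decision)

[Figure: correspondence analysis plot of country vs final decision. Decision points: Reject, Regular, Short. Labeled countries: Belgium, Israel, Hong Kong, USA, China, Brazil, France, Poland, Japan. Horizontal axis from -1.5 to 3, vertical axis from -2 to 5. r1 = 0.325, r2 = 0.235.]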
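For readers unfamiliar with correspondence analysis, the sketch below shows the core computation on a country-by-decision contingency table: form the standardized residuals of the table and take their singular values, whose squares sum to the total inertia (chi-square divided by N). Several assumptions apply. Only the countries listed in the country table are used, not all 37, so the output will not reproduce the r1 and r2 values on the slide. Reject counts are derived as Total minus Regular minus Short. With exactly three decision columns, the two non-trivial singular values are obtained in closed form from a 3x3 matrix instead of calling an SVD routine. All names are illustrative.

```python
import math

# (total, regular, short) for the countries listed in the 2003 country table.
COUNTRY_COUNTS = {
    "USA": (189, 35, 28), "China": (45, 2, 0), "Australia": (29, 3, 5),
    "Canada": (28, 0, 6), "Germany": (19, 2, 4), "Japan": (19, 4, 3),
    "France": (18, 1, 2), "Taiwan": (16, 0, 3), "Brazil": (15, 0, 0),
    "Hong Kong": (12, 4, 2), "UK": (12, 1, 2), "Israel": (11, 4, 2),
    "Italy": (8, 1, 1), "Finland": (7, 1, 1), "India": (7, 0, 1),
    "Korea": (6, 0, 1),
}


def decision_table(counts):
    """One row of [regular, short, reject] counts per country."""
    return [[reg, short, total - reg - short]
            for total, reg, short in counts.values()]


def ca_singular_values(table):
    """Singular values of the standardized residuals of a 3-column table.

    Returns (s1, s2, total_inertia). Row and column sums must be positive.
    """
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(3)]
    # Standardized residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j).
    resid = [[(row[j] / n - r * col_mass[j]) / math.sqrt(r * col_mass[j])
              for j in range(3)]
             for row, r in zip(table, row_mass)]
    # 3x3 cross-product matrix; its eigenvalues are the squared singular values.
    gram = [[sum(z[a] * z[b] for z in resid) for b in range(3)]
            for a in range(3)]
    trace = gram[0][0] + gram[1][1] + gram[2][2]
    minors = sum(gram[a][a] * gram[b][b] - gram[a][b] ** 2
                 for a, b in ((0, 1), (0, 2), (1, 2)))
    # One eigenvalue is zero because the residuals are centered, so the other
    # two are the roots of x^2 - trace * x + minors = 0.
    disc = math.sqrt(max(trace ** 2 - 4 * minors, 0.0))
    lam1 = (trace + disc) / 2
    lam2 = max((trace - disc) / 2, 0.0)
    return math.sqrt(lam1), math.sqrt(lam2), trace


s1, s2, inertia = ca_singular_values(decision_table(COUNTRY_COUNTS))
print(f"singular values: {s1:.3f}, {s2:.3f}; total inertia: {inertia:.3f}")
```

A full analysis would also compute row and column coordinates on the two axes, which is what the plot displays; a linear algebra library with an SVD routine is the usual tool for that step.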

Page 12

Correspondence Analysis (Topics vs Final Decision)

[Figure: correspondence analysis plot of topics vs final decision. Decision points: Reject, Short, Regular. Labeled topics: Statistics and probability; Security, privacy; Process-centric DM; Integration of DTW, OLAP and DM; Post-processing; Human-machine interaction and visualization; Feature Selection. Horizontal axis from -3 to 2, vertical axis from -2.5 to 2. r1 = 0.218, r2 = 0.200.]

Page 13

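Correspondence Analysis (# of Authors vs Final Decision)

[Figure: correspondence analysis plot of number of authors vs final decision. Decision points: Reject, Short, Regular. Author-count points: 1, 2, 3, 4, 5, 6. Horizontal axis from 0 to 0.8, vertical axis from -1 to 3.5. The transcript also carries the labels Process-centric DM, Human-machine interaction and visualization, r1 = 0.218 and r2 = 0.200, which match the topics plot and may be leftovers from it.]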

Page 14

Correspondence Analysis Summaries
• Country vs Final Decision
– Regular: Hong Kong, Israel
– Short: ?
– Reject: Most of the countries are located near this region.
• Topics vs Final Decision
– Regular: Statistics and Probability, Visualization
– Short: Postprocessing
– Reject: Feature Selection
• # of Authors vs Final Decision
– 1 or 4: Regular
– 2 or 3: between Short and Regular

Page 15

Correspondence Analysis (2002) (Country vs Final Decision)

• Rule: [R1=0] [R2=0]: |[R1=0]| |[R2=0]|
• Rule Relations between Sets
• Relations between supporting sets are very important.
– Rough Set / Granular Computing
• Index for Rule Induction:
– P(R2|R1), P(R1|R2), or f(P(R2|R1))
– Relation between Information Granules

[Figure: correspondence analysis plot of country vs final decision for 2002. Decision points: Reject, Short, Regular. Labeled countries: Hong Kong, Austria, Japan, Taiwan, Australia, Finland, USA, Canada, China, Thailand. Horizontal axis from -2.5 to 1.5, vertical axis from -4 to 3.]
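The rule-induction indices named on this slide, P(R2|R1) and P(R1|R2), can be estimated from the supporting sets of the two conditions, that is, the sets of objects for which each condition holds. The sketch below is a minimal illustration; the object ids are invented and the function name `conditional` is hypothetical.

```python
def conditional(support_a, support_b):
    """Estimate P(B | A) as |A and B| / |A| from supporting sets of object ids."""
    if not support_a:
        raise ValueError("the conditioning set must not be empty")
    return len(support_a & support_b) / len(support_a)


# Hypothetical supporting sets: ids of objects satisfying [R1=0] and [R2=0].
support_r1 = {1, 2, 3, 4, 5, 6}
support_r2 = {4, 5, 6, 7}

p_r2_given_r1 = conditional(support_r1, support_r2)  # 3 of 6 objects
p_r1_given_r2 = conditional(support_r2, support_r1)  # 3 of 4 objects
print(p_r2_given_r1, p_r1_given_r2)
```

The two directions generally differ: here the overlap covers half of the R1 set but three quarters of the R2 set, which is why the relation between the supporting sets matters for judging a rule.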

Page 16

Correspondence Analysis in 2002 (Category vs Final Decision)

[Figure: correspondence analysis plot of category vs final decision for 2002. Decision points: Reject, Short, Regular. Labeled categories: Bayesian, Statistics, Similarity, Interestingness, Active Learning, Theory, Temporal, Web Mining, Structured, Text Mining, SVM, Rule, Tree, Applications, Association R. Horizontal axis from -3 to 1.5, vertical axis from -5 to 2.]

Page 18

Rule Mining
• Datasets
– Sample Size: 486
– Attributes: 5
  • Paper No.: ordered by submission date
  • # of Authors
  • # of Characters in Title
  • Country
  • Category
– Analyzed by Clementine 7.1

Page 19

Rule Mining (2)
• C5.0
– [FINAL = long] <= [Country = Israel] & [# of Authors > 2] & [# of Chars in Title <= 75.0] (Confidence: 0.667, Support: 3)
– [FINAL = Reject] <= [# of Authors > 4] & [Paper No. > 117] & [# of Chars in Title > 71.0] (Confidence: 0.857, Support: 10)
• # of Authors, Paper No., # of Chars in Title: Important Features
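To make the Confidence and Support figures concrete, the sketch below evaluates the second C5.0 rule on a handful of records. It assumes that Support counts the records matched by the rule's conditions and that Confidence is the fraction of those records with the stated decision. The records are invented for illustration and are not the ICDM data; `Submission` and `rule_stats` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    paper_no: int
    n_authors: int
    title_chars: int
    country: str
    category: str
    final: str  # "long", "short" or "Reject"


def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule: consequent <= antecedent."""
    covered = [r for r in records if antecedent(r)]
    if not covered:
        return 0, 0.0
    hits = sum(1 for r in covered if consequent(r))
    return len(covered), hits / len(covered)


# Invented records, for illustration only.
records = [
    Submission(200, 5, 80, "USA", "Others", "Reject"),
    Submission(300, 6, 90, "Japan", "Others", "Reject"),
    Submission(150, 5, 75, "China", "Others", "short"),
    Submission(50, 5, 80, "France", "Others", "Reject"),   # paper no. too low
    Submission(400, 2, 60, "Israel", "Others", "long"),    # too few authors
]

support, confidence = rule_stats(
    records,
    antecedent=lambda r: r.n_authors > 4 and r.paper_no > 117 and r.title_chars > 71,
    consequent=lambda r: r.final == "Reject",
)
print(support, round(confidence, 3))
```

On this toy data the conditions match three records, two of which were rejected, so the rule has a support of 3 and a confidence of about 0.667.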

Page 20

Rule Mining (3)
• Generalized Rule Induction
– [FINAL = Reject] <= [Paper No. < 67.5] (Confidence: 90%, Support: 10.7%)
– [FINAL = Reject] <= [Paper No. < 54.5] & [# of Chars in Title > 49.5] (Confidence: 100%, Support: 4.73%)
– [FINAL = long] <= [COUNTRY = Israel] & [# of Chars in Title > 61.5] (Confidence: 60%, Support: 1.03%)
• Paper No., # of Chars in Title: Important Features
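The Generalized Rule Induction results report Support as a percentage, unlike the C5.0 results, which give a count. Assuming the percentage is taken over all 486 submissions, it can be converted back to an approximate number of matching records. The sketch below does this; the names `GRI_RULES` and `covered_records` are illustrative.

```python
N_SUBMISSIONS = 486

# (conditions, confidence, support as a fraction), from the GRI rules.
GRI_RULES = [
    ("Paper No. < 67.5", 0.90, 0.107),
    ("Paper No. < 54.5 & # of Chars in Title > 49.5", 1.00, 0.0473),
    ("Country = Israel & # of Chars in Title > 61.5", 0.60, 0.0103),
]


def covered_records(support_fraction, n=N_SUBMISSIONS):
    """Approximate number of records matched by a rule's conditions."""
    return round(support_fraction * n)


counts = [covered_records(support) for _, _, support in GRI_RULES]
for (conditions, confidence, _), count in zip(GRI_RULES, counts):
    # Estimated number of matched records that also have the stated decision.
    correct = round(confidence * count)
    print(f"{conditions}: about {count} records, about {correct} with the stated decision")
```

Under this assumption the three rules match about 52, 23 and 5 submissions respectively, so the Israel rule in particular rests on very few papers.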

Page 21

Rule Mining in 2002
• C5.0
– [# of Chars in Title > 43] => Rejected (Conf. 0.669, Support: 303)
– [Paper No. <= 722] & [COUNTRY = USA] & [Category = Temporal Data Mining] => Regular (Conf. 0.833, Support: 4)

Page 22

Rule Mining in 2002
• (Association) Rules
– Rejected <= [Paper No. < 542.5] (Conf: 0.88, Support: 41)
– Rejected <= [Paper No. < 542.5] & [# of Chars in Title > 53.5] (Conf: 0.833, Support: 29)
– Regular <= [Country = Canada] & [Category = Text Mining] (Conf: 0.6, Support: 5)
• Paper No., Country, Category: Important Features

Page 23

Comparison with 2002
• Important Features in 2003
– # of Authors, Paper No., # of Chars in Title
– Early 57 papers, Long Titles, 2 authors
• Important Features in 2002
– Paper No., # of Chars in Title, Country, Category
– Early 52 papers, Long Titles

Page 24

Conclusions
• Do not submit a paper too early!
– Reflection is needed not only on the contents, but also on the titles.
• Mining text/Web/semi-structured data is very popular now.
• Statistics and Probability is a very strong topic.
• Security and Privacy issues are becoming stronger.
• Visualization/Interaction topics are emerging in ICDM 2003:
– Visualization/Human-Machine Interaction
– Postprocessing of DM Results
– Process-centric DM