![Page 1: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/1.jpg)
ID3 Algorithm
CS 157B: Spring 2010
Meg Genoar
![Page 2: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/2.jpg)
Iterative Dichotomiser 3
Ross Quinlan – 1986
C4.5 Precursor
Decision Tree Generation
![Page 3: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/3.jpg)
Ross Quinlan
Computer Scientist – UW 1968
Data Mining & Decision Theory
AI: Data Mining
ID3, C4.5, & C5.0
RuleQuest Research
![Page 4: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/4.jpg)
ID3 & Entropy
Max-Gain Split
Most Useful Attribute
Highest Information
Best Attribute
Measure of Uncertainty
Randomness
Efficient Separation of Decision Tree Elements
![Page 5: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/5.jpg)
Entropy
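$$\mathrm{Entropy}(S) = -P_{\text{positive}} \log_2 P_{\text{positive}} - P_{\text{negative}} \log_2 P_{\text{negative}}$$
$P_{\text{positive}}$: proportion of positive examples in S
$P_{\text{negative}}$: proportion of negative examples in S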
![Page 6: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/6.jpg)
Example…
A collection S consists of 20 data examples:
13 Yes : 7 No
Entropy(S) = – (13/20) Log2(13/20) – (7/20) Log2(7/20)
Entropy(S) = 0.934
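The calculation above can be checked with a few lines of Python. This is a minimal sketch, not code from the slides: the function name `entropy` and its two count arguments are chosen here for illustration, and a class with zero examples is assumed to contribute nothing to the sum.

```python
import math

def entropy(positives: int, negatives: int) -> float:
    """Binary entropy, in bits, of a set with the given class counts."""
    total = positives + negatives
    result = 0.0
    for count in (positives, negatives):
        if count:  # assumption: a class with no examples contributes 0
            p = count / total
            result -= p * math.log2(p)
    return result

# 13 Yes and 7 No, as in the example above
print(round(entropy(13, 7), 3))  # 0.934
```

An evenly split set gives the maximum of 1 bit, and a set with only one class gives 0.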
![Page 7: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/7.jpg)
Entropy Gain Value
Gain: Place to Split the Tree
High Gain > Low Gain
High Gain: Top of the Tree
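$$\mathrm{Gain}(A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$
S: current set; S_v: child set of examples where attribute A has value v. Each child's entropy is weighted by its share of the current set.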
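Reading the sum as weighted by each child set's share of the examples, which is how the later slides compute it, the gain can be sketched as follows. The helper names `entropy` and `gain` and the four-label toy set are assumptions made for illustration; they are not part of the slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child sets.

    parent:   list of labels in the current set
    children: list of label lists that partition the parent
    """
    weighted = sum(len(child) / len(parent) * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy set: two positive and two negative examples
parent = [True, True, False, False]
print(gain(parent, [[True, True], [False, False]]))  # perfect split
print(gain(parent, [[True, False], [True, False]]))  # uninformative split
```

A split that separates the classes completely recovers the full parent entropy as gain, while a split whose children look just like the parent gains nothing.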
![Page 8: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/8.jpg)
Movie Example

| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | United States | Yes | Science Fiction | True |
| 2 | United States | No | Comedy | False |
| 3 | United States | Yes | Comedy | True |
| 4 | Europe | No | Comedy | True |
| 5 | Europe | Yes | Science Fiction | False |
| 6 | Europe | Yes | Romance | False |
| 7 | Rest of World | Yes | Comedy | False |
| 8 | Rest of World | No | Science Fiction | False |
| 9 | Europe | Yes | Comedy | True |
| 10 | United States | Yes | Comedy | True |
![Page 9: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/9.jpg)
Entropy of Table
Is the Film a Success?
Entropy(5 Yes, 5 No) = – (5/10) Log2(5/10) – (5/10) Log2(5/10)
Entropy(Success) = 1
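To reproduce this figure, the movie table can be written out as data. The tuple layout and the names `FILMS` and `entropy` below are assumptions for this sketch; the values themselves are the ten rows of the table.

```python
import math
from collections import Counter

# (country of origin, big star, genre, success) for films 1-10
FILMS = [
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]

def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

success = [film[3] for film in FILMS]
print(Counter(success))   # five successes, five failures
print(entropy(success))   # 1.0
```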
![Page 10: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/10.jpg)
Split – Country of Origin
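| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | United States | Yes | Science Fiction | True |
| 2 | United States | No | Comedy | False |
| 3 | United States | Yes | Comedy | True |
| 4 | United States | Yes | Comedy | True |

| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | Europe | No | Comedy | True |
| 2 | Europe | Yes | Science Fiction | False |
| 3 | Europe | Yes | Romance | False |
| 4 | Europe | Yes | Comedy | True |

| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | Rest of World | Yes | Comedy | False |
| 2 | Rest of World | No | Science Fiction | False |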
![Page 11: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/11.jpg)
Gain – Country of Origin
Where is the film from?
Entropy(USA) = – (3/4) Log2(3/4) – (1/4) Log2(1/4)
Entropy(USA) = 0.811
Entropy(Europe) = – (2/4) Log2(2/4) – (2/4) Log2(2/4)
Entropy(Europe) = 1
Entropy(Rest of World) = – (0/2) Log2(0/2) – (2/2) Log2(2/2)
Entropy(Rest of World) = 0 (taking 0 × Log2(0) as 0)
Gain(Origin) = 1 – (4/10 *0.811 + 4/10*1 + 2/10*0) = 0.276
![Page 12: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/12.jpg)
Split – Big Star

| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | United States | Yes | Science Fiction | True |
| 2 | United States | Yes | Comedy | True |
| 3 | Europe | Yes | Science Fiction | False |
| 4 | Europe | Yes | Romance | False |
| 5 | Rest of World | Yes | Comedy | False |
| 6 | Europe | Yes | Comedy | True |
| 7 | United States | Yes | Comedy | True |

| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | United States | No | Comedy | False |
| 2 | Europe | No | Comedy | True |
| 3 | Rest of World | No | Science Fiction | False |
![Page 13: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/13.jpg)
Gain – Big Star
Is there a Big Star in the film?
Entropy(Yes) = – (4/7) Log2(4/7) – (3/7) Log2(3/7)
Entropy(Yes) = 0.985
Entropy(No) = – (1/3) Log2(1/3) – (2/3) Log2(2/3)
Entropy(No) = 0.918
Gain(Star) = 1 – (7/10 *0.985 + 3/10*0.918) = 0.0351
![Page 14: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/14.jpg)
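Split – Genre

| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | United States | Yes | Science Fiction | True |
| 2 | Europe | Yes | Science Fiction | False |
| 3 | Rest of World | No | Science Fiction | False |

| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | United States | No | Comedy | False |
| 2 | United States | Yes | Comedy | True |
| 3 | Europe | No | Comedy | True |
| 4 | Rest of World | Yes | Comedy | False |
| 5 | Europe | Yes | Comedy | True |
| 6 | United States | Yes | Comedy | True |

| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | Europe | Yes | Romance | False |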
![Page 15: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/15.jpg)
Gain – Genre
What genre is the film?
Entropy(SciFi) = – (1/3) Log2(1/3) – (2/3) Log2(2/3)
Entropy(SciFi) = 0.918
Entropy(Com) = – (4/6) Log2(4/6) – (2/6) Log2(2/6)
Entropy(Com) = 0.918
Entropy(Rom) = – (0/1) Log2(0/1) – (1/1) Log2(1/1)
Entropy(Rom) = 0
Gain(Genre) = 1 – (3/10 *0.918 + 6/10*0.918+ 1/10*0) = 0.1738
![Page 16: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/16.jpg)
Compare Gains…
Gain(Origin) = 0.276
Gain(Star) = 0.0351
Gain(Genre) = 0.1738
First Split: Origin
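The comparison can be reproduced end to end with a short script. This is a sketch under stated assumptions: the names `COLUMNS`, `FILMS`, `entropy`, and `gain` are invented here, and the gain is computed as parent entropy minus the size-weighted child entropies. Because the slides round each entropy to three decimals before combining them, the unrounded gains computed here can differ from the slide values in the last digit.

```python
import math
from collections import Counter, defaultdict

COLUMNS = {"Origin": 0, "Star": 1, "Genre": 2}  # tuple index of each attribute

# (country of origin, big star, genre, success) for films 1-10
FILMS = [
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]

def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, column):
    """Information gain from splitting rows on the given column index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[-1])  # collect Success labels per value
    parent = [row[-1] for row in rows]
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(parent) - weighted

gains = {name: gain(FILMS, column) for name, column in COLUMNS.items()}
best = max(gains, key=gains.get)
for name, value in gains.items():
    print(f"Gain({name}) = {value:.4f}")
print("First split:", best)  # First split: Origin
```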
![Page 18: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/18.jpg)
[Tree diagram: All Movies splits on Country of Origin into United States, Europe, and Rest of World; each branch leads to a new table.]
![Page 20: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/20.jpg)
New Table – United States
| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | United States | Yes | Science Fiction | True |
| 2 | United States | No | Comedy | False |
| 3 | United States | Yes | Comedy | True |
| 4 | United States | Yes | Comedy | True |

Entropy(3 Yes, 1 No) = – (3/4) Log2(3/4) – (1/4) Log2(1/4)
Entropy(Success) = 0.811
![Page 21: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/21.jpg)
Split – Big Star
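| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | United States | Yes | Science Fiction | True |
| 2 | United States | Yes | Comedy | True |
| 3 | United States | Yes | Comedy | True |

| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | United States | No | Comedy | False |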
![Page 22: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/22.jpg)
Gain – Big Star
Is there a Big Star in the film?
Entropy(Yes) = – (3/3) Log2(3/3) – (0/3) Log2(0/3)
Entropy(Yes) = 0
Entropy(No) = – (0/1) Log2(0/1) – (1/1) Log2(1/1)
Entropy(No) = 0
Gain(Star) = 0.811 – (3/4 *0 + 1/4*0) = 0.811
![Page 23: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/23.jpg)
Split – Genre
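| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | United States | Yes | Science Fiction | True |

| Film | Country of Origin | Big Star | Genre | Success |
| --- | --- | --- | --- | --- |
| 1 | United States | No | Comedy | False |
| 2 | United States | Yes | Comedy | True |
| 3 | United States | Yes | Comedy | True |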
![Page 24: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/24.jpg)
Gain – Genre
What genre is the film?
Entropy(SciFi) = – (1/1) Log2(1/1) – (0/1) Log2(0/1)
Entropy(SciFi) = 0
Entropy(Com) = – (2/3) Log2(2/3) – (1/3) Log2(1/3)
Entropy(Com) = 0.918
Gain(Genre) = 0.811 – (1/4 *0 + 3/4*0.918) = 0.1225
![Page 25: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/25.jpg)
Compare Gains…
Gain(Star) = 0.811
Gain(Genre) = 0.1225
Split: Star
![Page 27: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/27.jpg)
[Tree diagram: All Movies splits into United States, Europe, and Rest of World. United States splits further into Star and No Star. The remaining branches lead to new tables.]
![Page 28: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/28.jpg)
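[Tree diagram: All Movies splits into United States, Europe, and Rest of World. United States splits into Star and No Star. No Star ends in Failure; Star splits into Sci-Fi and Comedy, both ending in Success. Europe and Rest of World still lead to new tables.]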
![Page 29: ID3 Algorithm](https://reader034.vdocuments.net/reader034/viewer/2022052409/54679575af79599b108b6bc4/html5/thumbnails/29.jpg)
[Tree diagram: All Movies → United States, Europe, Rest of World. United States → Star, No Star. Star → Sci-Fi (Success), Comedy (Success). No Star → Failure. Europe and Rest of World → new tables.]
Comedy from the US, with a big star…
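The whole procedure, from choosing splits to classifying a new film, can be sketched as a small recursive function. Everything named here (`ATTRIBUTES`, `ROWS`, `FILMS`, `build_tree`, `classify`, and the dictionary keys) is an assumption for illustration, not the presenter's code. One deliberate difference from the diagram: this sketch stops as soon as a subset contains only one outcome, so a pure branch becomes a leaf without any further split.

```python
import math
from collections import Counter

ATTRIBUTES = ("country", "big_star", "genre")
ROWS = [  # films 1-10 from the movie table, last field is Success
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]
FILMS = [dict(zip(ATTRIBUTES + ("success",), row)) for row in ROWS]

def entropy(rows):
    counts = Counter(r["success"] for r in rows)
    return -sum(n / len(rows) * math.log2(n / len(rows)) for n in counts.values())

def split(rows, attr):
    """Group rows by their value for attr."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    return groups

def gain(rows, attr):
    parts = split(rows, attr).values()
    return entropy(rows) - sum(len(p) / len(rows) * entropy(p) for p in parts)

def build_tree(rows, attrs):
    labels = {r["success"] for r in rows}
    if len(labels) == 1:   # pure subset: make a leaf
        return labels.pop()
    if not attrs:          # nothing left to split on: majority vote
        return Counter(r["success"] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree(sub, rest) for v, sub in split(rows, best).items()}}

def classify(tree, film):
    # Follows one branch per level; raises KeyError for a value never seen in training
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[film[attr]]
    return tree

tree = build_tree(FILMS, list(ATTRIBUTES))
query = {"country": "United States", "big_star": "Yes", "genre": "Comedy"}
print(classify(tree, query))  # True, i.e. predicted Success
```

Nested dictionaries keep the sketch short; a production implementation would more likely use a node class and handle unseen attribute values explicitly.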