ID3 Decision Tree Algorithm

By: Giovanni Ponzio, Jory Weiss, and Enoc Flores Hernandez

What’s the Goal of ID3?

● ID3 (Iterative Dichotomiser 3) helps make better-informed decisions from data. It builds a simple decision tree with a few levels of decision nodes, which can be visualized and followed from the root down to an outcome.


Who created ID3?

● John Ross Quinlan developed the ID3 algorithm at the University of Sydney

● Ross Quinlan is an Australian computer scientist and researcher in machine learning, data mining, and decision theory


What is a Decision Tree?

● A decision tree is a data structure made of nodes and edges

● It is built from a given dataset whose columns represent attributes and whose rows correspond to records

● Each node either makes a decision or represents an outcome

● The root and intermediate nodes represent decisions (called decision nodes), while the leaf nodes represent outcomes


How should the data look?

● The data that ID3 uses must meet a few requirements:

○ It must be an attribute-value table (each column is an attribute containing discrete values)

○ Every sample (row) must have the same attributes (columns), and each attribute must have a fixed number of possible discrete values

○ The values of each attribute must be clearly distinguishable from one another (hard, quite hard, flexible, soft, and quite soft would not be suitable values, because the boundaries between them are vague)

○ There must be sufficient data so that valid patterns are distinguishable from chance occurrences
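As a rough illustration of these requirements, the sketch below stores a dataset as a list of Python dictionaries (one per row) and checks that every row has the same attributes, collecting the discrete values seen for each one. The function name `attribute_domains`, the column names, and the sample rows are invented for illustration; they are not part of ID3 itself.

```python
def attribute_domains(rows):
    """Return {attribute: set of observed values} for a list of row dicts.

    Raises ValueError if the dataset is empty or if any row has a
    different set of attributes (columns) than the first row.
    """
    if not rows:
        raise ValueError("dataset is empty")
    columns = set(rows[0])
    domains = {column: set() for column in columns}
    for index, row in enumerate(rows):
        if set(row) != columns:
            raise ValueError(f"row {index} has different attributes")
        for column, value in row.items():
            domains[column].add(value)
    return domains


# Hypothetical sample data: three discrete attributes, one row per person.
rows = [
    {"works_out": "yes", "eats_junk": "no", "fit": "Fit"},
    {"works_out": "no", "eats_junk": "yes", "fit": "Unfit"},
]
domains = attribute_domains(rows)
```

Here `domains["fit"]` holds the two values seen in the `fit` column. A check like this covers only the table shape; whether there is enough data to tell real patterns from chance still has to be judged separately.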


Example: a decision tree used to answer the question "Is the person fit?"

Root node: Is the person younger than 30 years old?

Decision nodes:
1. Does the person eat junk food?
2. Does the person work out?

Leaf nodes:
1. The person is Fit
2. The person is Unfit
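One possible way to hold such a tree in code is as nested dictionaries, with a small loop that follows the answers from the root down to a leaf. This is only a sketch: the slide does not say which branch leads to which node, so the layout below (and the names `tree`, `classify`, and the attribute keys) is an assumption made for illustration.

```python
# ASSUMED layout: people under 30 are asked about junk food,
# everyone else is asked about working out.
tree = {
    "younger_than_30": {
        "yes": {"eats_junk_food": {"yes": "Unfit", "no": "Fit"}},
        "no": {"works_out": {"yes": "Fit", "no": "Unfit"}},
    }
}


def classify(node, record):
    """Walk from the root to a leaf, following the record's answers."""
    while isinstance(node, dict):
        # A decision node is a one-key dict: {question: {answer: subtree}}.
        question, branches = next(iter(node.items()))
        node = branches[record[question]]
    return node  # a leaf is a plain label such as "Fit"


person = {"younger_than_30": "yes", "eats_junk_food": "no"}
result = classify(tree, person)  # "Fit" under the assumed layout
```

Only the attributes along the path are looked at, so the record does not need a `works_out` entry when the person is younger than 30.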


Some ID3 Terminology

Every column used to make decisions in the data set is called a feature or an attribute

The column used to make leaf nodes is the target attribute

The values pertaining to each attribute are referred to as classes


Entropy

● The ID3 algorithm selects the “best” feature at each step while building the tree (it is a greedy algorithm)

● It decides which feature is “best” using calculations based on entropy

● Entropy is a concept from information theory that measures the uncertainty (average information content) of a random variable

● In this context, the entropy of a dataset is the measure of disorder in the target attribute of the dataset

Entropy(S) = - ∑ pᵢ * log₂(pᵢ) ; i = 1 to n

Where S is the dataset, n is the number of classes in the target column, and pᵢ is the probability of class i (the number of rows with class i in the target column divided by the total number of rows). Entropy is 0 when every row has the same class and reaches its maximum, log₂(n), when all n classes are equally likely.
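The entropy formula can be sketched in a few lines of Python. The function name `entropy` and the sample labels are chosen for illustration; the input is simply the list of values found in the target column.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (in bits) of a list of target-column values."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: treat an empty dataset as having no disorder
    # p_i = (rows with class i) / (total rows)
    probabilities = [count / total for count in Counter(labels).values()]
    return sum(-p * log2(p) for p in probabilities)


mixed = entropy(["Fit", "Fit", "Unfit", "Unfit"])  # 1.0: two equally likely classes
pure = entropy(["Fit", "Fit", "Fit"])              # 0.0: only one class present
```

Classes that never occur are simply absent from the counts, so the undefined term 0 * log₂(0) never has to be evaluated.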


Information Gain

● Information gain measures how well a given attribute separates the rows according to their target values: it is the reduction in entropy obtained by splitting the dataset on that attribute.

● The attribute with the highest information gain is selected as the “best” one and is placed at the current decision node (the root node, on the first step).

IG(S, A) = Entropy(S) - ∑((|Sᵥ| / |S|) * Entropy(Sᵥ))

Where the sum runs over every value v of attribute A, and Sᵥ is the set of rows in S for which A has value v

● ID3 uses entropy and information gain to build a functioning decision tree
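A minimal sketch of the information gain calculation follows, assuming the dataset is a list of dictionaries with one key per column. The names `information_gain`, `works_out`, `eats_junk`, and `fit`, as well as the four sample rows, are hypothetical and exist only to make the example runnable.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Entropy (in bits) of a list of target-column values."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """IG(S, A): entropy of S minus the weighted entropy of each subset S_v."""
    # Group the target values by the value v that each row has for the attribute.
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    weighted = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[target] for row in rows]) - weighted


# Hypothetical toy data: "works_out" predicts "fit" perfectly, "eats_junk" does not.
rows = [
    {"works_out": "yes", "eats_junk": "yes", "fit": "Fit"},
    {"works_out": "yes", "eats_junk": "no", "fit": "Fit"},
    {"works_out": "no", "eats_junk": "yes", "fit": "Unfit"},
    {"works_out": "no", "eats_junk": "no", "fit": "Unfit"},
]
gain_works_out = information_gain(rows, "works_out", "fit")  # 1.0
gain_eats_junk = information_gain(rows, "eats_junk", "fit")  # 0.0
```

In this toy data, splitting on `works_out` leaves two pure subsets, so all of the original entropy is removed; splitting on `eats_junk` leaves both subsets as mixed as before, so nothing is gained.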


Steps in ID3

1. Calculate the information gain of every attribute in the dataset (this requires the entropy of the dataset and of each subset produced by the split).

2. Select the feature with the maximum information gain as the root node of the decision tree, with one edge for each of its classes.

3. For each edge, take the rows that match its class, calculate the information gain of the remaining attributes on those rows, and again select the feature with the maximum information gain as the next decision node.

4. If all rows at a node belong to the same class (in the target column), make that node a leaf node with the class as its label.

5. Repeat this process until there are no remaining features or until every branch ends in a leaf node.
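The steps above can be sketched as a short recursive function. This is an illustrative reading of the steps rather than a reference implementation: the function and column names are hypothetical, the tree is returned as nested dictionaries, and falling back to the most common class when the features run out is an assumption, since the steps do not say how to label such a node.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    weighted = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[target] for row in rows]) - weighted


def id3(rows, attributes, target):
    """Return a leaf label or a nested dict {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:      # step 4: all rows share one class
        return labels[0]
    if not attributes:             # step 5: no features left (ASSUMED majority vote)
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-3: pick the attribute with the maximum information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = defaultdict(list)
    for row in rows:
        branches[row[best]].append(row)
    # One edge per observed value of the chosen attribute, built recursively.
    return {best: {value: id3(subset, remaining, target)
                   for value, subset in branches.items()}}


# Hypothetical toy data, not taken from the slides.
rows = [
    {"works_out": "yes", "eats_junk": "yes", "fit": "Fit"},
    {"works_out": "yes", "eats_junk": "no", "fit": "Fit"},
    {"works_out": "no", "eats_junk": "yes", "fit": "Unfit"},
    {"works_out": "no", "eats_junk": "no", "fit": "Unfit"},
]
tree = id3(rows, ["works_out", "eats_junk"], "fit")
# tree == {"works_out": {"yes": "Fit", "no": "Unfit"}}
```

Because `works_out` separates the toy rows perfectly, the tree stops after a single decision node and `eats_junk` is never used.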


Applications of ID3

ID3 has a wide range of applications. As shown in the examples, ID3 is a classification algorithm that creates a decision tree, which is then used to make an educated guess about one attribute (the target attribute) of a specific subject based on its other known attributes. A few examples of applications are:

● Deciding whether or not to play a sport based on different weather attributes

● Deciding whether or not somebody has an illness or injury based on the symptoms they are showing

● Detecting web-based attacks using past data

● E-commerce: efficiently implementing promotional or paid targeted advertising on websites for potential buyers using past purchase records

● Deciding the best move in chess based on the board and possible endings

