Decision Trees
Building a Decision Tree
Choosing the Splitting Attribute
A Criterion for Attribute Selection
Computing Information

Example: attribute “Outlook”

Computing the Information Gain
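
A minimal sketch of both computations in Python, assuming the classic 14-instance weather data (9 "yes", 5 "no") that the Outlook example refers to; the helper name info is my own:

    import math

    def info(counts):
        # Entropy, in bits, of a class distribution given as a list of counts.
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

    print(info([9, 5]))                    # info of the full data: 0.940 bits

    # Outlook splits the data into sunny [2 yes, 3 no], overcast [4, 0], rainy [3, 2].
    branches = [[2, 3], [4, 0], [3, 2]]
    n = sum(sum(b) for b in branches)      # 14
    split_info = sum(sum(b) / n * info(b) for b in branches)
    print(split_info)                      # 0.693 bits
    print(info([9, 5]) - split_info)       # Gain(Outlook) = 0.247 bits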

Continuing to Split

The Final Decision Tree

Wish List for a Purity Measure

Entropy
Expected Information Gain
Gain(S, a) is the information gained by adding a sub-tree that splits on attribute a, i.e., the reduction in the number of bits needed to classify an instance.
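
For reference, the standard definition, with S_v the subset of S for which attribute a takes value v, and entropy(S) = −Σ_i p_i × log₂(p_i) over the class proportions p_i:

    Gain(S, a) = entropy(S) − Σ_v (|S_v| / |S|) × entropy(S_v)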
Problems?

Highly-Branching Attributes
Split for ID Code Attribute
Entropy of the split = 0, since each leaf node is "pure", containing only one case. The information gain is therefore maximal for the ID code attribute.
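
A short sketch of the problem, reusing the info helper from above on the same 14-instance data; an ID-code attribute puts every instance into its own branch:

    import math

    def info(counts):
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

    branches = [[1]] * 14                  # 14 single-instance, pure branches
    split_info = sum(sum(b) / 14 * info(b) for b in branches)
    print(split_info)                      # 0.0: entropy of the split is zero
    print(info([9, 5]) - split_info)       # gain = 0.940 bits, the maximum possible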

Gain Ratio
Computing the Gain Ratio
Example: intrinsic information for ID code
intrinsic_info(1, 1, ..., 1) = 14 × (−1/14 × log₂(1/14)) = 3.807 bits. The importance of an attribute decreases as its intrinsic information grows.
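
The same number can be reproduced by treating the branch sizes themselves as a distribution; a sketch, with the gain ratio of the ID-code split added for illustration:

    import math

    def info(counts):
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

    intrinsic = info([1] * 14)             # 14 × (−1/14 × log2(1/14)) = 3.807 bits
    print(intrinsic)
    print(0.940 / intrinsic)               # gain ratio of ID code ≈ 0.247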

More on the Gain Ratio
The Splitting Criterion in CART
Gini Index for 2 Attribute Values
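
For reference, the standard CART definitions: for a node t with class proportions p_j, Gini(t) = 1 − Σ_j p_j²; for an attribute with two values that splits t into t1 (n1 cases) and t2 (n2 cases) out of n,

    Gini_split(t) = (n1 / n) × Gini(t1) + (n2 / n) × Gini(t2)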
Gini Index for 2 Attribute Values: Example

Gini Index for 2 Attribute Values: Example 2
Select the split that decreases the Gini Index the most. The search runs over all possible split points and all candidate variables.
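
A minimal sketch of this search for a single numeric variable and two classes; the name best_gini_split and the toy data are my own (CART repeats this over every variable and keeps the overall best split):

    def gini(counts):
        # Gini index 1 − Σ p_i² of a class distribution given as counts.
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts) if total else 0.0

    def best_gini_split(xs, ys):
        # Try every midpoint between consecutive sorted values and return
        # the threshold with the lowest weighted Gini index after the split.
        pairs = sorted(zip(xs, ys))
        def class_counts(labels):
            return [labels.count(0), labels.count(1)]
        best_score, best_threshold = float("inf"), None
        for i in range(1, len(pairs)):
            left = [y for _, y in pairs[:i]]
            right = [y for _, y in pairs[i:]]
            score = (len(left) * gini(class_counts(left))
                     + len(right) * gini(class_counts(right))) / len(pairs)
            if score < best_score:
                best_score = score
                best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        return best_score, best_threshold

    print(best_gini_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (0.0, 6.5)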

Generate_DT(samples, attribute_list)
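
The slide names the procedure but its body is not reproduced here; a minimal recursive sketch under common assumptions (categorical attributes, information-gain selection, majority-class leaves), not necessarily the original pseudocode:

    from collections import Counter
    import math

    def info(labels):
        # Entropy, in bits, of a list of class labels.
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def gain(samples, a):
        # Information gain of splitting `samples` on attribute `a`.
        labels = [y for _, y in samples]
        split = 0.0
        for v in set(x[a] for x, _ in samples):
            subset = [y for x, y in samples if x[a] == v]
            split += len(subset) / len(samples) * info(subset)
        return info(labels) - split

    def generate_dt(samples, attribute_list):
        # samples: list of (attribute_dict, label) pairs; returns a nested-dict tree.
        labels = [y for _, y in samples]
        if len(set(labels)) == 1:          # pure node: leaf with that class
            return labels[0]
        if not attribute_list:             # attributes exhausted: majority class
            return Counter(labels).most_common(1)[0][0]
        best = max(attribute_list, key=lambda a: gain(samples, a))
        rest = [a for a in attribute_list if a != best]
        return {best: {v: generate_dt([(x, y) for x, y in samples if x[best] == v], rest)
                       for v in set(x[best] for x, _ in samples)}}

With the weather data as samples and ['outlook', 'temperature', 'humidity', 'windy'] as the attribute list, this should reproduce the tree built step by step above, with Outlook at the root.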
Time Complexity of Basic Algorithm
Scalability of DT Algorithms