Classification: Assign instances to predefined classes
Data basis:
Condition:
- a priori knowledge of classification for some instances (supervised learning)
Model building:
Generalization:
- apply rules to new instances
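Example: a minimal sketch of the supervised workflow above, assuming a toy credit data set. The function names `build_rules` and `classify`, the attribute `income`, and all data values are hypothetical and chosen only for illustration. The "model" is deliberately trivial: one rule per attribute value, predicting the majority class observed for that value.

```python
from collections import Counter, defaultdict

def build_rules(labeled, attribute):
    """Model building: for each value of one attribute, remember the majority class."""
    counts = defaultdict(Counter)
    for features, label in labeled:
        counts[features[attribute]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def classify(rules, features, attribute, default=None):
    """Generalization: apply the learned rules to a new, unlabeled instance."""
    return rules.get(features[attribute], default)

# Hypothetical instances whose class is known a priori (supervised learning).
labeled = [
    ({"income": "high"}, "good"),
    ({"income": "high"}, "good"),
    ({"income": "low"}, "bad"),
    ({"income": "low"}, "good"),
    ({"income": "low"}, "bad"),
]

rules = build_rules(labeled, "income")
prediction = classify(rules, {"income": "low"}, "income")
```

Here `rules` maps each observed attribute value to a class, and `classify` falls back to `default` for values that never occurred in the labeled data.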
Classification: Assign instances to predefined classes
Exemplary methods
Classification Examples
Can you think of binary vs. nominal classes for these examples?
Credit scoring:
- nominal
Marketing responses:
- binary: response vs. no response
Geo-Temporal events:
- can be both
Decision Tree Terminology
Which types of trees are there?
Binary tree:
- each node splits data at most in 2 sets
Multiway tree:
- splits can lead to more than 2 branches
Classification tree:
- classes are nominal (categorical) or ordinal
Regression tree:
- target values are numeric (continuous values)
Decision Tree Terminology
Input
instance pool ((x1, …, xn), c)
with x = (x1, …, xn) the set of independent attributes and c the class attribute
Decision Tree Terminology
Output
Full tree
Decision Tree Terminology
Objective
Formulate rules of the type:
If (condition 1) AND … AND (condition n) then c
Decision Tree Terminology
Rule
Path from root to leaf
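Example: a minimal sketch of reading rules off a tree, one rule per root-to-leaf path. The nested-dict tree layout (`attribute`, `branches`, `label`), the helper names, and the toy tree are assumptions made for this illustration.

```python
def extract_rules(node, conditions=()):
    """Yield (conditions, class) for every path from the root to a leaf."""
    if "label" in node:  # leaf reached: the path so far is one complete rule
        yield list(conditions), node["label"]
        return
    for value, child in node["branches"].items():
        yield from extract_rules(child, conditions + ((node["attribute"], value),))

def format_rule(conditions, label):
    """Render a rule as: If (condition 1) AND ... AND (condition n) then c."""
    body = " AND ".join(f"({attribute} = {value})" for attribute, value in conditions)
    return f"If {body} then {label}"

# Hypothetical tree: inner nodes test one attribute, leaves carry a class label.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "windy",
            "branches": {
                "yes": {"label": "stay"},
                "no": {"label": "play"},
            },
        },
        "rainy": {"label": "stay"},
    },
}

rules = [format_rule(c, label) for c, label in extract_rules(tree)]
for rule in rules:
    print(rule)
```

The number of rules equals the number of leaves, and the number of conditions in a rule equals the depth of its leaf.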
Generating a decision tree
Algorithm steps
Generating a decision tree
Algorithm design varies in …
Which decision tree algorithms are there?
(CH)AID
- (chi-squared) automatic interaction detection
CART
- classification and regression trees
ID3
- iterative dichotomizer
Which decision tree algorithms are there?
(CH)AID
(chi-squared) automatic interaction detection
Which decision tree algorithms are there?
CART
Classification and regression trees
Which decision tree algorithms are there?
ID3
Iterative dichotomizer 3
ID3: classification by entropy
Evaluating splits by information entropy
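Example: a minimal sketch of how a split can be scored with information entropy. It uses the standard definitions: entropy H = -Σ p_i log2(p_i) over the class shares p_i, and information gain as the entropy before the split minus the size-weighted entropy of the resulting subsets. The function names and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Information entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy after splitting on attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical data: "windy" separates the classes perfectly, "humid" not at all.
rows = [
    {"windy": "yes", "humid": "high"},
    {"windy": "yes", "humid": "low"},
    {"windy": "no", "humid": "high"},
    {"windy": "no", "humid": "low"},
]
labels = ["stay", "stay", "play", "play"]

gain_windy = information_gain(rows, labels, "windy")
gain_humid = information_gain(rows, labels, "humid")
```

An entropy-based algorithm such as ID3 would pick the attribute with the highest gain, here `windy`, as the next split.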
Decision tree pruning
What is pruning?
Simplify complicated decision trees to increase efficiency and avoid over-fitting, without pruning so much that the tree under-fits
Decision tree pruning
What kinds of pruning are there?
Top-down-pruning:
- stopping criteria when building trees
Bottom-up-pruning:
Decision tree quality
Which indicators do you use to evaluate decision tree quality?
number of leaves
- number of generated rules (can be too many or too few)
depth of tree
- maximum rule length
external path length
- sum of all path lengths from root to leaf, determines memory requirements
weighted external path length
- sum of path lengths from root to leaf multiplied by the number of instances represented, measures classification costs
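Example: a minimal sketch that computes the four indicators above for a small tree. The nested-dict layout, the `count` field (number of instances represented by a leaf), and the toy numbers are assumptions for illustration. Path length is measured in edges from the root.

```python
def leaf_depths(node, depth=0):
    """Yield (depth, instance_count) for every leaf of the tree."""
    if "label" in node:
        yield depth, node.get("count", 0)
        return
    for child in node["branches"].values():
        yield from leaf_depths(child, depth + 1)

def tree_quality(tree):
    """Compute leaf count, depth, and (weighted) external path length."""
    leaves = list(leaf_depths(tree))
    return {
        "leaves": len(leaves),                                    # number of rules
        "depth": max(d for d, _ in leaves),                       # maximum rule length
        "external_path_length": sum(d for d, _ in leaves),
        "weighted_external_path_length": sum(d * n for d, n in leaves),
    }

# Hypothetical tree with the number of instances that end up in each leaf.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "windy",
            "branches": {
                "yes": {"label": "stay", "count": 3},
                "no": {"label": "play", "count": 2},
            },
        },
        "rainy": {"label": "stay", "count": 5},
    },
}

quality = tree_quality(tree)
print(quality)
```

For this tree the leaves sit at depths 2, 2 and 1, so the external path length is 5, and weighting by the instance counts 3, 2 and 5 gives 15.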
Decision trees: conclusion
Random forests
What are random forests?
Random forests are an example of ensemble approaches: they generate several randomized instances of a model and classify data based on the aggregated results of this model set
Random forests
How do random forests work?
Generation: Generate k trees by
Generalisation: For each instance in the test set
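Example: a minimal sketch of the two phases, assuming a common formulation: each of the k trees is trained on a bootstrap sample (drawn with replacement) using a randomly chosen attribute, and a new instance gets the class that most trees vote for. To stay short, each "tree" here is only a one-level stump, which is a simplification and not a full random forest. All names and data are hypothetical.

```python
import random
from collections import Counter

def train_stump(sample, attribute):
    """One-level tree: majority class per value of a single attribute."""
    by_value = {}
    for features, label in sample:
        by_value.setdefault(features[attribute], Counter())[label] += 1
    # Fallback for attribute values missing from this sample: overall majority class.
    fallback = Counter(label for _, label in sample).most_common(1)[0][0]
    rules = {value: c.most_common(1)[0][0] for value, c in by_value.items()}
    return lambda features: rules.get(features[attribute], fallback)

def train_forest(data, attributes, k, seed=0):
    """Generation: build k randomized trees."""
    rng = random.Random(seed)
    forest = []
    for _ in range(k):
        sample = [rng.choice(data) for _ in data]  # bootstrap sample, with replacement
        attribute = rng.choice(attributes)         # random attribute for this tree
        forest.append(train_stump(sample, attribute))
    return forest

def predict(forest, features):
    """Generalisation: every tree votes, the majority class wins."""
    votes = Counter(tree(features) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: both attributes agree with the class.
data = (
    [({"income": "high", "debt": "low"}, "good")] * 4
    + [({"income": "low", "debt": "high"}, "bad")] * 4
)

forest = train_forest(data, ["income", "debt"], k=25)
prediction = predict(forest, {"income": "high", "debt": "low"})
```

Individual stumps can be wrong when their bootstrap sample happens to miss an attribute value; aggregating the votes is what makes the ensemble more stable than any single tree.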
Gradient boosted trees
Steps
Support Vector Machines