How can forecasts be evaluated?
- Compare the forecast to what actually happened
- Compare to a naive forecast
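A minimal sketch of both comparisons. The slides do not define the naive forecast, so this assumes the common "persistence" variant (each period is predicted by the previous observation). The series, the model forecasts, and the function names are hypothetical.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute deviation between forecasts and what actually happened."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def naive_forecast(history):
    """Assumed naive forecast: the previous observation predicts the next one."""
    return history[:-1]


# Hypothetical observed series (periods 1..5)
actual = [100, 110, 105, 120, 125]
# Hypothetical model forecasts for periods 2..5
model_forecast = [108, 107, 117, 123]

# Compare the forecast to what actually happened
mae_model = mean_absolute_error(actual[1:], model_forecast)          # 2.25

# Compare to the naive forecast on the same periods
mae_naive = mean_absolute_error(actual[1:], naive_forecast(actual))  # 8.75

# A ratio below 1 means the model beats the naive benchmark
relative_error = mae_model / mae_naive
```

Reporting the error relative to the naive benchmark shows whether the model adds anything over simply repeating the last value.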
Measuring forecast performance
Error measures for numerical values
Absolute: RMSE (root mean squared error)
-> depends on scale
Percentage: MAPE (mean absolute percentage error)
-> independent of scale
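The scale behaviour of the two measures can be checked with a small sketch. The data and function names are hypothetical; MAPE is reported in percent and is undefined when an actual value is zero.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error: expressed in the units of the data."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, forecast):
    """Mean absolute percentage error in percent; requires non-zero actuals."""
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical values, e.g. sales in units
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

# The same data expressed in thousands of units
scaled_actual = [a / 1000 for a in actual]
scaled_forecast = [f / 1000 for f in forecast]

rmse_units = rmse(actual, forecast)                    # about 12.91
rmse_thousands = rmse(scaled_actual, scaled_forecast)  # about 0.01291
mape_units = mape(actual, forecast)                    # about 6.67 (%)
mape_thousands = mape(scaled_actual, scaled_forecast)  # unchanged
```

Rescaling the data changes the RMSE by the same factor, while the MAPE stays the same.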
Self-fulfilling forecasts can have bad consequences
Example
Measuring classification performance
Error measures for categorical values
Error rate per category
Recall = no. of instances correctly assigned to class / no. of instances that are actually in class
(starts from the true assignment)
Precision = no. of instances correctly assigned to class / no. of instances assigned to class
(starts from the predicted assignment)
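A minimal sketch of the two per-class measures. Both share the same numerator (instances correctly assigned to the class) and differ only in the denominator. The labels and function names are hypothetical.

```python
def correctly_assigned(actual, predicted, cls):
    """Instances that are in the class and were also assigned to it."""
    return sum(1 for a, p in zip(actual, predicted) if a == cls and p == cls)


def recall(actual, predicted, cls):
    """Starts from the true assignment: share of the class that was found."""
    actually_in_class = sum(1 for a in actual if a == cls)
    return correctly_assigned(actual, predicted, cls) / actually_in_class


def precision(actual, predicted, cls):
    """Starts from the predicted assignment: share of assignments that are right."""
    assigned_to_class = sum(1 for p in predicted if p == cls)
    return correctly_assigned(actual, predicted, cls) / assigned_to_class


# Hypothetical true and predicted labels
actual    = ["spam", "spam", "spam", "ham",  "ham",  "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham",  "spam", "spam", "ham", "ham", "ham"]

spam_recall = recall(actual, predicted, "spam")        # 2 of 3 spam found
spam_precision = precision(actual, predicted, "spam")  # 2 of 4 assignments correct
```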
Error rate across categories
- weighted according to exogenous or endogenous importance of a class
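A minimal sketch of a weighted error rate across categories. It assumes that "endogenous" importance means weights derived from the data itself (here: class frequency) and "exogenous" importance means weights set from outside (here: hypothetical misclassification costs). The slides do not spell this out, so both readings, the labels, and the function names are assumptions.

```python
from collections import Counter


def per_class_error_rate(actual, predicted):
    """Share of misclassified instances within each true class."""
    totals = Counter(actual)
    errors = Counter(a for a, p in zip(actual, predicted) if a != p)
    return {cls: errors[cls] / totals[cls] for cls in totals}


def weighted_error_rate(error_rates, weights):
    """Weighted average of the per-class error rates."""
    total_weight = sum(weights[cls] for cls in error_rates)
    return sum(weights[cls] * error_rates[cls] for cls in error_rates) / total_weight


# Hypothetical labels: 2 fraud cases, 8 regular cases
actual    = ["fraud", "fraud", "ok", "ok", "ok", "ok", "ok", "ok", "ok", "ok"]
predicted = ["fraud", "ok",    "ok", "ok", "ok", "ok", "ok", "ok", "ok", "fraud"]

rates = per_class_error_rate(actual, predicted)  # fraud: 0.5, ok: 0.125

# Endogenous weights (assumed): class frequencies taken from the data
frequency_weights = dict(Counter(actual))
# Exogenous weights (assumed): missing a fraud case counts five times as much
cost_weights = {"fraud": 5.0, "ok": 1.0}

frequency_weighted = weighted_error_rate(rates, frequency_weights)  # 0.2
cost_weighted = weighted_error_rate(rates, cost_weights)            # 0.4375
```

With frequency weights the result equals the plain overall error rate (2 errors in 10 instances). The cost weights push the measure towards the class that matters more.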
Comparing error rates
- expected error (probability) vs. observed error vs. error from benchmark approaches
Measuring classification performance
Benchmarking
Possible benchmarks
Benchmark factors beyond accuracy
Cross-Validation and Bootstrapping
Splitting the data set for evaluation
Example: Decision tree
Training set: build the tree
Validation set: prune the tree
Test set: evaluate the tree’s predictions
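A minimal sketch of the three-way split, assuming a random split into disjoint subsets. The 60/20/20 proportions, the seed, and the function name are hypothetical choices, not taken from the slides.

```python
import random


def three_way_split(records, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle the records and cut them into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    training = shuffled[:n_train]                            # build the tree
    validation = shuffled[n_train:n_train + n_validation]    # prune the tree
    test = shuffled[n_train + n_validation:]                 # evaluate the predictions
    return training, validation, test


# Hypothetical data set of 100 record ids
records = list(range(100))
training, validation, test = three_way_split(records)
```

The test set is only touched once the tree is built and pruned, so the measured error is not flattered by data the model has already seen.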
Split the data set:
Build the model:
- Training set
- Validation set
(both overlapping)
Evaluate the model:
- Test set
Hold out 1: k-fold cross-validation
Split the data set into k partitions of equal size
Repeat k times: hold out one partition for evaluation and build the model on the remaining k-1 partitions; the hold-out partition alternates across all k partitions
Average the results of the k repetitions to obtain a single measure
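The procedure above can be sketched as follows. The "model" is a deliberately trivial stand-in (it predicts the mean of the training partitions) and the data are hypothetical. In practice the records would usually be shuffled before partitioning.

```python
def k_fold_partitions(records, k):
    """Split the records into k partitions of (nearly) equal size."""
    return [records[i::k] for i in range(k)]


def cross_validate(records, k, fit, score):
    """Hold out each partition once, train on the rest, average the k scores."""
    partitions = k_fold_partitions(records, k)
    scores = []
    for i, hold_out in enumerate(partitions):
        training = [r for j, part in enumerate(partitions) if j != i for r in part]
        model = fit(training)
        scores.append(score(model, hold_out))
    return sum(scores) / k


def fit_mean(training):
    """Hypothetical stand-in model: the mean of the training values."""
    return sum(training) / len(training)


def mean_abs_error(model, hold_out):
    """Mean absolute error of the constant prediction on the hold-out partition."""
    return sum(abs(x - model) for x in hold_out) / len(hold_out)


# Hypothetical numerical target values
data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
cv_error = cross_validate(data, k=3, fit=fit_mean, score=mean_abs_error)  # 3.0
```

Every record is used for evaluation exactly once and for training k-1 times, which makes better use of a small data set than a single hold-out split.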
Hold out 2: Bootstrap
What’s a lift factor?
Computing the lift factor for deterministic classification
Steps
Computing the lift factor for probabilistic classification
Steps
Lift chart