Define:
feature engineering
The process of selecting, modifying, or creating input variables (features) that help improve the performance of a machine learning model.
Good feature engineering can significantly enhance the accuracy of a model.
True or False:
Feature engineering is not necessary for machine learning models.
False
Effective feature engineering can greatly affect a model’s success by providing better inputs for learning.
Fill in the blank:
The dataset is split into ______ and testing sets to evaluate a model’s performance.
training
What is the purpose of the training dataset?
To teach a machine learning model by providing examples that the model can learn from.
What is the purpose of the testing dataset?
To evaluate the performance of a trained machine learning model on new, unseen data.
What is model evaluation in machine learning?
The process of assessing how well a machine learning model performs on a given dataset using various metrics.
True or False:
A model that performs well on training data will always perform well on new data.
False
This is not always true due to overfitting, where the model learns the training data too well and fails to generalize to new data.
What does overfitting mean?
A situation where a model learns the training data too closely, capturing noise as if it were a true pattern, which negatively impacts its performance on new data.
What does underfitting mean?
A situation where a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and testing data.
Fill in the blank:
A common metric for evaluating classification models is ______.
accuracy
What is accuracy in the context of model evaluation?
The proportion of correct predictions made by a machine learning model out of all predictions made.
What is a limitation of using accuracy as an evaluation metric?
Accuracy can be misleading in imbalanced datasets where one class is much more common than the others.
What is precision in model evaluation?
The ratio of true positive predictions to the total number of positive predictions made, measuring the accuracy of positive predictions.
What is recall in model evaluation?
The ratio of true positive predictions to the total number of actual positive cases, measuring the model’s ability to find all the relevant cases.
What do the F1 score represent?
A metric that combines precision and recall into a single score by taking their harmonic mean, used to balance both metrics when they are important.
Which metric would you prioritize for a medical diagnosis model: precision or recall?
recall
Why would you prioritize recall for a medical diagnosis model?
In medical diagnosis, it’s crucial to identify as many positive cases as possible to ensure patients receive necessary treatment, even if some false positives occur.
What is cross-validation in the context of model evaluation?
A technique for assessing how a model will generalize to an independent dataset by training and testing the model multiple times with different data splits.
Fill in the blanks:
______ ______ is the process of selecting the best features to use in a machine learning model.
Feature selection
Why is feature selection important?
It helps reduce the complexity of a model, improves performance, and reduces the risk of overfitting by eliminating irrelevant or redundant features.
How does feature scaling affect machine learning models?
Feature scaling standardizes the range of independent variables, improving the convergence speed of algorithms and ensuring features contribute equally to the results.
What is the difference between normalization and standardization in feature scaling?
What is a confusion matrix?
A table used to evaluate the performance of a classification model, showing the true versus predicted classifications.
In a scenario where you need to recommend movies to users, which machine learning subfield would be most applicable?
Collaborative filtering, a technique within machine learning used in recommendation systems.