Exam ST8 in Sep 2015 provided two distinct questions on model validation and model (factor) selection, in questions 10 and 11. In summary:
Model (factor) selection
- Test score (AIC, BIC, Chi Square, F test)
- Hat matrix (how quickly the log likelihood falls off away from the optimum solution); steep curvature means the parameter is tightly defined.
- Compare model relativities with expert judgment, i.e. plot the model's best-estimate relativities +- 2 standard deviations and check whether the actual results fall inside the band.
- Consistency checks across factors and interactions
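As a rough illustration of score-based selection (AIC/BIC), here is a minimal sketch in Python with NumPy. The Gaussian likelihood and the polynomial candidate models are my own assumptions for the toy example, not from the exam:

```python
import numpy as np

def gaussian_aic_bic(y, y_hat, k):
    """AIC and BIC for a Gaussian model with k fitted parameters.

    Uses the concentrated log-likelihood -n/2 * (log(2*pi*sigma2) + 1),
    where sigma2 is the MLE of the residual variance.
    """
    n = len(y)
    sigma2 = np.mean((y - y_hat) ** 2)
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    aic = 2 * k - 2 * log_lik
    bic = k * np.log(n) - 2 * log_lik
    return aic, bic

# Toy data: quadratic signal plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.1, size=x.size)

# Candidate models: polynomial degrees 1..4.
# k = (degree + 1) coefficients, plus 1 for the variance.
for degree in (1, 2, 3, 4):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    aic, bic = gaussian_aic_bic(y, y_hat, k=degree + 2)
    print(degree, round(aic, 1), round(bic, 1))
```

The penalty terms (2k for AIC, k log n for BIC) are what encode the over-fitting/under-fitting trade-off: extra parameters must buy enough log-likelihood to pay for themselves.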
Model validation
- Actual against expected
- Plot residuals
- Gain curves
- Lift curves
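A gain (cumulative lift) curve can be sketched in a few lines: rank records by predicted score and track the share of the total actual response captured in the top fraction of the book. The Poisson claim simulation below is a hypothetical example of my own, just to show the mechanics:

```python
import numpy as np

def gain_curve(y_actual, y_score):
    """Cumulative gain curve: rank records by predicted score (highest
    first) and return, for each fraction of the book, the share of the
    total actual response captured so far."""
    order = np.argsort(-y_score)
    gains = np.cumsum(y_actual[order]) / y_actual.sum()
    fractions = np.arange(1, len(y_actual) + 1) / len(y_actual)
    return fractions, gains

# Toy example: Poisson claim counts driven by a known risk score.
rng = np.random.default_rng(1)
risk = rng.uniform(0.05, 1.0, size=1000)   # hypothetical predicted risk
claims = rng.poisson(risk)                 # simulated actual claims
frac, gains = gain_curve(claims, risk)

# A model with real ranking power sits above the diagonal (gains > frac);
# here the top 20% of policies capture well over 20% of the claims.
print(round(gains[frac <= 0.2][-1], 2))
```

Plotting `gains` against `frac` (with the diagonal as the "no model" baseline) gives the usual gain chart; dividing `gains` by `frac` gives the lift curve.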
To me, the difference between the two is small: both use statistical methods to compare models.
Many of the methods used in model validation can also be used in model selection, and vice versa. After all, both manage the trade-off between over-fitting and under-fitting (e.g. a test score such as AIC can be used in model selection as well).
On Kaggle, some machine learning practitioners compute AIC on the training, validation and test sets.
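That practice can be sketched as follows: score candidate models on the training set with AIC, then re-check the fit (here, the Gaussian negative log-likelihood) on a held-out validation set. The data, split sizes and polynomial candidates are hypothetical choices of mine:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=500)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.3, size=x.size)

# 60/20/20 train/validation/test split (the test set is held back entirely).
idx = rng.permutation(x.size)
train, valid = idx[:300], idx[300:400]

def gaussian_nll(y_obs, y_hat):
    """Gaussian negative log-likelihood, with the variance set to its MLE."""
    n = len(y_obs)
    sigma2 = np.mean((y_obs - y_hat) ** 2)
    return 0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)

for degree in (1, 2, 8):
    coeffs = np.polyfit(x[train], y[train], degree)
    train_nll = gaussian_nll(y[train], np.polyval(coeffs, x[train]))
    valid_nll = gaussian_nll(y[valid], np.polyval(coeffs, x[valid]))
    aic = 2 * (degree + 2) + 2 * train_nll   # k = coefficients + variance
    print(degree, round(aic, 1), round(valid_nll, 1))
```

The under-fit linear model is penalized on both scores, while an over-flexible model (degree 8) can look fine on the training AIC yet gives no real improvement on the validation set, which is exactly the check the holdout is there for.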