Regression validation: Difference between revisions

Goodness of fit: Simplify explanation
 
{{Main|Goodness of fit}}
 
One measure of goodness of fit is the [[coefficient of determination]], often denoted ''R''<sup>2</sup>. In [[ordinary least squares]] with an intercept, it ranges between 0 and 1. However, an ''R''<sup>2</sup> close to 1 does not guarantee that the model fits the data well. For example, if the functional form of the model does not match the data, ''R''<sup>2</sup> can be high despite a poor model fit. [[Anscombe's quartet]] consists of four example data sets with similarly high ''R''<sup>2</sup> values, but data that sometimes clearly does not fit the regression line. Instead, the data sets include [[Outlier|outliers]], [[High-leverage point|high-leverage points]], or non-linearities.
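
As a minimal sketch of this point, the following Python snippet fits a straight line with an intercept to each of the four Anscombe data sets and computes ''R''<sup>2</sup>. The data values are the ones commonly tabulated for the quartet; the function name <code>r_squared</code> and the variable names are illustrative assumptions, not part of any library. Each ''R''<sup>2</sup> comes out near 0.67, even though only the first data set resembles a noisy linear relationship.

```python
def r_squared(xs, ys):
    """R^2 of a simple least-squares line with an intercept (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Anscombe's quartet as commonly tabulated (data sets I-III share x values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: r_squared(xs, ys) for name, (xs, ys) in quartet.items()}
for name, value in results.items():
    print(f"Data set {name}: R^2 = {value:.3f}")
```

Because the four values are nearly identical, ''R''<sup>2</sup> alone cannot tell these data sets apart; plotting the data or the residuals is what reveals the curvature, the outlier, and the single high-leverage point.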
 
One problem with ''R''<sup>2</sup> as a measure of model validity is that it can always be increased by adding more variables to the model, except in the unlikely event that the additional variables are exactly uncorrelated with the dependent variable in the data sample being used. This problem can be avoided by doing an [[F-test]] of the statistical significance of the increase in ''R''<sup>2</sup>, or by instead using the [[adjusted R-squared|adjusted ''R''<sup>2</sup>]].
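
The two remedies can be sketched with the standard textbook formulas. In the snippet below, the function names and the numeric inputs (sample size, number of regressors, and the two ''R''<sup>2</sup> values) are hypothetical and chosen only for illustration. The adjusted ''R''<sup>2</sup> is taken as 1 − (1 − ''R''<sup>2</sup>)(''n'' − 1)/(''n'' − ''p'' − 1), and the ''F'' statistic for adding ''q'' regressors compares the gain in ''R''<sup>2</sup> per added regressor with the unexplained share per residual degree of freedom of the larger model.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p regressors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def f_stat_for_r2_increase(r2_restricted, r2_full, n, p_full, q):
    """F statistic for the R^2 gain from adding q regressors to a nested model.

    p_full is the number of regressors in the larger model (intercept not counted).
    """
    numerator = (r2_full - r2_restricted) / q
    denominator = (1 - r2_full) / (n - p_full - 1)
    return numerator / denominator


# Hypothetical example: 11 observations; one extra regressor lifts R^2 from 0.50 to 0.52.
n = 11
adj_small = adjusted_r2(0.50, n, p=1)
adj_large = adjusted_r2(0.52, n, p=2)
f_value = f_stat_for_r2_increase(0.50, 0.52, n, p_full=2, q=1)

print(f"adjusted R^2, 1 regressor:  {adj_small:.3f}")
print(f"adjusted R^2, 2 regressors: {adj_large:.3f}")
print(f"F statistic for the increase: {f_value:.3f}")
```

In this made-up case the raw ''R''<sup>2</sup> rises while the adjusted ''R''<sup>2</sup> falls, and the ''F'' statistic is well below 1, so neither criterion would favor the added variable. The ''F'' value would be compared against an ''F'' distribution with ''q'' and ''n'' − ''p'' − 1 degrees of freedom, where ''p'' is the number of regressors in the larger model.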