Friday, November 23, 2007

Multicollinearity

The predictors should not be strongly correlated with one another. In stepwise selection, one variable may absorb the predictive power of another, so the second variable never makes it into the model. The variables may also end up with regression coefficients of the wrong sign (negative when the actual effect is positive).

If we use wait events as predictors, there will often be multicollinearity. In other words, if we see a lot of waits on "db file sequential read", we can expect to see significant waits on "db file scattered read" as well.

Using v$sysstat makes this even more of a problem. For example, "user I/O wait time" will almost certainly be correlated with "physical reads".

As such, we need to programmatically identify those predictors that are not correlated with each other.
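One simple way to do this is a greedy filter on pairwise correlation: keep a predictor only if its absolute Pearson correlation with every predictor already kept stays below some cutoff. The sketch below is only an illustration of that idea. The function names, the 0.8 threshold, and the sample numbers are all assumptions made up for the example, not real measurements, and in practice the series would come from periodic snapshots of the statistics.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def select_uncorrelated(samples, threshold=0.8):
    """Greedily keep predictors whose |r| with every kept predictor is below threshold.

    samples maps a statistic name to its list of sampled values.
    The 0.8 default is an arbitrary cutoff chosen for illustration.
    """
    kept = []
    for name, values in samples.items():
        if all(abs(pearson(values, samples[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical snapshot deltas, invented for this example.
samples = {
    "physical reads": [100, 220, 310, 405, 530],
    "user I/O wait time": [11, 21, 33, 39, 55],
    "parse count (hard)": [7, 3, 9, 2, 6],
}

selected = select_uncorrelated(samples)
print(selected)
```

With this made-up data, "user I/O wait time" is dropped because it tracks "physical reads" almost exactly. Note that the result depends on the order in which the predictors are visited, and that pairwise correlation will not catch a predictor that is a combination of several others; a variance inflation factor check would be one way to cover that case.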
