haotu : an open lab notebook

2016/10/28

The relationship between VIF and R2 (R-squared)

Filed under: Math and Stats, R, R Stats — S @ 09:08

Variance Inflation Factor (VIF) is a simple, widely used statistic for quantifying multicollinearity in least-squares regression. It is calculated for each covariate in a regression, with higher values meaning that the covariate is more collinear with the other covariates. Formally, it measures “how much the variance (the square of the estimate’s standard deviation) of an estimated regression coefficient is increased because of collinearity.” The equation is:

\mathrm{VIF}_i = \frac{1}{1 - R_i^2}

where R2i is the R2 from the regression of covariate i on all the other covariates. The problem is where to draw the cutoff. Is a VIF > 2.5 too high? > 5? Or how about VIF > 10? All have been used as cutoffs. Below is a figure of R2 vs VIF. As you can see, a cutoff of 2.5 corresponds to an R2 of 0.60, 5 to 0.80, and 10 to 0.90! While you could perhaps get away with these inflations statistically, what do they mean for your particular question? If you are dealing with a relationship among covariates that is as strong as R2 = 0.90, can you really be sure that the model and your interpretations are valid?
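To make the definition concrete, here is a minimal sketch in R that computes the VIF of one covariate directly from this R2 and checks it against car::vif(). The simulated variables x1, x2 and x3 are illustrative assumptions, not data from this post.

library(car) #for vif()
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n) #x2 is deliberately collinear with x1
x3 <- rnorm(n)
y <- x1 + x2 + x3 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)
#VIF of x1 from the definition: regress x1 on the other covariates
r2.x1 <- summary(lm(x1 ~ x2 + x3))$r.squared
1/(1 - r2.x1) #VIF by hand
vif(fit)["x1"] #same value from car::vif()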

[Figure (rplot): R2 of the regression of the focal covariate on the other covariates vs. VIF, with the cutoffs 2.5, 5, and 10 marked in red; code below.]

#VIF function
r <- function(x) { 1 - (1/x) } #r is R2 and x is VIF
x <- seq(1, 15, 0.1) #sequence of VIFs
y <- r(x) #corresponding R2s (r is vectorized, no sapply needed)
#plot
par(las = 1)
plot(x, y, type = "l", xlab = "VIF",
     ylab = "R2 of regression of focal covariate on all other covariates")
#common VIF cutoffs = 2.5, 5, 10
lx <- c(2.5, 5, 10)
ly <- r(lx) #compute directly; matching x == 2.5 etc. risks floating-point misses
segments(lx, 0, lx, ly, col = "red")
segments(lx, ly, 0, ly, col = "red")


2013/11/15

Model Averaging Sum of Squares :: Weighted Explained Variance from AICc Model Weights

Filed under: Math and Stats, Model Averaging, R, R Stats — S @ 02:51

I have been using the MuMIn package for model averaging. Below are functions that give the weighted-average sum of squares for each predictor variable across the candidate models (i.e., a model-averaged sum of squares).

Download this code as a pdf: model_averaging_SS

 
#This code calculates a model-averaged Sum of Squares as a weighted mean of the SS for each predictor across the candidate set of models.
#The protocol is the one used to average coefficients across the candidate set.
#See Johnson J.B. & Omland K.S. (2004). Model selection in ecology and evolution. Trends in Ecology & Evolution, 19, 101-108.

require(MuMIn)
require(car)

getSS <- function(x) { #x is a standard glm model object
  if (length(coef(x)) > 1) {
    An <- Anova(x, test.statistic = "F")
    SS <- as.matrix(An$S) #$S partial-matches the sum-of-squares column of the Anova table
    rownames(SS) <- rownames(An)
    return(t(SS))
  }
}

getSSs <- function(m.s, data = NULL) { #m.s is a model.selection object from dredge()
  if (is.null(data)) {
    stop("must include the matching data object for the model selection object, m.s")
  }
  #refit each candidate model as a glm from its formula
  m.fit <- lapply(lapply(get.models(m.s, subset = TRUE), formula), glm, data = data)
  #one row per model; zeros where a predictor is absent from that model
  hold <- matrix(data = 0, length(m.fit), length(attr(m.s, "global")$coefficients))
  colnames(hold) <- c("Residuals", names(attr(m.s, "global")$coefficients)[-1])
  for (i in 1:length(m.fit)) {
    SS <- getSS(m.fit[[i]])
    hold[i, colnames(SS)] <- SS
  }
  return(hold)
}

avgSS <- function(m.s, data = NULL, w.min = 0) {
  w <- m.s$weight
  hold <- as.matrix(getSSs(m.s, data))
  hold <- hold[w > w.min, ]
  w <- w[w > w.min]
  if (is.null(dim(hold))) { #only one model passed the weight filter
    warning(paste("There was only one model with a weight > ", w.min,
                  "; returning the SS for that one model.", sep = ""))
    return(hold)
  } else {
    return(apply(hold, 2, weighted.mean, w = w, na.rm = TRUE))
  }
}

#EXAMPLE
CorrNorm <- function(rho = 0.4, X1 = rnorm(100)) { #produces correlated Gaussian variables
  X2 <- rnorm(length(X1))
  Z <- data.frame(Y = rho * X1 + sqrt(1 - rho^2) * X2, X1)
  return(Z)
} #http://r.789695.n4.nabble.com/generate-two-sets-of-random-numbers-that-are-correlated-td3736161.html
y <- CorrNorm(.8)
y <- data.frame(y, X2 = CorrNorm(.2, X1 = y[, 1])[, 1])
lm.D <- lm(Y ~ X1 + X2, data = y, na.action = na.fail) #Global Model; dredge() requires na.fail
(m.s <- dredge(lm.D)) #All submodels of the global model (a model.selection object)
avgSS(m.s, y) #Average Sum of Squares across all the models
avgSS(m.s, y, w.min = 0.5) #Average Sum of Squares across models with weight > w.min
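As a quick sanity check (a sketch using the example objects above), the model-averaged SS returned by avgSS() is just the weight-weighted mean of the per-model SS, with zeros counted for models that omit a predictor:

ss.all <- getSSs(m.s, y) #per-model SS matrix, one row per candidate model
sum(ss.all[, "X1"] * m.s$weight) / sum(m.s$weight) #matches avgSS(m.s, y)["X1"]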
 

2013/07/31

Simple Cholesky decomposition example

Filed under: Math and Stats — S @ 05:02

There is a nice example here.

[Images: the Cholesky formula, the example matrix, and its worked solution.]
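In case the linked images disappear, here is a minimal sketch in R of the same idea using base R's chol(); the 3x3 matrix is a standard textbook example, not necessarily the one pictured above.

#Cholesky decomposition: factor a symmetric positive-definite A as A = L %*% t(L),
#with L lower triangular. Base R's chol() returns the upper-triangular factor R
#such that A = t(R) %*% R, so L = t(R).
A <- matrix(c(  4,  12, -16,
               12,  37, -43,
              -16, -43,  98), nrow = 3, byrow = TRUE)
L <- t(chol(A)) #lower-triangular Cholesky factor
L
#     [,1] [,2] [,3]
#[1,]    2    0    0
#[2,]    6    1    0
#[3,]   -8    5    3
all.equal(L %*% t(L), A) #TRUE: the factors reconstruct A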
