haotu : an open lab notebook

2016/10/28

The relationship between VIF and R2 (r squared)

Filed under: Math and Stats, R, R Stats — Tags: , , — S @ 09:08

Variance inflation factor (VIF) is a common, simple statistic used to quantify multicollinearity in least squares regressions. It is calculated for each covariate in a regression, with higher values meaning that the covariate is more collinear with the other covariates. Technically, it measures “how much the variance (the square of the estimate’s standard deviation) of an estimated regression coefficient is increased because of collinearity.” The equation is:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

where R_i^2 is the R2 from the regression of covariate i on all the other covariates. The problem is where to draw the cutoff. Is a VIF > 2.5 too high? > 5? How about VIF > 10? All have been used as cutoffs. Here is a figure of R2 vs VIF. As you can see, a cutoff of 2.5 corresponds to an R2 of 0.60, a cutoff of 5 to 0.80, and a cutoff of 10 to 0.90! While statistically you could perhaps get away with these high inflations, what does it mean for your particular question? If you are dealing with a relationship among covariates that is as strong as an R2 of 0.90, can you really be sure that the model and your interpretations are valid?

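[Figure: R2 of the regression of the focal covariate on all other covariates, plotted against VIF, with the cutoffs 2.5, 5 and 10 marked in red]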

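# Convert a VIF to the corresponding R2 by inverting VIF = 1/(1 - R2)
vif_to_r2 <- function(vif) 1 - 1/vif
x <- seq(1, 15, 0.1) # sequence of VIFs
y <- vif_to_r2(x)    # corresponding R2 values (the function is vectorized)
# plot
par(las = 1)
plot(x, y, type = "l", xlab = "VIF", ylab = "R2 of regression of focal covariate on all other covariates")
# common VIF cutoffs = 2.5, 5, 10
lx <- c(2.5, 5, 10)
ly <- vif_to_r2(lx) # computed directly; matching with x == 2.5 is fragile for floating-point values
segments(lx, 0, lx, ly, col = "red") # vertical guides
segments(lx, ly, 0, ly, col = "red") # horizontal guides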
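
To check the formula on actual numbers, here is a minimal sketch with simulated data. The variables x1, x2, x3, their coefficients, and the helper vif_by_hand are all made up for illustration. It regresses each covariate on the others, converts the R2 to a VIF, and compares the result with the diagonal of the inverse correlation matrix, which should give the same values.

```r
set.seed(1)
n  <- 200
x2 <- rnorm(n)
x3 <- rnorm(n)
x1 <- 0.8 * x2 + 0.5 * x3 + rnorm(n, sd = 0.5) # x1 is built to be collinear with x2 and x3
X  <- data.frame(x1, x2, x3)

# Hypothetical helper: VIF of each column from the R2 of its regression on the others
vif_by_hand <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    fit <- lm(reformulate(others, response = v), data = X)
    1 / (1 - summary(fit)$r.squared)
  })
}

vif_manual <- vif_by_hand(X)
vif_cor    <- diag(solve(cor(X))) # same quantity from the inverse correlation matrix

print(round(vif_manual, 2))
print(all.equal(unname(vif_manual), unname(vif_cor)))

# R2 implied by the common cutoffs
cutoff_r2 <- 1 - 1 / c(2.5, 5, 10) # 0.6 0.8 0.9
```

The last line is just the inverted equation, so it is a quick way to translate any proposed VIF cutoff into the R2 it tolerates.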

2016/07/14

aggregate by removing NA

Filed under: Manipulate Data in R, R, R Stats, Uncategorized — Tags: — S @ 08:43
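# Collapse a vector to its unique non-NA values:
# NA if there are none, the value itself if there is one,
# otherwise a single "|"-separated string
na.collapse <- function(x) {
  x. <- unique(x[!is.na(x)])
  if (length(x.) == 0) {
    return(NA)
  } else if (length(x.) == 1) {
    return(x.)
  } else {
    return(paste(x., collapse = "|"))
  }
}
na.collapse(c(NA, "a", "a", "b")) # "a|b"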
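
A minimal sketch of how this might be used with aggregate(). The data frame df and its columns id, a and b are made up for illustration, and the function is repeated so the block runs on its own.

```r
na.collapse <- function(x) {
  x. <- unique(x[!is.na(x)])
  if (length(x.) == 0) return(NA)
  if (length(x.) == 1) return(x.)
  paste(x., collapse = "|")
}

# Hypothetical data: several partially filled rows per id
df <- data.frame(
  id = c(1, 1, 2, 2, 3),
  a  = c("x", NA, "y", "z", NA),
  b  = c(NA, "p", "q", "q", NA),
  stringsAsFactors = FALSE
)

# One row per id; each column keeps its unique non-NA values
out <- aggregate(df[c("a", "b")], by = list(id = df$id), FUN = na.collapse)
print(out)
```

I use the by = list(...) form here rather than the formula interface, because the formula interface drops rows containing NA by default (na.action = na.omit), which would throw away exactly the rows this is meant to merge.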

2016/06/24

save rda data file with compression

Filed under: errors in R, R, R, R Stats — Tags: , , , , — S @ 06:36
save(mydata,file="mydata.rda",compress="xz")
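
A small round-trip sketch, assuming a made-up data frame and temporary files, to compare file sizes and confirm that the xz-compressed file loads back unchanged. How much smaller the file gets depends on the data, so no sizes are claimed here.

```r
# Hypothetical example data
mydata <- data.frame(id = 1:1000, value = rep(c("a", "b"), 500))

f_default <- tempfile(fileext = ".rda")
f_xz      <- tempfile(fileext = ".rda")

save(mydata, file = f_default)             # default compression
save(mydata, file = f_xz, compress = "xz") # xz compression

print(file.size(c(f_default, f_xz)))       # compare sizes in bytes

# Load into a separate environment so the original object is not overwritten
env <- new.env()
loaded <- load(f_xz, envir = env)
print(identical(env$mydata, mydata))
```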

find non-ascii in R

Filed under: errors in R, R, R, R Stats, Uncategorized — Tags: , , , , — S @ 06:34
tools::showNonASCII(readLines("myfiles.R"))
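
If you also want the line numbers, here is a rough sketch that does not rely on showNonASCII: it flags any line containing a byte above 127. The file contents and the name has_non_ascii are invented for the example.

```r
# Hypothetical script with one non-ASCII character (e-acute) on line 2
lines <- c("x <- 1", "name <- \"caf\u00e9\"", "y <- 2")
tmp <- tempfile(fileext = ".R")
writeLines(enc2utf8(lines), tmp, useBytes = TRUE) # write the UTF-8 bytes as they are

src <- readLines(tmp, encoding = "UTF-8")

# A line is non-ASCII if any of its bytes is greater than 127
has_non_ascii <- vapply(
  src,
  function(l) any(as.integer(charToRaw(l)) > 127L),
  logical(1),
  USE.NAMES = FALSE
)
print(which(has_non_ascii)) # line numbers to inspect

tools::showNonASCII(src)    # the one-liner from the post, for comparison
```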

2016/04/18

raster cell area size 1 degree cell size km2

Filed under: R, R spatial, R Stats, Uncategorized — Tags: , — S @ 08:14

For a 1×1 degree cell size raster

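library(raster)
r <- raster(ncol=360, nrow=180) # or just the default r <- raster()
a <- area(r) # RasterLayer of approximate cell areas in km2 (for lon/lat rasters)
values(a)    # the cell areas as a numeric vector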
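
For a rough cross-check without the raster package, the area of a lon/lat cell can be sketched on a sphere. This assumes a spherical Earth with a radius of 6371 km, and cell_area_km2 is a made-up helper, so the numbers may differ slightly from what area() returns.

```r
# Area (km2) of a cell spanning dlat x dlon degrees whose southern edge is at lat_south,
# on a sphere: R^2 * dlon * (sin(lat_north) - sin(lat_south)), with angles in radians
cell_area_km2 <- function(lat_south, dlat = 1, dlon = 1, radius_km = 6371) {
  to_rad <- pi / 180
  radius_km^2 * (dlon * to_rad) *
    (sin((lat_south + dlat) * to_rad) - sin(lat_south * to_rad))
}

print(cell_area_km2(c(0, 30, 60, 89))) # cells shrink toward the poles

# Sanity check: all 180 x 360 cells should add up to the sphere's surface area
total <- sum(cell_area_km2(seq(-90, 89, by = 1))) * 360
print(all.equal(total, 4 * pi * 6371^2))
```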

 

http://gis.stackexchange.com/questions/177622/r-calculate-raster-cell-size-in-map-units

 

http://gis.stackexchange.com/questions/29734/how-to-calculate-area-of-1-x-1-degree-cells-in-a-raster

2016/03/01

select multiple columns of a data.table by column name

Filed under: Manipulate Data in R, R, R Stats — Tags: , — S @ 12:39
myDT[,.(mycolname1,mycolname2,mycolname3)]

2015/03/10

check a phylo object

Filed under: Uncategorized — Tags: , , — S @ 09:07

See here:

https://github.com/emmanuelparadis/checkValidPhylo

or forked

https://github.com/mrhelmus/checkValidPhylo

2015/01/30

aggregate more than one column in data.table

Filed under: R — Tags: , , , , — S @ 11:15
# Average ability by grade
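# fm1 is a data.table; naming the list elements names the output columns (otherwise V1, V2)
agg1 <- fm1[, j = list(mean_x0 = mean(x0, na.rm = TRUE), mean_x1 = mean(x1, na.rm = TRUE)), by = key]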
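
With many columns, writing one mean() per column gets tedious. A minimal sketch of the same aggregation using .SD and .SDcols, with an invented table (the column grade and the values are assumptions for the example):

```r
library(data.table)

# Hypothetical data: two score columns and a grouping column
fm1 <- data.table(
  grade = c("a", "a", "b", "b"),
  x0    = c(1, 3, NA, 5),
  x1    = c(10, 20, 30, 50)
)

# Apply mean() to every column listed in .SDcols, within each grade
agg <- fm1[, lapply(.SD, mean, na.rm = TRUE), by = grade, .SDcols = c("x0", "x1")]
print(agg)
```

The output columns keep their original names (x0, x1), so nothing needs to be renamed afterwards.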

http://rprogramming.net/aggregate-data-in-r-using-data-table/

2014/11/07

BLAST from R

Filed under: Uncategorized — Tags: , , , , , , — S @ 08:16

There is a nice post here on the topic.
The post takes code from the blastSequences function in the R package annotate.

However, the code does give an object with enough info to then retrieve the sequences. I edited this code to be able to perform BLASTs in R for a given organism filter. The returned object gives gene ids and definitions.

My cleaned and edited code is here.

Error in write.table because data.frame contains a list

Filed under: Uncategorized — Tags: , , , — S @ 06:42

Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
unimplemented type 'list' in 'EncodeElement'

I got this error when trying to write an object to a file with write.table. The object was a data.frame, but one of its columns was a list (it is unclear why it was a list, just R magic). The simplest workaround is to convert the data.frame to a matrix with as.matrix and save that object with write.table.
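
Another option, sketched here under assumptions, is to keep the data.frame and flatten each list column into a character column before writing. The data frame df, its columns, and the "|" separator are made up for the example.

```r
# Hypothetical data.frame with a list column
df <- data.frame(id = 1:3)
df$tags <- list(c("a", "b"), "c", character(0))

# Find list columns and paste each element into a single string
is_list <- vapply(df, is.list, logical(1))
df[is_list] <- lapply(df[is_list], function(col) {
  vapply(col, function(el) paste(el, collapse = "|"), character(1))
})

out <- tempfile(fileext = ".txt")
write.table(df, out, sep = "\t", row.names = FALSE, quote = FALSE)
print(readLines(out))
```

Empty list elements become empty strings here; whether that or NA is the better choice depends on what will read the file.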
