haotu : an open lab notebook


R error in loading rJava :: error: No CurrentVersion entry in Software/JavaSoft registry!

Filed under: Java, R, R Stats — S @ 17:10

When I tried to load


a dependency of


I got the following error

Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rJava', details:
 call: fun(libname, pkgname)
 error: No CurrentVersion entry in Software/JavaSoft registry! Try re-installing Java and make sure R and Java have matching architectures.
In addition: Warning message:
display list redraw incomplete 
Error: package ‘rJava’ could not be loaded

I am running Windows 7 on a 64-bit machine, and the problem was that the only version of Java I had installed was 32-bit. You will need to install a 64-bit JRE (Java Runtime Environment) so that Java matches the architecture of your R session. I found it here and here.
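To see which architecture needs matching, it can help to check what the current R session reports. This is only a rough sketch: the variable names are mine, and the last step assumes a `java` executable is on the PATH, which may not be the case on your machine.

```r
# Architecture of the running R session (typically "x86_64" for 64-bit R,
# "i386" for 32-bit R on Windows)
r_arch <- R.version$arch

# Pointer size in bytes: 8 for a 64-bit session, 4 for a 32-bit one
pointer_bytes <- .Machine$sizeof.pointer

# JAVA_HOME as seen by R; an empty string means it is not set
java_home <- Sys.getenv("JAVA_HOME")

print(r_arch)
print(pointer_bytes)
print(java_home)

# If java is on the PATH, print its version banner. A 64-bit HotSpot JVM
# usually mentions "64-Bit" in that banner.
if (nzchar(Sys.which("java"))) {
  system2("java", "-version")
}
```

If R reports a 64-bit session and the Java banner does not mention 64-bit, the mismatch described above is the likely cause.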

data.table and data.frame differences

Filed under: Manipulate Data in R, R, R Stats — S @ 16:13

from here

  • DT[3] refers to the 3rd row, but DF[3] refers to the 3rd column
  • DT[3,] == DT[3], but DF[,3] == DF[3] (somewhat confusingly)
  • For this reason we say the comma is optional in DT, but not optional in DF
  • DT[[3]] == DF[,3] == DF[[3]]
  • DT[i,] where i is a single integer returns a single row, just like DF[i,], but unlike a matrix single row subset which returns a vector.
  • DT[,j,with=FALSE] where j is a single integer returns a one column data.table, unlike DF[,j] which returns a vector by default
  • DT[,"colA",with=FALSE][[1]] == DF[,"colA"].
  • DT[,colA] == DF[,"colA"]
  • DT[,list(colA)] == DF[,"colA",drop=FALSE]
  • DT[NA] returns 1 row of NA, but DF[NA] returns a copy of DF containing NA throughout.
  • The symbol NA is type logical in R, and is therefore recycled by [.data.frame. Intention was probably DF[NA_integer_]. [.data.table does this automatically for convenience.
  • DT[c(TRUE,NA,FALSE)] treats the NA as FALSE, but DF[c(TRUE,NA,FALSE)] returns NA rows for each NA
  • DT[ColA==ColB] is simpler than DF[!is.na(ColA) & !is.na(ColB) & ColA==ColB,]
  • data.frame(list(1:2,"k",1:4)) creates 3 columns, data.table creates one list column.
  • check.names is by default TRUE in data.frame but FALSE in data.table, for convenience.
  • stringsAsFactors is by default TRUE in data.frame (before R 4.0.0; it has been FALSE since) but FALSE in data.table, for efficiency.
  • Since a global string cache was added to R, character items are pointers to the single cached string and there is no longer a performance benefit of converting to factor.
  • Atomic vectors in list columns are collapsed when printed using “, ” in data.frame, but “,” in data.table with a trailing comma after the 6th item to avoid accidental printing of large embedded objects.
  • In [.data.frame we very often set drop=FALSE. When we forget, bugs can arise in edge cases where single columns are selected and all of a sudden a vector is returned rather than a single column data.frame. In [.data.table we took the opportunity to make it consistent and drop drop.
  • When a data.table is passed to a data.table-unaware package, that package is not concerned with any of these differences; it just works.
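A few of the differences in the list are easier to see by running them. The sketch below assumes the data.table package is installed. The toy columns colA, colB and colC are made up for illustration, and the data.frame calls use the row form with a comma (DF[NA, ]) so that they subset rows.

```r
library(data.table)

# Toy data; column names and values are invented for this example
DF <- data.frame(colA = c(1, NA, 3),
                 colB = c(1, 2, 4),
                 colC = c("x", "y", "z"),
                 stringsAsFactors = FALSE)
DT <- as.data.table(DF)

# A single index selects a row in data.table but a column in data.frame
dt_row <- DT[3]   # one-row data.table
df_col <- DF[3]   # one-column data.frame (colC)

# Extracting one column as a plain vector
v_dt <- DT[, colA]     # unquoted column name
v_df <- DF[, "colA"]   # quoted name; the dimension is dropped by default

# Keeping a single column as a table
dt_one <- DT[, list(colA)]
df_one <- DF[, "colA", drop = FALSE]

# A lone NA as the row index
dt_na <- DT[NA]     # data.table converts the logical NA to NA_integer_
df_na <- DF[NA, ]   # data.frame recycles the logical NA over every row

# NA inside a logical row filter: data.table treats it as FALSE,
# data.frame returns an all-NA row for it
dt_eq <- DT[colA == colB]
df_eq <- DF[DF$colA == DF$colB, ]

print(dt_eq)
print(df_eq)
```

Comparing nrow(dt_eq) with nrow(df_eq) shows why the data.frame version needs the extra !is.na() guards mentioned in the list.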

make and check directories in R

Filed under: Manipulate Data in R, R, R Stats — S @ 07:32
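
A minimal sketch of one way to check for a directory and create it if it is missing, using only base R. The folder name is hypothetical and sits under the session's temporary directory so the example leaves nothing behind. dir.exists() needs R 3.2.0 or later; on older versions file.exists() is a rough substitute.

```r
# Hypothetical output folder, placed under the session's temporary directory
out_dir <- file.path(tempdir(), "results", "plots")

# Create the folder only if it is missing; recursive = TRUE also creates
# the intermediate "results" folder
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}

# Confirm that the folder is there now
print(dir.exists(out_dir))
```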


Preserve dimensions in R when subsetting with names

Filed under: Manipulate Data in R, R, R Stats — S @ 12:45
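
A minimal sketch of what the title describes, assuming the usual base R approach of passing drop = FALSE to the subset operator. The matrix, the data frame and their names are made up for illustration.

```r
# Small named matrix: 2 rows, 3 columns
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# Selecting one row by name drops the dim attribute and returns a named vector
row_vec <- m["r1", ]

# drop = FALSE keeps the result as a 1 x 3 matrix
row_mat <- m["r1", , drop = FALSE]

# The same applies to a single named column of a data.frame
DF <- data.frame(a = 1:2, b = 3:4)
col_vec <- DF[, "a"]                 # plain integer vector
col_df  <- DF[, "a", drop = FALSE]   # one-column data.frame

print(dim(row_vec))
print(dim(row_mat))
```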
