haotu : an open lab notebook

2016/07/14

aggregate by removing NA

Filed under: Manipulate Data in R, R, R Stats, Uncategorized — Tags: — S @ 08:43
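# Collapse a vector to its unique non-NA values:
# NA if nothing is left, the value itself if only one is left,
# otherwise the unique values pasted together with "|".
na.collapse<-function(x)
{
  x.<-unique(x[!is.na(x)])
  if(length(x.)==0){
    return(NA)
  } else if(length(x.)==1){
    return(x.)
  } else {
    return(paste(x.,collapse="|"))
  }
}
# x is any vector, e.g. the values of one group inside aggregate()
na.collapse(x)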
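
A minimal sketch of how this might be used with aggregate(). The data frame df and its columns id and val are made up for illustration, and the helper is repeated so the example runs on its own.

```r
# Same helper as above, in compact form.
na.collapse <- function(x) {
  x. <- unique(x[!is.na(x)])
  if (length(x.) == 0) return(NA)
  if (length(x.) == 1) return(x.)
  paste(x., collapse = "|")
}

# Hypothetical example data: several rows per id, some values missing.
df <- data.frame(id  = c(1, 1, 2, 2, 3),
                 val = c("a", NA, "b", "c", NA),
                 stringsAsFactors = FALSE)

# One row per id; NAs are dropped before the remaining values are collapsed.
# The non-formula interface is used because the formula interface
# drops rows with NA by default.
res <- aggregate(df["val"], by = list(id = df$id), FUN = na.collapse)
print(res)
```

Here id 1 keeps its single value, id 2 gets its two values joined with "|", and id 3 stays NA because it has nothing else.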

2016/06/24

save rda data file with compression

Filed under: errors in R, R, R, R Stats — Tags: , , , , — S @ 06:36
save(mydata,file="mydata.rda",compress="xz")
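
A small round-trip sketch, assuming a made-up mydata and a temporary file. compress also accepts "gzip" and "bzip2"; "xz" usually gives the smallest file but can be slower to write.

```r
# Hypothetical data and a temporary file, used only for this example.
mydata <- data.frame(x = 1:1000, y = rep(c("a", "b"), 500))
f <- tempfile(fileext = ".rda")

save(mydata, file = f, compress = "xz")

# load() restores the object under its original name; a fresh environment
# is used here so the original mydata is not overwritten.
e <- new.env()
load(f, envir = e)
identical(e$mydata, mydata)
```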

find non-ascii in R

Filed under: errors in R, R, R, R Stats, Uncategorized — Tags: , , , , — S @ 06:34
tools::showNonASCII(readLines("myfiles.R"))
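
A quick way to try this out, assuming a throwaway script written to a temporary file (the file contents are invented for the example). Lines containing non-ASCII bytes are printed with their line number, and the offending bytes are shown as escapes.

```r
# Write a two-line script whose second line contains a non-ASCII character.
f <- tempfile(fileext = ".R")
writeLines(c("x <- 1", enc2utf8("y <- \"caf\u00e9\"")), f, useBytes = TRUE)

lines <- readLines(f)

# Only the second line should be reported.
tools::showNonASCII(lines)
```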

2016/04/22

add unique id to duplicated grouped rows

Filed under: Manipulate Data in R, R, R, Uncategorized — S @ 12:30
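library(plyr)
# x is a data frame of the grouping columns; id() returns one integer per row,
# and rows with the same values get the same id
id(x)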
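
A minimal sketch with made-up data. The data frame x, its columns g and v, and the new column name uid are assumptions for illustration only.

```r
library(plyr)

# Hypothetical data: rows 1 and 2 are duplicates of each other.
x <- data.frame(g = c("a", "a", "b"), v = c(1, 1, 2))

# One integer id per row; rows with identical values in g and v share an id.
# as.integer() drops the extra attribute that id() attaches to its result.
x$uid <- as.integer(id(x[c("g", "v")], drop = TRUE))
print(x)
```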

2016/04/18

stack rasters with different extents etc..

Filed under: arcmap, R, R, R spatial, R Stats, Uncategorized — S @ 13:42
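library(raster)
# resample 'two' onto the grid (extent and resolution) of 'one' so the layers line up
two<-resample(two,one)
stack(one,two)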

2016/03/01

select multiple columns of a data.table by column name

Filed under: Manipulate Data in R, R, R Stats — Tags: , — S @ 12:39
myDT[,.(mycolname1,mycolname2,mycolname3)]
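
A small sketch with an invented table. The second form is handy when the column names are stored in a character vector (cols is a hypothetical name).

```r
library(data.table)

# Hypothetical example table.
myDT <- data.table(a = 1:3, b = letters[1:3], c = c(TRUE, FALSE, TRUE))

# Unquoted column names inside .()
sub1 <- myDT[, .(a, c)]

# Column names held in a character vector
cols <- c("a", "c")
sub2 <- myDT[, cols, with = FALSE]
```

Both calls return a data.table with columns a and c.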

2016/02/22

true copy of a data.table

Filed under: Manipulate Data in R, R — S @ 10:13
mycopy<-copy(DT)
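
Why copy() matters, as a minimal sketch with a made-up table: plain assignment only creates a second name for the same data.table, so := through either name changes both.

```r
library(data.table)

# Hypothetical example table.
DT <- data.table(a = 1:3)

alias  <- DT         # just another name for the same object
mycopy <- copy(DT)   # an independent copy

# Add a column by reference through the alias.
alias[, b := a * 2L]

"b" %in% names(DT)       # DT changed too
"b" %in% names(mycopy)   # the true copy did not
```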

2016/02/19

Join polygons like countries from mapdata

Filed under: arcmap, R, R, R spatial — S @ 13:28
library(maps)
library(mapdata)
library(maptools)
library(rgeos)

# Get each country as a filled map object and convert it to SpatialPolygons
DR<-map(database='worldHires',regions="Dominican Republic",fill=TRUE)
IDs<-DR$names
DR<-map2SpatialPolygons(DR,IDs,proj4string=CRS("+proj=longlat +datum=WGS84"))
Haiti<-map(database='worldHires',regions="Haiti",fill=TRUE)
IDs<-Haiti$names
Haiti<-map2SpatialPolygons(Haiti,IDs,proj4string=CRS("+proj=longlat +datum=WGS84"))

# Merge the two countries into a single geometry
Hispaniola<-gUnion(DR,Haiti)
plot(Hispaniola)

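Note fill=TRUE: map() has to return closed polygons rather than line segments for the conversion to SpatialPolygons to work.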

2015/01/22

add column in data.table

Filed under: Manipulate Data in R, R, R, R Stats — S @ 07:09
DT[,new1:=v$r1]

 

Here is a post to add multiple columns to a data.table

http://stackoverflow.com/questions/11308754/add-multiple-columns-to-r-data-table-in-one-function-call
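
For quick reference, a sketch of the single-column form above next to the two usual ways of adding several columns in one call. The table, the list v and the new column names are all made up.

```r
library(data.table)

# Hypothetical example data.
DT <- data.table(id = 1:3)
v  <- list(r1 = c(10, 20, 30), r2 = c("x", "y", "z"))

# One column
DT[, new1 := v$r1]

# Several columns, functional form of :=
DT[, `:=`(new2 = v$r2, new3 = id * 2L)]

# Several columns, character vector of names on the left-hand side
DT[, c("new4", "new5") := list(id + 1L, id - 1L)]
```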

2014/09/05

data.table and data.frame differences

Filed under: Manipulate Data in R, R, R Stats — S @ 16:13

From the data.table FAQ:

  • DT[3] refers to the 3rd row, but DF[3] refers to the 3rd column
  • DT[3,] == DT[3], but DF[,3] == DF[3] (somewhat confusingly)
  • For this reason we say the comma is optional in DT, but not optional in DF
  • DT[[3]] == DF[,3] == DF[[3]]
  • DT[i,] where i is a single integer returns a single row, just like DF[i,], but unlike a matrix single row subset which returns a vector.
  • DT[,j,with=FALSE] where j is a single integer returns a one column data.table, unlike DF[,j] which returns a vector by default
  • DT[,"colA",with=FALSE][[1]] == DF[,"colA"].
  • DT[,colA] == DF[,"colA"]
  • DT[,list(colA)] == DF[,"colA",drop=FALSE]
  • DT[NA] returns 1 row of NA, but DF[NA] returns a copy of DF containing NA throughout.
  • The symbol NA is type logical in R, and is therefore recycled by [.data.frame. Intention was probably DF[NA_integer_]. [.data.table does this automatically for convenience.
  • DT[c(TRUE,NA,FALSE)] treats the NA as FALSE, but DF[c(TRUE,NA,FALSE)] returns NA rows for each NA
  • DT[ColA==ColB] is simpler than DF[!is.na(ColA) & !is.na(ColB) & ColA==ColB,]
  • data.frame(list(1:2,"k",1:4)) creates 3 columns, data.table creates one list column.
  • check.names is by default TRUE in data.frame but FALSE in data.table, for convenience.
  • stringsAsFactors was by default TRUE in data.frame before R 4.0.0 but is FALSE in data.table, for efficiency.
  • Since a global string cache was added to R, character items are a pointer to the single cached string and there is no longer a performance benefit of converting to factor.
  • Atomic vectors in list columns are collapsed when printed using “, ” in data.frame, but “,” in data.table with a trailing comma after the 6th item to avoid accidental printing of large embedded objects.
  • In [.data.frame we very often set drop=FALSE. When we forget, bugs can arise in edge cases where single columns are selected and all of a sudden a vector is returned rather than a single column data.frame. In [.data.table we took the opportunity to make it consistent and drop drop.
  • When a data.table is passed to a data.table-unaware package, that package is not concerned with any of these differences; it just works.
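
A few of these differences can be checked directly. This is only a sketch with made-up data (DF and DT hold the same three columns), not a full tour of the list.

```r
library(data.table)

# Hypothetical example data, the same in both classes.
DF <- data.frame(a = 1:3, b = 4:6, c = 7:9)
DT <- as.data.table(DF)

# A single index selects a row in data.table but a column in data.frame.
dim(DT[3])
dim(DF[3])

# [[ ]] extracts the column vector in both.
identical(DT[[3]], DF[[3]])

# NA in a logical row index: treated as FALSE by data.table,
# returned as an NA row by data.frame.
nrow(DT[c(TRUE, NA, FALSE)])
nrow(DF[c(TRUE, NA, FALSE), ])

# DT[NA] is one row of NA.
nrow(DT[NA])

# Single-column selection: data.table keeps the table,
# data.frame drops to a vector by default.
class(DT[, "a", with = FALSE])
class(DF[, "a"])
```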