haotu : an open lab notebook


R intersect polygons and points

Filed under: Uncategorized — S @ 19:42

R intersect polygons and points


# example data from raster package
soil <- st_read(system.file("external/lux.shp", package="raster")) %>% 
  # add in some fake soil type data
  mutate(soil = LETTERS[c(1:6,1:6)]) %>% 

# field polygons
field <- c("POLYGON((6 49.75,6 50,6.4 50,6.4 49.75,6 49.75))",
        "POLYGON((5.8 49.5,5.8 49.7,6.2 49.7,6.2 49.5,5.8 49.5))") %>% 
  st_as_sfc(crs = st_crs(soil)) %>% 
  st_sf(field = c('x','y'), geoms = ., stringsAsFactors = FALSE)

# intersect - note that sf is intelligent with attribute data!
pi <- st_intersection(soil, field)
plot(soil$geometry, axes = TRUE)
plot(field$geoms, add = TRUE)
plot(pi$geoms, add = TRUE, col='red')

# add in areas in m2
attArea <- pi %>% 
  mutate(area = st_area(.) %>% as.numeric())

# for each field, get area per soil type
attArea %>% 
  as_tibble() %>% 
  group_by(field, soil) %>% 
  summarize(area = sum(area))





The relationship between VIF and R2 (r squared)

Filed under: Math and Stats, R, R Stats — Tags: , , — S @ 09:08

Variance Inflation Factor (VIF) is a common simple stat used to quantify multicollinearity in least squares regressions. It is calculated for each covariate in a regression, with higher values meaning that the covariate is more colinear with the other covariates. It technically measures “how much the variance (the square of the estimate’s standard deviation) of an estimated regression coefficient is increased because of collinearity.” The equation is:

{\displaystyle \mathrm {VIF_{i}} ={\frac {1}{1-R_{i}^{2}}}}

where R2i is from the regression of the covariate i on all the other covariates. The problem is where to draw the cutoff? Is a VIF > 2.5 too high? >5? or how about VIF>10, all have been used as cutoffs. Here is a figure of R2 vs VIF. As you can see, a cuttoff of 2.5 is an R2 of 0.60 and 10 is 0.90! While statistically, you could perhaps get away with these high inflations, what does it mean for your particular question? If you are dealing with a relationship among covariates that is as strong as 0.90, can you really be sure that the model and your interpretations are valid?




#VIF function
r<-function(x){1-(1/x)} #r is R2 and x is VIF
x<-seq(1,15,.1) #seq of VIFs
y<-sapply(x,r) #seq of R2
plot(x,y,type="l",xlab="VIF",ylab="R2 of regression of focal covariate on all other covariates")
# common VIF cutoffs = 2.5, 5, 10


aggregate by removing NA

Filed under: Manipulate Data in R, R, R Stats, Uncategorized — Tags: — S @ 08:43
   } else {
   } else {


replace empty cells in google sheets

Filed under: Google, Sheets, Uncategorized — S @ 10:35


center gis point of a polygon shape file

Filed under: arcmap, R spatial — Tags: , — S @ 08:20


In the map document, open the attribute table for the polygon feature class by right-clicking the layer name.
In the attribute table, navigate to Table Options > Add Field and add two new fields of type Double. Name one ‘Longitude’ and the other ‘Latitude’.
Right-click the Longitude field and select Calculate Geometry.
In the Calculate Geometry dialog box, select ‘X Coordinate of Centroid’ from the Property drop-down menu. Click OK.
Right-click the Latitude field and select Calculate Geometry.
In the Calculate Geometry dialog box, select ‘Y Coordinate of Centroid’ from the Property drop-down menu. Click OK.


save rda data file with compression

Filed under: errors in R, R, R, R Stats — Tags: , , , , — S @ 06:36

find non-ascii in R

Filed under: errors in R, R, R, R Stats, Uncategorized — Tags: , , , , — S @ 06:34


Add timestamp to Google Sheets when data are entered

Filed under: Google, Google Docs, Sheets — Tags: , , — S @ 07:03

I wanted a time/date stamp for when a particular cell was populated with data.

First you will need to open the Script editor under Tools. Then paste and edit the below code.


function onEdit(event) { 
 var timezone = "GMT-4"; 
 var timestamp_format = "MM-dd-yyyy HH:mm"; // Timestamp Format. 
 var updateColName = "value"; // Column where data are populated
 var timeStampColName = "Timestamp"; // Column where timestamp is automatically populated
 var sheet = event.source.getSheetByName("data"); //Name of the sheet where you want to run this script. 
 var actRng = event.source.getActiveRange(); 
 var editColumn = actRng.getColumn(); 
 var index = actRng.getRowIndex(); 
 var headers = sheet.getRange(1, 1, 1, sheet.getLastColumn()).getValues(); 
 var dateCol = headers[0].indexOf(timeStampColName); 
 var updateCol = headers[0].indexOf(updateColName); updateCol = updateCol+1; 
 if (dateCol > -1 && index > 1 && editColumn == updateCol) { // only timestamp if 'Timestamp' header exists, but not in the header row itself! 
 var cell = sheet.getRange(index, dateCol + 1); 
 var date = Utilities.formatDate(new Date(), timezone, timestamp_format); 
 cell.setValue(date); } 

See here for original code as well as a video

Note that if you run the script (hit Play button) in the script editor you will get an error on line 6 simply because the “event” cannot be called in the editor. The code should still work fine.


assign nearest point to a polygon

Filed under: R, R spatial, R Stats, Uncategorized — S @ 11:31


n <- length(wer)
nearest <- character(n)
for (i in seq_along(nearest)) {
 nearest[i] <- names(polys)[which.min(gDistance(wer[i,], polys, byid=TRUE))]

This may give you warnings if your code is not projected into a planar coordinate system



add unique id to duplicated grouped rows

Filed under: Manipulate Data in R, R, R, Uncategorized — S @ 12:30
Older Posts »

Create a free website or blog at WordPress.com.