haotu : an open lab notebook

2014/02/14

Scrape an HTML table in R

Filed under: Manipulate Data in R, R · Posted by S at 03:25


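library(XML)
library(httr)  # readHTMLTable() cannot fetch https:// URLs itself, so download with httr
theurl <- "https://en.wikipedia.org/wiki/Brazil_national_football_team"
page <- content(GET(theurl), as = "text", encoding = "UTF-8")
tables <- readHTMLTable(htmlParse(page, asText = TRUE, encoding = "UTF-8"))
# number of rows in each table (NROW() also returns 0 for any NULL entry)
n.rows <- vapply(tables, NROW, integer(1))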

The table picked is the longest one on the page, i.e. the one with the most rows:

tables[[which.max(n.rows)]]
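To try the selection step without hitting the network, here is a minimal sketch that wraps it in a hypothetical helper, longest_table(), and runs it on a small inline HTML page. The helper name and the sample HTML are made up for illustration; it assumes only that the XML package is installed.

```r
library(XML)

# Hypothetical helper (not from the original post): return the table with
# the most rows from a parsed HTML document, a URL, or a file name.
longest_table <- function(doc) {
  tables <- readHTMLTable(doc)
  if (length(tables) == 0) stop("no <table> elements found")
  # NROW() gives the row count of a data frame and 0 for a NULL entry
  n.rows <- vapply(tables, NROW, integer(1))
  tables[[which.max(n.rows)]]
}

# Small inline page so the example runs offline: a one-row table and a
# three-row table, each with a header row of <th> cells
html <- '<html><body>
<table><tr><th>a</th></tr><tr><td>1</td></tr></table>
<table><tr><th>team</th><th>goals</th></tr>
<tr><td>A</td><td>1</td></tr>
<tr><td>B</td><td>2</td></tr>
<tr><td>C</td><td>3</td></tr></table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)
best <- longest_table(doc)
print(best)
```

Because readHTMLTable() also accepts a file name, the same helper can be pointed at a saved copy of the Wikipedia page.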


