Loop through files to loop through variables!

Say you have a bunch of data files formatted in exactly the same way (which is not rare if you are scraping or if the data are clean). How do you loop through all the files at once, extract the useful information, and bind everything into one big matrix?

Suppose all my files are named “1.csv”, …, “5.csv”. We can loop through the files with:

file.names <- c("1", "2", "3", "4", "5")
for (i in seq_along(file.names)) {
  # build the file name, e.g. "1.csv", and read it into a data frame
  path <- paste(file.names[i], "csv", sep = ".")
  read <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
  # store the data frame under the name var1, ..., var5
  assign(paste0("var", file.names[i]), read)
}
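As a rough sketch of an alternative, the data frames can be kept together in one named list instead of being created as separate variables with assign(). The temporary directory, the toy values written to it, and the names `data.list` and `paths` are assumptions made only so the example runs on its own.

```r
# Assumption: toy files are written to a temporary directory so the sketch is self-contained.
dir <- file.path(tempdir(), "loop_files_demo")
dir.create(dir, showWarnings = FALSE)
for (i in 1:5) {
  toy <- data.frame(year   = c(1999, 2000),
                    place1 = c(1.1, 1.2) * i,
                    place2 = c(7.8, 7.9) * i)
  write.csv(toy, file.path(dir, paste0(i, ".csv")), row.names = FALSE)
}

file.names <- c("1", "2", "3", "4", "5")
paths <- file.path(dir, paste0(file.names, ".csv"))

# Read every file and keep the results in a single list.
data.list <- lapply(paths, read.csv, header = TRUE, stringsAsFactors = FALSE)

# Name the list elements var1, ..., var5 so each data frame can be looked up by name.
names(data.list) <- paste0("var", file.names)

str(data.list$var1)
```

With a list, later steps can be written as lapply(data.list, ...) without having to retype each variable name.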

Oftentimes you will need to reshape your data. Suppose the data look like this:

year    place1    place2
1999    1.1       7.8
...

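A convenient way to reshape your data is to wrap melt() in a small function:

library(reshape2)  # assumed source of melt(); its variable.name argument is used below

# reshape from wide (one column per place) to long (one row per year and place)
my.melt <- function(x) {
  melt(x, id.vars = "year", variable.name = "place")
}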
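To make the wide-to-long step concrete, here is a base R sketch of the same reshape that does not need any package. The helper name `melt_by_year` is hypothetical, and the second row of the toy data is made up; only the 1999 row comes from the sample above.

```r
# Toy wide data: one row per year, one column per place (the 2000 row is invented).
wide <- data.frame(year   = c(1999, 2000),
                   place1 = c(1.1, 1.3),
                   place2 = c(7.8, 8.0))

# Hypothetical helper: stack every non-year column into (year, place, value) rows.
melt_by_year <- function(x) {
  measure <- setdiff(names(x), "year")
  data.frame(
    year  = rep(x$year, times = length(measure)),
    place = factor(rep(measure, each = nrow(x)), levels = measure),
    value = unlist(x[measure], use.names = FALSE)
  )
}

long <- melt_by_year(wide)
print(long)
```

Each original cell becomes its own row, so a table with 2 years and 2 places turns into 4 rows.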

Since all the files have the same layout, we get a long list of variables that have the same dimensions. Thus, we can merge all of them once each one has been reshaped. Consider the example where I want to merge two of my variables; first, reshape each one:

var.names <- list(var1, var2)
for (i in seq_along(var.names)) {
  var.names[[i]] <- my.melt(var.names[[i]])
}

Alternatively, you can replace the loop with lapply():

reshape <- lapply(var.names, my.melt)

Now, we need to cbind() all our data:

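datalist <- list()  # create empty list
for (i in seq_along(reshape)) {  # loop over however many data frames we have
  datalist[[i]] <- reshape[[i]]
}
# label each data frame so that cbind() prefixes its columns (var1.year, var1.place, ...)
names(datalist) <- paste0("var", seq_along(datalist))
merged <- do.call(cbind, datalist)  # named merged to avoid masking base merge()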
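To see how the column names come out when a named list of data frames is passed to cbind(), here is a small self-contained sketch. The two long-format data frames and the names `long1`, `long2`, `frames`, and `wide` are toy stand-ins, not the real files.

```r
# Toy long-format data frames standing in for two reshaped files (values are made up).
long1 <- data.frame(year = c(1999, 1999),
                    place = c("place1", "place2"),
                    value = c(1.1, 7.8))
long2 <- data.frame(year = c(1999, 1999),
                    place = c("place1", "place2"),
                    value = c(2.2, 8.9))

frames <- list(var1 = long1, var2 = long2)

# With a named list, cbind() prefixes every column with the name of its list element.
wide <- do.call(cbind, frames)
print(names(wide))
```

Binding columns this way assumes every data frame has the same rows in the same order; if that is not guaranteed, it is safer to check the year and place columns against each other before relying on the result.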

Definitely not the smartest way, but it works.
