Archive for September, 2011

September 30, 2011

R to JSON for D3.js and Protovis

D3.js and Protovis use the file format JSON to hold input data. In particular, the most natural format is an array of JSON objects, such as

[ { "Petal.Width" : 0.2, "Species" : "setosa" }, { "Petal.Width" : 1.3, "Species" : "versicolor" }, { "Petal.Width" : 1.8, "Species" : "virginica" } ]

There seems to be no readily available tool for transform a R data frame into a JSON array such that each row in R becomes an JSON object in the array. The rjson package actually converts a data frame into a JSON hashmap, in which each column of the data frame becomes a named array (see below).

{"Petal.Width":[0.2,1.3,1.8],"Species":["setosa","versicolor","virginica"]}

Similar question has been asked at StackOverflow and a solution proposed. The solution involves transposing the data frame before feeding to the toJSON funciton in the rjson package.

toJSON(data.frame(t(dtf)))

The solution works for data frames with character strings. However, when a data frame has numeric columns, the final result will contain problematic quotation marks around numeric values, which makes the JSON file not suitable for D3.js or Protovis (see below). This is particularly problematic when the numbers are extremely large or small, in which case scientific notation is likely used by R to format numeric values.

{"X50":{"Petal.Width":"0.2","Species":"setosa"},"X100":{"Petal.Width":"1.3","Species":"versicolor"},"X150":{"Petal.Width":"1.8","Species":"virginica"}}

To address this problem, I wrote a function that will convert a data frame directory to a JSON string.

toJSONarray <- function(dtf){
clnms <- colnames(dtf)
name.value <- function(i){
quote <- '';
if(!class(dtf[, i]) %in% c('numeric', 'integer')){
quote <- '"';
}
paste('"', i, '" : ', quote, dtf[,i], quote, sep='')
}
objs <- apply(sapply(clnms, name.value), 1, function(x){paste(x, collapse=', ')})
objs <- paste('{', objs, '}')
res <- paste('[', paste(objs, collapse=', '), ']')
return(res)
}

With this function, the output will work in D3.js and Protovis:

[ { "Petal.Width" : 0.2, "Species" : "setosa" }, { "Petal.Width" : 1.3, "Species" : "versicolor" }, { "Petal.Width" : 1.8, "Species" : "virginica" } ]

Advertisements
September 20, 2011

Portable Web Server

I often need to show interactive visualization to clients. As the data in most case is confidential, I cannot put the visualization on a website with public access. Hence I need a USB drive which contains both the data and the web server.

A widely know portable web server is XAMPP. However, XAMPP contains a large number of files, syncing the folder with the desktop is often slow. Instead, I found Mongoose to be much easier to use. To run a web server, I only needs to copy the “mongoose.exe” file to the folder of my html files and double click the .exe file.

Another software I find useful for displaying visualization at a client’s computer is the portable Opera browser. It is an unfortunate fact that many offices still have only IE installed. I like Opera more than Firefox for visualization as Opera renders the HTML 5 “range” tag as a slider Firefox does not.

September 7, 2011

How to read a multiple-sheet Excel file into R?

In my daily job, I have to deal with many Excel 2003 files. These files often have multiple sheets, as a sheet in Excel can have at most 65536 rows. I find Hans-Peter Suter’s xlsReadWrite package very useful here.

As my data often contain columns of date-time information, I use the following values for the read.xls() function:

dateTime = 'isodatetime',
stringsAsFactors = FALSE

The date-time column will be returned as a vector of character strings, which can be converted into POSIXct using the function as.POSIXct().

For multiple sheets Excel files, I suggest the column classes to be as simple as possible—for example, use characters instead of factors. Changing of column classes should be left after sheets are read-in and combined. Below is an example code snippet for reading multiple-sheet parameters:

library(xlsReadWrite)
sheet1 <- read.xls(file='foo.xls', colNames = TRUE, sheet = 1, dateTime = 'isodatetime', stringsAsFactors = FALSE)
cnames <- colnames(sheet1)
sheets <- list(sheet1)
for(i in 2:4){
sheets[[i]] <- read.xls(file='foo.xls', colNames = cnames, sheet = i, dateTime = "isodatetime", stringsAsFactors = FALSE)
}
combineddata <- c()
for(i in 1:length(sheets)){
combineddata <- rbind(combineddata, sheets[[i]])
}