Archive for ‘Uncategorized’

February 29, 2012

When did C-section get popular?

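C-section birth has a long history, but when did it become really popular? I generated this simple plot comparing the number of births on a weekday with the number on a Saturday or Sunday. Scheduled C-sections are presumably booked mostly on weekdays, so a widening gap between the two counts suggests growing use. From the plot we can infer that, at least in the place where I live, C-sections were not popular until 1960.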
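A minimal sketch of how such a weekday versus weekend comparison could be computed in R. The birth dates below are made up for illustration, and the variable names (births, counts, ratio) are hypothetical, not the ones behind the plot.

```r
# Hypothetical birth dates (made-up values, for illustration only)
births <- as.Date(c("1955-03-05", "1955-03-07",
                    "1975-06-02", "1975-06-03", "1975-06-07"))

wday <- as.POSIXlt(births)$wday            # 0 = Sunday, ..., 6 = Saturday
daytype <- ifelse(wday %in% c(0, 6), "weekend", "weekday")
year <- format(births, "%Y")

# Births per year, split into weekday and weekend births
counts <- table(year, daytype)

# Births per weekday relative to births per weekend day
# (a week has 5 weekdays and 2 weekend days)
ratio <- (counts[, "weekday"] / 5) / (counts[, "weekend"] / 2)
```

A ratio near 1 means births are spread evenly over the week; a ratio that climbs over the years would point to more scheduled deliveries.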


January 30, 2012

Resolving contradictory gender information

In biomedical data analysis, it is conventional to standardize the quantity of interest by age and gender. When the data come from merging multiple sources, some subjects may have contradictory gender information. If too many subjects are recorded with both genders, it would not be appropriate to simply exclude them from the analysis. One reasonable action is to assign each subject the gender that shows up in most of its records, assuming that the less frequent value is just noise.

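A fast way to implement this voting scheme in SQL is to code the gender information as numbers (e.g., 0 for males and 1 for females) and then use the aggregate construct round(avg(gender)), grouped by subject. This is much faster than first counting, for each subject, the records of each gender and then comparing the counts. Two caveats: a subject with an exact tie has an average of 0.5, so the result depends on how your database rounds 0.5; and some databases compute avg() of an integer column in integer arithmetic, in which case the column should be cast to a decimal type first.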
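The same vote can be sketched in R, as a way to check the logic on a small sample. The data frame and its column names (id, gender) are hypothetical. Note that R's round() rounds 0.5 to the even digit, so this sketch compares against 0.5 explicitly instead of calling round().

```r
# Hypothetical merged records: one row per subject per source
records <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3),
  gender = c(0, 0, 1, 1, 1, 0, 1)   # 0 = male, 1 = female
)

# The mean of the 0/1 codes is the share of "female" records per subject
share_female <- tapply(records$gender, records$id, mean)

# Majority vote: 1 if more than half of the records say female, else 0.
# An exact tie (share == 0.5) falls to 0 here; choose a rule that suits the data.
voted <- as.integer(share_female > 0.5)
names(voted) <- names(share_female)
```

Subject 1 has two male records against one female record and is assigned 0; subject 2 is the reverse and is assigned 1.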

September 7, 2011

How to read a multiple-sheet Excel file into R?

In my daily job, I have to deal with many Excel 2003 files. These files often have multiple sheets, because a sheet in Excel 2003 can hold at most 65536 rows. I find Hans-Peter Suter’s xlsReadWrite package very useful here.

As my data often contain columns of date-time information, I pass the following arguments to the read.xls() function:

dateTime = 'isodatetime',
stringsAsFactors = FALSE

The date-time column will be returned as a vector of character strings, which can be converted into POSIXct using the function as.POSIXct().
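A minimal sketch of that conversion step. The strings below are hypothetical and assume the column comes back in the form "YYYY-MM-DD hh:mm:ss"; check the actual output of read.xls() and adjust the format string if it differs. The time zone is set explicitly so the result does not depend on the machine's locale.

```r
# Hypothetical character values, assumed to look like the returned date-time column
iso <- c("2011-09-07 08:30:00", "2011-09-07 17:45:10")

# Parse with an explicit format and time zone
dt <- as.POSIXct(iso, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# POSIXct values support arithmetic, e.g. elapsed time in seconds
elapsed <- as.numeric(difftime(dt[2], dt[1], units = "secs"))
```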

For multiple-sheet Excel files, I suggest keeping the column classes as simple as possible, for example characters instead of factors. Changing column classes should be left until after the sheets have been read in and combined. Below is an example code snippet for reading a multiple-sheet file:

library(xlsReadWrite)

# Read the first sheet, taking the column names from its header row
sheet1 <- read.xls(file = 'foo.xls', colNames = TRUE, sheet = 1, dateTime = 'isodatetime', stringsAsFactors = FALSE)
cnames <- colnames(sheet1)
sheets <- list(sheet1)
# Reuse the column names from sheet 1 for sheets 2 to 4
for (i in 2:4) {
  sheets[[i]] <- read.xls(file = 'foo.xls', colNames = cnames, sheet = i, dateTime = 'isodatetime', stringsAsFactors = FALSE)
}
# Stack all sheets into one data frame
combineddata <- do.call(rbind, sheets)

August 30, 2011

Convert Date-Time in Excel to POSIXct

This is a one-liner to convert a (numeric) date-time column from Excel to POSIXct:

rdatetime <- as.POSIXct( exceldatetime*24*3600 + as.POSIXct("1899-12-30 00:00") )
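The one-liner above builds its origin in the local time zone, so daylight saving time or historical offset changes may shift the result. A variant with an explicit time zone, as a sketch: the serial numbers below are hypothetical sample values, and "UTC" is an assumption to be replaced by whatever zone the spreadsheet's wall-clock times refer to.

```r
# Hypothetical Excel serial date-times (days since 1899-12-30, fraction = time of day)
exceldatetime <- c(40785.5, 40786.25)

# Convert days to seconds and count from the Excel origin in a fixed time zone
rdatetime <- as.POSIXct(exceldatetime * 24 * 3600, origin = "1899-12-30", tz = "UTC")

# Render as text to inspect the result
shown <- format(rdatetime, "%Y-%m-%d %H:%M:%S")
```

The origin 1899-12-30 matches Excel's 1900 date system for dates from March 1900 onward. Fractions of a day can carry tiny floating-point errors, so rounding to whole seconds may be worthwhile before comparing timestamps.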