Archive for ‘Posts’

June 24, 2014

Detecting contaminated birthdates using generalized additive models

Erroneous patient birthdates are common in health databases. Detection of these errors usually involves manual verification, which can be resource intensive and impractical. By identifying a frequent manifestation of birthdate errors, this paper presents a principled and statistically driven procedure to identify erroneous patient birthdates.

Generalized additive models (GAM) enabled explicit incorporation of known demographic trends and birth patterns. With false positive rates controlled, the method identified birthdate contamination with high accuracy. In the health data set used, of the 58 actual incorrect birthdates manually identified by the domain expert, the GAM-based method identified 51, with 8 false positives (resulting in a positive predictive value of 86.0% (51/59) and a false negative rate of 12.0% (7/58)). These results outperformed linear time-series models.

The GAM-based method is an effective approach to identify systemic birthdate errors, a common data quality issue in both clinical and administrative databases, with high accuracy.

See full text at

December 16, 2012

Override a non-generic method in S3 OOP system

In R OOP, you can override a generic method for your custom class. For example, you can define print() and plot() methods for an class Foo. It is also easy to overrisk a non-generic method if you use S4 OOP system, through the S4 function setGeneric(). But S3 does not have a function equivalent to setGeneric().

Suppose you want to write a method sample() for class Foo. As the built-in sample() function is not generic, simply defining sample.Foo() will break the original sample(). For example, calling


will result in the following error message:

Error in UseMethod(“sample”) :
no applicable method for ‘sample’ applied to an object of class “c(‘double’, ‘numeric’)”

Here is the correct way to override a non-generic method in S3. First, you copy the non-generic method to the default method for the generic method you want to create. Then, you create the generic method and the method for class Foo.

sample.default = sample
sample <- function(obj, ...){
sample.Foo <- function(obj, ...){
December 6, 2012

Rank 1 update to Cholesky factorization in R

Rank 1 update can be achieved in Matlab with the built-in function cholupdate(). More details about the function can be found here:

In R, the recommended Matrix package has added a function updown() to handle rank 1 update. However, the documentation of that function is not easy to understand, and I find the example given is misleading. That prompted me to replicate the Matlab example with the updown() function. The resulting code snippet is shown below.

A <- symmetric.pascal.matrix(4)
R <- chol(A)

x <- c(0, 0, 0, 1)

A + x%*%t(x)

R <- Cholesky(Matrix(A, sparse=T))
x <- Matrix(x)
R1.sparse <- updown('+', x, R)
R1 <- as(R1.sparse, 'Matrix')

x <- Matrix(c(0, 0, 0, 1/sqrt(2)))
R1.sparse <- updown('-', x, R)
R1 <- as(R1.sparse, 'Matrix')

Note that matrices have to be in the sparse representation to be used in the updown() function.

December 4, 2012

Estimating the intensity of ward admission and its effect on emergency department access block


Emergency department access block is an urgent problem faced by many public hospitals today. When access block occurs, patients in need of acute care cannot access inpatient wards within an optimal time frame. A widely held belief is that access block is the end product of a long causal chain, which involves poor discharge planning, insufficient bed capacity, and inadequate admission intensity to the wards. This paper studies the last link of the causal chain—the effect of admission intensity on access block, using data from a metropolitan hospital in Australia. We applied several modern statistical methods to analyze the data. First, we modeled the admission events as a nonhomogeneous Poisson process and estimated time-varying admission intensity with penalized regression splines. Next, we established a functional linear model to investigate the effect of the time-varying admission intensity on emergency department access block. Finally, we used functional principal component analysis to explore the variation in the daily time-varying admission intensities. The analyses suggest that improving admission practice during off-peak hours may have most impact on reducing the number of ED access blocks.


Tags: ,
October 18, 2012

Lightweight http daemons for Windows and Mac

As a Dropbox user, I often need to start a web server from a non-standard root folder location. Under Windows, I have been using Mongoose and loves it. Mongoose is a single .exe file that can be put into the root folder of a web project.

For Mac and Linux with Python, life is even better. A light weight server is already installed. I just need to run the command
sudo python -mSimpleHTTPServer 80
and the web contents can be accessed through the URL

October 3, 2012

Similar functions in R and Matlab

Although R and Matlab have many similarities, switching between the two languages is not always easy. Below is a lookup table that hopefully will make the process easier.

Matlab R Note
size dim
find which
( [
s(offset:(offset+len-1)) substr(s, offset, len)
@x lambda(x) function(x) { lambda(x) } Anonymous function in Matlab cannot contain more than one statements.
num2str as.character
ischar is.character
class class
unique unique
& (|) & (|) element-wise logic operators
~ !
sum sum vector function
length length vector function
dec2hex as.hexmode
strcat paste
September 17, 2012

Circumvent Skype’s unable to add contact problem

In my workplace, the internet connection is through a proxy server. With Skype, I was able to talk to people in my contact list, but I was not able to add new contact. The problem and some solutions have been posted in the Skype community forum. However none of the solutions works on my Skype installation. Finally I found a way to add contact:

1. In Skype, go to Contacts -> “Import Contacts”. It will ask you for username and password for the proxy server.
2. Enter your username and password.
3. Now go to Contacts -> “Add Contact”. You should be able to add contact now.

September 6, 2012

How to tell if the R session you are running is 64 bit or 32 bit?

Under Windows, here is a simple hack to find out whether you are running a 64 bit or 32 bit R.

  1. Type the command version in the prompt.
  2. If you see arch           x86_64, then it is 64 bit. Otherwise, it is 32 bit.
June 18, 2012

How to create CMYK images for journal publications?

Many journals require color figures to be in CMYK format. If you are not doing this over and over again, here is a quick and dirty way to do it, assuming that you are a Windows user.

1. Create your RGB figure, say, foo.tif.
2. Download and install ImageMagick from
3. Download Adobe CMYK profiles from this link
4. Copy the profile files (with extension .icc) to the folder containing your figures.
5. Run the dos command from that the figure folder
convert foo.tif +profile icc -profile sRGB.icc -profile WebCoatedFORGRA28.icc foo-cmyk.tif
Of course, the colors in the CMYK file will look slightly different from the original RGB file. Try use a different icc file in the command to get the best result.

February 29, 2012

When did C-section get popular?

C-section birth has a long history. But when did it get really popular? I generated this simple plot comparing the number births on a weekday and the number on a Saturday or Sunday. We can infer that at least in the place where I live, C-section is not popular until 1960.