Archive for January, 2012

January 30, 2012

Resolving contradictory gender information

In biomedical data analysis, it is conventional to standardize the quantity of interest by age and gender. When the data is the result of merging multiple sources, some subjects of interest may have contradictory gender information. If there are too many subjects with two genders, it would not appropriate to just exclude them from the analysis. One reasonable action is to assign a subject the gender that shows up in most records, assuming that the other gender is just noise as it appears not as frequently.

A fast way to implement this voting scheme in SQL is to code the gender information in numbers (e.g., 0 for males and 1 for females) and then use the aggregate construct round(avg(gender)). This is much faster than first counting for each subject the records of each gender and then comparing the counts.