Resolving contradictory gender information

In biomedical data analysis, it is conventional to standardize the quantity of interest by age and gender. When the data is the result of merging multiple sources, some subjects of interest may have contradictory gender information. If there are too many subjects with two genders, it would not appropriate to just exclude them from the analysis. One reasonable action is to assign a subject the gender that shows up in most records, assuming that the other gender is just noise as it appears not as frequently.

A fast way to implement this voting scheme in SQL is to code the gender information in numbers (e.g., 0 for males and 1 for females) and then use the aggregate construct round(avg(gender)). This is much faster than first counting for each subject the records of each gender and then comparing the counts.

Advertisements

One Comment to “Resolving contradictory gender information”

  1. This is my first time go to see at here and i am in fact impressed
    to read everthing at one place.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: