# Simpson’s Paradox and Gender Diversity

“Statistics are used much like a drunk uses a lamppost: for support, not illumination.”

-Vin Scully, American sports commentator

HR departments are often asked to submit data on various HR parameters, including gender diversity measures such as the ratio of male to female employees, the number of women recruited relative to the number of women who applied, and the number of women in leadership positions. One thing they should be aware of when submitting such data is Simpson’s Paradox.

When you show a relationship between two variables (e.g. gender and recruitment), you cannot always take that relationship at face value. One or more other variables may be working in the background. Such a variable is called a “lurking variable”: it is not included in the analysis, but it can substantially alter your interpretation of the data.

In probability and statistics, Simpson’s paradox is a phenomenon in which a trend that appears in several different groups of data disappears when these groups are combined, and the reverse trend may appear in the aggregate data.

We will take a well-known example of Simpson’s paradox: the Berkeley gender bias case.

In 1973, the University of California, Berkeley was sued for bias against women who had applied for admission to its graduate school. The admission figures showed that men who applied were more likely than women to be admitted, and the gap was large enough that one could conclude it was unlikely to be due to chance.

| | Applicants | Admitted |
|---|---|---|
| Men | 8442 | 44% |
| Women | 4321 | 35% |

But when the data were analysed department by department, it appeared that no department was significantly biased against women. In fact, most departments had a small but statistically significant bias in favour of women.

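| Department | Men: applicants | Men: admitted | Women: applicants | Women: admitted |
|---|---|---|---|---|
| A | 825 | 62% | 108 | 82% |
| B | 560 | 63% | 25 | 68% |
| C | 325 | 37% | 593 | 34% |
| D | 417 | 33% | 375 | 35% |
| E | 191 | 28% | 393 | 24% |
| F | 373 | 6% | 341 | 7% |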

This happened because a “lurking variable” was not considered when the discrimination suit was filed. So what was the lurking variable that caused Simpson’s paradox here?

The lurking variable here was the application pattern of male and female candidates. Some departments, like E and F, were more competitive (i.e. they admitted a smaller share of applicants), and some, like A and B, were less competitive (i.e. they admitted a larger share). A and B were favoured by male applicants, and E and F by female applicants. So in the aggregated data, it appeared that a male candidate was more likely to be admitted.
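
To see the reversal in numbers, here is a minimal Python sketch that pools the six departments from the department table. It rests on some assumptions: admitted counts are estimated as applicants times the rounded admission percentage, and the names `departments` and `pooled_rate` are illustrative only. These six departments cover only part of the full applicant pool, so the pooled figures will not exactly match the overall 44% and 35%.

```python
# Six-department figures copied from the department table:
# department -> (men applicants, men admit rate, women applicants, women admit rate)
departments = {
    "A": (825, 0.62, 108, 0.82),
    "B": (560, 0.63, 25, 0.68),
    "C": (325, 0.37, 593, 0.34),
    "D": (417, 0.33, 375, 0.35),
    "E": (191, 0.28, 393, 0.24),
    "F": (373, 0.06, 341, 0.07),
}


def pooled_rate(applicants_and_rates):
    """Applicant-weighted admission rate across departments.

    Admitted counts are estimated as applicants * rate, because the table
    only gives rounded percentages (an assumption of this sketch).
    """
    pairs = list(applicants_and_rates)
    total_applicants = sum(n for n, _ in pairs)
    estimated_admits = sum(n * rate for n, rate in pairs)
    return estimated_admits / total_applicants


# Aggregate view: pool all six departments for each gender.
men_pooled = pooled_rate((m, mr) for m, mr, _, _ in departments.values())
women_pooled = pooled_rate((w, wr) for _, _, w, wr in departments.values())

# Department view: where is the women's admission rate higher than the men's?
women_ahead = [d for d, (_, mr, _, wr) in departments.items() if wr > mr]

# Lurking variable: share of each gender applying to the two
# least competitive departments, A and B.
men_total = sum(m for m, _, _, _ in departments.values())
women_total = sum(w for _, _, w, _ in departments.values())
men_share_ab = (departments["A"][0] + departments["B"][0]) / men_total
women_share_ab = (departments["A"][2] + departments["B"][2]) / women_total

print(f"Pooled admission rate, men:   {men_pooled:.1%}")
print(f"Pooled admission rate, women: {women_pooled:.1%}")
print(f"Departments where women have the higher rate: {women_ahead}")
print(f"Share of men applying to A or B:   {men_share_ab:.1%}")
print(f"Share of women applying to A or B: {women_share_ab:.1%}")
```

Under these assumptions, women have the higher admission rate in four of the six departments, yet the pooled rate is higher for men. The last two lines show why: roughly half of the male applicants in this table applied to A or B, against well under a tenth of the female applicants.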