Thursday, March 19, 2015

Statistics: To Trust or Not to Trust

Statistics are used everywhere today, especially in the media. People often use them to support their opinions or explain a fact. However, statistics are slippery creatures that are often extremely biased.

So the question is: how do we know when to trust a statistic?

Darrell Huff, the journalist who wrote How to Lie with Statistics, writes,
"To be worth much, a report based on sampling must use a representative sample, which is one from which every source of bias has been removed." (Huff 20)

Examples of bias:
-using groups that are too small
-favoring one group over another
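
To see why small groups are a problem, here is a minimal sketch in Python. Everything in it is made up for illustration: the population of incomes, the function name sample_mean_spread, and the group sizes are my own assumptions, not anything from Huff's book. It draws many groups of a given size from the same population and measures how much the group averages jump around.

```python
import random
import statistics


def sample_mean_spread(sample_size, n_trials=500, seed=1):
    """Return the standard deviation of the averages of many random groups.

    All numbers here are hypothetical and only meant to illustrate the idea.
    """
    rng = random.Random(seed)
    # Made-up population of 20,000 right-skewed "incomes".
    population = [rng.lognormvariate(9.0, 0.8) for _ in range(20_000)]
    # Average of each randomly drawn group of the requested size.
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_trials)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    for size in (10, 100, 500):
        print(f"group size {size}: spread of averages = {sample_mean_spread(size):,.0f}")
```

Under these assumptions, the averages from groups of 10 swing far more than the averages from groups of 500, so a single small group can easily land well away from the real figure.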

Bias is prejudice in favor of or against one person or group compared with another.
Bias is unfair and greatly skews a statistic's accuracy.

So consider this statistic:

"The average Yaleman, Class of '24, makes $25,111 a year." This is from Time magazine (Huff 13).

Is this statistic biased?


If you answered no, you are wrong. This statistic is extremely biased.

Darrell Huff lists many reasons why the Yale sample is biased.
Here are a few of them:

-The statistic is too precise.
          "There is a small likelihood that the average income of any far-flung group is ever going to be known down to the dollar." (Huff 13).

-People lie.
          "Furthermore, this lovely average is undoubtedly calculated from the amounts the Yale men said they earned... Some people when asked their incomes exaggerate out of vanity or optimism. Others minimize." (Huff 14).

-The responses are incomplete.
          "There are bound to be many whose addresses are unknown twenty-five years later." (Huff 15).
          "And of those whose addresses are known, many will not respond to the questionnaire."
          "Those who are most likely to reply are those with incomes to brag about, such as the CEOs, the executives of big companies, people who have made it in the world." (Huff 17).
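
The point about who replies can be sketched with a small simulation. This is only an illustration under my own assumptions: the incomes are invented, the function name simulate_survey is hypothetical, and the rule that richer graduates are more likely to answer is a guess at the effect Huff describes, not his data.

```python
import random
import statistics


def simulate_survey(n_graduates=10_000, seed=0):
    """Compare the real average income with the average of those who reply.

    Assumption (not from Huff): the chance of replying grows with income.
    """
    rng = random.Random(seed)
    # Made-up, right-skewed incomes for the whole class.
    incomes = [rng.lognormvariate(9.0, 0.8) for _ in range(n_graduates)]
    true_mean = statistics.mean(incomes)
    # Higher earners are more likely to send the questionnaire back,
    # with the reply chance capped at 90%.
    respondents = [x for x in incomes if rng.random() < min(0.9, x / 40_000)]
    reported_mean = statistics.mean(respondents)
    return true_mean, reported_mean, len(respondents)


if __name__ == "__main__":
    true_mean, reported_mean, n_replies = simulate_survey()
    print(f"replies received:        {n_replies}")
    print(f"average of everyone:     {true_mean:,.0f}")
    print(f"average of those replying: {reported_mean:,.0f}")
```

Nobody in this sketch lies about their income, yet the average of the replies still comes out higher than the average of the whole class, simply because of who chose to answer.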

Here I have made a chart of the Yale statistic clearly showing how the statistic is biased.








Other pointers on how to tell if a statistic can be trusted...

Don't trust them unless you know for sure that the source can be trusted. More often than not, neither the statistic nor the source that used it can be trusted.