Whenever you watch a cooking contest, say the U.S. version of MasterChef, you’ll see that Chef Gordon Ramsay, one of the three judges on the show, usually tastes each contestant’s dish with just one small spoonful. Then he gives his verdict on whether the food is good or the food is trash. Chef Ramsay never makes a secret of his disdain for a dish; he will tell you to your face that your food sucks if he thinks it does. Have you ever wondered how he can be so sure what the whole dish tastes like from a single spoonful? Well, the answer is simply that, most of the time, one spoonful can tell you everything you need to know about the whole dish. In the world of statistics, that spoonful is a representative sample of the dish, and the whole dish is the population. You see, Chef Ramsay does not need to finish the whole dish to know whether it is delicious or not. Sure, if he wanted to be extremely accurate, he could eat the whole dish, but his opinion would almost certainly be no different from the one he formed after a single spoonful. You may wonder, “What does a taste test by Chef Ramsay have to do with statistics?” Well, it is a great and simple analogy for inferential statistics, and especially for today’s concept, the Central Limit Theorem.
What fascinates me is that we can make strong statements and inferences about a whole population from just a small sample drawn from the population we are trying to study. Such inference is possible thanks to an elegant concept called the Central Limit Theorem (CLT). Economist Charles Wheelan called it the LeBron James of statistics, and his writing is also my inspiration for this article. For those who do not follow basketball but do follow football, the CLT is like the Cristiano Ronaldo of statistics - powerful and elegant.
Before we unravel the gist of the Central Limit Theorem, it’s probably better to start with a simple example inspired by Charles. Let’s say that a famous school of engineering has a field trip to the beach. The engineering students were randomly assigned to 20 buses and the trip was supposed to take 5 hours. After 5 hours, 19 buses had arrived at the destination, but 1 bus had gone missing. You and the rescuers searched the forest and found a bus with several foreign young people who don’t speak your language. Statistics to the rescue!!! You found that the average math score of these people is 65 (assume that everyone is carrying a math report card, or that you ask everyone to solve a difficult integral question. I know, I know, it’s ridiculous, but that’s for simplicity). You, as the smartest statistician among the rescuers, sighed and told everyone that this is not the bus of engineering students. There is no way in hell that engineering students, who learn all of those complex derivatives and integrals, would score that low on math (on average). Later, with the latest Google Translate technology, we learn that this is a bus of students who major in Khimal (a made-up language, so you’ll find no results on Google). This explains why their average math score is not so high: they specialize in language, not complex calculation.
Well guys, that’s it. That’s the Ronaldo of Statistics. That’s the Central Limit Theorem. Put simply, the core intuition behind the CLT is that a sufficiently large, randomly drawn sample will resemble the population it was drawn from. A bus of engineering students will be similar to the engineering student body as a whole. A spoonful of the dish is very similar to the whole dish. Each sample drawn from the population will differ slightly from the others, but there is a very low probability that a sample is extremely different from the population. The average math score of the engineering students on each of the 20 buses will differ slightly from the true average math score, but the probability that the students on one of the buses have an average totally different from the true average of all engineering students is very, very low. Yes, there may be some engineering students who would score 65 on math, but it’s highly unlikely that the engineering students on the bus we found would average 65, as we know that engineering students are very competent in math or they wouldn’t have been admitted to engineering school in the first place. Therefore, we can reject the idea that the bus with an average math score of 65 is the engineering students’ bus.
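If you would like to see the bus story play out in numbers, here is a rough simulation sketch. Everything numeric in it apart from the 20 buses and the mystery score of 65 is an assumption made up for illustration: the 40 students per bus, and an engineering population whose math scores centre on about 85 with a spread of about 8. It only uses Python’s standard `random` and `statistics` modules.

```python
import random
import statistics

random.seed(0)  # fixed seed so reruns give the same numbers

NUM_BUSES = 20         # from the story
STUDENTS_PER_BUS = 40  # ASSUMPTION: the story does not say how big a bus is

# ASSUMPTION: a made-up population of engineering students whose math
# scores centre on about 85 with a spread of about 8, capped at 100.
scores = [
    min(100.0, random.gauss(85, 8))
    for _ in range(NUM_BUSES * STUDENTS_PER_BUS)
]
population_mean = statistics.mean(scores)

# Randomly assign the students to buses, then average each bus.
random.shuffle(scores)
bus_means = [
    statistics.mean(scores[i * STUDENTS_PER_BUS:(i + 1) * STUDENTS_PER_BUS])
    for i in range(NUM_BUSES)
]

# How much do the bus averages wander from one bus to the next?
spread = statistics.stdev(bus_means)

# The mystery bus from the story averaged 65.
MYSTERY_BUS_MEAN = 65
gap_in_spreads = (MYSTERY_BUS_MEAN - population_mean) / spread

print(f"Population mean: {population_mean:.1f}")
print(f"Bus means range from {min(bus_means):.1f} to {max(bus_means):.1f}")
print(f"Mystery bus sits {abs(gap_in_spreads):.1f} bus-to-bus spreads "
      f"below the population mean")
```

Under these made-up numbers, individual students can land anywhere, but the 20 bus averages huddle close to the population average, and a bus averaging 65 sits far outside that huddle. Change the assumed scores or the bus size and the exact figures will move, but the pattern should stay the same.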
Yes, we made it. This is the intuition behind the Central Limit Theorem. What’s left is just some calculation and formulas involving the sample mean, the sample standard deviation, and the normal distribution, which we won’t touch today. I think the intuition will make those formulas much easier to understand, and I hope we can go over them in the next post. Until then, please appreciate the beauty of the Ronaldo of Statistics.