It is largely because of lack of knowledge of what statistics is that the person untrained in it trusts himself with a tool quite as dangerous as any he may pick out from the whole armamentarium of scientific methodology. –Edwin B. Wilson (1927), quoted in Stephen M. Stigler, The Seven Pillars of Statistical Wisdom.
Imagine you’re responsible for testing some aspects of a complex software product, and one of your colleagues comes up with the following request:
- Hey, can you write a self-contained function that tests the results of software component X and returns TRUE if the data set generated by X is normally distributed, and FALSE otherwise?
What’s a poor software developer to do?
Well, you fondly recall the first statistics class you took more than 20 years ago, and say: “I’ll plot a histogram of the data and see if it’s normal!”
But of course, in less than a second you realize that manually inspecting a plot does not make an automated test, not at all! So, as a brilliant software developer with a math background, you say, “Easy! I’ll just grab my secret weapon, Python and its SciPy library, and smash through this little statistical challenge!” You’re happy that you can stand on the shoulders of giants and use a well-documented, simple function such as scipy.stats.normaltest.
But then real life happens, and for whatever business and technological reasons, you realize that you can’t make third-party libraries such as SciPy a part of your test suite on a whim. Not for a single test, at least. And after all, don’t you like self-contained code with minimal dependencies? You say to yourself, “Let’s get down to basics and build a short, simple, correct, and self-contained function that tests for normality.”
So, the first question is: given a bunch of data points, how do you test for normality? What’s the correct algorithm? Well, actually, that should have been the second question, the first one being the essential question:
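For concreteness, here is a minimal sketch of the kind of dependency-free function you might write at this point. It is an assumption on our part, not a port of SciPy: scipy.stats.normaltest is based on D’Agostino and Pearson’s test, whereas this sketch swaps in the simpler Jarque–Bera test, which also combines sample skewness and kurtosis but whose p-value under the usual chi-squared(2) approximation reduces to exp(-JB/2), so the standard-library math module is enough. The names jarque_bera and looks_normal are hypothetical, and the approximation is asymptotic, so it is unreliable for small samples.

```python
import math
from typing import Sequence


def jarque_bera(data: Sequence[float]) -> tuple[float, float]:
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 data points")
    mean = sum(data) / n
    # Central moments, divided by n (not n - 1).
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("data has zero variance")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Assumption: JB is treated as chi-squared with 2 degrees of freedom,
    # whose survival function is exp(-x / 2). Only valid for large n.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


def looks_normal(data: Sequence[float], alpha: float = 0.05) -> bool:
    """Hypothetical helper: True only means 'no evidence against normality'."""
    _, p_value = jarque_bera(data)
    return p_value >= alpha
```

Note the docstring of looks_normal: returning True does not mean the data is normal, only that the test failed to reject normality at level alpha.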
- Dear colleague, WHY do you want to test for normality?
Let’s assume that you’ve exhausted your colleague with the Five Whys, and it’s now clear why you need that test.
So, back to normality testing: how do you test for normality? Again, the first question should’ve been different:
- Can you test for normality? In other words, do we have an algorithm that we can trust, for sure, to return TRUE if the data set in question is normally distributed?
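A quick simulation suggests why the honest answer is “no”. A test run at significance level alpha is designed to reject roughly a fraction alpha of perfectly normal samples, so a TRUE/FALSE wrapper around it will answer FALSE on genuinely normal data about that often. The sketch below assumes the Jarque–Bera test (with its asymptotic chi-squared(2) p-value) as a stand-in for any normality test; false_alarm_rate and its parameters are hypothetical. Because the approximation is asymptotic, the observed rate may land somewhat below the nominal 5%.

```python
import math
import random


def jb_p_value(data):
    """Jarque-Bera p-value, using the asymptotic chi-squared(2) approximation."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    return math.exp(-jb / 2.0)  # chi-squared(2) survival function


def false_alarm_rate(trials=500, n=500, alpha=0.05, seed=1):
    """Fraction of truly normal samples that the test flags as non-normal."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if jb_p_value(sample) < alpha:
            rejections += 1  # a FALSE verdict on perfectly normal data
    return rejections / trials


rate = false_alarm_rate()
print(f"normal samples flagged as non-normal: {rate:.1%}")
```

Lowering alpha shrinks these false alarms, but only by making the test less able to detect real departures; no threshold makes it return TRUE “for sure”.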
As a statistically savvy software developer, you know you have to tread carefully now, because you’re in the domain of fundamental statistics, and you don’t want to write a function whose results can easily mislead people or be misinterpreted.
So you do your research; after all, it’s the 21st century, and both the R Project and Python have solid statistical libraries. Knowing that R people are a little more sensitive about statistical correctness, both in terms of implementation and interpretation, you reach for one of the normality tests in R and read its documentation:
- SnowsPenultimateNormalityTest: “The theory for this test is based on the probability of getting a rational number from a truly continuous distribution defined on the reals. The main goal of this test is to quickly give a p-value for those that feel it necessary to test the uninteresting and uninformative null hypothesis that the data represents an exact normal, and allows the user to then move on to much more important questions, like ‘is the data close enough to the normal to use normal theory inference?’. After running this test (or better instead of running this and any other test of normality) you should ask yourself what it means to test for normality and why you would want to do so. Then plot the data and explore the interesting/useful questions.”
“Thank you very much!” you say, “but I’m not in the business of plotting data, right? I need to automate this stuff; manual visual inspection by a human is out of the question!” So you continue your research, only to come across this discussion:
- “Normality tests don’t do what most think they do. Shapiro–Wilk test, Anderson-Darling test, and others are null hypothesis tests AGAINST the assumption of normality. These should not be used to determine whether to use normal theory statistical procedures. In fact they are of virtually no value to the data analyst. Under what conditions are we interested in rejecting the null hypothesis that the data are normally distributed? I have never come across a situation where a normal test is the right thing to do. When the sample size is small, even big departures from normality are not detected, and when your sample size is large, even the smallest deviation from normality will lead to a rejected null.”
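Both halves of that last claim are easy to reproduce. The sketch below again assumes the Jarque–Bera test as a stand-in for Shapiro–Wilk or Anderson–Darling. It compares a big departure in a small sample (20 uniform values, excess kurtosis -1.2) with a small departure in a large sample (100,000 draws from Student’s t with 20 degrees of freedom, whose excess kurtosis is 6/(20 - 4) = 0.375). The t draws are built from the standard library as Z / sqrt(V / df), with V chi-squared; the helper names, seed, and sample sizes are our own hypothetical choices.

```python
import math
import random


def jb_p_value(data):
    """Jarque-Bera p-value, using the asymptotic chi-squared(2) approximation."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    return math.exp(-jb / 2.0)


def student_t(rng, df):
    """One draw from Student's t: Z / sqrt(V / df), with V ~ chi-squared(df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)  # chi-squared(df) = Gamma(shape df/2, scale 2)
    return z / math.sqrt(v / df)


rng = random.Random(42)

# Big departure, small sample: uniform data, n = 20.
small_uniform = [rng.uniform(-1.0, 1.0) for _ in range(20)]
# Small departure, large sample: t with 20 degrees of freedom, n = 100,000.
large_t = [student_t(rng, 20) for _ in range(100_000)]

p_small = jb_p_value(small_uniform)
p_large = jb_p_value(large_t)
print(f"uniform, n = 20:      p = {p_small:.3f}")
print(f"t(20),   n = 100,000: p = {p_large:.3g}")
```

With this setup the uniform sample, although it is the much larger departure from normality, typically passes, while the nearly normal t sample is rejected with a vanishingly small p-value: the verdict tracks sample size at least as much as it tracks shape.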
So now you’re even more confused. And to make things more interesting, you come across this discussion, “Is normality testing ‘essentially useless’?”:
- “It’s not an argument. It is a (a bit strongly stated) fact that formal normality tests always reject on the huge sample sizes we work with today. It’s even easy to prove that when n gets large, even the smallest deviation from perfect normality will lead to a significant result. And as every data set has some degree of randomness, no single data set will be a perfectly normally distributed sample. But in applied statistics the question is not whether the data/residuals … are perfectly normal, but normal enough for the assumptions to hold.”
- “When thinking about whether normality testing is ‘essentially useless’, one first has to think about what it is supposed to be useful for. Many people (well… at least, many scientists) misunderstand the question the normality test answer. The question normality tests answer: Is there convincing evidence of any deviation from the Gaussian ideal? With moderately large real data sets, the answer is almost always yes. The question scientists often expect the normality test to answer: Do the data deviate enough from the Gaussian ideal to “forbid” use of a test that assumes a Gaussian distribution? Scientists often want the normality test to be the referee that decides when to abandon conventional (ANOVA, etc.) tests and instead analyze transformed data or use a rank-based non-parametric test or a re-sampling or bootstrap approach. For this purpose, normality tests are not very useful.”
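If what you actually need is the second question (“does it deviate enough?”), one option is to check the size of the departure rather than a p-value, for example via sample skewness and excess kurtosis, as in the skewness/kurtosis article listed in the resources at the end. The sketch below is hypothetical: the function names are ours, and the default thresholds |skew| ≤ 2 and |excess kurtosis| ≤ 7 are a rule of thumb sometimes quoted for large samples, which you should treat as an assumption and tune for whatever procedure the data feeds into.

```python
from typing import Sequence


def shape_stats(data: Sequence[float]) -> tuple[float, float]:
    """Return (skewness, excess kurtosis) using simple moment estimators."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("data has zero variance")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def normal_enough(
    data: Sequence[float],
    max_abs_skew: float = 2.0,  # assumed rule-of-thumb threshold
    max_abs_excess_kurtosis: float = 7.0,  # assumed rule-of-thumb threshold
) -> bool:
    """True if the size of the departure from normality is below the thresholds."""
    skew, excess_kurtosis = shape_stats(data)
    return abs(skew) <= max_abs_skew and abs(excess_kurtosis) <= max_abs_excess_kurtosis


print(normal_enough([1.0] * 50 + [100.0]))  # one extreme outlier, heavily skewed
```

Unlike a p-value, these estimates settle toward fixed values as the sample grows instead of turning every tiny deviation into a rejection. Whether a given skewness is acceptable still depends on the downstream method, so this is a heuristic, not a referee.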
“Maybe plotting a histogram and looking at it would be easier after all, even if it isn’t automated!” you sigh to yourself. But then, surprise! You can’t trust your ‘eyes’ either, as in “Is the following normally distributed, or what shall I trust after all?”
And you give up after reading “If my histogram shows a bell-shaped curve, can I say my data is normally distributed?”.
Resources for those who want to learn more about Normality Testing
- Normality test according to Wikipedia
- A Gentle Introduction to Normality Tests in Python
- Testing for Normality — Applications with Python
- Statistical notes for clinical researchers: assessing normal distribution (2) using skewness and kurtosis
- Normality Tests for Statistical Analysis: A Guide for Non-Statisticians
- Testing for Normality
- 68–95–99.7 rule
- Is the Shapiro-Wilk test only applicable to smaller sample sizes?
- scipy.stats.normaltest documentation