Statistical significance is commonly discussed in market research, but many people have an incomplete and often inaccurate understanding of how stat testing works and how its results should be interpreted. Below, we dive into what statistical significance is and clear up some of the most common misunderstandings around it.
We will try to be more entertaining than your monotone STATS 101 professor.
1. The null hypothesis is the key to understanding stat testing
Stat testing is built on the idea of the null hypothesis. Many people find this concept odd and confusing, but it is simple and is a key tenet of scientific investigation. There are always two hypotheses: the null and the alternative. The null hypothesis says there is no real difference between the two groups (the two samples come from the same population), while the alternative says there is a difference. Here is the part most people get backwards: stat tests do not tell you the probability that the null is true. Instead, they assume the null is true and then ask how likely it would be to see data like yours under that assumption. Only if that probability (the p-value) comes back very low (usually below .10 or .05, though the cutoff varies by field and usage occasion) can we safely reject the null and accept the alternative hypothesis.
So let's look at this in terms of a real example. Say you are running a Coke vs Pepsi challenge taste test (blind, of course, since you are no amateur!) and you want to see whether one brand tastes better than the other. For the purposes of stat testing, the null hypothesis is that both brands taste the same, and the alternative hypothesis is that one brand tastes better than the other. The stat testing formula returns a p-value: the probability of seeing a preference gap at least as large as the one in your data if the brands really did taste the same. So if the p-value comes back as .04, it means a world with no real taste difference would produce a gap this big only about 4 times in 100, so you call it a significant difference in taste perceptions.
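To make that concrete, here is a minimal simulation sketch in Python (the numbers are made up: 273 of 500 blind tasters picking Coke). It estimates the p-value exactly the way the definition above describes, by counting how often a no-difference world produces a split at least as lopsided as the observed one.

```python
import numpy as np

# Hypothetical blind taste test: 273 of 500 tasters pick Coke.
n_tasters, observed = 500, 273

# Simulate 100,000 taste tests in a world where the null is true:
# every taster is equally likely to pick either brand.
rng = np.random.default_rng(42)
null_picks = rng.binomial(n=n_tasters, p=0.5, size=100_000)

# p-value: how often does that "no difference" world produce a
# split at least as lopsided as the one we observed (either way)?
extreme = np.abs(null_picks - n_tasters / 2) >= abs(observed - n_tasters / 2)
print(f"Simulated p-value: {extreme.mean():.3f}")  # lands around .04
```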
2. Remember that it is always a probability
Many people have trouble with this concept, but stat testing always deals in probabilities; it is never binary, and treating it as black and white is a mistake. In the taste test example above, the .04 means there is still a 4% chance of seeing a gap that large even when there is no real taste difference. You are always weighing the sensitivity of your test against the odds that it will create more errors by being either too stringent or too lenient. And when someone says "this is significant," always ask what level the tests were run at, since it is not uncommon to see a wide range of levels used in practice.
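To see that probabilistic nature in action, here is a short sketch (again with hypothetical numbers) that reruns the 500-taster test 10,000 times in a world where the null is true. Roughly 5% of the runs still come back "significant" at the .05 level, which is exactly the false-positive rate that level implies.

```python
import numpy as np
from scipy.stats import norm

# Rerun the 500-taster test 10,000 times with no real difference.
rng = np.random.default_rng(7)
n_tests, n_tasters = 10_000, 500
picks = rng.binomial(n=n_tasters, p=0.5, size=n_tests)

# Two-sided p-value via the normal approximation to the binomial.
z = (picks - n_tasters / 2) / np.sqrt(n_tasters * 0.25)
p_values = 2 * norm.sf(np.abs(z))

# Roughly 5% of null tests get flagged (the binomial's
# discreteness makes it approximate rather than exact).
print(f"Flagged significant at .05: {(p_values < 0.05).mean():.1%}")
```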
3. Confidence intervals and stat testing are not the same thing
Many people mistakenly believe that you can compare the confidence intervals of two samples and know whether there is a significant difference based on whether the intervals overlap. This is not the case: two samples can have significantly different means even when their confidence intervals overlap. (The reverse does hold: if two 95% intervals do not overlap at all, the difference is significant.)
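Here is a quick numeric illustration, using made-up summary statistics, of two means whose 95% intervals overlap even though a standard two-sample t-test finds a significant difference.

```python
from scipy import stats

# Hypothetical summary stats: two groups rate a product on 0-100.
mean1, sd1, n1 = 50.0, 15.0, 200
mean2, sd2, n2 = 53.0, 15.0, 200

# 95% confidence interval for each group's mean.
for label, m, sd, n in [("Group 1", mean1, sd1, n1),
                        ("Group 2", mean2, sd2, n2)]:
    half = 1.96 * sd / n ** 0.5
    print(f"{label}: {m:.0f}, 95% CI [{m - half:.1f}, {m + half:.1f}]")

# Two-sample t-test on the same summary statistics.
t, p = stats.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2)
print(f"t = {t:.2f}, p = {p:.3f}")  # p < .05 despite the overlapping CIs
```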
4. Sample size is critical
Sample size has a massive influence on significance testing. With a very large sample, even small differences can come back significant; with a small sample, even very real differences can be hard to detect. Always know your sample size and how it impacts your significance testing. As a rule of thumb, most tests are pretty sensitive at 300 respondents or more, and at 1,000 respondents or more the test is going to be very sensitive.
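As a rough illustration of how the same observed gap plays out at different sample sizes, here is a sketch using a simple two-proportion z-test (the 52% vs 44% figures are hypothetical).

```python
from math import sqrt
from scipy.stats import norm

def two_prop_p(p1, p2, n):
    """Two-sided z-test p-value for a gap between two equal-sized groups."""
    pooled = (p1 + p2) / 2
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    return 2 * norm.sf(abs(p1 - p2) / se)

# Same 8-point gap (52% vs 44% agreement), three sample sizes.
for n in (100, 300, 1000):
    print(f"n = {n:>4} per group: p = {two_prop_p(0.52, 0.44, n):.3f}")
# n = 100: ~.26 (missed), n = 300: ~.05 (borderline), n = 1000: <.001 (clear)
```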
5. Significant does not equal important
This is perhaps the most important misunderstanding related to significance testing. Sig testing is about identifying reliable differences, not about identifying important differences. A difference between two groups can be very small, to the point of being immaterial, but consistent enough to be significant. This is especially common with large sample sizes. Always weigh the size of the effect, not just the significance level, when judging whether a difference actually matters.
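One last sketch (hypothetical numbers again) shows the flip side of the sample-size point above: with 100,000 respondents per group, even a half-point gap tests as significant, which is why effect size has to be judged separately from the p-value.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical: 50.5% of one group vs 50.0% of another agree with a
# statement, 100,000 respondents per group. A half-point gap is
# immaterial for most decisions, yet it tests as significant.
p1, p2, n = 0.505, 0.500, 100_000
pooled = (p1 + p2) / 2
se = sqrt(pooled * (1 - pooled) * 2 / n)
z = (p1 - p2) / se
print(f"gap = {p1 - p2:.1%}, z = {z:.2f}, p = {2 * norm.sf(z):.3f}")  # p ~ .025
```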