Understanding what ‘statistical significance’ means on Geeklab can be confusing, but don’t worry!
In this post, we’ll dive a little bit deeper into what calculating statistical significance on Geeklab entails.
What does significance mean?
Statistical significance is a term used in data analysis to describe the reliability of, and confidence placed in, the results of a study or experiment. When we say that something is statistically significant, it means that the findings or differences observed are unlikely to have occurred by chance.
To determine statistical significance, researchers use statistical tests and calculations to analyse data. These tests help assess whether the observed results are meaningful and not simply due to random variation. If the statistical analysis shows that the likelihood of the observed results occurring by chance is very low, then we can say the results are statistically significant.
Relationship between winning probabilities and significance
Understanding the relationship between winning probabilities and statistical significance is crucial in A/B testing. A winning probability reflects how likely one variant is to outperform the other, providing insight into which variant is more likely to succeed. By considering winning probabilities and statistical significance together, you can confidently identify the variant with the higher chance of success and make data-driven decisions.
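The post doesn’t specify exactly how Geeklab computes winning probabilities, but a common way to estimate one is a Bayesian comparison of the two variants’ conversion rates. The sketch below is illustrative only: the impression and conversion counts are made up, and a simple Beta-Binomial model is assumed.

```python
import numpy as np

# Hypothetical counts -- not real Geeklab data.
impressions_a, conversions_a = 10_000, 200   # variant A: 2% CVR
impressions_b, conversions_b = 10_000, 400   # variant B: 4% CVR

rng = np.random.default_rng(seed=42)

# With a uniform Beta(1, 1) prior, each variant's posterior over its true
# conversion rate is Beta(1 + conversions, 1 + non-conversions).
samples_a = rng.beta(1 + conversions_a, 1 + impressions_a - conversions_a, size=100_000)
samples_b = rng.beta(1 + conversions_b, 1 + impressions_b - conversions_b, size=100_000)

# Winning probability: the share of posterior draws where B beats A.
p_b_beats_a = (samples_b > samples_a).mean()
print(f"P(variant B beats variant A) = {p_b_beats_a:.3f}")
```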
In the context of A/B testing, there can be several types of significance as well as daily significance values. Different significance figures serve different purposes, such as overall campaign performance, subgroup analysis, or specific metrics like click-through rate or conversion rate. Significance thresholds may vary based on the specific requirements of the analysis and the level of confidence desired. Daily significance values add a temporal perspective, making it possible to monitor performance changes over time.
Calculating statistical significance
Let’s use a concrete example of this.
You create two icon variants to test against each other: variant A and variant B. You want to determine whether there is a statistically significant difference in the conversion rate (CVR) of the two variants.
Variant A has a CVR of 2%, while variant B has a CVR of 4%. It seems like variant B might be performing better, but we need to determine whether this difference is statistically significant.
To assess statistical significance, we conduct a statistical test on the conversion rate data. The test estimates how likely a difference in CVR at least as large as the observed one would be if there were in fact no real difference between variant A and variant B, and it reports this probability as a p-value.
Let’s say the statistical test produces a p-value of 0.03. Since 0.03 is below the 0.05 threshold that corresponds to a 95% confidence level, we can conclude that there is a statistically significant difference in the conversion rates of variant A and variant B.
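To see where a number like that can come from, here is a minimal sketch of a two-proportion z-test in Python. The post doesn’t state the sample sizes, so the counts below are hypothetical, chosen so the result lands near a p-value of 0.03.

```python
import math

# Hypothetical sample sizes and conversion counts -- not stated in the post.
n_a, conv_a = 700, 14   # variant A: 2% CVR
n_b, conv_b = 700, 28   # variant B: 4% CVR

p_a, p_b = conv_a / n_a, conv_b / n_b

# Pooled conversion rate under the null hypothesis of no real difference.
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

# Two-proportion z statistic and its two-sided p-value.
z = (p_b - p_a) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

print(f"z = {z:.2f}, p-value = {p_value:.3f}")  # roughly 0.03 with these counts
```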
So, in this campaign example, statistical significance would suggest that variant B’s higher conversion rate is not simply a coincidence but is likely a result of a real effect. It allows you to make data-driven decisions by identifying which icon variant is performing better.
What if my campaign doesn’t reach significance?
Sometimes a campaign doesn’t reach significance, but that doesn’t mean the test was pointless. In such cases, the following approaches can still surface useful insights:
Consider confidence intervals
Look at the confidence intervals for the performance metrics of variants A and B. A wider confidence interval indicates more uncertainty, while a narrower interval suggests higher precision. If the confidence intervals overlap substantially, it suggests that the observed differences may not be statistically significant, and there is no clear winner.
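As a rough illustration, this sketch computes 95% normal-approximation (Wald) confidence intervals for two conversion rates and checks whether they overlap; the counts are hypothetical.

```python
import math

def cvr_confidence_interval(conversions: int, n: int, z: float = 1.96):
    """95% normal-approximation (Wald) interval for a conversion rate."""
    p = conversions / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical counts for variants A and B.
ci_a = cvr_confidence_interval(20, 1000)
ci_b = cvr_confidence_interval(40, 1000)

print(f"Variant A 95% CI: {ci_a[0]:.2%} to {ci_a[1]:.2%}")
print(f"Variant B 95% CI: {ci_b[0]:.2%} to {ci_b[1]:.2%}")

# Substantial overlap suggests the difference may not be significant yet.
overlap = ci_a[1] > ci_b[0] and ci_b[1] > ci_a[0]
print("Intervals overlap:", overlap)
```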
Explore secondary metrics
In addition to the primary metric you are evaluating, consider examining secondary metrics that might provide additional insights. Look for any consistent patterns or trends across different metrics. While there may not be a clear winner in terms of the primary metric, secondary metrics could reveal nuances or variations that can help guide decision-making.
User segmentation analysis
Perform a user segmentation analysis to identify if specific segments of users or subgroups respond differently to variants A and B. By dissecting the data based on user attributes, behaviors, or preferences, you may uncover patterns that are not evident when looking at the aggregate data. This analysis can provide insights into which variant performs better for specific user segments, even if there isn’t a clear winner overall.
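For instance, a quick way to slice test data by user attributes is a grouped conversion-rate summary. The data frame below is entirely hypothetical.

```python
import pandas as pd

# Hypothetical per-user records: variant shown, a segment attribute, and
# whether the user converted.
df = pd.DataFrame({
    "variant":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "platform":  ["iOS", "iOS", "Android", "Android", "iOS", "iOS", "Android", "Android"],
    "converted": [0, 1, 0, 0, 1, 1, 0, 1],
})

# Conversion rate per (segment, variant): differences hidden in the
# aggregate numbers often appear once the data is split this way.
segment_cvr = (
    df.groupby(["platform", "variant"])["converted"]
      .agg(users="count", cvr="mean")
)
print(segment_cvr)
```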
Consider practical significance
Assess the practical significance of the observed differences. Even if the statistical significance is not apparent, consider the magnitude of the differences in terms of real-world impact. Evaluate whether the observed differences, although not statistically significant, are practically meaningful for your specific goals and context.
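One simple way to frame practical significance is to compare the observed lift against a minimum meaningful effect you define in advance; both numbers below are illustrative.

```python
# Observed conversion rates from the example above.
cvr_a, cvr_b = 0.02, 0.04

# The smallest lift worth acting on -- a business decision, not a statistic.
min_meaningful_lift = 0.01  # i.e. at least a 1 percentage point gain

observed_lift = cvr_b - cvr_a
print(f"Observed lift: {observed_lift:.1%} (threshold: {min_meaningful_lift:.1%})")
print("Practically meaningful:", observed_lift >= min_meaningful_lift)
```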
Iterate and optimize
A lack of a clear winner in an A/B test does not necessarily mean the test is inconclusive. It may indicate an opportunity for further optimization and iteration. Consider refining the variants or introducing new elements based on the insights gained from the test, and then retest to gather additional data and make more informed decisions.
Want to get started with testing? Reach us here 🤩