The "stop at significance" trap
Most A/B tools show real-time significance and tempt you to stop the moment you hit 95% confidence. This is the most common methodological error in CRO. CXL's 2024 analysis of 200+ "winning" tests that were stopped early found that 41% of those wins reversed when re-run to full duration. The reason: significance fluctuates wildly at low sample sizes. A test that hits 95% on day 3 will often dip back to 80% on day 5 and rebound to 92% on day 8. Stopping early selects whichever random fluctuation happens to look like a win at the moment you check.
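To see the effect for yourself, here is a small simulation in Python (illustrative traffic numbers and thresholds, not data from the CXL analysis): both variants share the exact same conversion rate, yet checking a two-proportion z-test every day and stopping at p < 0.05 declares a "winner" far more often than the nominal 5% false-positive rate.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(42)

TRUE_RATE = 0.05        # same conversion rate for both variants (an A/A test)
VISITORS_PER_DAY = 500  # per variant per day; illustrative traffic
DAYS = 28
ALPHA = 0.05
N_SIMULATIONS = 2000

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test with a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return erfc(abs(z) / sqrt(2))   # P(|Z| >= |z|) for a standard normal

peeking_wins = 0   # tests "won" by stopping the first day p < alpha
fixed_wins = 0     # tests significant at the single, pre-planned look

for _ in range(N_SIMULATIONS):
    conv_a = conv_b = n_a = n_b = 0
    stopped_early = False
    for _day in range(DAYS):
        conv_a += rng.binomial(VISITORS_PER_DAY, TRUE_RATE)
        conv_b += rng.binomial(VISITORS_PER_DAY, TRUE_RATE)
        n_a += VISITORS_PER_DAY
        n_b += VISITORS_PER_DAY
        if not stopped_early and p_value(conv_a, n_a, conv_b, n_b) < ALPHA:
            stopped_early = True            # "stop at significance"
    peeking_wins += stopped_early
    fixed_wins += p_value(conv_a, n_a, conv_b, n_b) < ALPHA

print(f"False positives, peeking daily:  {peeking_wins / N_SIMULATIONS:.1%}")
print(f"False positives, fixed duration: {fixed_wins / N_SIMULATIONS:.1%}")
```

There is no real difference to find in either arm, so every "win" is a false positive. The single pre-planned look stays near 5%; the daily-peeking strategy lands several times higher.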
The right mental model
Think of an A/B test as a sample from a population, not a race to a threshold. The question isn't "is the variant winning yet?" but "have I collected a large enough sample to know whether the difference is real?" The sample size you need depends on your baseline conversion rate and the smallest lift you want to be able to detect. Pre-calculate it with Evan Miller's calculator or your tool's built-in version, then run until you reach that number. Don't re-check the calculator mid-test.
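If you prefer to do the pre-calculation in code, here is a minimal sketch of the standard two-proportion sample-size formula, the same kind of computation Evan Miller's calculator performs. The baseline rate, target lift, and weekly traffic figure below are illustrative assumptions, not recommendations.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect `relative_lift` over `baseline`
    with a two-sided test at significance `alpha` and the given power."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)

# Example: 3% baseline conversion rate, aiming to detect a 10% relative lift
n = sample_size_per_variant(0.03, 0.10)
print(f"{n:,} visitors per variant")                 # roughly 53,000

# Turn that into a duration and commit to it before the test starts
weekly_visitors_per_variant = 5_000                  # illustrative traffic
print(f"about {ceil(n / weekly_visitors_per_variant)} weeks")
```

Note how quickly the required sample grows as the baseline rate or the target lift shrinks: that duration, fixed up front, is what you commit to instead of watching the significance readout.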
For low-traffic sites
If your traffic can't support a rigorous sample size (under 1,000 weekly conversions), abandon A/B testing for now. Use qualitative methods, such as heatmaps, session recordings, and user interviews, until traffic catches up. A bad A/B test with 200 conversions per variant is worse than no test at all. We cover this trade-off in our A/B testing priority framework.
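To put a rough number on that last claim, this sketch inverts the same power calculation to find the smallest relative lift a test with about 200 conversions per variant could reliably detect. The 3% baseline and the visitor count are illustrative assumptions, not figures from the article.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_lift(baseline, visitors_per_variant, alpha=0.05, power=0.80):
    """Approximate smallest relative lift detectable at the given traffic,
    using the baseline's variance for both variants."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    absolute_diff = z * sqrt(2 * baseline * (1 - baseline) / visitors_per_variant)
    return absolute_diff / baseline

# ~200 conversions per variant at a 3% baseline is roughly 6,700 visitors each
print(f"{minimum_detectable_lift(0.03, 6_700):.0%}")   # about a 28% relative lift
```

Under those assumptions you could only detect a lift of roughly 28% or more; anything smaller, which is most real-world wins, would be indistinguishable from noise.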