Interpreting Results

How to read and act on your A/B test results in SortLab, including understanding metrics, statistical significance, and next steps.

Your A/B test has finished running. Now it's time to look at the data, understand what it means, and decide on your next move. This guide walks you through everything you need to know to make a confident, data-driven decision.

Reading the results dashboard

When a test completes, SortLab presents a results summary that compares the two variants side by side. You'll see:

  • Variant labels — Control (A) with the "Current" badge and Challenger (B) with the "New" badge, so you always know which strategy is which.
  • Performance metrics — Revenue, Orders, and Conversion Rate for each variant.
  • Percentage difference — How much better (or worse) the Challenger performed compared to the Control for each metric.
  • Winner indication — SortLab highlights the variant that came out ahead based on your primary metric.

The dashboard is designed to give you the full picture at a glance, but take a moment to look at all the metrics before making a decision.

Understanding the metrics

SortLab tracks three key metrics during every A/B test. Here's what each one tells you and why it matters.

Revenue

Revenue is the total dollar amount generated from the tested collection during each variant's active periods. This is typically the most important metric, since the primary goal of sorting optimization is to maximize the money your store earns.

How to read it: If Challenger (B) generated $12,400 in revenue while Control (A) generated $10,800, the Challenger produced roughly 15% more revenue. Provided the result is statistically significant, that's a strong signal that the new strategy is worth adopting.
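
If you want to double-check the percentage difference yourself, the arithmetic is simple. A minimal Python sketch using the example figures above (illustrative numbers, not real data):

    # Percent lift of Challenger (B) over Control (A)
    control_revenue = 10_800
    challenger_revenue = 12_400

    lift = (challenger_revenue - control_revenue) / control_revenue * 100
    print(f"Revenue lift: {lift:.1f}%")  # -> Revenue lift: 14.8%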

Orders

Orders represent the total number of transactions that included at least one product from the tested collection. A strategy that generates more orders is doing a better job of getting customers to buy.

How to read it: More orders with similar revenue means customers are buying more frequently but spending less per order. Fewer orders with higher revenue means customers are purchasing higher-value items. Consider what matters more for your business.
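
A quick way to untangle this trade-off is to compute average order value (revenue divided by orders) for each variant. A small sketch with made-up numbers:

    # Average order value (AOV) = revenue / orders, per variant (illustrative numbers)
    variants = {
        "Control (A)": {"revenue": 10_800, "orders": 240},
        "Challenger (B)": {"revenue": 12_400, "orders": 230},
    }

    for name, m in variants.items():
        aov = m["revenue"] / m["orders"]
        print(f"{name}: {m['orders']} orders, AOV ${aov:.2f}")
    # Control (A): 240 orders, AOV $45.00
    # Challenger (B): 230 orders, AOV $53.91 (fewer orders, higher-value items)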

Conversion Rate

Conversion rate is the percentage of visitors who viewed the collection and went on to make a purchase. It measures how effectively a sorting strategy turns browsers into buyers.

How to read it: A higher conversion rate means the strategy is doing a better job of showing the right products to the right people. Even a small improvement in conversion rate, say from 3.2% to 3.8%, can have a significant impact on revenue over time.
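
To see why a small conversion-rate lift matters, translate it into orders and revenue. A sketch using the 3.2% to 3.8% example above; the 10,000 monthly visitors and $60 average order value are assumptions for illustration:

    # Revenue impact of a conversion-rate lift (assumed traffic and AOV)
    monthly_visitors = 10_000
    avg_order_value = 60.0  # hypothetical

    for rate in (0.032, 0.038):
        orders = monthly_visitors * rate
        revenue = orders * avg_order_value
        print(f"{rate:.1%} conversion -> {orders:.0f} orders, ${revenue:,.0f}/month")
    # 3.2% conversion -> 320 orders, $19,200/month
    # 3.8% conversion -> 380 orders, $22,800/month (a $3,600 monthly difference)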

No single metric tells the whole story. A strategy might have a lower conversion rate but higher revenue because it promotes premium products. Look at all three metrics together to get the full picture.

What is statistical significance?

You might notice that one variant outperforms the other in your results, but how do you know the difference is real and not just due to random chance? That's where statistical significance comes in.

Statistical significance means the observed difference in performance between the two variants is large enough and consistent enough that it's very unlikely to be a coincidence. In practical terms:

  • Significant result — You can be confident that the winning strategy genuinely performs better. It's safe to adopt it.
  • Not significant — The difference between the two variants is too small or inconsistent to draw a reliable conclusion. The observed gap could just be noise.

Think of it like flipping a coin. If you flip 10 times and get 6 heads, that doesn't prove the coin is unfair — it could be random variation. But if you flip 1,000 times and get 600 heads, something is clearly going on. More data and larger differences lead to stronger conclusions.
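
SortLab runs the significance calculation for you, but the coin-flip intuition is easy to verify. A minimal sketch using SciPy's exact binomial test (SortLab's own method isn't described here and may differ):

    # Exact binomial test: is the observed heads count consistent with a fair coin?
    from scipy.stats import binomtest

    for heads, flips in ((6, 10), (600, 1000)):
        p = binomtest(heads, flips, p=0.5).pvalue
        print(f"{heads}/{flips} heads: p-value = {p:.3g}")
    # 6/10 heads: p-value = 0.754 -> easily explained by chance
    # 600/1000 heads: p-value is tiny (~1e-10) -> almost certainly not a fair coin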

What affects statistical significance

Several factors influence whether your test reaches statistical significance (the sketch after this list shows how the first two interact):

  • Traffic volume — Higher-traffic collections produce significant results faster because there's more data to work with.
  • Size of the difference — A large performance gap between variants is easier to confirm as significant than a small one.
  • Test duration — Longer tests collect more data, increasing the chances of reaching significance.
  • Consistency — If one variant performs better most of the time (not just during a few spikes), significance is more likely.
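
To get a feel for how traffic volume and the size of the difference interact, you can estimate how many visitors a test needs before a given gap becomes detectable. A sketch using statsmodels; the 3.2% vs. 3.8% rates, the 80% power target, and the 400-visitors-per-day figure are all assumptions for illustration:

    # Rough sample-size estimate for detecting a conversion-rate difference
    from statsmodels.stats.proportion import proportion_effectsize
    from statsmodels.stats.power import NormalIndPower

    effect = proportion_effectsize(0.038, 0.032)  # 3.8% vs. 3.2% conversion
    n_per_variant = NormalIndPower().solve_power(
        effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
    )

    daily_visitors = 400  # hypothetical traffic to the tested collection
    days = 2 * n_per_variant / daily_visitors
    print(f"~{n_per_variant:,.0f} visitors per variant, ~{days:.0f} days at {daily_visitors}/day")
    # roughly 7,300 visitors per variant, about 37 days at this traffic level

A smaller gap (say, 3.2% vs. 3.4%) pushes the required sample size far higher, which is part of why low-traffic collections take so long to reach significance.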

When to end a test early

In most cases, you should let your test run for its full configured duration. However, there are a few situations where ending early makes sense:

  • Clear, overwhelming winner — If one variant is outperforming the other by a wide margin (for example, 30%+ more revenue) and the result is statistically significant, you have your answer.
  • Negative impact on revenue — If the Challenger is performing dramatically worse and you're concerned about lost sales, it's reasonable to stop the test and stick with the Control.
  • External disruption — If something unusual happened during the test (a flash sale, a site outage, a viral social media post), the data may be unreliable. Consider stopping the test and running a new one under normal conditions.

Avoid ending a test early just because one variant looks like it's ahead after a day or two. Early results are often misleading because they're based on limited data. Short-term fluctuations can create the appearance of a winner that doesn't hold up over time.

Applying the winning strategy

Once you've identified a clear winner, here's how to put it into action:

  1. Review the full results — Confirm that the winning variant outperforms on the metrics that matter most to your business. Revenue is usually the north star, but consider orders and conversion rate too.
  2. Apply the strategy — If the Challenger won, go to the collection and switch its sorting strategy to match the Challenger's configuration. If the Control won, no changes are needed — your current strategy is already the best option.
  3. Monitor performance — After applying the winning strategy, keep an eye on your collection's metrics in the Analytics dashboard to confirm the improvement holds.

Winning strategies can change over time as your product catalog, customer base, and market conditions evolve. Consider re-testing every few months to make sure your sorting stays optimized.

Common pitfalls and how to avoid them

Ending tests too early

As mentioned above, this is the most common mistake. A few days of data is rarely enough to reach a reliable conclusion. Stick to the configured duration unless there's a strong reason to stop.

Ignoring statistical significance

If a test shows that Challenger (B) earned 2% more revenue but the result is not statistically significant, don't treat it as a win. The difference could easily be due to random variation. In this case, either run the test longer or accept that the two strategies perform similarly.
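
The same caution applies to conversion rate. To see how a small gap can fail to clear the bar, here's a sketch comparing conversion counts with a chi-squared test (the counts are made up, and SortLab's internal method may differ):

    # Two-proportion comparison via a chi-squared test (illustrative counts)
    from scipy.stats import chi2_contingency

    # Rows: [converted, did not convert] for each variant
    table = [
        [160, 5_000 - 160],  # Control (A): 3.2% of 5,000 visitors
        [175, 5_000 - 175],  # Challenger (B): 3.5% of 5,000 visitors
    ]
    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"p-value = {p_value:.2f}")  # well above 0.05 -> the gap may just be noise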

Testing too many things at once

If you change the strategy, enable Quick Toggles, and modify the collection's products all at the same time, you won't know which change caused the difference. Keep your tests focused: change one thing, measure the impact, then iterate.

Not accounting for seasonality

A test that runs during Black Friday will look very different from one that runs in January. If you're testing during an unusual period (holiday sales, promotional events), the results may not reflect typical performance. Run follow-up tests during normal periods to confirm.

Running tests on low-traffic collections

Collections with very few visitors per day take much longer to produce statistically significant results. If a collection gets fewer than a handful of visitors daily, consider testing a higher-traffic collection first and applying the learnings more broadly.

Best practices for ongoing optimization

A/B testing isn't a one-time activity — it's an ongoing process of continuous improvement. Here's a framework for getting the most value over time:

  1. Start with your top collections. Focus on the collections that drive the most traffic and revenue. Improvements here have the biggest impact on your bottom line.
  2. Test bold changes first. Your first tests should compare meaningfully different strategies (for example, "Maximize Revenue" vs. "Promote New Arrivals"). Once you've found the best general approach, you can fine-tune with smaller variations.
  3. Keep a testing cadence. Set a recurring reminder to revisit your collections every quarter. Run new tests to make sure your strategies still perform well as your catalog and customer base change.
  4. Document what you learn. Keep notes on which strategies work best for which types of collections. Over time, you'll build intuition that makes each new test faster and more targeted.
  5. Combine with analytics. Use the Analytics dashboard alongside your test results to understand the bigger picture of how sorting impacts your store's performance.

Next steps

Now that you know how to interpret results, you're equipped to run a continuous optimization loop for your store.
