Bayesian A/B Testing with Python
A hands-on tutorial for implementing Bayesian A/B tests that provide more intuitive results than traditional methods.
In this tutorial, you'll learn how to implement Bayesian A/B testing, which provides more intuitive and actionable results than traditional frequentist approaches.
Why Bayesian A/B Testing?
Traditional A/B testing has well-known limitations:
- P-values are widely misinterpreted (a p-value is not the probability that the null hypothesis is true)
- Sample sizes must be fixed in advance; peeking at interim results inflates the false-positive rate
- Results collapse into a binary "significant / not significant" verdict
- It can't directly answer the question you actually care about: "which variant is better?"
Bayesian A/B testing addresses these issues by providing:
- Direct probability statements ("Variant B has a 95% probability of being better")
- Flexibility to stop tests early or continue collecting data
- Full probability distributions over possible effects
- Natural incorporation of prior information
Setup
First, install the required packages (the core tutorial only uses numpy, scipy, and matplotlib; pymc is useful for the advanced topics at the end):

```bash
pip install numpy scipy matplotlib pymc
```
The Beta-Binomial Model
For conversion rate testing, we use the Beta-Binomial conjugate model: the Beta distribution is the conjugate prior for the binomial likelihood, so the posterior is also a Beta distribution and can be written down in closed form, with no MCMC required:
```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt


class BayesianABTest:
    def __init__(self, alpha_prior=1, beta_prior=1):
        """
        Initialize with Beta prior parameters.
        Beta(1, 1) is a uniform prior (no prior information).
        """
        self.alpha_prior = alpha_prior
        self.beta_prior = beta_prior

    def update(self, conversions, trials):
        """
        Update the posterior distribution given observed data.

        Parameters
        ----------
        conversions : int
            Number of successful conversions
        trials : int
            Total number of trials
        """
        alpha_post = self.alpha_prior + conversions
        beta_post = self.beta_prior + (trials - conversions)
        return stats.beta(alpha_post, beta_post)
```
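A quick way to see what `update` does: with a uniform Beta(1, 1) prior, observing 120 conversions in 1000 trials yields a Beta(121, 881) posterior, whose mean is 121/1002. This standalone sketch (not part of the class above) reproduces the arithmetic:

```python
from scipy import stats

# Conjugate update for 120 conversions in 1000 trials, uniform Beta(1, 1) prior
alpha_post = 1 + 120               # prior alpha + conversions
beta_post = 1 + (1000 - 120)       # prior beta + failures
posterior = stats.beta(alpha_post, beta_post)

posterior_mean = posterior.mean()  # equals 121 / 1002, roughly 0.1208
print(posterior_mean)
```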
Running the Test
Let's analyze a realistic A/B test scenario:
```python
# Test data
variant_a_trials = 1000
variant_a_conversions = 120
variant_b_trials = 1000
variant_b_conversions = 145

# Create test instance
test = BayesianABTest(alpha_prior=1, beta_prior=1)

# Calculate posterior distributions
posterior_a = test.update(variant_a_conversions, variant_a_trials)
posterior_b = test.update(variant_b_conversions, variant_b_trials)

# Visualize
x = np.linspace(0, 0.25, 1000)
plt.figure(figsize=(10, 6))
plt.plot(x, posterior_a.pdf(x), label='Variant A', linewidth=2)
plt.plot(x, posterior_b.pdf(x), label='Variant B', linewidth=2)
plt.xlabel('Conversion Rate')
plt.ylabel('Probability Density')
plt.title('Posterior Distributions')
plt.legend()
plt.show()
```
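Because we have full posterior distributions (not just point estimates), we can also read off credible intervals directly from the frozen scipy distributions. A short standalone sketch, recomputing the same posteriors so it runs on its own:

```python
from scipy import stats

# Same posteriors as above: Beta(1 + conversions, 1 + failures)
posterior_a = stats.beta(1 + 120, 1 + 880)
posterior_b = stats.beta(1 + 145, 1 + 855)

# Central 95% credible intervals for each conversion rate
ci_a = posterior_a.interval(0.95)
ci_b = posterior_b.interval(0.95)
print(f"Variant A 95% CI: ({ci_a[0]:.4f}, {ci_a[1]:.4f})")
print(f"Variant B 95% CI: ({ci_b[0]:.4f}, {ci_b[1]:.4f})")
```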
Key Metrics
Calculate actionable metrics:
```python
def calculate_probability_b_better(posterior_a, posterior_b, n_samples=100000):
    """
    Estimate the probability that B is better than A via Monte Carlo sampling.
    """
    samples_a = posterior_a.rvs(n_samples)
    samples_b = posterior_b.rvs(n_samples)
    return (samples_b > samples_a).mean()


def calculate_expected_loss(posterior_a, posterior_b, n_samples=100000):
    """
    Estimate the expected loss (in conversion-rate points) of choosing each variant.
    """
    samples_a = posterior_a.rvs(n_samples)
    samples_b = posterior_b.rvs(n_samples)
    loss_choosing_a = np.maximum(samples_b - samples_a, 0).mean()
    loss_choosing_b = np.maximum(samples_a - samples_b, 0).mean()
    return loss_choosing_a, loss_choosing_b


# Calculate metrics
prob_b_better = calculate_probability_b_better(posterior_a, posterior_b)
loss_a, loss_b = calculate_expected_loss(posterior_a, posterior_b)

print(f"Probability B is better: {prob_b_better:.1%}")
print(f"Expected loss choosing A: {loss_a:.4f}")
print(f"Expected loss choosing B: {loss_b:.4f}")
```
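Monte Carlo estimates carry sampling noise. As a cross-check (a sketch of one alternative, not part of the original tutorial), P(B > A) for two independent Beta posteriors can also be computed by numerical integration, since P(B > A) equals the integral of pdf_B(x) * cdf_A(x) over [0, 1]:

```python
from scipy import stats
from scipy.integrate import quad

# Posteriors from the tutorial's data: Beta(1 + conversions, 1 + failures)
posterior_a = stats.beta(1 + 120, 1 + 880)
posterior_b = stats.beta(1 + 145, 1 + 855)

# P(B > A) = integral over [0, 1] of pdf_B(x) * cdf_A(x) dx;
# the `points` hint tells quad where the sharply peaked mass lies
prob_b_better_exact, _ = quad(
    lambda x: posterior_b.pdf(x) * posterior_a.cdf(x),
    0, 1, points=[0.10, 0.13, 0.16],
)
print(f"P(B > A) by integration: {prob_b_better_exact:.4f}")
```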
Decision Making
Use these metrics to make decisions:
- High confidence threshold: if P(B > A) > 95%, choose B
- Loss threshold: if the expected loss of a choice is below your tolerance (e.g., 0.001, i.e. 0.1 percentage points of conversion rate), the risk of that choice is acceptable
- Relative improvement: calculate the expected lift from switching
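The first two rules above can be combined into a small decision helper. This is a minimal sketch with hypothetical names and thresholds, not a prescribed policy; tune the thresholds to your own risk tolerance:

```python
def recommend(prob_b_better, loss_a, loss_b,
              prob_threshold=0.95, loss_threshold=0.001):
    """Toy decision rule: require both high confidence and acceptable loss."""
    if prob_b_better > prob_threshold and loss_b < loss_threshold:
        return "choose B"
    if prob_b_better < 1 - prob_threshold and loss_a < loss_threshold:
        return "choose A"
    return "keep testing"

print(recommend(0.96, 0.025, 0.0004))  # → choose B
print(recommend(0.50, 0.0100, 0.0100))  # → keep testing
```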
Advanced Topics
For more sophisticated analyses:
- Multiple variants: Extend to multi-armed bandit problems
- Continuous metrics: Use Normal-Normal conjugate models for revenue
- Hierarchical models: Pool information across related tests
- Sequential testing: Optimal stopping rules for Bayesian tests
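For the multi-variant case, Thompson sampling is a natural bridge from the Beta posteriors above to a multi-armed bandit: draw one sample from each variant's posterior and serve the variant with the highest draw. A minimal sketch with made-up data for a hypothetical third variant:

```python
import numpy as np

rng = np.random.default_rng(42)

# Per-variant Beta posterior parameters: 1 + conversions, 1 + failures
# (variants A and B from the tutorial, plus a hypothetical variant C)
alphas = np.array([1 + 120, 1 + 145, 1 + 130], dtype=float)
betas = np.array([1 + 880, 1 + 855, 1 + 870], dtype=float)

def thompson_pick(alphas, betas, rng):
    """Draw one sample from each posterior; serve the variant with the highest draw."""
    draws = rng.beta(alphas, betas)
    return int(np.argmax(draws))

# Over many picks, traffic concentrates on the variant most likely to be best
picks = np.bincount([thompson_pick(alphas, betas, rng) for _ in range(10000)],
                    minlength=3)
print(picks)
```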
Conclusion
Bayesian A/B testing provides intuitive, actionable insights that help make better decisions faster. The ability to directly answer "what's the probability B is better?" makes results much easier to communicate to stakeholders.
Try implementing this on your next A/B test and see the difference!