Bayesian A/B Testing with Python
A hands-on tutorial for implementing Bayesian A/B tests that provide more intuitive results than traditional methods.
In this tutorial, you'll learn how to implement Bayesian A/B testing, which provides more intuitive and actionable results than traditional frequentist approaches.
Why Bayesian A/B Testing?
Traditional A/B testing has well-known limitations:
- P-values are widely misinterpreted (a p-value is not the probability that the null hypothesis is true)
- Sample sizes must be fixed in advance; peeking at interim results inflates the false-positive rate
- Results collapse into a binary "significant / not significant" verdict
- It can't directly answer the question you actually care about: "which variant is better?"
Bayesian A/B testing addresses these issues by providing:
- Direct probability statements ("Variant B has a 95% probability of being better")
- Flexibility to stop tests early or continue collecting data
- Full probability distributions over possible effects
- Natural incorporation of prior information
Setup
First, install the required packages (the core tutorial only uses numpy, scipy, and matplotlib; pymc is useful for the advanced topics at the end):

```bash
pip install numpy scipy matplotlib pymc
```
The Beta-Binomial Model
For conversion rate testing, we use the Beta-Binomial conjugate model: the Beta distribution is the conjugate prior for the binomial likelihood, so the posterior is also a Beta distribution and can be written down in closed form, with no MCMC required:
```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt


class BayesianABTest:
    def __init__(self, alpha_prior=1, beta_prior=1):
        """
        Initialize with Beta prior parameters.
        Beta(1, 1) is a uniform prior (no prior information).
        """
        self.alpha_prior = alpha_prior
        self.beta_prior = beta_prior

    def update(self, conversions, trials):
        """
        Update the posterior distribution given observed data.

        Parameters
        ----------
        conversions : int
            Number of successful conversions
        trials : int
            Total number of trials
        """
        alpha_post = self.alpha_prior + conversions
        beta_post = self.beta_prior + (trials - conversions)
        return stats.beta(alpha_post, beta_post)
```
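A quick way to see what `update` does: with a uniform Beta(1, 1) prior, observing 120 conversions in 1000 trials yields a Beta(121, 881) posterior, whose mean is 121/1002. This standalone sketch (not part of the class above) reproduces the arithmetic:

```python
from scipy import stats

# Conjugate update for 120 conversions in 1000 trials, uniform Beta(1, 1) prior
alpha_post = 1 + 120               # prior alpha + conversions
beta_post = 1 + (1000 - 120)       # prior beta + failures
posterior = stats.beta(alpha_post, beta_post)

posterior_mean = posterior.mean()  # equals 121 / 1002, roughly 0.1208
print(posterior_mean)
```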
Running the Test
Let's analyze a realistic A/B test scenario:
```python
# Test data
variant_a_trials = 1000
variant_a_conversions = 120
variant_b_trials = 1000
variant_b_conversions = 145

# Create test instance
test = BayesianABTest(alpha_prior=1, beta_prior=1)

# Calculate posterior distributions
posterior_a = test.update(variant_a_conversions, variant_a_trials)
posterior_b = test.update(variant_b_conversions, variant_b_trials)

# Visualize
x = np.linspace(0, 0.25, 1000)
plt.figure(figsize=(10, 6))
plt.plot(x, posterior_a.pdf(x), label='Variant A', linewidth=2)
plt.plot(x, posterior_b.pdf(x), label='Variant B', linewidth=2)
plt.xlabel('Conversion Rate')
plt.ylabel('Probability Density')
plt.title('Posterior Distributions')
plt.legend()
plt.show()
```
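Because we have full posterior distributions (not just point estimates), we can also read off credible intervals directly from the frozen scipy distributions. A short standalone sketch, recomputing the same posteriors so it runs on its own:

```python
from scipy import stats

# Same posteriors as above: Beta(1 + conversions, 1 + failures)
posterior_a = stats.beta(1 + 120, 1 + 880)
posterior_b = stats.beta(1 + 145, 1 + 855)

# Central 95% credible intervals for each conversion rate
ci_a = posterior_a.interval(0.95)
ci_b = posterior_b.interval(0.95)
print(f"Variant A 95% CI: ({ci_a[0]:.4f}, {ci_a[1]:.4f})")
print(f"Variant B 95% CI: ({ci_b[0]:.4f}, {ci_b[1]:.4f})")
```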
Key Metrics
Calculate actionable metrics:
```python
def calculate_probability_b_better(posterior_a, posterior_b, n_samples=100000):
    """
    Estimate the probability that B is better than A via Monte Carlo sampling.
    """
    samples_a = posterior_a.rvs(n_samples)
    samples_b = posterior_b.rvs(n_samples)
    return (samples_b > samples_a).mean()


def calculate_expected_loss(posterior_a, posterior_b, n_samples=100000):
    """
    Estimate the expected loss (in conversion-rate points) of choosing each variant.
    """
    samples_a = posterior_a.rvs(n_samples)
    samples_b = posterior_b.rvs(n_samples)
    loss_choosing_a = np.maximum(samples_b - samples_a, 0).mean()
    loss_choosing_b = np.maximum(samples_a - samples_b, 0).mean()
    return loss_choosing_a, loss_choosing_b


# Calculate metrics
prob_b_better = calculate_probability_b_better(posterior_a, posterior_b)
loss_a, loss_b = calculate_expected_loss(posterior_a, posterior_b)

print(f"Probability B is better: {prob_b_better:.1%}")
print(f"Expected loss choosing A: {loss_a:.4f}")
print(f"Expected loss choosing B: {loss_b:.4f}")
```
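Monte Carlo estimates carry sampling noise. As a cross-check (a sketch of one alternative, not part of the original tutorial), P(B > A) for two independent Beta posteriors can also be computed by numerical integration, since P(B > A) equals the integral of pdf_B(x) * cdf_A(x) over [0, 1]:

```python
from scipy import stats
from scipy.integrate import quad

# Posteriors from the tutorial's data: Beta(1 + conversions, 1 + failures)
posterior_a = stats.beta(1 + 120, 1 + 880)
posterior_b = stats.beta(1 + 145, 1 + 855)

# P(B > A) = integral over [0, 1] of pdf_B(x) * cdf_A(x) dx;
# the `points` hint tells quad where the sharply peaked mass lies
prob_b_better_exact, _ = quad(
    lambda x: posterior_b.pdf(x) * posterior_a.cdf(x),
    0, 1, points=[0.10, 0.13, 0.16],
)
print(f"P(B > A) by integration: {prob_b_better_exact:.4f}")
```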
Decision Making
Use these metrics to make decisions:
- High confidence threshold: if P(B > A) > 95%, choose B
- Loss threshold: if the expected loss of a choice is below your tolerance (e.g., 0.001, i.e. 0.1 percentage points of conversion rate), the risk of that choice is acceptable
- Relative improvement: calculate the expected lift from switching
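The first two rules above can be combined into a small decision helper. This is a minimal sketch with hypothetical names and thresholds, not a prescribed policy; tune the thresholds to your own risk tolerance:

```python
def recommend(prob_b_better, loss_a, loss_b,
              prob_threshold=0.95, loss_threshold=0.001):
    """Toy decision rule: require both high confidence and acceptable loss."""
    if prob_b_better > prob_threshold and loss_b < loss_threshold:
        return "choose B"
    if prob_b_better < 1 - prob_threshold and loss_a < loss_threshold:
        return "choose A"
    return "keep testing"

print(recommend(0.96, 0.025, 0.0004))  # → choose B
print(recommend(0.50, 0.0100, 0.0100))  # → keep testing
```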
Advanced Topics
For more sophisticated analyses:
- Multiple variants: Extend to multi-armed bandit problems
- Continuous metrics: Use Normal-Normal conjugate models for revenue
- Hierarchical models: Pool information across related tests
- Sequential testing: Optimal stopping rules for Bayesian tests
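For the multi-variant case, Thompson sampling is a natural bridge from the Beta posteriors above to a multi-armed bandit: draw one sample from each variant's posterior and serve the variant with the highest draw. A minimal sketch with made-up data for a hypothetical third variant:

```python
import numpy as np

rng = np.random.default_rng(42)

# Per-variant Beta posterior parameters: 1 + conversions, 1 + failures
# (variants A and B from the tutorial, plus a hypothetical variant C)
alphas = np.array([1 + 120, 1 + 145, 1 + 130], dtype=float)
betas = np.array([1 + 880, 1 + 855, 1 + 870], dtype=float)

def thompson_pick(alphas, betas, rng):
    """Draw one sample from each posterior; serve the variant with the highest draw."""
    draws = rng.beta(alphas, betas)
    return int(np.argmax(draws))

# Over many picks, traffic concentrates on the variant most likely to be best
picks = np.bincount([thompson_pick(alphas, betas, rng) for _ in range(10000)],
                    minlength=3)
print(picks)
```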
Conclusion
Bayesian A/B testing provides intuitive, actionable insights that help make better decisions faster. The ability to directly answer "what's the probability B is better?" makes results much easier to communicate to stakeholders.
Try implementing this on your next A/B test and see the difference!