Causal Impact Analysis in Python
Learn to measure intervention effects using Google's Causal Impact methodology with Python.
This tutorial teaches you how to measure the causal effect of an intervention using Bayesian structural time series models, based on Google's CausalImpact methodology.
Problem Statement
Suppose you launched a new marketing campaign on January 1st. You see an increase in sales afterward, but how can you tell whether the increase is due to the intervention rather than to factors that would have moved sales anyway? For example:
- Sales fluctuate naturally without direct interventions
- Seasonality affects sales
- Other factors may be changing
So how do you isolate the true effect of your campaign? One solution is to apply Causal Impact.
The Causal Impact Approach
Causal Impact aims to build a model that predicts what sales would have been without the campaign. This can then be compared to actual sales - the difference is assumed to be the impact from the campaign. So how do we build that model? Causal Impact relies on a Bayesian Structural Time Series (BSTS) model.
Let's break down the model. At its core, BSTS is a structural time series model, an approach with a long history. A structural time series decomposes the data into interpretable, though unobserved, components. It is "structural" in the sense that it models the underlying elements that generate the data:
- Trend - long-term influence. For example, think population change.
- Seasonality - time-dependent patterns, such as those induced by...well, the seasons.
- Regression - the impact of external variables.
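To make the decomposition concrete, here is a minimal NumPy sketch that simulates a series from these three components. All parameter values (drift, seasonal period, regression weight) are illustrative, not part of the CausalImpact library:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 365

# Trend: a slowly drifting random walk (a "local level")
trend = 100 + np.cumsum(rng.normal(0.05, 0.5, n))

# Seasonality: a weekly cycle
season = 5 * np.sin(2 * np.pi * np.arange(n) / 7)

# Regression: contribution of an observed control series
control = rng.normal(50, 2, n)
regression = 0.4 * control

# Observed series = unobserved components + observation noise
y = trend + season + regression + rng.normal(0, 1, n)
```

In a real BSTS fit this runs in reverse: only `y` (and `control`) are observed, and the model infers the trend and seasonal components from the data.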
Again, these components aren't observed directly; they must be inferred from the data, and each evolves according to its own equation. On top of the structural model, Causal Impact adds automatic control selection:
- Uses spike-and-slab priors for Bayesian variable selection among control time series
- Automatically determines which control series are most predictive, avoiding overfitting
- You can throw in many potential controls and let the model choose
Spike-and-slab priors are a Bayesian approach to feature selection. You simultaneously estimate whether a variable should be included (the spike, a prior concentrated tightly at zero) and, if it is included, its coefficient (the slab, a wide distribution that lets the model explore non-zero values). A high posterior probability of a non-zero value implies the feature is important, and the slab supplies its coefficient.
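A small sketch can show the shape of such a prior. The function below draws coefficients from a spike-and-slab mixture; the inclusion probability and slab width are made-up illustrative values (in a real fit these are inferred, not fixed):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_spike_and_slab(p_include, slab_sd, size):
    """Draw coefficients from a spike-and-slab prior:
    with probability p_include draw from the wide slab N(0, slab_sd^2),
    otherwise from the spike (modeled as exactly zero here, for clarity)."""
    included = rng.random(size) < p_include
    slab = rng.normal(0, slab_sd, size)
    return np.where(included, slab, 0.0), included

coefs, included = sample_spike_and_slab(p_include=0.2, slab_sd=2.0, size=10_000)

# Roughly 80% of draws sit exactly at zero (the spike);
# the rest spread widely over the slab.
print(round((coefs == 0).mean(), 2))
```

The model's posterior over `included` is what drives variable selection: controls whose inclusion probability stays low are effectively dropped.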
Installation
pip install causalimpact pandas matplotlib numpy
Data Preparation
You need:
- Target metric: The outcome you want to measure (e.g., sales)
- Control metrics: Related metrics unaffected by your intervention
- Pre-period: Data before the intervention
- Post-period: Data after the intervention
import pandas as pd
import numpy as np
from causalimpact import CausalImpact
# Example: Generate synthetic data
np.random.seed(42)
dates = pd.date_range('2024-01-01', '2024-12-31', freq='D')
# Create correlated time series
control1 = np.random.randn(len(dates)).cumsum() + 100
control2 = np.random.randn(len(dates)).cumsum() + 50
# Target: correlated with controls, plus intervention effect after day 300
target = 0.5 * control1 + 0.3 * control2 + np.random.randn(len(dates)) * 5
# Add intervention effect (20% increase) after day 300
intervention_day = 300
target[intervention_day:] *= 1.20
# Create DataFrame
data = pd.DataFrame({
'target': target,
'control1': control1,
'control2': control2
}, index=dates)
print(data.head())
Running the Analysis
# Define pre and post periods
pre_period = ['2024-01-01', '2024-10-26'] # Days 0-299
post_period = ['2024-10-27', '2024-12-31'] # Days 300-365
# Run Causal Impact
ci = CausalImpact(data, pre_period, post_period)
# Print summary
print(ci.summary())
print(ci.summary(output='report'))
Interpreting Results
The analysis provides several key outputs:
1. Point Estimates
- Actual: What actually happened
- Predicted: What would have happened (counterfactual)
- Cumulative effect: Total impact over the post-period
2. Credible Intervals
- 95% posterior intervals showing uncertainty
- Unlike frequentist confidence intervals, these have a direct probabilistic interpretation
3. Probability of Causal Effect
- P(effect > 0): Probability the intervention had a positive effect
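Once you have the actual series and the counterfactual prediction, these point estimates reduce to simple arithmetic. A sketch with made-up post-period numbers:

```python
import numpy as np

# Post-period actuals and the model's counterfactual prediction (made-up values)
actual = np.array([150.0, 155.0, 160.0, 158.0])
predicted = np.array([130.0, 131.0, 132.0, 131.0])

pointwise_effect = actual - predicted                 # effect at each time step
cumulative_effect = pointwise_effect.sum()            # total impact over the post-period
relative_effect = actual.sum() / predicted.sum() - 1  # average relative lift

print(cumulative_effect)             # 99.0
print(round(relative_effect, 3))     # 0.189
```

The library reports exactly these quantities, plus credible intervals that propagate the uncertainty in `predicted`.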
Visualization
# Plot results
ci.plot()
The plot shows three panels:
- Original: Actual vs. predicted (counterfactual)
- Pointwise: Difference between actual and predicted
- Cumulative: Cumulative effect over time
Example Interpretation
Posterior Inference {Causal Impact}
                             Average                 Cumulative
Actual                       156.3                   10,314.0
Prediction (CI 95%)          131.2 [128.5, 133.8]    8,659.0 [8,482.0, 8,834.0]
Absolute effect (CI 95%)     25.1 [22.5, 27.8]       1,655.0 [1,480.0, 1,832.0]
Relative effect (CI 95%)     19.1% [17.1%, 21.2%]    19.1% [17.1%, 21.2%]

Posterior tail-area probability p: 0.001
Posterior prob. of a causal effect: 99.9%
This tells us:
- The intervention increased the metric by ~19% on average
- The posterior probability of a positive causal effect is 99.9%
- The total impact over the post-period was approximately 1,655 units
Model Assumptions
For valid inference, ensure:
- Control metrics unaffected: Intervention didn't impact control series
- Stable relationship: Pre-period relationship holds in post-period
- No confounders: No other major changes during post-period
- Sufficient pre-period data: Need enough data to model seasonality/trends
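One cheap sanity check for the "stable relationship" assumption: split the pre-period in half and confirm the target-control correlation looks similar in both halves. The synthetic series and the 0.4 gap threshold below are illustrative stand-ins for your own pre-period data:

```python
import numpy as np

rng = np.random.default_rng(1)
control = rng.normal(0, 1, 300).cumsum() + 100
target = 0.5 * control + rng.normal(0, 2, 300)  # stand-in for a real pre-period target

half = len(target) // 2
corr_first = np.corrcoef(target[:half], control[:half])[0, 1]
corr_second = np.corrcoef(target[half:], control[half:])[0, 1]

# A large gap between the halves suggests the target-control relationship
# is drifting, which would undermine the counterfactual prediction.
print(round(corr_first, 2), round(corr_second, 2))
if abs(corr_first - corr_second) > 0.4:
    print("Warning: target-control relationship may be unstable")
```

This is only a heuristic; a more thorough check is a placebo analysis with a fake intervention date inside the pre-period.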
Advanced Usage
Custom Priors
# Specify more informative priors if you have domain knowledge
ci = CausalImpact(
data,
pre_period,
post_period,
prior_level_sd=0.01 # Tighter prior on level changes
)
Model Diagnostics
# Check model fit: in the pre-period, the model's one-step-ahead
# predictions should track the actual values closely
import matplotlib.pyplot as plt
plt.plot(data['target'], label='actual')
plt.plot(ci.inferences['preds'], label='predicted')  # column name may vary by package version
plt.legend()
plt.show()
Common Pitfalls
- Too short pre-period: Need enough data for reliable model
- Multiple interventions: Confounds the analysis
- Selecting controls post-hoc: Should be decided before analysis
- Ignoring model assumptions: Validate that assumptions hold
Real-World Applications
This technique works for:
- Marketing campaign measurement
- Product launch impact
- Policy change evaluation
- Website redesign effects
- Pricing change analysis
Next Steps
To go deeper:
- Study Bayesian structural time series (BSTS) models
- Learn about synthetic control methods
- Explore sensitivity analysis techniques
- Investigate multi-market causal impact
Conclusion
Causal Impact provides a rigorous way to measure intervention effects when randomized experiments aren't feasible. By combining Bayesian time series models with synthetic controls, you can make credible causal claims from observational data.
Try it on your next campaign or product launch!