Causal Impact Analysis in Python
Learn to measure intervention effects using Google's Causal Impact methodology with Python.
This tutorial teaches you how to measure the causal effect of an intervention using Bayesian structural time series models, based on Google's CausalImpact methodology.
Problem Statement
Suppose you launched a new marketing campaign on January 1st. You see an increase in sales afterward, but how can you tell whether the increase is due to the intervention rather than to factors that would have moved sales anyway? For example:
- Sales fluctuate naturally without direct interventions
- Seasonality affects sales
- Other factors may be changing
So how do you isolate the true effect of your campaign? One solution is to apply Causal Impact.
The Causal Impact Approach
Causal Impact aims to build a model that predicts what sales would have been without the campaign. This can then be compared to actual sales - the difference is assumed to be the impact from the campaign. So how do we build that model? Causal Impact relies on a Bayesian Structural Time Series (BSTS) model.
Let's break down the model. At its core, BSTS is a structural time series model, an approach with a long history. A structural time series decomposes the data into interpretable, though unobserved, components. It is "structural" in the sense that it models the underlying elements that generate the data:
- Trend - long-term influence. For example, think population change.
- Seasonality - time-dependent patterns, such as those induced by...well, the seasons.
- Regression - the impact of external variables.
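To make the decomposition concrete, here is a minimal NumPy sketch that simulates a series from these three components. All parameter values (drift, seasonal period, regression weight) are illustrative, not part of the CausalImpact library:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 365

# Trend: a slowly drifting random walk (a "local level")
trend = 100 + np.cumsum(rng.normal(0.05, 0.5, n))

# Seasonality: a weekly cycle
season = 5 * np.sin(2 * np.pi * np.arange(n) / 7)

# Regression: contribution of an observed control series
control = rng.normal(50, 2, n)
regression = 0.4 * control

# Observed series = unobserved components + observation noise
y = trend + season + regression + rng.normal(0, 1, n)
```

In a real BSTS fit this runs in reverse: only `y` (and `control`) are observed, and the model infers the trend and seasonal components from the data.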
Again, these components aren't observed directly; they must be inferred from the data, and each evolves according to its own equation. On top of the structural model, Causal Impact adds automatic control selection:
- Uses spike-and-slab priors for Bayesian variable selection among control time series
- Automatically determines which control series are most predictive, avoiding overfitting
- You can throw in many potential controls and let the model choose
Spike-and-slab priors are a Bayesian approach to feature selection. You simultaneously estimate whether a variable should be included (the spike, a prior concentrated tightly at zero) and, if it is included, its coefficient (the slab, a wide distribution that lets the model explore non-zero values). A high posterior probability of a non-zero value implies the feature is important, and the slab supplies its coefficient.
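A small sketch can show the shape of such a prior. The function below draws coefficients from a spike-and-slab mixture; the inclusion probability and slab width are made-up illustrative values (in a real fit these are inferred, not fixed):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_spike_and_slab(p_include, slab_sd, size):
    """Draw coefficients from a spike-and-slab prior:
    with probability p_include draw from the wide slab N(0, slab_sd^2),
    otherwise from the spike (modeled as exactly zero here, for clarity)."""
    included = rng.random(size) < p_include
    slab = rng.normal(0, slab_sd, size)
    return np.where(included, slab, 0.0), included

coefs, included = sample_spike_and_slab(p_include=0.2, slab_sd=2.0, size=10_000)

# Roughly 80% of draws sit exactly at zero (the spike);
# the rest spread widely over the slab.
print(round((coefs == 0).mean(), 2))
```

The model's posterior over `included` is what drives variable selection: controls whose inclusion probability stays low are effectively dropped.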
Installation
pip install causalimpact pandas matplotlib numpy
Data Preparation
You need:
- Target metric: The outcome you want to measure (e.g., sales)
- Control metrics: Related metrics unaffected by your intervention
- Pre-period: Data before the intervention
- Post-period: Data after the intervention
import pandas as pd
import numpy as np
from causalimpact import CausalImpact
# Example: Generate synthetic data
np.random.seed(42)
dates = pd.date_range('2024-01-01', '2024-12-31', freq='D')
# Create correlated time series
control1 = np.random.randn(len(dates)).cumsum() + 100
control2 = np.random.randn(len(dates)).cumsum() + 50
# Target: correlated with controls, plus intervention effect after day 300
target = 0.5 * control1 + 0.3 * control2 + np.random.randn(len(dates)) * 5
# Add intervention effect (20% increase) after day 300
intervention_day = 300
target[intervention_day:] *= 1.20
# Create DataFrame
data = pd.DataFrame({
'target': target,
'control1': control1,
'control2': control2
}, index=dates)
print(data.head())
Running the Analysis
# Define pre and post periods
pre_period = ['2024-01-01', '2024-10-26'] # Days 0-299
post_period = ['2024-10-27', '2024-12-31'] # Days 300-365
# Run Causal Impact
ci = CausalImpact(data, pre_period, post_period)
# Print summary
print(ci.summary())
print(ci.summary(output='report'))
Interpreting Results
The analysis provides several key outputs:
1. Point Estimates
- Actual: What actually happened
- Predicted: What would have happened (counterfactual)
- Cumulative effect: Total impact over the post-period
2. Credible Intervals
- 95% posterior intervals showing uncertainty
- Unlike frequentist confidence intervals, these have a direct probabilistic interpretation
3. Probability of Causal Effect
- P(effect > 0): Probability the intervention had a positive effect
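Once you have the actual series and the counterfactual prediction, these point estimates reduce to simple arithmetic. A sketch with made-up post-period numbers:

```python
import numpy as np

# Post-period actuals and the model's counterfactual prediction (made-up values)
actual = np.array([150.0, 155.0, 160.0, 158.0])
predicted = np.array([130.0, 131.0, 132.0, 131.0])

pointwise_effect = actual - predicted                 # effect at each time step
cumulative_effect = pointwise_effect.sum()            # total impact over the post-period
relative_effect = actual.sum() / predicted.sum() - 1  # average relative lift

print(cumulative_effect)             # 99.0
print(round(relative_effect, 3))     # 0.189
```

The library reports exactly these quantities, plus credible intervals that propagate the uncertainty in `predicted`.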
Visualization
# Plot results
ci.plot()
The plot shows three panels:
- Original: Actual vs. predicted (counterfactual)
- Pointwise: Difference between actual and predicted
- Cumulative: Cumulative effect over time
Example Interpretation
Posterior Inference {Causal Impact}
                             Average                 Cumulative
Actual                       156.3                   10,314.0
Prediction (CI 95%)          131.2 [128.5, 133.8]    8,659.0 [8,482.0, 8,834.0]
Absolute effect (CI 95%)     25.1 [22.5, 27.8]       1,655.0 [1,480.0, 1,832.0]
Relative effect (CI 95%)     19.1% [17.1%, 21.2%]    19.1% [17.1%, 21.2%]

Posterior tail-area probability p: 0.001
Posterior prob. of a causal effect: 99.9%
This tells us:
- The intervention increased the metric by ~19% on average
- The posterior probability of a positive causal effect is 99.9%
- The total impact over the post-period was approximately 1,655 units
Model Assumptions
For valid inference, ensure:
- Control metrics unaffected: Intervention didn't impact control series
- Stable relationship: Pre-period relationship holds in post-period
- No confounders: No other major changes during post-period
- Sufficient pre-period data: Need enough data to model seasonality/trends
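One cheap sanity check for the "stable relationship" assumption: split the pre-period in half and confirm the target-control correlation looks similar in both halves. The synthetic series and the 0.4 gap threshold below are illustrative stand-ins for your own pre-period data:

```python
import numpy as np

rng = np.random.default_rng(1)
control = rng.normal(0, 1, 300).cumsum() + 100
target = 0.5 * control + rng.normal(0, 2, 300)  # stand-in for a real pre-period target

half = len(target) // 2
corr_first = np.corrcoef(target[:half], control[:half])[0, 1]
corr_second = np.corrcoef(target[half:], control[half:])[0, 1]

# A large gap between the halves suggests the target-control relationship
# is drifting, which would undermine the counterfactual prediction.
print(round(corr_first, 2), round(corr_second, 2))
if abs(corr_first - corr_second) > 0.4:
    print("Warning: target-control relationship may be unstable")
```

This is only a heuristic; a more thorough check is a placebo analysis with a fake intervention date inside the pre-period.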
Advanced Usage
Custom Priors
# Specify more informative priors if you have domain knowledge
ci = CausalImpact(
data,
pre_period,
post_period,
prior_level_sd=0.01 # Tighter prior on level changes
)
Model Diagnostics
# Check model fit: in the pre-period, the model's one-step-ahead
# predictions should track the actual values closely
import matplotlib.pyplot as plt
plt.plot(data['target'], label='actual')
plt.plot(ci.inferences['preds'], label='predicted')  # column name may vary by package version
plt.legend()
plt.show()
Common Pitfalls
- Too short pre-period: Need enough data for reliable model
- Multiple interventions: Confounds the analysis
- Selecting controls post-hoc: Should be decided before analysis
- Ignoring model assumptions: Validate that assumptions hold
Real-World Applications
This technique works for:
- Marketing campaign measurement
- Product launch impact
- Policy change evaluation
- Website redesign effects
- Pricing change analysis
Next Steps
To go deeper:
- Study Bayesian structural time series (BSTS) models
- Learn about synthetic control methods
- Explore sensitivity analysis techniques
- Investigate multi-market causal impact
Conclusion
Causal Impact provides a rigorous way to measure intervention effects when randomized experiments aren't feasible. By combining Bayesian time series models with synthetic controls, you can make credible causal claims from observational data.
Try it on your next campaign or product launch!