In an ideal world, every product change would be tested with a randomized controlled trial. But reality is messier. Sometimes you cannot randomize—the feature shipped to everyone, legal constraints prevent holdouts, or the sample size is too small for statistical power.
When I face these situations, I turn to quasi-experimental methods. Here is my playbook.
The Problem with Observational Data
The fundamental challenge is confounding. Users who adopt a new feature are different from those who do not. Maybe they are more engaged, more tech-savvy, or joined during a specific marketing campaign. Simply comparing adopters with non-adopters tells you nothing about the feature's true impact.
I have seen teams make this mistake repeatedly—celebrating a feature “win” that was really just selection bias.
Method 1: Difference-in-Differences
When a feature rolls out at different times to different groups (e.g., by region or platform), DiD can work beautifully. The key assumption is parallel trends—that treated and control groups would have moved together absent the treatment.
```python
# Simplified DiD estimation with group and time fixed effects
# df columns: outcome, treated (0/1 group indicator), post (0/1 period indicator), group, time
import statsmodels.formula.api as smf

model = smf.ols("outcome ~ treated * post + C(group) + C(time)", data=df)
results = model.fit()
# The treated:post interaction coefficient is your treatment effect estimate
```
Always plot those pre-trends. If they are not parallel, DiD will mislead you.
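A quick way to eyeball this (a minimal sketch, assuming the same df columns as the regression above):

```python
# Pre-trend check: plot mean outcomes by group over the pre-treatment periods
import matplotlib.pyplot as plt

(df[df["post"] == 0]                       # pre-treatment periods only
    .groupby(["time", "treated"])["outcome"]
    .mean()
    .unstack("treated")                    # one line per group
    .plot(marker="o"))
plt.title("Pre-treatment trends: treated vs. control")
plt.ylabel("Mean outcome")
plt.show()
```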
Method 2: Synthetic Control
When you have one treated unit and many potential controls, synthetic control constructs a “synthetic” version of the treated unit from a weighted combination of controls. I have used this extensively for geo-experiments—it handles the messiness of real markets better than simple comparisons.
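You don't need a heavy framework to see how the weighting works. Here is a minimal sketch of the weight-fitting step, assuming hypothetical arrays Y0_pre (pre-treatment outcomes for the control units, one column per unit) and y1_pre (the treated unit's pre-treatment outcomes):

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_weights(Y0_pre, y1_pre):
    """Find control-unit weights that best reproduce the treated unit pre-treatment."""
    n_controls = Y0_pre.shape[1]
    loss = lambda w: np.mean((Y0_pre @ w - y1_pre) ** 2)            # pre-treatment fit
    constraints = {"type": "eq", "fun": lambda w: w.sum() - 1.0}    # weights sum to 1
    result = minimize(loss, np.full(n_controls, 1.0 / n_controls),
                      bounds=[(0.0, 1.0)] * n_controls, constraints=constraints)
    return result.x

# weights = synthetic_weights(Y0_pre, y1_pre)
# synthetic_post = Y0_post @ weights   # compare to the treated unit's actual post-treatment outcomes
```

The weights are constrained to be non-negative and sum to one, so the synthetic unit stays inside the range of the real controls rather than extrapolating beyond them.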
Method 3: Regression Discontinuity
If treatment assignment has a cutoff (e.g., users above X engagement score get the feature), RD exploits the discontinuity at that threshold. Users just above and below the cutoff are nearly identical, creating a local randomization.
This is underutilized, in my opinion. Many product features have natural cutoffs that nobody thinks to exploit.
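Here is a minimal local-linear sketch, with a hypothetical engagement-score cutoff and bandwidth, assuming df has score and outcome columns:

```python
# Local-linear RD around the cutoff; cutoff, bandwidth, and column names are assumed for illustration
import statsmodels.formula.api as smf

cutoff, bandwidth = 50, 10                                  # hypothetical values
rd = df[(df["score"] - cutoff).abs() <= bandwidth].copy()   # keep a window around the cutoff
rd["centered"] = rd["score"] - cutoff
rd["above"] = (rd["centered"] >= 0).astype(int)

# Separate slopes on each side; the "above" coefficient estimates the jump at the threshold
rd_model = smf.ols("outcome ~ above * centered", data=rd).fit()
print(rd_model.params["above"])
```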
When to Use What
| Method | Best When | Key Assumption |
|---|---|---|
| DiD | Staggered rollout | Parallel trends |
| Synthetic Control | Single treated unit | Pre-treatment fit |
| RD | Assignment cutoff | Continuity at cutoff |
The Bottom Line
No method is perfect. The best approach combines multiple methods and checks whether they tell a consistent story. When they diverge, that is often where the most interesting learning happens—it usually means you are missing something important about the underlying dynamics.