
23. Conformal Inference for Synthetic Controls

밤 편지 2025. 4. 14. 07:41

https://matheusfacure.github.io/python-causality-handbook/Conformal-Inference-for-Synthetic-Control.html


Synthetic Control Refresher

Synthetic Control (SC) is a causal inference technique that is particularly useful when you have a single treated unit and very few control units, but repeated observations of each unit through time (although there are plenty of SC extensions in the Big Data world). The canonical use case is estimating the impact of a treatment in one geography (like a state), using the other untreated states as controls. In our Synthetic Control chapter, we motivated the technique by trying to estimate the effect of Proposition 99 (a bill passed in 1988 that increased the cigarette tax in California) on cigarette sales.

 

In order to do that, we have to estimate what would have happened to California had it not passed Proposition 99. This boils down to estimating the counterfactual $Y_t(0)$ so that we can compare it to the observed outcome in the post-intervention periods:

$$\theta_t = Y_t(1) - Y_t(0) = Y_t - Y_t(0) \quad \text{for every post-intervention period } t$$

There are many methods to do that, among which, we have Synthetic Controls. Synthetic Controls tries to model Y(0) for the treated unit by combining multiple control units in such a way that they mimic the pre-treatment behavior of the treated unit. In our case, this means finding a combination of states that, together, approximate the cigarette sales trend in California prior to Proposition 99. This is done because we rarely have a control unit that follows the same pattern as the treatment unit. We can see that by plotting the cigarette sales trend for multiple states. Notice none of them have a trend that closely resembles that of California.

 

That is why we combine multiple control units. The goal is, if we don’t have a good enough control, we can craft a synthetic one that resembles the treated unit the way we want.

 

In order to find the combination of states that best approximates the pre-treatment trend of California, the Synthetic Control method runs a horizontal regression, where the rows are the time periods and the columns are the states. It tries to find the weights that, when multiplied by the control states, best approximate the treated state.
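To make the layout of this horizontal regression concrete, here is a minimal sketch of reshaping long-format panel data into that wide form. The column names (`year`, `state`, `cigsale`) and all numbers are made up for illustration; they are not the chapter's data.

```python
import pandas as pd

# Toy long-format panel: one row per (state, year). Values are made up.
long_df = pd.DataFrame({
    "year":    [1985, 1986, 1987] * 3,
    "state":   ["california"] * 3 + ["nevada"] * 3 + ["utah"] * 3,
    "cigsale": [102.0, 99.0, 97.0, 130.0, 128.0, 127.0, 68.0, 66.0, 65.0],
})

# Rows become time periods, columns become states.
wide_df = long_df.pivot(index="year", columns="state", values="cigsale")

y = wide_df["california"].to_numpy()               # treated unit, one value per period
X = wide_df.drop(columns="california").to_numpy()  # controls, periods x states
```

In this layout each control state plays the role of a feature and each time period plays the role of an observation.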

 

Since we have more states (39, some were discarded from the analysis) than time periods, an unconstrained regression would simply overfit, which is why Synthetic Control imposes two restrictions:

  1. Weights must sum to 1;
  2. Weights must be non-negative;

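Or, in mathematical terms, let $y$ be the vector of outcomes for the treated state in the pre-treatment periods and $X$ the $T_0 \times J$ matrix where each column is a control state $j$ and each row is a period $t$ prior to the intervention period $T_1 = T_0 + 1$. The weights solve

$$\hat{w} = \underset{w}{\arg\min} \ \lVert y - Xw \rVert_2^2 \quad \text{s.t.} \quad \sum_{j=1}^{J} w_j = 1 \quad \text{and} \quad w_j \geq 0 \ \ \forall j$$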

Combined, these constraints mean we are defining the synthetic control as a convex combination of the control units. It also means we are not doing any dangerous extrapolation and that our synthetic control will typically use only a small subset of control units.

 

Here is what this looks like in code, as an Sklearn estimator:
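A minimal sketch of such an estimator follows, under some loud assumptions: it only mimics the scikit-learn `fit` / `predict` convention without importing scikit-learn, the class and helper names are hypothetical, and projected gradient descent is swapped in as the solver to keep the example dependency-free. It should be read as one possible way to solve the constrained least squares problem, not as the chapter's own listing.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)


class SyntheticControl:
    """Scikit-learn style estimator: weights are non-negative and sum to 1."""

    def __init__(self, n_iter=5000):
        self.n_iter = n_iter

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        w = np.full(X.shape[1], 1.0 / X.shape[1])  # start from equal weights
        step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
        for _ in range(self.n_iter):
            grad = X.T @ (X @ w - y)               # gradient of 0.5 * ||y - Xw||^2
            w = project_to_simplex(w - step * grad)
        self.w_ = w
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.w_


# Toy check on simulated data: the treated unit is an exact convex
# combination of 3 out of 5 simulated controls.
rng = np.random.default_rng(0)
X_pre = rng.normal(size=(30, 5))
w_true = np.array([0.6, 0.0, 0.3, 0.0, 0.1])
model = SyntheticControl().fit(X_pre, X_pre @ w_true)
```

In practice the model would be fit on the pre-treatment rows only, and `predict` applied to the control outcomes of every period to trace the synthetic control.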

 

We can now plot, side by side, the trends for California and for the synthetic control we’ve just created. The difference between these two lines is the estimated effect of Proposition 99 in California.

 

Judging from this plot, Proposition 99 seems to have had a pretty big effect in reducing cigarette sales.


Inference for Grown Ups

In the Synthetic Control chapter, we showed an inference procedure where we permuted units, pretending control units were treated. This is also referred to as a placebo test, where we check the estimated effect for units that haven’t gone through the treatment. If the estimated effect in the treated unit is bigger than most of the placebo effects, we say that this effect estimate is significant.

 

In our example, we can see that the post-treatment difference for California is quite extreme when compared to the other states. However, there are also some states with terrible pre-treatment fit, which then translates to a huge error in the post-intervention period. The guideline here is to remove units with high pre-treatment error, but deciding how high is too high is a bit more complicated. Not only that, this procedure assumes random assignment of the intervention, which is hard to believe for this kind of policy intervention (see Abadie, 2021).

 

One alternative method for inference is to recast the problem of effect estimation as counterfactual prediction. If you think about it, all we are trying to do is predict the counterfactual $Y_{i,t}(0)$, where $i$ is the treated unit and $t \geq T_1$, that is, in the post-intervention period. If we do that, we can leverage the literature on Conformal Prediction for inference. Interestingly enough, this method is quite general and applies to other models of $Y_{i,t}(0)$, but let’s focus just on Synthetic Controls here.

 

To understand this procedure, let’s first look at how we would do Hypothesis Tests and get P-Values.


Hypothesis Test and P-Values

Let’s say we are interested in testing a hypothesis about the trajectory of effects in the post-treatment period:

$$H_0: \theta = \theta^0 = (\theta^0_{T_0+1}, \ldots, \theta^0_T)$$

For instance, if we wish to test for no effect whatsoever, we can set $\theta^0 = (0, \ldots, 0)$. Notice that this hypothesis fully determines the counterfactual outcome in the absence of treatment:

$$Y_t(0) = Y_t - \theta^0_t \quad \text{for } t \geq T_0 + 1$$

The key idea is to then generate data following the null hypothesis we want to test and check the residuals of a model for Y(0) in this generated data. If the residuals are too extreme, we say that the data is unlikely to have come from the null hypothesis we’ve postulated. 

 

The first step is to generate data under the null hypothesis. This is achieved by simply subtracting the postulated null from the outcome of the treated unit, just like in the equation above. Here is the code to do that.
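A minimal sketch of such a function, assuming the data sits in a wide dataframe indexed by time period with one column per unit. The signature, the argument names and the toy numbers are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def with_effect(df, state, null_hypothesis, start_at, window):
    """Return a copy of df where the treated column holds Y(0) under the null.

    df: wide frame, index = time period, one column per unit (assumed layout).
    null_hypothesis: scalar or array with the postulated effect(s) for the window.
    """
    in_window = (df.index >= start_at) & (df.index < start_at + window)
    y0 = df[state].to_numpy(dtype=float).copy()
    y0[in_window] = y0[in_window] - null_hypothesis  # Y(0) = Y - theta0
    return df.assign(**{state: y0})


# Toy data (made-up numbers): the intervention starts in 1988.
toy = pd.DataFrame(
    {"california": [100.0, 98.0, 90.0, 85.0],
     "nevada": [120.0, 119.0, 118.0, 117.0]},
    index=[1986, 1987, 1988, 1989],
)
under_null = with_effect(toy, "california", null_hypothesis=-4, start_at=1988, window=2)
```

Only the treated column inside the window changes; pre-intervention periods and control units are left untouched.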

 

If we postulate the null of no effect, the data under that null means that Y(0)=Y(1)=Y, which is just the trajectory of observed outcome we see for the treated state of California. Now, if we postulate that the null is -4, that is, Proposition 99 decreases cigarette sales by 4 packs, then Y(0)=Y(1)−(−4), which shifts the trajectory of the post treatment outcomes by +4. This is very intuitive. If we think the bill decreases cigarette sales, then, in the absence of it, we should see higher levels of cigarette sales than the one we have in our observed data.

 

The next part of the inference procedure is to fit a model for the counterfactual Y(0) (which we get with the function we just created) on the entire data, covering both the pre- and post-treatment periods. This is an important difference from how we usually fit synthetic controls, where only the pre-treatment periods are used. The idea here is that the model must be estimated with the entire data, under the postulated null hypothesis, to avoid huge post-intervention residuals. With this model, we then compute the residuals $\hat{u}_t = Y_t - \hat{Y}_t(0)$ for all time periods $t$, where $Y_t$ is the treated outcome in the data generated under the null.

 

The function to do that first uses the with_effect function we created earlier to generate data under the null. Then, it fits the model on this data under the null. Next, we estimate Y(0) by making predictions with the freshly fit model. Finally, we compute the residuals $\hat{u}_t$ and store everything in a dataframe.
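The four steps can be sketched as follows. This is an illustration under assumptions: the null adjustment is inlined so the block runs on its own, `LeastSquares` is a hypothetical placeholder for the synthetic control model (any object with `fit` and `predict` would do), and the data is made up.

```python
import numpy as np
import pandas as pd


class LeastSquares:
    """Placeholder model for Y(0); stands in for the synthetic control."""

    def fit(self, X, y):
        self.coef_, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def predict(self, X):
        return X @ self.coef_


def residuals(df, state, null, intervention_start, window, model):
    """Residuals of a Y(0) model fitted on all periods under the postulated null."""
    in_window = (df.index >= intervention_start) & (df.index < intervention_start + window)
    y0 = df[state].to_numpy(dtype=float).copy()
    y0[in_window] = y0[in_window] - null          # step 1: data under the null
    X = df.drop(columns=state).to_numpy(dtype=float)

    model.fit(X, y0)                              # step 2: fit on pre AND post periods
    y0_est = model.predict(X)                     # step 3: estimate Y(0)

    return pd.DataFrame(                          # step 4: residuals, kept in a dataframe
        {"y0": y0, "y0_est": y0_est, "residuals": y0 - y0_est,
         "post_intervention": df.index >= intervention_start},
        index=df.index,
    )


# Toy data (made up): treated = 0.5 * a + 0.5 * b, minus 10 from 1988 onward.
a = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
b = np.array([20.0, 19.0, 21.0, 20.0, 22.0, 21.0])
effect = np.array([0.0, 0.0, 0.0, 0.0, -10.0, -10.0])
toy = pd.DataFrame({"treated": 0.5 * a + 0.5 * b + effect, "a": a, "b": b},
                   index=range(1984, 1990))

res_h0_zero = residuals(toy, "treated", 0, 1988, 2, LeastSquares())    # null of no effect
res_h0_true = residuals(toy, "treated", -10, 1988, 2, LeastSquares())  # null equal to the simulated effect
```

When the postulated null matches the simulated effect, the model fits the adjusted series almost perfectly and the residuals vanish; under the null of no effect they do not.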

 

With our data, to get the residuals for $H_0: \theta = 0$, meaning Proposition 99 had no effect, we can simply pass 0 as the null for our function.

 

The result is a dataframe containing the estimated residuals for each time period, something we will use going forward. Remember that the idea here is to see if that residual, in the post intervention period, is too high. If it is, the data is unlikely to have come from this null, where the effect is zero. To get a visual idea of what we are talking about, we can inspect the error of our model in the post intervention period.


Test Statistic

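This visual evidence is interesting for our own understanding, but we need to be a bit more precise here. This is done by defining a test statistic $S$, which summarizes how big the residuals are and, hence, how unlikely the data we saw is under the null:

$$S(\hat{u}) = \left( \frac{1}{\sqrt{T_*}} \sum_{t=T_0+1}^{T} |\hat{u}_t|^q \right)^{1/q}$$

where $T_*$ is the number of post-intervention periods.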

Notice that this statistic is computed using only the post-intervention period, with t≥T0+1. So, although we use all the data to fit our model for the counterfactual Y(0), we check the residuals only for the outcome which concerns the formulated null hypothesis, that is, the post-intervention period.
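As a small sketch, assuming the statistic has the form $S(\hat{u}) = (\sum_t |\hat{u}_t|^q / \sqrt{T_*})^{1/q}$ from Chernozhukov et al., it can be computed directly from the post-intervention residuals. The function name `statistic` and the toy residuals are illustrative choices.

```python
import numpy as np


def statistic(u_hat, q=1):
    """S(u) = (sum(|u_t|^q) / sqrt(T_star))^(1/q).

    u_hat must contain the post-intervention residuals only,
    so T_star is simply len(u_hat).
    """
    u_hat = np.asarray(u_hat, dtype=float)
    return (np.sum(np.abs(u_hat) ** q) / np.sqrt(len(u_hat))) ** (1 / q)


post_residuals = np.array([3.0, -4.0])  # made-up residuals for two post periods
s_q1 = statistic(post_residuals, q=1)   # based on absolute residuals
s_q2 = statistic(post_residuals, q=2)   # based on squared residuals
```

The sign of each residual is discarded, so large errors in either direction push the statistic up.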

 

High values of this test statistic indicate poor post-intervention fit and, hence, rejection of the null. However, we could have pretty big test statistics in the post-intervention period if our model is poorly fitted, even if H0 is true. This means we can’t define high in absolute terms. Rather, we have to think about how high the post-intervention residuals (and test statistics) are in comparison to the pre-intervention residuals.


P-Value

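To compute the P-value, we block-permute the residuals, calculating the test statistic in each permutation. Each block permutation rotates the vector of residuals by one more position, so that a different block of consecutive residuals lands in the post-intervention window. This procedure is better understood through the following picture.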

 

Once we do that, we will end up with T test statistics, one for each of the block permutations.

 

Let $\Pi$ be the set of all block permutations and $\hat{u}_{\pi_0}$ the original (unpermuted) vector of residuals. The P-value is then defined as

$$\hat{p} = 1 - \hat{F}\big(S(\hat{u}_{\pi_0})\big), \quad \text{where} \quad \hat{F}(x) = \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \mathbb{1}\{S(\hat{u}_{\pi}) < x\}$$

In plain terms, we are simply finding the proportion of block permutations whose test statistic is at least as high (as extreme) as the unpermuted test statistic.

 

To implement this, we will make use of the np.roll function, which takes an array and rotates it, much like we’ve represented in the image above.
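A minimal sketch of this computation with `np.roll`, assuming the residuals are ordered in time and the last `window` entries belong to the post-intervention period. The function names and the toy residuals are made up for illustration.

```python
import numpy as np


def statistic(u_hat, q=1):
    """Test statistic computed on a block of residuals."""
    u_hat = np.asarray(u_hat, dtype=float)
    return (np.sum(np.abs(u_hat) ** q) / np.sqrt(len(u_hat))) ** (1 / q)


def p_value(u, window, q=1):
    """Block-permutation p-value; the last `window` entries of u are post-intervention."""
    u = np.asarray(u, dtype=float)
    observed = statistic(u[-window:], q=q)
    # np.roll(u, k) rotates the residuals by k positions; k = 0 keeps the original order.
    permuted = np.array(
        [statistic(np.roll(u, k)[-window:], q=q) for k in range(len(u))]
    )
    # Share of rotations whose statistic is at least as extreme as the observed one.
    return np.mean(permuted >= observed)


u_toy = np.array([0.1, -0.2, 0.1, 0.0, 5.0])  # made-up residuals, last one is post-period
p_toy = p_value(u_toy, window=1)
```

Because the unpermuted order is itself one of the rotations, the smallest p-value this procedure can return is one over the number of periods.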

 

Remember, this is the P-value for the null hypothesis which states that the effect in all time periods is zero: $\theta = (\theta_{T_0+1} = 0, \ldots, \theta_T = 0)$. From our Synthetic Control effect plot, we get the feeling that the effect of Proposition 99 is not a fixed number. We can see that it starts small, around -5, but gradually grows to -25. For this reason, it might be interesting to plot the confidence interval for the effect in each post-treatment period individually, rather than just testing a null hypothesis about an entire effect trajectory.


Confidence Intervals

To understand how we can place a confidence interval around the effect of each post-treatment period, let’s first try to understand how we would define the confidence interval for a single time period. If we have a single period, then H0 is defined in terms of a scalar value, rather than a trajectory vector θ. This means we can generate a fine grid of H0s and compute the P-value associated with each null. For example, let’s say we think the effect of Proposition 99 in the year 1988 (the year it passed) is somewhere between -20 and 20. We can then build a table containing a bunch of H0s, from -20 to 20, each with its associated P-value:

 

With the functions we’ve defined, this can be achieved by first appending the period of interest (1988 in this example) at the end of the pre-intervention period, creating what is called an augmented dataset. Then, we iterate over the fine grid of nulls, computing the p-value for a post-intervention window of size 1, which starts at the period of interest.
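The loop over nulls can be sketched in a compact, array-only form. Several things here are assumptions: ordinary least squares is a placeholder for the synthetic control model, the function names are hypothetical, and the data is simulated with an effect of -10 in the period of interest. With a window of size 1, each block permutation places a single residual in the post-intervention slot, so the permuted statistics reduce to the absolute residuals.

```python
import numpy as np


def p_value_single_period(X_aug, y_aug, null):
    """p-value for H0: the effect in the LAST row of the augmented data equals `null`."""
    y0 = np.asarray(y_aug, dtype=float).copy()
    y0[-1] -= null                                     # data under the null
    coef, *_ = np.linalg.lstsq(X_aug, y0, rcond=None)  # placeholder model for Y(0)
    u = y0 - X_aug @ coef                              # residuals for all periods
    # Window of size 1: rotating u puts each residual last exactly once.
    return np.mean(np.abs(u) >= np.abs(u[-1]))


def p_values_over_grid(X_pre, y_pre, x_t, y_t, nulls):
    """Append the period of interest to the pre-period, then test every null."""
    X_aug = np.vstack([X_pre, x_t])  # augmented dataset
    y_aug = np.append(y_pre, y_t)
    return np.array([p_value_single_period(X_aug, y_aug, h0) for h0 in nulls])


# Simulated data: 20 pre-periods, 3 controls, effect of -10 in the period of interest.
rng = np.random.default_rng(0)
w = np.array([0.5, 0.3, 0.2])
X_pre = rng.normal(100.0, 10.0, size=(20, 3))
y_pre = X_pre @ w + rng.normal(0.0, 0.5, size=20)
x_t = rng.normal(100.0, 10.0, size=3)
y_t = x_t @ w - 10.0

nulls = np.arange(-20, 21)  # grid of H0s from -20 to 20
p_grid = p_values_over_grid(X_pre, y_pre, x_t, y_t, nulls)
```

Note that the model is refit for every null, since each null changes the last row of the augmented outcome.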

 

As you can see, the result is a table where the row index is the null hypothesis and the row values are the p-values.

 

To build the confidence interval, all we need to do is filter out the H0s that gave us a low P-value. Remember that a low p-value means that the data we have is unlikely to have come from that null. For instance, if we define the significance level α to be 0.1, we remove the H0s that have a P-value lower than 0.1.
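The filtering step can be sketched as below. The function name is hypothetical and the p-values are made-up numbers used only to show the mechanics; they are not results for Proposition 99.

```python
import numpy as np


def confidence_interval(nulls, p_values, alpha=0.1):
    """Keep the nulls that are NOT rejected at level alpha and report their range."""
    nulls = np.asarray(nulls, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    kept = nulls[p_values >= alpha]
    if kept.size == 0:
        return None  # every null on the grid was rejected
    return kept.min(), kept.max()


# Made-up table of nulls and p-values, for illustration only.
toy_nulls = np.array([-20, -15, -10, -5, 0, 5])
toy_pvals = np.array([0.03, 0.12, 0.60, 0.25, 0.06, 0.03])
ci = confidence_interval(toy_nulls, toy_pvals, alpha=0.1)
```

The resolution of the interval is bounded by the spacing of the grid, so a finer grid of nulls gives tighter endpoints at the cost of more model fits.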

 

This gives us the confidence interval for the effect in 1988.

 

We can also plot the P-value against each H0 to better understand how this confidence interval was obtained. In the figure below, the dashed line is the 0.1 line, which is the α we’ve specified. The blue lines mark the bounds of the confidence interval. H0s outside these lines have a P-value lower than 0.1.

 

All that’s left to do is repeat the procedure above for each time period. That is, for each post-intervention year, we append it to the end of the pre-intervention period to create the augmented dataset and then compute the confidence interval just like we’ve done above.


Reference

This appendix is based on the paper An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls, by Victor Chernozhukov, Kaspar Wüthrich and Yinchu Zhu. I would like to give special thanks to Kaspar, who clarified a lot of the questions I had.

 

For additional resources on Synthetic Controls, check out Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects, by Alberto Abadie (2021).