1.2 Potential Outcomes
What if?
Introduction
This post aims to detail the Potential Outcomes Framework¹ and some important assumptions that follow from it. These ideas will form the foundation for most, if not all, of the discussion in the foreseeable future.
Going back to the Springfield dataset, we see that A can be viewed as a binary random variable representing treatment assignment (for the sake of brevity, let A=1 represent treatment and A=0 represent control). Now, we can define our Potential Outcome variable \( Y(A) \) as:
“What would have been the outcome if an individual was administered treatment A”
Since treatment status is binary, we can further specify the Potential Outcome variable under treatment and control as \( Y(1) \) and \( Y(0) \). There are two crucial things to note here. Firstly, even though \( Y(1) \) and \( Y(0) \) are “functions” of treatment status, they are still fundamentally random variables. A follow-up point is that we can define the Potential Outcome of a randomly sampled individual \( i \) from the total population as \( Y_i(A) \). Next, we can formalize the Individual Causal Effect (ICE) for index \( i \) as:
\( \begin{align} ICE_i &= Y_i(1) - Y_i(0) \end{align} \)
In plain English, the ICE is the difference between what would have happened had individual \( i \) received the treatment as opposed to the control. As such, we would say that the vaccine has a nonzero ICE on Homer if he would have recovered after taking the vaccine but not after being given the control (or vice versa). Individual effects might provide some insight, but we can also define the Average Causal Effect (ACE) through expectation:
\( \begin{align} ACE &= \mathrm{E}[Y(1) - Y(0)] \\ \end{align} \)
ACE (and its counterparts) are extremely important estimands as they will be our main way to understand causal effects in the overall population, while also leveraging large-sample properties for estimation.
It is important to realize that up until this point we have only spoken of hypothetical outcomes - “what would have happened”. We may be able to think about hypotheticals, but how do Potential Outcomes actually relate to the real world that we observe?
The fundamental challenge of Causal Inference
To answer that question, we look to a subset of the Springfield dataset. We can think of \( Y \) as our observed outcome, and we can nonparametrically calculate the observed effect between the vaccinated and control cohorts as:
\( \begin{align} \text{Observed effect} &= P(Y = 0 | A = 1) - P(Y = 0 | A = 0)\\ &= \frac{3}{4} - \frac{1}{4} \\ &= 0.5 \end{align} \)
| Patients | A | Y | Y(1) | Y(0) |
|---|---|---|---|---|
| Homer | 0 | 0 | ? | 0 |
| Bart | 1 | 0 | 0 | ? |
| Lisa | 0 | 1 | ? | 1 |
| Millhouse | 1 | 0 | 0 | ? |
| Krusty | 1 | 0 | 0 | ? |
| Maggie | 0 | 1 | ? | 1 |
| Marge | 1 | 1 | 1 | ? |
| Selma | 0 | 1 | ? | 1 |
At this point you might have realized that the columns corresponding to Potential Outcomes are scattered with question marks. This is the unavoidable, fundamental challenge of Causal Inference - that for any individual we can only observe one realized Potential Outcome but never both. After all, we can’t simply travel back in time to reassign Homer the vaccine after he’s already been given the control!
And the reality is that Potential and Observed Outcomes can be very different. Suppose that we are somehow able to gain knowledge of the Potential Outcomes of every patient in our sample and calculate the nonparametric sample ACE as:
\( \begin{align} \text{Sample ACE} &= P(Y(1) = 0) - P(Y(0)=0)\\ &= \frac{4}{8} - \frac{5}{8} \\ &= -\frac{1}{8} \end{align} \)
| Patients | A | Y | G | Y(1) | Y(0) |
|---|---|---|---|---|---|
| Homer | 0 | 0 | M | 0 | 0 |
| Bart | 1 | 0 | M | 0 | 0 |
| Lisa | 0 | 1 | F | 1 | 1 |
| Millhouse | 1 | 0 | M | 0 | 0 |
| Krusty | 1 | 0 | M | 0 | 0 |
| Maggie | 0 | 1 | F | 1 | 1 |
| Marge | 1 | 1 | F | 1 | 0 |
| Selma | 0 | 1 | F | 1 | 1 |
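The sample ACE can likewise be checked numerically. A minimal sketch, with the (hypothetical) full potential-outcomes table hand-copied from above:

```python
# Hypothetical full potential-outcomes table from above: (name, Y(1), Y(0)).
full = [
    ("Homer", 0, 0), ("Bart", 0, 0), ("Lisa", 1, 1), ("Millhouse", 0, 0),
    ("Krusty", 0, 0), ("Maggie", 1, 1), ("Marge", 1, 0), ("Selma", 1, 1),
]

n = len(full)
p_y1_is_0 = sum(1 for _, y1, _ in full if y1 == 0) / n  # P(Y(1) = 0) = 4/8
p_y0_is_0 = sum(1 for _, _, y0 in full if y0 == 0) / n  # P(Y(0) = 0) = 5/8
sample_ace = p_y1_is_0 - p_y0_is_0
print(sample_ace)  # -0.125
```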
Here we see that the true Causal Effect of the vaccine is not only much smaller than the Observed Effect, but its sign is actually reversed. As we discussed in the previous post, Gender was identified as a Confounding variable in this analysis. Here we clearly see that males were not only disproportionately assigned the vaccine, but, based on their potential outcomes, they would have recovered with or without it. Ultimately, it was the combination of these two phenomena caused by Gender that resulted in a spurious Observed Effect when there was truly little to no Causal Effect!
Constructing a solution
At this point, it might seem that we have set ourselves up for an impossible task - inferring Causal Effects from data that can never be fully observed. There is, however, a way to provably circumvent this impossibility, but doing so requires several conditions to hold.
Positivity
Starting off with the most intuitive of the conditions, Positivity requires that the probability of receiving every category of treatment is \( > 0 \). Assuming that treatment is binary, this simplifies to:
\( 0 < P(A=1) < 1 \)
Do note that a conditional version, “Conditional Positivity” \( 0 < P(A=1 \mid X) < 1 \), will play an important role in later applications.
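Conditional Positivity can be checked empirically within strata of a covariate. A minimal sketch using the Gender and treatment columns hand-copied from the second table (the helper name `p_treated` is my own):

```python
# (G, A) pairs per patient, copied from the second table above.
rows = [("M", 0), ("M", 1), ("F", 0), ("M", 1), ("M", 1), ("F", 0), ("F", 1), ("F", 0)]

def p_treated(g):
    """Empirical P(A = 1 | G = g)."""
    grp = [a for gender, a in rows if gender == g]
    return sum(grp) / len(grp)

for g in ("M", "F"):
    p = p_treated(g)
    print(g, p, 0 < p < 1)  # positivity holds in both strata
```

Both strata have treatment probabilities strictly between 0 and 1, so (conditional) positivity holds in this toy sample, even though males are clearly more likely to be treated.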
Stable Unit Treatment Value Assumption (SUTVA) & Consistency
First coined by Prof Rubin², SUTVA is described as:
"… that is, there is no interference between units … and there are no versions of treatments leading to ’technical errors’."
For SUTVA to hold, we have to assume: 1) that an individual’s potential outcome is independent of any other individual’s treatment assignment, and 2) that there is only one well-defined version of each treatment (\(A=0\) / \(A=1\)), which implies a single well-defined version of each potential outcome (\(Y(0)\) / \(Y(1)\)).
An extension of this assumption is Consistency (not to be confused with statistical consistency), which states that if treatments and potential outcomes are well-defined then :
If \(\; A_i = a \), then \(\; Y_i = Y_i(a) \)
Or equivalently, \(P(Y | A = a) = P(Y(a) | A = a) \). In other words, within the group that received treatment \(A = a\), the distribution of observed outcomes equals the distribution of the corresponding potential outcomes \(Y(a)\).
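Consistency can be verified row by row in the (hypothetical) full table above: each patient’s observed \(Y\) should equal the potential outcome for the treatment they actually received. A minimal sketch:

```python
# (A, Y, Y(1), Y(0)) per patient, in the order of the second table above.
rows = [
    (0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 1, 1), (1, 0, 0, 0),
    (1, 0, 0, 0), (0, 1, 1, 1), (1, 1, 1, 0), (0, 1, 1, 1),
]

# Consistency: Y equals Y(1) when A = 1, and Y(0) when A = 0.
consistent = all(y == (y1 if a == 1 else y0) for a, y, y1, y0 in rows)
print(consistent)  # True
```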
Exchangeability
In my (very humble) opinion, Exchangeability is perhaps the most “important” yet elusive of the conditions to grasp. Earlier, potential outcomes were introduced as hypothetical scenarios under a particular treatment. Another way to think of potential outcomes is to see them as a reflection of the inherent characteristics/traits of an individual. After all, what would have happened to a patient is dependent on a multitude of individualized variables, like Age, Gender, Race, etc.
With this in mind, let us define the Exchangeability condition:
\(Y(A) \mathrel{\unicode{x2AEB}} A \)
Taken at face value, Exchangeability simply implies that potential outcomes are independent from treatment assignment. Though technically correct, I personally have never found this description to be particularly informative. Instead, I find it much more helpful to think of Exchangeability as literally implying that we could exchange every patient between the treated/untreated cohorts and \(Y(A)\) would still have the same distribution in each cohort. This also means that even after swapping patients, both swapped cohorts would still be at the same “risk of outcome” as the original assignments.
For instance, if we naively assume that Age is the sole determinant of \(Y(A) \), Exchangeability would hold if patients in both the treated and untreated cohorts had the same Age distribution. Here it is important to note that Exchangeability does NOT imply that \(Y \mathrel{\unicode{x2AEB}} A \). Assuming an association between treatment and outcome, the distribution of observed outcomes is clearly dependent on treatment assignment (imagine observing the results from a 90/10 assignment versus a 50/50 one).
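In fact, we can see Exchangeability fail in the Springfield sample itself: the distribution of \(Y(1)\) differs sharply between the treated and untreated cohorts. A minimal sketch, with the \((A, Y(1))\) pairs hand-copied from the second table:

```python
# (A, Y(1)) pairs per patient, copied from the second table above.
rows = [(0, 0), (1, 0), (0, 1), (1, 0), (1, 0), (0, 1), (1, 1), (0, 1)]

def p_y1_given_a(a):
    """Empirical P(Y(1) = 1 | A = a)."""
    grp = [y1 for trt, y1 in rows if trt == a]
    return sum(grp) / len(grp)

print(p_y1_given_a(1), p_y1_given_a(0))  # 0.25 0.75 -> Y(1) is not independent of A
```

If Exchangeability held, these two probabilities would be equal; the gap reflects the Gender confounding discussed above.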
Problem Solved?
Now with all our conditions at hand, we can begin to bridge the gap between observed results and unobserved potential outcomes. The sequence of transformations is as follows:
\( \begin{align} ACE &= \mathrm{E}[Y(1) - Y(0)] \\ &= \mathrm{E}[Y(1) | A =1] - \mathrm{E}[Y(0) | A = 0] \ \textit{, by Exchangeability} \\ &= \mathrm{E}[Y | A =1] - \mathrm{E}[Y | A = 0] \ \textit{, by Consistency} \end{align} \)
In addition to assuming Positivity (such that all probabilities are non-zero), assuming Exchangeability in conjunction with Consistency allows us to directly infer the ACE from our observed results!
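This identification result can be sanity-checked with a quick simulation. The population below is entirely made up (not the Springfield data): each unit carries both potential outcomes, and treatment is assigned at random so that Exchangeability holds by construction. The plain difference in observed means should then recover the (sample) ACE.

```python
import random

random.seed(0)

# Hypothetical population: each unit carries (Y(1), Y(0)) drawn independently.
n = 100_000
pop = [(random.random() < 0.6, random.random() < 0.4) for _ in range(n)]
true_ace = sum(y1 - y0 for y1, y0 in pop) / n  # sample ACE over the full table

# Randomized assignment -> Exchangeability holds by design.
a = [random.random() < 0.5 for _ in range(n)]
# Consistency links the observed Y to the potential outcome Y(A).
obs = [(y1 if t else y0) for (y1, y0), t in zip(pop, a)]

treated = [y for y, t in zip(obs, a) if t]
control = [y for y, t in zip(obs, a) if not t]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)
print(round(true_ace, 3), round(diff_in_means, 3))  # both close to 0.2
```

With 100,000 units, the difference in observed means matches the sample ACE up to sampling noise, illustrating the derivation above.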
Before we get ahead of ourselves and declare Causal Inference to be a solved problem, let’s take a step back and examine what these assumptions actually imply and whether they really hold.
Firstly, how likely is it for the treated and untreated populations in a randomly sampled observational cohort to be completely exchangeable? Clearly, this is extremely unlikely; we would expect the characteristics of individuals who receive treatment to differ greatly from those of individuals who don’t.
Secondly, can we always assume that Consistency holds? Accurately defining what constitutes treatment may be straightforward in some cases (e.g., vaccinations) but more ambiguous in complicated scenarios (e.g., surgical procedures). Furthermore, in the social sciences, how do we begin to study the causal effects of phenomena like obesity and poverty when they are inherently difficult to define as “treatments”?
Lastly, even if Positivity technically holds, there may be instances where the probability of being assigned a particular treatment is either very low or very high in a cohort (e.g., rare diseases). Even though the assignment probabilities are not strictly 0 or 1, these extreme values make downstream estimation much more variable and render the resultant causal estimates questionable.
In summary, this proposed “solution” to the challenge of Causal Inference should only be taken as a starting point, albeit an extremely important one. As we will see in later topics, these assumptions and concepts will become our ever-reliable toolbox in tackling Causal questions.
1. Splawa-Neyman, Jerzy, Dorota M. Dabrowska, and Terrence P. Speed. “On the application of probability theory to agricultural experiments. Essay on principles. Section 9.” Statistical Science (1990): 465-472.
2. Rubin, Donald B. “Randomization analysis of experimental data: The Fisher randomization test comment.” Journal of the American Statistical Association 75.371 (1980): 591-593.