For every case with $T=0$ in the data, we calculate the counterfactual value that $Y$ would have taken had $T$ been $1$. We follow Pearl's three-step approach of abduction, action, and prediction. We first describe the procedure for fixed parameters and then generalize to the case where the parameters are learned from data.
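To illustrate the three steps in isolation, the following Python sketch applies them to a hypothetical toy model that is not the model of this paper: $X := \epsilon_X$, $T := 1[X+\epsilon_T>0]$, $Y := 2T + X + \epsilon_Y$. Because this toy model is invertible, abduction recovers point values of the disturbances; in the model considered here, abduction instead yields a distribution over $\epsilon_Z$. The function name `counterfactual_y` is hypothetical.

```python
# Hypothetical toy SCM (illustration only, not the model of this paper):
#   X := eps_X,  T := 1[X + eps_T > 0],  Y := 2*T + X + eps_Y
def counterfactual_y(x_obs, t_obs, y_obs, t_new):
    # Abduction: recover the exogenous terms consistent with the evidence.
    eps_x = x_obs
    eps_y = y_obs - 2 * t_obs - x_obs
    # eps_T is not needed: the action step replaces the equation for T.
    # Action: do(T = t_new).
    t = t_new
    # Prediction: propagate the recovered disturbances through the modified model.
    x = eps_x
    return 2 * t + x + eps_y


# Unit observed with X=0.5, T=0, Y=1.0; what would Y have been under T=1?
y_cf = counterfactual_y(0.5, 0, 1.0, 1)
```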
In the abduction step we update the distribution of the disturbance terms, $P(\epsilon_R, \epsilon_Z, \epsilon_X, \epsilon_T,\epsilon_Y)$, to the posterior $P(\epsilon_R, \epsilon_Z, \epsilon_X, \epsilon_T,\epsilon_Y \mid T=0,Y=1,X=x)$ that takes the evidence into account. This is where we exploit the additional information that a negative decision carries about the unobserved risk factor $Z$. We know $\epsilon_X=x$ directly, and $\epsilon_R$ can be calculated from the data. Since the next step overrides the equation for $T$, we do not need $\epsilon_T$ either. Due to the form of $f$, the observation gives no information about $\epsilon_Y$. Hence we only need to determine $P(\epsilon_Z \mid R=\epsilon_R,T=0,X=\epsilon_X)$.
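A minimal sketch of this abduction step for $\epsilon_Z$, assuming (hypothetically, for illustration only) that $Z := \epsilon_Z \sim N(0,1)$ and that a negative decision follows $P(T=0 \mid x, z, r) = \sigma(a_x x + a_z z - r)$ with $a_z > 0$. The posterior is the prior times this likelihood, normalized on a grid. The parameter names `a_x`, `a_z` and the function `posterior_eps_z` are illustrative, not part of the paper.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def posterior_eps_z(x, r, a_x=1.0, a_z=1.0, grid=None):
    """Grid approximation of P(eps_Z | R=r, T=0, X=x).

    Hypothetical model (illustration only):
      Z := eps_Z ~ N(0, 1)
      P(T=0 | x, z, r) = sigmoid(a_x*x + a_z*z - r), with a_z > 0
    """
    if grid is None:
        grid = np.linspace(-6.0, 6.0, 2001)
    prior = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)  # N(0, 1) density of eps_Z
    likelihood = sigmoid(a_x * x + a_z * grid - r)  # probability of the observed T=0
    unnormalised = prior * likelihood  # Bayes' rule numerator
    dz = grid[1] - grid[0]
    density = unnormalised / (unnormalised.sum() * dz)  # Riemann sum now equals 1
    return grid, density


# Usage: posterior over eps_Z for a unit with X=0.5 and leniency R=1.0.
z, p = posterior_eps_z(x=0.5, r=1.0)
posterior_mean = float((z * p).sum() * (z[1] - z[0]))
```

With $a_z > 0$ the posterior mean of $\epsilon_Z$ is positive: observing $T=0$ shifts belief toward higher risk, which is the information the abduction step is meant to extract.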
The action step intervenes on $T$, setting $T=1$.
Finally, in the prediction step we estimate $Y$ in the modified model, taking the observations into account:
\begin{eqnarray*}
E(Y)&=&\int f(T=1,X=x,Z=\epsilon_Z,\epsilon_Y) \\
&& P(Z=\epsilon_Z \mid R=\epsilon_R, T=0, X=x)\,
P(\epsilon_Y)\, d\epsilon_Z\, d\epsilon_Y
\end{eqnarray*}
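The prediction integral can be approximated numerically. The sketch below assumes a hypothetical model for illustration only: $Z := \epsilon_Z \sim N(0,1)$, $P(T=0 \mid x, z, r) = \sigma(a_x x + a_z z - r)$, and an outcome equation $f$ with $Y := 1$ for $T=0$ and, for $T=1$, $Y := 1[\epsilon_Y > \sigma(b_x x + b_z z)]$ with $\epsilon_Y \sim U(0,1)$, so that $Y=0$ (failure) becomes more likely as $z$ grows. It uses a Riemann sum over $\epsilon_Z$ and Monte Carlo draws for $\epsilon_Y$. All names and parameter values are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def f(t, x, z, eps_y, b_x=1.0, b_z=1.0):
    """Hypothetical outcome equation: Y := 1 if T = 0,
    otherwise Y := 1[eps_Y > sigmoid(b_x*x + b_z*z)] with eps_Y ~ Uniform(0, 1)."""
    failure_prob = sigmoid(b_x * x + b_z * z)
    return np.where(t == 0, 1.0, (eps_y > failure_prob).astype(float))


def counterfactual_expectation(x, r, a_x=1.0, a_z=1.0, n_eps_y=5000, seed=0,
                               condition_on_t0=True):
    """Estimate E(Y) after do(T=1) for a unit observed with T=0, X=x, R=r."""
    rng = np.random.default_rng(seed)
    z = np.linspace(-6.0, 6.0, 601)  # grid over eps_Z (= Z)
    dz = z[1] - z[0]
    weight = np.exp(-0.5 * z**2)  # unnormalised N(0, 1) prior on eps_Z
    if condition_on_t0:
        # Abduction: multiply by the hypothetical likelihood P(T=0 | x, z, r).
        weight = weight * sigmoid(a_x * x + a_z * z - r)
    weight = weight / (weight.sum() * dz)
    eps_y = rng.uniform(0.0, 1.0, size=n_eps_y)  # draws from P(eps_Y)
    # Action and prediction: evaluate f with T fixed to 1 on every (eps_Z, eps_Y) pair.
    y = f(1, x, z[:, None], eps_y[None, :])
    inner = y.mean(axis=1)  # Monte Carlo integral over eps_Y for each eps_Z
    return float((inner * weight).sum() * dz)  # Riemann sum over eps_Z


cf = counterfactual_expectation(x=0.5, r=1.0)
naive = counterfactual_expectation(x=0.5, r=1.0, condition_on_t0=False)
```

Comparing `cf` with `naive`, which ignores the evidence $T=0$, shows the effect of conditioning on the negative decision: under these assumptions the conditioned estimate is lower, because $T=0$ makes large values of $Z$ more likely.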
Since the parameters must be learned from the data, we integrate this expression over their posterior distribution. Because $Z$ is unobserved, it is not immediately clear that the parameters associated with it can be estimated. However, precisely because $Z$ is unobserved, its location and scale can be fixed by convention, so we can assume that it has zero mean and unit variance. Furthermore, we can assume that the parameters associated with $Z$ are positive, since $Z$ increases the risk of failure and makes $T=0$ decisions more likely.
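With posterior samples of the parameters, integrating over the posterior amounts to averaging the per-sample expectation. The sketch below assumes, for illustration only, a hypothetical logistic model stated in the docstrings. It fixes $Z \sim N(0,1)$, so no location or scale of $Z$ is estimated, and stores the coefficients of $Z$ on the log scale so that they stay positive. In practice the draws would come from a posterior sampler; `placeholder_draws` are random numbers for illustration, not a fitted posterior. All function and parameter names are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def expected_y_given_params(x, r, a_x, a_z, b_x, b_z):
    """E(Y) after do(T=1) for one parameter setting, under the hypothetical model
    Z ~ N(0, 1), P(T=0 | x, z, r) = sigmoid(a_x*x + a_z*z - r),
    P(Y=1 | T=1, x, z) = 1 - sigmoid(b_x*x + b_z*z) (eps_Y integrated out in closed form)."""
    z = np.linspace(-6.0, 6.0, 1201)
    dz = z[1] - z[0]
    post = np.exp(-0.5 * z**2) * sigmoid(a_x * x + a_z * z - r)  # prior times P(T=0 | ...)
    post /= post.sum() * dz
    return float(((1.0 - sigmoid(b_x * x + b_z * z)) * post).sum() * dz)


def posterior_averaged_expectation(x, r, draws):
    """draws: array of shape (S, 4) with columns (a_x, log_a_z, b_x, log_b_z).
    Exponentiating the log-scale columns keeps the coefficients of Z positive."""
    vals = [expected_y_given_params(x, r, ax, np.exp(laz), bx, np.exp(lbz))
            for ax, laz, bx, lbz in draws]
    return float(np.mean(vals))


rng = np.random.default_rng(1)
placeholder_draws = rng.normal([1.0, 0.0, 1.0, 0.0], 0.1, size=(200, 4))  # NOT a real posterior
estimate = posterior_averaged_expectation(0.5, 1.0, placeholder_draws)
```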
\section{Counterfactual-Based Imputation For Selective Labels}