%!TEX root = main.tex
\appendix
\section{Counterfactual Inference}\label{sec:counterfactuals}
Here we derive Equation~\ref{eq:counterfactual_eq} via Pearl's counterfactual inference protocol, which involves three steps: abduction, action, and prediction \cite{pearl2010introduction}. Our model can be represented with the following structural equations over the graph structure in Figure~\ref{fig:causalmodel}:
\begin{align}
\judge & := \epsilon_{\judge}, \quad
\unobservable := \epsilon_\unobservable, \quad
\obsFeatures := \epsilon_\obsFeatures, \quad
\decision := g(\judge,\obsFeatures,\unobservable,\epsilon_{\decision}), \quad
\outcome := f(\decision,\obsFeatures,\unobservable,\epsilon_\outcome). \nonumber
\end{align}
\noindent
For every case with $\decision=0$ in the data, we calculate the counterfactual value of $\outcome$ had the decision instead been $\decision=1$. We assume here that all parameters, functions, and distributions are known.
In the \emph{abduction} step we determine $\prob{\epsilon_\judge, \epsilon_\unobservable, \epsilon_\obsFeatures, \epsilon_{\decision},\epsilon_\outcome|\judgeValue,\obsFeaturesValue,\decision=0}$, the distribution of the stochastic disturbance terms updated to account for the observed evidence on the decision maker, the observed features, and the decision.
We directly know $\epsilon_\obsFeatures=\obsFeaturesValue$ and $\epsilon_{\judge}=\judgeValue$.
Due to the special form of $f$, the observed evidence is independent of $\epsilon_\outcome$ when $\decision = 0$, so we only need to determine $\prob{\epsilon_\unobservable,\epsilon_{\decision}|\judgeValue,\obsFeaturesValue,\decision=0}$.
Next, the \emph{action} step intervenes on the decision, setting $\decision=1$.
Finally, in the \emph{prediction} step we estimate $\outcome$:
\begin{eqnarray*}
\hspace{-0.1cm} E_{\decision \leftarrow 1}(\outcome|\judgeValue,\decision=0,\obsFeaturesValue)
&=& \hspace{-0.1cm} \int \hspace{-0.1cm} f(\decision=1,\obsFeaturesValue,\unobservable = \epsilon_\unobservable,\epsilon_\outcome) \prob{\epsilon_\unobservable, \epsilon_\decision |\judgeValue,\decision=0,\obsFeaturesValue}
\prob{\epsilon_\outcome} \diff{\epsilon_\unobservable} \diff{\epsilon_\outcome}\diff{\epsilon_\decision}\\
&=& \int \prob{\outcome=1|\decision=1,\obsFeaturesValue,\unobservableValue} \prob{\unobservableValue|\judgeValue,\decision=0,\obsFeaturesValue} \diff{\unobservableValue}
\end{eqnarray*}
where we substituted $\epsilon_\unobservable=\unobservableValue$ and integrated out $\epsilon_\decision$ and $\epsilon_\outcome$. This gives the counterfactual expectation of $\outcome$ for a single subject.
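As a numerical illustration of the abduction, action, and prediction steps above, the sketch below assumes, purely for illustration, logistic forms for $g$ and $f$ and a standard-normal unobserved confounder $\unobservable$; all function names and parameter values are hypothetical, not the paper's fitted model.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def counterfactual_expectation(x, alpha_j, gamma_x, gamma_z, beta_x, beta_z,
                               n_grid=2001):
    """E_{D<-1}[Y | judge j, features x, D=0], computed on a grid over Z."""
    z = np.linspace(-6.0, 6.0, n_grid)
    prior = np.exp(-0.5 * z ** 2)                        # N(0,1) prior on Z (unnormalized)
    p_d1 = sigmoid(alpha_j + gamma_x * x + gamma_z * z)  # P(D=1 | j, x, z)
    # Abduction: update the distribution of Z with the evidence D=0.
    weight = prior * (1.0 - p_d1)
    weight /= weight.sum()
    # Action: set D=1; prediction: average P(Y=1 | D=1, x, Z) over the posterior.
    p_y1 = sigmoid(beta_x * x + beta_z * z)              # P(Y=1 | D=1, x, z)
    return float((p_y1 * weight).sum())
```

With positive coefficients on $\unobservable$, conditioning on the negative decision shifts the posterior of $\unobservable$ downward, so the counterfactual expectation lands below the unconditional expectation of the outcome probability.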
\section{On the Priors of the Bayesian Model} \label{sec:model_definition}\label{sec:priors}
The priors for $\gamma_\obsFeatures, ~\beta_\obsFeatures, ~\gamma_\unobservable$ and $\beta_\unobservable$ were defined using the gamma-mixture representation of Student's t-distribution with $\nu=6$ degrees of freedom.
The gamma-mixture is obtained by first sampling a precision parameter from $\Gamma$($\nicefrac{\nu}{2},~\nicefrac{\nu}{2}$) and then drawing the coefficient from a zero-mean Gaussian with that precision.
This procedure was applied to the scale parameters $\eta_\unobservable, ~\eta_{\beta_\obsFeatures}$ and $\eta_{\gamma_\obsFeatures}$ as shown below.
%
For vector-valued \obsFeatures, the components of $\gamma_\obsFeatures$ ($\beta_\obsFeatures$) were sampled independently with a joint precision parameter $\eta_{\gamma_\obsFeatures}$ ($\eta_{\beta_\obsFeatures}$).
%
The coefficients for the unobserved confounder \unobservable were constrained to be positive to ensure identifiability.
\begin{align}
\eta_\unobservable, \eta_{\beta_\obsFeatures}, \eta_{\gamma_\obsFeatures} \sim \Gamma(3, 3), \;
\gamma_\unobservable,\beta_\unobservable \sim N_+(0, \eta_\unobservable^{-1}),\;
\gamma_\obsFeatures \sim N(0, \eta_{\gamma_\obsFeatures}^{-1}),\;
\beta_\obsFeatures \sim N(0, \eta_{\beta_\obsFeatures}^{-1})\nonumber
\end{align}
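As a quick numerical check of the gamma-mixture construction (an illustrative sketch, not part of the model code): sampling a precision from $\Gamma(\nicefrac{\nu}{2},~\nicefrac{\nu}{2})$ and then a zero-mean Gaussian coefficient with that precision should reproduce a Student's $t$-distribution with $\nu=6$ degrees of freedom, whose variance is $\nu/(\nu-2)=1.5$.

```python
import numpy as np

rng = np.random.default_rng(0)
nu, n = 6, 200_000

# precision ~ Gamma(shape=nu/2, rate=nu/2); NumPy parametrizes by scale = 1/rate.
precision = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
# coefficient | precision ~ Normal(0, 1/precision)
coef = rng.normal(0.0, 1.0 / np.sqrt(precision))

# Marginally, coef follows Student's t with nu degrees of freedom,
# so its sample variance should be close to nu / (nu - 2) = 1.5.
print(coef.var())
```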
The intercepts for the decision makers in the data and for the outcome \outcome had hierarchical Gaussian priors with variances $\sigma_\decision^2$ and $\sigma_\outcome^2$; the decision makers shared the joint variance parameter $\sigma_\decision^2$.
\begin{align}
\sigma_\decision^2, ~\sigma_\outcome^2 \sim N_+(0, \tau^2),\quad
\alpha_\judgeValue \sim N(0, \sigma_\decision^2) \nonumber
\end{align}
The parameters $\sigma_\decision^2$ and $\sigma_\outcome^2$ were drawn independently from Gaussian distributions with mean $0$ and variance $\tau^2=1$, restricted to the positive real axis.
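The $N_+$ priors can be sampled by truncation; since these Gaussians are zero-mean, taking the absolute value of an ordinary draw is equivalent (a half-normal). A minimal sketch, with a hypothetical number of decision makers:

```python
import numpy as np

rng = np.random.default_rng(1)
tau = 1.0
n_judges = 5  # hypothetical; set to the number of decision makers in the data

# sigma_D^2, sigma_Y^2 ~ N_+(0, tau^2): a zero-mean Gaussian restricted to the
# positive axis, equivalent to the absolute value of an unrestricted draw.
sigma2_d, sigma2_y = np.abs(rng.normal(0.0, tau, size=2))

# Hierarchical intercepts: one per decision maker, sharing the variance sigma2_d.
alpha = rng.normal(0.0, np.sqrt(sigma2_d), size=n_judges)
```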