%!TEX root = main.tex
\appendix
\section{Counterfactual Inference}\label{sec:counterfactuals}
Here we derive Equation~\ref{eq:counterfactual_eq} via Pearl's counterfactual inference protocol, which involves three steps: abduction, action, and prediction \cite{pearl2010introduction}. Our model can be represented with the following structural equations over the graph structure in Figure~\ref{fig:causalmodel}:
\begin{align}
\judge & := \epsilon_{\judge}, \quad
\unobservable := \epsilon_\unobservable, \quad
\obsFeatures := \epsilon_\obsFeatures, \quad
\decision := g(\judge,\obsFeatures,\unobservable,\epsilon_{\decision}), \quad
\outcome := f(\decision,\obsFeatures,\unobservable,\epsilon_\outcome). \nonumber
\end{align}
\noindent
For every case with $\decision=0$ in the data, we calculate the counterfactual value of $\outcome$ had the decision instead been $\decision=1$. We assume here that all parameters, functions, and distributions are known.
In the \emph{abduction} step we determine $\prob{\epsilon_\judge, \epsilon_\unobservable, \epsilon_\obsFeatures, \epsilon_{\decision},\epsilon_\outcome|\judgeValue,\obsFeaturesValue,\decision=0}$, the distribution of the stochastic disturbance terms updated to account for the observed evidence on the decision maker, the observed features, and the decision.
We directly know $\epsilon_\obsFeatures=\obsFeaturesValue$ and $\epsilon_{\judge}=\judgeValue$.
Due to the special form of $f$, the observed evidence is independent of $\epsilon_\outcome$ when $\decision = 0$, so we only need to determine $\prob{\epsilon_\unobservable,\epsilon_{\decision}|\judgeValue,\obsFeaturesValue,\decision=0}$.
Next, the \emph{action} step intervenes on the decision, setting $\decision=1$.
Finally, in the \emph{prediction} step we estimate $\outcome$:
\begin{eqnarray*}
\hspace{-0.1cm} E_{\decision \leftarrow 1}(\outcome|\judgeValue,\decision=0,\obsFeaturesValue)
&=& \hspace{-0.1cm} \int \hspace{-0.1cm} f(\decision=1,\obsFeaturesValue,\unobservable = \epsilon_\unobservable,\epsilon_\outcome) \prob{\epsilon_\unobservable, \epsilon_\decision |\judgeValue,\decision=0,\obsFeaturesValue}
\prob{\epsilon_\outcome} \diff{\epsilon_\unobservable} \diff{\epsilon_\outcome}\diff{\epsilon_\decision}\\
&=& \int \prob{\outcome=1|\decision=1,\obsFeaturesValue,\unobservableValue} \prob{\unobservableValue|\judgeValue,\decision=0,\obsFeaturesValue} \diff{\unobservableValue}
\end{eqnarray*}
where we substituted $\epsilon_\unobservable=\unobservableValue$ and integrated out $\epsilon_\decision$ and $\epsilon_\outcome$. This gives the counterfactual expectation of $\outcome$ for a single subject.
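As a numerical illustration of the abduction, action, and prediction steps above, the sketch below assumes, purely for illustration, logistic forms for $g$ and $f$ and a standard-normal unobserved confounder $\unobservable$; all function names and parameter values are hypothetical, not the paper's fitted model.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def counterfactual_expectation(x, alpha_j, gamma_x, gamma_z, beta_x, beta_z,
                               n_grid=2001):
    """E_{D<-1}[Y | judge j, features x, D=0], computed on a grid over Z."""
    z = np.linspace(-6.0, 6.0, n_grid)
    prior = np.exp(-0.5 * z ** 2)                        # N(0,1) prior on Z (unnormalized)
    p_d1 = sigmoid(alpha_j + gamma_x * x + gamma_z * z)  # P(D=1 | j, x, z)
    # Abduction: update the distribution of Z with the evidence D=0.
    weight = prior * (1.0 - p_d1)
    weight /= weight.sum()
    # Action: set D=1; prediction: average P(Y=1 | D=1, x, Z) over the posterior.
    p_y1 = sigmoid(beta_x * x + beta_z * z)              # P(Y=1 | D=1, x, z)
    return float((p_y1 * weight).sum())
```

With positive coefficients on $\unobservable$, conditioning on the negative decision shifts the posterior of $\unobservable$ downward, so the counterfactual expectation lands below the unconditional expectation of the outcome probability.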
\section{On the Priors of the Bayesian Model} \label{sec:model_definition}\label{sec:priors}
The priors for $\gamma_\obsFeatures, ~\beta_\obsFeatures, ~\gamma_\unobservable$ and $\beta_\unobservable$ were defined using the gamma-mixture representation of Student's t-distribution with $\nu=6$ degrees of freedom.
The gamma-mixture is obtained by first sampling a precision parameter from $\Gamma$($\nicefrac{\nu}{2},~\nicefrac{\nu}{2}$) and then drawing the coefficient from a zero-mean Gaussian with that precision.
This procedure was applied to the scale parameters $\eta_\unobservable, ~\eta_{\beta_\obsFeatures}$ and $\eta_{\gamma_\obsFeatures}$ as shown below.
%
For vector-valued \obsFeatures, the components of $\gamma_\obsFeatures$ ($\beta_\obsFeatures$) were sampled independently with a joint precision parameter $\eta_{\gamma_\obsFeatures}$ ($\eta_{\beta_\obsFeatures}$).
%
The coefficients for the unobserved confounder \unobservable were constrained to be positive to ensure identifiability.
\begin{align}
\eta_\unobservable, \eta_{\beta_\obsFeatures}, \eta_{\gamma_\obsFeatures} \sim \Gamma(3, 3), \;
\gamma_\unobservable,\beta_\unobservable \sim N_+(0, \eta_\unobservable^{-1}),\;
\gamma_\obsFeatures \sim N(0, \eta_{\gamma_\obsFeatures}^{-1}),\;
\beta_\obsFeatures \sim N(0, \eta_{\beta_\obsFeatures}^{-1})\nonumber
\end{align}
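As a quick numerical check of the gamma-mixture construction (an illustrative sketch, not part of the model code): sampling a precision from $\Gamma(\nicefrac{\nu}{2},~\nicefrac{\nu}{2})$ and then a zero-mean Gaussian coefficient with that precision should reproduce a Student's $t$-distribution with $\nu=6$ degrees of freedom, whose variance is $\nu/(\nu-2)=1.5$.

```python
import numpy as np

rng = np.random.default_rng(0)
nu, n = 6, 200_000

# precision ~ Gamma(shape=nu/2, rate=nu/2); NumPy parametrizes by scale = 1/rate.
precision = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
# coefficient | precision ~ Normal(0, 1/precision)
coef = rng.normal(0.0, 1.0 / np.sqrt(precision))

# Marginally, coef follows Student's t with nu degrees of freedom,
# so its sample variance should be close to nu / (nu - 2) = 1.5.
print(coef.var())
```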
The intercepts for the decision makers in the data and for the outcome \outcome had hierarchical Gaussian priors with variances $\sigma_\decision^2$ and $\sigma_\outcome^2$; the decision makers shared the joint variance parameter $\sigma_\decision^2$.
\begin{align}
\sigma_\decision^2, ~\sigma_\outcome^2 \sim N_+(0, \tau^2),\quad
\alpha_\judgeValue \sim N(0, \sigma_\decision^2) \nonumber
\end{align}
The parameters $\sigma_\decision^2$ and $\sigma_\outcome^2$ were drawn independently from Gaussian distributions with mean $0$ and variance $\tau^2=1$, restricted to the positive real axis.
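The $N_+$ priors can be sampled by truncation; since these Gaussians are zero-mean, taking the absolute value of an ordinary draw is equivalent (a half-normal). A minimal sketch, with a hypothetical number of decision makers:

```python
import numpy as np

rng = np.random.default_rng(1)
tau = 1.0
n_judges = 5  # hypothetical; set to the number of decision makers in the data

# sigma_D^2, sigma_Y^2 ~ N_+(0, tau^2): a zero-mean Gaussian restricted to the
# positive axis, equivalent to the absolute value of an unrestricted draw.
sigma2_d, sigma2_y = np.abs(rng.normal(0.0, tau, size=2))

# Hierarchical intercepts: one per decision maker, sharing the variance sigma2_d.
alpha = rng.normal(0.0, np.sqrt(sigma2_d), size=n_judges)
```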