Commit 3c7074c3 authored by Antti Hyttinen
parent edb0cc64
@@ -67,7 +67,7 @@ The figure also explicates the interplay of features \obsFeatures and \unobserva
Taking into account that we need to learn the parameters from the data, we integrate this expression over the posterior of the parameters. Note that since $Z$ is unobserved, it is not immediately clear that we can estimate the parameters associated with it. However, precisely because $Z$ is not observed, we are free to assume it has zero mean and unit variance. Furthermore, we can assume the associated coefficients are positive, since $Z$ increases the risk of failure and induces $T=0$ decisions.
%Taking into account that we need to learn parameters from the data we integrate this expression over the posterior of the parameters. Note that since $Z$ is unobserved, it is not straightforwardly clear that we can estimate parameters associated to it. However, since $Z$ is not observed we can assume it has zero mean and unit variance. Furthermore we can assume positivity of parameters, since $Z$ increases risk of failure and induces $T=0$ decisions.
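To make the reasoning behind these assumptions concrete: because $Z$ is latent, its scale and sign cannot be identified separately from its coefficient, so fixing $Z$ to zero mean and unit variance and requiring a positive coefficient removes the indeterminacy without restricting the model. The following minimal sketch (in Python; our illustration, not part of the paper's implementation, with arbitrary coefficient values) demonstrates the scale invariance for a logistic model.
\begin{verbatim}
# Minimal sketch (not the paper's code): rescaling the latent variable and
# its coefficient together leaves the probabilities unchanged, so the scale
# of Z may be fixed by convention (and its sign by requiring a positive
# coefficient).
import numpy as np
from scipy.special import expit  # standard logistic function

rng = np.random.default_rng(0)
x = rng.normal(size=5)           # observed features
z = rng.normal(size=5)           # unobserved features, standardized
alpha, gamma_x, gamma_z = -0.5, 1.0, 0.8   # arbitrary illustrative values

p = expit(alpha + gamma_x * x + gamma_z * z)
c = 3.0                          # rescale z by c, divide its coefficient by c
p_rescaled = expit(alpha + gamma_x * x + (gamma_z / c) * (c * z))
assert np.allclose(p, p_rescaled)
\end{verbatim}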
@@ -78,7 +78,7 @@ Taking into account that we need to learn parameters from the data we integrate
%Stan allows us to directly sample from the posterior both of the parameters and the unobservable features.
\subsection{Overview of Our Approach}
%\subsection{Overview of Our Approach}
Having provided the intuition for our approach, in what follows we describe it in detail.
%
@@ -88,20 +88,20 @@ In other words, we wish to answer a `what-if' question: for each specific case w
%
In the formalism of causal inference~\cite{pearl2010introduction}, we wish to evaluate the counterfactual expectation
\begin{align}
\cfoutcome = & \expectss{\decision_{\human_j} \leftarrow 1}{\outcome~| \obsFeatures = \obsFeaturesValue, \decision_{\human_j} = 0; \dataset} \nonumber
\\
\cfoutcome = & \expectss{\decision_{\human} \leftarrow 1}{\outcome~| \obsFeatures = \obsFeaturesValue, \decision_{\human} = 0; \dataset} \nonumber
% \\
%=& \int P(Y=1|T=1,x,z) P(z|R=r_j, T_{H_j} =0, x) dz
= & \probss{\decision_{\human_j} \leftarrow 1}{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \decision_{\human_j} = 0; \dataset}
% = & \probss{\decision_{\human} \leftarrow 1}{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \decision_{\human} = 0; \dataset}
\label{eq:counterfactual}
\end{align}
The expression above concerns a specific entry in the dataset with features $\obsFeatures=x$, for which $\human_j$ made a decision $\decision_{\human_j} = 0$.
The expression above concerns a specific entry in the dataset with features $\obsFeatures=\obsFeaturesValue$, for which $\human$ made a decision $\decision_{\human} = 0$.
%
%It is read as follows: conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) as well as from the entire dataset \dataset
%It is read as follows: conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human} = 0$) as well as from the entire dataset \dataset
%, the probability that the outcome would have been positive ($\outcome = 1$) %in the hypothetical case %we had intervened to make
%had the decision been positive ($\decision_{H_j} = 1$).
It expresses the probability that the outcome would have been positive ($\outcome = 1$) had the decision been positive ($\decision_{H_j} = 1$), conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) as well as from the entire dataset \dataset.
It expresses the probability that the outcome would have been positive ($\outcome = 1$) had the decision been positive ($\decision_{\human} = 1$), conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human} = 0$) as well as from the entire dataset \dataset.
%
Notice that the presence of \dataset in the conditional part of~\ref{eq:counterfactual} gives us more information about the data entry compared to the entry-specific quantities ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) and is thus not redundant.
Notice that the presence of \dataset in the conditional part of expression~(\ref{eq:counterfactual}) gives us more information about the data entry compared to the entry-specific quantities ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human} = 0$) and is thus not redundant.
%
In particular, it provides information about the leniency and other parameters of decision maker \human, which in turn is important for inferring the unobserved features \unobservable, as discussed at the beginning of this section.
@@ -136,7 +136,7 @@ Since the decisions are ultimately based on the risk factors for behaviour, we mo
\begin{equation}
\prob{\decision_{H_j} = 0~|~\leniency_j = \leniencyValue, \obsFeatures = \obsFeaturesValue, \unobservable = \unobservableValue} = \invlogit(\alpha_j + \gamma_\obsFeaturesValue\obsFeaturesValue + \gamma_\unobservableValue \unobservableValue + \epsilon_\decisionValue) \label{eq:judgemodel}
\end{equation}%
Note that index $j$ refers to decision maker $\human_j$. Parameter $\alpha_{j}$ provides for the leniency of a decision maker by $\logit(\leniencyValue_j)$.
Note that index $j$ refers to decision maker $\human$. Parameter $\alpha_{j}$ encodes the leniency of the decision maker via $\logit(\leniencyValue_j)$.
Note that we make the simplifying assumption that the coefficients $\gamma$ are the same for all decision makers, who are allowed to differ only in the intercept $\alpha_j \approx \logit(\leniencyValue_j)$, so as to model varying levels of leniency among them. % (Eq. \ref{eq:leniencymodel}).
%The decision makers in the data differ from each other only by leniency.
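As a concrete illustration of Equation~(\ref{eq:judgemodel}), the following sketch (ours, with arbitrary coefficient values and omitting the noise term $\epsilon$) simulates decisions from a pool of decision makers whose intercepts are set from their leniencies as $\alpha_j = \logit(\leniencyValue_j)$.
\begin{verbatim}
# Sketch (ours) of the decision model of Eq. (judgemodel), omitting the noise
# term epsilon: decision maker j gives a negative decision T = 0 with
# probability invlogit(alpha_j + gamma_x * x + gamma_z * z).
import numpy as np
from scipy.special import expit, logit

rng = np.random.default_rng(1)
n_cases, n_judges = 1000, 10
r = rng.uniform(0.2, 0.8, size=n_judges)      # leniency r_j of each decision maker
alpha = logit(r)                              # alpha_j = logit(r_j), as in the text
gamma_x, gamma_z = 1.0, 1.0                   # shared, positive coefficients

x = rng.normal(size=n_cases)                  # observed features
z = rng.normal(size=n_cases)                  # unobserved features
judge = rng.integers(n_judges, size=n_cases)  # decision maker handling each case

p_negative = expit(alpha[judge] + gamma_x * x + gamma_z * z)
t = rng.binomial(1, 1.0 - p_negative)         # T = 1 with probability 1 - P(T = 0)
\end{verbatim}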
@@ -166,11 +166,11 @@ Note that we are making the simplifying assumption that coefficients $\gamma$ ar
%
%In addition, we consider \judgeAmount instances $\{\human_j, j = 1, 2, \ldots, \judgeAmount\}$ of decision makers \human.
%In addition, we consider \judgeAmount instances $\{\human, j = 1, 2, \ldots, \judgeAmount\}$ of decision makers \human.
%
%For the purposes of Bayesian modelling, we present the hierarchical model and explicate our assumptions about the relationships and the quantities below.
%
%Note that index $j$ refers to decision maker $\human_j$ and \invlogit is the standard logistic function.
%Note that index $j$ refers to decision maker $\human$ and \invlogit is the standard logistic function.
%\noindent
%\hrulefill
@@ -220,9 +220,9 @@ We use prior distributions given in Appendix~X to ensure the identifiability of
For the model defined above, the counterfactual $\hat{Y}$ can be computed following the approach of Pearl~\cite{pearl2010introduction}.
For a fully defined model (i.e., with fixed parameters), $\hat{Y}$ is given by the following expression:
\begin{align}
\cfoutcome & = \int \prob{Y=1|T=1,x,z} \prob{z|R=r_j, T_{H_j} =0, x}dz
\cfoutcome & = \int \prob{\outcome=1|\decision=1,\obsFeaturesValue,\unobservableValue} \prob{z|\leniency=\leniencyValue, \decision_{\human} =0, \obsFeaturesValue}\diff{\unobservableValue}
\end{align}
as derived in detail in Appendix~X. In essence, we determine the distribution of the unobserved features $Z$ using the decision, observed features, and the leniency of the employed decision maker, and then determine the distribution of $Y$ conditional on all features, integrating over the unobserved features.
as derived in detail in Appendix~B.1. In essence, we determine the distribution of the unobserved features $Z$ using the decision, observed features, and the leniency of the employed decision maker, and then determine the distribution of $Y$ conditional on all features, integrating over the unobserved features.
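A minimal numerical sketch of this computation for fixed parameters follows (our illustration, not the paper's code): the distribution of $\unobservableValue$ given the negative decision is proportional to its standard-normal prior times the decision model of Equation~(\ref{eq:judgemodel}), and the counterfactual is the expectation of the outcome probability under that distribution. The outcome model here is a hypothetical logistic form standing in for the paper's outcome model.
\begin{verbatim}
# Sketch (ours, with a hypothetical logistic outcome model) of the integral
# above for fixed parameters: weight each value of z by N(z; 0, 1) times the
# probability of the observed negative decision, normalize, and average the
# outcome probability P(Y = 1 | T = 1, x, z) under those weights.
import numpy as np
from scipy.special import expit, logit
from scipy.stats import norm

def counterfactual_y(x, r_j,
                     gamma_x=1.0, gamma_z=1.0,             # decision model
                     beta_0=0.0, beta_x=1.0, beta_z=-1.0): # outcome model (illustrative)
    # beta_z < 0 here because Z increases the risk of failure (Y = 0).
    alpha_j = logit(r_j)                              # alpha_j = logit(r_j)
    z = np.linspace(-6.0, 6.0, 2001)                  # integration grid for z
    # P(z | R = r_j, T_H = 0, x)  proportional to  N(z; 0, 1) * P(T_H = 0 | x, z)
    w = norm.pdf(z) * expit(alpha_j + gamma_x * x + gamma_z * z)
    w /= w.sum()
    p_y1 = expit(beta_0 + beta_x * x + beta_z * z)    # P(Y = 1 | T = 1, x, z)
    return float((p_y1 * w).sum())

print(counterfactual_y(x=0.5, r_j=0.5))
\end{verbatim}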
Having obtained a posterior probability distribution for parameters \parameters: % in parameter space \parameterSpace, we can now expand expression~(\ref{eq:counterfactual}) as follows.
%\begin{align}
@@ -238,7 +238,7 @@ Having obtained a posterior probability distribution for parameters \parameters:
\end{equation}
%The value of the first factor in the integrand of the expression above is provided by the model in Equation~\ref{eq:defendantmodel}, while the second is sampled by MCMC, as explained above.
%
Note that, for all data entries other than the ones with $\decision_{\human_j} = 0$ and $\decision_\machine = 1$, we trivially have \cfoutcome = \outcome
Note that, for all data entries other than the ones with $\decision_{\human} = 0$ and $\decision_\machine = 1$, we trivially have \cfoutcome = \outcome,
where \outcome is the outcome recorded in the dataset \dataset.
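In practice, the counterfactual is therefore an average of the fixed-parameter computation above over posterior draws of the parameters, applied only to the entries that need it. A short sketch (ours, reusing \texttt{counterfactual\_y} from the previous block and assuming the posterior draws are available as a list of parameter dictionaries):
\begin{verbatim}
# Sketch (ours): average the fixed-parameter counterfactual over posterior
# draws of the parameters, and keep the recorded outcome Y whenever the
# counterfactual is not needed (all entries except T_H = 0 with T_M = 1).
import numpy as np

def expected_counterfactual(x, r_j, posterior_draws):
    # posterior_draws: list of dicts of MCMC draws whose keys match the
    # keyword arguments of counterfactual_y (previous sketch).
    return float(np.mean([counterfactual_y(x, r_j, **d) for d in posterior_draws]))

def imputed_outcome(y_obs, t_human, t_machine, x, r_j, posterior_draws):
    if t_human == 0 and t_machine == 1:
        return expected_counterfactual(x, r_j, posterior_draws)
    return y_obs    # trivially, the imputed outcome equals the recorded one
\end{verbatim}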
@@ -266,11 +266,11 @@ In practice, we use the MCMC functionality of Stan\footnote{\url{https://mc-stan
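A minimal sketch of driving the posterior sampling from Python with the \texttt{cmdstanpy} interface is shown below; the model file name and data file are placeholders rather than artifacts of the paper.
\begin{verbatim}
# Sketch (ours) of posterior sampling with Stan driven from Python via the
# cmdstanpy interface; "decision_model.stan" and "stan_data.json" are
# placeholders, not files from the paper.
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="decision_model.stan")
fit = model.sample(data="stan_data.json", chains=4, iter_sampling=1000)
posterior_draws = fit.stan_variables()   # dict: parameter name -> posterior draws
\end{verbatim}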
%
%The causal diagram of Figure~\ref{fig:causalmodel} provides the structure of causal relationships for quantities of interest.
%%
%In addition, we consider \judgeAmount instances $\{\human_j, j = 1, 2, \ldots, \judgeAmount\}$ of decision makers \human.
%In addition, we consider \judgeAmount instances $\{\human, j = 1, 2, \ldots, \judgeAmount\}$ of decision makers \human.
%%
%For the purposes of Bayesian modelling, we present the hierarchical model and explicate our assumptions about the relationships and the quantities below.
%%
%Note that index $j$ refers to decision maker $\human_j$ and \invlogit is the standard logistic function.
%Note that index $j$ refers to decision maker $\human$ and \invlogit is the standard logistic function.
%
%\noindent
%\hrulefill