From 9ed0692fc4b7ebdbdeda054c56da1e8a710b3fc8 Mon Sep 17 00:00:00 2001
From: Antti Hyttinen <ajhyttin@gmail.com>
Date: Tue, 7 Jan 2020 13:16:19 +0200
Subject: [PATCH] ...

---
 paper/imputation.tex | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/paper/imputation.tex b/paper/imputation.tex
index 4bd90ad..67b3618 100644
--- a/paper/imputation.tex
+++ b/paper/imputation.tex
@@ -96,9 +96,10 @@ In the formalism of causal inference~\cite{pearl2010introduction}, we wish to ev
 \end{align}
 The expression above concerns a specific entry in the dataset with features $\obsFeatures=x$, for which $\human_j$ made a decision $\decision_{\human_j} = 0$.
 %
-It is read as follows: conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) as well as from the entire dataset \dataset
-, the probability that the outcome would have been positive ($\outcome = 1$) %in the hypothetical case %we had intervened to make 
-had the decision been positive  ($\decision_{H_j} = 1$).
+%It is read as follows: conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) as well as from the entire dataset \dataset
+%, the probability that the outcome would have been positive ($\outcome = 1$) %in the hypothetical case %we had intervened to make 
+%had the decision been positive  ($\decision_{H_j} = 1$).
+It expresses the probability that the outcome would have been positive ($\outcome = 1$) had the decision been positive ($\decision_{\human_j} = 1$), conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) as well as from the entire dataset \dataset.
 %
 Notice that the presence of \dataset in the conditional part of Equation~\ref{eq:counterfactual} gives us more information about the data entry compared to the entry-specific quantities ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) and is thus not redundant.
 %
@@ -204,7 +205,7 @@ In particular, we consider the full probabilistic model defined in Equations \re
 % 
 %Notice that by ``parameters'' here we refer to all quantities that are not considered as known with certainty from the input, and so parameters include unobserved features \unobservable.
 %
-We use prior distributions given in Appendix~X�to ensure the identifiability of the parameters. 
+We use prior distributions given in Appendix~X to ensure the identifiability of the parameters.
 %Formally, we obtain the parameter posterior by
 %\begin{equation}
 %	\prob{\parameters | \dataset} = \frac{\prob{\dataset | \parameters} \prob{\parameters}}{\prob{\dataset}} .
@@ -213,11 +214,11 @@ We use prior distributions given in Appendix~X
 
 \spara{Computing counterfactuals} 
 For the model defined above, the counterfactual $\hat{Y}$ can be computed by the approach of Pearl.
-As shown in Appendix , for fully defined model (fixed parameters)  $\hat{Y}$ can be determined by the following expression:
+For a fully defined model (fixed parameters), $\hat{Y}$ can be determined by the following expression:
 \begin{align}
 \cfoutcome & = \int   P(Y=1|T=1,x,z) \,  P(z|R=r_j, T_{H_j} =0, x) \diff{z}
 \end{align}
-as derived in detail in Appendix~X.
+as derived in detail in Appendix~X. In essence, we determine the distribution of the unobserved features $Z$ using the decision, the observed features, and the leniency of the employed decision maker, and then determine the distribution of $Y$ conditional on all features.
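The fixed-parameter expression above can be approximated numerically by drawing $z$ from its conditional distribution and averaging the outcome model over those draws. A minimal sketch, using hypothetical stand-in distributions in place of the actual model conditionals:

```python
import numpy as np

rng = np.random.default_rng(1)

def p_y1(x, z):
    # Stand-in for P(Y = 1 | T = 1, x, z) with fixed parameters;
    # a logistic outcome model is assumed here purely for illustration.
    return 1.0 / (1.0 + np.exp(-(x + z)))

# Stand-in for P(z | R = r_j, T_{H_j} = 0, x): draws of the unobserved
# features consistent with a negative decision by a judge of leniency r_j.
z_draws = rng.normal(loc=-0.5, scale=1.0, size=2000)

x = 0.5  # observed features of the data entry (illustrative value)

# Monte Carlo approximation of the integral over z.
y_hat = p_y1(x, z_draws).mean()
```

The stand-in distributions and the value of `x` are placeholders; in the paper both conditionals are defined by the model equations.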
 
 Having obtained a posterior probability distribution for parameters \parameters: % in parameter space \parameterSpace, we can now expand expression~(\ref{eq:counterfactual}) as follows.
 %\begin{align}
@@ -228,22 +229,18 @@ Having obtained a posterior probability distribution for parameters \parameters:
 %%Antti dont want to put specific parameters since P(z|...) depends on so many?
 %\end{align}
 
-\begin{align}
-\cfoutcome & = \int   P(Y=1|T=1,x,z, \theta)   P(z|R=r_j, T_{H_j} =0, x,\theta)\prob{\parameters | \dataset} dz\diff{\parameters}
-\end{align}
-%The value of the first factor in the integrand of the expression above is provided by the model in Equation~\ref{eq:defendantmodel}, while the second is sampled by MCMC, as explained above.
-%
-
-Note that, for all data entries other than the ones with $\decision_\human = 0$ and $\decision_\machine = 1$, we trivially have 
 \begin{equation}
-	\cfoutcome = \outcome
+\cfoutcome  = \int   P(Y=1|T=1,x,z, \parameters) \,  P(z|R=r_j, T_{H_j} =0, x, \parameters) \, \prob{\parameters | \dataset} \diff{z}\diff{\parameters} \label{eq:theposterior}
 \end{equation}
+%The value of the first factor in the integrand of the expression above is provided by the model in Equation~\ref{eq:defendantmodel}, while the second is sampled by MCMC, as explained above.
+%
+Note that, for all data entries other than the ones with $\decision_{\human_j} = 0$ and $\decision_\machine = 1$, we trivially have $\cfoutcome = \outcome$,
 where \outcome is the outcome recorded in the dataset \dataset.
 
 
 
 \spara{Implementation} 
-The result is computed numerically over the sample.
+The result of Equation~\ref{eq:theposterior} is computed numerically:
 %\begin{equation}
 %	\cfoutcome \approxeq \sum_{\parameters\in\sample}\prob{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \decision = 1, \alpha, \beta_{_\obsFeatures}, \beta_{_\unobservable}, \unobservable} \label{eq:expandcf}
 %\end{equation}
@@ -255,9 +252,10 @@ The result is computed numerically over the sample.
 \end{equation}
 where the sums are taken over samples of $\parameters$ and $z$ obtained from their respective posteriors.
 %
-In practice, we use the MCMC functionality of Stan\footnote{\url{https://mc-stan.org/}} to obtain a sample \sample of this posterior distribution, where each element of \sample contains one instance of parameters \parameters.
+In practice, we use the MCMC functionality of Stan\footnote{\url{https://mc-stan.org/}} to obtain these samples.
+% to obtain a sample \sample of this posterior distribution, where each element of \sample contains one instance of parameters \parameters.
 %
-Sample \sample can now be used to compute various probabilistic quantities of interest, including a (posterior) distribution of \unobservable for each entry in dataset \dataset.
+%Sample \sample can now be used to compute various probabilistic quantities of interest, including a (posterior) distribution of \unobservable for each entry in dataset \dataset.
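The full posterior computation of Equation~\ref{eq:theposterior} amounts to averaging the outcome model over joint posterior draws of $(\parameters, z)$. A minimal sketch, with random placeholders standing in for the draws that Stan's MCMC would provide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior samples; in the paper these come from Stan's MCMC.
n_samples = 1000
theta = rng.normal(size=n_samples)  # stand-in for parameter draws theta ~ P(theta | D)
z = rng.normal(size=n_samples)      # stand-in for draws z ~ P(z | R=r_j, T_{H_j}=0, x, theta)

def p_y1_given_t1(x, z, theta):
    # Stand-in for P(Y = 1 | T = 1, x, z, theta); a logistic form is
    # assumed here purely for illustration.
    return 1.0 / (1.0 + np.exp(-(theta + x + z)))

x = 0.5  # observed features of the data entry (illustrative value)

# Monte Carlo estimate of the counterfactual probability: average the
# outcome model over the posterior draws of (theta, z).
cf_outcome = p_y1_given_t1(x, z, theta).mean()
```

All names and distributions above are placeholders; the point is only the structure of the estimate, a sample average of the outcome model over posterior draws.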
 
 %Original by Michael and Riku
 %\subsection{Model definition} \label{sec:model_definition}
-- 
GitLab