Commit 800d4768 authored by Antti Hyttinen's avatar Antti Hyttinen

...

parent ad23e0f0
@@ -88,8 +88,9 @@ In other words, we wish to answer a `what-if' question: for each specific case w
%
In the formalism of causal inference~\cite{pearl2010introduction}, we wish to evaluate the counterfactual expectation
\begin{align}
\cfoutcome = & \expectss{\decision_{H_j} \leftarrow 1}{\outcome~| \obsFeatures = \obsFeaturesValue, \decision_{H_j} = 0} \nonumber\\ %; \dataset
=& \int P(Y=1|T=1,x,z) P(z|R=r_j, T_{H_j} =0, x) dz
\cfoutcome = & \expectss{\decision_{H_j} \leftarrow 1}{\outcome~| \obsFeatures = \obsFeaturesValue, \decision_{H_j} = 0; \dataset} %\nonumber
%\\ %
%=& \int P(Y=1|T=1,x,z) P(z|R=r_j, T_{H_j} =0, x) dz
%= & \probss{\decision \leftarrow 1}{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \decision_\human = 0; \dataset}
\label{eq:counterfactual}
\end{align}
@@ -99,12 +100,7 @@ It is read as follows: conditional on what we know from the data entry ($\obsFeatures
, consider the probability that the outcome would have been positive ($\outcome = 1$) in the hypothetical case %we had intervened to make
the decision had been positive.
%
Notice that the presence of \dataset in the conditional part of Eq.~(\ref{eq:counterfactual}) gives us more information about the data entry than the entry-specific quantities ($\obsFeatures = \obsFeaturesValue$, $\decision_\human = 0$) alone, and is thus not redundant.
%
In particular, it provides information about the leniency and other parameters of decider \human, which in turn is important to infer information about the unobserved variables \unobservable, as discussed in the beginning of this section.
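This inference about \unobservable can be sketched numerically: assuming (for illustration only) a standard-normal prior on $\unobservable$ and a logistic decision model of the form used below, a negative decision by a lenient decider shifts the posterior of $\unobservable$ upward. All parameter values here are hypothetical:

```python
import numpy as np

def invlogit(a):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + np.exp(-a))

# Toy sketch with hypothetical parameters: Z ~ N(0, 1) a priori, and
# P(T = 0 | z) = invlogit(alpha + gamma_z * z) as in the decision model below.
z_grid = np.linspace(-5.0, 5.0, 2001)
dz = z_grid[1] - z_grid[0]

prior = np.exp(-0.5 * z_grid**2)        # unnormalised N(0, 1) density
prior /= prior.sum() * dz               # normalise on the grid

alpha, gamma_z = -2.0, 1.0              # lenient decider: T = 0 is a priori unlikely
lik = invlogit(alpha + gamma_z * z_grid)  # P(T = 0 | z)

post = prior * lik                      # Bayes: posterior over Z given T = 0
post /= post.sum() * dz

prior_mean = (z_grid * prior * dz).sum()  # approximately 0
post_mean = (z_grid * post * dz).sum()    # shifted upward: T = 0 suggests high Z
```

The posterior mean exceeds the prior mean, matching the intuition that a negative decision from a lenient decider is evidence of a high unobserved risk factor.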
@@ -133,9 +129,9 @@ Here \invlogit is the inverse of the logit function, i.e., the standard logistic (sigmoid) function.
%\acomment{What??? We have many decision makers used???}
Since the decisions are ultimately based on the risk factors for behaviour, we model the decisions in the data similarly, with a logistic regression over the risk factors, the leniency of the decision maker, and a noise term:
\begin{eqnarray}
\prob{\decision_{H_j} = 0~|~\leniency_j = \leniencyValue, \obsFeatures = \obsFeaturesValue, \unobservable = \unobservableValue} & = & \invlogit(\alpha_j + \gamma_\obsFeaturesValue\obsFeaturesValue + \gamma_\unobservableValue \unobservableValue + \epsilon_\decisionValue) \label{eq:judgemodel}
\end{eqnarray}%
\begin{equation}
\prob{\decision_{H_j} = 0~|~\leniency_j = \leniencyValue, \obsFeatures = \obsFeaturesValue, \unobservable = \unobservableValue} = \invlogit(\alpha_j + \gamma_\obsFeaturesValue\obsFeaturesValue + \gamma_\unobservableValue \unobservableValue + \epsilon_\decisionValue) \label{eq:judgemodel}
\end{equation}%
Note that index $j$ refers to decision maker $\human_j$. Parameter $\alpha_{j}$ captures the leniency of the decision maker through $\alpha_j \approx \logit(\leniencyValue_j)$.
Note that we are making the simplifying assumption that coefficients $\gamma$ are the same for all defendants, but decision makers are allowed to differ in intercept $\alpha_j \approx \logit(\leniencyValue_j)$ so as to model varying leniency levels among them. % (Eq. \ref{eq:leniencymodel}).
We use prior distributions, given in the Appendix, for all parameters to ensure their identifiability.
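As a concrete sketch, the decision model of Eq.~(\ref{eq:judgemodel}) can be written as follows. This is an illustration only: in the actual model the coefficients are inferred from their priors, and the intercept is only approximately $\logit(\leniencyValue_j)$:

```python
import math

def invlogit(a):
    """Standard logistic (sigmoid) function, the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-a))

def logit(p):
    return math.log(p / (1.0 - p))

def p_decision_zero(r_j, x, z, gamma_x, gamma_z, eps=0.0):
    """P(T_{H_j} = 0 | R = r_j, X = x, Z = z) as in Eq. (judgemodel),
    taking the decider-specific intercept alpha_j ~= logit(r_j).
    Parameter values passed in are hypothetical."""
    alpha_j = logit(r_j)
    return invlogit(alpha_j + gamma_x * x + gamma_z * z + eps)
```

Note that the coefficients `gamma_x` and `gamma_z` are shared across deciders, while each decider $\human_j$ has its own intercept `alpha_j`, modelling varying leniency levels.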
@@ -207,7 +203,7 @@ We use prior distributions given in Appendix for all parameters to ensure their
%
In particular, we consider the full probabilistic model defined in Equations \ref{eq:judgemodel} -- \ref{eq:defendantmodel} and obtain the posterior distribution of its parameters $\parameters = \{ \alpha_\outcomeValue, \beta_\obsFeaturesValue, \beta_\unobservableValue, \alpha_j, \gamma_\obsFeaturesValue, \gamma_\unobservableValue\}$. %, where $i = 1, \ldots, \datasize$, conditional on the dataset.
%
Notice that by ``parameters'' here we refer to all quantities that are not considered as known with certainty from the input, and so parameters include unobserved features \unobservable.
%Notice that by ``parameters'' here we refer to all quantities that are not considered as known with certainty from the input, and so parameters include unobserved features \unobservable.
%
Formally, we obtain
\begin{equation}
@@ -220,13 +216,13 @@ Sample \sample can now be used to compute various probabilistic quantities of in
\spara{Computing counterfactuals}
Having obtained a posterior probability distribution for parameters \parameters in parameter space \parameterSpace, we can now expand expression~(\ref{eq:counterfactual}) as follows.
\begin{align}
\cfoutcome & = \probss{\decision \leftarrow 1}{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \decision_\human = 0; \dataset} \nonumber \\
& = \int_\parameterSpace\probss{\decision \leftarrow 1}{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \parameters, \decision_\human = 0; \dataset}\ \prob{\parameters | \dataset}\ \diff{\parameters} \nonumber \\
& = \int_\parameterSpace\prob{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \doop{\decision = 1}, \parameters}\ \prob{\parameters | \dataset}\ \diff{\parameters} \nonumber \\
% & = \int_\parameterSpace\prob{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \doop{\decision = 1}, \alpha, \beta_{_\obsFeatures}, \beta_{_\unobservable}, \unobservable} \prob{\parameters | \dataset}\ \diff{\parameters}
%Antti dont want to put specific parameters since P(z|...) depends on so many?
\end{align}
%\begin{align}
% \cfoutcome & = \probss{\decision \leftarrow 1}{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \decision_\human = 0; \dataset} \nonumber \\
% & = \int_\parameterSpace\probss{\decision \leftarrow 1}{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \parameters, \decision_\human = 0; \dataset}\ \prob{\parameters | \dataset}\ \diff{\parameters} \nonumber \\
% & = \int_\parameterSpace\prob{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \doop{\decision = 1}, \parameters}\ \prob{\parameters | \dataset}\ \diff{\parameters} \nonumber \\
%% & = \int_\parameterSpace\prob{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \doop{\decision = 1}, \alpha, \beta_{_\obsFeatures}, \beta_{_\unobservable}, \unobservable} \prob{\parameters | \dataset}\ \diff{\parameters}
%%Antti dont want to put specific parameters since P(z|...) depends on so many?
%\end{align}
\begin{align}
\cfoutcome & = \int P(Y=1|T=1,x,z, \theta) P(z|R=r_j, T_{H_j} =0, x,\theta)\prob{\parameters | \dataset} dz\diff{\parameters}
@@ -241,9 +237,9 @@ The result is computed numerically over the sample.
\begin{equation}
\cfoutcome \approxeq \sum_{z} \sum_{\parameters
%\in\sample
}\prob{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \decision = 1, \alpha, \beta_{_\obsFeatures}, \beta_{_\unobservable}, z} \label{eq:expandcf}
}\prob{\outcome = 1 | \decision = 1, \obsFeaturesValue, z, \theta} \label{eq:expandcf}
\end{equation}
where the sums are taken over samples obtained in the inference.
where the sums are taken over samples of $\parameters$ and $z$ obtained from their respective posteriors.
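The Monte Carlo average above can be sketched as follows. This is an illustrative sketch only: the dictionary keys `alpha_y`, `beta_x`, `beta_z` stand for the outcome-model coefficients $\alpha_\outcomeValue, \beta_\obsFeaturesValue, \beta_\unobservableValue$, the samples are assumed to be joint posterior draws of $(\parameters, \unobservableValue)$ (e.g.\ produced by an MCMC sampler), and the sum of Eq.~(\ref{eq:expandcf}) is normalised by the number of samples:

```python
import numpy as np

def invlogit(a):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + np.exp(-a))

def cf_estimate(x, samples):
    """Monte Carlo approximation of the counterfactual expectation:
    average P(Y = 1 | T = 1, x, z, theta) over joint posterior draws
    of (theta, z). Each sample is a dict with hypothetical field names
    'alpha_y', 'beta_x', 'beta_z' (outcome-model parameters) and 'z'
    (a posterior draw of the unobserved features)."""
    probs = [invlogit(s["alpha_y"] + s["beta_x"] * x + s["beta_z"] * s["z"])
             for s in samples]
    return float(np.mean(probs))
```

Averaging the outcome probability over posterior draws in this way propagates the uncertainty about both the parameters and the unobserved features into the counterfactual estimate.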
Note that, for all data entries other than the ones with $\decision_\human = 0$ and $\decision_\machine = 1$, we trivially have
\begin{equation}
\cfoutcome = \outcome
......