In other words, we wish to answer a `what-if' question: for each specific case where decision maker $\human_j$ made a negative decision ($\decision_{\human_j} = 0$), what would the outcome have been had the decision been positive?
%
In the formalism of causal inference~\cite{pearl2010introduction}, we wish to evaluate the counterfactual expectation
\begin{align}
\cfoutcome = & \expectss{\decision_{\human_j} \leftarrow 1}{\outcome ~|~ \obsFeatures = \obsFeaturesValue, \decision_{\human_j} = 0; \dataset} \nonumber \\
= & \probss{\decision_{\human_j} \leftarrow 1}{\outcome = 1 ~|~ \obsFeatures = \obsFeaturesValue, \decision_{\human_j} = 0; \dataset}
\label{eq:counterfactual}
\end{align}
The expression above concerns a specific entry in the dataset with features $\obsFeatures = \obsFeaturesValue$, for which $\human_j$ made a decision $\decision_{\human_j} = 0$.
%
It is read as follows: conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) as well as from the entire dataset \dataset, it expresses the probability that the outcome would have been positive ($\outcome = 1$) had the decision been positive ($\decision_{\human_j} = 1$).
%
Notice that the presence of \dataset in the conditioning part of Expression~(\ref{eq:counterfactual}) gives us more information about the data entry than the entry-specific quantities ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) alone, and is thus not redundant.
%
In particular, it provides information about the leniency and other parameters of decision maker $\human_j$, which in turn is important for inferring the unobserved variables \unobservable, as discussed at the beginning of this section.
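To make this concrete, the following toy sketch (Python, with placeholder coefficients and a logistic decision model of the form introduced in the next paragraph) shows that reweighting a standard normal prior on the unobserved variable by the probability of a negative decision shifts its mean, i.e., the decision itself is informative about \unobservable:
\begin{verbatim}
import numpy as np

# Toy illustration with hypothetical coefficients: reweighting a N(0, 1)
# prior on z by P(T_{H_j} = 0 | z) shifts its mean, so observing a
# negative decision carries information about the unobserved z.
rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)                 # prior draws of z
alpha_j, gamma_z = 0.8, -1.5                     # placeholder values
p_t0 = 1.0 - 1.0 / (1.0 + np.exp(-(alpha_j + gamma_z * z)))
post_mean = np.sum(p_t0 * z) / np.sum(p_t0)      # E[z | T_{H_j} = 0]
print(z.mean(), post_mean)                       # ~0.0 vs. clearly positive
\end{verbatim}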
Since the decisions are ultimately based on the risk factors for behaviour, we model them with a logistic regression over the observed and unobserved features:
\begin{equation} \label{eq:judgemodel}
\prob{\decision_{\human_j} = 1 ~|~ \obsFeatures = \obsFeaturesValue, \unobservable = \unobservableValue} = \logit^{-1}(\gamma_\obsFeaturesValue \obsFeaturesValue + \gamma_\unobservableValue \unobservableValue + \alpha_j)
\end{equation}%
Note that index $j$ refers to decision maker $\human_j$.
We make the simplifying assumption that the coefficients $\gamma$ are the same for all defendants, but allow decision makers to differ in their intercept $\alpha_j \approx \logit(\leniencyValue_j)$, so as to model their varying levels of leniency.
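As a concrete reading of the relation $\alpha_j \approx \logit(\leniencyValue_j)$, a minimal sketch (the helper name is hypothetical):
\begin{verbatim}
import numpy as np

def leniency_to_intercept(r_j):
    # alpha_j ~ logit(r_j): a decision maker's rate of positive
    # decisions r_j maps to the intercept of the logistic model above.
    return np.log(r_j / (1.0 - r_j))

# e.g. leniency_to_intercept(0.7) ~ 0.85 for a judge
# who makes a positive decision in 70% of cases
\end{verbatim}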
%The decision makers in the data differ from each other only by leniency.
%\noindent
%\hrulefill
%
%However, in many settings, a decision maker would be considered good if the two functions were the same -- i.e., if the probability to make a positive decision was the same as the probability to obtain a successful outcome after a positive decision.
\spara{Parameter estimation}
We take a Bayesian approach to learn the model over the dataset \dataset.
%
In particular, we consider the full probabilistic model defined in Equations \ref{eq:judgemodel} -- \ref{eq:defendantmodel} and obtain the posterior distribution of its parameters $\parameters = \{ \alpha_\outcomeValue, \beta_\obsFeaturesValue, \beta_\unobservableValue, \alpha_j, \gamma_\obsFeaturesValue, \gamma_\unobservableValue\}$. %, where $i = 1, \ldots, \datasize$, conditional on the dataset.
%
%Notice that by ``parameters'' here we refer to all quantities that are not considered as known with certainty from the input, and so parameters include unobserved features \unobservable.
%
Formally, we obtain the parameter posterior by
\begin{equation}
\prob{\parameters | \dataset} = \frac{\prob{\dataset | \parameters} \prob{\parameters}}{\prob{\dataset}} .
\end{equation}
We use prior distributions given in Appendix~X to ensure the identifiability of the parameters.
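For illustration, a minimal sketch of this estimation step in NumPyro; the placeholder $N(0,1)$ priors stand in for those in the Appendix, variable names are assumptions, and the outcome likelihood is masked for entries without a recorded outcome (in practice we use Stan, as described under Implementation below):
\begin{verbatim}
import jax
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model(x, judge, n_judges, t, y):
    # Placeholder N(0, 1) priors stand in for those in the Appendix.
    alpha_y = numpyro.sample("alpha_y", dist.Normal(0., 1.))
    beta_x  = numpyro.sample("beta_x",  dist.Normal(0., 1.))
    beta_z  = numpyro.sample("beta_z",  dist.Normal(0., 1.))
    gamma_x = numpyro.sample("gamma_x", dist.Normal(0., 1.))
    gamma_z = numpyro.sample("gamma_z", dist.Normal(0., 1.))
    with numpyro.plate("judges", n_judges):
        alpha_j = numpyro.sample("alpha_j", dist.Normal(0., 1.))  # leniency intercepts
    with numpyro.plate("cases", x.shape[0]):
        z = numpyro.sample("z", dist.Normal(0., 1.))              # unobserved features Z
        numpyro.sample("t", dist.Bernoulli(
            logits=alpha_j[judge] + gamma_x * x + gamma_z * z), obs=t)
        with numpyro.handlers.mask(mask=(t == 1)):                # Y observed only if T = 1
            numpyro.sample("y", dist.Bernoulli(
                logits=alpha_y + beta_x * x + beta_z * z), obs=y)

mcmc = MCMC(NUTS(model), num_warmup=1000, num_samples=1000)
# mcmc.run(jax.random.PRNGKey(0), x, judge, n_judges, t, y)
# (y may hold arbitrary placeholders where t == 0; they are masked out)
\end{verbatim}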
\spara{Computing counterfactuals}
For the model defined above, the counterfactual \cfoutcome can be computed following the approach of Pearl~\cite{pearl2010introduction}.
As shown in Appendix~X, for a fully specified model (i.e., with fixed parameters), \cfoutcome is given by the following expression:
\begin{align}
\cfoutcome & = \int P(Y=1 | T=1, x, z)\, P(z | R=r_j, T_{H_j}=0, x)\, \diff{z}
\end{align}
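For intuition, the second factor (Pearl's `abduction' step) is obtained from Bayes' rule; here we assume, as in the model above, that the prior $P(z)$ of the unobserved features does not depend on $x$:
\begin{align}
P(z | R = r_j, T_{H_j} = 0, x) = \frac{P(T_{H_j} = 0 | x, z)\, P(z)}{\int P(T_{H_j} = 0 | x, z')\, P(z')\, \diff{z'}} \nonumber
\end{align}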
Having obtained a posterior probability distribution for parameters \parameters, we can now expand expression~(\ref{eq:counterfactual}) by additionally integrating over the parameter posterior:
\begin{align}
\cfoutcome & = \int P(Y=1 | T=1, x, z, \parameters)\, P(z | R=r_j, T_{H_j}=0, x, \parameters)\, \prob{\parameters | \dataset}\, \diff{z}\, \diff{\parameters}
\end{align}
The first factor in the integrand above is provided by the model in Equation~\ref{eq:defendantmodel}, while the remaining factors are approximated with posterior samples, as explained below.
%
Note that, for all data entries other than the ones with $\decision_\human = 0$ and $\decision_\machine = 1$, we trivially have
\begin{equation}
\cfoutcome = \outcome
\end{equation}
where \outcome is the outcome recorded in the dataset \dataset.
\spara{Implementation}
In practice, we use the MCMC functionality of Stan\footnote{\url{https://mc-stan.org/}} to obtain a sample \sample of the parameter posterior, where each element of \sample contains one instance of parameters \parameters.
%
The counterfactual outcome is then computed numerically over this sample:
\begin{equation}
	\cfoutcome \approx \frac{1}{|\sample|} \sum_{\substack{\parameters \in \sample \\ z \sim P(z | R=r_j, T_{H_j}=0, x, \parameters)}} \prob{\outcome = 1 | \decision = 1, \obsFeaturesValue, z, \parameters} \label{eq:expandcf}
\end{equation}
where the sums are taken over samples of $\parameters$ and $z$ obtained from their respective posteriors.
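A minimal sketch of this sampling step with CmdStanPy; the Stan program and data file are hypothetical stand-ins for an implementation of Equations~\ref{eq:judgemodel}--\ref{eq:defendantmodel}:
\begin{verbatim}
# Sketch: draw the posterior sample S with Stan via CmdStanPy.
# "model.stan" and "data.json" are assumed, hypothetical files.
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="model.stan")
fit = model.sample(data="data.json", chains=4, iter_sampling=1000)
draws = fit.stan_variables()   # arrays over S, e.g. draws["alpha_j"]
\end{verbatim}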
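Putting the pieces together, a NumPy sketch of the estimator in Expression~(\ref{eq:expandcf}), assuming the logistic models above, a standard normal prior on $z$, and hypothetical key names for the posterior draws; self-normalized importance sampling with the prior as proposal stands in for sampling $z$ from its posterior:
\begin{verbatim}
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def counterfactual_outcome(x, draws, n_z=1000, seed=0):
    # Estimate Eq. (expandcf) for one entry with T_{H_j} = 0.
    # `draws` is a list of posterior parameter draws; key names are
    # assumptions, and the relevant judge's intercept alpha_j is
    # assumed to be pre-selected in each draw.
    rng = np.random.default_rng(seed)
    total = 0.0
    for d in draws:
        z = rng.standard_normal(n_z)   # proposal: the N(0, 1) prior of z
        # abduction: importance weights w ~ P(T_{H_j} = 0 | x, z)
        w = 1.0 - sigmoid(d["alpha_j"] + d["gamma_x"] * x + d["gamma_z"] * z)
        w /= w.sum()
        # action + prediction: P(Y = 1 | T <- 1, x, z) under this draw
        p_y = sigmoid(d["alpha_y"] + d["beta_x"] * x + d["beta_z"] * z)
        total += np.sum(w * p_y)
    return total / len(draws)          # average over posterior draws
\end{verbatim}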
Sample \sample can now be used to compute various probabilistic quantities of interest.
%%
%The Gaussians were restricted to the positive real numbers and both had mean $0$ and variance $\tau^2=1$ -- other values were tested but observed to have no effect.
\spara{Evaluation of decisions}
Expression~(\ref{eq:expandcf}) gives us a direct way to evaluate the outcome of decisions $\decision_\machine$ for any data entry for which $\decision_\human = 0$.
%
Note though that, unlike the recorded outcomes of entries with $\decision_\human = 1$, which take integer values in $\{0, 1\}$, \cfoutcome may take fractional values $\cfoutcome \in [0, 1]$.
Having obtained outcome estimates for the data entries with $\decision_\human = 0$ and $\decision_\machine = 1$, it is now straightforward to estimate the failure rate $\failurerate$ of decision maker \machine: it is simply the average value of \cfoutcome over all data entries.
%
Our approach is summarized in Figure~\ref{fig:approach}.
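Since $\cfoutcome = \outcome$ for all trivially decided entries, this average is a one-liner; array names below are hypothetical:
\begin{verbatim}
import numpy as np

def failure_rate(y, cf, needs_cf):
    # y:        recorded outcomes from the dataset
    # cf:       counterfactual estimates of the outcome
    # needs_cf: boolean mask of entries with T_H = 0 and T_M = 1
    return np.mean(np.where(needs_cf, cf, y))
\end{verbatim}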
\subsection{Decision makers built on counterfactuals}
So far in our discussion, we have focused on the task of evaluating the performance of a decision maker \machine that is specified as input.