From 9ed0692fc4b7ebdbdeda054c56da1e8a710b3fc8 Mon Sep 17 00:00:00 2001
From: Antti Hyttinen <ajhyttin@gmail.com>
Date: Tue, 7 Jan 2020 13:16:19 +0200
Subject: [PATCH] ...

---
 paper/imputation.tex | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/paper/imputation.tex b/paper/imputation.tex
index 4bd90ad..67b3618 100644
--- a/paper/imputation.tex
+++ b/paper/imputation.tex
@@ -96,9 +96,10 @@ In the formalism of causal inference~\cite{pearl2010introduction}, we wish to ev
 \end{align}
 The expression above concerns a specific entry in the dataset with features $\obsFeatures=x$, for which $\human_j$ made a decision $\decision_{\human_j} = 0$.
 %
-It is read as follows: conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) as well as from the entire dataset \dataset
-, the probability that the outcome would have been positive ($\outcome = 1$) %in the hypothetical case %we had intervened to make
-had the decision been positive ($\decision_{H_j} = 1$).
+%It is read as follows: conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) as well as from the entire dataset \dataset
+%, the probability that the outcome would have been positive ($\outcome = 1$) %in the hypothetical case %we had intervened to make
+%had the decision been positive ($\decision_{H_j} = 1$).
+It expresses the probability that the outcome would have been positive ($\outcome = 1$) had the decision been positive ($\decision_{\human_j} = 1$), conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) as well as from the entire dataset \dataset.
 %
 Notice that the presence of \dataset in the conditional part of~\ref{eq:counterfactual} gives us more information about the data entry compared to the entry-specific quantities ($\obsFeatures = \obsFeaturesValue$, $\decision_{\human_j} = 0$) and is thus not redundant.
 %
@@ -204,7 +205,7 @@ In particular, we consider the full probabilistic model defined in Equations \re
 %
 %Notice that by ``parameters'' here we refer to all quantities that are not considered as known with certainty from the input, and so parameters include unobserved features \unobservable.
 %
-We use prior distributions given in Appendix~X�to ensure the identifiability of the parameters.
+We use prior distributions given in Appendix~X to ensure the identifiability of the parameters.
 %Formally, we obtain the parameter posterior by
 %\begin{equation}
 %	\prob{\parameters | \dataset} = \frac{\prob{\dataset | \parameters} \prob{\parameters}}{\prob{\dataset}} .
@@ -213,11 +214,11 @@ We use prior distributions given in Appendix~X
 \spara{Computing counterfactuals}
 For the model defined above, the counterfactual $\hat{Y}$ can be computed by the approach of Pearl.
-As shown in Appendix , for fully defined model (fixed parameters) $\hat{Y}$ can be determined by the following expression:
+For a fully defined model (fixed parameters), $\hat{Y}$ can be determined by the following expression:
 \begin{align}
 \cfoutcome & = \int P(Y=1|T=1,x,z) P(z|R=r_j, T_{H_j} =0, x)dz
 \end{align}
-as derived in detail in Appendix~X.
+as derived in detail in Appendix~X. In essence, we determine the distribution of the unobserved features $Z$ using the decision, observed features, and the leniency of the employed decision maker, and then determine the distribution of $Y$ conditional on all features.
 
 Having obtained a posterior probability distribution for parameters \parameters:
 % in parameter space \parameterSpace, we can now expand expression~(\ref{eq:counterfactual}) as follows.
 %\begin{align}
@@ -228,22 +229,18 @@ Having obtained a posterior probability distribution for parameters \parameters:
 %%Antti dont want to put specific parameters since P(z|...) depends on so many?
 %\end{align}
-\begin{align}
-\cfoutcome & = \int P(Y=1|T=1,x,z, \theta) P(z|R=r_j, T_{H_j} =0, x,\theta)\prob{\parameters | \dataset} dz\diff{\parameters}
-\end{align}
-%The value of the first factor in the integrand of the expression above is provided by the model in Equation~\ref{eq:defendantmodel}, while the second is sampled by MCMC, as explained above.
-%
-
-Note that, for all data entries other than the ones with $\decision_\human = 0$ and $\decision_\machine = 1$, we trivially have
 \begin{equation}
-	\cfoutcome = \outcome
+\cfoutcome = \int P(Y=1|T=1,x,z,\parameters) \, P(z|R=r_j, T_{H_j}=0, x, \parameters) \, \prob{\parameters | \dataset} \diff{z} \diff{\parameters} \label{eq:theposterior}
 \end{equation}
+%The value of the first factor in the integrand of the expression above is provided by the model in Equation~\ref{eq:defendantmodel}, while the second is sampled by MCMC, as explained above.
+%
+Note that, for all data entries other than the ones with $\decision_{\human_j} = 0$ and $\decision_\machine = 1$, we trivially have $\cfoutcome = \outcome$,
 where \outcome is the outcome recorded in the dataset \dataset.
 
 \spara{Implementation}
-The result is computed numerically over the sample.
+The result of Equation~\ref{eq:theposterior} is computed numerically:
 %\begin{equation}
 %	\cfoutcome \approxeq \sum_{\parameters\in\sample}\prob{\outcome = 1 | \obsFeatures = \obsFeaturesValue, \decision = 1, \alpha, \beta_{_\obsFeatures}, \beta_{_\unobservable}, \unobservable} \label{eq:expandcf}
 %\end{equation}
@@ -255,9 +252,10 @@ The result is computed numerically over the sample.
 \end{equation}
 where the sums are taken over samples of $\parameters$ and $z$ obtained from their respective posteriors.
 %
-In practice, we use the MCMC functionality of Stan\footnote{\url{https://mc-stan.org/}} to obtain a sample \sample of this posterior distribution, where each element of \sample contains one instance of parameters \parameters.
+In practice, we use the MCMC functionality of Stan\footnote{\url{https://mc-stan.org/}}.
+% to obtain a sample \sample of this posterior distribution, where each element of \sample contains one instance of parameters \parameters.
 %
-Sample \sample can now be used to compute various probabilistic quantities of interest, including a (posterior) distribution of \unobservable for each entry in dataset \dataset.
+%Sample \sample can now be used to compute various probabilistic quantities of interest, including a (posterior) distribution of \unobservable for each entry in dataset \dataset.
 %Original by Michael and Riku
 %\subsection{Model definition} \label{sec:model_definition}
-- 
GitLab
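The counterfactual integral edited by the patch is, in practice, a Monte Carlo average: draw parameter samples from the posterior via Stan's MCMC, draw each $z_s$ conditional on the matching parameter draw, and average $P(Y=1 \mid T=1, x, z_s, \theta_s)$ over the draws. A minimal sketch of that averaging step, assuming a hypothetical logistic outcome model with parameters `alpha`, `beta_x`, `beta_z` (these names are illustrative, not the paper's actual implementation):

```python
import numpy as np


def sigmoid(a):
    """Logistic function applied elementwise."""
    return 1.0 / (1.0 + np.exp(-a))


def counterfactual_y(alpha, beta_x, beta_z, z_samples, x):
    """Monte Carlo estimate of P(Y = 1 | do(T = 1), x, T_H = 0, data).

    Each argument except x is a length-S array holding one value per joint
    posterior draw (theta_s, z_s) from MCMC.  Averaging the per-draw
    outcome probabilities approximates the double integral over z and
    the parameters.  The logistic form below is an assumed, illustrative
    outcome model, not the one defined in the paper.
    """
    p = sigmoid(alpha + beta_x * x + beta_z * z_samples)
    return float(p.mean())
```

In a Stan-based workflow the draw arrays would come from the fitted posterior sample (for instance, one array per model parameter), with the `z` draws generated conditional on each parameter draw as in the expression above.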