Commit c7f0a5c2 authored by Michael Mathioudakis

Pass over imputation

parent 7f302205
% The above command helps with compiling in TeXShop on a Mac. Hitting typeset compiles sl.tex directly instead of producing an error here.
\section{Experiments}
\label{sec:experiments}
\todo{Michael}{We should make sure that what we describe in the experiments follows what we have in the technical part of the paper, without describing the technical details again. This may require a light re-write of the experiments. See also other notes below.}
\section{Counterfactual-Based Imputation For Selective Labels}
\label{sec:imputation}
If decision maker \machine makes a positive decision for a case where decision maker \human had made a negative decision, how can we infer the outcome \outcome in the hypothetical case where \machine's decision had been followed?
%
%
And therefore, if a positive decision were made, {\it the above reasoning suggests that a negative outcome is more likely than what would have been predicted based on the recorded features \obsFeatures of released defendants alone}.
Our approach for evaluating the decisions of $\machine$ on cases where $\human$ made a negative decision unfolds over three steps: first, we learn a causal model over the dataset; then, we compute counterfactuals to predict unobserved outcomes; and finally, we use these predictions to evaluate a set of decisions by \machine.
% \note{Michael}{Actually, the paragraph above describes a scenario where {\it labeled outcomes} and possibly {\it contraction} would fail. Specifically, create cases where:
% (i) Z has much larger coefficient than X, and (ii) the judge is good (the two logistic functions for judge decision and outcome are the same), and (iii) the machine is trained on labeled outcomes. The machine will see that the outcome is successful regardless of X, because Z will dominate the positive (and negative) decisions. So it will learn that everyone can be released. Labeled outcomes will evaluate the machine as good -- but our approach will uncover its true performance.}
Recall from Section~\ref{sec:setting} that Figure~\ref{fig:causalmodel} provides the structure of causal relationships for quantities of interest.
We use the following causal model over this structure, building on what is used by Lakkaraju et al.~\cite{lakkaraju2017selective} and others~\cite{mccandless2007bayesian,jung2018algorithmic}.
%
First, we assume that the
% the observed feature vectors \obsFeatures and
unobserved features \unobservable can be modeled as a one-dimensional risk factor~\cite{mccandless2007bayesian}, for example by using propensity scores~\cite{rosenbaum1983central,austin2011introduction}.
%
Moreover, we are also going to present our modeling approach for the case of a single observed feature \obsFeatures -- this is done only for simplicity of presentation, as it is straightforward to extend the model to the case of multiple features \obsFeatures, as we do in the experiments (Section~\ref{sec:experiments}).
%
Motivated by the central limit theorem, we model both \obsFeatures and \unobservable with Gaussian distributions.
%
Furthermore, since $\unobservable$ is unobserved, we can assume its variance to be 1 without loss of generality, so that $\unobservable \sim N(0,1)$.
%
% (Any deviation from this can be achieved by adjusting intercepts and coefficients in the following).
In the setting we consider (Section~\ref{sec:setting}), a negative decision ($\decision = 0$) leads to a successful outcome ($\outcome = 1$).
%
When $\decision = 1$, the probability of success is given by a logistic regression model over the features $\obsFeatures$ and $\unobservable$:
\begin{eqnarray}
\prob{\outcome=1~|~\decision, \obsFeaturesValue, \unobservableValue} & = &
	\begin{cases}
		1, & \text{if } \decision = 0 \\
		\invlogit(\alpha_\outcome + \beta_\obsFeatures \obsFeaturesValue + \beta_\unobservable \unobservableValue), & \text{if } \decision = 1
	\end{cases}
	\label{eq:defendantmodel}
\end{eqnarray}
Here \invlogit is the standard logistic function.
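For concreteness, the outcome model above can be sketched in a few lines of Python; the snippet below is only an illustration under the assumptions stated here (one-dimensional Gaussian risk factors, arbitrary placeholder coefficients) and not a description of our actual implementation.
\begin{verbatim}
import numpy as np
from scipy.special import expit  # the standard logistic function (invlogit)

def outcome_probability(t, x, z, alpha_y=0.0, beta_x=-1.0, beta_z=-1.0):
    # P(Y = 1 | T = t, X = x, Z = z): a negative decision (t = 0) always
    # results in a successful outcome, while under a positive decision
    # (t = 1) success follows the logistic model above.  The coefficient
    # values are arbitrary placeholders.
    t, x, z = np.asarray(t), np.asarray(x), np.asarray(z)
    return np.where(t == 0, 1.0, expit(alpha_y + beta_x * x + beta_z * z))

# Example: one-dimensional Gaussian risk factors, with Var(Z) fixed to 1.
rng = np.random.default_rng(0)
x, z = rng.normal(size=5), rng.normal(size=5)
print(outcome_probability(t=np.array([0, 1, 1, 0, 1]), x=x, z=z))
\end{verbatim}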
%
% Since the decisions are ultimately based on expected behaviour,
We model the decisions in the data similarly, with a logistic regression over the features:
\begin{equation}
\prob{\decision = 1~|~\judgeValue,\obsFeaturesValue, \unobservableValue} = \invlogit(\alpha_\judgeValue + \gamma_\obsFeatures \obsFeaturesValue + \gamma_\unobservable \unobservableValue
) \label{eq:judgemodel}
\end{equation}%
%
Note that we are making the simplifying assumption that coefficients $\gamma_\obsFeatures,\gamma_\unobservable$ are the same for all $\human_\judgeValue$, but decision makers are allowed to differ in intercept $\alpha_\judgeValue$.
%
Parameter $\alpha_{\judgeValue}$ controls the leniency of a decision maker $\human_\judgeValue \in \humanset$.
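As a similar illustration, the sketch below generates synthetic decisions from the model of Equation~\ref{eq:judgemodel}, with one intercept per decision maker controlling leniency; the intercepts and coefficients used are arbitrary placeholders, not values from our experiments.
\begin{verbatim}
import numpy as np
from scipy.special import expit

def simulate_decisions(n, judge_intercepts, gamma_x=-1.0, gamma_z=-1.0, seed=1):
    # Draw one-dimensional Gaussian risk factors and judge assignments, then
    # sample decisions T from the logistic decision model.  Larger intercepts
    # make the corresponding judge more lenient (more decisions T = 1).
    rng = np.random.default_rng(seed)
    j = rng.integers(len(judge_intercepts), size=n)   # judge of each case
    x = rng.normal(size=n)                            # observed risk factor
    z = rng.normal(size=n)                            # unobserved risk factor
    t = rng.binomial(1, expit(judge_intercepts[j] + gamma_x * x + gamma_z * z))
    return j, x, z, t

j, x, z, t = simulate_decisions(1000, judge_intercepts=np.array([-1.0, 0.0, 1.0]))
print(t.mean())  # overall fraction of positive decisions
\end{verbatim}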
We take a Bayesian approach to learn the model over the dataset \dataset.
%
In particular, we consider the full probabilistic model defined in Equations \ref{eq:defendantmodel} and \ref{eq:judgemodel} and obtain the posterior distribution of its parameters $\parameters = \{ \alpha_\outcome, \beta_\obsFeatures, \beta_\unobservable, \gamma_\obsFeatures, \gamma_\unobservable\} \cup \bigcup_{\human_\judgeValue \in \human} \{\alpha_\judgeValue\}$, which includes intercepts $\alpha_\judgeValue$ for all $\human_\judgeValue$ employed in the data.
We use prior distributions given in Appendix~\ref{sec:priors} to ensure the identifiability of the parameters.
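To make the inference target concrete, the sketch below writes out the unnormalized log-posterior explored by the sampler, treating the unobserved risk factors as latent variables; since the actual priors are the ones given in Appendix~\ref{sec:priors}, the standard-normal priors used here are placeholders only.
\begin{verbatim}
import numpy as np
from scipy.special import expit
from scipy.stats import norm, bernoulli

def log_posterior(alpha_y, beta_x, beta_z, gamma_x, gamma_z, alpha_j, z,
                  j, x, t, y):
    # Unnormalized log-posterior: placeholder priors, the decision model for
    # all cases, and the outcome model only for cases with t = 1 (outcomes
    # are observed only when the decision was positive).
    lp = norm.logpdf(z).sum()                                # Z_i ~ N(0, 1)
    lp += sum(norm.logpdf(p) for p in
              (alpha_y, beta_x, beta_z, gamma_x, gamma_z))   # placeholder priors
    lp += norm.logpdf(alpha_j).sum()                         # placeholder priors
    lp += bernoulli.logpmf(t, expit(alpha_j[j] + gamma_x * x + gamma_z * z)).sum()
    released = t == 1
    p_y = expit(alpha_y + beta_x * x[released] + beta_z * z[released])
    lp += bernoulli.logpmf(y[released], p_y).sum()
    return lp
\end{verbatim}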
Recall that the goal is to provide a solution to Problem~\ref{problem:the} -- to do that, we need to address those cases where $\machine$ decides $\decision = 1$ while the data records a negative decision ($\decision = 0$), for which evaluation cannot be performed directly.
%
%
In other words, we wish to answer a `what-if' question: for each specific case where a decision maker $\human_\judgeValue$ decided $\decision = 0$, what if we had intervened to alter the decision to $\decision = 1$?
In the formalism of causal inference~\cite{pearl2010introduction}, we wish to evaluate the counterfactual expectation
\begin{align}
\cfoutcome = & \expectss{\decision \leftarrow 1}{\outcome~| \obsFeaturesValue, \judgeValue, \decision = 0; \dataset}
\label{eq:counterfactual}
\end{align}
The expression above concerns a specific entry in the dataset with features $\obsFeatures=\obsFeaturesValue$, for which decision maker $\human_\judgeValue$ made a decision $\decision = 0$.
%
It expresses the probability that the outcome would have been positive ($\outcome = 1$) had the decision been positive ($\decision = 1$), conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision = 0$, $\judge = \judgeValue$) as well as from the entire dataset \dataset.
%
Notice that the presence of \dataset in the conditional part of Equation~\ref{eq:counterfactual} gives us more information about the data entry than the entry-specific quantities alone.
The result of Equation~\ref{eq:theposterior} can be computed numerically:
\begin{equation}
\cfoutcome \approx \frac{1}{N} \sum_{i = 1}^{N} \prob{\outcome = 1~|~\decision = 1, \obsFeaturesValue, \unobservableValue^{(i)}; \parameters^{(i)}}
\end{equation}
where the sum is taken over $N$ samples of $\parameters$ and $\unobservable$ obtained from their respective posteriors.
%
In practice, we use the MCMC functionality of Stan\footnote{\url{https://mc-stan.org/}} to obtain these samples.
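Given such samples, the estimate reduces to a simple average, as in the sketch below; the argument names are illustrative, standing for equally long arrays of posterior draws of the parameters and of the entry's unobserved risk factor.
\begin{verbatim}
import numpy as np
from scipy.special import expit

def counterfactual_success(x, alpha_y_s, beta_x_s, beta_z_s, z_s):
    # Estimate the counterfactual success probability for one data entry with
    # observed feature x and a negative decision in the data, by averaging the
    # outcome-model probability over N posterior samples of the parameters and
    # of the entry's unobserved risk factor z (all arrays of equal length N).
    return expit(alpha_y_s + beta_x_s * x + beta_z_s * z_s).mean()
\end{verbatim}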
\subsection{Evaluating Decision Makers}
Having obtained outcome estimates for all data entries, it is now straightforward to evaluate the decisions of any decision maker \machine.
%
Our approach is summarized in Figure~\ref{fig:approach}.
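The evaluation itself can be sketched as follows: a negative decision by \machine counts as a success, a positive decision that is also positive in the data uses the recorded outcome, and the remaining cases use the imputed estimates; this is the computation behind the failure rate of $2.7/7 \approx 38.6\%$ shown in Figure~\ref{fig:approach}. The function below is an illustrative sketch, not our implementation.
\begin{verbatim}
import numpy as np

def failure_rate(t_machine, t_data, y_data, y_imputed):
    # Evaluated outcome for each case:
    #   t_machine == 0                 -> counted as success (y_hat = 1)
    #   t_machine == 1, t_data == 1    -> recorded outcome y_data
    #   t_machine == 1, t_data == 0    -> imputed counterfactual estimate
    y_hat = np.where(t_machine == 0, 1.0,
                     np.where(t_data == 1, y_data, y_imputed))
    return 1.0 - y_hat.mean()   # failure rate = average of (1 - y_hat)
\end{verbatim}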
\begin{figure}[t!]
\begin{center}
%\includegraphics[height=2in]{img/setting}
\includegraphics[width=0.95\columnwidth]{img/fig3_antti}
\end{center}
\caption{
The figure summarizes our approach for counterfactual-based imputation when evaluating decision maker $\machine$.
%
Negative decisions ($\decision = 0$) by decision maker $\machine$ are evaluated as successful ($\cfoutcome = 1$), shown with dashed arrows.
%
Positive decisions ($\decision = 1$) by decision maker $\machine$ for which the decision in the data was also positive ($\decision = 1$) are evaluated according to the outcome $\outcome$ in the data, as marked by the solid arrow.
%
For the remaining cases (the second and third subjects), the evaluated outcomes $\cfoutcome$ are based on our counterfactual imputation technique. The failure rate of decision maker $\machine$ here is $2.7/7 \approx 38.6\%$.
% \acomment{Riku double check the last sentence!} \rcomment{It is correct. Should the corresponding leniency level $(3/7\approx0.43)$ be mentioned here?}
% For negative decisions by decision maker $\human$ ($\decision = 0$), the outcome is evaluated according to the table of imputed outcomes $\hat\outcome$ (dotted arrows).
%
%Imputed outcomes are produced from the dataset outcomes by making a counterfactual prediction for those cases where $\human$ had made a negative decision (solid arrows). \acomment{Need to get rid of $\outcome_\human$ etc. Lets have data, decision by M and then $\hat\outcome$. We do not need the second column for Y, also the text does not support it.
%The figure summarizes our approach for counterfactual-based imputation.
%%
%Negative decisions by decision maker $M$ ($\decision = 0$) are evaluated as successful ($\outcome = 1$) (shown with dashed arrows). For negative decisions by decision maker $\human$ ($\decision = 0$), the outcome is evaluated according to the table of imputed outcomes $\hat\outcome$ (dotted arrows).
%%
%Imputed outcomes are produced from the dataset outcomes by making a counterfactual prediction for those cases where $\human$ had made a negative decision (solid arrows). \acomment{Need to get rid of $\outcome_\human$ etc. Lets have data, decision by M and then $\hat\outcome$. We do not need the second column for Y, also the text does not support it.
}
\label{fig:approach}
\end{figure}
%\subsection{Decision makers built on counterfactuals}
%So far in our discussion, we have focused on the task of evaluating the performance of a decision-maker \machine that is specified as input to the task.