diff --git a/paper/sl.tex b/paper/sl.tex
index a0d982080903fe7b665aff941ea4ff6a4c8adebe..aea76c9617c9d53ca8cade9087ad63141bf26b55 100755
--- a/paper/sl.tex
+++ b/paper/sl.tex
@@ -214,38 +214,6 @@ This estimate is vital in the employment machine learning and AI systems to ever
 %The "eventual goal" is to create such an evaluator module that it can outperform (have a lower failure on all levels of acceptance rate) the deciders in the data generating process. The problem is of course comparing the performance of the deciders. We try to address that.
 
-\subsection{Causal Modeling}
-
-\begin{figure}
- \begin{tikzpicture}[->,>=stealth',node distance=1.5cm, semithick]
-
- \tikzstyle{every state}=[fill=none,draw=black,text=black]
-
- \node[state] (R) {$R$};
- \node[state] (X) [right of=R] {$X$};
- \node[state] (T) [below of=X] {$T$};
- \node[state] (Z) [rectangle, right of=X] {$Z$};
- \node[state] (Y) [below of=Z] {$Y$};
-
- \path (R) edge (T)
- (X) edge (T)
- edge (Y)
- (Z) edge (T)
- edge (Y)
- (T) edge (Y);
-\end{tikzpicture}
-\caption{ $R$ leniency of the decision maker, $T$ is a binary decision, $Y$ is the outcome that is selectively labled. Background features $X$ for a subject affect the decision and the outcome. Additional background features $Z$ are visible only to the decision maker in use. }\label{fig:model}
-\end{figure}
-
-We model the selective labels setting as summarized by Figure~\ref{fig:model}\cite{lakkaraju2017selective}.
-
-The outcome $Y$ is affected by the observed background factors $X$, unobserved background factors $Z$. These background factors also influence the decision $T$ taken in the data. Hence $Z$ includes information that was used by the decision maker in the data but that is not available to us as observations.
- In addition, there may be other background factors that affect $Y$ but not $T$. In addition, we assume the decision is affected by some observed leniency level $R \in [0,1]$ of the decision maker.
-
-
-We use a propensity score framework to model $X$ and $Z$: they are assumed continuous Gaussian variables, with the interpretation that they represent summarized risk factors such that higher values denote higher risk for a negative outcome ($Y=0$). Hence the Gaussianity assumption here is motivated by the central limit theorem.
-
-\acomment{Not sure if this is good to discuss here or in the next section: if we would like the next section be full of our contributions and not lakkarajus, we should place it here.}
 
 %\setcounter{section}{1}
@@ -316,15 +284,55 @@ We use a propensity score framework to model $X$ and $Z$: they are assumed conti
 
 \section{Counterfactual-Based Imputation For Selective Labels}
 
-\acomment{This chapter should be our contributions. One discuss previous results we build over but one should consider putting them in the previous section.}
+%\acomment{This chapter should be our contributions. One discuss previous results we build over but one should consider putting them in the previous section.}
+
+\subsection{Causal Modeling}
+
+\begin{figure}
+ \begin{tikzpicture}[->,>=stealth',node distance=1.5cm, semithick]
+
+ \tikzstyle{every state}=[fill=none,draw=black,text=black]
+
+ \node[state] (R) {$R$};
+ \node[state] (X) [right of=R] {$X$};
+ \node[state] (T) [below of=X] {$T$};
+ \node[state] (Z) [rectangle, right of=X] {$Z$};
+ \node[state] (Y) [below of=Z] {$Y$};
+
+ \path (R) edge (T)
+ (X) edge (T)
+ edge (Y)
+ (Z) edge (T)
+ edge (Y)
+ (T) edge (Y);
+\end{tikzpicture}
+\caption{$R$ is the leniency of the decision maker, $T$ is a binary decision and $Y$ is the outcome, which is selectively labeled. Background features $X$ of a subject affect both the decision and the outcome. Additional background features $Z$ are visible only to the decision maker.}\label{fig:model}
+\end{figure}
+
+We model the selective labels setting as summarized in Figure~\ref{fig:model}~\cite{lakkaraju2017selective}.
+The outcome $Y$ is affected by the observed background factors $X$ and the unobserved background factors $Z$. These background factors also influence the decision $T$ recorded in the data. Hence $Z$ includes information that was used by the decision maker but that is not available to us as observations.
+There may also be further background factors that affect $Y$ but not $T$. Finally, we assume the decision is affected by an observed leniency level $R \in [0,1]$ of the decision maker.
+
+We use a propensity score framework to model $X$ and $Z$: they are assumed to be continuous Gaussian variables, interpreted as summarized risk factors such that higher values denote a higher risk of a negative outcome ($Y=0$). The Gaussianity assumption is motivated by the central limit theorem.
+
+\acomment{Not sure if this is good to discuss here or in the next section: if we would like the next section be full of our contributions and not lakkarajus, we should place it here.}
+
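+For concreteness, one parametric instantiation that is consistent with Figure~\ref{fig:model}, and with the model fitted in the next subsection, can be sketched as
+\begin{align*}
+X, Z & \sim N(0, 1), \\
+T ~|~ X = x, Z = z & \sim \text{Bernoulli}(\invlogit(\alpha_{j} + \beta_{xt} x + \beta_{zt} z)), \\
+Y ~|~ T = 1, X = x, Z = z & \sim \text{Bernoulli}(\invlogit(\alpha_{y} + \beta_{xy} x + \beta_{zy} z)),
+\end{align*}
+where the leniency $R$ of decision maker $j$ acts through the intercept $\alpha_{j}$ and the outcome $Y$ is recorded only when $T = 1$. The standard-normal scale of $X$ and $Z$ and the specific logistic form are illustrative choices at this point; the specification we actually fit is given in the next subsection.
+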
+\subsection{Imputation}
+
+%\acomment{We need to start by noting that with a simple example how we assume this to work. If X indicates a safe subject that is jailed, then we know that (I dont know how this applies to other produces) that Z must have indicated a serious risk. This makes $Y=0$ more likely than what regression on $X$ suggests.} done by Riku!
+
 \acomment{I do not understand what we are doing from this section. It needs to be described ASAP.}
 
-Our approach is based on the fact that in almost all cases, some information regarding the latent variable is recoverable. For illustration, let us consider defendant $i$ who has been given a negative decision $\decisionValue_i = 0$. If the defendant's private features $\featuresValue_i$ would indicate that this subject would be safe to release, we could easily deduce that the unobservable variable $\unobservableValue_i$ had contained so significant information that the defendant had to be jailed. In an opposite situation, where the features $\featuresValue_i$ clearly imply that the defendant is dangerous and is subsequently jailed, we do not have that much information available on the latent variable.
+Our approach is based on the fact that in almost all cases, some information regarding the latent variable is recoverable. For illustration, consider defendant $i$ who has been given a negative decision $\decisionValue_i = 0$. If the defendant's private features $\featuresValue_i$ indicate that the subject would have been safe to release, we can deduce that the unobservable variable $\unobservableValue_i$ must have indicated high risk, since
+%contained so significant information that
+the defendant was nevertheless jailed. In turn, this makes $Y=0$ more likely than a prediction based on $\featuresValue_i$ alone would suggest.
+In the opposite situation, where the features $\featuresValue_i$ clearly imply that the defendant is dangerous and the defendant is subsequently jailed, the decision reveals little additional information about the latent variable.
+
+\acomment{The above assumes that the decision maker in the data is good and not bad.}
+
 In counterfactual-based imputation we use counterfactual values of the outcome $\outcome_{\decisionValue=1}$ to impute the missing labels. The SCM required to compute the counterfactuals is presented in figure \ref{fig:model}. Using Stan, we model the observed data as
 \begin{align} \label{eq:data_model}
@@ -332,6 +340,8 @@ In counterfactual-based imputation we use counterfactual values of the outcome $
 	\decision ~|~D, x & \sim \text{Bernoulli}(\invlogit(\alpha_{j} + \beta_{xt} x + \beta_{zt}z)). \\ \nonumber
 \end{align}
+\acomment{Are the coefs really the same in the model?}
+
 That is, we fit one logistic regression model modelling the decisions based on the observable features \features and the identity of the judge using all of the data. The identity of the judge is encoded into the intercept $\alpha_j$. (We use different intercepts for different judges.)
 We model the observed outcomes with $\decision = 1$ with a separate regression model to learn the parameters: coefficients $\beta_{xy}$ for the observed features, $\beta_{zy}$ for the unobserved features, the sole intercept $\alpha_y$ and the possible value for the latent variable \unobservable.
 Using the samples from the posterior distribution for all the parameters given by Stan, we can estimate the values of the counterfactuals. The counterfactuals are formally drawn from the posterior predictive distribution
@@ -423,7 +433,7 @@ Using Stan, draw $S$ samples of the all parameters from the posterior distributi
 \caption{Counterfactual-based imputation}
 \end{algorithm}
 
-\section{Extension To Non-Linearity (2nd priority)}
+%\section{Extension To Non-Linearity (2nd priority)}
 
 % If X has multiple dimensions or the relationships between the features and the outcomes are clearly non-linear the presented approach can be extended to accomodate non-lineairty. Jung proposed that... Groups... etc etc.
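+
+To illustrate the imputation step, the sketch below shows how the counterfactual outcomes $\outcome_{\decisionValue=1}$ could be computed from $S$ posterior draws of $\alpha_y$, $\beta_{xy}$, $\beta_{zy}$ and of the latent variable \unobservable{} obtained from the Stan fit of Equation~\ref{eq:data_model}. The NumPy code and its array names are illustrative placeholders rather than code from the paper; any equivalent post-processing of the posterior samples can be used.
+\begin{verbatim}
+import numpy as np
+
+# Illustrative sketch: variable names and shapes are assumptions, not the
+# actual analysis code.
+def impute_counterfactuals(alpha_y, beta_xy, beta_zy, z, x, t, y_obs, rng):
+    """Impute the outcome under a positive decision for subjects with t == 0.
+
+    alpha_y, beta_xy, beta_zy: posterior draws, shape (S,).
+    z: posterior draws of the latent variable, shape (S, N).
+    x, t, y_obs: features, decisions and observed outcomes, shape (N,);
+    y_obs is only meaningful where t == 1.
+    """
+    # Outcome probability under a positive decision, per draw and subject.
+    logits = (alpha_y[:, None] + beta_xy[:, None] * x[None, :]
+              + beta_zy[:, None] * z)
+    p_y1 = 1.0 / (1.0 + np.exp(-logits))      # shape (S, N)
+    # One counterfactual outcome per posterior draw and subject.
+    y_cf = rng.binomial(1, p_y1)
+    # Keep the observed label for released subjects, impute the rest.
+    return np.where(t[None, :] == 1, y_obs[None, :], y_cf)
+
+# Example: y_full = impute_counterfactuals(alpha_y, beta_xy, beta_zy, z, x, t,
+#                                          y_obs, np.random.default_rng(0))
+\end{verbatim}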