\section{The Selective Labels Framework}
We begin by formalizing the selective labels setting.
Let binary variable $T$ denote a decision, where $T=1$ is interpreted as a positive decision. The binary variable $Y$ measures an outcome that is affected by the decision $T$. The selective labels issue is that in the observed data, whenever $T=0$, the recorded outcome is deterministically\footnote{Alternatively, not observing the value of $Y$ when $T=0$ can be seen as inducing a problem of selection bias.} $Y=1$.
For example, let $T$ denote the decision to jail ($T=0$) or bail ($T=1$) a defendant.
Outcome $Y=0$ then marks that the defendant violated their bail and $Y=1$ that they did not. A jailed defendant ($T=0$) obviously cannot violate the bail, and thus always $Y=1$.
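Written compactly, with $Y_{\text{obs}}$ denoting the recorded outcome (a notation we introduce only for this illustration):
\[
Y_{\text{obs}} =
\begin{cases}
Y, & \text{if } T = 1,\\
1, & \text{if } T = 0,
\end{cases}
\]
so the true outcome is revealed only under a positive decision.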
\subsection{Decision Makers}
A decision maker $D(r)$ makes the decision $T$ based on the characteristics of the subject. We assume the decision maker is given an input leniency $r$, which defines the percentage of subjects that receive a positive decision. A decision maker may be a human or a machine learning system. It seeks to predict the outcome $Y$ based on what it knows, and then decides $T$ based on this prediction: a negative decision $T=0$ is preferred for subjects predicted to have a negative outcome $Y=0$, and a positive decision $T=1$ when the outcome is predicted to be positive $Y=1$.
In the bail-or-jail example, a decision maker seeks to jail ($T=0$) all dangerous defendants that would violate their bail ($Y=0$), while bailing the defendants that would not. The leniency $r$ refers to the proportion of bail decisions.
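As an illustration, a machine decision maker of this kind can be implemented by ranking subjects on predicted risk and thresholding at the leniency level. The following is a minimal sketch; the function name and the assumption that risk scores for $Y=0$ are already available are ours, not part of the framework:
\begin{verbatim}
import numpy as np

def decide(risk_scores, r):
    """Sketch of a decision maker D(r): rank subjects by predicted
    risk of a negative outcome (Y=0) and give the positive decision
    T=1 to the fraction r with the lowest risk."""
    n = len(risk_scores)
    n_positive = int(np.floor(r * n))   # number of positive decisions
    order = np.argsort(risk_scores)     # lowest predicted risk first
    T = np.zeros(n, dtype=int)
    T[order[:n_positive]] = 1           # e.g. bail the safest r fraction
    return T
\end{verbatim}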
The difference between the decision makers in the data and $D(r)$ is that we usually cannot observe all the information that was available to the decision makers in the data.
With unobservables we refer to latent, usually unrecorded information regarding the outcome that is available only to the decision maker. For example, a judge in court can observe the defendant's behaviour and level of remorse, which might be indicative of bail violation. We denote this latent information with the variable \unobservable.
\subsection{Evaluating Decision Makers}
The performance of a decision maker can be evaluated as follows.
Failure rate (FR) is the number of undesired outcomes ($Y=0$) divided by the number of all decisions. Note that in this setting a failure can occur only together with a positive decision ($T=1$), so a zero failure rate could be achieved trivially by never making a positive decision. A good decision maker therefore achieves as low a failure rate as possible at any given leniency level.
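In symbols, if decision maker $D(r)$ produces decisions $t_i$ with (possibly counterfactual) outcomes $y_i$ for subjects $i = 1, \dots, n$ (notation introduced here only for illustration), then
\[
\mathrm{FR} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{\, t_i = 1 \,\wedge\, y_i = 0 \,\},
\]
since a negative decision ($t_i = 0$) cannot result in a failure.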
However, the data we have does not directly provide a way to evaluate FR. If a decision maker decides $T=1$ for a subject that had $T=0$ in the data, the outcome $Y$ recorded in the data is based on the decision $T=0$ and hence $Y=1$ regardless of the decision taken by $D$. The number of negative outcomes $Y=0$ for these decisions needs to be estimated in some non-trivial way.
In the bail-or-jail example, the difficulty occurs when a decision maker decides to bail ($T=1$) a defendant that was jailed in the data: we cannot directly observe whether the defendant would have violated the bail or not.
Therefore, the aim here is to estimate the FR of any decision maker $D$ at any given leniency, formalized as follows:
\begin{problem}
Given selectively labeled data and a decision maker $D(r)$, give an estimate of the failure rate FR at any leniency $r$.
\end{problem}
\noindent
The evaluator's estimate should be accurate at all levels of leniency.
Such an estimate is vital for deploying machine learning and AI systems into everyday use.
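To make the evaluation difficulty concrete, the following synthetic sketch shows how a naive evaluation that trusts the recorded labels understates the true failure rate of a more lenient decision maker; all names and data-generating choices here are illustrative assumptions, not the mechanism studied in this paper:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Ground truth: each subject's latent probability of failure (Y=0).
p_fail = rng.uniform(0, 1, n)
y_true = (rng.uniform(0, 1, n) > p_fail).astype(int)

# Decision maker in the data: bails the safest half (leniency 0.5).
t_data = (p_fail < np.median(p_fail)).astype(int)

# Selective labels: the recorded outcome is Y=1 whenever T=0.
y_rec = np.where(t_data == 1, y_true, 1)

# A more lenient decision maker D(0.7) to be evaluated.
t_new = (p_fail < np.quantile(p_fail, 0.7)).astype(int)

# Naive evaluation misses failures among subjects jailed in the
# data but bailed by D(0.7); the true FR counts them.
fr_naive = np.mean((t_new == 1) & (y_rec == 0))
fr_true = np.mean((t_new == 1) & (y_true == 0))
print(f"naive FR: {fr_naive:.3f}  true FR: {fr_true:.3f}")
\end{verbatim}
The naive estimate counts failures only among subjects bailed in the data; closing exactly this gap is what a non-trivial evaluator must do.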