Commit fa1d98e0 authored by Antti Hyttinen

Within 15 pages.

parent 7949f069
@@ -32,9 +32,10 @@ where we used $\epsilon_\unobservable=\unobservableValue$ and integrated out $\e
\section{On the Priors of the Bayesian Model} \label{sec:model_definition}\label{sec:priors}
The priors for the coefficients $\gamma_\obsFeatures, ~\beta_\obsFeatures, ~\gamma_\unobservable$ and $\beta_\unobservable$ were defined using the gamma-mixture representation of Student's t-distribution with $\nu=6$ degrees of freedom.
The priors for $\gamma_\obsFeatures, ~\beta_\obsFeatures, ~\gamma_\unobservable$ and $\beta_\unobservable$ were defined using the gamma-mixture representation of Student's t-distribution with $\nu=6$ degrees of freedom.
%
The gamma-mixture is obtained by first sampling a precision parameter from $\Gamma$($\nicefrac{\nu}{2},~\nicefrac{\nu}{2}$) and then drawing the coefficient from zero-mean Gaussian with variance equal to the inverse of the sampled precision parameter.
The gamma-mixture is obtained by first sampling a precision parameter from $\Gamma$($\nicefrac{\nu}{2},~\nicefrac{\nu}{2}$) and then drawing the coefficient from a zero-mean Gaussian with that precision.
%variance equal to the inverse of the sampled precision parameter.
%The gamma-mixture is obtained by first sampling a precision parameter from Gamma($\nicefrac{\nu}{2},~\nicefrac{\nu}{2}$) distribution.
%
%Then the coefficient is drawn from zero-mean Gaussian distribution with variance equal to the inverse of the sampled variance parameter.
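For concreteness, a minimal NumPy sketch of this gamma-mixture construction (marginally a Student's t prior with $\nu=6$); the variable names are illustrative and not taken from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 6.0  # degrees of freedom of the Student's t prior

def sample_coefficient(size=1):
    # Precision ~ Gamma(nu/2, rate=nu/2); NumPy's gamma takes shape and scale,
    # so a rate of nu/2 corresponds to scale 2/nu.
    precision = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)
    # Coefficient ~ Normal(0, variance = 1/precision).
    return rng.normal(loc=0.0, scale=1.0 / np.sqrt(precision))

# Marginally, these draws follow Student's t with nu = 6 degrees of freedom.
coefficients = sample_coefficient(size=10_000)
```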
@@ -56,6 +56,16 @@ with $b_\obsFeatures = b_\unobservable = 1$.
%
Additional noise is added to the outcome of each case via $e_\outcome$, which was drawn from a zero-mean Gaussian distribution with small variance, $e_\outcome\sim \gaussian{0}{0.1}$. The data set was split in half into training and test sets, such that each decision maker appears in only one. The evaluated decision maker $\machine$ is trained on the training set, while the evaluation is based only on the test set.
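As an illustration of this split, a sketch that assigns each decision maker wholly to one of the two halves; the array name `judge_id` and the exact 50/50 allocation of judges are assumptions made for the example.

```python
import numpy as np

def split_by_decision_maker(judge_id, seed=0):
    """Split case indices so that each decision maker appears in only one set."""
    rng = np.random.default_rng(seed)
    judges = np.unique(judge_id)
    rng.shuffle(judges)
    train_judges = judges[: len(judges) // 2]   # half of the judges -> training
    train_mask = np.isin(judge_id, train_judges)
    return np.where(train_mask)[0], np.where(~train_mask)[0]

# The evaluated decision maker M is fit on train_idx; failure rates are
# estimated on test_idx only.
# train_idx, test_idx = split_by_decision_maker(judge_id)
```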
\begin{figure}[!t]
%\begin{center}
%\includegraphics[width=0.95\linewidth,trim={0 0 0 1.0cm},clip]{./img/_deciderH_independent_deciderM_batch_maxR_0_9coefZ1_0_all}
\includegraphics[width=0.49\linewidth,trim={1cm 0 2.5cm 1.8cm},clip]{./img/_deciderH_independent_deciderM_batch_maxR_0_9coefZ1_0_all}\quad\includegraphics[width=0.49\linewidth,trim={1cm 0 2.5cm 1.8cm},clip]{./img/_deciderH_independent_deciderM_batch_maxR_0_5coefZ1_0_all_fixed}
%\end{center}
\caption{Left: Evaluation of the \batch decision maker on data with \independent. Error bars show the std. of the \failurerate estimate across 10 datasets. In this basic setting, both our \cfbi and \contraction follow the true evaluation curve closely, but \cfbi exhibits lower variation.
Right: Evaluating \batch on data employing \independent and with leniency at most $0.5$. \cfbi offers sensible estimates of the failure rates for all levels of leniency, whereas \contraction does so only up to leniency $0.5$.
}
\label{fig:basic}\label{fig:results_rmax05}
\end{figure}
\subsection{Decision Makers} \label{sec:decisionmakers}
\label{sec:dm_exps}
@@ -112,16 +122,7 @@ Risk scores are computed with a logistic regression model which is trained on th
%
For the \independent decision maker \machine, the cumulative distribution function of the risk scores is constructed using the empirical distribution of risk scores of all the observations in the training data.
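As a hedged sketch of one possible implementation: the empirical CDF is built from the training risk scores, and the \independent rule is taken to release a case whenever the CDF value of its risk score falls below the leniency $\leniencyValue$; this release rule is our reading of the description above, not code from the paper.

```python
import numpy as np

def empirical_cdf(train_scores):
    """Empirical CDF of the risk scores observed in the training data."""
    sorted_scores = np.sort(np.asarray(train_scores))
    def cdf(score):
        # Fraction of training risk scores that are <= the given score.
        return np.searchsorted(sorted_scores, score, side="right") / len(sorted_scores)
    return cdf

def independent_decisions(test_scores, leniency, cdf):
    # Each case is decided in isolation: release (1) if the case falls within the
    # lowest-risk leniency quantile of the training distribution, else jail (0).
    return np.array([int(cdf(s) <= leniency) for s in test_scores])
```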
@@ -192,7 +193,7 @@ In addition, \cfbi exhibits considerably lower variation than \contraction.
\begin{figure}[!t]
\center
\includegraphics[width=0.7\linewidth,trim={0 0 0 0.25cm},clip]{./img/sl_errors_betaZ1}
\includegraphics[width=0.65\linewidth,trim={0 0 0 0.25cm},clip]{./img/sl_errors_betaZ1}
\caption{Mean absolute error (MAE) of the estimates w.r.t. the true evaluation.
Error bars show std. of the absolute error over 10 datasets. \cfbi offers robust estimates across different decision makers. The error of \contraction varies within and across different decision makers.}
\label{fig:results_errors}
@@ -234,13 +235,7 @@ We note that, when we compare with \trueevaluation, the accuracy of \cfbi decreases
This observation is important in the sense that decision makers based on elaborate machine learning techniques, %such as \cfbi,
may well allow for evaluation at higher leniency rates than the (often human) decision makers employed in the data.
%\subsection{Results}
\spara{The effect of unobservables.}
% So far in our synthetic experiments, we have assumed that observed and unobserved features are of equal importance in determining possible outcomes, an assumption encoded in the value of parameters $b_\obsFeatures$ and $b_\unobservable$ which were equal to $1$ (see Section~\ref{sec:syntheticsetting}).
@@ -266,18 +261,13 @@ Thus overall, in these synthetic settings our method achieves more accurate resu
%\newpage
\subsection{COMPAS data}
COMPAS %(Correctional Offender Management Profiling for Alternative Sanctions)
is
% Equivant's (formerly Northpointe)
@@ -322,9 +312,22 @@ The built logistic regression model was used in decision maker \machine in the t
%
The deployed machine decision maker was defined to release the fraction \leniencyValue of the defendants with the lowest probability of a negative outcome.
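A minimal sketch of this batch rule, assuming `p_negative` holds each defendant's predicted probability of a negative outcome (names are illustrative):

```python
import numpy as np

def batch_release(p_negative, leniency):
    """Release the leniency fraction of defendants with the lowest predicted risk."""
    p_negative = np.asarray(p_negative)
    n_release = int(np.floor(leniency * len(p_negative)))
    order = np.argsort(p_negative)          # ascending: lowest-risk defendants first
    decisions = np.zeros(len(p_negative), dtype=int)
    decisions[order[:n_release]] = 1        # 1 = released, 0 = detained
    return decisions
```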
\begin{figure}[!t]
\center
\includegraphics[width=0.65\linewidth,trim={0 0 0 0.25cm},clip]{img/sl_errors_betaZ5}
\caption{MAE of the estimate w.r.t. the true evaluation when the effect of the unobserved $\unobservable$ is high ($b_\unobservable=5$). The machine decision maker's decisions are of poorer quality, but \cfbi can still evaluate them accurately. \contraction shows higher variance and lower accuracy.}
\label{fig:highz}
\end{figure} % RL: Note that only machine decision maker is poorer, not the human.
%\subsection{Results}
\begin{figure}[!t]
\center
\includegraphics[width=0.6\linewidth,trim={0 0 0 0.25cm},clip]{./img/sl_errors_compas}
\caption{Results with COMPAS data. Error bars show the std. of the absolute \failurerate estimation errors across all levels of leniency w.r.t. the true evaluation. \cfbi gives both more accurate and more precise estimates regardless of the number of judges used.
% Performance of \contraction gets notably worse as the number of judges increases.
}
\label{fig:results_compas}
\end{figure}
Figure~\ref{fig:results_compas} shows the errors of the failure rate estimates for \batch as a function of the number of judges in the data (the judges are themselves also batch decision makers).
@@ -82,7 +82,7 @@ unobserved features \unobservable as a (continuous) one-dimensional risk factor.
Motivated by the central limit theorem, we use a Gaussian distribution for it, and
since $\unobservable$ is unobserved we can assume without loss of generality that $\unobservable \sim N(0,1)$. %, for example by using propensity scores~\cite{}.
%
For simplicity of presentation, we present here the case of a single observed feature \obsFeatures~-- it is straightforward to extend the model to multiple features \obsFeatures.
For ease of presentation, we use a single observed feature \obsFeatures here~-- it is straightforward to extend the model to multiple features \obsFeatures.
%WELL ACTUALLY REASONING IS NOT THIS BUT THE GENERAL POINT
%ABOUT PROPENSITY SCORE; IT IS BETTER CONFIDENCE ALL OBSERVATIONS TO A SINGLE VARIABLE,
% we do this only in the compas section and it is not advertised there
@@ -179,9 +179,9 @@ Having obtained a posterior probability distribution for parameters \parameters
\judgeValue,
\decision=0,\obsFeaturesValue,\parameters} \diff{\unobservableValue}\prob{\parameters | \dataset} \diff{\parameters} \label{eq:theposterior}
\end{equation}
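As a hedged illustration only, the integral in Equation~\ref{eq:theposterior} could be approximated by averaging over posterior draws; the sketch below assumes that draws of the outcome coefficients and of the case-specific $\unobservable$ (for cases with $\decision = 0$) are available, e.g. from an MCMC fit, and it uses a logistic link purely for illustration -- it is not necessarily the paper's exact outcome model.

```python
import numpy as np

def counterfactual_outcome_probability(posterior_draws, x_case):
    """Monte Carlo average of the predicted outcome for a case with a negative decision."""
    probs = []
    for draw in posterior_draws:
        # Each draw is assumed to be a dict with samples of the outcome coefficients
        # and of the case-specific unobservable z (placeholder keys).
        linear = draw["beta_x"] * x_case + draw["beta_z"] * draw["z_case"]
        probs.append(1.0 / (1.0 + np.exp(-linear)))   # illustrative logistic link
    return float(np.mean(probs))
```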
Note that, for all data entries other than the ones with $\decision = 0$
we trivially have \cfoutcome = \outcome
where \outcome is the outcome recorded in the dataset \dataset.
For all data entries other than the ones with $\decision = 0$,
we have \cfoutcome = \outcome,
where \outcome is the outcome recorded in \dataset.
%\spara{Implementation}
The result of Equation~\ref{eq:theposterior} can be computed numerically:
\begin{equation}
@@ -44,7 +44,9 @@ Kallus et al. obtain improved policies from data possibly biased by a baseline p
%At its core, our task is to answer a `what-if' question, i.e., ``what would the outcome have been if a different decision had been made'' (a counterfactual) -- often mentioned as the `fundamental problem' in causal inference~\cite{holland1986statistics}.
%I THINK THIS SOUNDS A BIT SILLY
% SELECTION BIAS
More generally, our setting exhibits {\it selection bias}~\cite{hernan2004structural}, \emph{latent confounding}~\cite{pearl2010introduction}, and \emph{missing data}~\cite{little2019statistical} (depending on how the outcomes for negative decisions are interpreted). In particular, our setting violates the strong assumptions of {\it ignorability} and {\it missing at random (MAR)} in the context of missing data.
More generally, our setting exhibits {\it selection bias}~\cite{hernan2004structural}, \emph{latent confounding}~\cite{pearl2010introduction}, and \emph{missing data}~\cite{little2019statistical} (depending on how the outcomes for negative decisions are interpreted). In particular, our setting violates
%the strong assumptions of
{\it ignorability} and {\it missing at random (MAR)} in the context of missing data.
%
% In our setting, any model predicting outcomes can only directly use data samples where the decision was positive.
% MISSING DATA %IMPUTATION
@@ -63,7 +65,7 @@ More generally, our setting exhibits {\it selection bias}~\cite{hernan2004struc
%
%\fi
%\iffalse
The effectiveness of causal modelling and use of counterfactuals is also demonstrated in recent work on e.g. fairness~\cite{DBLP:conf/icml/Kusner0LS19,coston2020counterfactual,madras2019fairness,corbett2017algorithmic,DBLP:conf/aaai/ZhangB18}.
The effectiveness of causal modelling and counterfactuals is also demonstrated in recent work on e.g. fairness~\cite{DBLP:conf/icml/Kusner0LS19,coston2020counterfactual,madras2019fairness,corbett2017algorithmic,DBLP:conf/aaai/ZhangB18}.
%DBLP:conf/icml/NabiMS19,
% BottouPCCCPRSS13,,DBLP:journals/jmlr/DBLP:conf/icml/JohanssonSS16
%
@@ -71,7 +73,7 @@ The effectiveness of causal modelling and use of counterfactuals is also demonst
%
%Also identifiability questions in the presence of selection bias or missing data mechanisms require detailed causal modelling~\cite{bareinboim2012controlling,hernan2004structural,little2019statistical}.
%To properly assess decision procedures for their performance and fairness we need to understand the causal relations
Finally, more applied work related in particular to recidivism, can be found e.g. in~\cite{tolan2019why,brennan2009evaluating,royal}.
More applied work related to recidivism can be found, e.g., in~\cite{tolan2019why,brennan2009evaluating,royal}.
%kleinberg2018human17fair,
%\fi ,chouldechova20
%murder,
\ No newline at end of file