%\note{Riku}{From KDD: ''In addition, authors can provide an optional two (2) page supplement at the end of their submitted paper (it needs to be in the same PDF file and start at page 10) focused on reproducibility. This supplement can only be used to include (i) information necessary for reproducing the experimental results, insights, or conclusions reported in the paper (e.g., various algorithmic and model parameters and configurations, hyper-parameter search spaces, details related to dataset filtering and train/test splits, software versions, detailed hardware configuration, etc.), and (ii) any pseudo-code, or proofs that due to space limitations, could not be included in the main nine-page manuscript, but that help in reproducibility (see reproducibility policy below for more details).''}
...
...
% \end{itemize}
%\end{itemize}
We used Python 3.6.9 and PyStan v.2.19.0.0 with cmdstanpy 0.4.3 for all experiments.
\section{Counterfactual Inference}
%\note{Antti}{Writing here in the language I know, to make the assumptions we are making clear.}
...
...
\caption{Results of experiments with COMPAS data using different numbers of judges.}
%\label{fig:}
\end{figure*}
\subsection{On the Priors}\label{sec:model_definition}
For the Bayesian modelling, the priors for the coefficients $\gamma_\obsFeatures, ~\beta_\obsFeatures, ~\gamma_\unobservable$ and $\beta_\unobservable$ were defined using the gamma-mixture representation of Student's t-distribution with $6$ degrees of freedom.
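The gamma-mixture representation used here is the standard scale-mixture construction of Student's $t$: with $\nu = 6$, drawing a precision $\lambda \sim \mathrm{Gamma}(\nu/2, \nu/2)$ (shape-rate) and then the coefficient from $\mathcal{N}(0, 1/\lambda)$ yields a marginal $t_6$ prior. A minimal sketch (illustrative NumPy code, not the paper's Stan program) verifying this numerically:

```python
# Gamma-mixture representation of Student's t with nu = 6 degrees of freedom:
#   lambda ~ Gamma(nu/2, rate = nu/2),  beta | lambda ~ Normal(0, 1/lambda)
# is marginally t-distributed with nu degrees of freedom.
import numpy as np

rng = np.random.default_rng(0)
nu = 6.0
n = 200_000

# NumPy parameterizes gamma by shape and scale, so rate nu/2 -> scale 2/nu.
lam = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
beta = rng.normal(loc=0.0, scale=1.0 / np.sqrt(lam))

# A standard t with nu df has variance nu / (nu - 2) = 1.5 for nu = 6.
var_est = float(np.var(beta))
```

The same construction is what a Stan program expresses when a per-coefficient precision is given a gamma prior and the coefficient a conditionally normal prior.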
%
...
...
The variance parameters $\sigma_\decision^2$ and $\sigma_\outcome^2$ were drawn from Gaussian distributions.
%
The Gaussians were restricted to the positive real numbers and both had mean $0$ and variance $\tau^2=1$.
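A normal distribution with mean $0$ and variance $\tau^2 = 1$ restricted to the positive reals is a half-normal, equivalent to $|\mathcal{N}(0,1)|$. A brief sketch (variable names are illustrative, not the paper's notation):

```python
# Half-normal prior draws for the variance parameters: Normal(0, tau^2 = 1)
# restricted to the positive real numbers, i.e. the distribution of |N(0, 1)|.
import numpy as np

rng = np.random.default_rng(1)
draws = np.abs(rng.normal(0.0, 1.0, size=100_000))

# The half-normal with tau = 1 has mean sqrt(2 / pi) ~= 0.798.
mean_est = float(draws.mean())
```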
\hide{
The sampler diagnostics indicated poor performance only for the XXX decider, whose E-BFMI value was consistently below the nominal threshold of 0.2. A low E-BFMI suggests that the sampler may not have explored the posterior fully.
}
\caption{Results of experiments with COMPAS data using different numbers of judges.}
%\label{fig:}
\end{figure}
These figures also feature the \textbf{Probabilistic} decision maker: each subject is released with a probability given by the logistic regression model, where the leniency enters through $\alpha_j$.
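A minimal sketch of this decision maker (coefficient names, signs, and default values here are illustrative placeholders, not the paper's exact judge model):

```python
# "Probabilistic" decision maker: release each subject with the probability
# given by a logistic regression on the features, where judge-specific
# leniency enters through the intercept alpha_j.
import numpy as np

def invlogit(a):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

def release_probability(x, z, alpha_j, gamma_x=1.0, gamma_z=1.0):
    """P(release) for observed feature x, unobserved feature z, leniency alpha_j."""
    return invlogit(alpha_j + gamma_x * x + gamma_z * z)

def probabilistic_decision(x, z, alpha_j, rng):
    """Sample a decision: 1 = release, 0 = deny, with the probability above."""
    return int(rng.random() < release_probability(x, z, alpha_j))
```

A more lenient judge (larger $\alpha_j$) releases any given subject with higher probability, which is the behaviour the experiments vary.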
%\rcomment{Hard to justify any more? Or this decision maker could now be described as follows: Each subject is released with probability equal to some risk score which differs based on the assigned judge. In the experiments, the risk scores were computed with equation \ref{eq:judgemodel} where leniency was inputted through $\alpha_j$.}
\todo{Michael}{Create and use macros for all main terms and mathematical quantities, so that they stay consistent throughout the paper. Already done for previous sections}
We thoroughly tested our proposed method for evaluating decision-maker performance in terms of accuracy, variability, and robustness. We employed both synthetic and real data, including decisions from several different kinds of decision makers. We compare performance especially against the state-of-the-art contraction technique of \citet{lakkaraju2017selective}.
The figure shows a situation in the {\it bail-or-jail} scenario, where a machine makes decisions for the {\it same defendants} previously decided by a judge. When the machine decides to allow bail for a defendant to whom the judge had denied bail, we cannot directly evaluate the machine's decision.
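This selective-labels structure can be sketched with synthetic, purely illustrative data (variable names are placeholders): outcomes are observed only for defendants the judge released, so machine releases that the judge denied cannot be scored against the data.

```python
# Selective labels in the bail-or-jail setting: the outcome y is observed only
# when the judge released the defendant (t_judge = 1). When the machine would
# release (t_machine = 1) but the judge jailed (t_judge = 0), the machine's
# decision cannot be evaluated directly from the data.
import numpy as np

rng = np.random.default_rng(2)
n = 8
t_judge = rng.integers(0, 2, size=n)    # judge's decisions (1 = release)
t_machine = rng.integers(0, 2, size=n)  # machine's decisions for the same defendants
y = rng.integers(0, 2, size=n).astype(float)  # would-be outcomes (1 = success)

y_observed = np.where(t_judge == 1, y, np.nan)  # outcome hidden when jailed
unevaluable = (t_machine == 1) & (t_judge == 0)  # cases with no ground truth
```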