@@ -242,7 +242,7 @@ These results show the accuracy of the different evaluators (Section~\ref{sec:ev
%Figure~\ref{fig:basic} shows some representative results from this process.
%SOME REPRESENTATIVE??? WE CANNOT BE THIS VAGUE
\spara{The basic setting.} Figure~\ref{fig:basic} shows estimated failure rates for each of the evaluators, at different leniency levels, when decisions in the data were made by \batch decision maker, while \machine was of \independent type.
\spara{The basic setting.} Figure~\ref{fig:basic} shows estimated failure rates for each of the evaluators, at different leniency levels, when decisions in the data were made by an \independent decision maker, while \machine was of \batch type.
%
%Specifically, the plot shows the %
%The same leniency level was used for decisions in the data and for decisions by \machine.
...
...
@@ -301,7 +301,7 @@ Again, our interpretation is that this is due to the fact that \contraction cruc
\caption{Evaluating batch decision maker on data employing independent decision makers and with leniency at most $0.5$. The proposed method (\cfbi) offers good estimates of the failure rates for all levels of leniency, whereas contraction cailure rate only up to leniency $0.5$.}
\caption{Evaluating \batch on data employing \independent and with leniency at most $0.5$. The proposed method (\cfbi) offers sensible estimates of the failure rates for all levels of leniency, whereas \contraction produces failure rates only up to leniency $0.5$.}
\label{fig:results_rmax05}
\end{figure}
...
...
@@ -398,7 +398,7 @@ The deployed machine decision maker was defined to release \leniencyValue fracti
\caption{Error of estimate w.r.t true evaluation when the effect of the unobserved $\unobservable$ is high ($\beta_\unobservable=\gamma_\unobservable=5$). Although the decision maker quality is poorer, the proposed approach (\cfbi) can still evaluate the decision accurately. Contraction shows higher variance and less accuracy}
\caption{Error of the estimate with respect to the true evaluation when the effect of the unobserved $\unobservable$ is high ($\beta_\unobservable=\gamma_\unobservable=5$). Although the decision maker quality is poorer, the proposed approach (\cfbi) can still evaluate the decisions accurately. \contraction shows higher variance and lower accuracy.}
\label{fig:highz}
\end{figure}% RL: Note that only machine decision maker is poorer, not the human.
%\subsection{Results}
...
...
@@ -406,7 +406,7 @@ The deployed machine decision maker was defined to release \leniencyValue fracti
\caption{Results with COMPAS data, error bars represent the standard deviation of the \failurerate estimate errors across all levels of leniency with regard to true evaluation. \cfbi gives both accurate and precise estimates despite of the number of judges used. Performance of ets notably worse when data includes decisions by increasing number of judges.
\caption{Results with COMPAS data. Error bars represent the standard deviation of the \failurerate estimate errors across all levels of leniency with regard to true evaluation. \cfbi gives both accurate and precise estimates regardless of the number of judges used. Performance of \contraction gets notably worse as the number of judges increases.