& & \tabitem Data $\D$ with properties $\{x_i, j_i, t_i, y_i\}$ \\
& & \tabitem acceptance rate $r$ \\
& & \tabitem knowledge that $X$ affects $Y$ \\
& & \tabitem more intricate knowledge about $\M$? \\[.5\normalbaselineskip]
& & {\ul Potential outcomes evaluator} \\
& & \tabitem Data $\D$ with properties $\{x_i, j_i, t_i, y_i\}$ \\
& & \tabitem acceptance rate $r$ \\
& & \tabitem knowledge that $X$ affects $Y$ \\[.5\normalbaselineskip]
\section{Old results} \label{sec:results}
Results obtained by running algorithm \ref{alg:perf_comp} are presented in table \ref{tab:results} and figure \ref{fig:results}. All parameters are at their default values and a logistic regression model is trained.
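The error metric used throughout the result tables can be sketched as follows. This is a minimal illustration, not output of the actual experiments: the function name and the curve values below are hypothetical, assuming that each evaluator produces a failure-rate curve on a shared grid of acceptance rates.

```python
def mae_wrt_true(true_fr, est_fr):
    """Mean absolute error between an estimated failure-rate curve and
    the true-evaluation curve, evaluated on the same acceptance-rate grid."""
    assert len(true_fr) == len(est_fr)
    return sum(abs(e - t) for e, t in zip(est_fr, true_fr)) / len(true_fr)

# Hypothetical curves at acceptance rates 0.1, 0.2, ..., 0.9.
true_curve = [0.01, 0.02, 0.04, 0.07, 0.11, 0.16, 0.22, 0.29, 0.37]
est_curve  = [0.02, 0.03, 0.04, 0.08, 0.10, 0.17, 0.21, 0.30, 0.36]
print(round(mae_wrt_true(true_curve, est_curve), 4))
```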
\begin{table}[H]
\centering
\caption{Mean absolute error (MAE) w.r.t.\ true evaluation. \\ \emph{RL: Updated 26 June.}}
\begin{tabular}{l | c c}
Method & MAE without $Z$ & MAE with $Z$ \\ \hline
Labeled outcomes & 0.107249375 & 0.0827844\\
Human evaluation & 0.002383729 & 0.0042517\\
Contraction & 0.004633164 & 0.0075497\\
Causal model, ep & 0.000598624 & 0.0411532\\
\end{tabular}
\label{tab:results}
\end{table}
\begin{figure}[]
\centering
\begin{subfigure}[b]{0.5\textwidth}
\includegraphics[width=\textwidth]{sl_without_Z_8iter}
\caption{Results without unobservables}
\label{fig:results_without_Z}
\end{subfigure}
~ %add desired spacing between images, e. g. ~, \quad, \qquad, \hfill etc.
%(or a blank line to force the subfigure onto a new line)
\begin{subfigure}[b]{0.5\textwidth}
\includegraphics[width=\textwidth]{sl_with_Z_8iter_betaZ_1_0}
\caption{Results with unobservables, $\beta_Z=1$.}
\label{fig:results_with_Z}
\end{subfigure}
\caption{Failure rate vs. acceptance rate with varying levels of leniency. Logistic regression was trained on labeled training data. \emph{RL: Updated 26 June.}}
\label{fig:results}
\end{figure}
\subsection{$\beta_Z=0$ and data generated with unobservables}
If we set $\beta_Z=0$, almost all failure rates drop to zero in the interval $0.1, \ldots, 0.3$; the exception is the human evaluation failure rate. Results are presented in figures \ref{fig:betaZ_1_5} and \ref{fig:betaZ_0}.
The disparities between figures \ref{fig:results_without_Z} and \ref{fig:betaZ_0} (results without unobservables and with $\beta_Z=0$) can be explained by a slight difference in the data generating process, namely the effect of $\epsilon$. The effect of adding $\epsilon$ (noise to the decisions) is explored further in section \ref{sec:epsilon}.
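The role of $\beta_Z$ can be made concrete with a short sketch of a generator with unobservables. This is an illustration only, assuming the logistic parameterization $P(Y=0 \mid X, Z) = \invlogit(\beta_X X + \beta_Z Z)$ used in this document; the function name is hypothetical and the exact specification is the one in algorithm \ref{alg:data_with_Z}.

```python
import math
import random

def inv_logit(a):
    return 1.0 / (1.0 + math.exp(-a))

def generate_with_unobservables(n, beta_X=1.0, beta_Z=1.0, seed=0):
    """Sketch of the generator with unobservables: each subject has an
    observable feature X and an unobservable Z, and Y is drawn with
    P(Y=0 | X, Z) = inv_logit(beta_X * X + beta_Z * Z).
    Setting beta_Z = 0 removes the influence of Z on the outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, z = rng.gauss(0, 1), rng.gauss(0, 1)
        y = 0 if rng.random() < inv_logit(beta_X * x + beta_Z * z) else 1
        data.append((x, z, y))
    return data
```

With `beta_Z=0` the probability of the outcome no longer depends on $z$, which is why figure \ref{fig:betaZ_0} should approach the setting without unobservables up to the remaining differences in the generating process.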
\begin{figure}[]
\centering
\begin{subfigure}[b]{0.475\textwidth}
\includegraphics[width=\textwidth]{sl_with_Z_4iter_betaZ_1_5}
\caption{Results with unobservables, $\beta_Z$ set to 1.5 in algorithm \ref{alg:data_with_Z}.}
\label{fig:betaZ_1_5}
\end{subfigure}
\quad %add desired spacing between images, e. g. ~, \quad, \qquad, \hfill etc.
%(or a blank line to force the subfigure onto a new line)
\begin{subfigure}[b]{0.475\textwidth}
\includegraphics[width=\textwidth]{sl_with_Z_4iter_beta0}
\caption{Results with unobservables, $\beta_Z$ set to 0 in algorithm \ref{alg:data_with_Z}.}
\label{fig:betaZ_0}
\end{subfigure}
\caption{Effect of $\beta_z$. Failure rate vs. acceptance rate with unobservables in the data (see algorithm \ref{alg:data_with_Z}). Logistic regression was trained on labeled training data. Results from algorithm \ref{alg:perf_comp}.}
\label{fig:betaZ_comp}
\end{figure}
\subsection{Noise added to the decision and data generated without unobservables} \label{sec:epsilon}
In this experiment, Gaussian noise with zero mean and variance 0.1 was added to the probabilities $P(Y=0|X=x)$ after sampling $Y$ but before ordering the observations on line 5 of algorithm \ref{alg:data_without_Z}. Results are presented in figure \ref{fig:sigma_figure}.
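The noise step can be sketched as follows. The function name is hypothetical, and the direction of the ordering (most likely to succeed first) is an assumption; the authoritative version is line 5 of algorithm \ref{alg:data_without_Z}.

```python
import math
import random

def noisy_ranking(p_y0, variance=0.1, seed=0):
    """Add zero-mean Gaussian noise (variance 0.1, i.e. std sqrt(0.1))
    to each probability P(Y=0|X=x) after Y has been sampled, then return
    the subject indices ordered by the noisy values, mimicking an
    imperfect decision-maker."""
    rng = random.Random(seed)
    noisy = [p + rng.gauss(0, math.sqrt(variance)) for p in p_y0]
    return sorted(range(len(p_y0)), key=noisy.__getitem__, reverse=True)
```

Because the noise variance (0.1) is large relative to the spread of the probabilities, the induced ordering can deviate substantially from the noiseless one, which is the effect examined in figure \ref{fig:sigma_figure}.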
\begin{figure}[]
\centering
\includegraphics[width=0.5\textwidth]{sl_without_Z_3iter_sigma_sqrt_01}
\caption{Failure rate with varying levels of leniency without unobservables. Noise has been added to the decision probabilities. Logistic regression was trained on labeled training data.}
\label{fig:sigma_figure}
\end{figure}
\subsection{Predictions with random forest classifier} \label{sec:random_forest}
In this section, the predictive model was switched to a random forest classifier to examine the effect of changing the predictive model. The results, presented in figure \ref{fig:random_forest}, are practically identical to those presented in figure \ref{fig:results}.
\begin{figure}[]
\centering
\begin{subfigure}[b]{0.475\textwidth}
\includegraphics[width=\textwidth]{sl_withoutZ_4iter_randomforest}
\caption{Results without unobservables.}
\label{fig:results_without_Z_rf}
\end{subfigure}
\quad %add desired spacing between images, e. g. ~, \quad, \qquad, \hfill etc.
%(or a blank line to force the subfigure onto a new line)
\begin{subfigure}[b]{0.475\textwidth}
\includegraphics[width=\textwidth]{sl_withZ_6iter_betaZ_1_0_randomforest}
\caption{Results with unobservables, $\beta_Z=1$.}
\label{fig:results_with_Z_rf}
\end{subfigure}
\caption{Failure rate vs. acceptance rate with varying levels of leniency. A random forest classifier was trained on labeled training data.}
\label{fig:random_forest}
\end{figure}
\subsection{Sanity check for predictions}
Predictions were checked by drawing a graph of predicted $Y$ versus $X$; results are presented in figure \ref{fig:sanity_check}. The figure indicates that the predicted class labels and their probabilities are consistent with the ground truth.
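The visual check can also be scripted. The sketch below is a hedged proxy for the plotted check, assuming (as elsewhere in this document) that $X$ has a positive effect on $Y$, so the predicted $P(Y=1|X=x)$ should be monotone in $x$; the function name is hypothetical.

```python
def probabilities_monotone(xs, probs, tol=1e-9):
    """Scripted sanity check: if X has a positive effect on Y, the
    predicted P(Y=1|X=x) should (weakly) increase with x.  Returns True
    when the probabilities are monotone non-decreasing in x."""
    pairs = sorted(zip(xs, probs))  # order the predictions by x
    return all(p1 <= p2 + tol for (_, p1), (_, p2) in zip(pairs, pairs[1:]))
```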
\begin{figure}[]
\centering
\includegraphics[width=0.5\textwidth]{sanity_check}
\caption{Predicted class label and probability of $Y=1$ versus X. Prediction was done with a logistic regression model. Colors of the points denote ground truth (yellow = 1, purple = 0). Data set was created with the unobservables.}
\label{fig:sanity_check}
\end{figure}
\subsection{Fully random model $\M$}
Given the framework defined in section \ref{sec:framework}, the results presented next are for a model $\M$ that outputs probability 0.5 for every instance $x$. The labeling process is still as presented in algorithm \ref{alg:data_with_Z}.
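Such a model is trivial to write down; the sketch below shows it with an sklearn-style `predict_proba` interface, which is an assumption made only so that the random model can be swapped in for the trained classifiers used elsewhere.

```python
class RandomModel:
    """Fully random model M: outputs P(Y=0|X=x) = 0.5 for every instance,
    regardless of the input."""

    def predict_proba(self, xs):
        # One (P(Y=0), P(Y=1)) pair per instance, always (0.5, 0.5).
        return [(0.5, 0.5) for _ in xs]
```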
\begin{figure}[]
\centering
\begin{subfigure}[b]{0.475\textwidth}
\includegraphics[width=\textwidth]{sl_without_Z_15iter_random_model}
\caption{Failure rate vs. acceptance rate. Data without unobservables. Machine predictions with random model.}
\label{fig:random_predictions_without_Z}
\end{subfigure}
\quad %add desired spacing between images, e. g. ~, \quad, \qquad, \hfill etc.
%(or a blank line to force the subfigure onto a new line)
\begin{subfigure}[b]{0.475\textwidth}
\includegraphics[width=\textwidth]{sl_with_Z_15iter_fully_random_model}
\caption{Failure rate vs. acceptance rate. Data with unobservables. Machine predictions with random model.}
\label{fig:random_predictions_with_Z}
\end{subfigure}
\caption{Failure rate vs. acceptance rate with varying levels of leniency. Machine predictions were made with a completely random model, i.e., the prediction is $P(Y=0|X=x)=0.5$ for all $x$.}
\label{fig:random_predictions}
\end{figure}
\subsection{Modular framework -- Monte Carlo evaluator} \label{sec:modules_mc}
For these results, data was generated either with the module in algorithm \ref{alg:dg:coinflip_with_z} (drawing $Y$ from a Bernoulli distribution with parameter $\pr(Y=0|X, Z, W)$, as previously) or with the module in algorithm \ref{alg:dg:threshold_with_Z} (assigning $Y$ based on the value of $\invlogit(\beta_XX+\beta_ZZ)$). Decisions were determined using one of two modules: the module in algorithm \ref{alg:decider:quantile} (decisions based on quantiles) or the module in algorithm \ref{alg:decider:lakkaraju} ("human" decision-maker as in \cite{lakkaraju17}). Curves were computed with True evaluation (algorithm \ref{alg:eval:true_eval}), Labeled outcomes (\ref{alg:eval:labeled_outcomes}), Human evaluation (\ref{alg:eval:human_eval}), Contraction (\ref{alg:eval:contraction}) and Monte Carlo (\ref{alg:eval:mc}) evaluators. Results are presented in figure \ref{fig:modules_mc}; the corresponding MAEs are presented in table \ref{tab:modules_mc}.
From the result table we can see that the MAE is lowest when the data generating process corresponds closely to the Monte Carlo algorithm.
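The two outcome-generation modules differ only in how $Y$ is produced from the same logistic score, which can be sketched as follows. The function names are hypothetical, and the threshold cutoff `tau` in the second module is an assumption for illustration; the authoritative definitions are algorithms \ref{alg:dg:coinflip_with_z} and \ref{alg:dg:threshold_with_Z}.

```python
import math
import random

def inv_logit(a):
    return 1.0 / (1.0 + math.exp(-a))

def outcome_bernoulli(x, z, beta_X=1.0, beta_Z=1.0, rng=random):
    """Bernoulli module: draw Y from a Bernoulli distribution whose
    P(Y=0) is the logistic score of the subject."""
    return 0 if rng.random() < inv_logit(beta_X * x + beta_Z * z) else 1

def outcome_threshold(x, z, beta_X=1.0, beta_Z=1.0, tau=0.5):
    """Threshold module: assign Y deterministically from the value of
    inv_logit(beta_X*X + beta_Z*Z).  The cutoff tau is an assumption."""
    return 0 if inv_logit(beta_X * x + beta_Z * z) >= tau else 1
```

The deterministic module removes the outcome noise of the Bernoulli draw, which is one reason the evaluators behave differently across the four columns of table \ref{tab:modules_mc}.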
\begin{table}[]
\centering
\caption{Mean absolute error w.r.t.\ true evaluation. See modules used in section \ref{sec:modules_mc}. Bern = Bernoulli, indep. = independent, TH = threshold.}
\begin{tabular}{l | c c c c}
Method & Bern + indep. & Bern + non-indep. & TH + indep. & TH + non-indep.\\ \hline
Labeled outcomes & 0.111075 & 0.103235 & 0.108506 & 0.0970325\\
Human evaluation & 0.027298 & NaN (TBA) & 0.049582 & 0.0033916\\
Contraction & 0.004206 & 0.004656 & 0.005557 & 0.0034591\\
Monte Carlo & 0.001292 & 0.016629 & 0.009429 & 0.0179825\\
\end{tabular}
\label{tab:modules_mc}
\end{table}
\begin{figure}[]
\centering
\begin{subfigure}[b]{0.475\textwidth}
\includegraphics[width=\textwidth]{sl_with_Z_10iter_coinflip_quantile_defaults_mc}
\caption{Outcome Y from Bernoulli, independent decisions using the quantiles.}
%\label{fig:modules_mc_without_Z}
\end{subfigure}
\quad %add desired spacing between images, e. g. ~, \quad, \qquad, \hfill etc.
%(or a blank line to force the subfigure onto a new line)
\begin{subfigure}[b]{0.475\textwidth}
\includegraphics[width=\textwidth]{sl_with_Z_20iter_threshold_quantile_defaults_mc}
\caption{Outcome Y from threshold rule, independent decisions using the quantiles.}
%\label{fig:modules_mc_with_Z}
\end{subfigure}
\begin{subfigure}[b]{0.475\textwidth}
\includegraphics[width=\textwidth]{sl_with_Z_10iter_coinflip_lakkarajudecider_defaults_mc}
\caption{Outcome Y from Bernoulli, non-independent decisions.}
%\label{fig:modules_mc_without_Z}
\end{subfigure}
\quad %add desired spacing between images, e. g. ~, \quad, \qquad, \hfill etc.
%(or a blank line to force the subfigure onto a new line)
\begin{subfigure}[b]{0.475\textwidth}
\includegraphics[width=\textwidth]{sl_with_Z_10iter_threshold_lakkarajudecider_defaults_mc}
\caption{Outcome Y from threshold rule, non-independent decisions.}
%\label{fig:modules_mc_with_Z}
\end{subfigure}
\caption{Failure rate vs. acceptance rate with varying levels of leniency. Different combinations of deciders and data generation modules. See other modules used in section \ref{sec:modules_mc}.}
\label{fig:modules_mc}
\end{figure}
\section{Diagnostic figures} \label{sec:diagnostic}
Here we present supplementary figures for all the settings in the main result section.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{sl_diagnostic_bernoulli_independent_without_Z}
\caption{Results from estimating the failure rate at different levels of leniency using the different methods. Bernoulli outcomes, independent decisions, data without unobservables.}
%\label{fig:}
\end{figure}
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{sl_diagnostic_bernoulli_independent_with_Z}
\caption{Results from estimating the failure rate at different levels of leniency using the different methods. Bernoulli outcomes, independent decisions, data with unobservables.}
%\label{fig:}
\end{figure}
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{sl_diagnostic_threshold_independent_with_Z}
\caption{Results from estimating the failure rate at different levels of leniency using the different methods. Threshold outcomes, independent decisions, data with unobservables.}
%\label{fig:}
\end{figure}
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{sl_diagnostic_bernoulli_batch_with_Z}
\caption{Results from estimating the failure rate at different levels of leniency using the different methods. Bernoulli outcomes, batch decider, data with unobservables.}
%\label{fig:}
\end{figure}
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{sl_diagnostic_threshold_batch_with_Z}
\caption{Results from estimating the failure rate at different levels of leniency using the different methods. Threshold outcomes, batch decider, data with unobservables.}
%\label{fig:}
\end{figure}
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{sl_diagnostic_random_decider_with_Z}
\caption{Results from estimating the failure rate at different levels of leniency using the different methods. Random decider, data with unobservables.}
%\label{fig:}
\end{figure}
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{sl_diagnostic_biased_decider_with_Z}
\caption{Results from estimating the failure rate at different levels of leniency using the different methods. Biased decider, data with unobservables.}
%\label{fig:}
\end{figure}
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{sl_diagnostic_bad_decider_with_Z}
\caption{Results from estimating the failure rate at different levels of leniency using the different methods. Bad decider, data with unobservables.}
%\label{fig:}
\end{figure}
%\end{appendices}