diff --git a/analysis_and_scripts/notes.tex b/analysis_and_scripts/notes.tex
index 6102a245153bb95af64ad66c5873effb9fc19b92..375ab254f0756cca6b5a421afa7bc1f3eca2fedd 100644
--- a/analysis_and_scripts/notes.tex
+++ b/analysis_and_scripts/notes.tex
@@ -109,6 +109,30 @@ Mnemonic rule for the binary coding: zero bad (crime or jail), one good!
 
 \section{RL's notes about the selective labels paper (optional reading)} \label{sec:comments}
 
+\setcounter{figure}{-1}
+\begin{wrapfigure}{r}{0.3\textwidth} %this figure will be at the right
+    \centering
+    \begin{tikzpicture}[->,>=stealth',node distance=1.5cm, semithick]
+
+  \tikzstyle{every state}=[fill=none,draw=black,text=black]
+
+  \node[state] (R)                    {$R$};
+  \node[state] (X) [right of=R] {$X$};
+  \node[state] (T) [below of=X] {$T$};
+  \node[state] (Z) [rectangle, right of=X] {$Z$};
+  \node[state] (Y) [below of=Z] {$Y$};
+
+  \path (R) edge (T)
+        (X) edge (T)
+	     edge (Y)
+        (Z) edge (T)
+	     edge (Y)
+        (T) edge (Y);
+\end{tikzpicture}
+\caption{Initial model.}
+\label{fig:initial_model}
+\end{wrapfigure}
+
 \emph{This chapter presents my comments and insights on the topic.}
 
 The motivating idea behind the SL paper of Lakkaraju et al. \cite{lakkaraju17} is to evaluate whether machines could improve on human performance. In the general case, comparing the performance of human and machine evaluations is simple. In the domains addressed by Lakkaraju et al., such simple comparisons would be unethical, and therefore algorithms are required. (Other approaches, such as a data augmentation algorithm, have been proposed by De-Arteaga \cite{dearteaga18}.)
@@ -250,14 +274,17 @@ The above framework is now separated into three different modules: data generati
 
 The next module, the decider, assigns a decision to each observation in a given/defined way. The 'decision' can be the most likely value of $y$ (the argmax of the likelihood, usually binary 0 or 1), the probability of an outcome, or an ordering of the defendants.
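The three forms of 'decision' above can be sketched as one toy decider. This is my own illustration, not code from the paper; the function name, the probability inputs and the \texttt{leniency} parameter are all assumptions made for the example.

```python
def decide(prob_y1, mode="argmax", leniency=0.5):
    """Toy decider over predicted P(Y = 1) scores, one per observation.

    Returns decisions in one of the three forms mentioned in the text:
    the most likely binary label, the raw probability, or a release
    vector derived from an ordering of the subjects.
    """
    if mode == "argmax":        # most likely value of Y (binary 0/1)
        return [int(p >= 0.5) for p in prob_y1]
    if mode == "probability":   # probability of a positive outcome
        return list(prob_y1)
    if mode == "ordering":      # rank subjects, release the top leniency-fraction
        n_release = int(leniency * len(prob_y1))
        top = sorted(range(len(prob_y1)), key=lambda i: -prob_y1[i])[:n_release]
        return [int(i in top) for i in range(len(prob_y1))]
    raise ValueError(mode)
```

Under the ordering tactic, a decider with leniency $r$ releases the $r$-fraction of subjects it considers the best prospects.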
 
-\textcolor{red}{RL: To do: Clarify the following.}
-
-The evaluator module takes as an input a data sample, some information about the data generation and some information about the decider. The data sample includes features $X, T$ and $Y$ where $Y \in \{0, 1, NA\}$ as specified before. The "something we know about $\M$" might be knowledge on the distribution of some of the variables or their interdependencies. In our example, we know that the $X$ is a standard Gaussian and independent from the other variables. From the decider it is known that its decisions are affected by leniency and private properties X. Next we try to simulate the decision-maker's process within the data sample. But to do this we need to learn the predictive model $\B$ with the restriction that Z can't be observed. 
+The evaluator module takes as input a data sample, some information about the data generation and some information about the decider. The data sample includes the features $X,~T$ and $Y$, where $Y \in \{0, 1, NA\}$ as specified before. The "some information about $\M$" might be knowledge of the distributions of some of the variables or of their interdependencies. In our example, we know that $X$ is standard Gaussian and independent of the other variables. Of the decider it is known that its decisions are affected by leniency and the private features $X$. Next we try to simulate the decision-maker's process within the data sample, but to do this we need to learn the predictive model $\B$ under the restriction that $Z$ cannot be observed. With this setting, we now need to define an algorithm which outputs $\mathbb{E}[FR~|~input]$.
 
 \begin{quote}
 \emph{MM:} For example, consider an evaluation process that knows (i.e., is given as input) the decision process and what decisions it took for a few data points. The same evaluation process knows only some of the attributes of those data points -- and therefore it has only partial information about the data generation process. To make the example more specific, consider the case of decision process $\s$ mentioned above, which does not know W -- and consider an evaluation process that knows exactly how $\s$ works and what decisions it took for a few data points, but does not know either W or Z of those data points. This evaluation process outputs the expected value of FR according to the information that's given to it.
 \end{quote}
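To make the output $\mathbb{E}[FR~|~input]$ concrete, here is a deliberately naive evaluator sketch of my own (not from the paper): it only counts failures it can actually observe, treating the $NA$ outcomes of non-released subjects as non-failures, which is exactly the bias the selective-labels setting has to work around.

```python
def empirical_failure_rate(decisions, outcomes):
    """Naive evaluator sketch: given decisions T (0/1) and outcomes Y
    (None encodes NA, i.e. the subject was jailed and Y is unobserved),
    estimate FR as the fraction of all subjects that were released
    (T = 1) and then failed (Y = 0).  Unobserved outcomes cannot
    contribute failures, so this estimate is optimistic."""
    failures = sum(1 for t, y in zip(decisions, outcomes) if t == 1 and y == 0)
    return failures / len(decisions)
```

A proper evaluator would instead use its partial knowledge of $\M$ and of the decider to correct for the missing labels rather than ignore them.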
 
+\textcolor{red}{For what does the evaluator provide the estimate of failure rate?}
+
+\textbf{Example} For illustration, let's consider the contraction algorithm in this framework. Data is generated from three Gaussians and the outcome $Y$ is decided as in algorithm \ref{alg:data_with_Z}. The deciders are now humans who assign the decisions using the ordering tactic presented in algorithm \ref{alg:data_with_Z}. The evaluator module then takes all of the data as input, knows that $X$ affects the results and that the variables are Gaussian. Now the evaluator module ....
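As a rough sketch of how contraction \cite{lakkaraju17} would slot into the evaluator module, the following is my paraphrase from memory of the core step for a single most-lenient judge; the function and argument names are my own, and the paper should be consulted for the exact formulation.

```python
def contraction(released_outcomes, predicted_risk, n_judged, target_rate):
    """Sketch of contraction for one most-lenient judge.

    released_outcomes : outcomes Y (0 = failure, 1 = success) of the
                        subjects this judge released
    predicted_risk    : the learned model B's risk score for those
                        same released subjects
    n_judged          : total number of cases the judge saw (incl. jailed)
    target_rate       : acceptance rate r at which to estimate FR

    B keeps the `r * n_judged` least risky of the judge's released
    subjects ("contracting" the released set); their observed failures,
    over all judged cases, estimate FR at acceptance rate r.
    """
    keep = int(target_rate * n_judged)
    ranked = sorted(zip(predicted_risk, released_outcomes))  # least risky first
    kept = [y for _, y in ranked[:keep]]
    return sum(1 for y in kept if y == 0) / n_judged
```

The trick is that every subject B would release was also released by the lenient judge, so all the outcomes needed for the estimate are observed.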
+
+
 \section{Data generation}
 
 Both of the data generating algorithms are presented in this chapter.