diff --git a/paper/sl.tex b/paper/sl.tex
index 51511afd85a215220a9f508ed13dcf0955893cb6..d77c6c139857cb94325ae69e48c39679a1e8a751 100755
--- a/paper/sl.tex
+++ b/paper/sl.tex
@@ -38,6 +38,7 @@
 \newcommand{\ourtitle}{A Causal Approach for Selective Labels}
 
 \input{macros}
+\usepackage{chato-notes}
 
 
 \title{\ourtitle}
@@ -173,6 +174,14 @@ Alternatively, we can have an empirical measure \empiricalPerformance of perform
 \label{eqn:gp}	
 \end{equation}
 
+\note[MM]{
+	Should we use the following definition of the empirical performance instead?
+	\begin{equation}
+	\empiricalPerformance = \frac{1}{\datasize} \sum_{(\featuresValue, \outcomeValue)\in\dataset} \score{\featuresValue} \indicator{F(\featuresValue) < r}
+	\label{eqn:gp-alt}
+	\end{equation}
+}
+
 \subsection{Comments}
 Roughly speaking, the above formulas should work well if `bail' cases (\decision = 1) cover well the area spanned by the observed features of defendants -- i.e., we do not have large areas of \features with no or too few bail cases.