From 92707b9ad34fe77b17f876ae549ffd7753508dd2 Mon Sep 17 00:00:00 2001 From: Riku-Laine <28960190+Riku-Laine@users.noreply.github.com> Date: Tue, 7 May 2019 10:07:11 +0300 Subject: [PATCH] Renamed files --- .../Analysis_07MAY2019_new.ipynb | 645 ++++++++++++++++++ ...nb => Bachelors_thesis_analyses_OLD.ipynb} | 0 analysis_and_scripts/Untitled.ipynb | 587 ---------------- 3 files changed, 645 insertions(+), 587 deletions(-) create mode 100644 analysis_and_scripts/Analysis_07MAY2019_new.ipynb rename analysis_and_scripts/{Bachelors_thesis_analyses.ipynb => Bachelors_thesis_analyses_OLD.ipynb} (100%) delete mode 100644 analysis_and_scripts/Untitled.ipynb diff --git a/analysis_and_scripts/Analysis_07MAY2019_new.ipynb b/analysis_and_scripts/Analysis_07MAY2019_new.ipynb new file mode 100644 index 0000000..aabbd42 --- /dev/null +++ b/analysis_and_scripts/Analysis_07MAY2019_new.ipynb @@ -0,0 +1,645 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "toc": true + }, + "source": [ + "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n", + "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Causal-model\" data-toc-modified-id=\"Causal-model-1\"><span class=\"toc-item-num\">1 </span>Causal model</a></span><ul class=\"toc-item\"><li><span><a href=\"#Notes\" data-toc-modified-id=\"Notes-1.1\"><span class=\"toc-item-num\">1.1 </span>Notes</a></span></li></ul></li><li><span><a href=\"#Synthetic-data\" data-toc-modified-id=\"Synthetic-data-2\"><span class=\"toc-item-num\">2 </span>Synthetic data</a></span></li><li><span><a href=\"#Algorithms\" data-toc-modified-id=\"Algorithms-3\"><span class=\"toc-item-num\">3 </span>Algorithms</a></span><ul class=\"toc-item\"><li><span><a href=\"#Contraction-algorithm\" data-toc-modified-id=\"Contraction-algorithm-3.1\"><span class=\"toc-item-num\">3.1 </span>Contraction algorithm</a></span></li><li><span><a href=\"#Causal-algorithm\" data-toc-modified-id=\"Causal-algorithm-3.2\"><span 
class=\"toc-item-num\">3.2 </span>Causal algorithm</a></span></li></ul></li><li><span><a href=\"#Performance-comparison\" data-toc-modified-id=\"Performance-comparison-4\"><span class=\"toc-item-num\">4 </span>Performance comparison</a></span><ul class=\"toc-item\"><li><span><a href=\"#Predictive-models\" data-toc-modified-id=\"Predictive-models-4.1\"><span class=\"toc-item-num\">4.1 </span>Predictive models</a></span></li><li><span><a href=\"#Visual-comparison\" data-toc-modified-id=\"Visual-comparison-4.2\"><span class=\"toc-item-num\">4.2 </span>Visual comparison</a></span></li></ul></li></ul></div>" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Causal model\n", + "\n", + "Our model is defined by the probabilistic expression\n", + "\n", + "\\begin{equation}\\label{model_disc}\n", + "P(Y=0 | \\text{do}(R=r)) = \\sum_x \\underbrace{P(Y=0|X=x, T=1)}_\\text{1} \n", + "\\overbrace{P(T=1|R=r, X=x)}^\\text{2} \n", + "\\underbrace{P(X=x)}_\\text{3}\n", + "\\end{equation}\n", + "\n", + "which, for continuous $x$, becomes\n", + "\n", + "\\begin{equation}\\label{model_cont}\n", + "P(Y=0 | \\text{do}(R=r)) = \\int P(Y=0|X=x, T=1)P(T=1|R=r, X=x)p(x)\\,dx\n", + "\\end{equation}\n", + "\n", + "where $p(x)$ is the density of $X$. 
Model as a graph ($Z$ is a latent variable; it can be excluded from the expression with do-calculus by showing that $X$ is admissible for adjustment):\n", + "\n", + "<!---  --->\n", + "\n", + "For predicting the probability of a negative outcome for an individual, the following should hold, because by Pearl $P(Y=0 | \\text{do}(R=r), X=x) = P(Y=0 | R=r, X=x)$ when $X$ is an admissible set:\n", + "\n", + "\\begin{equation} \\label{model_pred}\n", + "P(Y=0 | \\text{do}(R=r), X=x) = P(Y=0|X=x, T=1)P(T=1|R=r, X=x).\n", + "\\end{equation}\n", + "\n", + "Note, however, that this prediction includes the probability that the individual receives a positive decision ($T=1$), i.e. the second term in \\ref{model_pred}.\n", + "\n", + "----\n", + "\n", + "### Notes\n", + "\n", + "* Equations \\ref{model_disc} and \\ref{model_cont} describe the whole causal effect in the population (the causal effect of changing $r$ over all strata of $X$).\n", + "* Prediction should be possible with \\ref{model_pred}, since both terms can be learned from the data. NB: the probability $P(Y=0 | \\text{do}(R=r), X=x)$ is lowest when the individual $x$ is either the most dangerous (the second term is small, as they are rarely released) or the least dangerous (the first term is small). It remains open how to infer the counterfactual \"what is the probability of $Y=0$ if we were to release this individual?\"\n", + "* Is the effect of $R$ learned/estimated correctly if $R$ is simply plugged into a predictive model (e.g. logistic regression)?\n", + "* $P(Y=0 | \\text{do}(R=0)) = 0$ only in this application, since with $r=0$ nobody is released. My predictive models say that when $r=0$ the probability $P(Y=0) \\approx 0.027$, which would be a natural estimate in another application/scenario (e.g. in medicine, the probability of an adverse event when a stronger medicine is distributed to everyone. 
Then the probability will be close to zero but not exactly zero.)" + ] + }, + { + "cell_type": "code", + "execution_count": 89, + "metadata": {}, + "outputs": [], + "source": [ + "# Imports\n", + "\n", + "import numpy as np\n", + "import pandas as pd\n", + "from datetime import datetime\n", + "import matplotlib.pyplot as plt\n", + "import scipy.stats as scs\n", + "import scipy.integrate as si\n", + "import seaborn as sns\n", + "import numpy.random as npr\n", + "from sklearn.preprocessing import OneHotEncoder\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "\n", + "# Settings\n", + "\n", + "%matplotlib inline\n", + "\n", + "plt.rcParams.update({'font.size': 16})\n", + "plt.rcParams.update({'figure.figsize': (14, 7)})\n", + "\n", + "# Suppress deprecation warnings.\n", + "\n", + "import warnings\n", + "\n", + "# A global filter stays active for all subsequent cells, whereas a\n", + "# warnings.catch_warnings() block would only silence warnings raised\n", + "# inside that block. Warnings already emitted by the imports above\n", + "# are not affected.\n", + "warnings.filterwarnings(\"ignore\",\n", + " category=DeprecationWarning)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Synthetic data\n", + "\n", + "In the cell below, we generate the synthetic data as described by Lakkaraju et al. 
The default parameter values and the definitions of $Y$ and $T$ follow their description.\n", + "\n", + "**Parameters**\n", + "\n", + "* M = `nJudges_M`, number of judges\n", + "* N = `nSubjects_N`, number of subjects assigned to each judge\n", + "* betas $\\beta_i$ = `beta_i`, where $i \\in \\{X, Z, W\\}$, are the coefficients of the respective variables\n", + "\n", + "**Columns of the data:**\n", + "\n", + "* `judgeID_J` = judge IDs as running numbering from 0 to `nJudges_M - 1`\n", + "* R = `acceptanceRate_R`, acceptance rates\n", + "* X = `X`, individual's features observable to all (models and judges)\n", + "* Z = `Z`, information observable to judges only\n", + "* W = `W`, unobservable / inaccessible information\n", + "* T = `decision_T`, bail-or-jail decisions, where $T=0$ represents a jail decision and $T=1$ a bail decision.\n", + "* Y = `result_Y`, result variable: $Y=0$ if the person will or would recidivate and $Y=1$ if the person will or would not commit a crime." + ] + }, + { + "cell_type": "code", + "execution_count": 90, + "metadata": {}, + "outputs": [], + "source": [ + "# Uncomment the next line to set the seed for reproducibility\n", + "#npr.seed(0)\n", + "\n", + "def generateData(nJudges_M=100,\n", + " nSubjects_N=500,\n", + " beta_X=1.0,\n", + " beta_Z=1.0,\n", + " beta_W=0.2):\n", + "\n", + " # Assign judge IDs as running numbering from 0 to nJudges_M - 1\n", + " judgeID_J = np.repeat(np.arange(0, nJudges_M, dtype=np.int32), nSubjects_N)\n", + "\n", + " # Sample acceptance rates uniformly from the half-open interval\n", + " # [0.1, 0.9) and round them to ten decimal places.\n", + " acceptance_rates = np.round(npr.uniform(.1, .9, nJudges_M), 10)\n", + "\n", + " # Replicate the rates so they can be attached to the corresponding judge ID.\n", + " acceptanceRate_R = np.repeat(acceptance_rates, nSubjects_N)\n", + "\n", + " # Sample the variables from standard Gaussian distributions.\n", + " X = npr.normal(size=nJudges_M * 
nSubjects_N)\n", + " Z = npr.normal(size=nJudges_M * nSubjects_N)\n", + " W = 
npr.normal(size=nJudges_M * nSubjects_N)\n", + "\n", + " probabilities_Y = 1 / (1 + np.exp(-(beta_X * X + beta_Z * Z + beta_W * W)))\n", + "\n", + " # Y = 0 if P(Y = 0 | X = x, Z = z, W = w) > 0.5, 1 otherwise\n", + " result_Y = 1 - probabilities_Y.round()\n", + " \n", + " # Decision scores for T: logistic score of X and Z plus noise ~ N(0, 0.1) (variance 0.1)\n", + " probabilities_T = 1 / (1 + np.exp(-(beta_X * X + beta_Z * Z)))\n", + " probabilities_T += npr.normal(0, np.sqrt(0.1), nJudges_M * nSubjects_N)\n", + "\n", + " # Initialize decision values as 1\n", + " decision_T = np.ones(nJudges_M * nSubjects_N)\n", + "\n", + " # Initialize the dataframe\n", + " df_init = pd.DataFrame(np.column_stack(\n", + " (judgeID_J, acceptanceRate_R, X, Z, W, result_Y, probabilities_T,\n", + " decision_T)),\n", + " columns=[\n", + " \"judgeID_J\", \"acceptanceRate_R\", \"X\", \"Z\", \"W\",\n", + " \"result_Y\", \"probabilities_T\", \"decision_T\"\n", + " ])\n", + "\n", + " # Sort by judge, then by decision score, both in descending order\n", + " data = df_init.sort_values(by=[\"judgeID_J\", \"probabilities_T\"],\n", + " ascending=False)\n", + "\n", + " # Within each judge, subjects with the highest decision scores now come\n", + " # first. The top (1-r)*100%, i.e. those whose within-judge index is below\n", + " # (1 - r) times the number of subjects per judge, are jailed (T = 0);\n", + " # the rest are released (T = 1).\n", + " data.reset_index(drop=True, inplace=True)\n", + "\n", + " data['decision_T'] = np.where(\n", + " (data.index.values % nSubjects_N) <\n", + " ((1 - data['acceptanceRate_R']) * nSubjects_N), 0, 1)\n", + "\n", + " return data\n", + "\n", + "\n", + "df = generateData()" + ] + }, + { + "cell_type": "code", + "execution_count": 91, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(25000, 8)\n", + "(25000, 8)\n", + "(25000, 8)\n", + "(25000, 8)\n" + ] + }, + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " 
.dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th>decision_T</th>\n", + " <th>1</th>\n", + " </tr>\n", + " <tr>\n", + " <th>result_Y</th>\n", + " <th></th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0.0</th>\n", + " <td>3911</td>\n", + " </tr>\n", + " <tr>\n", + " <th>1.0</th>\n", + " <td>8759</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + "decision_T 1\n", + "result_Y \n", + "0.0 3911\n", + "1.0 8759" + ] + }, + "execution_count": 91, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Split the data set into training and test halves\n", + "from sklearn.model_selection import train_test_split\n", + "train, test = train_test_split(df, test_size=0.5, random_state=0)\n", + "\n", + "print(train.shape)\n", + "print(test.shape)\n", + "\n", + "train_labeled = train.copy()\n", + "test_labeled = test.copy()\n", + "\n", + "# Hide the outcome (set it to NaN) when the decision is negative (T = 0).\n", + "train_labeled.result_Y = np.where(train.decision_T == 
0, np.nan, train.result_Y)\n", + "test_labeled.result_Y = np.where(test.decision_T == 0, np.nan, test.result_Y)\n", + "\n", + "print(train_labeled.shape)\n", + "print(test_labeled.shape)\n", + "\n", + "tab = train_labeled.groupby(['result_Y', 'decision_T']).size()\n", + "tab.unstack()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Algorithms\n", + "\n", + "### Contraction algorithm\n", + "\n", + "Below is an implementation of the contraction algorithm presented by Lakkaraju et al. in [their paper](https://helka.finna.fi/PrimoRecord/pci.acm3098066). The parameters to be passed to the function are described in its docstring." + ] + }, + { + "cell_type": "code", + "execution_count": 92, + "metadata": {}, + "outputs": [], + "source": [ + "def contraction(df,\n", + " judgeIDJ_col,\n", + " decisionT_col,\n", + " resultY_col,\n", + " modelProbS_col,\n", + " accRateR_col,\n", + " r,\n", + " binning=False):\n", + " '''\n", + " This is an implementation of the algorithm presented by Lakkaraju\n", + " et al. 
in their paper \"The Selective Labels Problem: Evaluating \n", + " Algorithmic Predictions in the Presence of Unobservables\" (2017).\n", + " \n", + " Parameters:\n", + " df = The (Pandas) data frame containing the data, judge decisions,\n", + " judge IDs, results and probability scores.\n", + " judgeIDJ_col = String, the name of the column containing the judges' IDs\n", + " in df.\n", + " decisionT_col = String, the name of the column containing the judges' decisions\n", + " resultY_col = String, the name of the column containing the realized outcomes\n", + " modelProbS_col = String, the name of the column containing the probability\n", + " scores from the black-box model B.\n", + " accRateR_col = String, the name of the column containing the judges' \n", + " acceptance rates\n", + " r = Float between 0 and 1, the given acceptance rate.\n", + " binning = Boolean, pool all judges whose acceptance rates round to the same tenth\n", + " \n", + " Returns:\n", + " u = The estimated failure rate at acceptance rate r.\n", + " '''\n", + " # Sort first by acceptance rate and judge ID.\n", + " sorted_df = df.sort_values(by=[accRateR_col, judgeIDJ_col],\n", + " ascending=False)\n", + "\n", + " if binning:\n", + " # Get maximum leniency\n", + " max_leniency = sorted_df[accRateR_col].values[0].round(1)\n", + "\n", + " # Get list of judges that are the most lenient\n", + " most_lenient_list = sorted_df.loc[sorted_df[accRateR_col].round(1) ==\n", + " max_leniency, judgeIDJ_col]\n", + "\n", + " # Subset to obtain D_q\n", + " D_q = sorted_df[sorted_df[judgeIDJ_col].isin(\n", + " most_lenient_list.unique())].copy()\n", + " else:\n", + " # Get most lenient judge\n", + " most_lenient_ID = sorted_df[judgeIDJ_col].values[0]\n", + "\n", + " # Subset to obtain D_q\n", + " D_q = sorted_df[sorted_df[judgeIDJ_col] == most_lenient_ID].copy()\n", + "\n", + " # All observations of R_q have observed outcome labels\n", + " R_q = D_q[D_q[decisionT_col] == 1]\n", + "\n", + " # 
\"Observations deemed as high risk by B are at the top 
of this list\"\n", + " R_sort_q = R_q.sort_values(by=modelProbS_col, ascending=False)\n", + "\n", + " number_to_remove = int(\n", + " round((1.0 - r) * D_q.shape[0] - (D_q.shape[0] - R_q.shape[0])))\n", + "\n", + " # \"R_B is the list of observations assigned to t = 1 by B\"\n", + " R_B = R_sort_q[number_to_remove:R_sort_q.shape[0]]\n", + "\n", + " return np.sum(R_B[resultY_col] == 0) / D_q.shape[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Causal algorithm\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 93, + "metadata": {}, + "outputs": [], + "source": [ + "def f(x, model, class_value):\n", + " '''\n", + " Parameters:\n", + " x = individual features (a 1-D array is treated as a single feature column)\n", + " model = a trained sklearn predictive model. Predicts probabilities for given x.\n", + " class_value = the result (class) to predict (usually 0 or 1).\n", + " \n", + " Returns:\n", + " The probabilities (as vector) of class value (class_value) given \n", + " individual features (x) and the trained, predictive model (model).\n", + " '''\n", + " if x.ndim == 1:\n", + " # if x is vector, transform to column matrix.\n", + " f_values = model.predict_proba(np.array(x).reshape(-1, 1))\n", + " else:\n", + " f_values = model.predict_proba(x)\n", + "\n", + " return f_values[:, model.classes_ == class_value].flatten()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Performance comparison\n", + "\n", + "Below we try to replicate the results obtained by Lakkaraju et al. and compare the performance of their model to ours.\n", + "\n", + "### Predictive models\n", + "\n", + "Lakkaraju et al. report using logistic regression. We construct the models using only *labeled observations*, i.e. observations for which outcome labels are available ($T=1$). We then predict the probability of a negative outcome for all observations in the test data and attach it to our data set."
+ ] + }, + { + "cell_type": "code", + "execution_count": 94, + "metadata": {}, + "outputs": [], + "source": [ + "# instantiate the model (using the default parameters)\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "\n", + "# fit, reshape X to be of shape (n_samples, n_features)\n", + "logreg = logreg.fit(\n", + " train_labeled.X[train_labeled.decision_T == 1].values.reshape(-1, 1),\n", + " train_labeled.result_Y[train_labeled.decision_T == 1])\n", + "\n", + "# predict P(Y = 0 | x) (column 0, since classes_ is sorted) and attach to data\n", + "label_probs_logreg = logreg.predict_proba(test.X.values.reshape(-1, 1))\n", + "\n", + "test = test.assign(B_prob_0_logreg=label_probs_logreg[:, 0])\n", + "test_labeled = test_labeled.assign(B_prob_0_logreg=label_probs_logreg[:, 0])" + ] + }, + { + "cell_type": "code", + "execution_count": 95, + "metadata": {}, + "outputs": [], + "source": [ + "# Train a model for predicting the probability of a positive decision (T = 1)\n", + "# given the leniency r and the individual features x.\n", + "\n", + "# Instantiate the model (using the default parameters)\n", + "decision_model = LogisticRegression(solver='lbfgs')\n", + "\n", + "# fit on the two features X and R; no reshape is needed\n", + "decision_model = decision_model.fit(train[['X', 'acceptanceRate_R']],\n", + " train.decision_T)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visual comparison\n", + "\n", + "Let's plot the failure rates of the different evaluation methods against the acceptance rate." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 96, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1008x576 with 1 Axes>" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "failure_rates = np.zeros((8, 5))\n", + "\n", + "for r in np.arange(1, 9):\n", + " \n", + " #### True evaluation\n", + " # Sort by failure probabilities, subjects with the smallest risk are first. 
\n", + " df_sorted = test.sort_values(by='B_prob_0_logreg', inplace=False, \n", + " ascending=True)\n", + "\n", + " to_release = int(round(df_sorted.shape[0] * r / 10))\n", + "\n", + " # Failure was coded as zero (np.mean = failures / number released).\n", + " failure_rates[r - 1, 0] = np.mean(df_sorted.result_Y[0:to_release] == 0)\n", + " \n", + " #### Labeled outcomes only\n", + " # Sort by failure probabilities, subjects with the smallest risk are first. \n", + " df_sorted = test_labeled.sort_values(by='B_prob_0_logreg', inplace=False,\n", + " ascending=True)\n", + " \n", + " # Ensure that only labeled outcomes are available\n", + " df_sorted = df_sorted[df_sorted.decision_T == 1]\n", + " \n", + " to_release = int(round(df_sorted.shape[0] * r / 10))\n", + "\n", + " failure_rates[r - 1, 1] = np.mean(df_sorted.result_Y[0:to_release] == 0)\n", + " \n", + " #### Human error rate\n", + " # Get IDs of judges whose leniency rounds to r / 10 (one entry per subject)\n", + " correct_leniency_list = test_labeled.judgeID_J[\n", + " test_labeled['acceptanceRate_R'].round(1) == r / 10].values\n", + "\n", + " # Released are the people they judged and released, T = 1\n", + " released = test_labeled[test_labeled.judgeID_J.isin(correct_leniency_list)\n", + " & (test_labeled.decision_T == 1)]\n", + "\n", + " # Get their failure rate, aka ratio of reoffenders to number of people judged in total\n", + " failure_rates[r - 1, 2] = np.sum(\n", + " released.result_Y == 0) / correct_leniency_list.shape[0]\n", + " # Is the denominator correct? (It counts subjects judged, not judges.)\n", + " \n", + " #### Contraction, logistic regression\n", + " failure_rates[r - 1, 3] = contraction(\n", + " test_labeled, 'judgeID_J', 'decision_T', 'result_Y', 'B_prob_0_logreg',\n", + " 'acceptanceRate_R', r / 10, False)\n", + "\n", + " #### P(Y=0 | T=1, X=x)*P(T=1 | R=r, X=x)*P(X=x)\n", + " failure_rates[r - 1, 4] = si.quad(lambda x: f(np.array([x]), logreg, 0) * \n", + " f(np.array([[x, r/10]]), decision_model, 1) * \n", + " 
scs.norm.pdf(x), -np.inf, np.inf)[0]\n", + "\n", + "# Error bars TBA\n", + "\n", + "plt.figure(figsize=(14, 8))\n", + "plt.plot(np.arange(0.1, 0.9, .1), failure_rates[:, 0], label='True Evaluation', c='green')\n", + "plt.plot(np.arange(0.1, 0.9, .1), failure_rates[:, 1], label='Labeled outcomes', c='lime')\n", + "plt.plot(np.arange(0.1, 0.9, .1), failure_rates[:, 2], label='Human evaluation', c='red')\n", + "plt.plot(np.arange(0.1, 0.9, .1), failure_rates[:, 3], label='Contraction, log.', c='blue')\n", + "plt.plot(np.arange(0.1, 0.9, .1), failure_rates[:, 4], label='Causal effect', c='magenta')\n", + "\n", + "plt.title('Failure rate vs. Acceptance rate')\n", + "plt.xlabel('Acceptance rate')\n", + "plt.ylabel('Failure rate')\n", + "plt.legend()\n", + "plt.grid()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 97, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0.0 (0.018718463137853268, 7.749450073818988e-11)\n", + "1.0 (0.33301477999280144, 6.337618003666896e-09)\n" + ] + } + ], + "source": [ + "# Below are estimates for P(Y=0 | do(R=0)) and P(Y=0 | do(R=1))\n", + "r = 0.0\n", + "print(r, si.quad(lambda x: f(np.array([[x, r]]), decision_model, 1) * \\\n", + " f(np.array([x]), logreg, 0) * scs.norm.pdf(x), -np.inf, np.inf))\n", + "\n", + "r = 1.0\n", + "print(r, si.quad(lambda x: f(np.array([[x, r]]), decision_model, 1) * \\\n", + " f(np.array([x]), logreg, 0) * scs.norm.pdf(x), -np.inf, np.inf))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With these fitted models the estimates are as follows (both extrapolate the decision model beyond the observed acceptance rates, which lie in $[0.1, 0.9)$):\n", + "\n", + "\\begin{align*}\n", + "P(Y=0 | \\text{do}(R=0)) &\\approx 0.019 \\\\\n", + "P(Y=0 | \\text{do}(R=1)) &\\approx 0.333\n", + "\\end{align*}" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + 
"language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": 
"python", + "pygments_lexer": "ipython3", + "version": "3.7.0" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": true, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": true, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": true + }, + "varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "types_to_exclude": [ + "module", + "function", + "builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/analysis_and_scripts/Bachelors_thesis_analyses.ipynb b/analysis_and_scripts/Bachelors_thesis_analyses_OLD.ipynb similarity index 100% rename from analysis_and_scripts/Bachelors_thesis_analyses.ipynb rename to analysis_and_scripts/Bachelors_thesis_analyses_OLD.ipynb diff --git a/analysis_and_scripts/Untitled.ipynb b/analysis_and_scripts/Untitled.ipynb deleted file mode 100644 index 9bc5b75..0000000 --- a/analysis_and_scripts/Untitled.ipynb +++ /dev/null @@ -1,587 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "toc": true - }, - "source": [ - "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n", - "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Causal-model\" data-toc-modified-id=\"Causal-model-1\"><span class=\"toc-item-num\">1 </span>Causal model</a></span><ul class=\"toc-item\"><li><span><a href=\"#Notes\" data-toc-modified-id=\"Notes-1.1\"><span class=\"toc-item-num\">1.1 </span>Notes</a></span></li></ul></li><li><span><a href=\"#Algorithms\" 
data-toc-modified-id=\"Algorithms-2\"><span class=\"toc-item-num\">2 </span>Algorithms</a></span><ul class=\"toc-item\"><li><span><a href=\"#Contraction-algorithm\" data-toc-modified-id=\"Contraction-algorithm-2.1\"><span class=\"toc-item-num\">2.1 </span>Contraction algorithm</a></span></li><li><span><a href=\"#Causal-algorithm\" data-toc-modified-id=\"Causal-algorithm-2.2\"><span class=\"toc-item-num\">2.2 </span>Causal algorithm</a></span></li></ul></li><li><span><a href=\"#Performance-comparison\" data-toc-modified-id=\"Performance-comparison-3\"><span class=\"toc-item-num\">3 </span>Performance comparison</a></span><ul class=\"toc-item\"><li><span><a href=\"#Predictive-models\" data-toc-modified-id=\"Predictive-models-3.1\"><span class=\"toc-item-num\">3.1 </span>Predictive models</a></span></li><li><span><a href=\"#Visual-comparison\" data-toc-modified-id=\"Visual-comparison-3.2\"><span class=\"toc-item-num\">3.2 </span>Visual comparison</a></span></li></ul></li></ul></div>" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Causal model\n", - "\n", - "Our model is defined by the probabilistic expression \n", - "\n", - "\\begin{equation}\\label{model_disc}\n", - "P(Y=0 | \\text{do}(R=r)) = \\sum_x \\underbrace{P(Y=0|X=x, T=1)}_\\text{1} \n", - "\\overbrace{P(T=1|R=r, X=x)}^\\text{2} \n", - "\\underbrace{P(X=x)}_\\text{3}\n", - "\\end{equation}\n", - "\n", - "which is equal to \n", - "\n", - "\\begin{equation}\\label{model_cont}\n", - "P(Y=0 | \\text{do}(R=r)) = \\int_x P(Y=0|X=x, T=1)P(T=1|R=r, X=x)P(X=x)\n", - "\\end{equation}\n", - "\n", - "for continuous $x$. 
Model as a graph (Z is a latent variable, and can be excluded from the expression with do-calculus by showing that $X$ is admissible for adjustment):\n", - "\n", - "<!---  --->\n", - "\n", - "For predicting the probability of negative outcome the following should hold because by Pearl $P(Y=0 | \\text{do}(R=r), X=x) = P(Y=0 | R=r, X=x)$ when $X$ is an admissible set:\n", - "\n", - "\\begin{equation} \\label{model_pred}\n", - "P(Y=0 | \\text{do}(R=r), X=x) = P(Y=0|X=x, T=1)P(T=1|R=r, X=x).\n", - "\\end{equation}\n", - "\n", - "Still it should be noted that this prediction takes into account the probability of the individual to be given a positive decision ($T=1$), see second term in \\ref{model_pred}.\n", - "\n", - "----\n", - "\n", - "### Notes\n", - "\n", - "* Equations \\ref{model_disc} and \\ref{model_cont} describe the whole causal effect in the population (the causal effect of changing $r$ over all strata $X$).\n", - "* Prediction should be possible with \\ref{model_pred}. Both terms can be learned from the data. NB: the probability $P(Y=0 | \\text{do}(R=r), X=x)$ is lowest when the individual $x$ is the most dangerous or the least dangerous. How could we infer/predict the counterfactual \"what is the probability of $Y=0$ if we were to let this individual go?\" has yet to be calculated.\n", - "* Is the effect of R learned/estimated correctly if it is just plugged in to a predictive model (e.g. logistic regression)?\n", - "* $P(Y=0 | do(R=0)) = 0$ only in this application. My predictive models say that when $r=0$ the probability $P(Y=0) \\approx 0.027$ which would be a natural estimate in another application/scenario (e.g. in medicine the probability of an adverse event when a stronger medicine is distributed to everyone. 
Then the probability will be close to zero but not exactly zero.)" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\ensemble\\weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.\n", - " from numpy.core.umath_tests import inner1d\n" - ] - } - ], - "source": [ - "# Imports\n", - "\n", - "import numpy as np\n", - "import pandas as pd\n", - "from datetime import datetime\n", - "import matplotlib.pyplot as plt\n", - "import scipy.stats as scs\n", - "import scipy.integrate as si\n", - "import seaborn as sns\n", - "import numpy.random as npr\n", - "from sklearn.preprocessing import OneHotEncoder\n", - "from sklearn.linear_model import LogisticRegression\n", - "from sklearn.ensemble import RandomForestClassifier\n", - "\n", - "# Settings\n", - "\n", - "%matplotlib inline\n", - "\n", - "plt.rcParams.update({'font.size': 16})\n", - "plt.rcParams.update({'figure.figsize': (14, 7)})\n", - "\n", - "# Suppress deprecation warnings.\n", - "\n", - "import warnings\n", - "\n", - "def fxn():\n", - " warnings.warn(\"deprecated\", DeprecationWarning)\n", - "\n", - "with warnings.catch_warnings():\n", - " warnings.simplefilter(\"ignore\")\n", - " fxn()" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "# Set seed for reproducibility\n", - "npr.seed(0)\n", - "\n", - "def generateData(nJudges_M=100,\n", - " nSubjects_N=500,\n", - " beta_X=1.0,\n", - " beta_Z=1.0,\n", - " beta_W=0.2):\n", - "\n", - " # Assign judge IDs as running numbering from 0 to nJudges_M - 1\n", - " judgeID_J = np.repeat(np.arange(0, nJudges_M, dtype=np.int32), nSubjects_N)\n", - "\n", - " # Sample acceptance rates uniformly from a closed interval\n", - " # from 0.1 to 0.9 and 
round to tenth decimal place.\n", - " acceptance_rates = np.round(npr.uniform(.1, .9, nJudges_M), 10)\n", - "\n", - " # Replicate the rates so they can be attached to the corresponding judge ID.\n", - " acceptanceRate_R = np.repeat(acceptance_rates, nSubjects_N)\n", - "\n", - " # Sample the variables from standard Gaussian distributions.\n", - " X = npr.normal(size=nJudges_M * nSubjects_N)\n", - " Z = npr.normal(size=nJudges_M * nSubjects_N)\n", - " W = npr.normal(size=nJudges_M * nSubjects_N)\n", - "\n", - " probabilities_Y = 1 / (1 + np.exp(-(beta_X * X + beta_Z * Z + beta_W * W)))\n", - "\n", - " # 0 if P(Y = 0| X = x; Z = z; W = w) >= 0.5 , 1 otherwise\n", - " result_Y = 1 - probabilities_Y.round()\n", - "\n", - " probabilities_T = 1 / (1 + np.exp(-(beta_X * X + beta_Z * Z)))\n", - " probabilities_T += npr.normal(0, np.sqrt(0.1), nJudges_M * nSubjects_N)\n", - "\n", - " # Initialize decision values as 1\n", - " decision_T = np.ones(nJudges_M * nSubjects_N)\n", - "\n", - " # Initialize the dataframe\n", - " df_init = pd.DataFrame(\n", - " np.column_stack((judgeID_J, acceptanceRate_R, X, Z, W, result_Y,\n", - " probabilities_T, decision_T)),\n", - " columns=[\n", - " \"judgeID_J\", \"acceptanceRate_R\", \"X\", \"Z\", \"W\", \"result_Y\",\n", - " \"probabilities_T\", \"decision_T\"\n", - " ])\n", - "\n", - " # Sort by judges then probabilities\n", - " data = df_init.sort_values(\n", - " by=[\"judgeID_J\", \"probabilities_T\"], ascending=False)\n", - "\n", - " # Iterate over the data. Subject is in the top (1-r)*100% if\n", - " # his within-judge-index is over acceptance threshold times\n", - " # the number of subjects assigned to each judge. 
If subject\n", - " # is over the limit they are assigned a zero, else one.\n", - " data.reset_index(drop=True, inplace=True)\n", - "\n", - " data['decision_T'] = np.where(\n", - " (data.index.values % nSubjects_N) <\n", - " ((1 - data['acceptanceRate_R']) * nSubjects_N), 0, 1)\n", - "\n", - " return data\n", - "\n", - "df = generateData()" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(25000, 8)\n", - "(25000, 8)\n", - "(25000, 8)\n", - "(25000, 8)\n" - ] - }, - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th>decision_T</th>\n", - " <th>1</th>\n", - " </tr>\n", - " <tr>\n", - " <th>result_Y</th>\n", - " <th></th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0.0</th>\n", - " <td>3650</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1.0</th>\n", - " <td>8216</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - "decision_T 1\n", - "result_Y \n", - "0.0 3650\n", - "1.0 8216" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Split the data set to test and train\n", - "from sklearn.model_selection import train_test_split\n", - "train, test = train_test_split(df, test_size=0.5, random_state=0)\n", - "\n", - "print(train.shape)\n", - "print(test.shape)\n", - "\n", - "train_labeled = train.copy()\n", - "test_labeled = test.copy()\n", - "\n", - "# Set results as NA if decision is negative.\n", - "train_labeled.result_Y = np.where(train.decision_T == 0, 
np.nan, train.result_Y)\n", - "test_labeled.result_Y = np.where(test.decision_T == 0, np.nan, test.result_Y)\n", - "\n", - "print(train_labeled.shape)\n", - "print(test_labeled.shape)\n", - "\n", - "tab = train_labeled.groupby(['result_Y', 'decision_T']).size()\n", - "tab.unstack()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Algorithms\n", - "\n", - "### Contraction algorithm\n", - "\n", - "Below is an implementation of the contraction algorithm by Lakkaraju et al., presented in [their paper](https://helka.finna.fi/PrimoRecord/pci.acm3098066). The function's parameters are described in its docstring." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "def contraction(df,\n", - " judgeIDJ_col,\n", - " decisionT_col,\n", - " resultY_col,\n", - " modelProbS_col,\n", - " accRateR_col,\n", - " r,\n", - " binning=False):\n", - " '''\n", - " This is an implementation of the algorithm presented by Lakkaraju\n", - " et al. 
in their paper \"The Selective Labels Problem: Evaluating \n", - " Algorithmic Predictions in the Presence of Unobservables\" (2017).\n", - " \n", - " Parameters:\n", - " df = The (Pandas) data frame containing the data, judge decisions,\n", - " judge IDs, results and probability scores.\n", - " judgeIDJ_col = String, the name of the column containing the judges' IDs\n", - " in df.\n", - " decisionT_col = String, the name of the column containing the judges' decisions\n", - " resultY_col = String, the name of the column containing the realization\n", - " modelProbS_col = String, the name of the column containing the probability\n", - " scores from the black-box model B.\n", - " accRateR_col = String, the name of the column containing the judges' \n", - " acceptance rates\n", - " r = Float between 0 and 1, the given acceptance rate.\n", - " binning = Boolean, should judges with same acceptance rate be binned\n", - " \n", - " Returns:\n", - " u = The estimated failure rate at acceptance rate r.\n", - " '''\n", - " # Sort first by acceptance rate and judge ID.\n", - " sorted_df = df.sort_values(\n", - " by=[accRateR_col, judgeIDJ_col], ascending=False)\n", - "\n", - " if binning:\n", - " # Get maximum leniency\n", - " max_leniency = sorted_df[accRateR_col].values[0].round(1)\n", - "\n", - " # Get list of judges that are the most lenient\n", - " most_lenient_list = sorted_df.loc[sorted_df[accRateR_col].round(1) ==\n", - " max_leniency, judgeIDJ_col]\n", - "\n", - " # Subset to obtain D_q\n", - " D_q = sorted_df[sorted_df[judgeIDJ_col].isin(\n", - " most_lenient_list.unique())].copy()\n", - " else:\n", - " # Get most lenient judge\n", - " most_lenient_ID = sorted_df[judgeIDJ_col].values[0]\n", - "\n", - " # Subset\n", - " D_q = sorted_df[sorted_df[judgeIDJ_col] == most_lenient_ID].copy()\n", - "\n", - " # All observations of R_q have observed outcome labels\n", - " R_q = D_q[D_q[decisionT_col] == 1]\n", - "\n", - " # \"Observations deemed as high risk by B are at the 
top of this list\"\n", - " R_sort_q = R_q.sort_values(by=modelProbS_col, ascending=False)\n", - "\n", - " number_to_remove = int(\n", - " round((1.0 - r) * D_q.shape[0] - (D_q.shape[0] - R_q.shape[0])))\n", - "\n", - " # \"R_B is the list of observations assigned to t = 1 by B\"\n", - " R_B = R_sort_q[number_to_remove:R_sort_q.shape[0]]\n", - "\n", - " return np.sum(R_B[resultY_col] == 0) / D_q.shape[0]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Causal algorithm\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "def f(x, model, class_value):\n", - " '''\n", - " Parameters:\n", - " x = individual features\n", - " model = a trained sklearn predictive model. Predicts probabilities for given x.\n", - " class_value = the result (class) to predict (usually 0 or 1).\n", - " \n", - " Returns:\n", - " The probabilities (as vector) of class value (class_value) given \n", - " individual features (x) and the trained, predictive model (model).\n", - " '''\n", - " if x.ndim == 1:\n", - " # if x is vector, transform to column matrix.\n", - " f_values = model.predict_proba(np.array(x).reshape(-1, 1))\n", - " else:\n", - " f_values = model.predict_proba(x)\n", - "\n", - " return f_values[:, model.classes_ == class_value].flatten()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Performance comparison\n", - "\n", - "Below we try to replicate the results obtained by Lakkaraju et al. and compare the performance of their model to ours.\n", - "\n", - "### Predictive models\n", - "\n", - "Lakkaraju et al. report using logistic regression. We construct the models using only *labeled observations*, i.e. observations for which outcome labels are available. We then predict the probability of a negative outcome for all observations in the test data and attach it to our data set."
- ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "# instantiate the model (using the default parameters)\n", - "logreg = LogisticRegression(solver='lbfgs')\n", - "\n", - "# fit, reshape X to be of shape (n_samples, n_features)\n", - "logreg = logreg.fit(\n", - " train_labeled.X[train_labeled.decision_T == 1].values.reshape(-1, 1),\n", - " train_labeled.result_Y[train_labeled.decision_T == 1])\n", - "\n", - "# predict probabilities and attach to data\n", - "label_probs_logreg = logreg.predict_proba(test.X.values.reshape(-1, 1))\n", - "\n", - "test = test.assign(B_prob_0_logreg=label_probs_logreg[:, 0])\n", - "test_labeled = test_labeled.assign(B_prob_0_logreg=label_probs_logreg[:, 0])" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "# Train a model that predicts the probability of a positive decision given\n", - "# leniency r and individual features x.\n", - "\n", - "# Instantiate the model (using the default parameters)\n", - "decision_model = LogisticRegression(solver='lbfgs')\n", - "\n", - "# fit on features X and acceptance rate R (already 2-D, no reshape needed)\n", - "decision_model = decision_model.fit(train[['X', 'acceptanceRate_R']], train.decision_T)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visual comparison\n", - "\n", - "Let's plot the failure rate of each evaluation method against the acceptance rate." - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "<Figure size 1008x576 with 1 Axes>" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "failure_rates = np.zeros((8, 5))\n", - "\n", - "for r in np.arange(1, 9):\n", - " \n", - " #### True evaluation\n", - " # Sort by failure probabilities, subjects with the smallest risk are first. 
\n", - " df_sorted = test.sort_values(\n", - " by='B_prob_0_logreg', inplace=False, ascending=True)\n", - "\n", - " to_release = int(round(df_sorted.shape[0] * r / 10))\n", - "\n", - " # Failure was coded as zero.\n", - " failure_rates[r - 1, 0] = np.mean(df_sorted.result_Y[0:to_release] == 0)\n", - " \n", - " #### Labeled outcomes only\n", - " # Sort by failure probabilities, subjects with the smallest risk are first. \n", - " df_sorted = test_labeled.sort_values(\n", - " by='B_prob_0_logreg', inplace=False, ascending=True)\n", - " \n", - " # Ensure that only labeled outcomes are available\n", - " df_sorted = df_sorted[df_sorted.decision_T == 1]\n", - " \n", - " to_release = int(round(df_sorted.shape[0] * r / 10))\n", - "\n", - " failure_rates[r - 1, 1] = np.mean(df_sorted.result_Y[0:to_release] == 0)\n", - " \n", - " #### Human error rate\n", - " # Get judges with correct leniency as list\n", - " correct_leniency_list = test_labeled.judgeID_J[\n", - " test_labeled['acceptanceRate_R'].round(1) == r / 10].values\n", - "\n", - " # Released are the people they judged and released, T = 1\n", - " released = test_labeled[test_labeled.judgeID_J.isin(correct_leniency_list)\n", - " & (test_labeled.decision_T == 1)]\n", - "\n", - " # Get their failure rate, aka ratio of reoffenders to number of people judged in total\n", - " failure_rates[r - 1, 2] = np.sum(\n", - " released.result_Y == 0) / correct_leniency_list.shape[0]\n", - " # is the divisor correct?\n", - " \n", - " #### Contraction, logistic regression\n", - " failure_rates[r - 1, 3] = contraction(\n", - " test_labeled, 'judgeID_J', 'decision_T', 'result_Y', 'B_prob_0_logreg',\n", - " 'acceptanceRate_R', r / 10, False)\n", - "\n", - " #### P(Y=0 | T=1, X=x)*P(T=1 | R=r, X=x)*P(X=x)\n", - " failure_rates[r - 1, 4] = si.quad(lambda x: f(np.array([x]), logreg, 0)*f(np.array([[x, r/10]]), decision_model, 1)*scs.norm.pdf(x), -np.inf, np.inf)[0]\n", - "\n", - "# For the classifications, add error bars via scipy.stats.sem using the xerr and 
yerr arguments\n", - "\n", - "plt.figure(figsize=(14, 8))\n", - "plt.plot(np.arange(0.1, 0.9, .1), failure_rates[:, 0], label='True Evaluation', c='green')\n", - "plt.plot(np.arange(0.1, 0.9, .1), failure_rates[:, 1], label='Labeled outcomes', c='lime')\n", - "plt.plot(np.arange(0.1, 0.9, .1), failure_rates[:, 2], label='Human evaluation', c='red')\n", - "plt.plot(np.arange(0.1, 0.9, .1), failure_rates[:, 3], label='Contraction, log.', c='blue')\n", - "plt.plot(np.arange(0.1, 0.9, .1), failure_rates[:, 4], label='Integrand', c='magenta')\n", - "\n", - "plt.title('Failure rate vs. Acceptance rate')\n", - "plt.xlabel('Acceptance rate')\n", - "plt.ylabel('Failure rate')\n", - "plt.legend()\n", - "plt.grid()\n", - "plt.show()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.0" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": true, - "sideBar": true, - "skip_h1_title": true, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": true, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": true - }, - "varInspector": { - "cols": { - "lenName": 16, - "lenType": 16, - "lenVar": 40 - }, - "kernels_config": { - "python": { - "delete_cmd_postfix": "", - "delete_cmd_prefix": "del ", - "library": "var_list.py", - "varRefreshCmd": "print(var_dic_list())" - }, - "r": { - "delete_cmd_postfix": ") ", - "delete_cmd_prefix": "rm(", - "library": "var_list.r", - "varRefreshCmd": "cat(var_dic_list()) " - } - }, - "types_to_exclude": [ - "module", - "function", - "builtin_function_or_method", - "instance", - "_Feature" - ], - "window_display": false - } - }, - "nbformat": 4, - 
"nbformat_minor": 2 -} -- GitLab
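The "Integrand" curve in the notebook evaluates the causal expression P(Y=0 | do(R=r)) = ∫ P(Y=0 | X=x, T=1) P(T=1 | R=r, X=x) P(X=x) dx numerically with `scipy.integrate.quad`. The sketch below isolates that computation under stated assumptions, so the three terms can be inspected on their own:

* The fitted scikit-learn models `logreg` and `decision_model` are replaced by hypothetical logistic functions. Their coefficients `A0`, `A1`, `B0`, `B1` and `B2` are hand-picked, not estimated from the notebook's data.
* X is taken to be standard normal, as in the data generation.
* The quadrature result is cross-checked against a Monte Carlo average over draws of X.

It is an illustration of the technique, not the notebook's exact pipeline.

```python
import numpy as np
import scipy.integrate as si
import scipy.stats as scs
from scipy.special import expit  # numerically stable logistic function

# ASSUMED coefficients for two hypothetical logistic models standing in for the
# notebook's fitted `logreg` and `decision_model`; they are illustrative only.
A0, A1 = 0.0, 1.0               # P(Y=0 | X=x, T=1) = expit(A0 + A1*x)
B0, B1, B2 = -3.0, -1.0, 6.0    # P(T=1 | R=r, X=x) = expit(B0 + B1*x + B2*r)


def p_fail(x):
    """Term 1: probability of a negative outcome given release and features x."""
    return expit(A0 + A1 * x)


def p_release(x, r):
    """Term 2: probability of a positive decision given leniency r and features x."""
    return expit(B0 + B1 * x + B2 * r)


def failure_rate_quad(r):
    """Integrate term 1 * term 2 * term 3 over x, with term 3 the N(0, 1) density."""
    integrand = lambda x: p_fail(x) * p_release(x, r) * scs.norm.pdf(x)
    value, _abserr = si.quad(integrand, -np.inf, np.inf)
    return value


def failure_rate_mc(r, n=200_000, seed=0):
    """Monte Carlo cross-check: average term 1 * term 2 over draws of X ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    return np.mean(p_fail(x) * p_release(x, r))


for r in np.arange(1, 9) / 10:
    print(f"r = {r:.1f}: quad = {failure_rate_quad(r):.4f}, MC = {failure_rate_mc(r):.4f}")
```

Under these assumed coefficients, P(T=1 | R=r, X=x) increases with r, so the estimate should rise with the acceptance rate. The Monte Carlo column should agree with the quadrature to within sampling error. Using `scipy.special.expit` instead of `1 / (1 + np.exp(-z))` avoids overflow warnings when `quad` evaluates the integrand at very large |x| on the infinite interval.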