# *Causal approach to selective labels*

Repository for the research project *Causal approach to selective labels*, 
together with the required data sets. This repository was originally forked from 
[the ProPublica repository for COMPAS analysis](https://github.com/propublica/compas-analysis).

## Structure of the repository

The contents of the repository are divided into four main folders: 
* `analysis_and_scripts` contains the scripts and notebooks for performing the analysis. The
folder also contains the `notes.tex` file, which documents much of the research done.
* `data` contains the original data sets from the ProPublica analysis (see below for more information).
* `figures` contains the figures used in the notes file mentioned above. This 
folder is also used for the figures of the BSocSc thesis.
* `paper` contains the draft for a research publication.

Original README:

```
                  Low  High
                 +---------+
Didn't Reoffend  |____|____|
Reoffended       |    |    |
                 +---------+
```

This repository contains a Jupyter notebook and data for the ProPublica story "Machine Bias."

Story:
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing/

Methodology:
https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm/

Notebook (you'll probably want to follow along in the methodology):
https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb

Main Dataset:
compas.db - a sqlite3 database containing criminal history, jail and prison time, demographics and COMPAS risk scores for defendants from Broward County.

Other files as needed for the analysis.
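
The `compas.db` file described above can be inspected with Python's built-in `sqlite3` module. The following is a minimal sketch, not part of the original analysis: the helper name `list_tables` and the path `compas.db` are assumptions, and the table names are read from the database itself rather than assumed.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path."""
    # Open read-only via a URI so that a wrong path raises an error
    # instead of silently creating an empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Assumed location; adjust to wherever compas.db lives in your checkout.
DB_PATH = Path("compas.db")
if DB_PATH.exists():
    for table in list_tables(DB_PATH):
        print(table)
```

Listing the tables first avoids guessing at the schema; any table of interest can then be queried with an ordinary `SELECT` on the same kind of connection.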