''Spanish version: [[Equidad y corrección de sesgos en Aprendizaje Automático|Equidad y corrección de sesgos en Aprendizaje Automático]]''
  
In machine learning, a given algorithm is said to be '''fair''', or to have '''fairness''', if its results are independent of some variables we consider to be sensitive and not related to it (e.g. gender, ethnicity, sexual orientation, etc.).
== Context ==
Research about fairness in machine learning is a relatively recent topic. Most of the articles about it have been written in the last three years<ref name="Articles">[https://fairmlclass.github.io/1.html#/4 ''Moritz Hardt, Berkeley'']. Retrieved 18 December 2019</ref>. Some of the most important milestones in this topic are the following:
* In 2018, IBM introduced AI Fairness 360, a Python library with several algorithms to reduce bias in a program, increasing its fairness.<ref name="IBM">[https://aif360.mybluemix.net/ ''IBM AI Fairness 360'']. Retrieved 18 December 2019</ref>
* In 2018, Facebook made public their use of a tool, Fairness Flow, to detect bias in their AI. However, the tool's code is not accessible, and it is not known whether it really corrects this bias.<ref name="Facebook">[https://qz.com/1268520/facebook-says-it-has-a-tool-to-detect-bias-in-its-artificial-intelligence/ ''Fairness Flow, Facebook's bias detector'']. Retrieved 28 December 2019</ref>
* In 2019, Google published a set of tools on GitHub to study the effects of fairness in the long run.<ref name="Google">[https://github.com/google/ml-fairness-gym ''ML-Fairness gym'']. Retrieved 18 December 2019</ref>
 
Although the algorithms used are still being refined, the main advances come from the growing awareness among some large companies of the importance that reducing bias in machine learning algorithms will have for society in the future.
  
 
== Fairness criteria in classification problems<ref name="Barocas">Solon Barocas; Moritz Hardt; Arvind Narayanan, [http://www.fairmlbook.org ''Fairness and Machine Learning'']. Retrieved 15 December 2019.</ref> ==
  
In [[wikipedia: Statistical classification | classification]] problems, an algorithm learns a function to predict a discrete characteristic <!-- <math display="inline"> Y </math> --> [[File: Y.JPG|1000x1000px|frameless|text-bottom]], the target variable, from known characteristics <!-- <math display="inline"> X </math> --> [[File: X.JPG|1000x1000px|frameless|text-bottom]]. We model <!-- <math display="inline"> A </math> --> [[File: A.JPG|1000x1000px|frameless|text-bottom]] as a discrete [[wikipedia:random variable | random variable]] which encodes some characteristics contained or implicitly encoded in <!-- <math display="inline"> X </math> --> [[File: X.JPG|1000x1000px|frameless|text-bottom]] that we consider sensitive characteristics (gender, ethnicity, sexual orientation, etc.). We finally denote by <!-- <math display="inline"> R </math> --> [[File: R.JPG|1000x1000px|frameless|text-bottom]] the prediction of the classifier.
 
Now let us define three main criteria to evaluate if a given classifier is fair, that is, if its predictions are not influenced by these sensitive variables.
  
 
=== Independence ===
  
We say the [[wikipedia:random variable | random variables]] <!-- <math display="inline">(R,A)</math> --> [[File: R,A.JPG|1000x1000px|frameless]] satisfy '''independence''' if the sensitive characteristics <!-- <math display="inline"> A </math> --> [[File: A.JPG|1000x1000px|frameless|text-bottom]] are [[wikipedia:Independence (probability theory)|statistically independent]] of the prediction <!-- <math display="inline"> R </math> --> [[File: R.JPG|1000x1000px|frameless|text-bottom]], and we write <!-- <math display="inline"> R \bot A </math> --> [[File: RbotA.JPG|1000x1000px|frameless]].
  
 
We can also express this notion with the following formula:
<!-- <math display="block"> P(R = r | A = a) = P(R = r | A = b) \qquad \forall r, \; \forall a, b </math> -->

[[File:IndependenceDef.JPG|1000x1000px|frameless|center]]
 
This means that the [[wikipedia:probability theory | probability]] of being classified by the algorithm in each of the groups is equal for two individuals with different sensitive characteristics.
  
Yet another equivalent expression for independence can be given using the concept of [[wikipedia:mutual information | mutual information]] between [[wikipedia:random variables | random variables]], defined as  
 
<!-- <math display="block"> I(X,Y) = H(X) + H(Y) - H(X,Y) </math> -->  
 
[[File:MutInf.JPG|1000x1000px|frameless|center]]
In this formula, [[File:H.JPG|1000x1000px|frameless|text-bottom]] is the [[wikipedia:Entropy (information theory)|entropy]] of the random variable. Then [[File: R,A.JPG|1000x1000px|frameless]] satisfy independence if [[File:I(R,A).JPG|1000x1000px|frameless|text-bottom]].

A possible relaxation of the independence definition consists in introducing a positive slack [[File:EpsMq0.JPG|1000x1000px|frameless|text-bottom]], and is given by the formula:

[[File:IndependenceRel1.JPG|1000x1000px|frameless|center]]

Finally, another possible relaxation is to require [[File:IndependenceRel2.JPG|1000x1000px|frameless|text-bottom]].
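As an illustration (not part of the original references), the following sketch estimates the empirical mutual information between the predictions and the sensitive attribute from two arrays of observed values; the names <code>r</code> and <code>a</code> and the sample data are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def empirical_mutual_information(r, a):
    """Estimate I(R, A) = H(R) + H(A) - H(R, A) from two discrete samples."""
    r, a = np.asarray(r), np.asarray(a)

    def entropy(labels):
        # Shannon entropy (in nats) of the empirical distribution of `labels`.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log(p))

    joint = np.array([f"{x}|{y}" for x, y in zip(r, a)])  # encode (R, A) pairs
    return entropy(r) + entropy(a) - entropy(joint)

# Hypothetical predictions and binary sensitive attribute.
r = [1, 0, 1, 1, 0, 1, 0, 0]
a = [0, 0, 1, 1, 0, 1, 1, 0]
print(empirical_mutual_information(r, a))  # values near 0 suggest independence
</syntaxhighlight>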
 
=== Separation ===
  
We say the [[wikipedia:random variable | random variables]] <!-- <math display="inline">(R,A,Y)</math> --> [[File: R,A,Y.JPG|1000x1000px|frameless]] satisfy '''separation''' if the sensitive characteristics <!-- <math display="inline"> A </math> --> [[File: A.JPG|1000x1000px|frameless|text-bottom]] are [[wikipedia:Independence (probability theory)|statistically independent]] of the prediction <!-- <math display="inline"> R </math> --> [[File: R.JPG|1000x1000px|frameless|text-bottom]] given the target value <!-- <math display="inline"> Y </math> --> [[File: Y.JPG|1000x1000px|frameless|text-bottom]], and we write <!-- <math display="inline"> R \bot A | Y </math> --> [[File: RbotAbarY.JPG|1000x1000px|frameless]].
  
 
We can also express this notion with the following formula:
<!-- <math display="block"> P(R = r | Y = y, A = a) = P(R = r | Y = y, A = b) \qquad \forall r, \; \forall y, \; \forall a, b </math> -->

[[File:SeparationDef.JPG|1000x1000px|frameless|center]]

This means that the probability of being classified by the algorithm in each of the groups is equal for two individuals with different sensitive characteristics, given that they actually belong to the same group (have the same target variable).

Another equivalent expression, in the case of a binary target variable, is that the true positive rate and the false positive rate are equal (and therefore the false negative rate and the true negative rate are equal) for every value of the sensitive characteristics:

[[File:SeparationDef1.JPG|1000x1000px|frameless|center]]

[[File:SeparationDef2.JPG|1000x1000px|frameless|center]]

Finally, a possible relaxation of the given definitions is for the difference between rates to be a positive number lower than a given slack [[File:EpsMq0.JPG|1000x1000px|frameless|text-bottom]], rather than equal to zero.
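As a rough illustration, the following sketch computes the true positive rate and the false positive rate separately for each value of the sensitive attribute; under separation (with a binary target) these rates should coincide across groups. The function and variable names are illustrative, and it assumes every group contains both positive and negative actual outcomes.

<syntaxhighlight lang="python">
import numpy as np

def rates_by_group(y_true, y_pred, group):
    """True positive rate and false positive rate for each sensitive group.

    Separation (binary target) requires both rates to be equal across groups.
    Assumes every group contains positive and negative actual outcomes."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        m = group == g
        tp = np.sum((y_pred[m] == 1) & (y_true[m] == 1))
        fn = np.sum((y_pred[m] == 0) & (y_true[m] == 1))
        fp = np.sum((y_pred[m] == 1) & (y_true[m] == 0))
        tn = np.sum((y_pred[m] == 0) & (y_true[m] == 0))
        rates[g] = {"TPR": tp / (tp + fn), "FPR": fp / (fp + tn)}
    return rates

# Hypothetical data: equal rates across groups would indicate separation.
print(rates_by_group([1, 0, 1, 0, 1, 0], [1, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1]))
</syntaxhighlight>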
 
=== Sufficiency ===
  
We say the [[wikipedia:random variable | random variables]] <!-- <math display="inline">(R,A,Y)</math> --> [[File: R,A,Y.JPG|1000x1000px|frameless]] satisfy '''sufficiency''' if the sensitive characteristics <!-- <math display="inline"> A </math> --> [[File: A.JPG|1000x1000px|frameless|text-bottom]] are [[wikipedia:Independence (probability theory)|statistically independent]] of the target value <!-- <math display="inline"> Y </math> --> [[File: Y.JPG|1000x1000px|frameless|text-bottom]] given the prediction <!-- <math display="inline"> R </math> --> [[File: R.JPG|1000x1000px|frameless|text-bottom]], and we write <!-- <math display="inline"> Y \bot A | R </math> --> [[File: YbotAbarR.JPG|1000x1000px|frameless]].
  
 
We can also express this notion with the following formula:
<!-- <math display="block"> P(Y = y | R = r, A = a) = P(Y = y | R = r, A = b) \qquad \forall y, \; \forall r, \; \forall a, b </math> -->

[[File:SufficiencyDef.JPG|1000x1000px|frameless|center]]

This means that the probability of actually being in each of the groups is equal for two individuals with different sensitive characteristics, given that they were predicted to belong to the same group.

=== Relationships between definitions ===

Finally, we sum up some of the main results that relate the three definitions given above:
 
* If <!-- <math display="inline"> A </math> --> [[File: A.JPG|1000x1000px|frameless|text-bottom]] and <!-- <math display="inline"> Y </math> --> [[File: Y.JPG|1000x1000px|frameless|text-bottom]] are not [[wikipedia:Independence (probability theory)|statistically independent]], then sufficiency and independence cannot both hold.
* Assuming <!-- <math display="inline"> Y </math> --> [[File: Y.JPG|1000x1000px|frameless|text-bottom]] is binary, if <!-- <math display="inline"> A </math> --> [[File: A.JPG|1000x1000px|frameless|text-bottom]] and <!-- <math display="inline"> Y </math> --> [[File: Y.JPG|1000x1000px|frameless|text-bottom]] are not [[wikipedia:Independence (probability theory)|statistically independent]], and <!-- <math display="inline"> R </math> --> [[File: R.JPG|1000x1000px|frameless|text-bottom]] and <!-- <math display="inline"> Y </math> --> [[File: Y.JPG|1000x1000px|frameless|text-bottom]] are not [[wikipedia:Independence (probability theory)|statistically independent]] either, then independence and separation cannot both hold.
* If <!-- <math display="inline">(R,A,Y)</math> --> [[File: R,A,Y.JPG|1000x1000px|frameless]]  as a [[wikipedia:joint distribution | joint distribution]] has positive [[wikipedia:probability theory | probability]] for all its possible values and <!-- <math display="inline"> A </math> --> [[File: A.JPG|1000x1000px|frameless|text-bottom]] and <!-- <math display="inline"> Y </math> --> [[File: Y.JPG|1000x1000px|frameless|text-bottom]] are not [[wikipedia:Independence (probability theory)|statistically independent]], then separation and sufficiency cannot both hold.
  
 
==Metrics<ref name="metrics_paper">Sahil Verma; Julia Rubin, [https://fairware.cs.umass.edu/papers/Verma.pdf ''Fairness Definitions Explained'']. Retrieved 15 December 2019</ref>==
  
Most statistical measures of fairness rely on different metrics, so we will start by defining them. When working with a binary classifier, both the predicted and the actual classes can take two values: positive and negative. The possible relations between the predicted and the actual outcome are the following:[[File:Binary_confusion_matrix.JPG|200x150px|frame|50px|Confusion matrix]]
* '''True positive (TP)''': The case where both the predicted and the actual outcome are in the positive class.
* '''True negative (TN)''': The case where both the predicted and the actual outcome are in the negative class.
* '''False positive (FP)''': A case predicted to be in the positive class when the actual outcome is in the negative one.
* '''False negative (FN)''': A case predicted to be in the negative class when the actual outcome is in the positive one.
These relations can be easily represented with a [[wikipedia:confusion matrix | confusion matrix]], a table which describes the accuracy of a classification model. In this matrix, columns and rows represent instances of the predicted and the actual cases, respectively.
  
 
By using these relations, we can define multiple metrics which can later be used to measure the fairness of an algorithm:
* '''Positive predictive value (PPV)''': the fraction of positive cases which were correctly predicted out of all the positive predictions. It is usually referred to as precision, and represents the [[wikipedia:probability theory | probability]] that a positive prediction is right. It is given by the following formula:

<!-- <math display="block"> PPV = P(actual=+|prediction=+) = \frac{TP}{TP+FP}</math> -->

[[File:PPV.JPG|1000x1000px|frameless|center]]

* '''False discovery rate (FDR)''': the fraction of positive predictions which were actually negative out of all the positive predictions. It represents the [[wikipedia:probability theory | probability]] that a positive prediction is wrong, and it is given by the following formula:

<!-- <math display="block"> FDR = P(actual=-|prediction=+) = \frac{FP}{TP+FP} </math> -->

[[File:FDR.JPG|1000x1000px|frameless|center]]

* '''Negative predictive value (NPV)''': the fraction of negative cases which were correctly predicted out of all the negative predictions. It represents the [[wikipedia:probability theory | probability]] that a negative prediction is right, and it is given by the following formula:

<!-- <math display="block"> NPV = P(actual=-|prediction=-) = \frac{TN}{TN+FN} </math> -->

[[File:NPV.JPG|1000x1000px|frameless|center]]

* '''False omission rate (FOR)''': the fraction of negative predictions which were actually positive out of all the negative predictions. It represents the [[wikipedia:probability theory | probability]] that a negative prediction is wrong, and it is given by the following formula:

<!-- <math display="block"> FOR = P(actual=+|prediction=-) = \frac{FN}{TN+FN} </math> -->

[[File:FOR.JPG|1000x1000px|frameless|center]]

* '''True positive rate (TPR)''': the fraction of positive cases which were correctly predicted out of all the positive cases. It is usually referred to as sensitivity or recall, and it represents the [[wikipedia:probability theory | probability]] that positive subjects are classified correctly as such. It is given by the formula:

<!-- <math display="block"> TPR = P(prediction=+|actual=+) = \frac{TP}{TP+FN} </math> -->

[[File:TPR.JPG|1000x1000px|frameless|center]]

* '''False negative rate (FNR)''': the fraction of positive cases which were incorrectly predicted to be negative out of all the positive cases. It represents the [[wikipedia:probability theory | probability]] that positive subjects are classified incorrectly as negative ones, and it is given by the formula:

<!-- <math display="block"> FNR = P(prediction=-|actual=+) = \frac{FN}{TP+FN} </math> -->

[[File:FNR.JPG|1000x1000px|frameless|center]]

* '''True negative rate (TNR)''': the fraction of negative cases which were correctly predicted out of all the negative cases. It represents the [[wikipedia:probability theory | probability]] that negative subjects are classified correctly as such, and it is given by the formula:

<!-- <math display="block"> TNR = P(prediction=-|actual=-) = \frac{TN}{TN+FP} </math> -->

[[File:TNR.JPG|1000x1000px|frameless|center]]

* '''False positive rate (FPR)''': the fraction of negative cases which were incorrectly predicted to be positive out of all the negative cases. It represents the [[wikipedia:probability theory | probability]] that negative subjects are classified incorrectly as positive ones, and it is given by the formula:

<!-- <math display="block"> FPR = P(prediction=+|actual=-) = \frac{FP}{TN+FP} </math> -->

[[File:FPR.JPG|1000x1000px|frameless|center]]
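The following short sketch (illustrative names, hypothetical counts) computes all of the above metrics from the four entries of a confusion matrix:

<syntaxhighlight lang="python">
def confusion_metrics(tp, fp, tn, fn):
    """Compute the metrics defined above from the four confusion-matrix counts."""
    return {
        "PPV": tp / (tp + fp),
        "FDR": fp / (tp + fp),
        "NPV": tn / (tn + fn),
        "FOR": fn / (tn + fn),
        "TPR": tp / (tp + fn),
        "FNR": fn / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (tn + fp),
    }

# Hypothetical counts taken from some confusion matrix.
print(confusion_metrics(tp=40, fp=10, tn=35, fn=15))
</syntaxhighlight>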
== Other fairness criteria ==

The following criteria can be understood as measures of the three definitions given in the first section, or a relaxation of them. The relationships between them are summarized in a table in Barocas et al.<ref name="Barocas"/>

To define these measures specifically, we will divide them into three big groups, as done in Verma et al.<ref name="metrics_paper"/>: definitions based on predicted outcome, definitions based on predicted and actual outcomes, and definitions based on predicted probabilities and actual outcome.

We will be working with a binary classifier and the following notation: [[File: S.JPG|1000x1000px|frameless|text-bottom]] refers to the score given by the classifier, which is the probability of a certain subject being in the positive or the negative class. [[File: R.JPG|1000x1000px|frameless|text-bottom]] represents the final classification predicted by the algorithm, and its value is usually derived from [[File: S.JPG|1000x1000px|frameless|text-bottom]]; for example, it will be positive when [[File: S.JPG|1000x1000px|frameless|text-bottom]] is above a certain threshold. [[File: Y.JPG|1000x1000px|frameless|text-bottom]] represents the actual outcome, that is, the real classification of the individual, and, finally, [[File: A.JPG|1000x1000px|frameless|text-bottom]] denotes the sensitive attributes of the subjects.

=== Definitions based on predicted outcome ===

The definitions in this section focus on a predicted outcome [[File: R.JPG|1000x1000px|frameless|text-bottom]] for various distributions of subjects. They are the simplest and most intuitive notions of fairness.

* '''Group fairness''', also referred to as statistical parity, demographic parity, acceptance rate and benchmarking. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal probability of being assigned to the positive predicted class, that is, if the following formula is satisfied:

[[File:Group fairness.JPG|1000x1000px|frameless|center]]

* '''Conditional statistical parity'''. Basically consists of the definition above, but restricted only to a subset of the attributes. In mathematical notation this would be:

[[File:Definitions1.JPG|1000x1000px|frameless|center]]

=== Definitions based on predicted and actual outcomes ===

These definitions not only consider the predicted outcome [[File: R.JPG|1000x1000px|frameless|text-bottom]] but also compare it to the actual outcome [[File: Y.JPG|1000x1000px|frameless|text-bottom]].

* '''Predictive parity''', also referred to as outcome test. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal PPV, that is, if the following formula is satisfied:

[[File:Predictive parity.JPG|1000x1000px|frameless|center]]

Mathematically, if a classifier has equal PPV for both groups, it will also have equal FDR, satisfying the formula:

[[File:Predictive parity2.JPG|1000x1000px|frameless|center]]

* '''False positive error rate balance''', also referred to as predictive equality. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal FPR, that is, if the following formula is satisfied:

[[File:Predictive equality.JPG|1000x1000px|frameless|center]]

Mathematically, if a classifier has equal FPR for both groups, it will also have equal TNR, satisfying the formula:

[[File:Predictive equality2.JPG|1000x1000px|frameless|center]]

* '''False negative error rate balance''', also referred to as equal opportunity. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal FNR, that is, if the following formula is satisfied:

[[File:Equal opportunity.JPG|1000x1000px|frameless|center]]

Mathematically, if a classifier has equal FNR for both groups, it will also have equal TPR, satisfying the formula:

[[File:Equal opportunity2.JPG|1000x1000px|frameless|center]]

* '''Equalized odds''', also referred to as conditional procedure accuracy equality and disparate mistreatment. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal TPR and equal FPR, satisfying the formula:

[[File:Equalized odds.JPG|1000x1000px|frameless|center]]

* '''Conditional use accuracy equality'''. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal PPV and equal NPV, satisfying the formula:

[[File:Conditional.JPG|1000x1000px|frameless|center]]

* '''Overall accuracy equality'''. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal prediction accuracy, that is, the probability that a subject from one class is assigned to it. This is, if it satisfies the following formula:

[[File:Overall.JPG|1000x1000px|frameless|center]]

* '''Treatment equality'''. A classifier satisfies this definition if the subjects in the protected and unprotected groups have an equal ratio of FN and FP, satisfying the formula:

[[File:Treatment.JPG|1000x1000px|frameless|center]]

=== Definitions based on predicted probabilities and actual outcome ===

These definitions are based on the actual outcome [[File: Y.JPG|1000x1000px|frameless|text-bottom]] and the predicted probability score [[File: S.JPG|1000x1000px|frameless|text-bottom]].

* '''Test-fairness''', also known as calibration or matching conditional frequencies. A classifier satisfies this definition if individuals with the same predicted probability score [[File: S.JPG|1000x1000px|frameless|text-bottom]] have the same probability of being classified in the positive class whether they belong to the protected or to the unprotected group:

[[File:Definitions2.JPG|1000x1000px|frameless|center]]

* '''Well-calibration'''. It is an extension of the previous definition. It states that when individuals inside or outside the protected group have the same predicted probability score [[File: S.JPG|1000x1000px|frameless|text-bottom]] they must have the same probability of being classified in the positive class, and this probability must be equal to [[File: S.JPG|1000x1000px|frameless|text-bottom]]:

[[File:Definitions3.JPG|1000x1000px|frameless|center]]

* '''Balance for positive class'''. A classifier satisfies this definition if the subjects constituting the positive class from both protected and unprotected groups have equal average predicted probability score [[File: S.JPG|1000x1000px|frameless|text-bottom]]. This means that the expected value of the probability score for the protected and unprotected groups with positive actual outcome [[File: Y.JPG|1000x1000px|frameless|text-bottom]] is the same, satisfying the formula:

[[File:Balance positive.JPG|1000x1000px|frameless|center]]

* '''Balance for negative class'''. A classifier satisfies this definition if the subjects constituting the negative class from both protected and unprotected groups have equal average predicted probability score [[File: S.JPG|1000x1000px|frameless|text-bottom]]. This means that the expected value of the probability score for the protected and unprotected groups with negative actual outcome [[File: Y.JPG|1000x1000px|frameless|text-bottom]] is the same, satisfying the formula:

[[File:Balance negative.JPG|1000x1000px|frameless|center]]

== Algorithms ==

Fairness can be applied to machine learning algorithms in three different ways: preprocessing the data used in the algorithm, optimization during training, or post-processing the answers of the algorithm.
  
 
=== Preprocessing ===
Usually, the classifier is not the only problem; the dataset is also biased. The discrimination of a dataset <!-- <math display="inline"> D </math> --> [[File: D.jpg|1000x1000px|frameless|text-bottom]] with respect to the group <!-- <math display="inline"> A = a </math> --> [[File: A_a.JPG|47x16px|frameless|text-bottom]] can be defined as follows:
<!-- <math display="block"> disc_{A=a}(D) = \frac{|\{X\in D| X(A) \neq a, X(Y) = +\}|}{|\{X \in D | X(A) \neq a \}|} - \frac{|\{X\in D| X(A) = a, X(Y) = +\}|}{|\{X \in D | X(A) = a \}|}</math> -->
[[File: datasetDisc.JPG|1000x1000px|frameless|center]]
That is, an approximation of the difference between the probabilities of belonging to the positive class given that the subject has a protected characteristic different from <!-- <math display="inline"> a </math> --> [[File: Aminus.JPG|16x26px|frameless|text-bottom]] and equal to <!-- <math display="inline"> a </math> --> [[File: Aminus.JPG|16x26px|frameless|text-bottom]].
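A minimal sketch of this quantity, assuming the dataset is given as an array <code>a</code> of sensitive attributes and an array <code>y</code> of binary labels (both names and the sample data are illustrative), could look as follows:

<syntaxhighlight lang="python">
import numpy as np

def dataset_discrimination(a, y, protected_value):
    """disc_{A=a}(D): P(Y=+ | A != a) - P(Y=+ | A = a), estimated from the data."""
    a = np.asarray(a)
    y = np.asarray(y)                  # 1 for the positive class, 0 otherwise
    in_group = a == protected_value
    p_pos_outside = y[~in_group].mean()
    p_pos_inside = y[in_group].mean()
    return p_pos_outside - p_pos_inside

# Hypothetical sensitive attribute and labels.
a = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y = np.array([1, 1, 0, 1, 0, 0, 0, 1])
print(dataset_discrimination(a, y, protected_value=1))
</syntaxhighlight>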
  
 
Algorithms correcting bias at preprocessing remove information concerning variables in the dataset which can result in unfair decisions of the AI, while trying to alter just the bare minimum of this data. This is not as easy as just removing the sensitive variable, because other attributes can be related to the protected one.
A way to do this is by mapping each individual in the initial dataset into an intermediate representation in which it is impossible to identify whether it belongs to a particular protected group, while maintaining as much information as possible. Then, the new representation of the data is adjusted to get the maximum accuracy in the algorithm.

This way, individuals are mapped into a new multivariable representation where the probability of any member of a protected group being mapped to a certain value in the new representation is the same as that of an individual which does not belong to the protected group. Then, this representation is used to obtain the prediction for the individual, instead of the initial data. As the intermediate representation is constructed giving the same probability to individuals inside or outside the protected group, this attribute is hidden from the classifier.

An example is explained in Zemel et al., where a multinomial random variable is used as the intermediate representation. In the process, the system is encouraged to preserve all the information except that which can lead to biased decisions, and to obtain a prediction as accurate as possible.

On the one hand, this procedure has the advantage that the preprocessed data can be used for any machine learning task. Furthermore, the classifier does not need to be modified, as the correction is applied to the dataset before processing. On the other hand, the other methods obtain better results in accuracy and fairness.<ref name="datascience"/>

==== Reweighing ====
 
Reweighing is an example of a preprocessing algorithm. The idea is to assign a weight to each dataset point such that the weighted discrimination is 0 with respect to the designated group.
  
If the dataset <!-- <math display="inline"> D </math> --> [[File: D.jpg|1000x1000px|frameless|text-bottom]] was unbiased, the sensitive variable <!-- <math display="inline"> A </math> --> [[File: A.JPG|1000x1000px|frameless|text-bottom]] and the target variable <!-- <math display="inline"> Y </math> --> [[File: Y.JPG|1000x1000px|frameless|text-bottom]] would be [[wikipedia:Independence (probability theory)|statistically independent]], and the probability of the [[wikipedia:Joint probability distribution|joint distribution]] would be the product of the probabilities, as follows:
 
<!-- <math display="block"> P_{exp}(A = a \wedge Y = +) = P(A = a) \times P(Y = +) = \frac{|\{X \in D | X(A) = a\}|}{|D|} \times \frac{|\{X \in D| X(Y) = + \}|}{|D|}</math> -->
 
[[File: fairness_formula_1.JPG|1000x1000px|frameless|center]]
  
In reality, however, the dataset is not unbiased and the variables are not [[wikipedia:Independence (probability theory)|statistically independent]], so the observed probability is:
 
<!-- <math display="block"> P_{obs}(A = a \wedge Y = +) = \frac{|\{X \in D | X(A) = a \wedge X(Y) = +\}|}{|D|} </math> -->
 
[[File: fairness_formula_2.JPG|1000x1000px|frameless|center]]
To compensate for the bias, lower weights will be assigned to favored objects and higher weights to unfavored objects. For each <!-- <math display="inline"> X \in D </math> --> [[File: X_D.JPG|1000x1000px|frameless|text-bottom]] we get:

<!-- <math display="block"> W(X) = \frac{P_{exp}(A = X(A) \wedge Y = X(Y))}{P_{obs}(A = X(A) \wedge Y = X(Y))} </math> -->

[[File: fairness_formula_3.JPG|1000x1000px|frameless|center]]

When we have a weight associated with each <!-- <math display="inline"> X </math> --> [[File: X.JPG|1000x1000px|frameless|text-bottom]], we compute the weighted discrimination with respect to the group <!-- <math display="inline"> A = a </math> --> [[File: A_a.JPG|47x16px|frameless|text-bottom]] as follows:

[[File: fairness_formula_4.JPG|1000x1000px|frameless|center]]

It can be shown that after reweighing this weighted discrimination is 0.
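A minimal sketch of the weight computation, assuming binary arrays <code>a</code> and <code>y</code> (illustrative names, not taken from any particular library), could be:

<syntaxhighlight lang="python">
import numpy as np

def reweighing_weights(a, y):
    """Weight each point by P_exp(A = X(A), Y = X(Y)) / P_obs(A = X(A), Y = X(Y))."""
    a, y = np.asarray(a), np.asarray(y)
    n = len(a)
    weights = np.ones(n, dtype=float)
    for g in np.unique(a):
        for c in np.unique(y):
            mask = (a == g) & (y == c)
            p_obs = mask.sum() / n
            if p_obs > 0:
                p_exp = (np.sum(a == g) / n) * (np.sum(y == c) / n)
                weights[mask] = p_exp / p_obs
    return weights

# Hypothetical dataset; the weights can be fed to any learner that accepts
# per-sample weights (for example through a `sample_weight` argument).
a = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y = np.array([1, 1, 0, 1, 0, 0, 0, 1])
print(reweighing_weights(a, y))
</syntaxhighlight>

With these weights, the weighted discrimination of the dataset with respect to the designated group becomes 0, as stated above.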
 
=== Optimization at training time ===

Another approach is correcting the bias at training time. This can be done by adding constraints to the optimization objective of the algorithm. These constraints force the algorithm to improve fairness by keeping the same rates of certain measures for the protected group and the rest of the individuals. For example, we can add to the objective of the algorithm the condition that the false positive rate is the same for individuals in the protected group and the ones outside the protected group.

The main measures used in this approach are false positive rate, false negative rate and overall misclassification rate. It is possible to add just one or several of these constraints to the objective of the algorithm. Note that the equality of false negative rates implies the equality of true positive rates, so this implies equality of opportunity. After adding the restrictions to the problem it may become intractable, so a relaxation on them may be needed.

This technique obtains good results in improving fairness while keeping high accuracy, and lets the programmer choose the fairness measures to improve. However, each machine learning task may need a different method to be applied and the code in the classifier needs to be modified, which is not always possible.<ref name="datascience"/>
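As an illustration of the idea (a generic sketch, not the method of any particular reference), one can penalize a differentiable proxy of the constraint. Here a logistic regression is trained while pushing the average scores of the negatively-labelled examples of both groups to be equal, as a soft stand-in for equal false positive rates; all names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def train_fair_logreg(X, y, a, lam=1.0, lr=0.1, epochs=1000):
    """Logistic regression with a penalty lam * gap**2, where gap is the
    difference between the group-wise mean scores on negative examples.
    Assumes binary labels y, binary sensitive attribute a, non-empty groups."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        scores = 1.0 / (1.0 + np.exp(-X @ w))            # predicted probabilities
        grad_log = X.T @ (scores - y) / len(y)           # logistic-loss gradient
        neg = y == 0
        g0, g1 = neg & (a == 0), neg & (a == 1)
        gap = scores[g0].mean() - scores[g1].mean()      # FPR proxy gap
        ds = scores * (1 - scores)                       # derivative of the sigmoid
        grad_gap = (X[g0] * ds[g0][:, None]).mean(axis=0) \
                 - (X[g1] * ds[g1][:, None]).mean(axis=0)
        w -= lr * (grad_log + 2 * lam * gap * grad_gap)  # gradient of loss + penalty
    return w
</syntaxhighlight>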
 
==== Adversarial debiasing <ref name="adversarial1"> Brian Hu Zhang; Blake Lemoine; Margaret Mitchell, [https://arxiv.org/pdf/1801.07593.pdf ''Mitigating Unwanted Biases with Adversarial Learning'']. Retrieved 17 December 2019</ref> <ref name="adversarial2"> Joyce Xu, [https://towardsdatascience.com/algorithmic-solutions-to-algorithmic-bias-aef59eaf6565 ''Algorithmic Solutions to Algorithmic Bias: A Technical Guide'']. Retrieved 17 December 2019</ref> ====
  
[[File:AdvFig1.JPG|200x150px|frame|50px|Interaction between ''predictor'' and ''adversarial'' as shown by Joyce Xu <ref name=adversarial2/>]]
We train two [[wikipedia:Statistical classification|classifiers]] at the same time through some gradient-based method (e.g. [[wikipedia:Gradient descent|gradient descent]]). The first one, the ''predictor'', tries to accomplish the task of predicting <!-- <math display="inline"> Y </math> --> [[File: Y.JPG|1000x1000px|frameless|text-bottom]], the target variable, given <!-- <math display="inline"> X </math> --> [[File: X.JPG|1000x1000px|frameless|text-bottom]], the input, by modifying its weights <!-- <math display="inline"> W </math> --> [[File: W.JPG|1000x1000px|frameless|text-bottom]] to minimize some [[wikipedia:Loss function|loss function]] <!-- <math display="inline">L_{P}(\hat{y},y)</math> --> [[File: lpyy.JPG|1000x1000px|frameless]]. The second one, the ''adversary'', tries to accomplish the task of predicting <!-- <math display="inline"> A </math> --> [[File: A.JPG|1000x1000px|frameless|text-bottom]], the sensitive variable, given <!-- <math display="inline"> \hat{Y} </math> --> [[File: Ygorro.JPG|1000x1000px|frameless]] by modifying its weights <!-- <math display="inline"> U </math> --> [[File: U.JPG|20x20px|frameless]] to minimize some loss function <!-- <math display="inline">L_{A}(\hat{a},a) </math> --> [[File: lpzz.JPG|1000x1000px|frameless]].
  
An important point here is that, in order to propagate correctly, <!-- <math display="inline"> \hat{Y} </math> --> [[File: Ygorro.JPG|1000x1000px|frameless]] above must refer to the raw output of the classifier, not the discrete prediction; for example, with an [[wikipedia:Artificial neural network|artificial neural network]] and a classification problem, <!-- <math display="inline"> \hat{Y} </math> --> [[File: Ygorro.JPG|1000x1000px|frameless]] could refer to the output of the softmax layer.
  
Then we update <!-- <math display="inline"> U </math> --> [[File: U.JPG|20x20px|frameless]] to minimize <!-- <math display="inline"> L_{A} </math> --> [[File: LA.JPG|24x21px|frameless]] at each training step according to the [[wikipedia:Gradient|gradient]] <!-- <math display="inline"> \nabla_{U}L_{A} </math> --> [[File: grad_ula.JPG|1000x1000px|frameless]] and we modify <!-- <math display="inline"> W </math> --> [[File: W.JPG|1000x1000px|frameless|text-bottom]] according to the expression:
 
<!-- <math display="block"> \nabla_{W}L_{P} - proj_{\nabla_{W}L_{A}}\nabla_{W}L_{P} - \alpha \nabla_{W}L_{A} </math> -->
 
[[File: fairness_formula_5.JPG|1000x1000px|frameless|center]]
  
 
[[File:AdvFig2.JPG|100x150px|frame|50px|Graphic representation of the vectors used in adversarial debiasing as shown in Zhang et al.<ref name=adversarial1/>]]
The intuitive idea is that we want the ''predictor'' to try to minimize <!-- <math display="inline"> L_{P} </math> --> [[File: LP.JPG|1000x1000px|frameless]] (hence the term <!-- <math display="inline"> \nabla_{W}L_{P} </math> --> [[File: grad_wlp.JPG|1000x1000px|frameless]]) while, at the same time, maximizing <!-- <math display="inline"> L_{A} </math> --> [[File: LA.JPG|24x21px|frameless]] (hence the term <!-- <math display="inline"> - \alpha \nabla_{W}L_{A} </math> --> [[File: alfa_grad_wla.JPG|1000x1000px|frameless]]), so that the ''adversary'' fails at predicting the sensitive variable from <!-- <math display="inline"> \hat{Y} </math> --> [[File: Ygorro.JPG|1000x1000px|frameless|text-bottom]].
  
The term <!-- <math display="inline"> -proj_{\nabla_{W}L_{A}}\nabla_{W}L_{P} </math> --> [[File: fairness_formula_6.JPG|1000x1000px|frameless]] prevents the ''predictor'' from moving in a direction that helps the ''adversary'' decrease its loss function.
  
It can be shown that training a ''predictor'' classification model with this algorithm improves [[#Definitions based on predicted outcome | demographic parity]] with respect to training it without the ''adversary''.
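The update rule above can be sketched as follows; the gradients are assumed to be provided by whatever library computes them, the weights are treated as flat vectors for simplicity, and all names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def adversarial_debiasing_step(W, U, grad_W_LP, grad_W_LA, grad_U_LA,
                               lr=0.01, alpha=1.0):
    """One update of predictor weights W and adversary weights U.

    The predictor follows its own gradient, minus the component that would
    help the adversary, minus alpha times the adversary's gradient."""
    def proj(u, v):
        # Projection of u onto v.
        return (np.dot(u, v) / np.dot(v, v)) * v

    W_update = grad_W_LP - proj(grad_W_LP, grad_W_LA) - alpha * grad_W_LA
    W = W - lr * W_update          # predictor: minimize L_P while hindering the adversary
    U = U - lr * grad_U_LA         # adversary: minimize L_A
    return W, U
</syntaxhighlight>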
  
 
===Postprocessing===
 
The advantages of postprocessing include that the technique can be applied after any classifier, without modifying it, and that it has good performance in fairness measures. The cons are the need to access the protected attribute at test time and the lack of choice in the balance between accuracy and fairness.<ref name="datascience"/>
  
==== Reject Option based Classification <ref name="roc"> Faisal Kamiran; Asim Karim; Xiangliang Zhang, [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.722.3030&rep=rep1&type=pdf ''Decision Theory for Discrimination-aware Classification'']. Retrieved 17 December 2019</ref> ====
  
Given a [[wikipedia:Statistical classification|classifier]], let <!-- <math display="inline"> P(+|X) </math> --> [[File: roc_formula_1.JPG|1000x1000px|frameless]] be the probability computed by the classifier that the instance <!-- <math display="inline"> X </math> --> [[File: X.JPG|1000x1000px|frameless|text-bottom]] belongs to the positive class +. When <!-- <math display="inline"> P(+|X) </math> --> [[File: roc_formula_1.JPG|1000x1000px|frameless]] is close to 1 or to 0, the instance <!-- <math display="inline"> X </math> --> [[File: X.JPG|1000x1000px|frameless|text-bottom]] is assigned with a high degree of certainty to class + or - respectively. However, when <!-- <math display="inline"> P(+|X) </math> --> [[File: roc_formula_1.JPG|1000x1000px|frameless]] is closer to 0.5 the classification is more unclear.
  
 
We say <!-- <math display="inline"> X </math> --> [[File: X.JPG|1000x1000px|frameless|text-bottom]] is a "rejected instance" if <!-- <math display="inline"> max(P(+|X), 1-P(+|X)) \leq \theta </math> --> [[File: roc_formula_2.JPG|1000x1000px|frameless]] with a certain <!-- <math display="inline"> \theta </math> --> [[File: Theta.JPG|1000x1000px|frameless|text-bottom]] such that <!-- <math display="inline"> 0.5 < \theta < 1 </math> --> [[File: roc_formula_3.JPG|1000x1000px|frameless|text-bottom]].
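The rejection test described above can be sketched as follows (the subsequent special treatment that the method gives to rejected instances is not shown); the names and example scores are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def rejected_instances(p_pos, theta=0.6):
    """Flag 'rejected instances': max(P(+|X), 1 - P(+|X)) <= theta.

    `p_pos` holds P(+|X) for each instance; theta must satisfy 0.5 < theta < 1."""
    p_pos = np.asarray(p_pos)
    confidence = np.maximum(p_pos, 1.0 - p_pos)
    return confidence <= theta

# Hypothetical scores produced by some classifier.
print(rejected_instances([0.95, 0.55, 0.48, 0.10]))  # -> [False  True  True False]
</syntaxhighlight>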

Última revisión de 23:18 18 dic 2019

Spanish version: Equidad y corrección de sesgos en Aprendizaje Automático

In machine learning, a given algorithm is said to be fair, or to have fairness if its results are independent of some variables we consider to be sensitive and not related with it (f.e.: gender, ethnicity, sexual orientation, etc.).

Context

Research about fairness in machine learning is a relatively recent topic. Most of articles about it have been written in the last three years[1]. Some of the most important facts in this topic are the following:

  • In 2018, IBM introduces AI Fairness 360, a Python library with several algorithms to reduce bias in a program, increasing its fairness. [2]
  • Facebook made public, in 2018, their use of a tool, Fairness Flow, to detect bias in their AI. However, said tool code is not accesible, and it is not known if it really corrects this bias. [3]
  • In 2019, Google publishes a set of tools in Github to study the effects of fairness in the long run. [4]

A pesar de que se siguen perfeccionando los algoritmos utilizados, los principales avances vienen de la concienciación por parte de algunas grandes empresas de la importancia que va a tener en la sociedad la reducción del sesgo en los algoritmos de aprendizaje automático en un futuro.

Fairness criteria in classification problems[5]

In classification problems, an algorithm learns a function to predict a discrete characteristic Y.JPG, the target variable, from known characteristics X.JPG. We model A.JPG as a discrete random variable which encodes some characteristics contained or implictly encoded in X.JPG that we consider as sensitive characteristics (gender, ethnicity, sexual orientation, etc.). We finally denote by R.JPG the prediction of the classifier. Now let us define three main criteria to evaluate if a given classifier is fair, that is, if its predictions are not influenced by some of this sensitive variables.

Independence

We say the random variables R,A.JPG satisfy independence if the sensitive characteristics A.JPG are statistically independent to the prediction R.JPG, and we write RbotA.JPG.

We can also express this notion with the following formula:

IndependenceDef.JPG

This means that the probability of being classified by the algorithm in each of the groups is equal for two individuals with different sensitive characteristics.

Yet another equivalent expression for independence can be given using the concept of mutual information between random variables, defined as

MutInf.JPG

In this formula, H.JPG is the entropy of the random variable. Then R,A.JPG satisfy independence if I(R,A).JPG.

A possible relaxation of the indepence definition include introducing a positive slack EpsMq0.JPG and is given by the formula:

IndependenceRel1.JPG

Finally, another possible relaxation is to require IndependenceRel2.JPG.

Separation

We say the random variables R,A,Y.JPG satisfy separation if the sensitive characteristics A.JPG are statistically independent to the prediction R.JPG given the target value Y.JPG, and we write RbotAbarY.JPG.

We can also express this notion with the following formula:

SeparationDef.JPG

This means that the probability of being classified by the algorithm in each of the groups is equal for two individuals with different sensitive characteristics given that they actually belong in the same group (have the same target variable).

Another equivalent expression, in the case of a binary target rate, is that the true positive rate and the false positive rate are equal (and therefore the false negative rate and the true negative rate are equal) for every value of the sensitive characteristics:

SeparationDef1.JPG
SeparationDef2.JPG

Finally, a possible relaxation of the given definitions is the difference between rates to be a positive number lower than a given slack EpsMq0.JPG, instead of equals to zero.

Sufficiency

We say the random variables R,A,Y.JPG satisfy sufficiency if the sensitive characteristics A.JPG are statistically independent to the target value Y.JPG given the prediction R.JPG, and we write YbotAbarR.JPG.

We can also express this notion with the following formula:

SufficiencyDef.JPG

This means that the probability of actually being in each of the groups is equal for two individuals with different sensitive characteristics given that they were predicted to belong to the same group.

Relationships between definitions

Finally, we sum up some of the main results that relate the three definitions given above:

Metrics[6]

Most statistical measures of fairness rely on different metrics, so we will start by defining them. When working with a binary classifier, both the predicted and the actual classes can take two values: positive and negative. Now let us start explaining the different possible relations between predicted and actual outcome:
Confusion matrix
  • True positive (TP): The case where both the predicted and the actual outcome are in the positive class.
  • True negative (TN): The case where both the predicted and the actual outcome are in the negative class.
  • False positive (FP): A case predicted to be in the positive class when the actual outcome is in the negative one.
  • False negative (FN): A case predicted to be in the negative class when the actual outcome is in the positive one.

This relations can be easily represented with a confusion matrix, a table which describes the accuracy of a classification model. In this matrix, columns and rows represent instances of the predicted and the actual cases, respectively.

By using this relations, we can define multiple metrics which can be later used to measure the fairness of an algorithm:

  • Positive predicted value (PPV): the fraction of positive cases which were correctly predicted out of all the positive predictions. It is usually referred to as precision, and represents the probability of a positive prediction to be right. It is given by the following formula:
PPV.JPG
  • False discovery rate (FDR): the fraction of positive predictions which were actually negative out of all the positive predictions. It represents the probability of a positive prediction to be wrong, and it is given by the following formula:
FDR.JPG
  • Negative predicted value (NPV): the fraction of negative cases which were correctly predicted out of all the negative predictions. It represents the probability of a negative prediction to be right, and it is given by the following formula:
NPV.JPG
  • False omission rate (FOR): the fraction of negative predictions which were actually positive out of all the negative predictions. It represents the probability of a negative prediction to be wrong, and it is given by the following formula:
FOR.JPG
  • True positive rate (TPR): the fraction of positive cases which were correctly predicted out of all the positive cases. It is usually referred to as sensitivity or recall, and it represents the probability of the positive subjects to be classified correctly as such. It is given by the formula:
TPR.JPG
  • False negative rate (FNR): the fraction of positive cases which were incorrectly predicted to be negative out of all the positive cases. It represents the probability of the positive subjects to be classified incorrectly as negative ones, and it is given by the formula:
FNR.JPG
  • True negative rate (TNR): the fraction of negative cases which were correctly predicted out of all the negative cases. It represents the probability of the negative subjects to be classified correctly as such, and it is given by the formula:
TNR.JPG
  • False positive rate (FPR): the fraction of negative cases which were incorrectly predicted to be positive out of all the negative cases. It represents the probability of the negative subjects to be classified incorrectly as positive ones, and it is given by the formula:
FPR.JPG

Other fairness criteria

Relationship between fairnes criteria as shown in Barocas et al.[5]

The following criteria can be understood as measures of the three definitions given on the first section, or a relaxation of them. In the table[5] to the right we can see the relationships between them.

To define this measures specifically, we will divide them into three big groups as done in Verma et al.[6]: definitions based on predicted outcome, on predicted and actual outcomes, and definitions based on predicted probabilities and actual outcome.

We will be working with a binary classifier and the folowing notation: S.JPG refers to the score given by the classifier, which is the probability of a certain subject to be in the positive or the negative class. R.JPG represents the final classification predicted by the algorithm, and its value is usually derived from S.JPG, for example will be positive when S.JPG is above a certain threshold. Y.JPG represents the actual outcome, that is, the real classification of the individual and, finally, A.JPG denotes the sensitive attributes of the subjects.

Definitions based on predicted outcome

The definitions in this section focus on a predicted outcome R.JPG for various distributions of subjects. They are the simplest and most intuitive notions of fairness.

  • Group fairness, also referred to as statistical parity, demographic parity, acceptance rate and benchmarking. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal probability of being assigned to the positive predicted class. This is, if the following formula is satisfied:
Group fairness.JPG
  • Conditional statistical parity. Basically consists in the definition above, but restricted only to a subset of the attributes. With mathematical notation this would be:
Definitions1.JPG

Definitions based on predicted and actual outcomes

This definitions not only consider de predicted outcome R.JPG but also compare it to the actual outcome Y.JPG.

  • Predictive parity, also referred to as outcome test. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal PPV. This is, if the following formula is satisfied:
Predictive parity.JPG
Mathematically, if a classifier has equal PPV for both groups, it will also have equal FDR, satisfying the formula:
Predictive parity2.JPG
  • False positive error rate balance, also referred to as predictive equality. A classifier satisfies this definition if the subjects in the protected and unprotected groups have aqual FPR. This is, if the following formula is satisfied:
Predictive equality.JPG
Mathematically, if a classifier has equal FPR for both groups, it will also have equal TNR, satisfying the formula:
Predictive equality2.JPG
  • False negative error rate balance, also referred to as equal opportunity. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal FNR. This is, if the following formula is satisfied:
Equal opportunity.JPG
Mathematically, if a classifier has equal FNR for both groups, ti will also have equal TPR, satisfying the formula:
Equal opportunity2.JPG
  • Equalized odds, also referred to as conditional procedure accuracy equality and disparate mistreatment. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal TPR and equal FPR, satisfying the formula:
Equalized odds.JPG
  • Conditional use accuracy equality. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal PPV and equal NPV, satisfying the formula:
Conditional.JPG
  • Overall accuracy equality. A classifier satisfies this definition if the subject in the protected and unprotected groups have equal prediction accuracy, that is, the probability of a subject from one class to be assigned to it. This is, if it satisfies the following formula:
Overall.JPG
  • Treatment equality. A classifier satisfies this definition if the subjects in the protected and unprotected groups have an equal ratio of FN and FP, satisfying the formula:
Treatment.JPG

Definitions based on predicted probabilities and actual outcome

These definitions are based in the actual outcome Y.JPG and the predicted probability score S.JPG.

  • Test-fairness, also known as calibration or matching conditional frequencies. A classifier satisfies this definition if individuals with the same predicted probability score S.JPG have the same probability to be classified in the positive class when they belong to either the protected or the unprotected group:
Definitions2.JPG
  • Well-calibration. It's an extension of the previous definition. It states that when individuals inside or outside the protected group have the same predicted probability score S.JPG they must have the same probability of being classified in the positive class, and this probability must be equal to S.JPG:
Definitions3.JPG
  • Balance for positive class. A classifier satisfies this definition if the subjects constituting the positive class from both protected and unprotected groups have equal average predicted probability score S.JPG. This means that the expected value of probability score for the protected and unprotected groups with positive actual outcome Y.JPG is the same, satisfying the formula:
Balance positive.JPG
  • Balance for negative class. A classifier satisfies this definition if the subjects constituting the negative class from both protected and unprotected groups have equal average predicted probability score S.JPG. This means that the expected value of probability score for the protected and unprotected groups with negative actual outcome Y.JPG is the same, satisfying the formula:
Balance negative.JPG

Algorithms

Fairness can be applied to machine learning algorithms in three different ways: preprocessing the data used in the algorithm, optimization during the training, or post-processing the answers of the algorithm.

Preprocessing

Usually, the classifier is not the only problem, the dataset is also biased. The discrimination of a dataset D.jpg with respect to the group framelesss can be defined as follows:

DatasetDisc.JPG

That is, an approximation to the difference between the probabilities of belonging in the positive class given that the subject has a protected characteristic different from framelesss and equal to framelesss.

Algorithms correcting bias at preprocessing remove information concerning variables in the dataset which can result in unfair decisions of the AI, while trying to alter just the bare minimum of this data. This is not as easy as just removing the sensitive variable, because other attributes can be related to the protected one.

A way to do this is by mapping each individual in the initial dataset into an intermediate representation in which its impossible to identify if it belongs to a particular protected group, while maintaining as much information as possible. Then, the new representation of the data is adjusted to get the maximum accuracy in the algorithm.

This way, individuals are mapped into a new multivariable representation where the probability of any member of a protected group to be mapped to a certain value in the new representation is the same as the probability of an individual which doesn’t belong to the protected group. Then, this representation is used to obtain the prediction for the individual, instead of the initial data. As the intermediate representation is constructed giving the same probability to individuals inside or outside the protected group, this attribute is hidden to the classificator.

An example is explained in Zemel et al.,[7] where a multinomial random variable is used as the intermediate representation. In the process, the system is encouraged to preserve all information except that which can lead to biased decisions, and to obtain a prediction as accurate as possible.
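
A rough sketch of this idea (assumptions only, not Zemel et al.'s exact formulation): an encoder maps each individual to a soft assignment over a small set of prototypes, and the training loss combines a group-parity term, a reconstruction term and a prediction term.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    encoder = nn.Sequential(nn.Linear(5, 8), nn.ReLU(), nn.Linear(8, 4), nn.Softmax(dim=1))
    decoder = nn.Linear(4, 5)   # tries to reconstruct the original features
    head = nn.Linear(4, 1)      # tries to predict the target from the representation

    def fair_representation_loss(x, y, a, w_parity=1.0, w_rec=0.1, w_pred=1.0):
        z = encoder(x)                                                  # soft prototype assignment
        parity = (z[a == 1].mean(0) - z[a == 0].mean(0)).abs().sum()    # both groups must look alike in z
        reconstruction = ((decoder(z) - x) ** 2).mean()                 # keep as much information as possible
        prediction = nn.functional.binary_cross_entropy_with_logits(head(z).squeeze(1), y)
        return w_parity * parity + w_rec * reconstruction + w_pred * prediction

    x = torch.randn(32, 5)
    y = torch.randint(0, 2, (32,)).float()
    a = torch.randint(0, 2, (32,))              # sensitive attribute
    fair_representation_loss(x, y, a).backward()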

On the one hand, this procedure has the advantage that the preprocessed data can be used for any machine learning task. Furthermore, the classifier does not need to be modified, as the correction is applied to the dataset before it is processed. On the other hand, the other methods tend to obtain better results in accuracy and fairness.[8]

Reweighing [9]

Reweighing is an example of a preprocessing algorithm. The idea is to assign a weight to each dataset point such that the weighted discrimination with respect to the designated group is 0.

If the dataset $D$ were unbiased, the sensitive variable $A$ and the target variable $Y$ would be statistically independent, and the probability of the joint distribution would be the product of the marginal probabilities, as follows:

$P_{exp}(A = a \wedge Y = +) = P(A = a) \cdot P(Y = +) = \dfrac{|\{X \in D \mid X(A) = a\}|}{|D|} \cdot \dfrac{|\{X \in D \mid X(Y) = +\}|}{|D|}$

In reality, however, the dataset is not unbiased and the variables are not statistically independent, so the observed probability is:

$P_{obs}(A = a \wedge Y = +) = \dfrac{|\{X \in D \mid X(A) = a,\ X(Y) = +\}|}{|D|}$

To compensate for the bias, lower weights are assigned to favored objects and higher weights to unfavored objects. For each $X \in D$ we get:

$W(X) = \dfrac{P_{exp}(A = X(A) \wedge Y = X(Y))}{P_{obs}(A = X(A) \wedge Y = X(Y))}$

Once each $X$ has an associated weight $W(X)$, we compute the weighted discrimination with respect to group $A = a$ as follows:

$disc_{A=a,W}(D) = \dfrac{\sum_{X \in D:\ X(A) \neq a,\ X(Y) = +} W(X)}{\sum_{X \in D:\ X(A) \neq a} W(X)} - \dfrac{\sum_{X \in D:\ X(A) = a,\ X(Y) = +} W(X)}{\sum_{X \in D:\ X(A) = a} W(X)}$

It can be shown that, after reweighing, this weighted discrimination is 0.
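
A minimal sketch of the weight computation $W(X) = P_{exp} / P_{obs}$, assuming the dataset is a pandas DataFrame with a sensitive column and a binary target column (column names and data are illustrative assumptions):

    import pandas as pd

    def reweigh(D, A, Y):
        """W(X) = P_exp(A=X(A), Y=X(Y)) / P_obs(A=X(A), Y=X(Y)) for every row X."""
        p_a = D[A].value_counts(normalize=True)        # P(A = a)
        p_y = D[Y].value_counts(normalize=True)        # P(Y = y)
        p_obs = D.groupby([A, Y]).size() / len(D)      # observed joint probability
        expected = D.apply(lambda r: p_a.loc[r[A]] * p_y.loc[r[Y]], axis=1)
        observed = D.apply(lambda r: p_obs.loc[(r[A], r[Y])], axis=1)
        return expected / observed

    D = pd.DataFrame({"gender": ["f", "f", "m", "m", "m", "m"],
                      "hired":  [0,   1,   1,   1,   0,   1]})
    D["weight"] = reweigh(D, "gender", "hired")        # weights compensate for the observed bias
    print(D)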

Optimization at training time

Another approach is to correct the bias at training time. This can be done by adding constraints to the optimization objective of the algorithm.[10] These constraints force the algorithm to improve fairness by keeping certain rates equal for the protected group and the rest of the individuals. For example, we can add to the objective of the algorithm the condition that the false positive rate be the same for individuals in the protected group and for those outside it.

The main measures used in this approach are the false positive rate, the false negative rate, and the overall misclassification rate. It is possible to add just one or several of these constraints to the objective of the algorithm. Note that the equality of false negative rates implies the equality of true positive rates, and therefore equality of opportunity. After adding the constraints, the problem may become intractable, so a relaxation of them may be needed.

This technique obtains good results in improving fairness while keeping high accuracy, and it lets the programmer choose which fairness measures to improve. However, each machine learning task may need a different method to be applied, and the code of the classifier needs to be modified, which is not always possible.[8]
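
As an illustration of this idea, the following sketch adds a differentiable penalty on the gap in false positive rates between two groups to a standard logistic loss. This is a hedged, simplified relaxation for illustration only, not the exact formulation of Zafar et al.;[10] all names and data are assumptions.

    import torch

    torch.manual_seed(0)

    def loss_with_fpr_constraint(logits, y, a, lam=1.0):
        """Cross-entropy plus a soft penalty on the false-positive-rate gap between groups."""
        bce = torch.nn.functional.binary_cross_entropy_with_logits(logits, y)
        p = torch.sigmoid(logits)
        fpr_0 = p[(y == 0) & (a == 0)].mean()    # soft FPR for group a = 0
        fpr_1 = p[(y == 0) & (a == 1)].mean()    # soft FPR for group a = 1
        return bce + lam * (fpr_0 - fpr_1).abs()

    # One gradient step of a linear classifier trained with the constrained objective.
    w = torch.zeros(3, requires_grad=True)
    x = torch.randn(64, 3)
    y = torch.randint(0, 2, (64,)).float()
    a = torch.randint(0, 2, (64,))
    loss_with_fpr_constraint(x @ w, y, a).backward()
    with torch.no_grad():
        w -= 0.1 * w.grad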

Adversarial debiasing [11] [12]

Figure: interaction between the predictor and the adversary, as shown by Joyce Xu.[12]

We train two classifiers at the same time through some gradient-based method (e.g. gradient descent). The first one, the predictor, tries to accomplish the task of predicting $Y$, the target variable, given $X$, the input, by modifying its weights $W$ to minimize some loss function $L_P(\hat{y}, y)$. The second one, the adversary, tries to accomplish the task of predicting $A$, the sensitive variable, given $\hat{Y}$, by modifying its weights $U$ to minimize some loss function $L_A(\hat{a}, a)$.

An important point here is that, in order for the gradients to propagate correctly, $\hat{Y}$ above must refer to the raw output of the classifier, not to the discrete prediction; for example, with an artificial neural network and a classification problem, $\hat{Y}$ could refer to the output of the softmax layer.

Then we update $U$ to minimize $L_A$ at each training step according to the gradient $\nabla_U L_A$, and we modify $W$ according to the expression:

$\nabla_W L_P - \operatorname{proj}_{\nabla_W L_A} \nabla_W L_P - \alpha \nabla_W L_A$

where $\alpha$ is a tunable hyperparameter that can vary at each time step.

Figure: graphical representation of the gradient vectors used in adversarial debiasing, as shown in Zhang et al.[11]

The intuitive idea is that we want the predictor to try to minimize $L_P$ (hence the term $\nabla_W L_P$) while, at the same time, maximizing $L_A$ (hence the term $-\alpha \nabla_W L_A$), so that the adversary fails at predicting the sensitive variable from $\hat{Y}$.

The term $\operatorname{proj}_{\nabla_W L_A} \nabla_W L_P$ prevents the predictor from moving in a direction that helps the adversary decrease its loss function.

It can be shown that training the predictor with this algorithm improves demographic parity with respect to training it without the adversary.
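
The update of $W$ described above can be sketched as follows, assuming the per-parameter gradients $\nabla_W L_P$ and $\nabla_W L_A$ have already been computed (for example with torch.autograd.grad); the function and variable names are illustrative assumptions.

    import torch

    def debiasing_update(grads_lp, grads_la, alpha=1.0, eps=1e-8):
        """Combine gradients as grad_W L_P - proj_{grad_W L_A}(grad_W L_P) - alpha * grad_W L_A."""
        combined = []
        for g_p, g_a in zip(grads_lp, grads_la):
            unit = g_a / (g_a.norm() + eps)        # direction of the adversary's gradient
            proj = (g_p * unit).sum() * unit       # projection of g_p onto that direction
            combined.append(g_p - proj - alpha * g_a)
        return combined

    # Example with random gradients for a single weight matrix of the predictor.
    g_p = [torch.randn(4, 3)]
    g_a = [torch.randn(4, 3)]
    print(debiasing_update(g_p, g_a)[0].shape)     # torch.Size([4, 3])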

Postprocessing

The final method tries to correct the results of a classifier to achieve fairness. In this method we have a classifier that returns a score for each individual, and we need to make a binary prediction for each of them. High scores are likely to receive a positive label, while low scores are likely to receive a negative one, but the threshold that determines which answers are positive can be adjusted depending on our needs. Note that variations in the threshold affect the trade-off between the true positive rate and the true negative rate.

If the score function is fair, in the sense that it is independent of the protected attribute, then any choice of the threshold will also be fair; however, classifiers of this kind tend to be biased, so a different threshold may be required for each protected group in order to achieve fairness. A way to do this is to plot the true positive rate against the false positive rate at various threshold settings (the ROC curve) and to look for thresholds at which these rates coincide for the protected group and for the rest of the individuals.[13]
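
A minimal sketch of choosing one threshold per group so that the groups reach (approximately) the same true positive rate; this illustrates the idea rather than reproducing the exact algorithm of Hardt et al.,[13] and all names and data below are assumptions.

    import numpy as np

    def threshold_for_tpr(scores, labels, target_tpr):
        """Largest threshold whose true positive rate still reaches target_tpr."""
        for t in np.unique(scores)[::-1]:               # from strict to lenient
            if (scores[labels == 1] >= t).mean() >= target_tpr:
                return t
        return np.min(scores)

    def per_group_thresholds(scores, labels, groups, target_tpr):
        return {g: threshold_for_tpr(scores[groups == g], labels[groups == g], target_tpr)
                for g in np.unique(groups)}

    scores = np.array([0.9, 0.8, 0.4, 0.35, 0.7, 0.6, 0.3, 0.2])
    labels = np.array([1, 1, 1, 0, 1, 0, 0, 0])
    groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
    print(per_group_thresholds(scores, labels, groups, target_tpr=1.0))   # {'a': 0.4, 'b': 0.7}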

The advantages of postprocessing include that the technique can be applied after any classifier without modifying it, and that it shows good performance in fairness measures. The cons are the need to access the protected attribute at test time and the lack of choice in the balance between accuracy and fairness.[8]

Reject Option based Classification [14]

Given a classifier, let $P(+ \mid X)$ be the probability computed by the classifier that the instance $X$ belongs to the positive class +. When $P(+ \mid X)$ is close to 1 or to 0, the instance $X$ is assigned with a high degree of certainty to class + or − respectively. However, when $P(+ \mid X)$ is closer to 0.5, the classification is more unclear.

We say that $X$ is a "rejected instance" if $\max(P(+ \mid X),\, 1 - P(+ \mid X)) \leq \theta$ for a certain $\theta$ such that $0.5 < \theta < 1$.

The algorithm of "ROC" consists on classifying the non rejected instances following the rule above and the rejected instances as follows: if the instance is an example of a deprived group ( Roc formula 3.JPG) then label it as positive, otherwise label it as negative.

We can optimize different measures of discrimination as functions of $\theta$ to find the optimal $\theta$ for each problem and avoid becoming discriminatory against the privileged group.
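
A minimal sketch of the rejection rule described above; the deprived-group label, the value of $\theta$ and the data are illustrative assumptions.

    import numpy as np

    def roc_predict(p_pos, groups, deprived="a", theta=0.6):
        """Standard rule when confident; inside the rejection band, favour the deprived group."""
        p_pos = np.asarray(p_pos, dtype=float)
        groups = np.asarray(groups)
        rejected = np.maximum(p_pos, 1 - p_pos) <= theta      # max(P(+|X), 1 - P(+|X)) <= theta
        labels = (p_pos >= 0.5).astype(int)                   # usual decision rule
        labels[rejected] = (groups[rejected] == deprived).astype(int)
        return labels

    print(roc_predict([0.55, 0.9, 0.45, 0.2], ["a", "a", "b", "b"]))   # [1 1 0 0]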

References

  1. Moritz Hardt, Berkeley. Retrieved 18 December 2019
  2. IBM AI Fairness 360. Retrieved 18 December 2019
  3. Fairness Flow, Facebook's bias detector. Retrieved 28 December 2019
  4. ML-Fairness gym. Retrieved 18 December 2019
  5. Solon Barocas; Moritz Hardt; Arvind Narayanan, Fairness and Machine Learning. Retrieved 15 December 2019
  6. Sahil Verma; Julia Rubin, Fairness Definitions Explained. Retrieved 15 December 2019
  7. Richard Zemel; Yu (Ledell) Wu; Kevin Swersky; Toniann Pitassi; Cynthia Dwork, Learning Fair Representations. Retrieved 1 December 2019
  8. Ziyuan Zhong, Tutorial on Fairness in Machine Learning. Retrieved 1 December 2019
  9. Faisal Kamiran; Toon Calders, Data preprocessing techniques for classification without discrimination. Retrieved 17 December 2019
  10. Muhammad Bilal Zafar; Isabel Valera; Manuel Gómez Rodríguez; Krishna P. Gummadi, Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment. Retrieved 1 December 2019
  11. Brian Hu Zhang; Blake Lemoine; Margaret Mitchell, Mitigating Unwanted Biases with Adversarial Learning. Retrieved 17 December 2019
  12. Joyce Xu, Algorithmic Solutions to Algorithmic Bias: A Technical Guide. Retrieved 17 December 2019
  13. Moritz Hardt; Eric Price; Nathan Srebro, Equality of Opportunity in Supervised Learning. Retrieved 1 December 2019
  14. Faisal Kamiran; Asim Karim; Xiangliang Zhang, Decision Theory for Discrimination-aware Classification. Retrieved 17 December 2019