Fairness and bias correction in Machine Learning

From FdIwiki ELP

Spanish version: Equidad y corrección de sesgos en Aprendizaje Automático

Work in progress.

Criteria in Classification problems[1]

In classification problems, an algorithm learns a function to predict a discrete characteristic Y, the target variable, from known characteristics X. We model A as a discrete random variable that encodes some characteristics contained, or implicitly encoded, in X that we consider sensitive characteristics (gender, ethnicity, sexuality, etc.). Finally, we denote by R the prediction of the classifier.

Independence

  • The variables <math>(R,A)</math> satisfy independence if <math> R \bot A </math>.

Independence requires the sensitive characteristics to be statistically independent of the prediction. Another way to express this is

<math> \mathbb{P}(R = r | A = a) = \mathbb{P}(R = r | A = b) \quad \forall r \in R \quad \forall a,b \in A </math>

This means that the probability of being classified by the algorithm into each of the groups is equal for two individuals with different sensitive characteristics.

Yet another equivalent expression uses the concept of mutual information between random variables, defined as <math> I(X,Y) = H(X) + H(Y) - H(X,Y) </math>, where <math>H</math> is the entropy of the random variable. Then <math> (R,A) </math> satisfy independence if <math> I(R,A) = 0 </math>.
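The entropy-based formulation above can be estimated directly from observed samples. The sketch below is a minimal illustration (the function names and the NumPy dependency are our own choices, not part of the cited text) of computing <math> I(R,A) = H(R) + H(A) - H(R,A) </math> empirically:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector; 0·log 0 is taken as 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(r, a):
    """Empirical I(R, A) = H(R) + H(A) - H(R, A) from two arrays of samples."""
    r, a = np.asarray(r), np.asarray(a)
    n = len(r)
    # joint distribution over observed (R, A) pairs
    _, joint_counts = np.unique(np.stack([r, a], axis=1), axis=0, return_counts=True)
    p_joint = joint_counts / n
    p_r = np.unique(r, return_counts=True)[1] / n
    p_a = np.unique(a, return_counts=True)[1] / n
    return float(entropy(p_r) + entropy(p_a) - entropy(p_joint))

# R takes each value equally often in both groups -> independence, I(R, A) = 0
print(mutual_information([1, 0, 1, 0], [0, 0, 1, 1]))  # 0.0
```

When R is fully determined by A the estimate rises to the entropy of R, so the relaxed criterion <math> I(R,A) \leq \epsilon </math> can be checked against this quantity.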

Possible relaxations include introducing a positive slack <math> \epsilon > 0 </math> and requiring

<math> \mathbb{P}(R = r | A = a) \geq \mathbb{P}(R = r | A = b) - \epsilon \quad \forall r \in R \quad \forall a,b \in A </math>

Another possible relaxation is to require <math> I(R,A) \leq \epsilon </math>.
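Both the exact and the relaxed criteria can be audited on observed predictions. The following sketch (illustrative, assuming NumPy; the function name is our own) computes the largest gap in <math> \mathbb{P}(R = r | A = a) </math> across sensitive groups, which is zero under exact independence and at most <math>\epsilon</math> under the relaxed criterion:

```python
import numpy as np

def independence_gap(predictions, sensitive):
    """Largest difference in P(R = r | A = a) across sensitive groups,
    taken over all predicted values r.  Zero means exact independence
    on this sample; the relaxed criterion accepts any gap <= epsilon."""
    predictions = np.asarray(predictions)
    sensitive = np.asarray(sensitive)
    gap = 0.0
    for r in np.unique(predictions):
        rates = [np.mean(predictions[sensitive == a] == r)
                 for a in np.unique(sensitive)]
        gap = max(gap, max(rates) - min(rates))
    return float(gap)

# P(R = 1) is 0.5 in both groups -> exact independence on this sample
print(independence_gap([1, 0, 1, 0], [0, 0, 1, 1]))  # 0.0
```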

Separation

  • The variables <math>(R,A,Y)</math> satisfy separation if <math> R \bot A | Y </math>.

Separation requires the sensitive characteristics to be statistically independent of the prediction given the target variable. Another way to express this is

<math> \mathbb{P}(R = r | Y = q, A = a) = \mathbb{P}(R = r | Y = q, A = b) \quad \forall r \in R \quad \forall q \in Y \quad \forall a,b \in A </math>

This means that the probability of being classified by the algorithm into each of the groups is equal for two individuals with different sensitive characteristics, given that they actually belong to the same group (have the same value of the target variable).

Another equivalent expression, in the case of a binary target variable, is that the true positive rate and the false positive rate are equal (and therefore the false negative rate and the true negative rate are equal) for every value of the sensitive characteristics:

<math> \mathbb{P}(R = 1 | Y = 1, A = a) = \mathbb{P}(R = 1 | Y = 1, A = b) \quad \forall a,b \in A </math>
<math> \mathbb{P}(R = 1 | Y = 0, A = a) = \mathbb{P}(R = 1 | Y = 0, A = b) \quad \forall a,b \in A </math>

One possible relaxation of this condition is to allow the difference between rates to be nonzero but bounded by a positive slack <math> \epsilon > 0 </math>.
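For a binary classifier, separation can be audited by comparing the per-group true positive and false positive rates described above. A minimal sketch, assuming NumPy (the function name is our own):

```python
import numpy as np

def separation_gaps(predictions, labels, sensitive):
    """Per-group TPR gap and FPR gap for a binary classifier.
    Separation holds exactly on the sample when both gaps are zero;
    the relaxed criterion accepts gaps bounded by a slack epsilon."""
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    sensitive = np.asarray(sensitive)
    tprs, fprs = [], []
    for a in np.unique(sensitive):
        in_group = sensitive == a
        # P(R = 1 | Y = 1, A = a) and P(R = 1 | Y = 0, A = a)
        tprs.append(np.mean(predictions[in_group & (labels == 1)] == 1))
        fprs.append(np.mean(predictions[in_group & (labels == 0)] == 1))
    return float(max(tprs) - min(tprs)), float(max(fprs) - min(fprs))

# both groups have TPR = 1, but group 1 has FPR = 0.5 versus 0 for group 0,
# so separation is violated through the false positive rate
print(separation_gaps([1, 0, 1, 0, 1, 0, 1, 1],
                      [1, 0, 1, 0, 1, 0, 1, 0],
                      [0, 0, 0, 0, 1, 1, 1, 1]))  # (0.0, 0.5)
```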

Sufficiency

  • The variables <math>(R,A,Y)</math> satisfy sufficiency if <math> Y \bot A | R </math>.

Sufficiency requires the sensitive characteristics to be statistically independent of the target variable given the prediction. Another way to express this is

<math> \mathbb{P}(Y = q | R = r, A = a) = \mathbb{P}(Y = q | R = r, A = b) \quad \forall r \in R \quad \forall q \in Y \quad \forall a,b \in A </math>

This means that the probability of actually belonging to each of the groups is equal for two individuals with different sensitive characteristics, given that they have been predicted to belong to the same group.
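For a binary target, sufficiency can be audited by comparing <math> \mathbb{P}(Y = 1 | R = r, A = a) </math> across groups, which amounts to checking that the classifier is calibrated by group. An illustrative sketch assuming NumPy (the function name is our own):

```python
import numpy as np

def sufficiency_gap(predictions, labels, sensitive):
    """Largest difference across groups in P(Y = 1 | R = r, A = a),
    taken over all predicted values r.  Zero means sufficiency holds
    exactly on this sample."""
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    sensitive = np.asarray(sensitive)
    gap = 0.0
    for r in np.unique(predictions):
        rates = [np.mean(labels[(sensitive == a) & (predictions == r)] == 1)
                 for a in np.unique(sensitive)]
        gap = max(gap, max(rates) - min(rates))
    return float(gap)

# P(Y = 1 | R = 1) is 0.5 in both groups and P(Y = 1 | R = 0) is 0 in both,
# so sufficiency holds exactly on this sample
print(sufficiency_gap([1, 1, 0, 0, 1, 1, 0, 0],
                      [1, 0, 0, 0, 1, 0, 0, 0],
                      [0, 0, 0, 0, 1, 1, 1, 1]))  # 0.0
```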

Relationships between definitions

Here we sum up some of the main results that relate the definitions from this section:

Metrics[2]

Most statistical measures of fairness rely on different metrics, so we will start by defining them. When working with a binary classifier, both the predicted and the actual classes can take two values: positive and negative. Let us begin by explaining the possible relations between the predicted and the actual outcome:
Confusion matrix
  • True positive (TP): The case where both the predicted and the actual outcome are in the positive class.
  • True negative (TN): The case where both the predicted and the actual outcome are in the negative class.
  • False positive (FP): A case predicted to be in the positive class when the actual outcome is in the negative one.
  • False negative (FN): A case predicted to be in the negative class when the actual outcome is in the positive one.

These relations can be easily represented with a confusion matrix, a table that describes the accuracy of a classification model. In this matrix, rows and columns represent instances of the predicted and the actual cases, respectively.

By using these relations, we can define multiple metrics which can later be used to measure the fairness of an algorithm:

  • Positive predictive value (PPV): the fraction of positive cases which were correctly predicted out of all the positive predictions. It is usually referred to as precision, and represents the probability that a positive prediction is right. It is given by the following formula:
<math> PPV = \frac{TP}{TP + FP} </math>
  • False discovery rate (FDR): the fraction of positive predictions which were actually negative out of all the positive predictions. It represents the probability that a positive prediction is wrong, and it is given by the following formula:
<math> FDR = \frac{FP}{TP + FP} </math>
  • Negative predictive value (NPV): the fraction of negative cases which were correctly predicted out of all the negative predictions. It represents the probability that a negative prediction is right, and it is given by the following formula:
<math> NPV = \frac{TN}{TN + FN} </math>
  • False omission rate (FOR): the fraction of negative predictions which were actually positive out of all the negative predictions. It represents the probability that a negative prediction is wrong, and it is given by the following formula:
<math> FOR = \frac{FN}{TN + FN} </math>
  • True positive rate (TPR): the fraction of positive cases which were correctly predicted out of all the positive cases. It is usually referred to as sensitivity or recall, and it represents the probability that positive subjects are classified correctly as such. It is given by the formula:
<math> TPR = \frac{TP}{TP + FN} </math>
  • False negative rate (FNR): the fraction of positive cases which were incorrectly predicted to be negative out of all the positive cases. It represents the probability that positive subjects are classified incorrectly as negative, and it is given by the formula:
<math> FNR = \frac{FN}{TP + FN} </math>
  • True negative rate (TNR): the fraction of negative cases which were correctly predicted out of all the negative cases. It represents the probability that negative subjects are classified correctly as such, and it is given by the formula:
<math> TNR = \frac{TN}{TN + FP} </math>
  • False positive rate (FPR): the fraction of negative cases which were incorrectly predicted to be positive out of all the negative cases. It represents the probability that negative subjects are classified incorrectly as positive, and it is given by the formula:
<math> FPR = \frac{FP}{TN + FP} </math>
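All eight rates follow directly from the four confusion-matrix counts, so they can be computed in a few lines. A minimal, illustrative sketch for binary (0/1) predictions and labels:

```python
def confusion_metrics(predictions, labels):
    """Compute the eight rates defined above from binary predictions and labels."""
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    return {
        "PPV": tp / (tp + fp), "FDR": fp / (tp + fp),
        "NPV": tn / (tn + fn), "FOR": fn / (tn + fn),
        "TPR": tp / (tp + fn), "FNR": fn / (tp + fn),
        "TNR": tn / (tn + fp), "FPR": fp / (tn + fp),
    }

# one TP, one FP, one TN, one FN -> every rate equals 0.5
print(confusion_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

Note that the rates come in complementary pairs that sum to one (PPV and FDR, NPV and FOR, TPR and FNR, TNR and FPR), which is a quick sanity check on any implementation.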

Other definitions of fairness

The following criteria can be understood as measures of the three definitions given in the first section, or relaxations of them. In Figure 1 we can see the relationships between them.

Now let us define all these measures specifically:


References

  1. Solon Barocas; Moritz Hardt; Arvind Narayanan, Fairness and Machine Learning, http://www.fairmlbook.org, 2019.
  2. Sahil Verma; Julia Rubin, Fairness Definitions Explained, (IEEE/ACM International Workshop on Software Fairness, 2018).