MoDe: Controlling Classifier Bias

Get the code here.

MoDe

Moment Decorrelation (MoDe) is a tool that can enforce decorrelation between some nuisance parameter (or protected attribute in ML fairness lingo) and the response of some model with gradient-based optimization (a neural network for example.) It can force trained models to have the same response across different values of the protected attribute but it can also go beyond simple decorrelation. For example, MoDe can constrain the response function to be linear or quadratic in the protected attribute. This can increase performance tremendously if the independence constraint is not necessary but instead one only cares about the “smoothness” of the dependence of the response on the protected attribute (which is a weaker constraint).

For more details please see our article at 2010.09745. (Please use INSPIRE-HEP for citations.)

An implementation is available in each of TensorFlow and PyTorch. The notebooks in examples/ illustrate how MoDe is used and show how one can obtain different response functions on a toy example and on a W-tagging dataset.

Installation

pip install modeloss

The PyTorch implementation requires PyTorch 1.6.0 or newer. The TensorFlow implementation requires TensorFlow 2.2.0 or newer.

Usage

For PyTorch:

from modeloss.pytorch import MoDeLoss

flatness_loss = MoDeLoss(order=0)
loss = lambda pred,target,m,weights: lambd * flatness_loss(pred,target,m,weights)+\
                                            classification_loss(pred,target,weights)

For TensorFlow, replace modeloss.pytorch above with modeloss.tf.

Example

This is a toy example (located in examples/ToyExampleTF.ipynb) in which the signal (samples with label 1) is localized near the value 0.2 of the protected attribute m. While backgrounds (samples with label 0) are uniform in this feature, we note that this bias is introduced into any naive classifer. Indeed, we see that an unconstrained classifier (in the sense that it has no additional fairness regularization) has a large false positive rate for backgrounds near m = 0.2. Here we show how different MoDe regularizations (MoDe[n]) mitigate this bias by flattening the false positive rate as a function of the protected attribute m into its n’th legendre decomposition (where n is the highest moment allowed; 0 is flat, 1 is linear, etc) .

toy_example

Left: The false positive rate versus mass (m) for various models at signal efficiencies (or true positive rates (TPR)) ε = 80, 50, 20% (each set of 3 identically colored and stylized lines correspond to the same model but with selection thresholds chosen to achieve the 3 desired TPRs). The bottom panel shows that MoDe[1] and MoDe[2] completely overlap with the m-agnostic model for this simple example, which is expected because the optimal classifier here has linear dependence on mass (see paper). Right: ROC curves for MoDe[0], MoDe[1], and MoDe[2] compared to the m-agnostic model and a model with unconstrained mass dependence. As in the left panel, we see that MoDe[1], MoDe[2], and the m-agnostic ROC curves are nearly identical because the optimal classifier has linear mass dependence in this simple example. For more details see 2010.09745.

Related