# Men Also Do Laundry: Multi-Attribute Bias Amplification (ICML 2023)

Interpretable metrics for measuring bias amplification from multiple attributes.

## Background and related work

As computer vision systems become more widely deployed, there is increasing concern from both the research
community and the public that these systems are not only reproducing but *amplifying* harmful social biases. The
phenomenon of *bias amplification*, which is the focus of our work, refers to models amplifying inherent
training set biases at test time [1].

There are two main approaches to quantifying bias amplification in computer vision models:

- *Leakage-based metrics* [2, 3], measuring the change in a classifier's ability to predict group membership from the training data to predictions.
- *Co-occurrence-based metrics* [1, 4], measuring the change in the ratio of a group and a single attribute from the training data to predictions.

Existing metrics [1, 4] measure bias amplification with respect to single annotated attributes (e.g., `computer`). However, large-scale visual datasets often have multiple annotated attributes per image. For example, in the COCO dataset [5], 78.8% of training images are associated with more than a single attribute (i.e., object).

More importantly, considering multiple attributes can reveal additional nuances not present when considering only single attributes. In the imSitu dataset [6], individually the verb `unloading` and location `indoors` are skewed `male`. However, when considering {`unloading`, `indoors`} in conjunction, the dataset is actually skewed `female`.
Significantly, men tend to be pictured unloading *packages* `outdoors` whereas women are pictured `unloading`
*laundry* or *dishes* `indoors`. Even when men are pictured `indoors`, they are `unloading` *boxes* or *equipment* as opposed to laundry or dishes. Models can similarly leverage correlations between a group and either single or multiple attributes simultaneously.

## Key takeaways

We propose two multi-attribute bias amplification metrics that evaluate bias arising from both single and multiple attributes.

- We are the first to study multi-attribute bias amplification, highlighting that models can leverage correlations between a demographic group and multiple attributes simultaneously, suggesting that current *single-attribute metrics underreport the amount of bias being amplified*.
- When comparing the performance of multi-label classifiers trained on the COCO, imSitu, and CelebA [7] datasets, we empirically demonstrate that, on average, *bias amplification from multiple attributes is greater than that from single attributes*.
- While prior works have demonstrated that models can learn to exploit different spurious correlations if one is mitigated [8, 9], we are the first to demonstrate that *bias mitigation methods* [1, 2, 10, 11] *for single-attribute bias can inadvertently increase multi-attribute bias*.

## Multi-attribute bias amplification metrics

We denote by $\mathcal{G}=\{g_1, \dots, g_t\}$ and ${\mathcal{A}=\{a_1,\dots,a_n\}}$ a set of $t$ group membership labels and a set of $n$ attributes, respectively. Let ${\mathcal{M}=\{m_1,\dots,m_{\ell}\}}$ denote a set of $\ell$ sets, containing all possible combinations of attributes, where $m_i$ is a set of attributes and $\lvert m_i \rvert \in \{1,\dots, n\}$. Note $m\in\mathcal{M}$ if and only if $\text{co-occur}(m, g) \geq 1$ in both the ground-truth training set and test set.

We extend Zhao et al.'s [1] bias score to a multi-attribute setting such that the bias score of a set of attributes $m\in\mathcal{M}$ with respect to group $g\in\mathcal{G}$ is defined as: $\text{bias}_{\text{train}}(m, g) = \dfrac{\text{co-occur}(m, g)}{\displaystyle\sum_{g^\prime \in \mathcal{G}} \text{co-occur}(m, g^\prime)},$ where $\text{co-occur}(m, g)$ denotes the number of times $m$ and $g$ co-occur in the training set.
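As a minimal sketch of these definitions, the snippet below counts co-occurrences of attribute combinations with groups and computes the bias score. The data format (a list of `(group, attribute_set)` pairs) and all names are assumptions made for illustration, not the paper's code; the additional filtering of $\mathcal{M}$ to combinations appearing in both the training and test sets is omitted here.

```python
from collections import Counter
from itertools import combinations

def cooccur_counts(samples):
    """Count co-occurrences of every attribute combination m with each group g.

    `samples` is a list of (group, attribute_set) pairs, e.g.
    ("female", {"unloading", "indoors"}). In practice one would restrict
    the combinations enumerated (the power set grows exponentially).
    """
    counts = Counter()
    for group, attrs in samples:
        attrs = sorted(attrs)  # canonical order so {a, b} and {b, a} share a key
        for k in range(1, len(attrs) + 1):
            for m in combinations(attrs, k):
                counts[(m, group)] += 1
    return counts

def bias_score(counts, m, g, groups):
    """bias(m, g) = co-occur(m, g) / sum over g' of co-occur(m, g')."""
    total = sum(counts[(m, g2)] for g2 in groups)
    return counts[(m, g)] / total if total else 0.0
```

Computing the score over training-set counts gives $\text{bias}_{\text{train}}$; computing it over test-set predictions gives the prediction-side score used below.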

**Undirected multi-attribute bias amplification metric**

Our proposed *undirected* multi-attribute bias amplification metric extends Zhao et al.'s [1] single-attribute metric to measure bias amplification from multiple attributes:
$\text{Multi}_\text{MALS} = \left( X,\ \text{variance}(\Delta_{gm}) \right),$
where
$X = \frac{1}{\lvert \mathcal{M} \rvert}\sum_{g\in \mathcal{G}} \sum_{m\in \mathcal{M}} \left\lvert\Delta_{gm}\right\rvert$
and
$\begin{aligned}
\Delta_{gm} &= \mathbf{1}\left[ \text{bias}_{\text{train}}(m, g) > \lvert
\mathcal{G}\rvert^{-1} \right] \cdot \\ &\left( \text{bias}_{\text{pred}}(m, g) - \text{bias}_{\text{train}}(m, g)\right).
\end{aligned}$
Here $\mathbf{1}[\cdot]$ and $\text{bias}_{\text{pred}}(m, g)$ denote an indicator function and the bias score computed over the test-set predictions of $m$ and $g$, respectively.

$\text{Multi}_\text{MALS}$ measures both the mean and variance over the change in bias score from the training set ground truths to test set predictions. By definition, $\text{Multi}_\text{MALS}$ only captures group membership labels that are positively correlated with a set of attributes, i.e., due to the constraint that $\text{bias}_{\text{train}}(m,g) > \lvert \mathcal{G}\rvert^{-1}.$
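The definition above can be sketched in a few lines. The dictionary-based inputs and function name are assumptions for this illustration; `bias_train` and `bias_pred` map each `(m, g)` pair to its bias score on the training ground truths and test-set predictions, respectively.

```python
import statistics

def multi_mals(bias_train, bias_pred, groups):
    """Undirected multi-attribute bias amplification (illustrative sketch).

    Returns (X, variance) over the per-pair changes in bias score, where
    the indicator zeroes out pairs not positively correlated in training,
    i.e., those with bias_train(m, g) <= 1/|G|.
    """
    M = {m for (m, _) in bias_train}  # the attribute sets being scored
    deltas = []
    for m in M:
        for g in groups:
            if bias_train[(m, g)] > 1.0 / len(groups):
                deltas.append(bias_pred[(m, g)] - bias_train[(m, g)])
            else:
                deltas.append(0.0)  # indicator is zero for this pair
    X = sum(abs(d) for d in deltas) / len(M)
    return X, statistics.pvariance(deltas)
```

Whether the reported variance is taken over all pairs or only the indicated ones is a design choice; this sketch includes the zeroed-out pairs.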

**Directional multi-attribute bias amplification metric**

Let $\hat{m}$ and $\hat{g}$ denote a model's predictions for the attribute set, $m$, and group membership, $g$, respectively. Our proposed *directional* multi-attribute bias amplification metric extends Wang and Russakovsky's [4] single-attribute metric to measure bias amplification from multiple attributes:
$\text{Multi}_{\rightarrow} = \left( X,\ \text{variance}(\Delta_{gm}) \right),$
where
$\begin{aligned}
X &=\frac{1}{\lvert \mathcal{G} \rvert \lvert \mathcal{M} \rvert} \sum_{g\in \mathcal{G}}\sum_{m\in \mathcal{M}} y_{gm}\left\lvert\Delta_{gm}\right\rvert+(1-y_{gm})\left\lvert-\Delta_{gm}\right\rvert,
\end{aligned}$
$\begin{aligned}
y_{gm}=\mathbf{1} [ &P_{\text{train}}(g=1, m=1) > P_{\text{train}}(g=1) P_{\text{train}}(m=1)],
\end{aligned}$
and
$\begin{aligned}
\Delta_{gm} &= \begin{cases}
P_{\text{test}}(\hat{m}=1 \vert g=1) - P_{\text{train}}(m=1 \vert g=1) \\ \text{if measuring }G\rightarrow M\\
P_{\text{test}}(\hat{g}=1 \vert m=1) - P_{\text{train}}(g=1 \vert m=1) \\ \text{if measuring }M\rightarrow G
\end{cases}
\end{aligned}$
Unlike $\text{Multi}_\text{MALS}$, $\text{Multi}_{\rightarrow}$ captures both positive and negative
correlations, i.e., $\text{Multi}_{\rightarrow}$ iterates over all $g\in\mathcal{G}$ regardless of whether
$\text{bias}_{\text{train}}(m, g) > \lvert \mathcal{G}\rvert^{-1}$. Moreover, $\text{Multi}_{\rightarrow}$ takes
into account the base rates for group membership and disentangles bias amplification arising from the group
influencing the attribute(s) prediction ($\text{Multi}_{G \rightarrow M}$), as well as bias amplification from
the attribute(s) influencing the group prediction ($\text{Multi}_{M \rightarrow G}$).
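A minimal sketch of the directional metric follows. The input format is an assumption made for illustration: each `(g, m)` pair maps to a dict of the probabilities the metric needs, with key names invented for this sketch rather than taken from the paper's code.

```python
import statistics

def multi_arrow(train_stats, test_stats, direction="G->M"):
    """Directional multi-attribute bias amplification (illustrative sketch).

    `train_stats` / `test_stats` map each (g, m) pair to a dict with:
      p_gm      = P(g=1, m=1)            (train only)
      p_g, p_m  = P(g=1), P(m=1)         (train only)
      m_given_g = P(m=1 | g=1)  or  g_given_m = P(g=1 | m=1)
    Iterating over all |G|*|M| pairs gives the 1/(|G||M|) normalization.
    """
    total, deltas = 0.0, []
    for key, tr in train_stats.items():
        te = test_stats[key]
        # y_gm: is (g, m) positively correlated in training?
        y = 1.0 if tr["p_gm"] > tr["p_g"] * tr["p_m"] else 0.0
        if direction == "G->M":
            delta = te["m_given_g"] - tr["m_given_g"]
        else:  # "M->G"
            delta = te["g_given_m"] - tr["g_given_m"]
        # |delta| and |-delta| coincide, so both branches contribute |delta|.
        total += y * abs(delta) + (1 - y) * abs(-delta)
        deltas.append(delta)
    X = total / len(train_stats)
    return X, statistics.pvariance(deltas)
```

Passing `direction="G->M"` measures the group influencing the attribute prediction ($\text{Multi}_{G \rightarrow M}$); `"M->G"` measures the reverse.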

## Comparison with existing metrics

Our proposed metrics have three advantages over existing single-attribute co-occurrence-based metrics.

**(Advantage 1) Our metrics account for co-occurrences with multiple attributes**

If a model, for example, learns that the combination of $a_1$ and $a_2$, i.e., $m=\{a_1, a_2\}\in\mathcal{M}$, is correlated with $g\in\mathcal{G}$, it can exploit this correlation, potentially leading to bias amplification. By iterating over all $m \in \mathcal{M},$ our proposed metrics account for amplification from both single and multiple attributes, capturing sets of attributes that exhibit amplification but are missed by existing metrics.

**(Advantage 2) Negative and positive values do not cancel each other out**

Existing metrics calculate bias amplification by aggregating over the difference in bias scores for each attribute individually. Suppose there is a dataset with two annotated attributes $a_1$ and $a_2$. It is possible that ${\Delta_{ga_1} \approx -\Delta_{ga_2}}$. Since our metrics use absolute values, we ensure that positive and negative bias amplifications per attribute do not cancel each other out.
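A toy numeric example makes the cancellation concrete (the values here are invented for illustration): two per-attribute changes of equal magnitude and opposite sign average to zero under signed aggregation, while absolute values keep both visible.

```python
# Two attributes whose bias-score changes are equal and opposite.
deltas = {"a1": 0.3, "a2": -0.3}

# Signed aggregation (no absolute values) reports no amplification at all.
signed = sum(deltas.values()) / len(deltas)

# Aggregating absolute values keeps both shifts visible.
absolute = sum(abs(d) for d in deltas.values()) / len(deltas)
```

Here `signed` is 0.0 even though both attributes shifted by 0.3, whereas `absolute` reports 0.3.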

**(Advantage 3) Our metrics are more interpretable**

There is a lack of intuition as to the “ideal” bias amplification value. One interpretation is that smaller values are more desirable. This becomes less clear when values are negative, as occurs in several bias mitigation works [10, 12]. Negative bias amplification indicates that bias in the predictions runs in the opposite direction from that in the training set. However, this is not always ideal. First, there often exists a trade-off between performance and smaller bias amplification values. Second, high-magnitude negative bias amplification may lead to erasure of certain groups. For example, in imSitu, $\text{bias}_{\text{train}}(\{\text{typing}\}, \text{female}) = 0.52.$ Negative bias amplification signifies the model underpredicts $(\{\text{typing}\}, \text{female})$, which could reinforce negative gender stereotypes [1].

Instead, we may want to minimize the distance between the bias amplification value and zero. This interpretation offers the advantage that large negative values are also undesirable. However, a potential dilemma occurs when interpreting two values with the same magnitude but opposite signs, which is a value-laden decision that depends on the system's context. Additionally, under this interpretation, Advantage 2 becomes more pressing, since cancellation between positive and negative terms makes models appear less biased than they are in practice.

Our proposed metrics are easy to interpret. Since we use absolute differences, the ideal value is unambiguously zero. Further, reporting variance provides intuition as to whether amplification is uniform across all attribute-group pairs or if particular pairs are more amplified.

**References**

1. J. Zhao, T. Wang, M. Yatskar, V. Ordonez, and K.-W. Chang, “Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints,” in *EMNLP*, 2017.
2. T. Wang, J. Zhao, M. Yatskar, K.-W. Chang, and V. Ordonez, “Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations,” in *ICCV*, 2019.
3. Y. Hirota, Y. Nakashima, and N. Garcia, “Quantifying Societal Bias Amplification in Image Captioning,” in *CVPR*, 2022.
4. A. Wang and O. Russakovsky, “Directional Bias Amplification,” in *ICML*, 2021.
5. T.-Y. Lin *et al.*, “Microsoft COCO: Common Objects in Context,” in *ECCV*, 2014.
6. M. Yatskar, L. Zettlemoyer, and A. Farhadi, “Situation Recognition: Visual Semantic Role Labeling for Image Understanding,” in *CVPR*, 2016.
7. Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep Learning Face Attributes in the Wild,” in *ICCV*, 2015.
8. Z. Li *et al.*, “A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others,” in *CVPR*, 2023.
9. Z. Li, A. Hoogs, and C. Xu, “Discover and Mitigate Unknown Biases with Debiasing Alternate Networks,” in *ECCV*, 2022.
10. Z. Wang *et al.*, “Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation,” in *CVPR*, 2020.
11. S. Agarwal, S. Muku, S. Anand, and C. Arora, “Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To Reduce Model Bias,” in *WACV*, 2022.
12. V. V. Ramaswamy, S. S. Y. Kim, and O. Russakovsky, “Fair Attribute Classification through Latent Space De-biasing,” in *CVPR*, 2021.

**Acknowledgments**

This work was funded by Sony Research Inc. We thank William Thong and Julienne LaChance for their helpful comments and suggestions.