NeurIPS 2023 Datasets and Benchmarks (Oral)

Ethical Considerations for Responsible Data Curation

Practical recommendations for responsibly curating human-centric computer vision datasets for fairness and robustness evaluations, addressing privacy and bias concerns

Jerone Andrews, Sony AI
Dora Zhao, Sony AI
William Thong, Sony AI
Apostolos Modas, Sony AI
Orestis Papakyriakopoulos, Sony AI
Alice Xiang, Sony AI

Abstract

Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. HCCV datasets constructed through nonconsensual web scraping lack crucial metadata for comprehensive fairness and robustness evaluations.

Figure: Example issues related to problematic data curation practices

Current remedies are post hoc, lack persuasive justification for adoption, or fail to provide proper contextualization for appropriate application.

Figure: Example issues related to existing solutions that address privacy and bias concerns in data curation

Our research focuses on proactive, domain-specific recommendations for curating HCCV evaluation datasets, covering purpose, privacy and consent, and diversity, to address privacy and bias concerns. We adopt an ante hoc reflective perspective, drawing from current practices, guidelines, dataset withdrawals, and audits to inform our considerations and recommendations.

Figure: The guiding principles behind our considerations and recommendations

To guide data curators towards more ethical, albeit more resource-intensive, curation, we also provide a checklist.

Importantly, our proposals are not intended for the evaluation of HCCV systems that detect, predict, or label sensitive or objectionable attributes such as race, gender, sexual orientation, or disability.

Considerations and Recommendations

Purpose

In ML, significant emphasis has been placed on the acquisition and use of "general-purpose" datasets [1]. However, without a clearly defined task specified before data collection, it becomes challenging to effectively handle issues related to data composition, labeling, data collection methodologies, informed consent, and data protection assessments. We address conflicting dataset motivations and provide recommendations.


Consent and Privacy

Informed consent is crucial in research ethics involving humans [2, 3], ensuring participant safety, protection, and research integrity [4, 5]. Shaping data collection practices in various fields [missing reference], informed consent consists of three elements: information (i.e., the participant should have sufficient knowledge about the study to make their decision), comprehension (i.e., the information about the study should be conveyed in an understandable manner), and voluntariness (i.e., consent must be given free of coercion or undue influence). While consent is not the only legal basis for data processing, it is globally preferred for its legitimacy and ability to foster trust [5, 6]. We address concerns related to consent and privacy, and provide recommendations.
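As an illustrative aside (ours, not the paper's), the three elements could be recorded as structured metadata attached to each data subject, so that a curator can verify consent validity programmatically before including a record in an evaluation set. This is a minimal sketch; the schema and field names are hypothetical, not a prescribed standard:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    """Hypothetical per-subject consent metadata capturing the three
    elements of informed consent; the schema is illustrative only."""
    subject_id: str
    informed: bool       # information: sufficient knowledge about the study
    comprehended: bool   # comprehension: conveyed in an understandable manner
    voluntary: bool      # voluntariness: free of coercion or undue influence
    granted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    revoked: bool = False  # supports later withdrawal of consent

    def is_valid(self) -> bool:
        """Usable consent requires all three elements and no revocation."""
        return (self.informed and self.comprehended
                and self.voluntary and not self.revoked)
```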


Diversity

HCCV dataset creators widely acknowledge the significance of dataset diversity [7, 8, 9, 10, 11, 12, 13, 14, 15, 16], realism [17, 18, 11, 16, 8, 19], and difficulty [20, 8, 21, 22, 9, 23, 11, 12, 13, 15, 16, 19] to enhance fairness and robustness in real-world applications. Previous research has emphasized diversity across image subjects, environments, and instruments [24, 25, 26, 27], but there are many ethical complexities involved in specifying diversity criteria [28, 29, 30, 31]. We examine taxonomy challenges and offer recommendations.
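As one hedged illustration of operationalizing diversity criteria once a taxonomy has been chosen (this is our own sketch, not a metric prescribed by the paper, and the attribute names are hypothetical), a curator might audit intersectional subgroup coverage over self-reported metadata before finalizing an evaluation set:

```python
from collections import Counter
from itertools import product

def subgroup_coverage(records, attributes):
    """Count samples per intersectional subgroup defined by the given
    metadata attributes, and report subgroups with no samples at all."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    # Enumerate all combinations of observed attribute values so that
    # entirely absent subgroups are surfaced, not silently ignored.
    values = {a: sorted({r[a] for r in records}) for a in attributes}
    missing = [combo for combo in product(*(values[a] for a in attributes))
               if combo not in counts]
    return counts, missing

# Illustrative records with hypothetical self-reported attributes.
records = [
    {"age_group": "18-30", "environment": "indoor"},
    {"age_group": "18-30", "environment": "outdoor"},
    {"age_group": "60+",   "environment": "indoor"},
]
counts, missing = subgroup_coverage(records, ["age_group", "environment"])
print(counts)   # sizes of observed intersectional subgroups
print(missing)  # [('60+', 'outdoor')] is unrepresented
```

Note that such a count only flags gaps under a chosen taxonomy; it does not resolve the ethical question of which attributes and categories are appropriate to begin with.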


Concluding Remarks

Complementing established ethical review protocols, we have provided proactive, domain-specific recommendations for curating HCCV datasets for fairness and robustness evaluations. However, encouraging change in ethical practice could encounter resistance or slow adoption due to established norms [32], inertia [33], diffusion of responsibility [34], and liability concerns [29].

Figure: Example reasons why more ethical practices may encounter resistance

To foster acceptance, platforms like NeurIPS could adopt a registered reports format, pre-accepting dataset proposals to address the financial uncertainties associated with ethical practices. Moreover, forming data consortia could help smaller organizations and academic research groups overcome operational hurdles (e.g., the implementation and maintenance of consent management systems) through resource and knowledge pooling.
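As a rough illustration of why consent management is operationally demanding (this sketch is ours, with hypothetical field and function names, not part of the paper), even the simplest obligation of such a system, propagating a subject's withdrawal of consent into every dataset release, must be applied to all copies and derivatives of the data:

```python
def apply_revocations(dataset, revoked_ids):
    """Drop every record belonging to a subject who has withdrawn consent.
    In practice this must run against every copy and derivative of the
    data, which is what makes ongoing maintenance costly."""
    revoked = set(revoked_ids)
    return [record for record in dataset
            if record["subject_id"] not in revoked]

release = [{"subject_id": "s1", "path": "img_001.jpg"},
           {"subject_id": "s2", "path": "img_002.jpg"}]
print(apply_revocations(release, ["s2"]))  # keeps only s1's record
```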

Extending our recommendations to the curation of "democratizing" foundation model-sized training datasets [35, 36, 37, 38] poses an economic challenge. However, it is worth considering that "solutions which resist scale thinking are necessary to undo the social structures which lie at the heart of social inequality" [39]. Large-scale, nonconsensual datasets driven by scale thinking have included harmful and distressing content, including rape [missing reference], racist stereotypes [40], vulnerable persons [41], and derogatory taxonomies [42, 43, 44, 45]. Such content may further generate legal concerns [46]. We contend that these issues can be mitigated through the implementation of our recommendations.

Balancing resources between model development and data curation is value-laden, shaped by "social, political, and ethical values" [47]. While organizations readily invest significantly in model training [48, 49], compensation for data contributors often appears neglected [50, 51], disregarding that "most data represent or impact people" [52]. Remedial actions could be envisioned to bridge the gap between models developed with ethically curated data and those benefiting from expansive, nonconsensually crawled data. Reallocating research funds away from dominant data-hungry methods [47] would help to strike a balance between technological advancement and ethical imperatives.

However, the granularity and comprehensiveness of our diversity recommendations could be adapted beyond evaluation contexts, particularly when employing "fairness without demographics" training approaches [53, 54, 55, 56], which reduces financial costs. Nevertheless, the applicability of any proposed recommendation is intrinsically linked to the specific context [57]. Decisions should be guided by the social framework of a given application to ensure ethical and equitable data curation.

Just as the concepts of identity evolve, our recommendations must also evolve to ensure their ongoing relevance and sensitivity. Thus, we encourage dataset creators to tailor our recommendations to their context, fostering further discussions on responsible data curation.

References

  1. D. Raji, E. Denton, E. M. Bender, A. Hanna, and A. Paullada, “AI and the Everything in the Whole Wide World Benchmark,” in Advances in Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS D&B), 2021.
  2. L. P. Nijhawan et al., “Informed consent: Issues and challenges,” Journal of advanced pharmaceutical technology & research, vol. 4, no. 3, p. 134, 2013.
  3. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. Bethesda, MD: Superintendent of Documents, 1978.
  4. “The Nuremberg Code,” in Trials of War Criminals Before the Nuremberg Military Tribunals Under Control Council Law No. 10, vol. 2, pp. 181–182, 1949.
  5. E. Politou, E. Alepis, and C. Patsakis, “Forgetting personal data and revoking consent under the GDPR: Challenges and proposed solutions,” Journal of cybersecurity, vol. 4, no. 1, p. tyy001, 2018.
  6. L. Edwards, “Privacy, security and data protection in smart cities: A critical EU law perspective,” Eur. Data Prot. L. Rev., vol. 2, p. 28, 2016.
  7. K. Karkkainen and J. Joo, “Fairface: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation,” in IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 1548–1558.
  8. T.-Y. Lin et al., “Microsoft coco: Common objects in context,” in European Conference on Computer Vision (ECCV), 2014, pp. 740–755.
  9. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.
  10. T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4401–4410.
  11. W. Kay et al., “The kinetics human action video dataset,” arXiv preprint arXiv:1705.06950, 2017.
  12. M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, “2d human pose estimation: New benchmark and state of the art analysis,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 3686–3693.
  13. M. Cordts et al., “The cityscapes dataset for semantic urban scene understanding,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3213–3223.
  14. S. Sarkar, P. J. Phillips, Z. Liu, I. R. Vega, P. Grother, and K. W. Bowyer, “The humanid gait challenge problem: Data sets, performance, and analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, pp. 162–177, 2005.
  15. Y. Xiong, K. Zhu, D. Lin, and X. Tang, “Recognize complex events from static images by fusing deep channels,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1600–1609.
  16. S. Yang, P. Luo, C.-C. Loy, and X. Tang, “Wider face: A face detection benchmark,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5525–5533.
  17. G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” in Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition, 2008.
  18. O. Jesorsky, K. J. Kirchberg, and R. W. Frischholz, “Robust face detection using the hausdorff distance,” in Audio-and Video-Based Biometric Person Authentication: Third International Conference, AVBPA 2001 Halmstad, Sweden, June 6–8, 2001 Proceedings 3, 2001, pp. 90–95.
  19. A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 3354–3361.
  20. N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2005, vol. 1, pp. 886–893.
  21. A. Angelova, Y. Abu-Mostafa, and P. Perona, “Pruning training sets for learning of object categories,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 494–501.
  22. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” International journal of computer vision, vol. 88, no. 2, pp. 303–338, 2010.
  23. Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in IEEE International Conference on Computer Vision (ICCV), 2015, pp. 3730–3738.
  24. J. Buolamwini and T. Gebru, “Gender shades: Intersectional accuracy disparities in commercial gender classification,” in ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2018, pp. 77–91.
  25. L. A. Hendricks, K. Burns, K. Saenko, T. Darrell, and A. Rohrbach, “Women also snowboard: Overcoming bias in captioning models,” in European Conference on Computer Vision (ECCV), 2018, pp. 771–787.
  26. M. Mitchell et al., “Model cards for model reporting,” in ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2019, pp. 220–229.
  27. M. K. Scheuerman, A. Hanna, and E. Denton, “Do datasets have politics? Disciplinary values in computer vision dataset development,” Proceedings of the ACM on Human-Computer Interaction, vol. 5, no. CSCW2, pp. 1–37, 2021.
  28. M. K. Andrus, E. Spitzer, and A. Xiang, “Working to Address Algorithmic Bias? Don’t Overlook the Role of Demographic Data,” Partnership on AI, 2020.
  29. M. K. Andrus, E. Spitzer, J. Brown, and A. Xiang, “What We Can’t Measure, We Can’t Understand: Challenges to Demographic Data Procurement in the Pursuit of Fairness,” in ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2021, pp. 249–260.
  30. J. T. A. Andrews, P. Joniak, and A. Xiang, “A View From Somewhere: Human-Centric Face Representations,” in International Conference on Learning Representations (ICLR), 2023.
  31. D. Zhao, J. T. A. Andrews, and A. Xiang, “Men Also Do Laundry: Multi-Attribute Bias Amplification,” in International Conference on Machine Learning (ICML), 2023.
  32. J. Metcalf and K. Crawford, “Where are human subjects in big data research? The emerging ethics divide,” Big Data & Society, vol. 3, no. 1, p. 2053951716650211, 2016.
  33. A. Birhane, “Automating ambiguity: Challenges and pitfalls of artificial intelligence,” arXiv preprint arXiv:2206.04179, 2022.
  34. S. Hooker, “Moving beyond ‘algorithmic bias is a data problem,’” Patterns, vol. 2, no. 4, p. 100241, 2021.
  35. P. Goyal et al., “Vision models are more robust and fair when pretrained on uncurated images without supervision,” arXiv preprint arXiv:2202.08360, 2022.
  36. C. Schuhmann et al., “Laion-400m: Open dataset of clip-filtered 400 million image-text pairs,” arXiv preprint arXiv:2111.02114, 2021.
  37. C. Schuhmann et al., “Laion-5b: An open large-scale dataset for training next generation image-text models,” Advances in Neural Information Processing Systems (NeurIPS), vol. 35, pp. 25278–25294, 2022.
  38. S. Y. Gadre et al., “DataComp: In search of the next generation of multimodal datasets,” arXiv preprint arXiv:2304.14108, 2023.
  39. A. Hanna and T. M. Park, “Against scale: Provocations and resistances to scale thinking,” arXiv preprint arXiv:2010.08850, 2020.
  40. A. Hanna, E. Denton, R. Amironesei, A. Smart, and H. Nicole, “Lines of Sight,” Logic(s). https://logicmag.io/commons/lines-of-sight/, 2020.
  41. H. Han, A. K. Jain, F. Wang, S. Shan, and X. Chen, “Heterogeneous face attribute estimation: A deep multi-task learning approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 11, pp. 2597–2609, 2017.
  42. B. Koch, E. Denton, A. Hanna, and J. G. Foster, “Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research,” in Advances in Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS D&B), 2021.
  43. A. Birhane and V. U. Prabhu, “Large image datasets: A pyrrhic win for computer vision?,” in IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 1536–1546.
  44. M. Hanley, A. Khandelwal, H. Averbuch-Elor, N. Snavely, and H. Nissenbaum, “An Ethical Highlighter for People-Centric Dataset Creation,” Advances in Neural Information Processing Systems Workshop (NeurIPSW), 2020.
  45. K. Crawford and T. Paglen, “Excavating AI: The politics of images in machine learning training sets,” Ai & Society, vol. 36, no. 4, pp. 1105–1116, 2021.
  46. “OpenAI and Microsoft Sued for $3 Billion Over Alleged ChatGPT ‘Privacy Violations’,” Vice. https://www.vice.com/en/article/wxjxgx/openai-and-microsoft-sued-for-dollar3-billion-over-alleged-chatgpt-privacy-violations.
  47. B. Blili-Hamelin and L. Hancox-Li, “Making Intelligence: Ethical Values in IQ and ML Benchmarks,” in ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2023, pp. 271–284.
  48. E. Mostaque (Stability AI), “We actually used 256 A100s for this per the model card, 150k hours in total so at market price $600k,” Twitter. https://twitter.com/emostaque/status/1563870674111832066, 28-Aug-2022.
  49. J. Vincent, “‘An engine for the imagination’: the rise of AI image generators. An interview with Midjourney founder David Holz.” The Verge, 03-Aug-2022.
  50. J. Vincent, “Getty Images is suing the creators of AI art tool stable diffusion for scraping its content,” The Verge. https://www.theverge.com/2023/1/17/23558516/ai-art-copyright-stable-diffusion-getty-images-lawsuit, Jan-2023.
  51. J. Vincent, “AI art tools stable diffusion and Midjourney targeted with copyright lawsuit,” The Verge. https://www.theverge.com/2023/1/16/23557098/generative-ai-art-copyright-legal-lawsuit-stable-diffusion-midjourney-deviantart, Jan-2023.
  52. M. Zook et al., “Ten simple rules for responsible big data research,” PLoS computational biology, vol. 13, no. 3. Public Library of Science San Francisco, CA USA, p. e1005399, 2017.
  53. N. L. Martinez, M. A. Bertran, A. Papadaki, M. Rodrigues, and G. Sapiro, “Blind pareto fairness and subgroup robustness,” in International Conference on Machine Learning, 2021, pp. 7492–7501.
  54. P. Lahoti et al., “Fairness without demographics through adversarially reweighted learning,” Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 728–740, 2020.
  55. T. Hashimoto, M. Srivastava, H. Namkoong, and P. Liang, “Fairness without demographics in repeated loss minimization,” in International Conference on Machine Learning, 2018, pp. 1929–1938.
  56. J. Chai, T. Jang, and X. Wang, “Fairness without demographics through knowledge distillation,” Advances in Neural Information Processing Systems (NeurIPS), vol. 35, pp. 19152–19164, 2022.
  57. O. Papakyriakopoulos et al., “Augmented Datasheets for Speech Datasets and Ethical Decision-Making,” in ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2023, pp. 881–904.

Acknowledgments

This work was funded by Sony Research Inc.