A paper by H. V. Jagadish at the University of Michigan, Julia Stoyanovich at New York University, and Bill Howe at the University of Washington applies a framework for approaching data equity issues to the crucial data-driven decisions being made at an accelerated pace because of the COVID-19 pandemic.
The paper, "COVID-19 Brings Data Equity Challenges to the Fore," was published in Digital Government: Research and Practice in March 2021.
The Frameworks for Integrative Data Equity Systems (FIDES) is a socio-legal-technical data sharing and management system that is designed to surface, negotiate, and mitigate risks due to inequity and to support accountability. FIDES is a national institute funded by the National Science Foundation that builds on the work of Foundations of Responsible Data Science (FORDS). FIDES aims to provide a repository for institutions to publish heterogeneous, sensitive, and potentially biased data from public and private sources. The paper draws upon insights developed during a two-day workshop on FORDS and FIDES, which was held remotely on March 25-26, 2020.
In this paper, the authors focus on four data equity categories: increasing the visibility of underrepresented groups; facilitating linkage across datasets to ensure access to features such as race and income data that help to surface inequity; providing for equitable and participatory access to data and data products; and monitoring and mitigating unintended consequences for any groups affected by a system after deployment. These priorities require accounting for potential racial disparities in testing and for stigma associated with a positive diagnosis, as well as safeguarding against social inequities when repurposed datasets are used.
The authors provide this example of when outcome equity methods are needed: “Marion County in Ohio had the highest per capita COVID infection rate in the country at one point in Summer 2020. But the numbers were inflated by an outbreak at a prison, a severe problem nationwide that has more to do with systemic policies of mass incarceration than any particular mitigation strategy applied in Ohio specifically.”
Methods for addressing the above issues include adjusting for bias in source data, using additional datasets to fill in missing information, and automatically constructing "nutritional labels" that give succinct information about a dataset's fitness for use for a particular purpose or task.
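As a rough illustration of the nutritional-label idea, a label can be thought of as a small summary computed over a dataset before it is used. The Python sketch below is not the authors' method or any FIDES tool. The function name `nutritional_label`, the field names, and the toy records are all hypothetical, and a real label would likely report far more than group shares and missing values.

```python
from collections import Counter


def nutritional_label(rows, group_key, required_fields):
    """Summarize group representation and missing values for a list of dict records.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    total = len(rows)
    if total == 0:
        return {"records": 0, "representation": {}, "missing": {}}
    # Share of records per group; records with no group value are counted as "unknown".
    counts = Counter(row.get(group_key) or "unknown" for row in rows)
    representation = {group: n / total for group, n in counts.items()}
    # Fraction of records in which each required field is absent or empty.
    missing = {
        field: sum(1 for row in rows if row.get(field) in (None, "")) / total
        for field in required_fields
    }
    return {"records": total, "representation": representation, "missing": missing}


# Hypothetical toy records, not real data.
records = [
    {"race": "A", "income": 40000, "tested": True},
    {"race": "A", "income": None, "tested": True},
    {"race": "B", "income": 52000, "tested": False},
    {"race": None, "income": 31000, "tested": True},
]
label = nutritional_label(records, "race", ["income", "tested"])
```

A label like this would flag, before any analysis, that a quarter of the toy records lack a race value and a quarter lack income, which is the kind of gap that can hide an underrepresented group.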
The FIDES and FORDS workshop featured keynote presentations by Solon Barocas and H. V. Jagadish, and a panel with Ashley Casovan (AI Global), Stefaan Verhulst (GovLab), Jenny Yang (The Urban Institute), Robert Cheetham (Azavea), and Maggie Levenstein (UMich / ICPSR). A report from the workshop is also available.