When patients participate in health studies, their medical and genetic details are typically kept anonymous to protect them. A controversial Trump administration plan to limit the scientific studies used in policymaking could put that anonymity in jeopardy, a new study warns.
Later this year, the Environmental Protection Agency is expected to finalize a proposal that would preclude the use of independent environmental health studies in setting pollution controls, unless the researchers made public the raw data underlying the studies.
The EPA and its backers, including industries that are affected by pollution standards, contend that such a measure would deliver the strongest science for setting pollution standards. But thousands of scientists, environmentalists and public health advocates maintain that the proposal would profoundly undermine the use of independent science in protecting human health. Seminal environmental health studies often rely on confidential data, and without that research showing harm to human health, scientists and environmentalists say the EPA could push for looser pollution limits.
A new study published this week in the journal Environmental Health Perspectives argues that the proposed EPA rule change would increase the risk that confidential health data could be linked back to patients.
Currently, scientists must share analyzed data through the peer review process for research journals and with the EPA. The Trump administration proposal would require that the "raw data," which is much more detailed but still does not use people's names, be turned over to the government and others without a clear set of safeguards to protect people's privacy, such as those that currently exist in academia.
The new study shows how five essential kinds of data that are used in health studies (patients' location, medical data, genetic information, occupation and housing) could be used to re-identify individuals by cross-referencing them with public and commercial data sets, even after the studies anonymized the data by removing the names, birth dates and other overt identifiers. Re-identification occurs when anonymized data, stripped of identifiers such as names or addresses, can be linked back to one person or several people.
The Health Insurance Portability and Accountability Act (HIPAA), the main federal law governing the privacy of health information, does protect some data of participants in environmental health studies, but it is not adequate, said Julia Green Brody, the principal investigator of the analysis and executive director of the Silent Spring Institute, a Massachusetts-based nonprofit that does research on environmental chemicals and women's health. Other protections against such re-identification also exist now in academia and the EPA. But scientists, environmentalists and public health advocates fear those could be weakened or set aside if the EPA's controversial new data standard goes into effect.
An EPA spokeswoman said in an email that the agency's proposed "secret science" rule would not make personal data more vulnerable to re-identification, because it would "follow all laws" regarding data privacy and take into account public comment on the issue.
The rule was delayed after an outpouring of opposition from the academic community, the editors of major scientific journals, former EPA administrators and others. But EPA Administrator Andrew Wheeler has said he remains committed to enacting the plan. The agency's Science Advisory Board is meeting next week to discuss a draft recommendation that Wheeler address numerous issues, including privacy, before proceeding with the rule.
The new standard would apply to any research data and would be retroactive, affecting, for example, landmark studies that undergird the Clean Air Act and the Clean Water Act.
Environmental health data, by its nature, is much more reliant on the location, occupation and even housing of its participants than other kinds of medical research. Measurements of exposure to pollutants are usually taken at the level of a job, a home or an individual. At the same time, more consumer data is being collected about Americans by the devices and mobile phone apps they use, the authors said. For instance, more people carry smartphones and wear devices such as Fitbits that continuously gather health and location information.
The abundance of consumer data that's publicly available or can be purchased could be cross-referenced with anonymized study data to identify people, the researchers found.
Ways Participant Data Is Vulnerable
The analysis is based on a dozen studies that mostly began in the 1990s, some of which continue today. They were picked because of their importance to the environmental health field. The studies also contained data that could be vulnerable to being linked to publicly available or commercial datasets.
For example, nine of the 12 studies were conducted in particular geographic areas. "The ability to identify location improves the ease and likelihood of matching demographic information, such as gender, race, and age, to voter lists or commercial lists of residents," the study said.
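The kind of matching the study describes can be illustrated with a minimal sketch. It is not the researchers' method: the records are invented, and the function name `link_records` and the fields `zip_code`, `gender` and `birth_year` are assumptions chosen for illustration. The sketch treats a study record as re-identified only when exactly one entry in a public list shares the same combination of traits.

```python
from collections import defaultdict

# Hypothetical quasi-identifiers: traits left in "anonymized" study data
# that also appear in public or commercial lists.
QUASI_IDENTIFIERS = ("zip_code", "gender", "birth_year")


def link_records(study_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return {study_id: name} for study rows matching exactly one public row."""
    # Group the names in the public list by their combination of traits.
    index = defaultdict(list)
    for person in public_rows:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = {}
    for row in study_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # A unique match links the anonymous record to a single name.
        if len(candidates) == 1:
            matches[row["study_id"]] = candidates[0]
    return matches


# Entirely synthetic example data.
study = [
    {"study_id": "P-001", "zip_code": "00001", "gender": "F", "birth_year": 1970},
    {"study_id": "P-002", "zip_code": "00002", "gender": "M", "birth_year": 1985},
    {"study_id": "P-003", "zip_code": "00003", "gender": "F", "birth_year": 1990},
]
public_list = [
    {"name": "Resident A", "zip_code": "00001", "gender": "F", "birth_year": 1970},
    {"name": "Resident B", "zip_code": "00002", "gender": "M", "birth_year": 1985},
    {"name": "Resident C", "zip_code": "00002", "gender": "M", "birth_year": 1985},
]

linked = link_records(study, public_list)
print(linked)
```

In this toy data only the first participant is linked to a name. The second shares traits with two residents and the third matches no one, which suggests why narrowing the geographic area, and so shrinking the pool of candidates, makes a unique match more likely.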
Ten studies included in the analysis contained data about the subjects' occupations, including agricultural workers exposed to pesticides and workers involved in the Deepwater Horizon oil spill cleanup. Licensing lists of professionals, LinkedIn and professional society membership could be overlaid on anonymized health data, increasing the chances that the subjects could be identified, the researchers wrote. All 12 studies contained at least two of the five demographic or data criteria that the researchers used to probe the limits of participant anonymity. Through one analysis, "participants' region of residence could be inferred with 80-98% accuracy," the study found.
Under the current system used in academia, there are protections against the use of public data to re-identify participants in anonymized studies. Universities have institutional review boards created to safeguard the rights of human subjects who participate in research and to train the scientists who work with them. Anonymity is often essential to get people to share sensitive personal information in health studies, and the career consequences for researchers who violate review board standards are grave.
It's unclear how the EPA would protect patients' privacy if data from studies were shared much more broadly and then used to re-identify study subjects, using methods and datasets like those the study employed. It is also uncertain what penalties, if any, would exist for such actions.
Brody said researchers and the public would face a difficult choice if the proposal were to be finalized in its current form: They could turn data over to the EPA and risk the exposure of highly sensitive personal information about subjects, or they could decline to participate in the agency's rulemaking process, limiting the pool of independent research supporting new rules and potentially resulting in weakened regulation.
"What we're saying is that we don't have good methods for protecting data if they're going to be shared openly," Brody said.
What's at Risk for Patients?
Lots of people might be interested in re-identified study data, the study said. Employers and insurers could seek to re-identify data and use the information to discriminate against people or property exposed to certain hazards, the authors wrote. Corporations facing pollution lawsuits could use re-identified data to pressure litigants to drop the case.
"Loss of privacy from re-ID could result in stigma for individuals and communities; affect property values, insurance, employability and legal obligations; or reveal embarrassing or illegal activity," the authors wrote. "It could damage trust in research."
If such re-identification became commonplace, people might decline to participate in environmental health studies, limiting the scope and strength of necessary research.
"This is an important contribution to our understanding of this issue," said Gretchen Goldman, research director of the Center for Science and Democracy at the Union of Concerned Scientists, a scientific research and advocacy group. "It is alarmingly easy to identify personal information from participants when studying environmental health."
Brody said that if the EPA rule is finalized in its current form, it would "gravely undermine public health." She also noted that even without the rule, there were risks to data, and urged academic and other researchers to "not be casual" about the possibility of outside entities or hackers working to access data and re-identify participants. Her co-authors include researchers from Northeastern University, the Massachusetts Institute of Technology and Harvard University.
The analysis builds on work the same team conducted for a 2017 study that yielded troubling results. That study tasked teams of researchers with re-identifying participants in a chemical exposure project. The result: one of every six anonymous participants was ultimately identified by name using publicly available and commercial data.
Correction: This story has been edited to correct the role of an author on the journal article, the description of the Silent Spring Institute and the categories of data that are anonymized in health studies.