A Project of The Annenberg Public Policy Center

Debate Over EPA’s ‘Transparency’ Rule


The Environmental Protection Agency proposed a rule to only use scientific studies with “publicly available” data when it develops regulations. This has sparked a debate in Congress on whether the proposal would prevent the EPA from considering studies that analyze private health information, including those that underpin air pollution standards.

During congressional hearings on April 26, critics of the rule claimed it would force the agency to exclude important studies because releasing data publicly would violate confidentiality agreements between study participants and researchers. Proponents maintained it wouldn’t exclude important studies because confidential information can be redacted.

Who’s right? We find fault with both characterizations of the proposed rule.

Studies that use confidential health information might still be considered by the EPA under the new rule — but not because private data can simply be redacted. Sometimes it can’t, including in the case of a 1993 Harvard study used to craft air pollution standards — a study cited by critics to support their argument.

Still, the rule includes a provision that would allow the EPA administrator to exempt regulations if releasing study data publicly conflicts with protecting privacy. The rule also allows for alternatives to full-on public release in cases where the data include confidential information.

Here we detail the EPA proposal and what was said about it at the April 26 hearings, and we use the 1993 Harvard study on air quality to explain why protecting confidential health information is not always as easy as redacting personal data.

Who Said What

The debate over this proposed rule in Congress is split along party lines, with Democrats opposing it and Republicans supporting it.

During an April 26 hearing on the EPA’s fiscal year 2019 budget request, for example, Rep. Raul Ruiz, a doctor and a Democrat from California, challenged EPA Administrator Scott Pruitt on the rule.

Ruiz first argued that the “type of studies” the EPA wants “to exclude are the same kind of scientific studies that were used to prove that lead in pipes and paints harm children and that secondhand smoke is a dangerous carcinogen.”

“We’re talking about landmark studies, such as the Harvard School of Public Health’s Six Cities Study, which proved a connection between air pollution and early death back in 1993,” he added.

Ruiz then asked Pruitt if the EPA’s proposed rule would cause the “agency to disregard” these studies.

Ruiz, April 26: Will these new regulations cause your agency to disregard these sentinel studies?

Pruitt: If they provide the data and methodology to the agency and the findings, they will be used.

Ruiz: But that is a clear violation of ethical rules protecting patient confidentiality. Who’s protecting …

Pruitt: Those can be redacted, congressman.

During the same hearing, Rep. Kevin Cramer, a Republican from North Dakota, had an exchange with Pruitt that exemplified the proponents’ argument.

Cramer, April 26: Maybe you could elaborate a little bit, how personal data can be protected and is protected. Nobody’s asking for the names of every victim of every, you know, of every pollution source that’s ever happened in the world, or that’s been sourced in any study. They’re not asking for personal data. We’re asking simply for the science to be revealed. You can protect the data, right?

Pruitt: Both the personal data, congressman, as well as confidential business information, both CBI and person information can be redacted and can be addressed and still serve the purposes of the proposed rule.

As we’ll explain, private health data includes more than just a person’s name. Still, studies that analyze confidential information about people or businesses don’t necessarily have to be excluded, as Ruiz claimed they would be.

Dissecting the Rule

Let’s take a closer look at what the proposed rule itself stipulates. The rule says it intends to “strengthen the transparency of EPA regulatory science” by ensuring study data are “publicly available.”

EPA, April 30: The proposed regulation provides that, for the science pivotal to its significant regulatory actions, EPA will ensure that the data and models underlying the science is publicly available in a manner sufficient for validation and analysis.

By “significant regulatory actions,” the proposed rule means any regulation that would likely “adversely” affect any of a wide range of areas, including the economy, jobs, the environment, public health, or state and local governments. This definition comes from Executive Order 12866, signed by President Bill Clinton in 1993.

The rule also clarifies that it applies specifically to “dose response data and models,” which it defines as “data and models used to characterize the quantitative relationship between the amount of dose or exposure to a pollutant, contaminant, or substance and the magnitude of a predicted health or environmental impact.”

The rule distinguishes these kinds of data from those “that are designed to predict the costs, benefits, market impacts and/or environmental effects of specific regulatory interventions on complex economic or environmental systems.”

In other words, the proposed rule would likely apply to studies that look at the effect of air pollution on mortality, for example, but not necessarily studies that evaluate how much it would cost to implement air pollution standards.

The rule also includes provisions that specifically pertain to protecting privacy and confidentiality, stating that “requirements for availability may differ” depending on the nature of the data.

Some data may be fully accessible through “public data repositories,” the rule says. Other data may have “controlled access in federal research data centers,” meaning members of the public may have to apply for access and sign “nondisclosure agreements” to access the data.

The rule also allows the EPA administrator to exempt certain regulations “if he or she determines that compliance is impracticable because” the agency can’t find a way to release study data in a way that “protects privacy and confidentiality.” 

So, given the language of the proposed rule, it would be up to Pruitt (or future administrators) to decide which regulations are exempted and which aren’t.

The rule also says it’s “intended to apply prospectively to final regulations,” meaning it would apply only to future regulations, not to ones already finalized.

However, the EPA does ask for comment on how the agency should handle regulatory programs that base future regulations on past ones. Take the National Ambient Air Quality Standards program, which the rule points to as an example of such a program.

Every five years, the EPA is required under the Clean Air Act to review studies on how six air pollutants, including particle pollution and lead, harm human health and the environment. After that review, the agency may decide to amend its air quality standards in light of new evidence. 

What the proposed rule doesn’t clarify is whether studies included in previous reviews — likely to be considered again in future reviews — would be required to have their data made publicly available.

The Harvard Study

In the course of criticizing the proposed rule, multiple Democrats pointed to a 1993 Harvard study that the EPA used to develop air quality standards for particle pollution. This study would likely be considered again in future NAAQS reviews. So we’ll use it to explain why protecting confidential health information is not always as easy as redacting people’s names.

As we already mentioned, Ruiz, from California, pointed to the study during the April 26 hearing, arguing it was an example of the “type of studies” the EPA wants “to exclude” from rule-making. During another EPA budget hearing on the same day, Rep. Betty McCollum, a Democrat from Minnesota, also cited this study when asking Pruitt if it’s “appropriate” to “ask Americans to give up their personal health information for public consumption.”

So what exactly did this study, published in the New England Journal of Medicine in December 1993, find?

Led by Douglas Dockery, a professor of environmental epidemiology at Harvard, the study looked at the effect of air pollution on mortality among residents of six U.S. cities: Watertown, Massachusetts; Harriman, Tennessee; St. Louis, Missouri; Steubenville, Ohio; Portage, Wisconsin; and Topeka, Kansas.

To do so, the researchers collected data on more than 8,000 individuals in total, including their ages, heights, weights, education levels, occupations, smoking histories and medical histories. The researchers also followed up on study participants annually for about 15 years to see who was still living and who had died. During this same period, they monitored air quality in each location as well.

The study found that the death rate of people living in the city with the dirtiest air, Steubenville, was 26 percent higher than the rate of people living in the city with the cleanest air, Portage. For death due to lung cancer and cardiovascular disease in particular, the rate was 37 percent higher in Steubenville than in Portage. These figures reflect the results after the researchers controlled for other risk factors for these diseases, such as cigarette smoking and occupational exposure to pollutants.

As we’ve explained in a previous article, long-term studies like this one are particularly well suited to providing causal evidence for relationships, as opposed to only correlational evidence. Here, that’s because the study collected information on other risk factors for death, such as smoking and certain occupations, so that the effect of air pollution could be singled out.

The study also showed that, among all of the pollutants the researchers examined, fine particle pollution had the largest effect on mortality. Larger particulates can do damage to the lungs, but fine particulates can do the most damage because they can be breathed deeply into the lungs. Fine particle pollution primarily comes “from the combustion of fossil fuels in transportation, manufacturing, and power generation,” the researchers explained. 

In March 1995, Dockery and others, including researchers at the American Cancer Society, published another study in the American Journal of Respiratory and Critical Care Medicine that found a similar effect. This study, however, followed a much larger group: more than 500,000 people living in more than 150 U.S. metropolitan areas for seven years, collecting health, death and air quality data along the way.

Based on these studies and others, the EPA created a new standard for fine particle pollution in 1997.

By email, Dockery explained to us why he and his colleagues can’t release all of their data to the public “in a manner sufficient for validation and analysis,” as the proposed rule stipulates. 

The researchers’ data actually consist of multiple data sets linked together. There’s one data set of air pollution levels, which he said is already publicly available.

There’s another data set that consists of covariates, or each individual’s characteristics, such as age, height, weight, smoking history, occupation and other information. “Knowing their individual characteristics alone would not be sufficient to identify an individual in the study,” he said. “These types of non-identifiable data have been released to other researchers.”

And then there’s a third data set of health outcomes, which records when people died and why. “The difficulty arises when these individual characteristics (covariates) are combined with death records (date of death) and exposure information (place of residence),” he explained.

To reanalyze the study, all of its data sets need to be linked together for each participant. Once they’re linked, it becomes possible to identify individuals. “Knowing when someone died, how old they were, [their] sex and where they lived is enough to identify them,” he wrote.
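To make Dockery’s point about linkage concrete, here is a minimal Python sketch. It assumes a handful of made-up records (not data from the Six Cities study) with hypothetical field names `sex`, `age`, `date_of_death` and `city`. It counts how many records share each combination of fields; a combination held by only one record singles out that person. This is the basic idea behind the privacy measure known as k-anonymity, shown here only as an illustration, not as the method Dockery or the EPA uses.

```python
from collections import Counter

# Hypothetical, made-up records for illustration only (NOT Six Cities data).
records = [
    {"sex": "F", "age": 71, "date_of_death": "1984-03-12", "city": "Portage"},
    {"sex": "F", "age": 71, "date_of_death": "1984-11-02", "city": "Steubenville"},
    {"sex": "M", "age": 65, "date_of_death": "1986-03-12", "city": "Topeka"},
    {"sex": "M", "age": 65, "date_of_death": "1986-07-19", "city": "Topeka"},
    {"sex": "F", "age": 80, "date_of_death": "1985-01-30", "city": "Harriman"},
    {"sex": "F", "age": 80, "date_of_death": "1985-09-14", "city": "Watertown"},
]

# Covariates alone versus covariates linked with death date and residence.
COVARIATES = ("sex", "age")
LINKED = ("sex", "age", "date_of_death", "city")


def group_sizes(rows, fields):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


def unique_records(rows, fields):
    """Return records whose combination of `fields` matches no other record."""
    sizes = group_sizes(rows, fields)
    return [row for row in rows if sizes[tuple(row[f] for f in fields)] == 1]


print(len(unique_records(records, COVARIATES)))  # 0: everyone blends into a group
print(len(unique_records(records, LINKED)))      # 6: every record stands alone
```

With only the covariates, each made-up record shares its values with at least one other. Once the date of death and place of residence are attached, every record in this toy example becomes unique, which is the situation Dockery describes.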

If the researchers were to release their data publicly, they would likely have to redact information that others would need to reanalyze their findings.

Dockery told us the Health Insurance Portability and Accountability Act, or HIPAA, is a “benchmark” for researchers when it comes to the kind of information they wouldn’t release, to avoid violating privacy agreements with their study participants.

Under HIPAA, the researchers would have to redact the participants’ names, birthdates and death dates, among other information, to release the data publicly. But without their death dates, for example, the study couldn’t be reanalyzed.

However, the Health Effects Institute, a nonprofit funded by the EPA and the motor vehicle industry, did gain access to the researchers’ data to conduct a reanalysis of their studies in 2000. How so?

First, some background: The EPA’s introduction of the new standard for fine particle pollution in 1997 led some in Congress, industry and the scientific community to debate public access to data, much like the debate occurring today, HEI explains in a 2000 report.

“Some insisted that any data generated using federal funding should be made public,” the report says. “Others argued that these data had been gathered with assurances of confidentiality for the individuals who had agreed to participate.”

To address the debate, Harvard University and the American Cancer Society asked HEI to “organize an independent reanalysis of the data from these studies,” the report says. This was under the condition that HEI and members of the reanalysis team would agree to keep the study participants’ information confidential.

What did HEI find? For the most part, it came to the same conclusions as Dockery and his team.

HEI, July 2000: Overall, the reanalyses assured the quality of the original data, replicated the original results, and tested those results against alternative risk models and analytic approaches without substantively altering the original findings of an association between indicators of particulate matter air pollution and mortality.

This is all to say that it does appear possible in some instances to release confidential health information to select organizations and individuals, so long as they also agree to protect the privacy of study participants. But the data could not be made fully public.

To be clear, the case may be different with other studies. It’s also possible that the rule itself will change. After the comment period for the proposed rule ends on May 30, the EPA will write a final rule, which may differ from the proposed one.

The final rule may also be challenged in court. Richard J. Lazarus, a professor of environmental law at Harvard, told the New York Times that Pruitt would be “walking into a judicial minefield” if he prevented the EPA from using certain studies when it develops rules.

Still, what is clear at this point is that politicians on both sides of the aisle mischaracterized the proposed rule. The EPA could still consider studies that use confidential health information under the proposed rule, but not because such private data can simply be redacted. It’s more complicated than that.