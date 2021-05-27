Cancel
Census Bureau's use of 'synthetic data' worries researchers

By MIKE SCHNEIDER
Posted by KIRO 7 Seattle

ORLANDO, Fla. — (AP) — First came the “noise” — small errors the U.S. Census Bureau decided to introduce into the 2020 census data to protect participants' privacy. Now the bureau is looking into “synthetic data,” manipulating the numbers widely used for economic and demographic research, to obscure the identities of people who provided information.

The moves have some researchers up in arms, worried that the statistical agency could sacrifice accuracy in its zeal to protect privacy.

Census Bureau statisticians disclosed at a virtual conference last week that over the next three years they will work toward developing a method to create “synthetic data” for files on individuals and homes that already are devoid of personalized information. These files, known as American Community Survey microdata, are used by researchers to create customized tables tailored to their research.

Census Bureau statisticians said more privacy protections are needed as technological innovations magnify the threat of people being identified through their survey answers, which are confidential. Computing power is now so vast that it can easily crunch third-party data sets that combine personal information from credit rating and social media companies, purchasing records, voting patterns and public documents, among other things.

“It’s a balancing act. The law requires us to do competing things. We need to release statistics on the nation to allow people to make useful decisions. But we also have to protect the privacy of our respondents,” said Rolando Rodriguez, a Census Bureau statistician, at the conference.

But critics say the proposal, coupled with an ongoing effort to add small inaccuracies to the 2020 census data in order to protect participants' privacy, undermines the Census Bureau's credibility as the go-to provider of precise data about the U.S. population.

University of Minnesota demographer Steven Ruggles said bluntly that synthetic data “will not be suitable for research.”

“The Census Bureau is inventing imaginary threats to confidentiality to sharply reduce public access to data,” Ruggles said. “I do not think this will stand, because society needs information to function.”

The microdata are gathered every year from the American Community Survey with a sample size of 3.5 million households, extrapolated across populations of all sizes, from the entire nation down to neighborhoods. This provides a wide range of estimates on the nation’s demographic makeup and housing characteristics. The microdata are used in the drafting of around 12,000 research papers a year, Ruggles said.

The synthetic data are created by taking variables in the microdata to build models recreating the interrelationships of the variables and then constructing a simulated population based on the models. Scholars would conduct their research using the simulated population — or the synthetic data — and then submit it, if they want, to the Census Bureau for double checking against the real data to make sure their analyses are correct.
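The general idea described above can be illustrated with a minimal sketch. It is not the Census Bureau's method, which the bureau has not finalized; the toy records, the two variables, and the function names `fit_model` and `synthesize` are all hypothetical. The sketch fits a very simple model (how often each age group occurs, and how housing tenure varies within each age group) and then draws a simulated population from that model.

```python
import random
from collections import Counter, defaultdict

def fit_model(records):
    """Estimate P(age_group) and P(tenure | age_group) from (age_group, tenure) pairs."""
    marginal = Counter(age for age, _ in records)
    conditional = defaultdict(Counter)
    for age, tenure in records:
        conditional[age][tenure] += 1
    return marginal, conditional

def synthesize(model, n, seed=0):
    """Draw n simulated records from the fitted model instead of copying respondents."""
    marginal, conditional = model
    rng = random.Random(seed)
    ages = list(marginal)
    age_weights = [marginal[a] for a in ages]
    synthetic = []
    for _ in range(n):
        # Sample an age group first, then a tenure given that age group.
        age = rng.choices(ages, weights=age_weights)[0]
        tenures = list(conditional[age])
        tenure_weights = [conditional[age][t] for t in tenures]
        tenure = rng.choices(tenures, weights=tenure_weights)[0]
        synthetic.append((age, tenure))
    return synthetic

# Hypothetical toy microdata, not real survey records.
toy_records = (
    [("18-34", "rent")] * 6
    + [("18-34", "own")] * 2
    + [("35-64", "own")] * 7
    + [("35-64", "rent")] * 3
    + [("65+", "own")] * 2
)
synthetic_population = synthesize(fit_model(toy_records), n=1000)
```

In this toy model, a combination that never appears in the input, such as a renter aged 65 or older, can never appear in the simulated population either.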

Ruggles said new discoveries in data will be missed since the models only capture what is already known.

Another problem is that synthetic data can amplify an outlier, said David Swanson, a professor emeritus of sociology at the University of California, Riverside. In a health study where one person engages in risky behavior multiple times but others don't, for example, the synthetic data can make the risky behavior seem more widespread than it actually is.

There are benefits, though, such as the ability to get details about people at really small geographic levels such as neighborhood blocks, said Cornell University economist Lars Vilhuber, who has done research on the method. The synthetic data make that possible because they protect privacy, he said.

“You can actually get far more detail into the data than with traditional methods,” Vilhuber said.

The Census Bureau said in a statement Thursday that it hasn't made any final decisions on the use of synthetic data in the American Community Survey and it welcomed feedback from researchers. The technique already is used on a limited basis in other surveys by the bureau, which describes the process as combining multiple data sources to produce estimates that cut back on errors at small geographic levels.

The Census Bureau has taken other recent steps to protect individuals’ privacy, which has gotten harder in the face of a proliferation of outside data sources. This year, the bureau proposed using housing units instead of people when defining an urban area. And it has drawn fierce criticism for using a statistical technique known as “differential privacy” in 2020 census data that will be used for drawing congressional and legislative districts.

Differential privacy adds mathematical “noise,” or intentional errors, to the data to obscure any given individual’s identity while still providing statistically valid information. It has been challenged in court by the state of Alabama, which says its use will result in inaccurate data.
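As a rough illustration of how added noise can hide individual values while keeping aggregates useful, the sketch below applies the textbook Laplace mechanism to a single count. This is an assumption made for illustration only: the article does not describe the algorithm the bureau actually uses, and the `noisy_count` function, the `epsilon` setting and the count of 120 are hypothetical.

```python
import random

def noisy_count(true_count, epsilon, rng):
    """Return true_count plus Laplace noise with scale 1/epsilon.

    Adding or removing one person changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1/epsilon.
    """
    scale = 1.0 / epsilon
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered on zero with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

rng = random.Random(0)
block_population = 120  # hypothetical small-area count
released = [noisy_count(block_population, epsilon=0.5, rng=rng) for _ in range(10000)]
average_release = sum(released) / len(released)
```

Each released value differs from the true count, but the noise is centered on zero, so the average over many releases stays close to the true figure. A smaller `epsilon` means more noise and stronger privacy; a larger one means less of both.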

“The Census Bureau is saying this is in the tradition of what they have always done” in protecting privacy, said historian Margo Anderson, a professor at the University of Wisconsin-Milwaukee. “There’s an increasingly substantial organization of critics saying this is completely different. They say, ‘You have never made the data intentionally inaccurate.’”

The Census Bureau first floated the idea of using synthetic data three years ago, but concerns over that and differential privacy got shoved aside after the Trump administration tried unsuccessfully to add a citizenship question to the 2020 census questionnaire and the pandemic challenged the nation's head count last year, Anderson said.

For Swanson, the Census Bureau's efforts at privacy remind him of the quote that reporter Peter Arnett attributed to an unnamed U.S. military official during the Vietnam War: “We had to destroy the town in order to save it.”

“I feel they literally would destroy the census data to save it from an uncertain threat,” Swanson said. “If they destroy the data, they are going to destroy the bureau.”

___

Follow Mike Schneider on Twitter at https://twitter.com/MikeSchneiderAP

Copyright 2021 The Associated Press. All rights reserved. This material may not be published, broadcast, rewritten or redistributed without permission.
