overview of the federal statistical system › agencies › types of survey data collected ...
Official Statistics and Confidentiality
Maura Bardos
Outline
Overview of the Federal Statistical System
› Agencies
› Types of survey data collected
Challenges
› Statistical disclosure and confidentiality
› Implications
Federal Statistical System
Headed by a Chief Statistician
Decentralized system in the United States
› 13 agencies with a statistics-oriented mission
› Statistical agencies are located in various departments throughout the federal government
Examples: Census Bureau (Department of Commerce), Energy Information Administration (Department of Energy), Bureau of Labor Statistics (Department of Labor)
Data
Where do the numbers come from? Survey data
Regulations by OMB:
› Response rates
› Legal obligations
› Confidentiality
Confidentiality
Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA): places the onus on federal employees to limit disclosure
› Took over 4 years to implement (Anderson and Seltzer)
Three ways to reduce disclosure within agencies:
› 1) Limiting the identifiability of survey materials within the organization
› 2) Restricting access to data
› 3) Restricting the contents that may be released
Statistical Disclosure and Confidentiality
Statistical disclosure: “the identification of an individual (or of an attribute) through the matching of survey data with information available outside of the survey” (Groves et al.)
The federal government defines disclosure as the inappropriate attribution of information to a data subject, whether an individual or an organization, and identifies three types of disclosure (FCSM):
› Identity: a data subject is identified from a released file
› Attribute: sensitive information about a data subject is revealed through the released file
› Inferential: the released data make it possible to determine the value of some characteristic of an individual more accurately than otherwise would have been possible
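To make the matching step in the Groves et al. definition concrete, here is a toy linkage sketch in Python. A released file without names is joined to an outside list on the attributes the two share, and a person is re-identified when that combination of attributes is unique in both sources. All records, field names, and the function `reidentify` are hypothetical, invented for illustration.

```python
from collections import Counter

# Hypothetical released file: names removed, but quasi-identifiers remain.
released = [
    {"county": "Alpha", "race": "W", "sex": "M", "income": 61},
    {"county": "Alpha", "race": "W", "sex": "F", "income": 48},
    {"county": "Alpha", "race": "W", "sex": "F", "income": 52},
    {"county": "Beta", "race": "B", "sex": "M", "income": 75},
]

# Hypothetical outside information (for example, a public list) that includes names.
outside = [
    {"name": "Pete", "county": "Alpha", "race": "W", "sex": "M"},
    {"name": "Maria", "county": "Alpha", "race": "W", "sex": "F"},
    {"name": "Sam", "county": "Beta", "race": "B", "sex": "M"},
]

KEYS = ("county", "race", "sex")  # attributes present in both sources


def key(record):
    """Build the matching key from the shared attributes."""
    return tuple(record[k] for k in KEYS)


def reidentify(released, outside):
    """Map each outside name to the released record it matches uniquely in both sources."""
    released_counts = Counter(key(r) for r in released)
    outside_counts = Counter(key(p) for p in outside)
    released_by_key = {key(r): r for r in released}
    return {
        p["name"]: released_by_key[key(p)]
        for p in outside
        if released_counts[key(p)] == 1 and outside_counts[key(p)] == 1
    }


for name, record in reidentify(released, outside).items():
    print(name, "-> income", record["income"])
```

Maria is not matched because two released records share her attributes. For Pete and Sam the match also reveals income, so identity and attribute disclosure happen together in this toy case.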
Example
Challenges
Need to provide information
› FOIA requests, subpoenas
Satisfy requests from multiple clients; must keep track of all withheld information
Maintain the utility of data while preserving confidentiality
A “programming nightmare” to keep track of the relationships among variables, tables, and hierarchies
How To Prevent
Specific strategies:
› Data swapping
› Noise
› Combining cells
› Rounding
› Cell suppression
Strategy: Data Swapping
Exchange of reported data values across data records (Fienberg, Steele, Makov, 1996)
Strategy: Swapping
Select 10%
Before swapping:
Number | Child | County | HH Edu. | HH Income | Race | Sex
4 | Pete | Alpha | High | 61 | W | M
 | Alfonso | Beta | Very High | 61 | W | M
After swapping:
Number | Child | County | HH Edu. | HH Income | Race | Sex
4 | Alfonso | Alpha | Very High | 61 | W | M
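A minimal sketch of swapping in Python. It assumes, based on the Pete/Alfonso example (the slides do not spell out the procedure), that a fraction of records is selected at random, each selected record is paired with another that matches on household income, race, and sex but lies in a different county, and the pair exchange their county values. The function `swap_counties`, the field names, and the partner-selection rule are illustrative assumptions.

```python
import random


def swap_counties(records, fraction=0.10, match_on=("income", "race", "sex"), seed=None):
    """Return a copy of records in which selected records exchange 'county' with a matching partner."""
    rng = random.Random(seed)
    out = [dict(r) for r in records]  # work on copies; the input is left unchanged
    n_select = min(len(out), max(1, round(fraction * len(out))))
    selected = rng.sample(range(len(out)), n_select)
    swapped = set()
    for i in selected:
        if i in swapped:
            continue  # already swapped as someone else's partner
        # Partners must agree on the matching variables but live in a different county.
        partners = [
            j for j in range(len(out))
            if j != i and j not in swapped
            and all(out[j][k] == out[i][k] for k in match_on)
            and out[j]["county"] != out[i]["county"]
        ]
        if not partners:
            continue  # no suitable partner: this record is left as is
        j = rng.choice(partners)
        out[i]["county"], out[j]["county"] = out[j]["county"], out[i]["county"]
        swapped.update((i, j))
    return out


# Records loosely based on the example above (Ana is a hypothetical extra record).
records = [
    {"child": "Pete", "county": "Alpha", "edu": "High", "income": 61, "race": "W", "sex": "M"},
    {"child": "Alfonso", "county": "Beta", "edu": "Very High", "income": 61, "race": "W", "sex": "M"},
    {"child": "Ana", "county": "Gamma", "edu": "Low", "income": 30, "race": "H", "sex": "F"},
]

# With only three records, select all of them rather than 10%.
for row in swap_counties(records, fraction=1.0, seed=7):
    print(row)
```

Because only the county moves, and only between records that agree on the matching variables, the county counts and the county-by-(income, race, sex) counts are unchanged; what is lost is the exact link between location and the unmatched variables, such as education.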
Strategy: Swapping
Strategy: Noise
Assign a multiplying factor, or noise factor, to all data
› For example, the value of a randomly generated variable might be added to each value in a dataset
“Protect individual establishments without compromising the quality of our estimates”
Pro: more data can be published; less complicated; less time-consuming
Problem: ALL data are perturbed, sensitive and non-sensitive alike
Strategy: Noise
How is this done? Use multipliers
› The standard is to perturb data by about 10%
› Use multipliers ranging from 0.9 to 1.1
› Must preserve the trend in the data; otherwise the data are useless for the client’s analysis
› Use distributions to control the variance (examples)
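A minimal sketch of multiplicative noise in Python, assuming each respondent receives one multiplier drawn uniformly from [0.9, 1.1] and that the same multiplier is applied to every cell that respondent contributes to. The uniform distribution is an assumption for illustration only: the slides say distributions are used to control the variance without naming one, and Evans, Zayatz, and Slanta (1996) describe their own choice. The data and function names are hypothetical.

```python
import random


def noise_multipliers(respondent_ids, low=0.9, high=1.1, seed=None):
    """Give each respondent one multiplier drawn uniformly from [low, high] (assumed distribution)."""
    rng = random.Random(seed)
    return {rid: rng.uniform(low, high) for rid in respondent_ids}


def noisy_cell_total(contributions, multipliers):
    """Publishable cell total: each respondent's value times that respondent's multiplier."""
    return sum(value * multipliers[rid] for rid, value in contributions.items())


# Hypothetical establishment data: respondent "B" contributes to both cells.
cell_1 = {"A": 17_000, "B": 1_000, "C": 177}
cell_2 = {"B": 400, "D": 2_500}

# A list (not a set) keeps the draw order, and so the multipliers, reproducible for a given seed.
mult = noise_multipliers(["A", "B", "C", "D"], seed=42)
for name, cell in (("cell_1", cell_1), ("cell_2", cell_2)):
    print(name, sum(cell.values()), round(noisy_cell_total(cell, mult)))
```

Since every multiplier lies in [0.9, 1.1] and the values are positive, each noisy total stays within 10% of the true total, which is one way the trend in the data can be kept usable.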
Strategy: Noise
Example: Table with and without Noise
Tables
Pre-tabulation strategies: data swapping; data perturbation (noise)
Tables of frequencies
› Percent of the population with certain characteristics
› With outside knowledge, respondents with unique characteristics can be identified
› Sensitive information: identified by a threshold rule
Tables of magnitude data
› Aggregate data, such as income of individuals or revenues of companies
› Extreme values
› Sensitive information: identified by a linear sensitivity measure
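The threshold rule for frequency tables can be sketched in Python as flagging every nonzero cell whose count falls below a minimum. The threshold of 3, the function name `threshold_sensitive`, and the example table are hypothetical; agencies set their own values, and skipping zero cells here is also an assumption.

```python
def threshold_sensitive(table, threshold=3):
    """Return (row, col) positions of nonzero cells with fewer than `threshold` respondents."""
    return [
        (i, j)
        for i, row in enumerate(table)
        for j, count in enumerate(row)
        if 0 < count < threshold  # zero cells are skipped in this sketch
    ]


# Hypothetical frequency table: rows = counties, columns = income bands.
counts = [
    [12, 7, 1],
    [20, 2, 0],
]
print(threshold_sensitive(counts))  # [(0, 2), (1, 1)]
```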
Strategy: Recoding Methods
Changing the values of outlier cases, since outliers are more likely to be sample or population uniques
Top coding: taking the largest values of a variable and giving them the same code value in the dataset
› For example, place all companies producing more than 100,000 barrels of oil per day in one category
Non-uniques are unperturbed
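Top coding can be sketched in Python as replacing every value above a cap with one shared code. The 100,000 barrels-per-day cap comes from the example above; the label string, the function name `top_code`, and the production figures are hypothetical.

```python
def top_code(values, cap=100_000):
    """Replace every value above `cap` with a single top-code category label."""
    label = f">{cap:,}"
    return [label if v > cap else v for v in values]


# Hypothetical daily production (barrels of oil per day) for five companies.
production = [12_500, 98_000, 140_000, 100_000, 310_000]
print(top_code(production))  # [12500, 98000, '>100,000', 100000, '>100,000']
```

Only the values above the cap change, which matches the point that non-uniques are left unperturbed.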
Example of Disclosure
How do we fix this?
Example (cont.): Collapsing of categories
Strategy: Rounding
Similar to noise: cells are rounded, and a random decision is made whether to round up or down.
› Example: round values to a multiple of 5. Write the cell value $x$ as $x - r = 5q$, where $q$ is a non-negative integer and $r$ is the remainder ($0 \le r < 5$).
› Round up to $5(q+1)$ with probability $r/5$, or round down to $5q$ with probability $1 - r/5$.
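The rounding rule above can be sketched in Python as follows (`random_round` is a hypothetical name). Because a value $x = 5q + r$ is rounded up with probability $r/5$, its expected rounded value is $5q(1 - r/5) + 5(q+1)(r/5) = 5q + r = x$, so the rounding is unbiased on average.

```python
import random


def random_round(x, base=5, rng=random):
    """Randomly round a non-negative integer x to a multiple of `base` without bias."""
    q, r = divmod(x, base)  # x - r = base * q, with 0 <= r < base
    if r == 0:
        return x  # already a multiple of base
    # Round up to base*(q+1) with probability r/base, else down to base*q.
    return base * (q + 1) if rng.random() < r / base else base * q


rng = random.Random(2024)
print([random_round(x, rng=rng) for x in [12, 15, 3, 48]])
```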
Original Table
Example: Rounding
Strategy: Rounding, now with constraints
How to identify cells with disclosure risk in magnitude data
› (n, k) rule
› p% rule
P-Percent Rule
If upper or lower estimates for a respondent’s value are closer to the reported value than some prespecified percentage (p) of the respondent’s value, the cell is sensitive (Groves, 372).
Assumptions:
› Any respondent can estimate the contribution of another respondent to within 100% of its value
› The second-largest respondent can subtract its own reported value from the cell total and use the remainder to estimate the largest reported value, x1
P Percent Rule
A cell is sensitive if $S > 0$, where
$$S = x_1 - \frac{100}{p}\left(T - x_1 - x_2\right)$$
and $T$ is the cell total. For a given cell with $N$ respondents, arrange the data in order from largest to smallest: $x_1 > x_2 > \dots > x_N > 0$.
Example
Consider the cell with a total of 18,177:
$N = 3$; $x_1 = 17{,}000$; $x_2 = 1{,}000$; $x_3 = 177$; $p = 15$
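The formula can be checked against the 18,177 example with a short Python function (`p_percent_rule` is a hypothetical name). By hand: $T - x_1 - x_2 = 177$, so $S = 17{,}000 - \frac{100}{15}(177) = 17{,}000 - 1{,}180 = 15{,}820 > 0$, and the cell is sensitive.

```python
def p_percent_rule(values, p):
    """Return (S, sensitive) for one cell, where S = x1 - (100/p) * (T - x1 - x2)."""
    x = sorted(values, reverse=True) + [0, 0]  # pad so x1 and x2 exist for tiny cells
    total = sum(values)
    s = x[0] - (100 / p) * (total - x[0] - x[1])
    return s, s > 0


s, sensitive = p_percent_rule([17_000, 1_000, 177], p=15)
print(round(s, 2), sensitive)  # 15820.0 True
```

Padding with zeros means a one-respondent cell gives $S = x_1 > 0$, so it is always flagged as sensitive.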
(n, k) Rule
If a small number (n) of respondents contribute a large percentage (k) of the total cell value, then the cell is sensitive (Groves, 372).
Example
We are publishing data on how many barrels of crude oil each refinery produces per day. Each refinery's figure is confidential: if competitors found out, it could be detrimental to the business.
There are 4 refineries in the state, producing 100, 50, 25, and 5, respectively (cell total T = 180).
Should this cell be released under the (n, k) rule with (n, k) = (2, 85)? Under the p-percent rule with p = 35%?
Under the p-percent rule, the cell is sensitive: S = 100 - (100/35)(180 - 100 - 50) ≈ 14.29 > 0. However, it is not sensitive under the (n, k) rule: the two largest respondents contribute 150/180 ≈ 83.3% of the total, which does not exceed 85%.
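The (n, k) rule can be sketched in Python in the same spirit; `nk_rule` is a hypothetical name, and treating a share exactly equal to k as not sensitive is an assumption. Applied to the refinery cell, it reproduces the (n, k) result above.

```python
def nk_rule(values, n, k):
    """Return (share, sensitive): share is the percent of the cell total held by the n largest values."""
    top_n = sum(sorted(values, reverse=True)[:n])
    share = 100 * top_n / sum(values)
    return share, share > k  # assumption: a share exactly equal to k is not sensitive


share, sensitive = nk_rule([100, 50, 25, 5], n=2, k=85)
print(round(share, 2), sensitive)  # 83.33 False
```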
Relationship between n-k and p% rule
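System of equations ($Z_1$ and $Z_2$ are the two largest contributions, as percentages of the cell total):
$$\text{p\%:}\; Z_2 > 100 - 1.35\,Z_1 \qquad (n,k):\; Z_2 > 85 - Z_1$$
Variable constraints:
$$Z_2 < Z_1, \qquad Z_1 + Z_2 < 100$$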
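The coefficients in this system can be derived from the two rules, assuming (as in the refinery example) $p = 35$ and $(n, k) = (2, 85)$, with each contribution expressed as a percentage of the cell total so that $T = 100$:

$$
\begin{aligned}
\text{p\% rule:}\quad & Z_1 - \frac{100}{35}\,(100 - Z_1 - Z_2) > 0
  \;\iff\; 0.35\,Z_1 > 100 - Z_1 - Z_2
  \;\iff\; Z_2 > 100 - 1.35\,Z_1 \\
(n,k)\text{ rule:}\quad & Z_1 + Z_2 > 85
  \;\iff\; Z_2 > 85 - Z_1
\end{aligned}
$$

For the refinery cell, $Z_1 = 100/180 \approx 55.56$ and $Z_2 = 50/180 \approx 27.78$: the p% inequality holds ($27.78 > 100 - 1.35 \times 55.56 = 25$), but the (n, k) inequality does not ($27.78 < 85 - 55.56 = 29.44$).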
Relationship between n-k and p% rule
(55.56, 27.78): the refinery example, with Z1 = 100/180 and Z2 = 50/180 expressed as percentages
Strategy: Sensitive Cell Suppression
Primary suppressions: the sensitive cells
Complementary/secondary suppressions: additional withheld data to ensure that the primary suppressions cannot be derived by linear combination
Goal: minimize the information lost, by selecting the smallest possible cell values for complementary suppression
Problem: often requires a substantial amount of data to be withheld; errors may lead to the release of confidential data
Strategy: Sensitive Cell Suppression
Small tables:
› Manual suppression
› Computerized audit procedures
Large tables:
› Much more complex, especially with related tables and hierarchical data
› Consistency
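One simple check that a small-table audit might include can be sketched in Python: in a two-way table published with all row and column totals, a suppressed cell that is the only suppressed entry in its row or column can be recovered exactly by subtraction. The function below flags such cells. It is a necessary-condition heuristic, not a full audit: a real audit also checks whether suppressed values can be estimated too narrowly (for example, by computing bounds for each cell with linear programming), which this sketch does not do. The function name and suppression patterns are hypothetical.

```python
def exposed_suppressions(suppressed, n_rows, n_cols):
    """Return suppressed (row, col) cells that are alone in their row or column.

    Assumes every row and column total is published, so a lone suppressed cell
    equals its total minus the published cells and is exposed by subtraction.
    """
    suppressed = set(suppressed)
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    for i, j in suppressed:
        row_counts[i] += 1
        col_counts[j] += 1
    return sorted((i, j) for i, j in suppressed if row_counts[i] == 1 or col_counts[j] == 1)


# Primary suppression only: cell (0, 2) can be derived from its row and column totals.
print(exposed_suppressions({(0, 2)}, n_rows=3, n_cols=3))  # [(0, 2)]

# Complementary suppressions forming a rectangle leave no lone suppressed cell.
pattern = {(0, 2), (0, 1), (2, 2), (2, 1)}
print(exposed_suppressions(pattern, n_rows=3, n_cols=3))  # []
```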
Real Example: Disclosure
Cell Suppression Example
Let’s return to a previous example: Sales Revenue
We determined that the cell must be suppressed. How do we accomplish this?
Example of a Solution
Conclusion: Data is secure
High levels of security and suppression are necessary to protect data, because data guide real-life policy decisions.
The quality of these data depends not only on a high response rate but also on accurate responses.
Producing data is a function of “public trust.” However, the point of data collection is its use and analysis; the tradeoff between confidentiality and utilization must be examined.
…Or is it?
Patriot Act, 2001 (Anderson & Seltzer)
› Section 508: Disclosure of information from National Center for Education Statistics surveys
› The Justice Department is able to obtain, and use for investigation and prosecution, reports, records, and information (including individually identifiable information)
› The Patriot Act overrides the confidentiality protections of the 1994 National Center for Education Statistics Act
Other examples from history
Second War Powers Act (1942-1947): repealed the confidentiality protections of Title 13, which governs the US Census Bureau (Anderson & Seltzer)
› Japanese Americans and internment camps (USA Today)
2004 data on Arab-Americans (NYT)
› Released the number of Arab-Americans per zip code to the Department of Homeland Security
› Categorized by country of origin: Egyptian, Iraqi, Jordanian, Lebanese, Moroccan, Palestinian, Syrian, and two general categories, "Arab/Arabic" and "Other Arab"
› Data obtained from a sample (the long form of the census)
In conclusion…
…the next time you fill out a survey, think about where your information may (or may not) be used.
Sources
Clemetson, Lynette. “Homeland Security given data on Arab-Americans.” New York Times. July 30, 2004. http://www.nytimes.com/2004/07/30/politics/30census.html
El Nasser, Haya. “Papers show Census role in WWII Camps.” USA Today. March 30, 2007. http://www.usatoday.com/news/nation/2007-03-30-census-role_N.htm
“DoD releases FY 2010 Budget Proposal.” US Department of Defense. May 7, 2009. http://www.defenselink.mil/releases/release.aspx?releaseid=12652
Seltzer, William and Margo Anderson. “NCES and the Patriot Act.” Paper prepared for the Joint Statistical Meetings. 2002. http://www.uwm.edu/~margo/govstat/jsm.pdf
Evans, Timothy, Laura Zayatz, and John Slanta. “Using Noise for Disclosure Limitation of Establishment Tabular Data.” US Census Bureau. 1996. http://www.census.gov/prod/2/gen/96arc/iiaevans.pdf
“Statistical Programs of the US Government.” Office of Management and Budget. 2009. http://www.whitehouse.gov/omb/assets/information_and_regulatory_affairs/09statprog.pdf
Sources of examples
Sullivan, Colleen. “An Overview of Disclosure Principles.” US Census Bureau. 1992. http://www.2010census.biz/srd/papers/pdf/rr92-09.pdf
“Statistical Policy Working Paper: Report on Statistical Disclosure Methodology.” Federal Committee on Statistical Methodology. 2005. http://www.fcsm.gov/working-papers/SPWP22_rev.pdf
Groves, Robert, et al. Survey Methodology. Hoboken, NJ: John Wiley & Sons. 2004.
Additional Resources
http://jpc.cylab.cmu.edu/journal/2009/vol01/issue01/issue01.pdf
http://www.census.gov/srd/sdc/papers.html
http://www.census.gov/srd/sdc/abowd-woodcock2001-appendix-only.pdf