G-Confid: Turning the tables on disclosure risk

Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality
Ottawa, Canada, 30 October 2013
Peter Wright

Overview by component

G-Confid is a cell suppression application. It can be used with tables of any size and any number of dimensions, subject to hardware and memory limitations. It is available for SAS 9.2 and 9.3 and for SAS Enterprise Guide (EG) 4.3 and 5.1.

• PROC SENSITIVITY identifies sensitive cells (highlights, inputs, strategies)
• Macro SUPPRESS creates a suppression pattern (inputs, outputs, strategies)
• Macro AUDIT audits a suppression pattern

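PROC SENSITIVITY identifies confidential cells

Highlights:
• Choice of sensitivity rule: p-percent, (n,k), or arbitrary
• Allows multiple decomposition

The sensitivity of a cell is a linear function of its contributions:

$$S = \sum_{i=1}^{n} \alpha_i x_i$$

where $x_1 \ge x_2 \ge \dots \ge x_n$ are the contributions of the $n$ respondents in the cell, sorted in decreasing order, and the coefficients $\alpha_i$ are set by the chosen rule.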
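The linear form above can be made concrete. The following minimal Python sketch writes the p-percent and (n,k) rules as coefficient vectors applied to the sorted contributions. The coefficients follow the usual textbook definitions, with p and k given as fractions:

• p-percent rule: α_1 = p, α_2 = 0, and α_i = −1 for i ≥ 3
• (n,k) rule: α_i = 1 − k for i ≤ n, and −k otherwise

Treating S > 0 as "sensitive" is also an assumed convention. PROC SENSITIVITY's internal scaling may differ, and the function names are hypothetical.

```python
from typing import Sequence


def p_percent_coeffs(count: int, p: float) -> list[float]:
    """Hypothetical helper: p-percent rule coefficients (p as a fraction, e.g. 0.20).

    Gives S = p*x1 - (x3 + x4 + ...), i.e. the second-largest respondent
    should not be able to estimate x1 to within p percent.
    """
    return [p if i == 1 else 0.0 if i == 2 else -1.0 for i in range(1, count + 1)]


def nk_coeffs(count: int, n: int, k: float) -> list[float]:
    """Hypothetical helper: (n,k) rule coefficients (k as a fraction, e.g. 0.85).

    S = (x1 + ... + xn) - k*T, rewritten as a linear form in the sorted x_i.
    """
    return [(1.0 - k) if i <= n else -k for i in range(1, count + 1)]


def sensitivity(contributions: Sequence[float], coeffs: Sequence[float]) -> float:
    """Apply S = sum(alpha_i * x_i) to the contributions sorted in decreasing order."""
    xs = sorted(contributions, reverse=True)
    return sum(a * x for a, x in zip(coeffs, xs))


if __name__ == "__main__":
    cell = [80.0, 15.0, 3.0, 2.0]  # hypothetical respondent values for one cell
    s_p = sensitivity(cell, p_percent_coeffs(len(cell), 0.20))
    s_nk = sensitivity(cell, nk_coeffs(len(cell), 2, 0.85))
    # Both values are positive, so the cell is flagged under either rule.
    print(f"p-percent S = {s_p:.2f}, (n,k) S = {s_nk:.2f}")
```

Only the coefficient vector changes from one rule to the next, which is what lets a single procedure accept p-percent, (n,k) or user-supplied ("arbitrary") rules.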

Inputs for PROC SENSITIVITY

• Definition of the hierarchy (or hierarchies) for each table dimension
• Microdata file containing:
  • Classification variables (e.g., geography, industry)
  • Enterprise identifier
  • Enterprise value

Tip: to make an enterprise's value reduce the sensitivity of a cell, set its enterprise identifier to missing. The enterprise is then treated as anonymous.
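The following sketch illustrates how these inputs fit together. It aggregates microdata records (classification variables, enterprise identifier, value) into the cells of a two-dimensional table. Each dimension has an overall total coded "0", as in a hierarchy such as "0 East West; 0 1 2 3;".

Two behaviours here are illustrative assumptions, not documented PROC SENSITIVITY behaviour:

• Records from the same enterprise are summed into one contribution per cell.
• Records with a missing identifier (None) are kept apart as an anonymous amount.

The record layout and function name are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical microdata layout: two classification variables, ID, value."""
    region: str                   # e.g. "East" or "West"
    industry: str                 # e.g. "1", "2", "3"
    enterprise_id: Optional[str]  # None = anonymous (identifier set to missing)
    value: float


def build_cells(records):
    """Return {(region, industry): {"by_id": {id: value}, "anonymous": float}}.

    Each record feeds its own cell plus the marginal cells that use the
    total code "0" in one or both dimensions.
    """
    cells = defaultdict(lambda: {"by_id": defaultdict(float), "anonymous": 0.0})
    for r in records:
        for reg in (r.region, "0"):
            for ind in (r.industry, "0"):
                cell = cells[(reg, ind)]
                if r.enterprise_id is None:
                    cell["anonymous"] += r.value  # counted, but not as a respondent
                else:
                    cell["by_id"][r.enterprise_id] += r.value  # one entry per enterprise
    return cells


if __name__ == "__main__":
    data = [
        Record("East", "1", "E01", 500.0),
        Record("East", "1", "E02", 40.0),
        Record("East", "2", "E01", 60.0),
        Record("West", "1", None, 25.0),
    ]
    cells = build_cells(data)
    total_ind1 = cells[("0", "1")]
    print(dict(total_ind1["by_id"]), total_ind1["anonymous"])
```

Keeping the identified contributions per enterprise means a rule can rank respondents, not records. The anonymous amount still adds to the cell total.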

Example of SAS code to run PROC SENSITIVITY

proc sensitivity data=microfile outconstraint=consfile outcell=cellfile
      outlargest=largestfile
      hierarchy="0 East West; 0 1 2 3;"
      srule="pq .20"
      range="East A B: West C D; 1 101 201 301: 2 102 202 302: 3 103 203 303;"
      minresp=5;
   id Enterpriseid;
   var Income;
   dimension EastWest Industry;
run;

Strategies using PROC SENSITIVITY

Use the MINRESP=r option to set the minimum number of respondents:
• Any cell with fewer than r respondents is assigned a sensitivity of max{1, S}, where S is the sensitivity of the cell
• Only positive (>0) values are counted as respondents
• The MINRESP rule is ignored for a cell with a value contributed by an anonymous enterprise

Note: MINRESP can be used without applying a sensitivity rule.
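The three MINRESP bullets above translate almost directly into code. The following minimal Python sketch applies them to a cell whose rule-based sensitivity S has already been computed.

The function name and arguments are hypothetical. Two conventions are assumptions:

• Passing S = 0 stands in for "no sensitivity rule applied".
• S > 0 marks a sensitive cell.

```python
from typing import Sequence


def apply_minresp(s: float, contributions: Sequence[float], r: int,
                  has_anonymous: bool = False) -> float:
    """Hypothetical helper: cell sensitivity after a MINRESP=r check.

    s: sensitivity from the chosen rule (assumed 0.0 when no rule is applied)
    contributions: values of the identified enterprises in the cell
    has_anonymous: True if an anonymous enterprise contributes to the cell
    """
    if has_anonymous:
        return s  # MINRESP is ignored for cells with an anonymous contribution
    respondents = sum(1 for x in contributions if x > 0)  # only positive values count
    if respondents < r:
        return max(1.0, s)  # too few respondents: sensitivity is at least 1
    return s


if __name__ == "__main__":
    small_cell = [10.0, 5.0, 0.0, 3.0]  # three positive respondents
    print(apply_minresp(-4.0, small_cell, r=5))                      # forced to 1.0
    print(apply_minresp(-4.0, small_cell, r=5, has_anonymous=True))  # left at -4.0
```

The floor of 1 only has to make the cell count as sensitive. It does not inflate the protection required for cells that a rule already flagged with a larger S.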

Strategies using PROC SENSITIVITY (continued)

To reduce oversuppression, apply rules that make use of sampling weights.

Example: if the sampling weight of enterprise i satisfies w_i > 3, make the enterprise anonymous (set its ID value to missing). G-Confid will then use its contribution to reduce the sensitivity of the cell.

Find more strategies in Tambay and Fillion (Proceedings of the Joint Statistical Meetings, JSM 2013).
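The following minimal Python sketch illustrates the weighting strategy. Enterprises with w_i > 3 lose their identifier, and a p-percent sensitivity is then recomputed.

How anonymous values enter the rule is an assumption made for illustration. Here they are never ranked as x_1 or x_2 and are simply subtracted as extra protection, which matches "use its contribution to reduce the sensitivity". The exact treatment inside G-Confid may differ. Names are hypothetical.

```python
from typing import NamedTuple, Optional


class Contribution(NamedTuple):
    """Hypothetical record: one enterprise's contribution to a cell."""
    enterprise_id: Optional[str]  # None = anonymous
    value: float
    weight: float                 # sampling weight w_i


def anonymize_by_weight(contribs, threshold=3.0):
    """Set the identifier to missing (None) whenever w_i > threshold."""
    return [c._replace(enterprise_id=None) if c.weight > threshold else c
            for c in contribs]


def p_percent_sensitivity(contribs, p=0.20):
    """Assumed form: S = p*x1 - (identified values beyond x2) - anonymous total.

    Anonymous values are never ranked as x1 or x2; they only add protection.
    """
    identified = sorted((c.value for c in contribs if c.enterprise_id is not None),
                        reverse=True)
    anonymous = sum(c.value for c in contribs if c.enterprise_id is None)
    x1 = identified[0] if identified else 0.0
    return p * x1 - sum(identified[2:]) - anonymous


if __name__ == "__main__":
    cell = [Contribution("A", 100.0, 1.0),
            Contribution("B", 40.0, 10.0),  # large weight: becomes anonymous
            Contribution("C", 5.0, 1.0)]
    print(p_percent_sensitivity(cell))                       # sensitive (S > 0)
    print(p_percent_sensitivity(anonymize_by_weight(cell)))  # no longer sensitive
```

In this example, anonymizing B moves its 40 units from "second-largest respondent" to "protection". This turns a sensitive cell into a publishable one and avoids one suppression.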

Macro SUPPRESS – complementary suppression

• Uses the SAS/OR® LP solver
• Input files: (i) cell sensitivities file and (ii) linear constraints file
• Syntax: %Suppress(InCell=, Constraint=, CFunction1=, CFunction2=, CVar1=, CVar2=, OutCell=, ByVars=, OutComplement=, ScaleCost=);
• The output file gives the final status of each cell (Suppress or Publish) and its net variation (the largest amount by which the cell was "moved")
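SUPPRESS finds complementary suppressions with an LP solver over the full set of linear constraints. The toy Python sketch below swaps that for an exhaustive search on a single one-dimensional table, where the cells add to a published total. This is only practical for a handful of cells.

The search relies on these assumptions, all made for illustration:

• Each suppressed cell may take any value in [0.5·tot, 1.5·tot], borrowing the default audit interval.
• A sensitive cell is protected when it can move by at least its sensitivity S in both directions.
• The cost of a complementary cell is its total (SIZE).

Names are hypothetical.

```python
from itertools import combinations


def movement(cells, sensitive, complement):
    """Smallest of the upward/downward moves the sensitive cell can make.

    Only the sum of the suppressed cells is known (total minus published
    cells); each suppressed cell is assumed to lie in [0.5*x, 1.5*x].
    """
    suppressed = [sensitive, *complement]
    r = sum(cells[c] for c in suppressed)
    lo = {c: 0.5 * cells[c] for c in suppressed}
    hi = {c: 1.5 * cells[c] for c in suppressed}
    y_max = min(hi[sensitive], r - sum(lo[c] for c in complement))
    y_min = max(lo[sensitive], r - sum(hi[c] for c in complement))
    x = cells[sensitive]
    return min(y_max - x, x - y_min)


def best_complement(cells, sensitive, s_required):
    """Cheapest set of extra cells (cost = tot) giving movement >= S, or None."""
    candidates = [c for c in cells if c != sensitive]
    best = None
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if movement(cells, sensitive, combo) >= s_required:
                cost = sum(cells[c] for c in combo)  # SIZE cost function
                if best is None or cost < best[0]:
                    best = (cost, combo)
    return best


if __name__ == "__main__":
    row = {"A": 120.0, "B": 8.0, "C": 30.0, "D": 15.0, "E": 2.0}
    print(movement(row, "B", ()))          # B suppressed alone: no movement at all
    print(best_complement(row, "B", 3.0))  # cheapest protecting complement
```

An LP scales to many cells and many overlapping equations. The exhaustive search is exponential and is here only to make the objective (minimum total cost subject to protection) visible.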

Strategies using the macro SUPPRESS

Choice of cost functions (functions of the cell total, tot):
• SIZE (= tot)
• DIGITS (= log[tot+1])
• CONSTANT (= 1)
• INFORMATION (= log[tot+1]/[tot+1])

The LP process can be run twice to reduce the number of suppressions (e.g., SIZE or DIGITS, then INFORMATION).

Certain cells can be favoured for publication by giving them higher cost values (by default, cost = tot).
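The four cost functions above can be compared directly. This short Python sketch checks how the choice of function changes which of two hypothetical complementary patterns is cheaper: one large cell, or two smaller cells. The log base is not given on the slide, so natural log is assumed. The base only rescales DIGITS and INFORMATION uniformly, so it does not change which pattern wins. Names are hypothetical.

```python
import math

# Cost functions of the cell total, as listed for SUPPRESS (natural log assumed).
COST_FUNCTIONS = {
    "SIZE": lambda tot: tot,
    "DIGITS": lambda tot: math.log(tot + 1),
    "CONSTANT": lambda tot: 1.0,
    "INFORMATION": lambda tot: math.log(tot + 1) / (tot + 1),
}


def pattern_cost(totals, name):
    """Total cost of suppressing cells with the given totals under one function."""
    f = COST_FUNCTIONS[name]
    return sum(f(t) for t in totals)


if __name__ == "__main__":
    one_large = [120.0]       # candidate pattern 1: a single large cell
    two_small = [15.0, 30.0]  # candidate pattern 2: two smaller cells
    for name in COST_FUNCTIONS:
        a = pattern_cost(one_large, name)
        b = pattern_cost(two_small, name)
        choice = "one large" if a < b else "two small"
        print(f"{name:12s} {a:8.3f} {b:8.3f} -> prefers {choice}")
```

With these numbers SIZE prefers the two small cells, while DIGITS, CONSTANT and INFORMATION all prefer suppressing the single large cell. This is why a second pass with a function like INFORMATION tends to reduce the number of suppressions.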

Macro AUDIT – validates a suppression pattern

• Calculates the minimum and maximum values of each suppressed cell using the LP solver
• Provides a result for each cell: protection achieved, protection not achieved, or exact disclosure

Coming soon: pre-set narrower starting intervals than the default values (0.5·tot and 1.5·tot) using the Shuttle algorithm (Buzzigoli and Giusti, 2006). Pre-setting the starting intervals this way reduces run time.
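The AUDIT idea can be shown on the smallest possible case. This minimal Python sketch audits one additive row, where the cells sum to a published total. It swaps the LP for closed-form bounds, which are exact only when a single equation links the suppressed cells.

The following are assumptions:

• The starting interval [0.5·tot, 1.5·tot] is used as the bounds for every suppressed cell.
• A cell counts as protected when it can move by at least its required S in both directions.

Names are hypothetical.

```python
def audit_row(cells, suppressed, required):
    """Audit one additive row (all cells sum to a published total).

    cells: {name: true value}; suppressed: names withheld;
    required: {name: protection S} for sensitive cells (others need 0).
    Returns {name: (low, high, status)}.
    """
    r = sum(cells[c] for c in suppressed)  # derivable: total minus published cells
    lo = {c: 0.5 * cells[c] for c in suppressed}  # default starting interval
    hi = {c: 1.5 * cells[c] for c in suppressed}
    report = {}
    for c in suppressed:
        rest = [d for d in suppressed if d != c]
        # Exact min/max of cell c under one equality and box bounds.
        low = max(lo[c], r - sum(hi[d] for d in rest))
        high = min(hi[c], r - sum(lo[d] for d in rest))
        x = cells[c]
        if high - low <= 1e-9:
            status = "exact disclosure"
        elif min(x - low, high - x) >= required.get(c, 0.0):
            status = "protection achieved"
        else:
            status = "protection not achieved"
        report[c] = (low, high, status)
    return report


if __name__ == "__main__":
    row = {"A": 120.0, "B": 8.0, "C": 30.0, "D": 15.0, "E": 2.0}
    for pattern in (["B"], ["B", "E"], ["B", "D"]):
        print(pattern, audit_row(row, pattern, {"B": 3.0}))
```

Suppressing B alone lets anyone recover it from the total. Adding E gives B some room to move, but not enough, so protection is not achieved. Adding D gives B an interval of [4, 12], which meets S = 3.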

Conclusion

PROC SENSITIVITY
• Use a pre-defined or customized sensitivity rule
• Can do multiple decomposition
• MINRESP function
• Can apply weighting strategies

Macro SUPPRESS
• Can favour cells to publish (or suppress)

Macro AUDIT

Coming soon: additive controlled rounding

Contact

For more information, please contact:

Peter [email protected]