The new Bank of Italy Remote access to micro Data (BIRD)

G. Bruno, L. D’Aurizio, R. Tartaglia-Polcini

Q2008 – Rome, July 10, 2008



Page 2

Motivation

• Information release and data protection as competing goals
• The risk-utility tradeoff:
  • risk of data disclosure
  • utility of widespread availability of data for research

Page 3

Motivation

GOALS (UTILITY):
• satisfy growing demand from external researchers for business data
• improve the accountability of the Central Bank as economic research centre
• provide a service to the scientific community

CONSTRAINTS (RISK):
• Data confidentiality must be guaranteed:
  • as a prerequisite for respondents’ collaboration
  • to foster quality of the data provided
  • is required by the law
• Public Use File (PUF) with individual data judged unfeasible: anonymisation very problematic with business data

Page 4

Motivation

SYNTHETIC DATA LIMITATIONS:

• Identity disclosure impossible in principle, but, particularly with extreme values, it may be possible to re-identify a source record
• Attribute disclosure may happen
• Ample literature on data confounding and synthetic data (Duncan & Lambert 1989; Rubin 1993; Little 1993; Fuller 1993; Fienberg et al. 1996; Kennickell 1997; Abowd & Woodcock 2001; Reiter 2002; Raghunathan et al. 2003; etc.)

Page 5

Choices

• Data confounding: create a PUF containing perturbed data to prevent identification of individual information. Downside: results (esp. regressions) may heavily depend on the confounding technique adopted - controversial literature
• Data lab (à la Istat: ADELE) – the researcher has to go to the lab in person.
• Remote processing, using internet, without direct access to individual data (à la Luxembourg Income Study: LISSY)

Page 6

Other remote processing systems

• Luxembourg Income Study (LISSY, 1987)
• Statistics Canada (2001)
• Statistics Denmark (2001)
• Statistics Netherlands (2002)
• Australian Bureau of Statistics (2003)
• Statistics Sweden (2003)
• US Federal Agencies: NCHS (1997), NCES (1998), Census Bureau (2003)

Page 7

The solution adopted at the Bank of Italy

BIRD
• Modeled on LISSY
• Low setup cost
• Easily customisable
• Supports multiple packages
• Maximum accessibility for users
• Multi-level control (user/group, dataset, keyword)
• Automatic and manual checks & review

Page 8

How BIRD works

USER ELIGIBILITY CRITERIA

• Researcher status (not necessarily academic) proved by a presentation letter
• Identification via valid personal id
• Detailed information via form to be filled in

Page 9

How BIRD works

USER PROFILE CREATION

• The researcher indicates an e-mail address which will be recognised by the system
• The researcher chooses her own user name and password
• User-chosen parameters are input in the user database
• Access profile is created

Page 10

How BIRD works

SUBMISSION PROCEDURE

• Communication with the processing environment via e-mail
• Send a message containing user authentication info + statements to be submitted
• Input message is parsed and checks are performed
• If no error/security violation: submit statements
• Output is parsed (automatically / manually)
• If no security violation: forward to the user via e-mail
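The stages listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual BIRD implementation: the registry `REGISTERED`, the function `process`, the `user:` / `password:` body layout, the status strings and the two callables `run_job` and `violates_policy` are all hypothetical names introduced here.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical registry: whitelisted mailbox -> (user name, password).
REGISTERED = {"researcher@example.org": ("rossi", "s3cret")}


def process(raw_message, run_job, violates_policy):
    """Walk one e-mail through the stages listed above.

    Returns (status, output); output is None unless the job is forwarded.
    """
    msg = message_from_string(raw_message)
    _, sender = parseaddr(msg.get("From", ""))
    lines = msg.get_payload().splitlines()

    # Assumed layout: 'user: NAME' and 'password: SECRET' open the body.
    creds = {}
    for line in lines[:2]:
        key, _, value = line.partition(":")
        creds[key.strip().lower()] = value.strip()
    claimed = (creds.get("user"), creds.get("password"))

    # Input checks: sender must be whitelisted and credentials must match.
    if REGISTERED.get(sender.lower()) != claimed:
        return ("job cancelled", None)
    statements = "\n".join(lines[2:])
    if violates_policy(statements):
        return ("job cancelled", None)

    # Run the statements, then check the output before forwarding it.
    output = run_job(statements)
    if violates_policy(output):
        return ("output deleted", None)
    return ("forwarded", output)
```

In this sketch the same `violates_policy` callable screens both the submitted statements and the produced output; a real system could use different rules for each side and could route doubtful cases to a human reviewer instead of deciding automatically.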

Page 11

Confidentiality safeguards

• User level
• Data level
• Processing level

Page 12

Confidentiality safeguards

User level:
• Users are identified, qualified and registered
• Registered mailboxes are whitelisted; ordinarily only one mailbox per user
• Outputs are monitored and archived
• Deontological code, privacy law, specific penalties

Sanctions
• Forbidden submissions or outputs are deleted
• Grant of access for users trying to perform forbidden commands may be revoked
• Any other sanctions or penalties required by the law where applicable

Page 13

Confidentiality safeguards

Data level:

• Extreme data are censored (Winsorized)
• Identifying variables (ids, names, addresses) are expunged from the datasets used for remote processing
• Stratification variables are collapsed (geographical areas and not regions; Ateco aggregations and not codes)
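The three data-level safeguards can be illustrated with a minimal Python sketch. The slides do not state the cut-off quantiles, the field names or the region-to-area mapping, so the 5% / 95% thresholds, `ID_FIELDS`, `AREA_OF_REGION` and the functions `winsorize` and `protect` below are assumptions made purely for illustration.

```python
def winsorize(values, lower=0.05, upper=0.95):
    """Replace values outside the chosen quantiles with the quantile values."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower * (n - 1))]  # simple lower-rank quantile
    hi = ordered[int(upper * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


# Hypothetical identifying fields and a hypothetical region -> area mapping.
ID_FIELDS = {"id", "name", "address"}
AREA_OF_REGION = {
    "Lombardia": "North-West",
    "Veneto": "North-East",
    "Lazio": "Centre",
    "Sicilia": "South and Islands",
}


def protect(records, numeric_field):
    """Drop identifiers, collapse region into area, winsorize one variable."""
    capped = winsorize([rec[numeric_field] for rec in records])
    protected = []
    for rec, value in zip(records, capped):
        safe = {k: v for k, v in rec.items()
                if k not in ID_FIELDS and k != "region"}
        safe["area"] = AREA_OF_REGION[rec["region"]]
        safe[numeric_field] = value
        protected.append(safe)
    return protected
```

Winsorizing keeps every record in the dataset but caps extreme values, which is why a very large firm can no longer be singled out by its size alone.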

Page 14

Confidentiality safeguards

Processing level:

• Formally forbidden to display individual data
• Keyword parser implemented with ceiling, blacklist and graylist
• Particularly long and/or complex programmes are always reviewed manually
• In the learning stage, all submissions are reviewed manually

Page 15

How the parser works

check type | check performed | action if failed on INPUT | action if failed on OUTPUT
authentication | checking user authentication data | job cancelled | n/a
blacklist | parsing text for specific words and sequences | job cancelled | n/a
length | checking the length of text | n/a | soft ceiling: manual review; hard ceiling: job cancelled
graylist (*) | parsing text for specific words and sequences | manual review | manual review

(*) This feature will be available in the next release of the system.
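The decision table above can be expressed as two small functions, one for the input side and one for the output side. This is a hedged sketch only: the keyword lists, the ceilings measured in lines of text, and the names `check_input`, `check_output` and `contains` are assumptions, since the slides do not publish the actual rules.

```python
import re

# Hypothetical rules; the real word lists and ceilings are not in the slides.
BLACKLIST = ("list", "browse")   # e.g. commands that display individual data
GRAYLIST = ("merge",)            # words that only trigger a manual review
SOFT_CEILING = 200               # output lines above this: manual review
HARD_CEILING = 1000              # output lines above this: job cancelled


def contains(text, words):
    """True if any of the words occurs in text as a whole word."""
    return any(re.search(r"\b" + re.escape(w) + r"\b", text, re.IGNORECASE)
               for w in words)


def check_input(text, authenticated):
    """Input column: authentication, blacklist, graylist."""
    if not authenticated:
        return "job cancelled"
    if contains(text, BLACKLIST):
        return "job cancelled"
    if contains(text, GRAYLIST):
        return "manual review"
    return "pass"


def check_output(text):
    """Output column: length ceilings, graylist."""
    n_lines = len(text.splitlines())
    if n_lines > HARD_CEILING:
        return "job cancelled"
    if n_lines > SOFT_CEILING or contains(text, GRAYLIST):
        return "manual review"
    return "pass"
```

The checks are ordered from most to least severe, so a submission that trips both a cancelling rule and a review rule is cancelled rather than queued for review.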

Page 16

Datasets available

STANDARD DATASET: quantitative data for the biggest firms (in terms of workforce) are censored (Winsorised)

COMPLETE DATASET: no data censoring

Id variables are expunged from both datasets, obviously

Page 17

Datasets available

Aggravated procedure for accessing the complete dataset:

• Access must be explicitly requested – a special profile is created
• Review is exclusively manual
• Wait times are longer than average because the time allocated to manual review of the complete dataset is limited

Page 18

Documentation on the website

• Application form
• Instruction manual
• Dataset description
• Examples of submissions in the supported packages (SAS, Stata)
• Methodological notes on the survey

Page 19

Support

1. Documentation available on the Bank of Italy website (manuals, variables description, questionnaires)
   http://www.bancaditalia.it/statistiche/indcamp/indimpser/bird
2. Mailbox for queries and assistance: [email protected]

Page 20

An example

Program submitted by the user in Stata. Authentication is in the first four lines.

Page 21

An example

Output forwarded after review

Page 22

Usage of the system in the first weeks

• System started officially on Mar 13, 2008
• Beta users from Feb 1, 2008
• 8 registered users
• 172 submissions in 21 weeks

Page 23

Usage of the system in the first weeks

[Figure: BIRD: # of weekly submissions, from Feb 1, 2008. Horizontal axis: weeks w 1 to w 21; vertical axis: 0 to 35.]

Page 24

Future developments

• Web submission available alongside e-mail submission
• Other datasets will be made available in the future (e.g. data from the Business Outlook Survey)
• Open source packages processing (e.g. R)
• Merging with external datasets provided by the user, for special projects, on a discretionary basis, under an aggravated procedure and higher security levels
• Creation of closed groups with special authorisation levels for specific projects

Page 25

Thank you for your attention