PIDB meeting 6_25_15
(crowd chatter, inaudible)
LEARY: Good morning. Good morning, no? All right, somebody
who -- it says on. Hmm? Who knows how the mic -- oh, OK.
Ah, there we go. Good morning. My name is Bill Leary, and
I’m happy to be able to welcome you to the latest in the
occasional, if irregular, public meetings of the Public
Interest Declassification Board. We’re always amazed how
many of you turn out when we schedule one of those meetings,
and we’re delighted. I know that most of you know what the
PIDB is, but just to refresh your memory and for the benefit
of any newcomers who may be here, the Public Interest
Declassification Board was established by Congress. Its
members are appointed by the president and the leadership of
Congress, and we have two broad, very complementary
missions. Our first mission is to promote the fullest,
promptest access to the classified record of the United
States government, which is, of course, a large part of the
history of the national security and foreign policy of the
United States. Our second, very complementary mission, is
to advise the president and the rest of the executive branch
on how to improve the process of classification and
declassification, in order to better accomplish that first
overarching objective. Our meeting this morning is going to
focus on what we think are some really rather exciting
developments related to that second objective, how to
improve the process. And also, we’re going to talk a bit
about our plans for our next study, our next project, to try
to come up with some recommendations for encouraging greater
use of technology to aid in the process of declassification.
My first task, my first very happy task this morning, is to
welcome the two newest members of the Public Interest
Declassification Board. Laura DeBonis, Laura, why don’t you
stand, and Sol Watson. Laura has over 20 years’ experience
in the information technology and media fields. She
currently serves as a founding board member for the Digital
Public Library of America, an organization devoted to
creating an open network of online resources, from
libraries, archives, and museums, and making them freely
available to all. Sounds very pertinent to our mission.
Her professional experience includes a variety of leadership
roles at Google, including her last position there as
director of library partnerships for book search. Welcome,
Laura.
Sol Watson has a long and distinguished career at The New
York Times Company, beginning in 1974, and he retired, I
think, as a senior vice president and chief legal officer of
The New York Times Company. Sol has also been a special
master in the appellate division of the New York State
Supreme Court, and is a member of the American Bar
Association, the National Bar Association, and the
Association of the Bar of the City of New York. From 1966
to 1968, he served in the US Army as a lieutenant in the Military Police Corps. Welcome, Sol. Now I want to yield
the podium to Ambassador Nancy Soderberg, who will walk us
through the rest of this morning’s program.
SODERBERG: Well, good morning everyone, and thank you for
coming, and thank you Bill Leary for opening us up for
what’s going to be I think an exciting day, and particularly
glad to have Laura and Sol with us as full members of this
great team. I think we’re going to have a really
informative session, an interesting discussion, comments
from the public, our very distinguished guests that we’re
having. And the purpose of this meeting is to continue our
advocacy for the transformation and modernization of the
classification and declassification system. Simply put, it
is not workable under the current system, and needs
technology in order to meet the public’s right to know what
its government does. Our last supplemental report, which
you can get on our website, “Setting Priorities: An
Essential Step in Transforming Declassification,” revisits
one of our recommendations to the president for
transformation. And that’s the focus of today’s discussion,
which is to encourage the development and the use of
existing and new technologies to assist those declassifying
and classifying information at the agencies and the National
Declassification Center. And this morning, we’re excited to
hear from our distinguished speakers about what strides they have made and are seeking to make in support of our
recommendation. And once again, we’re delighted to have our
wonderful friend and our distinguished archivist of the
United States, David Ferriero, join us as our host. There
is no better supporter of our work than David. And he’s
been a longtime advocate for advancing access initiatives
within government. As archivist of the United States, David
is a leader in fostering policies to support a more open and
transparent government. His record demonstrates that. He
encourages the movement of government and the National
Archives from the analog age to the digital information age.
And he recognizes the need to design new processes and
policies to ensure citizen access to records of our
government. And I’m especially impressed with his many
successes in building partnerships that will greatly improve
public access to government information. So let me ask our
archivist, David Ferriero, to come up and say a few words.
Thank you very much.
FERRIERO: Thank you, Nancy, and good morning all. Welcome to my
house. And I’m extraordinarily proud to be the archivist of
the United States, leading the National Archives as we
strive to promote open government and transparency for the
benefit of our democracy. As caretakers of the Declaration
of Independence, the Constitution, and the Bill of Rights,
we hold the words “We the people” in high esteem, and take
seriously our responsibility to preserve and make available
the billions of government records we hold in trust for the
American people. “Innovate to Make Access Happen” is our
flagship open government initiative. We continue to take
actions to improve transparency, participation, and
collaboration in every aspect of the work we do here at the
National Archives, while embracing innovation and developing
best practices to carry out our mission for the benefit of
the American people. The Public Interest Declassification Board plays an important role in promoting
open government by continuing to advocate for policy
improvements that support greater public access to
government information of historical significance. The
members have repeatedly recognized publicly the growing
challenge facing the government agencies in today’s digital
information age, and the board has been a strong proponent
of modernizing antiquated policies and practices, often
inhibiting access to our records. The board’s December 2012
report, “Transforming the Security Classification System,”
described these challenges in detail and offered thoughtful
recommendations that, if implemented, will modernize and
improve information management overall, including the
expedited declassification of national security information.
The board’s 2014 supplemental report, “Setting Priorities,”
expanded on one element critical to transformation,
prioritizing records of historical significance for
declassification. And I’m pleased to say that the National
Archives Declassification Center has already begun the
process of reevaluating how it prioritizes reviewing records
for declassification.
After successfully retiring a backlog of over 351 million
pages of records, the NDC now has an opportunity to rethink
how it may improve its operations and prioritize records for
declassification review, so that the records most significant to the public are processed first. At the April 15th
NDC public forum, Director Sheryl Shenberger outlined next
steps in prioritization at the NDC, and heard comments from
public interest groups, scholars, historians, and advocates,
including board member Bill Leary. We heard proposals for
process improvement, and suggestions of records for
prioritized declassification review. I know the NDC will
consider and apply many of the recommendations made at the
public forum, and we intend to find innovative means to
improve upon our success thus far. Improving access to
historically significant records, however, requires more
than just finding a means to prioritize records for review,
as the board recognized in both of its reports. IN order to
innovate and make access happen, we must seek out
opportunities to integrate new and existing technology into
our information management practices. The board shares this
belief. It’s been a longstanding advocate for the increased
use of pilot projects in order to build partnerships across
agencies and reach our common goal of improving how the
government manages its information, both in declassification
and records management in general. These declassification
and records management policies and practices are inherently linked, and the board’s acknowledgement of
this important principle helped shape many of the
commitments found in the president’s second open government
national action plan. Today, I’m pleased to welcome Deputy
Chief Technology Officer of the United States, Mr. Alex
Macgillivray, to this public meeting. This is somewhat of a
reunion: Laura DeBonis; A-Mac, as he’s known in the industry; and I were joined at the hip during the Google
Book project, when I was at the New York Public Library. So
it’s great to have both of you in the room. A-Mac will
discuss the technology policy initiatives underway at the
Office of Science and Technology Policy at the White House.
His efforts to leverage technological talent and expertise
of individuals and teams across the government are critical
to modernizing records management, data management, and
declassification processes. I’m sure that you have -- he
will have important commentary on the newly established
United States Digital Service and its mission to transform
the way the government works for the American people.
I welcome research scientist Dr. Cheryl Martin from the
Center for Content Understanding, who has completed pilot
projects at the Applied Research Laboratory at the
University of Texas at Austin, on behalf of the National
Archives, and the CIA. We at the National Archives and the
CIA have partnered with Dr. Martin and her team in an effort
to find technological solutions to assist declassification
-- declassifiers in their decision making, to improve the
outcomes of reviews. Dr. Martin will outline these results
of those pilot projects, which to date, are the only pilot
projects at this level of sophistication in existence that
focus first and foremost on improving declassification and
access to government records. I look forward to hearing
more about the impressive achievements of these pilot
projects during Dr. Martin’s presentation.
We will learn today about the latest cutting edge
technological capabilities and modernized government
policies that support innovation. These advancements are
critical to our work at the National Archives, but the
uniqueness of our mission does not afford us the luxury of
only looking forward. As we prepare and work towards
solutions for managing digital records, we must also find
innovative and effective means to manage the billions of
pages of paper records still being created across all areas
of government. The sheer volume of information in need of
management, whether found in paper records or in digital
records, digitized records, or special media, will continue
to shape how we do our business. To this end, I’m
encouraged by the progress we at the National Archives, and
at agencies, have made under the direction of our chief
records officer, Paul Wester, in response to the president’s Managing Government Records Directive. As we
work in collaboration to modernize our government’s
information management practices overall, we must remember
to identify and understand the many facets to this
challenge, and view potential solutions from a high-level vantage point, working to make changes that are automated
and scalable to the benefit of all information users. I
want to thank PIDB, the agencies, the public interest
community, and everyone joining us here today for
contributing to this morning’s discussion. Critical to the
success of our transformation efforts is the continued open
dialogue we share with our stakeholders inside and outside
of government. This engagement is essential to help us
improve our services, and help us serve our democracy by
providing access to the highest value government records.
Thank you for your efforts and support, on our mission and
our work.
(applause)
SODERBERG: Thank you, David, and really, thank you so much
for your continued support of our work, as well as on behalf
of the archives. As David mentioned, we have a fantastic
lineup for you this morning. Both with Cheryl and Alex.
And we’re going to next hear from Alex Macgillivray, who’s
the Deputy Chief Technology Officer of the United States.
And on his first full day in office, which you can take full
credit for, the president created the US Chief Technology
Officer position within the White House Office of Science
and Technology Policy, to lead the administration-wide effort to
unleash the power of technology, data, and innovation to
help meet our nation’s goals and the needs of our citizens.
And Deputy CTO, A-Mac, I guess he’s called, focuses on a
portfolio of key priority areas for the administration,
including the intersection of Big Data, technology, and
privacy. He’s an internationally recognized expert in
technology law and policy, and prior to coming to the White
House, he served as general counsel and head of public
policy at none other than Twitter, from 2009 to 2013. And
he’s an actively practicing developer and coder,
contributing to his ability to formulate creative and
sensible technology policy, and understanding its
ramifications better than certainly I can, I’m sure. But
we’re excited to hear about your assessment of the new
information and technology needs of the government, and how
we can leverage technology talent to modernize records
management, data management, and declassification. So
welcome, A-Mac.
(applause)
MACGILLIVRAY: Thank you so much, ambassador, and thank you to
the PIDB. This is -- it’s wonderful for me to be here,
particularly wonderful to be sharing a stage with the
archivist who I admire so much both for his work here, and
for his work as a librarian at NYPL, MIT, and I think Duke
before that? But having negotiated with him while I was at
Google, I can also say that he can be quite a pit bull for
his particular cause. And he’s often right, which is
extremely annoying when you’re on the other side. But, I
would say that one of the reasons why this is a thrill for
me is the thing that’s motivated me most throughout my
career is access to information. And so, this particular,
both the archive and this board, really embody that, and embody it in a way that’s not -- it’s not trivial. There
are plenty of places where you can talk about access to
information, and there’s no downside, there’s no other
interest at issue. And this is a place where you’re really
dealing with where the rubber meets the road, and trying to
understand that tension, and get through it to actually get
to the access to information, which is extraordinarily
valuable. I was asked to talk a little bit about tech use
in government. And the administration’s commitment to open
government, so I’ll do that, and probably touch on a few
other things as well.
So the president, as the ambassador said, right from day
one, was focused on how do you bring more technology and
expertise into government? And that’s why the CTO’s office
was created, it was sort of a vestige from a very successful
campaign that changed the way campaigning was done, in terms
of bringing more technology understanding into how to run a
campaign, and get it through. But it was a thing that was a focus, but maybe not a principal focus over time,
until the healthcare.gov problems happened. And I think the
thing that healthcare.gov brought home for our
administration, I mean more than anything else, was this
idea that you couldn’t really do policy anymore in a vacuum
without understanding implementation, and particularly
without an understanding of that implementation in technology.
Obviously, the Affordable Care Act was law at that time, but
by itself it wasn’t going to be able to achieve its goal of
enrolling more Americans in healthcare. And so, the idea
that technology was going to be responsible for that, and so
we had to get the technology piece right, and that would
mean bringing in technologists, having them work on the
problem, get it over the finish line, but also bringing them
in earlier and earlier into these policy processes to have a
better marriage between that policy and technology goal, was
really important. And so, to that end, the thing that sort of the techies in government right now are focused on is really three principal areas.
So the first is that policy implementation, making
government services world-class. One way to think about
this is we have some of the best, most innovative
technologists within the United States, we have people
who’ve created Amazon, and Google, and Facebook, and
Twitter, and all these other great services that we rely on
every day, and we want to make sure that the government
websites are just as good, that the government services get
provided in just as much of an agile and technology-focused
and user-focused way. And so, there’s a whole bunch of
different people who are working on that, and I’m going to
go into the different people working on the different parts
in a moment.
Number two is really sort of the flip of that, how do you
bring more technology understanding into policy formation?
And so, that’s really one of the focuses of the discussion here today. If you have the types of
tools that Dr. Martin will be talking about at your
disposal, that might change the types of policies that you
can put in place, in terms of classification and declassification, and in terms of getting material out into
the public, and providing that access to information. And
that understanding of that interplay between the technology
and the policy is really important, and something that we’re
trying to push forward. And then, the third thing that is
in the broader tech use and government space is thinking
about the engagement between the American people and their
government. And trying to understand ways in which we can
use technology to change the way that engagement happens for
the better. So to make it so that we can have more of a conversation, so that we can ask more questions of the
American people, and have them give us answers that will
help us govern better. So that we can have people become
more engaged with their government, and make change within
their government in a more effective way. And so, there are
a bunch of people now working on that problem, and trying to
make it better. But it’s also something that the president
has been focused on since the beginning of his term, with
the launch of We the People, a campaign -- a way of allowing
ordinary citizens to bring questions to government, and get
answers.
So now, in terms of all of those different types of things
that we’re trying to do, I wanted to just bring you on a bit
of a tour through the people that are doing it, and the
organizations that they are working with. Because
sometimes, it’s a little bit hard to unpack that, and it is
useful in understanding how we think about it. So, it’s
everything from the US CIO, Tony Scott, who came from
VMware, who is responsible for government technology
generally, and is working across different government
agencies with excellent CIOs and staff within agencies, to
work on cross government problems, and to bring the best
technology into government. It’s people like Mikey
Dickerson at USDS. Mikey came from Google, where he was an
SRE, and SRE means site reliability engineer. Those
are the folks that make sure that a site like Google stays
up near 100% of the time. And so, he’s a great person to
bring into government, make sure our government services
have that same type of reliability. But that USDS focus, so US Digital Service, and that word, service, is important on a number of levels. First of all, it recognizes that we’re no longer, in government, just releasing products, where we release the product, and then it exists, great, people can use it, and we can walk away and do something else.
But we’re really talking about services here. Things that
will last over time. And the need to be updated and
iterated on and maintained as services. It’s also thinking
about service as that word, you know, the thing that brought
me into government. The ability to have purpose make
impact. One of the things that Mikey is doing very
effectively is winning recruiting battles against much
better funded offers from Silicon Valley companies, because
he’s able to appeal to an engineer’s sense of purpose. And
there really is no better way to have an impact, have a
really deep impact, on individual Americans, than working
within the federal government. So Mikey is working at, and
really pushing that out within the US Digital Services. The
other thing that the US Digital Services has done,
especially this year, is move out into agencies. So,
there’s now a VA Digital Service, and we will have other
digital services within agencies over time. Those are
groups that are working within agencies, bringing the
excellent staff that we have already at agencies, to bring
some of this new style of doing work. So for example,
instead of putting a requirement out and then working over the course of five years to launch something that goes live to the public at the five-year anniversary mark, they try to be quick and agile, and launch and iterate.
So being able to launch something, be able to develop it in
the open, and then get it out there over time. So that
there’s actually a better understanding of whether the
project’s going to be successful. And so that we can course
correct when we learn stuff in our implementation.
So that’s USDS. A companion piece to that is 18F. 18F is
within the GSA, General Services Administration, and 18F is
just a street address, 18th and F, it’s not some sort of top
secret thing. But, 18F has a bunch of coders, I think
they’re about 150, 170 strong, who are working on doing the
coding for services for the federal government. And one of the great things about 18F is that as they encounter
problems, there’s often this issue in government where if
nobody’s done something, you don’t really want to be the
first to do the thing, because there’s a bunch of different
costs that might come with that. And you won’t get to
internalize all of the benefits. There’s this free riding
problem, lots of different people will be able to
internalize the benefits, but you get to bear all the cost.
And 18F has, as part of their mission, actually doing some
of those first projects, so that they can show by example
here’s a way to use GitHub, and from the very beginning,
develop a FOIA project in an open way where people can
actually see what you’re coding in real time. And do some
of those experiments, and get them out there, but also to
produce running services and to improve the services that
government is offering. So that’s 18F. Another project
there is the Presidential Innovation Fellows. The way we
think of these is sort of as innovators and entrepreneurs in residence within agencies. These PIF classes, there have
now been three of them, I think we’re on our fourth. And
they basically come in as amazing people from all over
industry, academia, and nonprofit space, former government
people too, they come in, and then go back out to agencies
and try to stir things up a bit. And bring some of the --
those best-in-class processes and technologies back to agencies. On an even more operational level, we
brought in David Recordon. David was a Facebook engineer,
to be the Director of White House Information Technology.
The White House has the same problem as many agencies, in
terms of how do we modernize the technology that we use?
How do we make ourselves as effective as possible? So
making sure that we have people who are looking at that, and
who are best in breed.
And then finally, Jason Goldman, who was one of the founders
of Twitter, was brought in to lead the Office of Digital
Strategy, and he’s really leading that focus on engagement
with the American people. And doing that through the Office of Digital Strategy, which has already done a bunch of things,
including launching the @POTUS Twitter account, which you
could see even over last weekend, there’s a level of
engagement and just personal response that is different from
what we were able to do before. So that’s a really hopeful
thing. We also have Todd Park, the second US CTO, now in Silicon Valley, leading a recruiting effort. So, as I say,
we believe strongly that people are a major part of the
solution to these issues. So, bringing more and more
talented people within government is really what Todd’s all
about.
So with that, I’m going to jump to talking a little bit
about our open government work, and your open government
work. And I want to just point out, Cori Zarek, who is in
the audience, and should stand up so that she can be more
embarrassed. But Cori has been at the archives and is
(inaudible) to the CTO team, and is really leading our open
government efforts. And has an encyclopedic knowledge of
this stuff, and has been really pushing, and both Cori and
the National Archives have been real leaders when it comes
to making more information accessible to the public, and
getting it out there. So I just wanted to acknowledge and
thank that. So as you know, the open government initiative
was launched by the president -- he had a very busy first day in office, and this was another one of the things that he launched on his first day in office. We are working through
the open government directive, and getting the agencies on a
path to increase the amount of information, and the amount
of understanding that the public has for what government is
actually working on. It’s also something that has moved a
ton of data and information out into the public space where
other people, not the government, can produce everything
from the most trivial app to an important open government
monitor of something that we’re doing. There’s one that’s
out there that is top of my mind, which is just a thing that
shows when the different We the People petitions have been
answered, and holds us accountable for not answering the ones
that have been out there for a long time. So, all -- it’s
everything from the stuff that we would never have imagined,
that makes a huge difference in people’s lives, and at the
most grand scale, this is the -- NOAA releases a ton of data
that is used in all the different weather apps that are out
there. They’re very important, but making sure that we do a
lot more of that. And then another piece of this work is
working with the Open Government Partnership, which is a 65-country initiative that brings government and civil society
together across national boundaries, and making sure that
the United States continues to be a leader in open
government over time.
And then finally, the National Action Plans. We’re in the
process of formulating our third National Action Plan, the
previous two National Action Plans have been very
successful, including the formation of the declassification
board as one of the recommendations in the second National
Action Plan, I think. Am I getting that right, [Cori?]?
I’m getting it wrong.
F: Classification (inaudible).
MACGILLIVRAY: Classification and (inaudible) committee. Sorry
about that. See, this is the great thing about having Cori
actually in the audience. But there is always more to be
done within this space, and so one of the things that Cori
has been working really hard on is the National Action Plan
3.0, and we would be interested in hearing any suggestions
that people have for inclusion in that National Action Plan.
And making sure that we’re pushing as much as we can towards
continuing to make government more open and more responsive
to people. So with that, I will sit down, because I’m
really excited to hear about the technology that’s coming
up. And just thank you all for letting me speak.
(applause)
SODERBERG: Well thank you very much for that great summary.
It’s really extraordinary how government is changing. We’re
still behind the private sector, but catching up rapidly.
And I think we’re going to all benefit from the initiatives
that you’re leading at the White House, and we look forward
to continuing the conversation. Our next speaker is Dr.
Cheryl Martin, research scientist and director of the Center for Content Understanding at the Applied Research
Laboratories, located at the University of Texas. Dr.
Martin’s areas of expertise and list of accomplishments are
vast, and through her work at the Applied Research Lab,
she’s applied data mining, detection, and inference technologies to information assurance problems, including document declassification. The PIDB board had a chance to travel down, I guess it was last fall, to visit UT and saw
first-hand this revolutionary technology. And I think I
speak for all of the board members when I say that we were
deeply impressed with it. And this morning, Dr. Martin will
share with us how she’s using technologies such as semantic
knowledge models, natural language processing, expert
systems, and machine learning to categorize and label text.
And her recent work has been successful in automatically
determining whether documents contain sensitive information
that must be protected. And the pilot she’s conducted in
partnership with the National Archives and the Central
Intelligence Agency has significant impact for
declassification and other information management
activities. And as our reports have documented, it’s only
through technologies such as this that we are going to be
able to manage the vast amount of information that is now in
the government; it’s simply not sustainable under the “two eyes looking at every page” system. So, in order to have the public have access to what its government does, it has to be automated, and Dr. Martin has figured out a very effective
way of doing it. As far as we can tell, this technology is
the only one that has the level of sophistication operating
for the sole purpose of modernizing declassification and
classification. And we’re concerned that right now there’s
no plan to take it forward, and so we really hope that we
can find a way that you can continue, and even expand on
this important work. So thank you for coming, and let me
invite you up to talk about your exciting project.
(applause)
MARTIN: Thank you. I’d like to thank the board for inviting me
to speak today. It’s a real honor to be here. In this
presentation, I will first highlight some efforts under the
president’s National Action Plan, and I will introduce the
role that the Center for Content Understanding has in this
work. Then, I will define the field of content
understanding, and I will describe our approach for
sensitive content identification, and marking, to provide
decision support for classification and declassification.
The next thing I’ll do, finally, is walk through some of the
pilot projects we’ve been working on in this area, and I
will discuss specific progress that we’ve made with the
Reagan email collection. Do we need to work on some
logistics before we continue? Can people see the --
F: (inaudible).
MARTIN: OK. (inaudible). Except, eh... This is all planned,
this part. OK. Everyone has that handout, OK. Excellent.
So, one of the commitments that is included in the Obama
administration’s second open government National Action Plan
is the quoted item here, which was to pilot technology to
analyze (inaudible) presidential records. It specifically
calls out application to email records from the Reagan
administration, and it identifies the Central Intelligence
Agency and NARA as the responsible agencies. These agencies
brought in our research organization, the Center for Content
Understanding, to help with this work. The Center for
Content Understanding is part of the Applied Research
Laboratories at the University of Texas at Austin. ARL is
established as a university-affiliated research center, or a
UARC, and we’re formally associated with the Navy, but we
work with organizations throughout the government. All
UARCs have defined, as part of their charter, a set of core
competencies which are identified as the central
capabilities for the US government. And in 2012, one of ARL’s core competencies was identified as content
understanding, based on a growing body of work in that area
that we had accomplished. And the Center for Content
Understanding was formally established at that time. So,
what is content understanding? The dictionary definitions
would indicate that it’s comprehension of something
contained. In the field of content understanding, the
containers are artifacts that people create as a part of
their work, or their daily lives. And the content in these
is the information that’s encoded. We determine whether
this information is understood by assessing actions that are
taken on the information. So, when a person observes an
artifact -- let me get the -- when a person observes an
artifact, we hypothesize that they combine things that they
already know about the world with the information in the
artifact, and create some meaning. But, even for a person,
we can’t directly observe that meaning that’s inside their
head, and tell if it’s right or wrong. So we rely on tests
of what they do with that information to assess
understanding. So, actions can be taken that demonstrate
understanding, and if people perform well on these, such as
making a correct decision on a multiple choice test, then we
say that that’s sufficient evidence of understanding.
In the same way, we assess whether a computer has content
understanding by looking at the actions it takes. So if the
computer can observe an artifact, and demonstrate
appropriate inferences, then we consider that as content
understanding. The main point of all this is that in
content understanding, we’re primarily concerned with having
computers do helpful things with artifacts like documents.
Which brings us to the application of decision support as it
applies to classification and declassification. So, we’re
faced with an exponentially growing volume of records, and
each of these must be initially classified, managed, and
ultimately reviewed for release. Manual efforts to perform
these functions are becoming overwhelmed, and technology can
help people perform these functions. Specifically,
automation can help humans work more efficiently by drawing
their attention to critical questions, and highlighting
items that it would take people a long time to scan for in
documents. It can also make humans more effective by
bringing to bear external information such as lists of names,
or projects that not every human has memorized. And it can
do this in a wide variety of topics across a number of organizational equities at the same time, so that review-critical information can be recognized and identified as
quickly as possible. So computers excel at time-consuming,
onerous tasks that people don’t like, and don’t necessarily
excel at. And they produce very consistent results. And
this allows humans to do the things they enjoy more, and the
things they’re better at, such as making complex review
judgments. The decision support technology that we’re
developing right now is targeted to identify all the
information in a document that’s relevant to a
classification or a review decision, and highlight this
information, and organize it for consideration by the human
reviewers. So this is an initial model that’s only targeted
toward decision support. Under this model, we would still
need the same human review staff, but we would need far fewer humans per document, which would address the volume problem.
The approach that we use for decision support is based on
marking up a document to indicate where the sensitive
information resides. The name of our approach is SCIM, for
sensitive content identification and marking. And it
essentially skims a document to identify all the rules or
categories that apply to the document. It not only
identifies the conclusions corresponding to these rules and
categories, but it also identifies the text from the
document that supports those rules. So, for example, in this document, rule one is identified as applying: it not only
says the rule applies, but it identifies the highlighted
yellow text from the document as support for this rule. So
this allows SCIM to provide a rationale for why it says that
rule applies. SCIM uses a combination of technologies to
achieve this goal. Natural language processing, or NLP, is
used to extract information from the document and put it
into a machine processable data format. In the process of
doing that, it extracts entities and events and
relationships from the document. Expert systems technology
is then used to apply if/then rules to the information
extracted, and determine whether it is sensitive or not.
Machine learning can also be applied: if you have a set of documents that are known to contain sensitive information that you are interested in finding, then it can build a model of those and then identify new documents that are similar. And what ties all this together is the common
semantic knowledge representation that allows us to encode
background information and make inferences. There are a
number of organizations that do good work in this area. And
what is unique about this SCIM approach is that we combine
all these technologies together and we specifically
configure it to identify information of interest.
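
To make the combination just described concrete, here is a minimal sketch in Python of the SCIM pattern as outlined above: a stand-in extraction layer, if/then rules over the extracted events, and a rationale tied back to the supporting text span. Every name, rule, and vocabulary item here is an invented illustration, not the actual SCIM implementation.

    # Illustrative sketch only -- invented names and rules, not actual SCIM code.
    import re
    from dataclasses import dataclass

    @dataclass
    class Finding:
        rule_id: str      # which encoded rule fired
        reason: str       # human-readable rationale
        span: tuple       # (start, end) offsets of the supporting text

    # Stand-in for the NLP layer: map surface words to concepts. A real system
    # would use entity/event extraction over a semantic knowledge base.
    CONCEPTS = {"earthquake": "SEISMIC_EVENT", "quake": "SEISMIC_EVENT"}
    ASIAN_LOCATIONS = {"china", "japan", "nepal"}

    def extract(text):
        """Yield (concept, nearby_location, span) for each concept word found."""
        lowered = text.lower()
        for m in re.finditer(r"[a-z]+", lowered):
            concept = CONCEPTS.get(m.group())
            if concept:
                window = lowered[max(0, m.start() - 60):m.end() + 60]
                loc = next((l for l in ASIAN_LOCATIONS if l in window), None)
                yield concept, loc, (m.start(), m.end())

    # Stand-in for the expert-system layer: an if/then rule over extractions.
    def rule_seismic_asia(concept, loc, span):
        if concept == "SEISMIC_EVENT" and loc is not None:
            return Finding("RULE-1", f"seismic event in Asia ({loc})", span)

    RULES = [rule_seismic_asia]

    def scim(text):
        """Skim a document; return every rule that applies, with its rationale."""
        return [f for ev in extract(text) for rule in RULES if (f := rule(*ev))]

    print(scim("A 6.9 magnitude quake struck western China on Tuesday."))
    print(scim("Try this earthquake cake recipe with chocolate chunks."))

The design point the sketch preserves is that a finding is never just a label: each fired rule carries the offsets of the text that triggered it, which is what lets a reviewer check the tool’s reasoning.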
So here are some examples, which may be a little difficult
to read from the back of the room, but I’ll point out the
critical areas. Here are some examples of sensitive content
that SCIM can identify. In this example, the information
that’s deemed sensitive is, for demonstration purposes,
identified to be any discussions of a seismic event in Asia.
So we clearly want to see and review the document on the
left, which talks about an earthquake in China, but we’re
not so much interested in recipes for earthquake cakes on
the right. So, being able to distinguish between the concept of earthquake as a seismic event and the word earthquake is the key. And most tools that
reviewers have to use are focused on text string searches
that don’t distinguish between these two instances of
earthquake. So, using NLP technology, we are able -- we’d
be able to pick up on the word earthquake only when it means
the seismic event. This approach of identifying the concept
also allows us to pick up the bottom example where the word
quake is used to reference a seismic event, but this would
be missed in a text string search for earthquake. So, we’re
able to find the concepts, we’re also able to identify
specific cases where the concepts are sensitive. So in this
example, if the earthquake occurred in Europe, that would
not be deemed sensitive under this configuration, even
though the correct earthquake concept is discussed.
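
As a quick illustration of the distinction drawn above, compare a plain text string search with a concept-plus-context match. This is a toy stand-in, not the real NLP stack, and the word lists are invented:

    # Toy contrast: substring search vs. concept matching with a location test.
    import re

    SEISMIC_WORDS = {"earthquake", "quake", "tremor"}   # concept vocabulary
    BAKING_CUES = {"cake", "recipe", "frosting"}        # crude sense filter
    ASIA = {"china", "japan", "nepal"}

    def tokens(text):
        return set(re.findall(r"[a-z]+", text.lower()))

    def string_search(text):
        # What a dirty-word search does: match the literal string anywhere.
        return "earthquake" in text.lower()

    def concept_match(text):
        words = tokens(text)
        is_seismic = bool(words & SEISMIC_WORDS) and not (words & BAKING_CUES)
        return is_seismic and bool(words & ASIA)   # sensitive only if in Asia

    docs = [
        "An earthquake struck Sichuan, China, overnight.",  # both methods flag
        "Best earthquake cake recipe ever.",                # string search only
        "The quake near Kathmandu, Nepal, toppled walls.",  # concept match only
        "A small earthquake shook central Italy.",          # right concept, wrong region
    ]
    for d in docs:
        print(f"string={string_search(d)!s:<5}  concept={concept_match(d)!s:<5}  {d}")

Under a configuration like the one described above, only the first and third documents would be routed to a reviewer.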
So over the years, we’ve applied the SCIM approach in three
major types of configurations, which I’ll talk about in
detail for the remainder of the presentation. The first
application was a proof of concept to help people determine
what portion marking for classification to apply to a paragraph in a new document. In this case, the
sensitive information that we identified was something that
would relate to information associated with a rule in a
classification guide. The second application was to support
quality assurance review for declassification, and in this
case, the sensitive information was things that reviewers
had identified as things they wanted to take another look at
in this QA process. And the third application is underway
now, and this is targeted toward identifying equity
information across multiple agencies in the government,
where the sensitive information in this case is identified
by each government agency as the equity or the information
that they deem may be in need of protection. Each of these three pilot efforts uses the exact same reasoning (inaudible) back end. Really, the only difference among the
applications is how it’s presented to the user, and
configuration for what is considered sensitive. This
diagram visualizes that similarity. SCIM is really designed
as a service. So it takes in the information from a
document, and it provides the marked up information back.
So if you can get text to it, then it can provide this
information that then can be used in the user’s normal
workflow on a user interface or for a sorting algorithm to
help the user do their job. I’ll discuss specific examples
of these types of user interfaces and configurations as I walk
through the pilot projects. The first pilot project was
designed to support portion marking as a decision support.
In this case, the user interface was a document authoring
tool, like a word processor, and each paragraph was
processed by SCIM, and SCIM would suggest all the rules that
applied from an encoded set of derivative classification
guides. As typical in the SCIM approach, not only would the
suggested classification be identified, but it would also
identify which rule from the classification guide applied, as well
as why -- the rationale for why it applied from the text.
These were presented to the user, and they could accept or
override this suggestion, and once the selection was made,
the user interface would apply the selective portion mark.
In this particular application, we didn’t just present
information to the user and allow them to select; there was also direct feedback to the SCIM service that would allow
users to define or clarify terms on the fly and make
suggestions for improving SCIM performance in the future.
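
The shape of that workflow can be sketched in a few lines. The guide rules, triggers, and markings below are hypothetical, not taken from any real classification guide; the point is the accept-or-override step and the feedback that flows back for future improvement:

    # Hypothetical sketch of the portion-marking assist described above.
    GUIDE = [
        ("G-1.4c", "launch schedule", "(S)"),   # (rule id, trigger, marking)
        ("G-1.4a", "budget figure", "(C)"),
    ]

    feedback_log = []   # override feedback sent back to the SCIM service

    def suggest_marking(paragraph):
        """Return (marking, rationale) from the first guide rule that fires."""
        for rule_id, trigger, marking in GUIDE:
            if trigger in paragraph.lower():
                return marking, f"rule {rule_id}: matched {trigger!r}"
        return "(U)", "no guide rule applied"

    def mark_paragraph(paragraph, user_choice=None):
        suggested, why = suggest_marking(paragraph)
        final = user_choice or suggested        # author may accept or override
        if user_choice and user_choice != suggested:
            feedback_log.append((paragraph[:40], suggested, user_choice))
        return f"{final} {paragraph}    [suggested {suggested}; {why}]"

    print(mark_paragraph("The launch schedule slips to March."))
    print(mark_paragraph("Weather was mild this quarter.", user_choice="(U)"))
    print(feedback_log)   # empty here: the user choice matched the suggestion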
So we learned a number of things from this initial pilot.
First of all, we did achieve extremely high accuracy on the
test cases that we used. Since this was such a small set of
test cases, we couldn’t claim this performance in general,
but it was a highly successful proof of the concept. We
also ran into some challenges. First of all, identifying
what the right answer was that the computer should provide
was actually fairly difficult. We had the test documents,
and they had the portion marks in them, which described what
the classification level was. But we needed to know more
than the classification level. We needed to know why, you
know, which rule from which guide makes this classified, and
where’s the text that says that rule applies? So, we went
to the subject matter experts and said please tell us these
things. And this would have been fine, except subject
matter experts know how to classify, but like most people
who know how to do things, they just know. And when you ask
them to explain how exactly they know, you know, people find
that difficult to do. And there’s also some, you know,
debate amongst the subject matter experts about what
specifically was the rationale for making these
classification decisions? So, this leads to the next lesson
that we learned, which is that since classification guidance
is written to be interpreted by humans, it often lacks the
specificity and the precision that a computer needs to make
a determination. Finally, the thing that really shifted --
or kind of brought this to a close is that we were
ultimately not able to justify access to additional test
data. The test data that we had used was from publications
that were classified, like journal articles or newsletters.
And the need to know issues had kind of already been
resolved by that publication. But ultimately, we weren’t
able to justify a broader access to classified documents,
just to do this research. So then we turned to
declassification, because in that application, the primary
mission is the review, and the need for access to documents was clear.
So, in the second application, we provided decision support for a quality assurance process for declassification. Once manual review
is complete in this process, the documents are sent to the
SCIM service, which marks up the sensitive information that
warrants another look in the quality assurance process. In
this particular quality assurance process, the documents
selected for review in the quality assurance phase are not a
random selection. They were all the documents that contain
this particular sensitive information, that they wanted to
double check. Before the SCIM application was deployed, the
way they were selecting these documents was to use a list of
dirty words and they would select a page if it contained any
of those dirty words. We were able to take the SCIM output
and feed it into a user interface that the reviewers had
already been using for review -- the dirty word user interface. So, if a document -- if earthquake was
one of the words on the old dirty word list, and we found
earthquake in one of the highlighted areas of support in the
document, then we present that page to the user. But if the
document talked about earthquake cakes, and earthquake
wasn’t in any of the highlighted sensitive content, then we
wouldn’t present that to the user. We were also able to
update the user interface to highlight the important
context information that we found when we were -- when these
rules fired. And that sped up the decision process for the
reviewers. They qualitatively felt like this feature made
them much more efficient in the review for the documents
that were presented to them.
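
The page-selection logic just described, and the kind of precision/recall arithmetic quantified in the next paragraph, can be sketched as follows. The dirty words and counts are illustrative stand-ins; the counts are rough readings of the figures quoted below, not reported results:

    # Illustrative only: invented words and counts, not the pilot's real data.
    DIRTY_WORDS = {"earthquake"}

    def dirty_word_select(page_text):
        """Old approach: select a page if any dirty word appears anywhere."""
        return any(w in page_text.lower() for w in DIRTY_WORDS)

    def scim_select(sensitive_spans):
        """New approach: select a page only if SCIM highlighted some text on
        it as support for a sensitivity rule."""
        return len(sensitive_spans) > 0

    page = "Directions for an earthquake cake: mix, then drop the pan."
    print(dirty_word_select(page))           # True  -- a false positive
    print(scim_select(sensitive_spans=[]))   # False -- no rule fired here

    # Rough check against the figures quoted below: ~160,000 pages, ~8,000
    # truly in scope, the dirty-word list selecting nearly two thirds of
    # everything while missing a couple hundred pages that used alternate
    # wording such as "quake".
    def precision_recall(selected, relevant, hits):
        return hits / selected, hits / relevant

    p, r = precision_recall(selected=106_000, relevant=8_000, hits=7_800)
    print(f"dirty-word list: precision ~{p:.1%}, recall ~{r:.1%}")

Holding recall (the green area) steady while shrinking the selected set (the white area) is exactly the improvement described next.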
In terms of quantifying the efficiency improvements that we
achieved in terms of page selection, we ran a test
comparison between how the dirty word selection did versus
the SCIM selection on a set of about 160,000 test pages. So
we define the ideal performance, or (inaudible) here, where
out of all the 160,000 pages, only about 8,000 of them
contained this information that they wanted to subject to
the quality assurance process. When we ran the dirty word
list against these pages, it selected a huge number of pages
for quality assurance review. That’s almost two thirds of
the pages that were in the collection overall. But the good
news is that the dirty word selection did select the ones
that they wanted to see, as shown by that green area that’s
still there. The bad news is that it provided a ton of
extra work for the reviewers looking at earthquake cakes
when all they wanted to see were seismic events called
earthquakes. Also, the dirty word list missed a few -- a very small number, less than 200 pages. And those are instances where
alternate terminology like quake was being used. So, our
goal in this application was to decrease this white area
significantly, give the reviewers less work, not miss any of
the -- keep the green area at least the same, and if
possible, reduce that red area as well. And that’s
essentially what we were able to do. Specifically, we
significantly reduced the unnecessary work that the
reviewers had to do, by reducing the false positives. We
also didn’t miss any of the previously correct pages, and we
were able to identify alternate terminology and pick up some
of the pages that the dirty word list was previously
missing. So we kept the green area the same, and we found
96% of the previously missed pages. So overall, we
were really happy with this effort, and the reviewers seem
to like it, too. This work -- this is the work that we
extended to apply to the Reagan emails. The concept that
we’re working toward for the presidential emails is for
equity ID, to provide the ability to identify multiple
agencies equity in a collection of documents at the same
time. Presidential records are likely to contain equities
from multiple organizations, and individual documents within
those are also likely to mix information from multiple
agencies. If we can identify those equities accurately with
automation, we can potentially make the referral process
better and faster. And once the sensitive information is
identified, this could be passed along to the individual
agencies and help speed up their review process, as well.
We had to do some initial work on the emails before we could
begin testing them for equity ID. The emails came from
backup tapes that were preserved at the end of the Reagan
administration from an email system called PROFS, and a set
of about 80,000 emails were preserved and restored.
Unfortunately, the email format for these restored emails was very difficult to read. The emails were linked together in one long bit stream, in which it was very difficult to tell where one email ended and another began. So the initial part of our effort with these emails
was to convert them into a usable data format, and identify
the threading relationships among the emails. We ultimately
created human readable image formats of these emails that we
could then use to put through the formal review process that
currently exists. We completed all of those processing
tasks, and delivered the emails back to NARA, and at that
point, we could apply SCIM to test out the equity ID
concept. We were able to demonstrate that proof of concept
to the board last September, and the initial results were
encouraging. Qualitatively, the sensitive information
that we identified did seem to correctly pick out things
that warranted referral. The formal review of the emails is
currently underway. And this manual review will provide us
ground truth for which emails contain (inaudible), so that we can assess the performance of the
SCIM tool. While this is ongoing, we’re working with the
subject matter experts to improve the SCIM tool’s
performance, and add additional equities to the coverage.
This fall, we will take the ground truth that the reviewers
have produced to that point, and we will quantify and
validate SCIM’s performance. By the end of the year, we
hope to document and quantify success for this concept. And
that will wrap up our efforts.
At this point, I’d like to credit my wonderful staff of
researchers and software engineers who make all this
technology work, and acknowledge the reviewers who have
helped us out, as well as the organizations who funded and
supported this work over the years. And I’d like to thank
the board for advocating technology to help with these important classification and declassification decisions, and
having me here to speak today. Thank you all for listening,
and I hope that you find this work encouraging.
(applause)
SODERBERG: Well thank you very much, Dr. Martin. We will
have time for questions at the end of the discussion, if
those of you want to address something directly to Dr.
Martin. And as I said, we’re very impressed with these
breakthroughs, and we really want to see a continued
partnership between the CCU and the agencies involved, so
that this great work is carried on and filters through the
rest of the challenges on declassification in
particular. We’re now going to hear from the PIDB board
members and emeritus members. If I could invite everyone up
here, we’ll hear from various members who would like to
comment, and I think we’re going to actually start with
Laura. So I’d invite all the board members up to
(inaudible). Thanks. Thank you very much. Yeah,
(inaudible). It doesn’t matter at this point. Go ahead,
sit down.
M: Thank you.
DEBONIS: Good morning, everyone. Is this on? Hello, can you
hear me? OK. Hi, I just wanted to say a couple of things.
First of all, it’s a true privilege to serve on this board,
and I’m honored by the trust the president and his
administration have put in me by appointing me to it. I’d
also like to say thank you to Cheryl and A-Mac for your really interesting presentations this morning, which were
incredibly informative and so interesting. I would like to
take this opportunity to say a heartfelt thanks to my fellow
board members, and the PIDB staff for their warm welcome.
Everyone has been so helpful to my process of getting up to
speed, and I look forward to working with each of you. I
just have a couple of remarks, and then I’ll pass it on to
Sol. As we start our work together, I’m hopeful that my
professional background in technology and information
businesses will prove helpful to our areas of concern and
focus. In particular, I look forward to bringing the
benefit of my experiences, particularly my time at Google on
the book search, and with the libraries that participated in
that project, as well as my work subsequently on the Digital Public Library of America. On a personal note, issues of
information management and questions of access and usability
of information have interested me for a long time. Like
many of the people in this room. I grew up in a small town
in southern New Hampshire, and haunted my local public
library as a kid. The access that little library and its
devoted librarians provided me to a wide range of books and
information is central to who I am, and what I’ve been able
to do in my life. As I get more up to speed on the work of
the -- I think you were saying it PDIB, not PIDB, PDIB, I
hope to contribute meaningfully, particularly in the area of
technology and technology applications. We had a very
useful meeting of the technology working group last week,
with a broad range of agency participants. It was exciting
to see how much work is already being done, and I look
forward to future meetings of the working group. Serving on
this board is a very exciting opportunity, and I look
forward to working with the broader community of agencies
and public interest groups that are engaged with these
critical issues. I feel tremendously privileged to be
working for and on behalf of the public as a member of this
board. Thank you very much.
WATSON: Hello, I’m Solomon Watson. And as a first step, I’d
like to thank President Obama for appointing me to this very
important board. It’s a privilege and honor to serve the
country in this area of declassification and classification.
Nothing is more important to the country than maintaining
its national security. Equally important is ensuring that
members of the public understand how the government
operates in that area. Along with Laura, I’d like to thank
the members of the board and the Information Security Oversight Office for welcoming us aboard. My general
interest in government operations and national security came
into focus in 1971, when the New York Times and later the
Washington Post published, over government objections, the
so-called Pentagon Papers. You may recall that those
papers came about as a study authorized by Robert McNamara.
Those papers indicated more than the national security interest: a historical interest in the involvement of our government in the affairs of Vietnam, including the political affairs, and the -- obviously the conduct of the
Vietnam War. The publication of the papers resulted in a
great decision for freedom of the press, and the public’s
right to know. They also increased a growing skepticism
about allegations from the government using national
security as a defense, or a classification modality, to hide
political decisions. I spent most of my career as a lawyer
in The New York Times Company legal department, and there,
one of our primary obligations was to give legal advice to
members of the newsroom when they requested it, on legal
implications of publications of stories, frequently stories
that came about as a result of leaks of classified
information to our newsroom. As a member of the board for
three months, this is my first public meeting, and I must
say, I’m very excited about it. It’s been an exciting and
successful meeting. It appears to me from my other
meetings, executive meetings, and this public meeting, that
there’s a widely held recognition of the need and the
willingness to go forward in this area of classification,
and declassification. While there are a number of challenges, particularly on the technology side,
it appears to me that there’s a collaborative and collegial
43
effort among the stakeholders, including the intelligence
community, NGOs, and citizens generally, to make great
progress and I think the board has shown that as a convener
of communities, and inspirational organization, that it has
a continuing and important role to play. I’m certain that I
will contribute my efforts as a citizen interested in public
information, and as a former executive of The New York Times
Company to those efforts. Thank you.
SKAGGS: Good morning. I'm David Skaggs. As the recovering
politician on the board, I'm authorized both to be
pretentious and to engage in awkward metaphors, so bear
with me. But, Marty and Bill and I were onboard, so to
speak, from the get-go, and are now transitioning into
emeritus status, but thankfully are able to come back and
kibitz a bit on the work of the board, and I hope to continue
to make contributions, but it’s been a great privilege for
me to serve on this board for however many years it’s been
now since we got stood up. I think I’m here because of my
time on the intelligence committee in the House, serving
with then regular member Nancy Pelosi, and making a
somewhat questionable reputation for having a fetish about
overclassification. So it was interesting to probe during
closed hearings about the sources and rationale for
classification decisions that were in documents presented to
the committee. And so, I’ve been paying the price for that
role now for many years. But enjoying it all -- but the
pretentiousness comes just from, you know, it’s so easy to
sort of lapse into dealing with the grassroots of the
business of government, and losing track of, you know, what
we’re all about, and particularly as a representative of the
legislative branch of the government, if you will, the
uniqueness of American political philosophy, and its
origins, and maybe still, as a system in which the people
are the sovereign. And we are all accountable to and need
to bear in mind our accountability to the public, and that
can only happen if it has access to the information that its
government develops in its name, as much as we can possibly
effectuate. So, that’s -- you know, I sort of see this
board as in a critical role in that fundamental
responsibility of the democracy. And so, it’s -- you know,
you get into the nuts and bolts of classification decisions
and sometimes forget about that. So I -- and it’s so
important for us to have these regular public meetings to
remind us and you that that’s what this is about. The
awkward metaphor, which I won’t dwell on too long, because
it just occurred to me this morning, is that classified
information is sort of the cholesterol of the government
vascular system. And this board is trying to do stents, and
get rid of plaque, and get the system flowing well. Bill
thought we could talk about a different organic system of
the human body that would be less pleasant, but we won’t go
there.
LEARY: Former military (inaudible).
SKAGGS: Right, right, right. So, we're hoping that the stents
will avoid the need for, you know, triple bypass for
government. Finally, one of the things that’s been a happy
occasion for me in this job, in the executive branch of
government, is to be reminded about the faithful and
extraordinarily diligent service of civil servants who are
often derided by my former colleagues on the Hill. But do
the work of this nation day in and day out. So this is a
callout for John Powers, who’s in the audience, and who was
on the staff of ISOO for many years, and helped this
organization do its work. So John, stand up. And accept
our thanks.
(applause)
WATSON: And you’re invited to lunch. (laughter)
STUDEMAN: Can you hear? I’m Bill Studeman, I’m one of the new
emeritus individuals going off the board after nine years as
a Congressional appointee. So I’ve been on the board for a
very long period of time, through the three major reports,
and as part of my departure homework assignment, carrying
over into the emeritus environment, I’ve been asked to chair
the technology working group, which Laura referred to
before. And we’ve actually already had one meeting. And so
I thought I’d talk a little bit about some of the philosophy
behind that, and then where we think the technology working
group will be going. I'm sort of a subscriber to the notion
of management by nagging, and I think pretty much what the
PIDB does is it does nagging and facilitation to try to get
the government to move in the right policy direction. And
of course, now, as we move into the digital era, this
technology underpinning, which A-Mac talked about this
morning, is really critical to this entire future of
managing classified records. The old system that we’ve had,
which we’ve written about in reports, is not a sustainable
system, in my own personal view. And we’ve said that. The
volume, the velocity, the veracity, the nature and character
of digital records is going to present to the
declassifiers a dramatically different world. We entered
that world actually 25 years ago, I was the director of NSA
when we fought Desert Storm. That's 24 and a half years ago,
coming up now on the 25-year declassification point. We fought that
war in a digital way, so it was really the first digital
war. That said, the permanent records are analog, and so
they obviously had to be converted from digital form to text
record form. And the irony of that is, of course, we've now
specified going back to digital format for emails in 2016,
and then after that, fully digital records. So the records in this
period of time will go from digital to analog, and
ultimately back to digital, at no small cost to the process.
So this is an exciting period of having to look at this last
25 years, and the implications of it, and also in that
period of time, of course, the Information Age exploded, and
the media on which these records were kept have all aged,
and disappeared, and there’s a whole series of issues about
even finding the records from that period of time, if they
weren’t put to paper. So, I think that my message is that
this technology working group is going to work in several
areas. One is obviously, the issue we talked about this
morning, the search for applicable tools and technology that
can help with the whole panoply of issues associated with
managing classified information and records. So this is not
just declassification tools, but this is understanding the
architectures, and the environments in which these records
are kept. And of course, as you’re aware, virtually
everybody’s going to the cloud, the cloud will probably be
the dominant architecture for the future, it offers real
opportunities for storage, for large application stores, and
search techniques, and other aids to dealing with
information in a big data environment that we’ve never had
before. So, we will be looking at where those tools might
be. That means that we’re going to have to go probably out
to universities beyond the national -- the laboratory
project that’s going on right now, and also go onto the
public/private side to the information masters in Silicon
Valley and elsewhere, who have the technology that can help
us along in this area. And our job is really to try to
ensure that the government agencies who have classified
holdings are in fact paying attention to this, in sort of
the nagging mode, the facilitating mode, that is the way of life
in the PIDB.
The next thing we need to do is to track where these
agencies actually are in the rollout of their own future
architectures. So, we have -- in this first technology
working group meeting, we went into the intel community and
had a deep dive into the ICITE project, the intelligence
community IT enterprise upgrade. We did a deep dive on the
Archives' electronic records program. We're looking for
convergence and divergence, and trying to facilitate
understanding on the part of everybody, about where
everybody else is going. We had a large contingent of OSD
people there, representing 42 agencies, departments,
services, etc., in the Department of Defense, that hold
classified records. They remind us that 75% of the
classified records are in the Department of Defense, and
there are plenty of issues over there, including the
issues on (inaudible) between OSD and the Department of
Energy on RD/FRD. So there's a whole series of important
issues that we hope to hear more about from them in the future
about where the Department of Defense is going. Of course,
you recognize that these new architectures are coming
out essentially as a sort of chapeau on top
of hundreds and hundreds of systems that lie underneath
them. And that’s really where a lot of these records are or
are going to be. So, there’s some issue about resolving all
that. Yeah. So, three different things. The search for
technology, looking at the existing architectures, and
trying to help with some facilitation among the
holders. We'll look at State next, Energy, etc., to try to get
them involved. And then, look at the state of records and
the issues that are associated with that. And as you know
from our earlier studies, as we try to move from the as-is
to the to-be, for which there needs to be some kind of
strategy, overarching strategy, which is reflected in
policy, we need to be careful about divining some core
principles for declassification, and identifying the
specific issues that exist in that transition from
essentially the analog era to the future digital era, where we can
have a lot of support. And then, we can move into some of
the objective areas where we're looking at things like
early declassification, inside the 25-year point. And so,
this should be an exciting time. I was struck, as the
presentation was being given on the CCU, that the people who
do declassification can no longer be just policy and
sociological declassifiers. Without the technology people
to work the software and provide all the required
implementations that deal with all of that, there will not
be a future in declassification. So, you have to have the
technology people added to the people who are doing the job
right now, who can do all that configuration management, and
all the other kinds of things that are going to be required.
So this is a significant challenge.
One final thing I would say also is this is all being done
in a down budget environment. And the down budget
environment means that we have to organize the collective of
intelligence declassifiers, a collective of defense
classifiers, etc., into a more common kind of framework, so
that there’s information sharing, really understanding
across everybody about what everybody else is doing. So, we
have this huge task of trying to ride the wave of IT for the
future, which offers all this promise, and to organize for
success, so we can get some economy of scale out of the
whole, keeping in mind that when I came on this board nine
years ago, we were just introducing declassifiers in the
intelligence community to each other. So, I think we’ve
actually come a long way in that period of time. And so,
the challenge ahead for the board is to sustain and
accelerate around the objectives for this technology working
group, which I think is going to be the core part of our
effort, as technology relates to the policy for the future.
So thank you very much.
FAGA: I’m Marty Faga. As was said, I joined the board at its
inception in 2006, because Bill Leary appointed me with the
support of the president. And I think he did that because
I’m a person who actually declassified something, which was
the existence of the NRO that I announced publicly in 1992,
after a classified existence of 31 years, a few years of
which it was actually secret. (laughter) A point of which I
was able to convince the DCI, the Secretary of Defense, and
ultimately the president, in 1992. I'll observe that 23
years later, there are people in the NRO who still criticize
me for that. I’ve always been interested in
declassification because I served in the 1980s, on the House
Intelligence Committee. And sat on the sideline in the
staff seats, as all the contemporary history of intelligence
was being presented to the committee, year after year after
year, all being carefully recorded, verbatim, in a
classified record, and thinking, what an incredible story
for the American people to hear at an appropriate moment of
declassification. You know, virtually the whole history of
not only intelligence, but all that intelligence learned
about foreign affairs and military affairs. As a
technologist, I’ve always been interested in the kind of
work that Dr. Martin is doing. And in this digital age, I
understand that it will be imperative for doing
classification and declassification, and in fact,
intelligence analysis. We've been pushing this
for a long time. One of the concerns that public interest
groups have expressed is that we were going to go to total
automation, and the human brain, human decision making,
would no longer be involved. You said it very well, it’s a
decision aid, a decision support aid. I saw some early work
in this almost 30 years ago in map making, which stuck with
me as it brought together technology and a skilled analyst,
that made that analyst vastly more productive, and increased
the interest content of the work that she was doing. Thank
you very much.
STUDEMAN: Just a couple of brief comments about the CCU project
that you heard about this morning. Which impressed all of
us enormously when we got a much more detailed briefing from
Cheryl in Texas. I think certainly, we’re convinced. I
think the CIA is convinced that this concept has been
proved. It works. And what is most striking about the work
they have done is that they have shown that not only do
these approaches, these techniques, these technologies,
improve the efficiency of declassification reviewers and
classifiers, they improve the quality of their work, just as
much as they improve the efficiency of their work. You get
better results. So our great hope, and our concern, is to make sure
that these proved concepts get applied and used, as soon as
possible. It’ll be a big leap for most of the
classification community to trust the computer to make the
right decision. We’re going to have to get to that, and the
sooner, the better. One final point, I think the public
interest groups and the audience ought to be as impressed as
we are with these potential tools, these real tools,
potentially applicable. And I hope they’ll use their
influence whatever way they can to ensure that the funding
for this project continues. As we will try to do as well.
SODERBERG: We’re going to open it up for public comments in a
moment. I want to just echo what Admiral Studeman said on
the technology committee that he's driving. I think it's
going to do more than just put pressure on things; it's going to
really open people's eyes to the possibilities. And as Bill
Leary said, the only way forward in addressing the issue of
classification and declassification is technology. When I
started looking at this some time ago, I thought we’d have
to convince declassifiers to be less risk-averse and embrace
technology. It's the other way around: it's less risky to
use technology, because humans make mistakes. As Cheryl
said, machines can do this ad infinitum, while we get tired
and make mistakes. So, it actually is more accurate, less
risky, more efficient, and cheaper. And it's the wave
of the future. So again, thank you for being ahead of the
curve on all of this. Bill Leary and I wanted to just take
a moment to recognize our fellow members, and emeritus
members, and then we’ll open it up for questions. So Bill
will lead that off.
LEARY: I’ll start by asking Laura and Sol to come get your
presidential commissions. I talked earlier about the
sterling qualifications that they both have for this job.
Oh, one thing in particular you both have that the rest of
us almost completely do not have: you made your
careers outside of government, not inside. So you will
bring a very useful perspective, in addition to all your
expertise to this undertaking. These are your commissions,
signed by the president, for this important undertaking. We
look forward to working with you. Thank you for being with
us to do it.
(applause)
SODERBERG: I was going to embarrass John Powers before David
Skaggs did it, but I have to add a commendation to John.
John has actually left the archives, but has gone over to a
more powerful position where he can continue to help us, as
director of access management at the National Security
Staff, working with John Ficklin, who’s hiding over there in
a chair. John, maybe a wave, too. But really John, thank
-- all of us, we’re going to embarrass you a little bit
further this afternoon, but thank you. You leave a big hole
over here, but I know your heart’s moved into a bigger place
to help us even more. So, thank you for that. I wanted to
just take a minute and acknowledge and recognize the service
of Bill Studeman, David Skaggs, and Marty Faga, who’ve
really been the heart and soul of the PIDB, and you’ve each
completed your third and final term as members, and this is our
last public meeting with you, so we wanted to just take a
minute and commend your work, and give you a little present.
All three of you became members in the early 2000s, as
you’ve mentioned, when the board was in its infancy, and I
think you’ve helped shape it and define it into the force
that it is today. You’ve helped write all the reports and
recommendations to date, and as one of the newer members to
this board, your guidance has been extraordinary. And each
of the reports that we've put out reflects your heart and
soul. Your passion and advocacy for transparency and
responsible declassification, and your thought-provoking
comments, have really, I think, made an indelible mark on our
efforts to change government. You spent many, many hours of
dedicated service, not only on this board, but in other
public service roles and jobs, and leave a lasting mark on
this country. In addition to his years in the Navy, Admiral
Studeman was, where I first met you actually, the
director of the National Security Agency, and the
deputy director of the CIA, as well as an acting director
for a little bit. David Skaggs served 12 years in the
House of Representatives, from Colorado’s second district,
where his heart always is. And he spent six years on the
House Permanent Select Committee on Intelligence, so he has
been an invaluable voice of wisdom, reason, and prodding on
many occasions that I think we’ve needed. And of course,
Marty Faga was the 10th director of the National
Reconnaissance Office, which we can now, as he mentioned,
publicly talk about. And I think that set the stage for
recognizing the public's right to know that the NRO existed.
So, these are just a small way of saying thank you, but we’d
like to have you come, and we just have a small gift for
you. And with the assistance of the archivist, who I guess
just left, (laughter) we’ve made reproductions of the seven
samples of secret ink that you can take with you. And this
is one of my favorite things. This report is dated October
30, 1917, and it was classified as confidential for many
years. It details descriptions of secret writing
techniques. In April 2011, the CIA finally
declassified this information and made it public, and so
this is thought to be the oldest classified record held by
the government, and it was created in 1917. We thought that
this would be an appropriate remembrance of your time here.
And with that, I just wanted to thank each and every one of
you for your dedicated service, and here’s a little gift for
you.
(applause)
M: (inaudible).
M: I don’t know whether any of you have seen the junk flying
around town, an email about -- well, (inaudible) CIA has for
its important points on its documents. It’s a black
highlighter, so it’s (inaudible). (laughter)
SODERBERG: OK. Now we’ve finally gotten to the reason that
you’re all here, is for the public comments. We’d like very
much to hear from all of you. We have about a half an hour
to take questions. You can direct them at any of us sitting
up here, Dr. Martin, and I will cut you off if you turn it
into a speech, so thank you.
F: (inaudible).
AFTERGOOD: Hi, I'm Steve Aftergood with the Federation of
American Scientists. I wanted to caution against putting
the technology cart ahead of the policy horse. I was struck
by Dr. Martin’s remark that the classification guidance that
she receives is often lacking in specificity and clarity of
the kind that’s needed for the computer. I think it’s also
lacking in the clarity that’s needed for human classifiers,
and it accounts for much of the overclassification that
takes place. I don’t want to identify a problem without
proposing a solution, and I think there is a potential
solution in the upcoming fundamental classification guidance
review that is already required by executive order. Under
the terms of that review, every classification instruction
in every one of the thousands of classification guides
throughout government is supposed to be reviewed by
executive branch agencies. And I would suggest that this is
an opportunity to refine that classification guidance and to
give it the kind of clarity and specificity that Dr. Martin
needs, and that the rest of us expect. If we have vague,
confusing guidance, of the kind that I think we do have
today, then automating its application is just going to
create chaos. So, Admiral Studeman talked about nagging,
and Bill Leary suggested agenda items for public interest
groups. I would like to suggest an agenda item for the
PIDB, that you do some nagging about the most effective
possible implementation of this upcoming fundamental
classification guidance review. Thanks.
STUDEMAN: Yeah, good point. Just -- this is (inaudible) your
remarks highlight.
SODERBERG: Here, use the microphone.
STUDEMAN: Of the way in which the inexorable move to greater use
of technology will force agencies to refine their
classification guidance, because as you say, it won’t work
unless that happens.
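[Illustration, not part of the spoken record: to make concrete what machine-usable guidance could look like, here is a minimal Python sketch of a classification guide whose instructions are testable patterns rather than vague phrases. The rules, topics, and patterns are invented for this sketch and are not drawn from any actual agency guide.

    import re
    from dataclasses import dataclass

    @dataclass
    class GuideRule:
        topic: str          # what the instruction covers
        pattern: str        # a testable trigger, not a vague phrase
        level: str          # resulting classification level
        declass_year: int   # an explicit declassification date

    # Hypothetical rules; a real guide's contents may themselves be classified.
    RULES = [
        GuideRule("budget figures for Program X",
                  r"\$\d+(\.\d+)?\s*(million|billion)", "SECRET", 2035),
        GuideRule("project code names",
                  r"\bPROJECT\s+[A-Z]{4,}\b", "CONFIDENTIAL", 2030),
    ]

    def evaluate(portion):
        """Return the rules a text portion triggers; none means no guide hit."""
        return [r for r in RULES if re.search(r.pattern, portion)]

    for rule in evaluate("The program received $4.2 million last year."):
        print(f"{rule.level} per rule on {rule.topic}; declassify {rule.declass_year}")

Guidance written at this level of specificity can be applied by a computer or a human reviewer alike; guidance that cannot be reduced to a testable condition is the kind Mr. Aftergood warns will produce chaos when automated.]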
CONNELLY: My name is Matthew Connelly, I’m at Columbia
University. I work with colleagues in computer science and
mathematics on a project we call the Declassification
Engine. And I was very interested to hear Dr. Martin’s
presentation. I have a three-parter. One is, a number of
people have pointed out how obviously, humans also make
errors. There’s a project in the UK called Project Abaca,
where people from the National Archives have looked at the
error rate, the intercoder unreliability, how it is that
humans looking at the same document will redact different
things, or withhold different documents. So is there any
research, or any plans for research, to establish the
baseline? So we can know, you know, what is the error rate,
you know, when humans are reviewing documents for
declassification? I think that would be quite useful in
advancing this argument for the need for technology. The
second part has to do with the presentation itself. So, I
had some difficulty just evaluating some of the research, and
I would love to know more, so I’m wondering if there are
plans for publication of some of the more specific aspects
of what methods you used, and what kind of results you were
getting. So is the code going to become open source? You
know, are there plans for publication? I hope, anyway, that
this will become a research field for data scientists, but I
think for that to happen, they would have to know more about
the kind of data you’re using, and what kind of results
you’re getting. The last part is about the funding. So, it
would be great, right, if there was more support for
research in this field. My project is funded by the
MacArthur Foundation. I understand a couple of years ago
that DARPA put out a request for proposals for research in
this very area, but none of the proposals were funded. So
I’m wondering if there’s any prospect for funding in this
area to support research on automation of declassification,
for people outside of government?
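[Illustration, not part of the spoken record: baselines of the kind Project Abaca examined are typically reported as intercoder agreement. Below is a minimal Python sketch of Cohen's kappa for two reviewers making release/withhold calls on the same documents; the decisions are invented for illustration.

    from collections import Counter

    def cohens_kappa(a, b):
        """Agreement between two raters, corrected for chance agreement."""
        n = len(a)
        observed = sum(x == y for x, y in zip(a, b)) / n
        # chance agreement from each rater's marginal label frequencies
        ca, cb = Counter(a), Counter(b)
        expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
        return (observed - expected) / (1 - expected)

    reviewer_1 = ["release", "withhold", "release", "release", "withhold", "release"]
    reviewer_2 = ["release", "release", "release", "withhold", "withhold", "release"]
    print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.2f}")  # 0.25 here

A kappa near 1 means reviewers agree almost perfectly once chance is accounted for; figures in this range would quantify the human error rate the question asks about.]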
SODERBERG: Cheryl, do you want to respond?
MARTIN: Sure. Let me go over in this corner. So we haven’t
actually done any quantification of how much better the
results are using this technology. We have a general
feeling that the reviewers are able to do a better job,
because a lot of the meticulousness, the attention they
require, is taken out of the equation for them, and they’re
able to focus better. But those are all kind of qualitative
feelings, we don’t have a baseline of before and after. And
at this point, we just don’t have enough resources to do a
long-term study like this. I agree that it would be
beneficial to have that before and after data. But
currently, we have no resources to dedicate towards that, and we
have not previously done it. In terms of plans for
publication, and making the
code open: the technologies that we use are already open
source. In fact, most of the fundamental (inaudible)
processing and expert systems are free, because we like free
things. Really, the things that we're adding to the
technology are the configuration for the sensitive information. And
that gets pretty quickly into what’s classified. So, in
terms of making the infrastructure that ties the
technologies together open source, that could be something
that happens, but it takes some effort to maintain an open
source project and kind of service it. And that's just
not something we've taken on; you know,
it's just a resource constraint on my staff. The third
question, I don’t have anything to comment on for that.
FITZPATRICK: Thank you, Cheryl. And thank you Matt for the
question. My name’s John Fitzpatrick, I’m the director of
the Information Security Oversight Office here at NARA, and
the executive secretary; we provide all of the staff support
to the board. So, let me take the question about resources,
and say they’re absolutely the questions that need to be
asked, but sort of where we are in the movie right now with
regard to proving the concept. I think we’ve -- as Bill
said, we’re there. Now, what do you do with the proven
concept? And how does the government accept that this is a
possibility and place in its action plan doing something
about that possibility? So, part of the board’s purpose is
to -- I like to say -- put wind in the sails of others' efforts.
Admiral Studeman says management by nagging, I think those
are both the same thing. And we are in a moment, if you
will, here, by taking the board’s plans to promote and then
report on this need for the purpose of getting movement at
the government level for a program to do those things.
Where the understanding of how much can be done openly and
how much needs to be done in a classified environment can
be parsed in programmatic terms; where the requirements for
attention on this exist in intelligence, in presidential libraries,
in defense, or in other civilian agencies; where all of
those constituencies perform declassification in a stand-
alone way that is barely connected to each other -- how do we take
that and make a government program out of it? All the
government agencies doing it do it, but we don’t do it in a
unified or an integrated program yet with resources,
technology, and a strategy that brings those together. So,
we’re trying to get that to happen now, I think the presence
of the open government program, the chief technology officer
of the United States, and the National Action Plan
commitments that progressively drive us down the lane
towards doing something, that’s where we’re hoping to take
it. So, I would say watch this space, and -- but we have to
go from -- we’re going from ground up. We’ve proven the
concept, now we’ve got to get the traction inside the OMB
and other places to make something of it. Thanks.
SODERBERG: I seem to remember, when we were at Texas, and
Cheryl, correct me if I’m wrong, that there was a
differential of 15 to 20% on the mistake rate. That it was,
you know, 80% were correct, and it brought it up to 95% when
you’re using the technology. Am I misremembering that?
MARTIN: No, that was something that we could see in the
classification [portion marking?]. And we did have a
quantification for that small set, so that’s true. In a
larger sense (overlapping dialogue; inaudible). Ambassador
Soderberg is correct, we did have some comparison of before
and after for classification portion marking, where we had
looked at what people had done for the small sets of test
cases, and compared it to the ultimate accuracy that the
computer was able to perform. And it was about 80% for
people in the small test cases. And in terms of accuracy
for the human classifiers, and that’s going to vary by
person. But ultimately, the computer was able to achieve
almost a 90%, or almost 100, 98% accuracy rate. So, we do
have a general feeling in terms of the portion marking. We
don’t have quantitative results for, you know, actually
finding mistakes in declassification. That’s just not
something we’ve really pursued as a large study to look for
those types of errors that humans may be making. And that’s
just a resource focus, I think.
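[Illustration, not part of the spoken record: a sketch of how per-portion accuracy figures of the kind discussed above could be computed, comparing each reviewer's markings against an adjudicated answer key. All markings below are invented; the 80% and 98% figures come from the small test sets Dr. Martin describes, not from this example.

    def accuracy(predicted, gold):
        """Fraction of portions marked the same as the adjudicated key."""
        return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

    gold_key = ["(U)", "(C)", "(S)", "(U)", "(S)"]
    human    = ["(U)", "(U)", "(S)", "(U)", "(C)"]   # two misses
    machine  = ["(U)", "(C)", "(S)", "(U)", "(S)"]   # matches the key

    print(f"human:   {accuracy(human, gold_key):.0%}")    # 60%
    print(f"machine: {accuracy(machine, gold_key):.0%}")  # 100%
]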
SODERBERG: I have a couple of hands up. One here and one
there, and John Ficklin, you may not want to do this, but if
you want to -- I’ll give you a minute to think about it, if
you want to comment on the classification review that the
gentleman from Columbia was talking about, if you want to
fill us in on where that stands, I’ll invite you to do that
after the next two comments, or you can pass. So, why don’t
we start there and then -- back row.
BENDER: Michael Bender, Air Force Declassification Office.
There’s still a large volume of paper records to be reviewed
in College Park, and at Suitland, and I’m wondering, does
the board feel that automation would be used for these
analog records with the attendant requirements for scanning
and OCR? Or do you feel that automation should be reserved
for the more digital records?
SODERBERG: Thank you.
STUDEMAN: Our expert here in the front row, (inaudible) Cooper,
stand up, Eric. (inaudible) spend -- yeah.
M: I have a mic.
STUDEMAN: Pushing this project, on behalf of the CIA.
M: The digitization of the paper is actually very expensive.
And because we would have to then OCR the paper and work with somewhat
dirty text, the error rate would get much higher. I
don’t think that the technology we’re proposing is a
solution for paper, simply because of the processing that
would be required on the paper in order to apply the machine
concepts to it.
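[Illustration, not part of the spoken record: a rough Python sketch of why dirty OCR text raises the error rate. If each character is misread independently with probability p, the chance that an exact search term survives intact falls off quickly with term length. The error rates here are illustrative assumptions, not measured values.

    def survival(term_len, char_error_rate):
        """Probability an exact-match search term comes through OCR unscathed."""
        return (1 - char_error_rate) ** term_len

    for p in (0.01, 0.05, 0.10):
        print(f"char error {p:.0%}: "
              f"10-char term survives {survival(10, p):.0%}, "
              f"25-char phrase {survival(25, p):.0%}")

At a 5% character error rate, a 25-character phrase survives exact matching only about 28% of the time, which illustrates why these machine concepts are a poor fit for scanned paper without substantial cleanup.]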
F: (inaudible) minutes. Is there another question?
SODERBERG: But one of the advantages of digitizing the
records is it frees up declassifiers to clear up some of the
backlog, too. So, it’s much faster, cheaper, more
efficient, if we could get this. The back row.
CLARKSON: Yeah, I just wanted to comment on the dream of
standardization of how humans --
SODERBERG: Can you introduce yourself?
CLARKSON: Oh, Charles Clarkson, sorry. How humans communicate,
I’m reminded of a story, a friend of mine were at NIH, and
trying to standardize how doctors communicate. And I said,
“You know, you getting a doctor to describe symptoms in
Boston the same way that a doctor in San Diego describes
it?” He says, “Hell, we can’t even get two hospitals in
Boston to agree on how to do this." And it's really one of
the fundamental challenges and weaknesses of the DATA Act,
trying to have transparency through consistent standards: even if
they can get agreement among federal agencies on
terms, they'll never get agreement among all the grantees
that money goes to. So, following up on Bill Leary’s
comment, technology is your only hope in terms of conceptual
data mining that can accommodate how asthma can be described
50,000 different ways. Humans will never agree in a moment.
Thank you.
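[Illustration, not part of the spoken record: a small Python sketch of the "conceptual data mining" Mr. Clarkson points to, mapping free-text variants onto a canonical concept rather than waiting for humans to standardize their terms. The stdlib difflib matcher is a crude stand-in for the statistical methods a real system would use, and the vocabulary is invented.

    from difflib import get_close_matches

    CANONICAL = {
        "asthma": ["asthma", "asthmatic condition", "reactive airway disease"],
    }

    def normalize(phrase):
        """Return the canonical concept a free-text phrase most likely denotes."""
        for concept, variants in CANONICAL.items():
            if get_close_matches(phrase.lower(), variants, n=1, cutoff=0.6):
                return concept
        return None

    for text in ("Asthma", "asthmatic condition", "azthma"):
        print(f"{text!r} -> {normalize(text)}")
]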
F: (inaudible).
SODERBERG: Do I see any other comments? John, did you want
to talk about what was going on there?
FICKLIN: Good morning, I'm John Ficklin, senior director for
records and access management on the National Security
Council staff. Of course, Mr. Aftergood is correct. We do
need to make sure that we get the classification guides
correct. Certainly if the classification guides are
correct, we can apply that to this technology, and really be
resourceful. I do have to tell you, there was one other
comment that I’ve heard multiple times today, and it’s
really a top priority for me, and that’s funding. Money.
This all costs money. And I’m committed to do what I can to
try to ensure that we do get the money we need to move this
project forward. Thank you.
SODERBERG: John?
FITZPATRICK: Yeah, if -- I’d like to also refer back to Steve’s
comment about the fundamental classification guidance
review, so that we don’t leave something hanging. This is
-- he’s absolutely right, this is an opportunity built into
the executive order. We have been once through the
executive branch on this, with reports that were completed
in the 2012 timeframe, and have learned some lessons,
largely owing to the observations of the public, and
analysts like Steve, who have looked at that and said, you
know, "so what?" And how can we get more meaningful "so whats"
out of that? And so, Steve and I have had a number of
conversations on this topic, and we have taken those to
other folks in the business area of classification
management. Our office will issue guidance to agencies to
get them started in the next cycle. And so, in addition to
the input that we've gotten from Steve -- and what he has,
I'll say, hopefully put ideas in your head to think about
as well -- we welcome your input on how meaningful
outcomes could come from that review, beyond the
ones that we've talked about here. We'll continue to talk
about it with the board, and ISOO and the classification
management activities we do in the interagency are already
contemplating how to pick the best tasks for agencies in the
conduct of this review, so that we get optimal output on the
back end. So I didn’t -- I wanted to acknowledge Steve’s
remark, but also say yes, in fact, that dialogue is engaged,
and it is something that we welcome input on as we always
do. Thanks.
SODERBERG: If I -- going once, any more? Let me pass --
thank you all for coming. I’m just going to pass the mic
back to Bill Leary, who will close the meeting.
LEARY: Yes, thank you all for coming. We encourage you to
continue to send comments, suggestions, complaints. We have
a blog called “Transforming Classification,” where you can
supply those comments. We urge you to use it, because we do
take seriously the word public in our title, and in our
mission. Thank you all for coming.
(applause)
F: Thank you.
(crowd chatter, inaudible)
END OF AUDIO FILE