PIDB meeting 6_25_15
(crowd chatter, inaudible)
LEARY: Good morning. Good morning, no? All right, somebody
who -- it says on. Hmm? Who knows how the mic -- oh, OK.
Ah, there we go. Good morning. My name is Bill Leary, and
I’m happy to be able to welcome you to the latest in the
occasional, if irregular, public meetings of the Public
Interest Declassification Board. We’re always amazed how
many of you turn out when we schedule one of those meetings,
and we’re delighted. I know that most of you know what the
PIDB is, but just to refresh your memory and for the benefit
of any newcomers who may be here, the Public Interest
Declassification Board was established by Congress. Its
members are appointed by the president and the leadership of
Congress, and we have two broad, very complementary
missions. Our first mission is to promote the fullest,
promptest access to the classified record of the United
States government, which is, of course, a large part of the
history of the national security and foreign policy of the
United States. Our second, very complementary mission, is
to advise the president and the rest of the executive branch
on how to improve the process of classification and
declassification, in order to better accomplish that first
overarching objective. Our meeting this morning is going to
focus on what we think are some really rather exciting
developments related to that second objective, how to
improve the process. And also, we’re going to talk a bit
about our plans for our next study, our next project, to try
to come up with some recommendations for encouraging greater
use of technology to aid in the process of declassification.
My first task, my first very happy task this morning, is to
welcome the two newest members of the Public Interest
Declassification Board. Laura DeBonis, Laura, why don’t you
stand, and Sol Watson. Laura has over 20 years’ experience
in the information technology and media fields. She
currently serves as a founding board member for the Digital
Public Library of America, an organization devoted to
creating an open network of online resources, from
libraries, archives, and museums, and making them freely
available to all. Sounds very pertinent to our mission.
Her professional experience includes a variety of leadership
roles at Google, including her last position there as
director of library partnerships for book search. Welcome,
Laura.
Sol Watson has a long and distinguished career at The New
York Times Company, beginning in 1974, and he retired, I
think, as a senior vice president and chief legal officer of
The New York Times Company. Sol has also been a special
master in the appellate division of the New York State
Supreme Court, and is a member of the American Bar
Association, the National Bar Association, and the
Association of the Bar of the City of New York. From 1966
to 1968, he served in the US Army as a lieutenant in the Military Police Corps. Welcome, Sol. Now I want to yield
the podium to Ambassador Nancy Soderberg, who will walk us
through the rest of this morning’s program.
SODERBERG: Well, good morning everyone, and thank you for
coming, and thank you Bill Leary for opening us up for
what’s going to be I think an exciting day, and particularly
glad to have Laura and Sol with us as full members of this
great team. I think we’re going to have a really
informative session, an interesting discussion, comments
from the public, our very distinguished guests that we’re
having. And the purpose of this meeting is to continue our
advocacy for the transformation and modernization of the
classification and declassification system. Simply put, it
is not workable under the current system, and needs
technology in order to meet the public’s right to know what
its government does. Our last supplemental report, which
you can get on our website, “Setting Priorities: An
Essential Step in Transforming Declassification,” revisits
one of our recommendations to the president for
transformation. And that’s the focus of today’s discussion,
which is to encourage the development and the use of
existing and new technologies to assist those declassifying
and classifying information at the agencies and the National
Declassification Center. And this morning, we’re excited to
hear from our distinguished speakers about what strides they have made and are seeking to make in support of our
recommendation. And once again, we’re delighted to have our
wonderful friend and our distinguished archivist of the
United States, David Ferriero, join us as our host. There
is no better supporter of our work than David. And he’s
been a longtime advocate for advancing access initiatives
within government. As archivist of the United States, David
is a leader in fostering policies to support a more open and
transparent government. His record demonstrates that. He
encourages the movement of government and the National
Archives from the analog age to the digital information age.
And he recognizes the need to design new processes and
policies to ensure citizen access to records of our
government. And I’m especially impressed with his many
successes in building partnerships that will greatly improve
public access to government information. So let me ask our
archivist, David Ferriero, to come up and say a few words.
Thank you very much.
FERRIERO: Thank you, Nancy, and good morning all. Welcome to my
house. And I’m extraordinarily proud to be the archivist of
the United States, leading the National Archives as we
strive to promote open government and transparency for the
benefit of our democracy. As caretakers of the Declaration
of Independence, the Constitution, and the Bill of Rights,
we hold the words “We the people” in high esteem, and take
seriously our responsibility to preserve and make available
the billions of government records we hold in trust for the
American people. “Innovate to Make Access Happen” is our
flagship open government initiative. We continue to take
actions to improve transparency, participation, and
collaboration in every aspect of the work we do here at the
National Archives, while embracing innovation and developing
best practices to carry out our mission for the benefit of
the American people. The Public Interest Declassification Board plays an important role in promoting
open government by continuing to advocate for policy
improvements that support greater public access to
government information of historical significance. The
members have repeatedly recognized publicly the growing
challenge facing the government agencies in today’s digital
information age, and the board has been a strong proponent
of modernizing antiquated policies and practices, often
inhibiting access to our records. The board’s December 2012
report, “Transforming the Security Classification System,”
described these challenges in detail and offered thoughtful
recommendations that, if implemented, will modernize and
improve information management overall, including the
expedited declassification of national security information.
The board’s 2014 supplemental report, “Setting Priorities,”
expanded on one element critical to transformation,
prioritizing records of historical significance for
declassification. And I’m pleased to say that the National
Archives Declassification Center has already begun the
process of reevaluating how it prioritizes reviewing records
for declassification.
After successfully retiring a backlog of over 351 million
pages of records, the NDC now has an opportunity to rethink
how it may improve its operations and prioritize records for
declassification review, so that the records most significant to the public are processed first. At the April 15th
NDC public forum, Director Sheryl Shenberger outlined next
steps in prioritization at the NDC, and heard comments from
public interest groups, scholars, historians, and advocates,
including board member Bill Leary. We heard proposals for
process improvement, and suggestions of records for
prioritized declassification review. I know the NDC will
consider and apply many of the recommendations made at the
public forum, and we intend to find innovative means to
improve upon our success thus far. Improving access to
historically significant records, however, requires more
than just finding a means to prioritize records for review,
as the board recognized in both of its reports. IN order to
innovate and make access happen, we must seek out
opportunities to integrate new and existing technology into
our information management practices. The board shares this
belief. It’s been a longstanding advocate for the increased
use of pilot projects in order to build partnerships across
agencies and reach our common goal of improving how the
government manages its information, both in declassification
and records management in general. These declassification
and records management policies and practices are inherently linked, and the board’s acknowledgement of
this important principle helped shape many of the
commitments found in the president’s second open government
national action plan. Today, I’m pleased to welcome Deputy
Chief Technology Officer of the United States, Mr. Alex
Macgillivray, to this public meeting. This is somewhat of a
reunion: Laura DeBonis; A-Mac, as he’s known in the industry; and I were joined at the hip during the Google
Book project, when I was at the New York Public Library. So
it’s great to have both of you in the room. A-Mac will
discuss the technology policy initiatives underway at the
Office of Science and Technology Policy at the White House.
His efforts to leverage technological talent and expertise
of individuals and teams across the government are critical
to modernizing records management, data management, and
declassification processes. I’m sure that you have -- he
will have important commentary on the newly established
United States Digital Service and its mission to transform
the way the government works for the American people.
I welcome research scientist Dr. Cheryl Martin from the
Center for Content Understanding, who has completed pilot
projects at the Applied Research Laboratory at the
University of Texas at Austin, on behalf of the National
Archives, and the CIA. We at the National Archives and the
CIA have partnered with Dr. Martin and her team in an effort
to find technological solutions to assist declassification
-- declassifiers in their decision making, to improve the
outcomes of reviews. Dr. Martin will outline these results
of those pilot projects, which to date, are the only pilot
projects at this level of sophistication in existence that
focus first and foremost on improving declassification and
access to government records. I look forward to hearing
more about the impressive achievements of these pilot
projects during Dr. Martin’s presentation.
We will learn today about the latest cutting edge
technological capabilities and modernized government
policies that support innovation. These advancements are
critical to our work at the National Archives, but the
uniqueness of our mission does not afford us the luxury of
only looking forward. As we prepare and work towards
solutions for managing digital records, we must also find
innovative and effective means to manage the billions of
pages of paper records still being created across all areas
of government. The sheer volume of information in need of
management, whether found in paper records or in digital
records, digitized records, or special media, will continue
to shape how we do our business. To this end, I’m
encouraged by the progress we at the National Archives, and
at agencies, have made under the direction of our chief
records officer, Paul Wester, in response to the president’s Managing Government Records Directive. As we
work in collaboration to modernize our government’s
information management practices overall, we must remember
to identify and understand the many facets to this
challenge, and view potential solutions from a high-level vantage point, working to make changes that are automated
and scalable to the benefit of all information users. I
want to thank PIDB, the agencies, the public interest
community, and everyone joining us here today for
contributing to this morning’s discussion. Critical to the
success of our transformation efforts is the continued open
dialogue we share with our stakeholders inside and outside
of government. This engagement is essential to help us
improve our services, and help us serve our democracy by
providing access to the highest value government records.
Thank you for your efforts and support, on our mission and
our work.
(applause)
SODERBERG: Thank you, David, and really, thank you so much
for your continued support of our work, as well as on behalf
of the archives. As David mentioned, we have a fantastic
lineup for you this morning. Both with Cheryl and Alex.
And we’re going to next hear from Alex Macgillivray, who’s
the Deputy Chief Technology Officer of the United States.
And on his first full day in office, which you can take full
credit for, the president created the US Chief Technology
Officer position within the White House Office of Science
and Technology Policy, to lead the administration-wide effort to
unleash the power of technology, data, and innovation to
help meet our nation’s goals and the needs of our citizens.
And Deputy CTO, A-Mac, I guess he’s called, focuses on a
portfolio of key priority areas for the administration,
including the intersection of Big Data, technology, and
privacy. He’s an internationally recognized expert in
technology law and policy, and prior to coming to the White
House, he served as general counsel and head of public
policy at none other than Twitter, from 2009 to 2013. And
he’s an actively practicing developer and coder,
contributing to his ability to formulate creative and
sensible technology policy, and understanding its
ramifications better than certainly I can, I’m sure. But
we’re excited to hear about your assessment of the new
information and technology needs of the government, and how
we can leverage technology talent to modernize records
management, data management, and declassification. So
welcome, A-Mac.
(applause)
MACGILLIVRAY: Thank you so much, ambassador, and thank you to
the PIDB. This is -- it’s wonderful for me to be here,
particularly wonderful to be sharing a stage with the
archivist who I admire so much both for his work here, and
for his work as a librarian at NYPL, MIT, and I think Duke
before that? But having negotiated with him while I was at
Google, I can also say that he can be quite a pit bull for
his particular cause. And he’s often right, which is
extremely annoying when you’re on the other side. But, I
would say that one of the reasons why this is a thrill for
me is the thing that’s motivated me most throughout my
career is access to information. And so, this particular,
both the archive and this board, really embody that, and embody it in a way that’s not -- it’s not trivial. There
are plenty of places where you can talk about access to
information, and there’s no downside, there’s no other
interest at issue. And this is a place where you’re really
dealing with where the rubber meets the road, and trying to
understand that tension, and get through it to actually get
to the access to information, which is extraordinarily
valuable. I was asked to talk a little bit about tech use
in government. And the administration’s commitment to open
government, so I’ll do that, and probably touch on a few
other things as well.
So the president, as the ambassador said, right from day
one, was focused on how do you bring more technology and
expertise into government? And that’s why the CTO’s office
was created, it was sort of a vestige from a very successful
campaign that changed the way campaigning was done, in terms
of bringing more technology understanding into how to run a
campaign, and get it through. But it was a thing that was a focus, but maybe not a principal focus over time,
until the healthcare.gov problems happened. And I think the
thing that healthcare.gov brought home for our
administration, I mean more than anything else, was this
idea that you couldn’t really do policy anymore in a vacuum
without understanding implementation, and particularly
without an understanding of that implementation in technology.
Obviously, the Affordable Care Act was law at that time, but
by itself it wasn’t going to be able to achieve its goal of
enrolling more Americans in healthcare. And so, the idea
that technology was going to be responsible for that, and so
we had to get the technology piece right, and that would
mean bringing in technologists, having them work on the
problem, get it over the finish line, but also bringing them
in earlier and earlier into these policy processes to have a
better marriage between that policy and technology goal, was
really important. And so, to that end, the thing that sort of the techies in government right now are focused on is really three principal areas.
So the first is that policy implementation, making
government services world-class. One way to think about
this is we have some of the best, most innovative
technologists within the United States, we have people
who’ve created Amazon, and Google, and Facebook, and
Twitter, and all these other great services that we rely on
every day, and we want to make sure that the government
websites are just as good, that the government services get
provided in just as much of an agile and technology-focused
and user-focused way. And so, there’s a whole bunch of
different people who are working on that, and I’m going to
go into the different people working on the different parts
in a moment.
Number two is really sort of the flip of that, how do you
bring more technology understanding into policy formation?
And so, that’s really one of the focuses of the discussion here today. If you have the types of
tools that Dr. Martin will be talking about at your
disposal, that might change the types of policies that you
can put in place, in terms of classification and declassification, and in terms of getting material out into
the public, and providing that access to information. And
that understanding of that interplay between the technology
and the policy is really important, and something that we’re
trying to push forward. And then, the third thing that is
in the broader tech use and government space is thinking
about the engagement between the American people and their
government. And trying to understand ways in which we can
use technology to change the way that engagement happens for
the better. So to make it so that we can have more of a conversation, so that we can ask more questions of the
American people, and have them give us answers that will
help us govern better. So that we can have people become
more engaged with their government, and make change within
their government in a more effective way. And so, there are
a bunch of people now working on that problem, and trying to
make it better. But it’s also something that the president
has been focused on since the beginning of his term, with
the launch of We the People, a campaign -- a way of allowing
ordinary citizens to bring questions to government, and get
answers.
So now, in terms of all of those different types of things
that we’re trying to do, I wanted to just bring you on a bit
of a tour through the people that are doing it, and the
organizations that they are working with. Because
sometimes, it’s a little bit hard to unpack that, and it is
useful in understanding how we think about it. So, it’s
everything from the US CIO, Tony Scott, who came from
VMware, who is responsible for government technology
generally, and is working across different government
agencies with excellent CIOs and staff within agencies, to
work on cross government problems, and to bring the best
technology into government. It’s people like Mikey
Dickerson at USDS. Mikey came from Google, where he was an
SRE, and SRE means site reliability engineer. Those
are the folks that make sure that a site like Google stays
up near 100% of the time. And so, he’s a great person to
bring into government, make sure our government services
have that same type of reliability. But that USDS focus, so US Digital Service, and that word, service, is important on a number of levels. First of all, it recognizes that we’re no longer, in government, just releasing products, where we release the product, and then it exists, great, people can use it, and we can walk away and do something else.
But we’re really talking about services here. Things that
will last over time. And the need to be updated and
iterated on and maintained as services. It’s also thinking
about service as that word, you know, the thing that brought
me into government. The ability to have purpose make
impact. One of the things that Mikey is doing very
effectively is winning recruiting battles against much
better funded offers from Silicon Valley companies, because
he’s able to appeal to an engineer’s sense of purpose. And
there really is no better way to have an impact, have a
really deep impact, on individual Americans, than working
within the federal government. So Mikey is working at, and
really pushing that out within the US Digital Services. The
other thing that the US Digital Services has done,
especially this year, is move out into agencies. So,
there’s now a VA Digital Service, and we will have other
digital services within agencies over time. Those are
groups that are working within agencies, bringing the
excellent staff that we have already at agencies, to bring
some of this new style of doing work. So for example,
instead of putting a requirement out and then working over the course of five years to launch something that goes live to the public at the five-year anniversary mark, they try to be quick and agile, and launch and iterate.
So being able to launch something, be able to develop it in
the open, and then get it out there over time. So that
there’s actually a better understanding of whether the
project’s going to be successful. And so that we can course
correct when we learn stuff in our implementation.
So that’s USDS. A companion piece to that is 18F. 18F is
within the GSA, General Services Administration, and 18F is
just a street address, 18th and F, it’s not some sort of top
secret thing. But, 18F has a bunch of coders, I think
they’re about 150, 170 strong, who are working on doing the
coding for services for the federal government. And one of the great things about 18F is that as they encounter
problems, there’s often this issue in government where if
nobody’s done something, you don’t really want to be the
first to do the thing, because there’s a bunch of different
costs that might come with that. And you won’t get to
internalize all of the benefits. There’s this free riding
problem, lots of different people will be able to
internalize the benefits, but you get to bear all the cost.
And 18F has, as part of their mission, actually doing some
of those first projects, so that they can show by example
here’s a way to use GitHub, and from the very beginning,
develop a FOIA project in an open way where people can
actually see what you’re coding in real time. And do some
of those experiments, and get them out there, but also to
produce running services and to improve the services that
government is offering. So that’s 18F. Another project
there is the Presidential Innovation Fellows. The way we
think of these is sort of as innovators and entrepreneurs in residence within agencies. These PIF classes, there have
now been three of them, I think we’re on our fourth. And
they basically come in as amazing people from all over
industry, academia, and nonprofit space, former government
people too, they come in, and then go back out to agencies
and try to stir things up a bit. And bring some of the --
those best-in-class processes and technologies back to agencies. On an even more operational level, we
brought in David Recordon. David was a Facebook engineer,
to be the Director of White House Information Technology.
The White House has the same problem as many agencies, in
terms of how do we modernize the technology that we use?
How do we make ourselves as effective as possible? So
making sure that we have people who are looking at that, and
who are best in breed.
And then finally, Jason Goldman, who was one of the founders
of Twitter, was brought in to lead the Office of Digital
Strategy, and he’s really leading that focus on engagement
with the American people. And doing that through the Office of Digital Strategy, which has already done a bunch of things,
including launching the @POTUS Twitter account, which you
could see even over last weekend, there’s a level of
engagement and just personal response that is different from
what we were able to do before. So that’s a really hopeful
thing. We also have Todd Park, the second US CTO, now in Silicon Valley, leading a recruiting effort. So, as I say,
we believe strongly that people are a major part of the
solution to these issues. So, bringing more and more
talented people within government is really what Todd’s all
about.
So with that, I’m going to jump to talking a little bit
about our open government work, and your open government
work. And I want to just point out, Cori Zarek, who is in
the audience, and should stand up so that she can be more
embarrassed. But Cori has been at the archives and is
(inaudible) to the CTO team, and is really leading our open
government efforts. And has an encyclopedic knowledge of
this stuff, and has been really pushing, and both Cori and
the National Archives have been real leaders when it comes
to making more information accessible to the public, and
getting it out there. So I just wanted to acknowledge and
thank that. So as you know, the open government initiative
was launched by the president -- he had a very busy first day in office, and this was another one of the things that he launched on his first day in office. We are working through
the open government directive, and getting the agencies on a
path to increase the amount of information, and the amount
of understanding that the public has for what government is
actually working on. It’s also something that has moved a
ton of data and information out into the public space where
other people, not the government, can produce everything
from the most trivial app to an important open government
monitor of something that we’re doing. There’s one that’s
out there that is top of my mind, which is just a thing that
shows when the different We the People petitions have been
answered, and holds us accountable for not answering the ones
that have been out there for a long time. So, all -- it’s
everything from the stuff that we would never have imagined,
that makes a huge difference in people’s lives, and at the
most grand scale, this is the -- NOAA releases a ton of data
that is used in all the different weather apps that are out
there. They’re very important, but making sure that we do a
lot more of that. And then another piece of this work is
working with the Open Government Partnership, which is a 65-country initiative that brings government and civil society
together across national boundaries, and making sure that
the United States continues to be a leader in open
government over time.
And then finally, the National Action Plans. We’re in the
process of formulating our third National Action Plan, the
previous two National Action Plans have been very
successful, including the formation of the declassification
board as one of the recommendations in the second National
Action Plan, I think. Am I getting that right, [Cori?]?
I’m getting it wrong.
F: Classification (inaudible).
MACGILLIVRAY: Classification and (inaudible) committee. Sorry
about that. See, this is the great thing about having Cori
actually in the audience. But there is always more to be
done within this space, and so one of the things that Cori
has been working really hard on is the National Action Plan
3.0, and we would be interested in hearing any suggestions
that people have for inclusion in that National Action Plan.
And making sure that we’re pushing as much as we can towards
continuing to make government more open and more responsive
to people. So with that, I will sit down, because I’m
really excited to hear about the technology that’s coming
up. And just thank you all for letting me speak.
(applause)
SODERBERG: Well thank you very much for that great summary.
It’s really extraordinary how government is changing. We’re
still behind the private sector, but catching up rapidly.
And I think we’re going to all benefit from the initiatives
that you’re leading at the White House, and we look forward
to continuing the conversation. Our next speaker is Dr.
Cheryl Martin, research scientist and director of the Center for Content Understanding at the Applied Research
Laboratories, located at the University of Texas. Dr.
Martin’s areas of expertise and list of accomplishments are
vast, and through her work at the Applied Research Lab,
she’s applied data mining, detection, and inference technologies to information assurance problems, including document declassification. The PIDB board had a chance to travel down, I guess it was last fall, to visit UT and saw
first-hand this revolutionary technology. And I think I
speak for all of the board members when I say that we were
deeply impressed with it. And this morning, Dr. Martin will
share with us how she’s using technologies such as semantic
knowledge models, natural language processing, expert
systems, and machine learning to categorize and label text.
And her recent work has been successful in automatically
determining whether documents contain sensitive information
that must be protected. And the pilot she’s conducted in
partnership with the National Archives and the Central
Intelligence Agency has significant impact for
declassification and other information management
activities. And as our reports have documented, it’s only
through technologies such as this that we are going to be
able to manage the vast amount of information that is now in
the government; it’s simply not sustainable under the “two eyes looking at every page” system. So, in order to have the public have access to what its government does, it has to be automated, and Dr. Martin has figured out a very effective
way of doing it. As far as we can tell, this technology is
the only one that has the level of sophistication operating
for the sole purpose of modernizing declassification and
classification. And we’re concerned that right now there’s
no plan to take it forward, and so we really hope that we
can find a way that you can continue, and even expand on
this important work. So thank you for coming, and let me
invite you up to talk about your exciting project.
(applause)
MARTIN: Thank you. I’d like to thank the board for inviting me
to speak today. It’s a real honor to be here. In this
presentation, I will first highlight some efforts under the
president’s National Action Plan, and I will introduce the
role that the Center for Content Understanding has in this
work. Then, I will define the field of content
understanding, and I will describe our approach for
sensitive content identification, and marking, to provide
decision support for classification and declassification.
The next thing I’ll do, finally, is walk through some of the
pilot projects we’ve been working on in this area, and I
will discuss specific progress that we’ve made with the
Reagan email collection. Do we need to work on some
logistics before we continue? Can people see the --
F: (inaudible).
MARTIN: OK. (inaudible). Except, eh... This is all planned,
this part. OK. Everyone has that handout, OK. Excellent.
So, one of the commitments that is included in the Obama
administration’s second open government National Action Plan
is the quoted item here, which was to pilot technology to
analyze (inaudible) presidential records. It specifically
calls out application to email records from the Reagan
administration, and it identifies the Central Intelligence
Agency and NARA as the responsible agencies. These agencies
brought in our research organization, the Center for Content
Understanding, to help with this work. The Center for
Content Understanding is part of the Applied Research
Laboratories at the University of Texas at Austin. ARL is
established as a university-affiliated research center, or a
UARC, and we’re formally associated with the Navy, but we
work with organizations throughout the government. All
UARCs have defined, as part of their charter, a set of core
competencies which are identified as the central
capabilities for the US government. And in 2012, one of ARL’s core competencies was identified as content
understanding, based on a growing body of work in that area
that we had accomplished. And the Center for Content
Understanding was formally established at that time. So,
what is content understanding? The dictionary definitions
would indicate that it’s comprehension of something
contained. In the field of content understanding, the
containers are artifacts that people create as a part of
their work, or their daily lives. And the content in these
is the information that’s encoded. We determine whether
this information is understood by assessing actions that are
taken on the information. So, when a person observes an
artifact -- let me get the -- when a person observes an
artifact, we hypothesize that they combine things that they
already know about the world with the information in the
artifact, and create some meaning. But, even for a person,
we can’t directly observe that meaning that’s inside their
head, and tell if it’s right or wrong. So we rely on tests
of what they do with that information to assess
understanding. So, actions can be taken that demonstrate
understanding, and if people perform well on these, such as
making a correct decision on a multiple choice test, then we
say that that’s sufficient evidence of understanding.
In the same way, we assess whether a computer has content
understanding by looking at the actions it takes. So if the
computer can observe an artifact, and demonstrate
appropriate inferences, then we consider that as content
understanding. The main point of all this is that in
content understanding, we’re primarily concerned with having
computers do helpful things with artifacts like documents.
Which brings us to the application of decision support as it
applies to classification and declassification. So, we’re
faced with an exponentially growing volume of records, and
each of these must be initially classified, managed, and
ultimately reviewed for release. Manual efforts to perform
these functions are becoming overwhelmed, and technology can
help people perform these functions. Specifically,
automation can help humans work more efficiently by drawing
their attention to critical questions, and highlighting
items that it would take people a long time to scan for in
documents. It can also make humans more effective by
bringing to bear external information such as lists of names,
or projects that not every human has memorized. And it can
do this in a wide variety of topics across a number of organizational equities at the same time, so that review-critical information can be recognized and identified as
quickly as possible. So computers excel at time-consuming,
onerous tasks that people don’t like, and don’t necessarily
excel at. And they produce very consistent results. And
this allows humans to do the things they enjoy more, and the
things they’re better at, such as making complex review
judgments. The decision support technology that we’re
developing right now is targeted to identify all the
information in a document that’s relevant to a
classification or a review decision, and highlight this
information, and organize it for consideration by the human
reviewers. So this is an initial model that’s only targeted
toward decision support. Under this model, we would still
need the same human review staff, but we would need far fewer humans per document, which would address the volume problem.
The approach that we use for decision support is based on
marking up a document to indicate where the sensitive
information resides. The name of our approach is SCIM, for
sensitive content identification and marking. And it
essentially skims a document to identify all the rules or
categories that apply to the document. It not only
identifies the conclusions corresponding to these rules and
categories, but it also identifies the text from the
document that supports those rules. So, for example, in this document, rule one is identified as applying: it not only
says the rule applies, but it identifies the highlighted
yellow text from the document as support for this rule. So
this allows SCIM to provide a rationale for why it says that
rule applies. SCIM uses a combination of technologies to
achieve this goal. Natural language processing, or NLP, is
used to extract information from the document and put it
into a machine processable data format. In the process of
doing that, it extracts entities and events and
relationships from the document. Expert systems technology
is then used to apply if/then rules to the information
extracted, and determine whether it is sensitive or not.
Machine learning can also be applied: if you have a set of documents that are known to contain sensitive information that you are interested in finding, then it can build a model of those and then identify new documents that are similar. And what ties all this together is the common
semantic knowledge representation that allows us to encode
background information and make inferences. There are a
number of organizations that do good work in this area. And
what is unique about this SCIM approach is that we combine
all these technologies together and we specifically
configure it to identify information of interest.
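
To make the combination just described concrete, here is a minimal sketch in Python of the SCIM pattern as outlined above: a stand-in extraction layer, if/then rules over the extracted events, and a rationale tied back to the supporting text span. Every name, rule, and vocabulary item here is an invented illustration, not the actual SCIM implementation.

    # Illustrative sketch only -- invented names and rules, not actual SCIM code.
    import re
    from dataclasses import dataclass

    @dataclass
    class Finding:
        rule_id: str      # which encoded rule fired
        reason: str       # human-readable rationale
        span: tuple       # (start, end) offsets of the supporting text

    # Stand-in for the NLP layer: map surface words to concepts. A real system
    # would use entity/event extraction over a semantic knowledge base.
    CONCEPTS = {"earthquake": "SEISMIC_EVENT", "quake": "SEISMIC_EVENT"}
    ASIAN_LOCATIONS = {"china", "japan", "nepal"}

    def extract(text):
        """Yield (concept, nearby_location, span) for each concept word found."""
        lowered = text.lower()
        for m in re.finditer(r"[a-z]+", lowered):
            concept = CONCEPTS.get(m.group())
            if concept:
                window = lowered[max(0, m.start() - 60):m.end() + 60]
                loc = next((l for l in ASIAN_LOCATIONS if l in window), None)
                yield concept, loc, (m.start(), m.end())

    # Stand-in for the expert-system layer: an if/then rule over extractions.
    def rule_seismic_asia(concept, loc, span):
        if concept == "SEISMIC_EVENT" and loc is not None:
            return Finding("RULE-1", f"seismic event in Asia ({loc})", span)

    RULES = [rule_seismic_asia]

    def scim(text):
        """Skim a document; return every rule that applies, with its rationale."""
        return [f for ev in extract(text) for rule in RULES if (f := rule(*ev))]

    print(scim("A 6.9 magnitude quake struck western China on Tuesday."))
    print(scim("Try this earthquake cake recipe with chocolate chunks."))

The design point the sketch preserves is that a finding is never just a label: each fired rule carries the offsets of the text that triggered it, which is what lets a reviewer check the tool’s reasoning.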
So here are some examples, which may be a little difficult
to read from the back of the room, but I’ll point out the
critical areas. Here are some examples of sensitive content
that SCIM can identify. In this example, the information
that’s deemed sensitive is, for demonstration purposes,
identified to be any discussions of a seismic event in Asia.
So we clearly want to see and review the document on the
left, which talks about an earthquake in China, but we’re
not so much interested in recipes for earthquake cakes on
the right. So, being able to distinguish between the concept of earthquake as a seismic event and the word earthquake is the key. And most tools that
reviewers have to use are focused on text string searches
that don’t distinguish between these two instances of
earthquake. So, using NLP technology, we are able -- we’d
be able to pick up on the word earthquake only when it means
the seismic event. This approach of identifying the concept
also allows us to pick up the bottom example where the word
quake is used to reference a seismic event, but this would
be missed in a text string search for earthquake. So, we’re
able to find the concepts, we’re also able to identify
specific cases where the concepts are sensitive. So in this
example, if the earthquake occurred in Europe, that would
not be deemed sensitive under this configuration, even
though the correct earthquake concept is discussed.
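
As a quick illustration of the distinction drawn above, compare a plain text string search with a concept-plus-context match. This is a toy stand-in, not the real NLP stack, and the word lists are invented:

    # Toy contrast: substring search vs. concept matching with a location test.
    import re

    SEISMIC_WORDS = {"earthquake", "quake", "tremor"}   # concept vocabulary
    BAKING_CUES = {"cake", "recipe", "frosting"}        # crude sense filter
    ASIA = {"china", "japan", "nepal"}

    def tokens(text):
        return set(re.findall(r"[a-z]+", text.lower()))

    def string_search(text):
        # What a dirty-word search does: match the literal string anywhere.
        return "earthquake" in text.lower()

    def concept_match(text):
        words = tokens(text)
        is_seismic = bool(words & SEISMIC_WORDS) and not (words & BAKING_CUES)
        return is_seismic and bool(words & ASIA)   # sensitive only if in Asia

    docs = [
        "An earthquake struck Sichuan, China, overnight.",  # both methods flag
        "Best earthquake cake recipe ever.",                # string search only
        "The quake near Kathmandu, Nepal, toppled walls.",  # concept match only
        "A small earthquake shook central Italy.",          # right concept, wrong region
    ]
    for d in docs:
        print(f"string={string_search(d)!s:<5}  concept={concept_match(d)!s:<5}  {d}")

Under a configuration like the one described above, only the first and third documents would be routed to a reviewer.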
So over the years, we’ve applied the SCIM approach in three
major types of configurations, which I’ll talk about in
detail for the remainder of the presentation. The first
application was a proof of concept to help people determine
what portion marking for classification to apply to a paragraph in a new document. In this case, the
sensitive information that we identified was something that
would relate to information associated with a rule in a
classification guide. The second application was to support
quality assurance review for declassification, and in this
case, the sensitive information was things that reviewers
had identified as things they wanted to take another look at
in this QA process. And the third application is underway
now, and this is targeted toward identifying equity
information across multiple agencies in the government,
where the sensitive information in this case is identified
by each government agency as the equity or the information
that they deem may be in need of protection. Each of these three pilot efforts uses the exact same reasoning (inaudible) back end. Really, the only difference among the
applications is how it’s presented to the user, and
configuration for what is considered sensitive. This
diagram visualizes that similarity. SCIM is really designed
as a service. So it takes in the information from a
document, and it provides the marked up information back.
So if you can get text to it, then it can provide this
information that then can be used in the user’s normal
workflow on a user interface or for a sorting algorithm to
help the user do their job. I’ll discuss specific examples
of these types of user interfaces and configurations as I walk
through the pilot projects. The first pilot project was
designed to support portion marking as a decision support.
In this case, the user interface was a document authoring
tool, like a word processor, and each paragraph was
processed by SCIM, and SCIM would suggest all the rules that
applied from an encoded set of derivative classification
guides. As typical in the SCIM approach, not only would the
suggested classification be identified, but it would also
identify which rule from the classification guide applied, as well
as why -- the rationale for why it applied from the text.
These were presented to the user, and they could accept or
override this suggestion, and once the selection was made,
the user interface would apply the selective portion mark.
In this particular application, we didn’t just present
information to the user and allow them to select; there was also direct feedback to the SCIM service that would allow
users to define or clarify terms on the fly and make
suggestions for improving SCIM performance in the future.
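
The shape of that workflow can be sketched in a few lines. The guide rules, triggers, and markings below are hypothetical, not taken from any real classification guide; the point is the accept-or-override step and the feedback that flows back for future improvement:

    # Hypothetical sketch of the portion-marking assist described above.
    GUIDE = [
        ("G-1.4c", "launch schedule", "(S)"),   # (rule id, trigger, marking)
        ("G-1.4a", "budget figure", "(C)"),
    ]

    feedback_log = []   # override feedback sent back to the SCIM service

    def suggest_marking(paragraph):
        """Return (marking, rationale) from the first guide rule that fires."""
        for rule_id, trigger, marking in GUIDE:
            if trigger in paragraph.lower():
                return marking, f"rule {rule_id}: matched {trigger!r}"
        return "(U)", "no guide rule applied"

    def mark_paragraph(paragraph, user_choice=None):
        suggested, why = suggest_marking(paragraph)
        final = user_choice or suggested        # author may accept or override
        if user_choice and user_choice != suggested:
            feedback_log.append((paragraph[:40], suggested, user_choice))
        return f"{final} {paragraph}    [suggested {suggested}; {why}]"

    print(mark_paragraph("The launch schedule slips to March."))
    print(mark_paragraph("Weather was mild this quarter.", user_choice="(U)"))
    print(feedback_log)   # empty here: the user choice matched the suggestion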
So we learned a number of things from this initial pilot.
First of all, we did achieve extremely high accuracy on the
test cases that we used. Since this was such a small set of
test cases, we couldn’t claim this performance in general,
but it was a highly successful proof of the concept. We
also ran into some challenges. First of all, identifying
what the right answer was that the computer should provide
was actually fairly difficult. We had the test documents,
and they had the portion marks in them, which described what
the classification level was. But we needed to know more
than the classification level. We needed to know why, you
know, which rule from which guide makes this classified, and
where’s the text that says that rule applies? So, we went
to the subject matter experts and said please tell us these
things. And this would have been fine, except subject
matter experts know how to classify, but like most people
who know how to do things, they just know. And when you ask
them to explain how exactly they know, you know, people find
that difficult to do. And there’s also some, you know,
debate amongst the subject matter experts about what
specifically was the rationale for making these
classification decisions? So, this leads to the next lesson
that we learned, which is that since classification guidance
is written to be interpreted by humans, it often lacks the
specificity and the precision that a computer needs to make
a determination. Finally, the thing that really shifted --
or kind of brought this to a close is that we were
ultimately not able to justify access to additional test
data. The test data that we had used was from publications
that were classified, like journal articles or newsletters.
And the need to know issues had kind of already been
resolved by that publication. But ultimately, we weren’t
able to justify a broader access to classified documents,
just to do this research. So then we turned to
declassification, because in that application, the primary
mission is the review, and the need for access to documents was clear.
So, in the second application, we provided decision support for a quality assurance process for declassification. Once manual review
is complete in this process, the documents are sent to the
SCIM service, which marks up the sensitive information that
warrants another look in the quality assurance process. In
this particular quality assurance process, the documents
selected for review in the quality assurance phase are not a
random selection. They were all the documents that contain
this particular sensitive information, that they wanted to
double check. Before the SCIM application was deployed, the
way they were selecting these documents was to use a list of
dirty words and they would select a page if it contained any
of those dirty words. We were able to take the SCIM output
and feed it into a user interface that the reviewers had
already been using for review -- the dirty word user interface. So, if a document -- if earthquake was
one of the words on the old dirty word list, and we found
earthquake in one of the highlighted areas of support in the
document, then we present that page to the user. But if the
document talked about earthquake cakes, and earthquake
wasn’t in any of the highlighted sensitive content, then we
wouldn’t present that to the user. We were also able to
update the user interface to highlight the important
context information that we found when we were -- when these
rules fired. And that sped up the decision process for the
reviewers. They qualitatively felt like this feature made
them much more efficient in the review for the documents
that were presented to them.
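
The page-selection logic just described, and the kind of precision/recall arithmetic quantified in the next paragraph, can be sketched as follows. The dirty words and counts are illustrative stand-ins; the counts are rough readings of the figures quoted below, not reported results:

    # Illustrative only: invented words and counts, not the pilot's real data.
    DIRTY_WORDS = {"earthquake"}

    def dirty_word_select(page_text):
        """Old approach: select a page if any dirty word appears anywhere."""
        return any(w in page_text.lower() for w in DIRTY_WORDS)

    def scim_select(sensitive_spans):
        """New approach: select a page only if SCIM highlighted some text on
        it as support for a sensitivity rule."""
        return len(sensitive_spans) > 0

    page = "Directions for an earthquake cake: mix, then drop the pan."
    print(dirty_word_select(page))           # True  -- a false positive
    print(scim_select(sensitive_spans=[]))   # False -- no rule fired here

    # Rough check against the figures quoted below: ~160,000 pages, ~8,000
    # truly in scope, the dirty-word list selecting nearly two thirds of
    # everything while missing a couple hundred pages that used alternate
    # wording such as "quake".
    def precision_recall(selected, relevant, hits):
        return hits / selected, hits / relevant

    p, r = precision_recall(selected=106_000, relevant=8_000, hits=7_800)
    print(f"dirty-word list: precision ~{p:.1%}, recall ~{r:.1%}")

Holding recall (the green area) steady while shrinking the selected set (the white area) is exactly the improvement described next.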
In terms of quantifying the efficiency improvements that we
achieved in terms of page selection, we ran a test
comparison between how the dirty word selection did versus
the SCIM selection on a set of about 160,000 test pages. So
we define the ideal performance, or (inaudible) here, where
out of all the 160,000 pages, only about 8,000 of them
contained this information that they wanted to subject to
the quality assurance process. When we ran the dirty word
list against these pages, it selected a huge number of pages
for quality assurance review. That’s almost two thirds of
the pages that were in the collection overall. But the good
news is that the dirty word selection did select the ones
that they wanted to see, as shown by that green area that’s
still there. The bad news is that it provided a ton of
extra work for the reviewers looking at earthquake cakes
when all they wanted to see were seismic events called
earthquakes. Also, the dirty word list missed a few -- a very small number, less than 200 pages. And those are instances where
alternate terminology like quake was being used. So, our
goal in this application was to decrease this white area
significantly, give the reviewers less work, not miss any of
the -- keep the green area at least the same, and if
possible, reduce that red area as well. And that’s
essentially what we were able to do. Specifically, we
significantly reduced the unnecessary work that the
reviewers had to do, by reducing the false positives. We
also didn’t miss any of the previously correct pages, and we
were able to identify alternate terminology and pick up some
of the pages that the dirty word list was previously
missing. So we kept the green area the same, and we found
96% of the previously missed pages. So overall, we
were really happy with this effort, and the reviewers seem
to like it, too. This work -- this is the work that we
extended to apply to the Reagan emails. The concept that
we’re working toward for the presidential emails is for
equity ID, to provide the ability to identify multiple
agencies equity in a collection of documents at the same
time. Presidential records are likely to contain equities
from multiple organizations, and individual documents within
those are also likely to mix information from multiple
agencies. If we can identify those equities accurately with
automation, we can potentially make the referral process
better and faster. And once the sensitive information is
identified, this could be passed along to the individual
agencies and help speed up their review process, as well.
We had to do some initial work on the emails before we could
begin testing them for equity ID. The emails came from
backup tapes that were preserved at the end of the Reagan
administration from an email system called PROFS, and a set
of about 80,000 emails were preserved and restored.
Unfortunately, the email format for these restored emails was very difficult to read. The emails were linked together in one long bit stream, in which it was very difficult to tell where one email ended and another began. So the initial part of our effort with these emails
was to convert them into a usable data format, and identify
the threading relationships among the emails. We ultimately
created human readable image formats of these emails that we
could then use to put through the formal review process that
currently exists. We completed all of those processing
tasks, and delivered the emails back to NARA, and at that
point, we could apply SCIM to test out the equity ID
concept. We were able to demonstrate that proof of concept
to the board last September, and the initial results were
encouraging. Qualitatively, the sensitive information
that we identified did seem to correctly pick out things
that warranted referral. The formal review of the emails is
currently underway. And this manual review will provide us
ground truth for which emails contain (inaudible), so that we can assess the performance of the
SCIM tool. While this is ongoing, we’re working with the
subject matter experts to improve the SCIM tool’s
performance, and add additional equities to the coverage.
This fall, we will take the ground truth that the reviewers
have produced to that point, and we will quantify and
validate SCIM’s performance. By the end of the year, we
hope to document and quantify success for this concept. And
that will wrap up our efforts.
At this point, I’d like to credit my wonderful staff of
researchers and software engineers who make all this
technology work, and acknowledge the reviewers who have
helped us out, as well as the organizations who funded and
supported this work over the years. And I’d like to thank
the board for advocating technology to help with these important classification and declassification decisions, and
having me here to speak today. Thank you all for listening,
and I hope that you find this work encouraging.
(applause)
SODERBERG: Well thank you very much, Dr. Martin. We will
have time for questions at the end of the discussion, if
those of you want to address something directly to Dr.
Martin. And as I said, we’re very impressed with these
breakthroughs, and we really want to see a continued
partnership between the CCU and the agencies involved, so
that this great work is carried on and filters through the
rest of the challenges on declassification in
particular. We’re now going to hear from the PIDB board
members and emeritus members. If I could invite everyone up
here, we’ll hear from various members who would like to
comment, and I think we’re going to actually start with
Laura. So I’d invite all the board members up to
(inaudible). Thanks. Thank you very much. Yeah,
(inaudible). It doesn’t matter at this point. Go ahead,
sit down.
M: Thank you.
DEBONIS: Good morning, everyone. Is this on? Hello, can you
hear me? OK. Hi, I just wanted to say a couple of things.
First of all, it’s a true privilege to serve on this board,
and I’m honored by the trust the president and his
administration have put in me by appointing me to it. I’d
also like to say thank you to Cheryl and A-Mac for your really interesting presentations this morning, which were
incredibly informative and so interesting. I would like to
take this opportunity to say a heartfelt thanks to my fellow
board members, and the PIDB staff for their warm welcome.
Everyone has been so helpful to my process of getting up to
speed, and I look forward to working with each of you. I
just have a couple of remarks, and then I’ll pass it on to
Sol. As we start our work together, I’m hopeful that my
professional background in technology and information
businesses will prove helpful to our areas of concern and
focus. In particular, I look forward to bringing the
benefit of my experiences, particularly my time at Google on
the book search, and with the libraries that participated in
that project, as well as my work subsequently on the Digital Public Library of America. On a personal note, issues of
information management and questions of access and usability
of information have interested me for a long time. Like
many of the people in this room. I grew up in a small town
in southern New Hampshire, and haunted my local public
library as a kid. The access that little library and its
devoted librarians provided me to a wide range of books and
information is central to who I am, and what I’ve been able
to do in my life. As I get more up to speed on the work of
the -- I think you were saying it PDIB, not PIDB, PDIB, I
hope to contribute meaningfully, particularly in the area of
technology and technology applications. We had a very
useful meeting of the technology working group last week,
with a broad range of agency participants. It was exciting
to see how much work is already being done, and I look
forward to future meetings of the working group. Serving on
this board is a very exciting opportunity, and I look
forward to working with the broader community of agencies
and public interest groups that are engaged with these
critical issues. I feel tremendously privileged to be
working for and on behalf of the public as a member of this
board. Thank you very much.
WATSON: Hello, I’m Solomon Watson. And as a first step, I’d
like to thank President Obama for appointing me to this very
important board. It’s a privilege and honor to serve the
country in this area of declassification and classification.
Nothing is more important to the country than maintaining
its national security. Equally important is ensuring that
members of the public understand how the government
operates in that area. Along with Laura, I’d like to thank
the members of the board and the Information Security Oversight Office for welcoming us aboard. My general
interest in government operations and national security came
into focus in 1971, when the New York Times and later the
Washington Post published, over government objections, the
so-called Pentagon Papers. You may recall that those
papers came about as a study authorized by Robert McNamara.
Those papers indicated more than the national security interest: a historical interest in the involvement of our government in the affairs of Vietnam, including the political affairs, and the -- obviously the conduct of the
Vietnam War. The publication of the papers resulted in a
great decision for freedom of the press, and the public’s
right to know. They also increased a growing skepticism
about allegations from the government using national
security as a defense, or a classification modality, to hide
political decisions. I spent most of my career as a lawyer
in The New York Times Company legal department, and there,
one of our primary obligations was to give legal advice to
members of the newsroom when they requested it, on legal
implications of publications of stories, frequently stories
that came about as a result of leaks of classified
information to our newsroom. As a member of the board for
three months, this is my first public meeting, and I must
say, I’m very excited about it. It’s been an exciting and
successful meeting. It appears to me from my other
meetings, executive meetings, and this public meeting, that
there’s a widely held recognition of the need and the
willingness to go forward in this area of classification,
and declassification. While there are a number of challenges, particularly on the technology side,
it appears to me that there’s a collaborative and collegial
43
effort among the stakeholders, including the intelligence
community, NGOs, and citizens generally, to make great
progress and I think the board has shown that as a convener
of communities, and inspirational organization, that it has
a continuing and important role to play. I’m certain that I
will contribute my efforts as a citizen interested in public
information, and as a former executive of The New York Times
Company to those efforts. Thank you.
SKAGGS: Good morning. I'm David Skaggs. As the recovering
politician on the board, I'm authorized both to be
pretentious and to engage in awkward metaphors, so bear
with me. But, Marty and Bill and I were onboard, so to
speak, from the get-go, and are now transitioning into
emeritus status, but thankfully are able to come back and
kibitz a bit on the work of the board, and I hope to continue
to make contributions, but it’s been a great privilege for
me to serve on this board for however many years it’s been
now since we got stood up. I think I’m here because of my
time on the intelligence committee in the House, serving
with then regular member Nancy Pelosi, and making a
somewhat questionable reputation for having a fetish about
overclassification. So it was interesting to probe during
closed hearings about the sources and rationale for
classification decisions that were in documents presented to
the committee. And so, I’ve been paying the price for that
role now for many years. But enjoying it all -- but the
pretentiousness comes just from, you know, it’s so easy to
sort of lapse into dealing with the grassroots of the
business of government, and losing track of, you know, what
we’re all about, and particularly as a representative of the
legislative branch of the government, if you will, the
uniqueness of American political philosophy, and its
origins, and maybe still, as a system in which the people
are the sovereign. And we are all accountable to and need
to bear in mind our accountability to the public, and that
can only happen if it has access to the information that its
government develops in its name, as much as we can possibly
effectuate. So, that’s -- you know, I sort of see this
board as in a critical role in that fundamental
responsibility of the democracy. And so, it’s -- you know,
you get into the nuts and bolts of classification decisions
and sometimes forget about that. So I -- and it’s so
important for us to have these regular public meetings to
remind us and you that that’s what this is about. The
awkward metaphor, which I won’t dwell on too long, because
it just occurred to me this morning, is that classified
information is sort of the cholesterol of the government
vascular system. And this board is trying to do stents, and
get rid of plaque, and get the system flowing well. Bill
thought we could talk about a different organic system of
the human body that would be less pleasant, but we won’t go
there.
LEARY: Former military (inaudible).
SKAGGS: Right, right, right. So, we're hoping that the stents
will avoid the need for, you know, triple bypass for
government. Finally, one of the things that’s been a happy
occasion for me in this job, in the executive branch of
government, is to be reminded about the faithful and
extraordinarily diligent service of civil servants who are
often derided by my former colleagues on the Hill. But do
the work of this nation day in and day out. So this is a
callout for John Powers, who’s in the audience, and who was
on the staff of ISOO for many years, and helped this
organization do its work. So John, stand up. And accept
our thanks.
(applause)
WATSON: And you’re invited to lunch. (laughter)
STUDEMAN: Can you hear? I’m Bill Studeman, I’m one of the new
emeritus individuals going off the board after nine years as
a Congressional appointee. So I’ve been on the board for a
very long period of time, through the three major reports,
and as part of my departure homework assignment, carrying
over into the emeritus environment, I’ve been asked to chair
the technology working group, which Laura referred to
before. And we’ve actually already had one meeting. And so
I thought I’d talk a little bit about some of the philosophy
behind that, and then where we think the technology working
group will be going. I'm sort of a subscriber to the notion
of management by nagging, and I think pretty much what the
PIDB does is it does nagging and facilitation to try to get
the government to move in the right policy direction. And
of course, now, as we move into the digital era, this
technology underpinning, which A-Mac talked about this
morning, is really critical to this entire future of
managing classified records. The old system that we’ve had,
which we’ve written about in reports, is not a sustainable
system, in my own personal view. And we’ve said that. The
volume, the velocity, the veracity, the nature and character
of digital records is going to present to the
declassifiers a dramatically different world. We entered
that world actually 25 years ago, I was the director of NSA
when we fought Desert Storm. That's 24 and a half years ago,
coming up now on the 25-year declassification point. We fought that
war in a digital way, so it was really the first digital
war. That said, the permanent records are analog, and so
they obviously had to be converted from digital form to text
record form. And the irony of that is, of course, we've now
specified going back to digital format for emails in 2016,
and then after that, fully digital records. So the records in this
period of time will go from digital to analog, and
ultimately back to digital, at no small cost to the process.
So this is an exciting period of having to look at this last
25 years, and the implications of it, and also in that
period of time, of course, the Information Age exploded, and
the media on which these records were kept have all aged,
and disappeared, and there’s a whole series of issues about
even finding the records from that period of time, if they
weren’t put to paper. So, I think that my message is that
this technology working group is going to work in several
areas. One is obviously, the issue we talked about this
morning, the search for applicable tools and technology that
can help with the whole panoply of issues associated with
managing classified information and records. So this is not
just declassification tools, but this is understanding the
architectures, and the environments in which these records
are kept. And of course, as you’re aware, virtually
everybody’s going to the cloud, the cloud will probably be
the dominant architecture for the future, it offers real
opportunities for storage, for large application stores, and
search techniques, and other aids to dealing with
information in a big data environment that we’ve never had
before. So, we will be looking at where those tools might
be. That means that we’re going to have to go probably out
to universities beyond the national -- the laboratory
project that’s going on right now, and also go onto the
public/private side to the information masters in Silicon
Valley and elsewhere, who have the technology that can help
us along in this area. And our job is really to try to
ensure that the government agencies who have classified
holdings are in fact paying attention to this, in sort of
the nagging mode, the facilitating mode, that is the way of life
in the PIDB.
The next thing we need to do is to track where these
agencies actually are in the rollout of their own future
architectures. So, we have -- in this first technology
working group meeting, we went into the intel community and
had a deep dive into the ICITE project, the intelligence
community IT enterprise upgrade. We did a deep dive on the
Archives' electronic records program. We're looking for
convergence and divergence, and trying to facilitate
understanding on the part of everybody, about where
everybody else is going. We had a large contingent of OSD
people there, representing 42 agencies, departments,
services, etc., in the Department of Defense, that hold
classified records. They remind us that 75% of the
classified records are in the Department of Defense, and
there are plenty of issues over there, including the
issues on (inaudible) between OSD and the Department of
Energy on RD/FRD. So there's a whole series of important
issues that we hope to hear more about from them in the future
about where the Department of Defense is going. Of course,
you recognize that these new architectures are coming
out essentially as a sort of chapeau on top
of hundreds and hundreds of systems that lie underneath
them. And that’s really where a lot of these records are or
are going to be. So, there’s some issue about resolving all
that. Yeah. So, three different things. The search for
technology, looking at the existing architectures, and
trying to help with some facilitation among the
holders. We'll look at State next, Energy, etc., to try to get
them involved. And then, look at the state of records and
the issues that are associated with that. And as you know
from our earlier studies, as we try to move from the as-is
to the to-be, for which there needs to be some kind of
strategy, overarching strategy, which is reflected in
policy, we need to be careful about divining some core
principles for declassification, and identifying the
specific issues that exist in that transition from
essentially the analog era to the future digital era, where we can
have a lot of support. And then, we can move into some of
the objective areas where we're looking at things like
early declassification, inside the 25-year point. And so,
this should be an exciting time. I was struck, as the
presentation was being given on the CCU, that the people who
do declassification can no longer be just policy and
sociological declassifiers. Without the technology people
to work the software and provide all the required
implementations that deal with all of that, there will not
be a future in declassification. So, you have to have the
technology people added to the people who are doing the job
right now, who can do all that configuration management, and
all the other kinds of things that are going to be required.
So this is a significant challenge.
One final thing I would say also is this is all being done
in a down budget environment. And the down budget
environment means that we have to organize the collective of
intelligence declassifiers, a collective of defense
classifiers, etc., into a more common kind of framework, so
that there’s information sharing, really understanding
across everybody about what everybody else is doing. So, we
have this huge task of trying to ride the wave of IT for the
future, which offers all this promise, and to organize for
success, so we can get some economy of scale out of the
whole, keeping in mind that when I came on this board nine
years ago, we were just introducing declassifiers in the
intelligence community to each other. So, I think we’ve
actually come a long way in that period of time. And so,
the challenge ahead for the board is to sustain and
accelerate around the objectives for this technology working
group, which I think is going to be the core part of our
effort, as technology relates to the policy for the future.
So thank you very much.
FAGA: I’m Marty Faga. As was said, I joined the board at its
inception in 2006, because Bill Leary appointed me with the
support of the president. And I think he did that because
I’m a person who actually declassified something, which was
the existence of the NRO that I announced publicly in 1992,
after a classified existence of 31 years, a few years of
which it was actually secret. (laughter) A point of which I
was able to convince the DCI, the Secretary of Defense, and
ultimately the president, in 1992. I'll observe that 23
years later, there are people in the NRO who still criticize
me for that. I’ve always been interested in
declassification because I served in the 1980s, on the House
Intelligence Committee. And sat on the sideline in the
staff seats, as all the contemporary history of intelligence
was being presented to the committee, year after year after
year, all being carefully recorded, verbatim, in a
classified record, and thinking, what an incredible story
for the American people to hear at an appropriate moment of
declassification. You know, virtually the whole history of
not only intelligence, but all that intelligence learned
about foreign affairs and military affairs. As a
technologist, I’ve always been interested in the kind of
work that Dr. Martin is doing. And in this digital age, I
understand that it will be imperative for doing
classification and declassification, and in fact,
intelligence analysis. We've been pushing this
for a long time. One of the concerns that public interest
groups have expressed is that we were going to go to total
automation, and the human brain, human decision making,
would no longer be involved. You said it very well, it’s a
decision aid, a decision support aid. I saw some early work
in this almost 30 years ago in map making, which stuck with
me as it brought together technology and a skilled analyst,
that made that analyst vastly more productive, and increased
the interest content of the work that she was doing. Thank
you very much.
STUDEMAN: Just a couple of brief comments about the CCU project
that you heard about this morning. Which impressed all of
us enormously when we got a much more detailed briefing from
Cheryl in Texas. I think certainly, we’re convinced. I
think the CIA is convinced that this concept has been
proved. It works. And what is most striking about the work
they have done is that they have shown that not only do
these approaches, these techniques, these technologies,
improve the efficiency of declassification reviewers and
classifiers, they improve the quality of their work, just as
much as they improve the efficiency of their work. You get
better results. So our great hope, and our concern, is to make sure
that these proved concepts get applied and used, as soon as
possible. It’ll be a big leap for most of the
classification community to trust the computer to make the
right decision. We’re going to have to get to that, and the
sooner, the better. One final point, I think the public
interest groups and the audience ought to be as impressed as
we are with these potential tools, these real tools,
potentially applicable. And I hope they’ll use their
influence whatever way they can to ensure that the funding
for this project continues. As we will try to do as well.
SODERBERG: We’re going to open it up for public comments in a
moment. I want to just echo what Admiral Studeman said on
the technology committee that he's driving. I think it's
going to do more than just put pressure on things; it's going to
really open people's eyes to the possibilities. And as Bill
Leary said, the only way forward in addressing the issue of
classification and declassification is technology. When I
started looking at this some time ago, I thought we’d have
to convince declassifiers to be less risk-averse and embrace
technology. It's the other way around: it's less risky to
use technology, because humans make mistakes. As Cheryl
said, machines can do this ad infinitum, while we get tired
and make mistakes. So, it actually is more accurate, less
risky, more efficient, and cheaper. And it's the wave
of the future. So again, thank you for being ahead of the
curve on all of this. Bill Leary and I wanted to just take
a moment to recognize our fellow members, and emeritus
members, and then we’ll open it up for questions. So Bill
will lead that off.
LEARY: I’ll start by asking Laura and Sol to come get your
presidential commissions. I talked earlier about the
sterling qualifications that they both have for this job.
Oh, one thing in particular you both have that the rest of
us almost completely do not have: you made your
careers outside of government, not inside. So you will
bring a very useful perspective, in addition to all your
expertise to this undertaking. These are your commissions,
signed by the president, for this important undertaking. We
look forward to working with you. Thank you for being with
us to do it.
(applause)
SODERBERG: I was going to embarrass John Powers before David
Skaggs did it, but I have to add a commendation to John.
John has actually left the archives, but has gone over to a
more powerful position where he can continue to help us, as
director of access management at the National Security
Staff, working with John Ficklin, who’s hiding over there in
a chair. John, maybe a wave, too. But really John, thank
-- all of us, we’re going to embarrass you a little bit
further this afternoon, but thank you. You leave a big hole
over here, but I know your heart’s moved into a bigger place
to help us even more. So, thank you for that. I wanted to
just take a minute and acknowledge and recognize the service
of Bill Studeman, David Skaggs, and Marty Faga, who’ve
really been the heart and soul of the PIDB, and you’ve each
completed your third and final term as members, and this is our
last public meeting with you, so we wanted to just take a
minute and commend your work, and give you a little present.
All three of you became members in the early 2000s, as
you’ve mentioned, when the board was in its infancy, and I
think you’ve helped shape it and define it into the force
that it is today. You’ve helped write all the reports and
recommendations to date, and as one of the newer members to
this board, your guidance has been extraordinary. And each
of the reports that we've put out reflects your heart and
soul. Your passion and advocacy for transparency and
responsible declassification, and your thought-provoking
comments, have really, I think, made an indelible mark on our
efforts to change government. You spent many, many hours of
dedicated service, not only on this board, but in other
public service roles and jobs, and leave a lasting mark on
this country. In addition to his years in the Navy, Admiral
Studeman was, where I first met you actually, the
director of the National Security Agency, and the
deputy director of the CIA, as well as an acting director
for a little bit. David Skaggs served 12 years in the
House of Representatives, from Colorado’s second district,
where his heart always is. And he spent six years on the
House Permanent Select Committee on Intelligence, so he has
been an invaluable voice of wisdom, reason, and prodding on
many occasions that I think we’ve needed. And of course,
Marty Faga was the 10th director of the National
Reconnaissance Office, which we can now, as he mentioned,
publicly talk about. And I think that set the stage for
recognizing the public's right to know that the NRO existed.
So, these are just a small way of saying thank you, but we’d
like to have you come, and we just have a small gift for
you. And with the assistance of the archivist, who I guess
just left, (laughter) we’ve made reproductions of the seven
samples of secret ink that you can take with you. And this
is one of my favorite things. This report is dated October
30, 1917, and it was classified as confidential for many
years. It details descriptions of secret writing
techniques. In April 2011, the CIA finally
declassified this information and made it public, and so
this is thought to be the oldest classified record held by
the government, and it was created in 1917. We thought that
this would be an appropriate remembrance of your time here.
And with that, I just wanted to thank each and every one of
you for your dedicated service, and here’s a little gift for
you.
(applause)
M: (inaudible).
M: I don’t know whether any of you have seen the junk flying
around town, an email about -- well, (inaudible) CIA has for
its important points on its documents. It’s a black
highlighter, so it’s (inaudible). (laughter)
SODERBERG: OK. Now we’ve finally gotten to the reason that
you’re all here, is for the public comments. We’d like very
much to hear from all of you. We have about a half an hour
to take questions. You can direct them at any of us sitting
up here, Dr. Martin, and I will cut you off if you turn it
into a speech, so thank you.
F: (inaudible).
AFTERGOOD: Hi, I'm Steve Aftergood with the Federation of
American Scientists. I wanted to caution against putting
the technology cart ahead of the policy horse. I was struck
by Dr. Martin’s remark that the classification guidance that
she receives is often lacking in specificity and clarity of
the kind that’s needed for the computer. I think it’s also
lacking in the clarity that’s needed for human classifiers,
and it accounts for much of the overclassification that
takes place. I don’t want to identify a problem without
proposing a solution, and I think there is a potential
solution in the upcoming fundamental classification guidance
review that is already required by executive order. Under
the terms of that review, every classification instruction
in every one of the thousands of classification guides
throughout government is supposed to be reviewed by
executive branch agencies. And I would suggest that this is
an opportunity to refine that classification guidance and to
give it the kind of clarity and specificity that Dr. Martin
needs, and that the rest of us expect. If we have vague,
confusing guidance, of the kind that I think we do have
today, then automating its application is just going to
create chaos. So, Admiral Studeman talked about nagging,
and Bill Leary suggested agenda items for public interest
groups. I would like to suggest an agenda item for the
PIDB, that you do some nagging about the most effective
possible implementation of this upcoming fundamental
classification guidance review. Thanks.
STUDEMAN: Yeah, good point. Just -- this is (inaudible) your
remarks highlight.
SODERBERG: Here, use the microphone.
STUDEMAN: Of the way in which the inexorable move to greater use
of technology will force agencies to refine their
classification guidance, because as you say, it won’t work
unless that happens.
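[Illustration, not part of the spoken record: to make concrete what machine-usable guidance could look like, here is a minimal Python sketch of a classification guide whose instructions are testable patterns rather than vague phrases. The rules, topics, and patterns are invented for this sketch and are not drawn from any actual agency guide.

    import re
    from dataclasses import dataclass

    @dataclass
    class GuideRule:
        topic: str          # what the instruction covers
        pattern: str        # a testable trigger, not a vague phrase
        level: str          # resulting classification level
        declass_year: int   # an explicit declassification date

    # Hypothetical rules; a real guide's contents may themselves be classified.
    RULES = [
        GuideRule("budget figures for Program X",
                  r"\$\d+(\.\d+)?\s*(million|billion)", "SECRET", 2035),
        GuideRule("project code names",
                  r"\bPROJECT\s+[A-Z]{4,}\b", "CONFIDENTIAL", 2030),
    ]

    def evaluate(portion):
        """Return the rules a text portion triggers; none means no guide hit."""
        return [r for r in RULES if re.search(r.pattern, portion)]

    for rule in evaluate("The program received $4.2 million last year."):
        print(f"{rule.level} per rule on {rule.topic}; declassify {rule.declass_year}")

Guidance written at this level of specificity can be applied by a computer or a human reviewer alike; guidance that cannot be reduced to a testable condition is the kind Mr. Aftergood warns will produce chaos when automated.]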
CONNELLY: My name is Matthew Connelly, I’m at Columbia
University. I work with colleagues in computer science and
mathematics on a project we call the Declassification
Engine. And I was very interested to hear Dr. Martin’s
presentation. I have a three-parter. One is, a number of
people have pointed out how obviously, humans also make
errors. There’s a project in the UK called Project Abaca,
where people from the National Archives have looked at the
error rate, the intercoder unreliability, how it is that
humans looking at the same document will redact different
things, or withhold different documents. So is there any
research, or any plans for research, to establish the
baseline? So we can know, you know, what is the error rate,
you know, when humans are reviewing documents for
declassification? I think that would be quite useful in
advancing this argument for the need for technology. The
second part has to do with the presentation itself. So, I
had some difficulty just evaluating some of the research, and
I would love to know more, so I’m wondering if there are
plans for publication of some of the more specific aspects
of what methods you used, and what kind of results you were
getting. So is the code going to become open source? You
know, are there plans for publication? I hope, anyway, that
this will become a research field for data scientists, but I
think for that to happen, they would have to know more about
the kind of data you’re using, and what kind of results
you’re getting. The last part is about the funding. So, it
would be great, right, if there was more support for
research in this field. My project is funded by the
MacArthur Foundation. I understand a couple of years ago
that DARPA put out a request for proposals for research in
this very area, but none of the proposals were funded. So
I’m wondering if there’s any prospect for funding in this
area to support research on automation of declassification,
for people outside of government?
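[Illustration, not part of the spoken record: baselines of the kind Project Abaca examined are typically reported as intercoder agreement. Below is a minimal Python sketch of Cohen's kappa for two reviewers making release/withhold calls on the same documents; the decisions are invented for illustration.

    from collections import Counter

    def cohens_kappa(a, b):
        """Agreement between two raters, corrected for chance agreement."""
        n = len(a)
        observed = sum(x == y for x, y in zip(a, b)) / n
        # chance agreement from each rater's marginal label frequencies
        ca, cb = Counter(a), Counter(b)
        expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
        return (observed - expected) / (1 - expected)

    reviewer_1 = ["release", "withhold", "release", "release", "withhold", "release"]
    reviewer_2 = ["release", "release", "release", "withhold", "withhold", "release"]
    print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.2f}")  # 0.25 here

A kappa near 1 means reviewers agree almost perfectly once chance is accounted for; figures in this range would quantify the human error rate the question asks about.]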
SODERBERG: Cheryl, do you want to respond?
MARTIN: Sure. Let me go over in this corner. So we haven’t
actually done any quantification of how much better the
results are using this technology. We have a general
feeling that the reviewers are able to do a better job,
because a lot of the meticulousness, the attention they
require, is taken out of the equation for them, and they’re
able to focus better. But those are all kind of qualitative
feelings, we don’t have a baseline of before and after. And
at this point, we just don’t have enough resources to do a
long-term study like this. I agree that it would be
beneficial to have that before and after data. But
currently, we have no resources to dedicate towards that, and we
have not previously done it. In terms of plans for
publication, and making the
code open: the technologies that we use are already open
source. In fact, most of the fundamental (inaudible)
processing and expert systems are free, because we like free
things. Really, the things that we're adding to the
technology are the configuration for the sensitive information. And
that gets pretty quickly into what’s classified. So, in
terms of making the infrastructure that ties the
technologies together open source, that could be something
that happens, but it takes some effort to maintain an open
source project and kind of service it. And that's just
not something we've taken on; you know,
it's just a resource constraint on my staff. The third
question, I don’t have anything to comment on for that.
FITZPATRICK: Thank you, Cheryl. And thank you Matt for the
question. My name’s John Fitzpatrick, I’m the director of
the Information Security Oversight Office here at NARA, and
the executive secretary; we provide all of the staff support
to the board. So, let me take the question about resources,
and say they’re absolutely the questions that need to be
asked, but sort of where we are in the movie right now with
regard to proving the concept. I think we’ve -- as Bill
said, we’re there. Now, what do you do with the proven
concept? And how does the government accept that this is a
possibility and place in its action plan doing something
about that possibility? So, part of the board’s purpose is
to -- I like to say -- put wind in the sails of others' efforts.
Admiral Studeman says management by nagging, I think those
are both the same thing. And we are in a moment, if you
will, here, by taking the board’s plans to promote and then
report on this need for the purpose of getting movement at
the government level for a program to do those things.
Where the understanding of how much can be done openly and
how much needs to be done in a classified environment can
be parsed in programmatic terms; where the requirements for
attention on this exist in intelligence, in presidential libraries,
in defense, or in other civilian agencies; where all of
those constituencies perform declassification in a stand-
alone way that is barely connected to each other -- how do we take
that and make a government program out of it? All the
government agencies doing it do it, but we don’t do it in a
unified or an integrated program yet with resources,
technology, and a strategy that brings those together. So,
we’re trying to get that to happen now, I think the presence
of the open government program, the chief technology officer
of the United States, and the National Action Plan
commitments that progressively drive us down the lane
towards doing something, that’s where we’re hoping to take
it. So, I would say watch this space, and -- but we have to
go from -- we’re going from ground up. We’ve proven the
concept, now we’ve got to get the traction inside the OMB
and other places to make something of it. Thanks.
SODERBERG: I seem to remember, when we were at Texas, and
Cheryl, correct me if I’m wrong, that there was a
differential of 15 to 20% on the mistake rate. That it was,
you know, 80% were correct, and it brought it up to 95% when
you’re using the technology. Am I misremembering that?
MARTIN: No, that was something that we could see in the
classification [portion marking?]. And we did have a
quantification for that small set, so that’s true. In a
larger sense (overlapping dialogue; inaudible). Ambassador
Soderberg is correct, we did have some comparison of before
and after for classification portion marking, where we had
looked at what people had done for the small sets of test
cases, and compared it to the ultimate accuracy that the
computer was able to perform. And it was about 80% for
people in the small test cases. And in terms of accuracy
for the human classifiers, and that’s going to vary by
person. But ultimately, the computer was able to achieve
almost a 90%, or almost 100, 98% accuracy rate. So, we do
have a general feeling in terms of the portion marking. We
don’t have quantitative results for, you know, actually
finding mistakes in declassification. That’s just not
something we’ve really pursued as a large study to look for
those types of errors that humans may be making. And that’s
just a resource focus, I think.
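[Illustration, not part of the spoken record: a sketch of how per-portion accuracy figures of the kind discussed above could be computed, comparing each reviewer's markings against an adjudicated answer key. All markings below are invented; the 80% and 98% figures come from the small test sets Dr. Martin describes, not from this example.

    def accuracy(predicted, gold):
        """Fraction of portions marked the same as the adjudicated key."""
        return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

    gold_key = ["(U)", "(C)", "(S)", "(U)", "(S)"]
    human    = ["(U)", "(U)", "(S)", "(U)", "(C)"]   # two misses
    machine  = ["(U)", "(C)", "(S)", "(U)", "(S)"]   # matches the key

    print(f"human:   {accuracy(human, gold_key):.0%}")    # 60%
    print(f"machine: {accuracy(machine, gold_key):.0%}")  # 100%
]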
SODERBERG: I have a couple of hands up. One here and one
there, and John Ficklin, you may not want to do this, but if
you want to -- I’ll give you a minute to think about it, if
you want to comment on the classification review that the
gentleman from Columbia was talking about, if you want to
fill us in on where that stands, I’ll invite you to do that
after the next two comments, or you can pass. So, why don’t
we start there and then -- back row.
BENDER: Michael Bender, Air Force Declassification Office.
There’s still a large volume of paper records to be reviewed
in College Park, and at Suitland, and I’m wondering, does
the board feel that automation would be used for these
analog records with the attendant requirements for scanning
and OCR? Or do you feel that automation should be reserved
for the more digital records?
SODERBERG: Thank you.
STUDEMAN: Our expert here in the front row, (inaudible) Cooper,
stand up, Eric. (inaudible) spend -- yeah.
M: I have a mic.
STUDEMAN: Pushing this project, on behalf of the CIA.
M: The digitization of the paper is actually very expensive.
And because we would have to then OCR the paper and work with somewhat
dirty text, the error rate would get much higher. I
don’t think that the technology we’re proposing is a
solution for paper, simply because of the processing that
would be required on the paper in order to apply the machine
concepts to it.
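[Illustration, not part of the spoken record: a rough Python sketch of why dirty OCR text raises the error rate. If each character is misread independently with probability p, the chance that an exact search term survives intact falls off quickly with term length. The error rates here are illustrative assumptions, not measured values.

    def survival(term_len, char_error_rate):
        """Probability an exact-match search term comes through OCR unscathed."""
        return (1 - char_error_rate) ** term_len

    for p in (0.01, 0.05, 0.10):
        print(f"char error {p:.0%}: "
              f"10-char term survives {survival(10, p):.0%}, "
              f"25-char phrase {survival(25, p):.0%}")

At a 5% character error rate, a 25-character phrase survives exact matching only about 28% of the time, which illustrates why these machine concepts are a poor fit for scanned paper without substantial cleanup.]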
F: (inaudible) minutes. Is there another question?
SODERBERG: But one of the advantages of digitizing the
records is it frees up declassifiers to clear up some of the
backlog, too. So, it’s much faster, cheaper, more
efficient, if we could get this. The back row.
CLARKSON: Yeah, I just wanted to comment on the dream of
standardization of how humans --
SODERBERG: Can you introduce yourself?
CLARKSON: Oh, Charles Clarkson, sorry. How humans communicate,
I’m reminded of a story, a friend of mine were at NIH, and
trying to standardize how doctors communicate. And I said,
“You know, you getting a doctor to describe symptoms in
Boston the same way that a doctor in San Diego describes
it?” He says, “Hell, we can’t even get two hospitals in
Boston to agree on how to do this." And it's really one of
the fundamental challenges and weaknesses of the DATA Act,
trying to have transparency through consistent standards: even if
they can get agreement among federal agencies on
terms, they'll never get agreement among all the grantees
that money goes to. So, following up on Bill Leary’s
comment, technology is your only hope in terms of conceptual
data mining that can accommodate how asthma can be described
50,000 different ways. Humans will never agree in a moment.
Thank you.
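[Illustration, not part of the spoken record: a small Python sketch of the "conceptual data mining" Mr. Clarkson points to, mapping free-text variants onto a canonical concept rather than waiting for humans to standardize their terms. The stdlib difflib matcher is a crude stand-in for the statistical methods a real system would use, and the vocabulary is invented.

    from difflib import get_close_matches

    CANONICAL = {
        "asthma": ["asthma", "asthmatic condition", "reactive airway disease"],
    }

    def normalize(phrase):
        """Return the canonical concept a free-text phrase most likely denotes."""
        for concept, variants in CANONICAL.items():
            if get_close_matches(phrase.lower(), variants, n=1, cutoff=0.6):
                return concept
        return None

    for text in ("Asthma", "asthmatic condition", "azthma"):
        print(f"{text!r} -> {normalize(text)}")
]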
F: (inaudible).
SODERBERG: Do I see any other comments? John, did you want
to talk about what was going on there?
FICKLIN: Good morning, I'm John Ficklin, senior director for
records and access management on the National Security
Council staff. Of course, Mr. Aftergood is correct. We do
need to make sure that we get the classification guides
correct. Certainly if the classification guides are
correct, we can apply that to this technology, and really be
resourceful. I do have to tell you, there was one other
comment that I’ve heard multiple times today, and it’s
really a top priority for me, and that’s funding. Money.
This all costs money. And I’m committed to do what I can to
try to ensure that we do get the money we need to move this
project forward. Thank you.
SODERBERG: John?
FITZPATRICK: Yeah, if -- I’d like to also refer back to Steve’s
comment about the fundamental classification guidance
review, so that we don’t leave something hanging. This is
-- he’s absolutely right, this is an opportunity built into
the executive order. We have been once through the
executive branch on this, with reports that were completed
in the 2012 timeframe, and have learned some lessons,
largely owing to the observations of the public, and
analysts like Steve, who have looked at that and said, you
know, "so what?" And how can we get more meaningful "so whats"
out of that? And so, Steve and I have had a number of
conversations on this topic, and we have taken those to
other folks in the business area of classification
management. Our office will issue guidance to agencies to
get them started in the next cycle. And so, in addition to
the input that we've gotten from Steve -- and what he has,
I'll say, hopefully put ideas in your head to think about
as well -- we welcome your input on how meaningful
outcomes could come from that review, beyond the
ones that we've talked about here. We'll continue to talk
about it with the board, and ISOO and the classification
management activities we do in the interagency are already
contemplating how to pick the best tasks for agencies in the
conduct of this review, so that we get optimal output on the
back end. So I didn’t -- I wanted to acknowledge Steve’s
remark, but also say yes, in fact, that dialogue is engaged,
and it is something that we welcome input on as we always
do. Thanks.
SODERBERG: If I -- going once, any more? Let me pass --
thank you all for coming. I’m just going to pass the mic
back to Bill Leary, who will close the meeting.
LEARY: Yes, thank you all for coming. We encourage you to
continue to send comments, suggestions, complaints. We have
a blog called “Transforming Classification,” where you can
supply those comments. We urge you to use it, because we do
take seriously the word public in our title, and in our
mission. Thank you all for coming.
(applause)
F: Thank you.
(crowd chatter, inaudible)
END OF AUDIO FILE