Government Big Data Solutions Award

Post on 15-Feb-2016

DESCRIPTION

Government Big Data Solutions Award. Bob Gourley, http://ctolabs.com, Nov 2011. About This Presentation: How can we help accelerate public sector innovation? Top Federal Mission Needs for Big Data. The State of Big Data Solutions in the Federal Space.

TRANSCRIPT

Page 1: Government Big Data Solutions  Award

CTOlabs.com

Government Big Data Solutions Award

Bob Gourley http://ctolabs.com Nov 2011

Page 2: Government Big Data Solutions  Award

About This Presentation:

• How can we help accelerate public sector innovation?
• Top Federal Mission Needs for Big Data
• The State of Big Data Solutions in the Federal Space
• The Intent of the Government Big Data Solutions Award
• Criteria
• Judges
• Top Nominees for 2011
• How to Nominate for 2012
• The Judges’ Choice for 2011

Page 3: Government Big Data Solutions  Award

Our Challenge

Page 4: Government Big Data Solutions  Award

The Government Needs More Agility*

The government can rapidly benefit from the lessons of high tech by being a faster follower, especially when it comes to Big Data constructs

Thesis: If the Big Data community understands more about federal missions, challenges and successes, we can improve the speed and effectiveness of federal solutions.

“High tech runs three-times faster than normal businesses. And the government runs three-times slower than normal businesses. So we have a nine-times gap” – Andy Grove

*Among other needs

Page 5: Government Big Data Solutions  Award

Top Federal Mission Needs for Big Data

Financial fraud detection across large, rapidly changing data sets

Cyber Security: rapid real time analysis of all relevant data

Rapid return of geospatial data based on query

Location based push of data: Focused on emergency response

Real time return of relevant search: USA.gov is exemplar

Real time suggestion of topics: USA.gov is exemplar

Real time suggestion of correlations: DoD has many use cases

Bioinformatics: Human Genome

Bioinformatics: Patient location, treatment, outcomes

These needs must be met in an era of significant downward pressure on budgets. Scalable systems with well thought out governance & extensive automation are key.

Page 6: Government Big Data Solutions  Award

Most active fed solution areas:

Federal integrators: Spending internal research and development funds to create prototypes and full solutions relevant to fed missions

DoD and IC agencies: Using Big Data approaches to solve “needle in the haystack” and “connect the dots” problems

National Labs: Bioinformatics solutions have been put in place by federal researchers

OMB and GSA: Ensuring sharing of lessons and solutions. Key exemplars around web search methods. Solutions inside government agencies and on citizen facing properties

Big Data solutions are already making a difference in government service to citizens. Highlighting some of this virtuous work is a goal of our Government Big Data Solutions Award.

Page 7: Government Big Data Solutions  Award

The Intent of the Government Big Data Solutions Award

Established to help facilitate exchange of best practices, lessons learned and creative ideas for solutions to hard data challenges

Special focus on solutions built around Apache Hadoop framework

Nominees and award winners to be written up in CTOlabs.com technology reviews

Award meant to help generate exchange of lessons learned

We established a team of judges, asked them to consider mission impact as the primary criterion, and solicited award nominations via sites frequented by government IT professionals and solution providers.

Page 8: Government Big Data Solutions  Award

Judges

Doug Cutting: An advocate and creator of open source search technologies (@cutting)

Chris Dorobek: Founder, editor, publisher of DorobekInsider.com (@DorobekINSIDER)

Ed Granstedt: QinetiQ Strategic Solution Center

Ryan LaSalle: Accenture Technology Labs (@Labsguy)

Alan Wade: Experienced federal CIO

Judges are all experienced innovators known for mastery in their fields

Page 9: Government Big Data Solutions  Award

Top Nominees for 2011

USA Search: Best in class hosted search services over more than 400 gov sites. Great use of CDH3.

GCE Federal: Cloud-based financial management solutions. Apache Hadoop, HBase, Lucene for Dept of Labor.

PNNL Bioinformatics: Leading researcher Dr. Taylor of PNNL is advancing understanding of health, biology, genetics and computing using Apache Hadoop/MapReduce/HBase.

SherpaSurfing: Use of CDH as a cybersecurity solution. Ingest packet capture in any format, analyze trends, find malware, alert.

US Department of State: Bureau of Consular Affairs. Large data with important applications for citizen service and national security.

Each of these is making a difference for government missions right now.

Page 10: Government Big Data Solutions  Award

Please Think Now About 2012 Nominations

Page 11: Government Big Data Solutions  Award

How to Nominate for 2012

Click Here. Fill In Form. Hit “Submit”

• We expect (and hope for) a much more crowded field of contenders next year.

• Please let us know if you are working on things that feds should be aware of.

• You can also submit technologies for review on our site.

Page 12: Government Big Data Solutions  Award

Special Mention

Department of State

Consular Consolidated Database

Page 13: Government Big Data Solutions  Award

Department of State (DoS), Bureau of Consular Affairs (CA) Consular Consolidated Database (CCD)

CCD is critical to citizen support and important in facilitating lawful visits to the US

First line of defense against unlawful entry

Largest connected/replicating database structure in the government

Pre-screens visa applicants and helps adjudicators weed out fraud

Used by multiple agencies

Very smart use of current data approaches to solve hard problems

Page 14: Government Big Data Solutions  Award

Judges’ Choice 2011

GSA

USA Search

Page 16: Government Big Data Solutions  Award

USA Search

Program of General Services Administration’s (GSA) Office of Citizen Services and Information Technologies.

Hosted search services for USA.gov and over 500 other government websites.

Solves big data challenges with open source capabilities.

CDH3 since fall 2010. HDFS, Hadoop and Hive used in a cost-effective, resilient, scalable solution.

Search Results. Search Suggestions. Trend analysis. Analytic dashboards.

Bottom Line: USA Search brings the best of the open source community to multiple government missions, including direct citizen support

Page 18: Government Big Data Solutions  Award

Questions/Comments?

Page 19: Government Big Data Solutions  Award

This Presentation Prepared By: Bob Gourley, CTOlabs.com

http://twitter.com/bobgourley

Page 20: Government Big Data Solutions  Award

Backup Slides

Page 21: Government Big Data Solutions  Award

Department of State (DoS), Bureau of Consular Affairs (CA) Consular Consolidated Database (CCD)

•Bureau of Consular Affairs issues travel documents to U.S. and foreign citizens. CA stores data collected from consular posts abroad and domestic processing centers, as well as other government agencies in the Consular Consolidated Database (CCD).

•CCD holds more than 115 terabytes of data, growing by 6-8 terabytes each month. Over 170 software applications collect this information and provide interfaces with the numerous partner agencies that share data with CA.

•CCD is the “largest connected/replicating database structure in the government.”

•Most of these applications use a ‘case’ (such as a visa or passport application), and not a person record, as the basis of their data storage and retrieval. At the application level, it is extremely difficult to link person information in one application to potentially-matching person information contained in another application. A person could apply for a visa at one location, and then apply at another location under a different name, and an adjudicator may not be able to establish the link between the cases. The CCD can leverage all available data elements from all applications throughout the system in order to determine all of the potential identity matches of any given person that CA has encountered.

•The CCD also contains unstructured data, such as free-form comments or case notes. The CCD must deal with millions of large image files, such as applicant photos or scanned documents. The CCD’s powerful, custom-built analytical tools synthesize the complex data captured by CA with the equally-complex data received from other agencies. The CCD thus gives its users the ability to make informed decisions, detect and prevent fraud, and identify potential national security threats.

Page 22: Government Big Data Solutions  Award

Department of State (DoS), Bureau of Consular Affairs (CA) Consular Consolidated Database (CCD)

•CCD is based on Oracle tools.

•The CCD can pre-screen a visa record before an adjudicator even looks at it. The CCD provides the means to conduct vetting checks against various government databases.

•Due to the wide variety of resources used by the CCD, the system can establish links between two applicants using completely different names. With each subsequent encounter, the CCD creates additional links, resulting in a searchable, fully cross-referenced web of information that traces a person’s activities across all of CA data. By being able to see these links in a person-centric view, the adjudicators have a broader, more complete, and more easily-accessible set of data with which to make better-informed decisions.

•The CCD automatically initiates biometric checks. The CCD automatically looks for fraud indicators. The CCD captures all of the data entered during the process and automatically creates cross-references using the new data.

•The CCD has transformed CA’s mission delivery by breaking the paradigm of data isolated in independent databases.

•The CCD allows staff to focus its time on better customer service, investigative activities, and analysis. CA’s technical achievement with the CCD has been to create a robust, economical, and analytically-powerful data platform in an environment where fragmentation and inefficiency had been the norm.
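
The person-centric linking described on this slide can be illustrated with a small sketch. This is not the CCD's implementation (the slides only say it is built on Oracle tools with custom analytics); it is a minimal Python illustration, assuming hypothetical case records with made-up fields such as `fingerprint_id`, in which any two cases that share a value for a chosen field are merged into one group.

```python
from collections import defaultdict


def link_cases(cases, keys):
    """Group case IDs that share a value for any of the given keys.

    `cases` is a list of dicts with a hypothetical "case_id" field plus
    arbitrary data elements; `keys` names the elements used for matching.
    """
    parent = {case["case_id"]: case["case_id"] for case in cases}

    def find(case_id):
        # Follow parent pointers to the group representative,
        # shortening the path along the way.
        while parent[case_id] != case_id:
            parent[case_id] = parent[parent[case_id]]
            case_id = parent[case_id]
        return case_id

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (key, value) -> first case ID carrying that value
    for case in cases:
        for key in keys:
            value = case.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(first_seen[token], case["case_id"])
            else:
                first_seen[token] = case["case_id"]

    groups = defaultdict(set)
    for case in cases:
        groups[find(case["case_id"])].add(case["case_id"])
    return sorted(sorted(group) for group in groups.values())


# Hypothetical records: two visa cases under different names share a
# fingerprint, and a passport case shares a name with one of them.
EXAMPLE_CASES = [
    {"case_id": "V-1", "name": "A. Smith", "fingerprint_id": "fp-9"},
    {"case_id": "V-2", "name": "Alan Smyth", "fingerprint_id": "fp-9"},
    {"case_id": "P-3", "name": "Alan Smyth", "fingerprint_id": None},
    {"case_id": "V-4", "name": "B. Jones", "fingerprint_id": "fp-2"},
]

if __name__ == "__main__":
    print(link_cases(EXAMPLE_CASES, ["fingerprint_id", "name"]))
```

Because links are transitive, V-1 and P-3 end up in the same group even though they share no field directly, which is the "web of information" effect the slide describes. A real system would need fuzzy matching and review of candidate links; exact equality is used here only to keep the sketch short.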

Page 23: Government Big Data Solutions  Award

USA Search: A Strategic Resource

• USASearch is a program of the General Services Administration’s (GSA) Office of Citizen Services and Information Technologies.

• GSA believes in building once and using many times. USASearch is no exception. Since 2000, USASearch has provided hosted search services for USA.gov and for more than 400 government websites—across all levels of government—at no cost through its Affiliate Program.

• USASearch instituted many innovative changes in 2010—making it a model for the Obama administration’s effort to leverage open source technologies and shared solutions to bring substantial cost savings for the government. With its new open architecture model, the USASearch Program provides viable and scalable shared search services.

• USASearch Solves Big Data Challenges

Page 24: Government Big Data Solutions  Award

USA Search: A Strategic Resource

• USASearch began using Cloudera’s Distribution including Apache Hadoop (CDH3) for the first time in the fall of 2010, and since then has seen its usage grow every month, not just in scale but also in scope.

• All of the search traffic across USA.gov and the hundreds of affiliate sites comes through a single search service, and this generates a lot of data. To continuously improve the service, USASearch needs aggregated information on what searchers look for, how well they find it, and emerging trends, among other information. Once searches are initiated, USASearch also needs to know what results are shown and clicked on. This information needs to be broken down by affiliate and by time, and also aggregated across all affiliates.

• The initial system was fairly simple and did just enough to address the most pressing data needs. As USASearch watched its data grow and the nightly batch jobs took longer and longer, it became clear that it would soon exhaust its existing resources. USASearch considered scaling up the hardware vertically and sharding the database horizontally, but both options seemed to kick the can down the road. Larger database hardware is both costly and eventually insufficient for USASearch’s needs, and sharding promised to take all the usual issues associated with a single database system and multiply them.

• USASearch determined it needed HDFS, Hadoop, and Apache Hive—a big data system that could grow cost effectively and without downtime, be naturally resilient to failures, and sensibly handle backups.
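
The aggregation needs listed above (what searchers look for, what gets clicked, broken down by affiliate and by time and also rolled up across affiliates) can be sketched in a few lines. USASearch runs this kind of analysis in Hive on Hadoop; the Python below is only an in-memory stand-in, and the event fields (`affiliate`, `day`, `query`, `clicked`) are assumptions made for illustration, not the program's actual log schema.

```python
from collections import Counter


def aggregate_searches(events):
    """Count queries per (affiliate, day, query) and across all affiliates."""
    by_affiliate_day = Counter()
    overall = Counter()
    for event in events:
        # Normalize so "Passport " and "passport" count as the same query.
        query = event["query"].strip().lower()
        by_affiliate_day[(event["affiliate"], event["day"], query)] += 1
        overall[query] += 1
    return by_affiliate_day, overall


def click_through_rate(events):
    """Fraction of searches per affiliate that led to a click."""
    searches = Counter()
    clicks = Counter()
    for event in events:
        searches[event["affiliate"]] += 1
        if event["clicked"]:
            clicks[event["affiliate"]] += 1
    return {name: clicks[name] / searches[name] for name in searches}


if __name__ == "__main__":
    # Hypothetical log events for two affiliates on one day.
    sample = [
        {"affiliate": "usa.gov", "day": "2011-11-01", "query": "Passport ", "clicked": True},
        {"affiliate": "usa.gov", "day": "2011-11-01", "query": "passport", "clicked": False},
        {"affiliate": "nps.gov", "day": "2011-11-01", "query": "passport", "clicked": True},
    ]
    per_day, total = aggregate_searches(sample)
    print(per_day.most_common(3))
    print(total.most_common(3))
    print(click_through_rate(sample))
```

In a batch system the same grouping would be expressed as a grouped count over the full log rather than a single-process loop; the point here is only the two levels of roll-up.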

Page 25: Government Big Data Solutions  Award

USA Search: A Strategic Resource

• USASearch Makes Data Actionable: USASearch displays the results of its Hive analyses in various analytics dashboards, but, more importantly, it also ensures the results positively affect searchers’ experience on government websites. For example, USASearch uses Hadoop to generate contextually relevant and timely search suggestions for each of its affiliated government websites. Compare the different type-ahead suggestions for ‘gran’ on NPS.gov and USA.gov. Both websites use the same USASearch backend system, but the suggestions differ completely.

• USASearch Is a Success: The overhaul of USASearch’s analytics is a dramatic success story. In the space of a few months, USASearch went from having a brittle and hard-to-scale RDBMS-based analytics platform to a much more agile Hadoop-based system that is intrinsically designed to scale. USASearch continues to see its Hadoop usage grow in scope with each new data source it adds, and it is clear that USASearch will rely on it more and more as the suite of tools and resources around Hadoop grows and matures in the future.

• By using a state-of-the-art open source technology, USASearch has created a radically different search service that transforms the customer experience. Having a government-owned and -controlled search service allows us to constantly understand what’s on the minds of Americans to drive enhancements to other delivery channels. The public has a much improved experience when interacting with the government due to USASearch.
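
The ‘gran’ example on this slide, where one backend yields different type-ahead suggestions per site, can be sketched as a per-affiliate prefix lookup over query counts. This is an illustrative Python sketch under assumed data, not the USASearch code: the query log, the affiliate names used as keys, and the ranking by raw frequency are all assumptions.

```python
from collections import Counter, defaultdict


def build_index(query_log):
    """Build per-affiliate query counts from (affiliate, query) pairs."""
    index = defaultdict(Counter)
    for affiliate, query in query_log:
        index[affiliate][query.strip().lower()] += 1
    return index


def suggest(index, affiliate, prefix, limit=5):
    """Return the affiliate's most frequent queries starting with `prefix`."""
    prefix = prefix.strip().lower()
    counts = index.get(affiliate, {})
    matches = [(query, n) for query, n in counts.items() if query.startswith(prefix)]
    # Most frequent first; ties broken alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in matches[:limit]]


if __name__ == "__main__":
    # Hypothetical query log shared by two affiliates.
    log = [
        ("nps.gov", "Grand Canyon"),
        ("nps.gov", "grand canyon"),
        ("nps.gov", "grand teton"),
        ("usa.gov", "grants"),
        ("usa.gov", "grants"),
        ("usa.gov", "grand jury"),
    ]
    index = build_index(log)
    print(suggest(index, "nps.gov", "gran"))
    print(suggest(index, "usa.gov", "gran"))
```

Keeping the counts separate per affiliate is what makes the same prefix produce site-specific suggestions while the lookup code stays shared.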
