SOA Pilots: Federation of SOA and Semantic Medline

Page 1

SOA Pilots: Federation of SOA and Semantic Medline

Dr. Brand Niemann, Director and Senior Enterprise Architect – Data Scientist

Semantic Community: http://semanticommunity.info/

AOL Government Blogger: http://gov.aol.com/bloggers/brand-niemann/

October 2, 2012

http://semanticommunity.info/Federal_SOA/14th_SOA_for_E-Government_Conference_October_2_2012
http://semanticommunity.info/Federal_SOA/Federation_of_SOA
http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline
http://semanticommunity.info/Federal_SOA/14th_SOA_for_E-Government_Conference_October_2_2012#Blog_By_Brand_Niemann

Page 2

Overview
• Key People:
– Gus Hunt, CTO, CIA
– Robert Ames, Senior VP for Technology, In-Q-Tel
– Dr. George Strawn, Director, National Coordination Office, Networking and Information Technology Research and Development (NITRD) Program, OSTP, White House
• Big Data:
– Transactions
– Interactions
– Conversations
• Four Vs:
– Volume (Terabytes to Zettabytes)
– Variety (Structured to Structured and Unstructured)
– Velocity (Batch to Streaming Data)
– Value (Worth the Extra Expense? Need a Data Scientist)

Page 3

Intelligence Community Loves Big Data

http://gov.aol.com/2012/03/13/why-the-intelligence-community-loves-big-data/
http://semanticommunity.info/AOL_Government/Intelligence_Community_Loves_Big_Data

Gus Hunt, CTO, CIA

Page 4

Big Data and the Government Enterprise

http://semanticommunity.info/AOL_Government/Big_Data_and_the_Government_Enterprise#Story

Robert Ames, Senior VP for Technology, In-Q-Tel
Dr. George Strawn, Director, National Coordination Office, Networking and Information Technology Research and Development (NITRD) Program, OSTP, White House

Page 5

Big Data Innovation Conference

http://analytics.theiegroup.com/bigdata-boston

Page 6

Big Data Innovation Data Science

http://semanticommunity.info/AOL_Government/Big_Data_Innovation#Story-Pre-Summit

Page 7

Big Data Innovation Dashboard

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?IEGroup-BigDataInnovation-Spotfire

Page 8

Panève’s ZettaLeaf & ZettaTree Products

• Scalable single-level storage
– Panève’s scalable single-level storage model collapses the server, network, and storage by removing software and replacing it with memory system primitives. According to Panève’s executive summary, this eliminates all network and network-processing overhead associated with accessing storage and delivers a 10,000X increase in raw performance.

http://semanticommunity.info/@api/deki/files/19353/exec_summary_20120916.pdf
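Single-level storage means a program addresses persistent storage as if it were memory. Panève’s hardware approach is not described in enough detail here to reproduce, so the sketch below is only a commodity-system analogy, assuming a platform where Python’s standard `mmap` module is available: a file is memory-mapped and a record is read by slicing the map, with the operating system paging data in on demand instead of the program issuing explicit read calls. The function name `read_record_via_mmap` and the sample records are hypothetical, and nothing here reflects Panève’s performance claims.

```python
import mmap
import os
import tempfile

def read_record_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Hypothetical helper: read bytes from a file by addressing it like memory."""
    with open(path, "rb") as f:
        # Map the whole file read-only into the process's address space.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing the map is an ordinary memory access; the OS pages data in on demand.
            return mm[offset:offset + length]

# Usage: write a small file of fixed-width records, then fetch one record by offset.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"record-0|record-1|record-2")
record = read_record_via_mmap(path, 9, 8)
os.remove(path)
print(record)
```

The design point is that, once mapped, storage is reached through the same load operations as memory; a true single-level store pushes that idea down into the hardware rather than relying on the operating system’s virtual memory.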

Page 9

Big Data in Memory Innovation Story

• Met Jef Sharp, President, Panève:
– Amazingly fast access and massive storage – Big Data Supercomputer on My Mobile Device
– Johns Hopkins University – Blackbook (CIA Cloud)
• I suggested:
– Greylock Partners – #2 Data Scientist in the World (DJ Patil, Entrepreneur-in-Residence, who built the first formal data science team at LinkedIn)
• Works for In-Q-Tel (Robert Ames, Senior VP for Technology, In-Q-Tel)
• Works for CIA (Gus Hunt, CTO, CIA)
– Who Wants a Big Data Supercomputer on Mobile Devices

Page 10

Big Data Innovation Conference Book

http://www-01.ibm.com/software/info/rte/bdig/bdwa-7-post.html

Page 11

Understanding Big Data
• Understanding Big Data:
– Analytics for Enterprise Class Hadoop and Streaming Data
• This book is about big data: Big Data is a Big Deal!
• Big data is going to change the way you do things in the future, how you gain insight, and how you make decisions (the change isn’t going to be a replacement, rather a synergy and extension). This book is meant to help you get up to speed quickly on this technology and to show you the unique things IBM is doing to turn the freely available open source big data technology into a big data platform; there’s a major difference, and the platform consists of leveraging the open source technologies (and never forking them) and marrying them to enterprise capabilities provided by a technology leader that understands the benefits a platform can provide.
• By the time you are done reading this book, you’ll have a good handle on the big data opportunity that lies ahead, a better understanding of the requirements that ensure you have the right big data platform (as opposed to just technology), and a strong foundational knowledge of the business opportunities that lie ahead with big data and some of the technologies available.

http://semanticommunity.info/@api/deki/files/19341/IML14296USEN.pdf

Page 12

YARC Data Solutions & Products
• Graph analytics at work: finding needles in a needle stack:

– Many Big Data problems are about searching for things you know you want to find. It's challenging because the volumes of data make it like searching for a needle in a haystack. But it's easy because a needle and a piece of hay, though similar, do not look exactly alike.

– But discovery problems are about finding what you don't know. Imagine trying to find a needle in a stack of needles. That's even harder. How can you find the right needle if you don't know what it looks like? How can you discover something new if you don't know what you're looking for?

– In order to find the unknown, you often have to know the right question to ask. It takes time and effort to ask every question and you keep learning as you continue to ask questions. uRiKA dramatically shortens this cycle. In the same amount of time it took you to ask one question, we enable you to ask a thousand questions, making it more likely that you'll discover the answer that gives you a "uRiKA" moment - and helps you gain competitive advantage.

– uRiKA specializes in discovering the unknown, the unpredicted - and completely unexpected. Learn how customers spanning government, financial services and healthcare organizations are able to find needles in needle stacks that change the balance in their favor.

– YarcData's uRiKA: Big Data appliance for real time graph analytics (512 terabytes in memory).

http://www.yarcdata.com/solutions.html & http://www.yarcdata.com/products.html
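The needle-in-a-needle-stack idea is often illustrated with literature-based discovery over subject-predicate-object triples, the kind of semantic predications Semantic Medline extracts from the biomedical literature. The sketch below is a minimal, hypothetical illustration in plain Python: the triples are toy data and `build_graph` / `hidden_links` are illustrative names, not uRiKA’s interface or query language. It looks for concepts reachable from a starting concept through one intermediate concept when no direct link exists, which is one simple way to surface a candidate “unknown” connection.

```python
from collections import defaultdict

# Hypothetical subject-predicate-object triples (toy data, not Semantic Medline output).
TRIPLES = [
    ("drug_x", "INHIBITS", "protein_y"),
    ("protein_y", "CAUSES", "disease_z"),
    ("drug_x", "TREATS", "disease_w"),
    ("protein_y", "ASSOCIATED_WITH", "disease_w"),
]

def build_graph(triples):
    """Index outgoing edges by subject: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

def hidden_links(graph, start):
    """Return sorted (intermediate, target) pairs where start -> intermediate -> target
    exists but there is no direct start -> target edge (a candidate discovery)."""
    direct = {obj for _, obj in graph.get(start, [])}
    found = set()
    for _, mid in graph.get(start, []):
        for _, target in graph.get(mid, []):
            if target != start and target not in direct:
                found.add((mid, target))
    return sorted(found)

graph = build_graph(TRIPLES)
print(hidden_links(graph, "drug_x"))  # [('protein_y', 'disease_z')]
```

Asking this question for every starting concept, instead of one, is the “thousand questions” pattern the slide describes; at billions of triples that loop is where graph-oriented hardware and query engines matter.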

Page 13

$100,000 YarcData Big Data Graph Analytics Challenge

• The YarcData Big Data Graph Analytics Challenge will recognize the best submissions of un-partitionable, big data graph problems. The Challenge is open until October 31, 2012, and will award prizes ranging from $3,000 to $70,000.

http://www.yarcdata.com/graph-analytic-challenge.html

Page 14

MITRE Big Data Analytics

http://www.mitre.org/news/digest/advanced_research/06_12/data_analytics.html

http://www.mitre.org/work/tech_papers/2012/12_0076/12_0076.pdf

Page 15

October 4th ACT-IAC Big Data Forum!
• Questions:
– Do you have a Big Data challenge?
– Do you face compliance issues from Big Data implementation?
– How do you ensure a usable result from your Big Data Project?
– Do you want to discuss these and other Big Data issues with more than 20 top-level decision makers and thought leaders from government (including the Departments of Defense, Homeland Security, and Energy), industry, and academia as they explore ways to address the ever-expanding challenges surrounding the processing and analysis of Big Data?
• Keynote Speaker:
– Dr. George Strawn, Director, National Coordination Office, Networking and Information Technology Research and Development (NITRD) Program, OSTP, White House
• Venue: Grand Hyatt Washington, 1000 H Street, NW, Washington, DC 20001
• Time: 12:30 – 6 p.m.
• MY NOTE: CANCELLED – WORKING TO DO IT AT THE 15th CONFERENCE (APRIL 2nd), BASED ON THE JANUARY 24, 2013, PRESENTATION TO THE FEDERAL BIG DATA SENIOR STEERING GROUP (ALSO DECEMBER 12th BIG DATA PART II).

http://semanticommunity.info/Federal_SOA/14th_SOA_for_E-Government_Conference_October_2_2012#Big_Data_Fall_Forum_2012

Page 16

Innovation by Our Data Science Team

• Members:
– See below (and anyone else who would like to join us)

• Presentations:
– Semantic Information Integration within the Healthcare Sector – Eric Little, Orbis Technologies
– Using Semantic Medline on the New Cray Graph Computer for Medical Research – Victor Pollara, Noblis

• Panel Discussion: Big Data and the Government Enterprise
– Kate Goodier (Moderator, IC), *Dr. George Strawn (OSTP/NITRD/NCO), Dr. Eric Little (Orbis Technologies), Dr. Victor Pollara (Noblis), *Steve Reinhardt (Cray), and Dr. Tom Rindflesch (NLM)

• Please Think of Questions

* Note: Gadi Ben-Yehuda replaces Dr. George Strawn, and Mark Guiton replaces Steve Reinhardt.

Page 17

Eric Little
• Eric Little is currently Director of Information Management at Orbis Technologies, Inc., in Orlando, FL. He received a Ph.D. in Philosophy and Cognitive Science in 2002 from the University at Buffalo, State University of New York. He later held a Post-Doctoral Fellowship in the University at Buffalo’s Department of Industrial Engineering, developing ontologies for multisource information fusion applications (2002-04). Dr. Little then worked for several years as Assistant Professor of Doctoral Studies in Health Policy & Education and Director of the Center for Ontology and Interdisciplinary Studies at D'Youville College, Buffalo, NY (2004-2009). He left academia in 2009 to work as Chief Knowledge Engineer at the Computer Task Group (CTG) before joining Orbis.

Page 18

Victor Pollara
• Dr. Pollara is a Senior Principal Scientist at Noblis in the Health Innovation mission area. He applies several decades of experience in theoretical computer science, bioinformatics, knowledge extraction from text, and algorithm design to develop computational solutions for complex, data-driven problems. His current work focuses on applying formal modeling and semantic technologies to large, heterogeneous data sets and on experimenting with Noblis’ Cray XMT2 as a triplestore server for billions of triples.

Page 19

Kate Goodier
• Ms. Goodier is a senior engineering consultant for the STRATIS division of L-3 Communications. She has more than 20 years of experience in technical program management and systems development team leadership for both industry and the intelligence community. In addition to technical program management and management support, she has extensive systems engineering and integration experience in large ACAT I programs. She maintains sponsored accounts in the Joint Requirements Oversight Council (JROC) and other knowledge bases. Ms. Goodier was the fifth employee hired at the Center for Information Protection for the Dept. of Treasury, FBI, and CIA. She was recognized by the Federal Enterprise Architecture (FEA) Program Management Office (PMO) as an expert in systems data engineering and developed the Data Reference Model (DRM) version 1.5 Data Description guidance for the FEA. She is a member of the Scientific Committee for the Semantic Technologies in Intelligence, Defense and Security community.

Page 20

Gadi Ben-Yehuda
• Gadi Ben-Yehuda is the Director of Innovation and Social Media for The Center for the Business of Government.

• Mr. Ben-Yehuda has worked on the Web since 1994, when he received an email from Maya Angelou through his first Web site. He has an MFA in poetry from American University, has taught writing at Howard University, and has worked in Washington, DC, for nonprofits, lobbying organizations, Fleishman-Hillard Global Communications, and Al Gore's presidential campaign.

• Prior to his current position, Gadi was a Web Strategist for the District of Columbia's Office of the Chief Technology Officer (OCTO). Additionally, Gadi has taught creative, expository, and Web writing for more than 10 years to university students, private-sector professionals, and soldiers, including Marines at the Barracks at 8th and I in Washington, DC. (The last group was by far the most disciplined.)

• You can follow Gadi on Twitter, read his columns on the Huffington Post, see his posts on GovLoop, and read his blog entries on the IBM Center for the Business of Government site.

Page 21

Mark Guiton
• Mark Guiton serves as Director, Government Relations, at Cray, responsible for working with federal executive and legislative branch officials on a variety of program, policy, and procurement issues as they relate to advanced computing. Prior to joining Cray, Mr. Guiton served as legislative director in the U.S. Congress from 1999 to 2003, with a focus on appropriations and technology matters. From 1995 to 1998, he served as a technology policy advisor, working closely with the House Government Management, Information and Technology subcommittee. Before working in Congress, he was a computer programmer/analyst for Shared Medical Systems Corporation (now Siemens). Mr. Guiton received a B.S. in computer science with a concentration in electrical engineering from the University of Scranton, Pennsylvania.

Page 22

Tom Rindflesch
• Thomas C. Rindflesch has a Ph.D. in linguistics from the University of Minnesota and conducts research in natural language processing at the National Library of Medicine. He leads a research group focused on exploiting the Library’s resources to support development of advanced information management technologies in the biomedical domain.

Page 23

BIG DATA at the Hill (Topics, Trends, Issues, Comments)

• Myth vs. Realities
– Trend: Big Data Solves Everything
– Issue: Hype Without Demonstrated Business and Scientific Value
– Comment: See Data Evolution in the Government Enterprise: Will It Still Be Big Data Next Year?

• Privacy: Who knows what?
– Trend: The Intelligence Community Knows Everything
– Issue: Who Knows Everything the Intelligence Community Is Doing?
– Comment: See Intelligence Community Loves Big Data

• Cloud: Where does Big Data belong?
– Trend: Terabytes to Zettabytes
– Issue: Bandwidth Limitations
– Comment: Amazon: FedEx Your Storage Devices To Us to Upload Your Big Data

• Mobility – of you and your data
– Trend: Bring Your Own Device (BYOD)
– Issue: Conventional Web Sites and Databases Are Not Mobile-Enabled
– Comment: Your Mobile Device Has Access To a Supercomputer

• Storage and technology
– Trend: Scalable single-level storage
– Issue: Collapses the Server, Network, and Storage by removing software and replacing it with memory system primitives
– Comment: Panève’s ZettaLeaf & ZettaTree Products

• Data Analytics – hidden gems and spurious conclusions
– Trend: Data Science
– Issue: Too Few Data Scientists – Need a Government Data Science Community
– Comment: See My Data Journalism Articles

• Opportunities and risks in data aggregation
– Trend: Aggregate Before Analysis To Reduce Size
– Issue: Needles Could Be Lost
– Comment: See Data Evolution in the Government Enterprise: Will It Still Be Big Data Next Year?

• Security concerns for large data sets
– Trend: Integrate Classified and Unclassified Data Sources
– Issue: Different Security Levels
– Comment: Need To Specify/Protect Security at the Row and Element Level

• Financial Implications
– Trend: Hadoop for Everything with Big Data
– Issue: Costs 50 Times Higher Than Expected
– Comment: Big Data In Memory Could Be More Cost Effective
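The aggregation risk listed above (“Aggregate Before Analysis To Reduce Size”, “Needles Could Be Lost”) can be made concrete with a small, hypothetical example. The readings, the bucket size, the threshold, and the helper names `aggregate` and `anomalies` are all illustrative assumptions, not data or methods from the presentation: 24 readings contain one spike, and averaging them into 6-reading buckets shrinks the data fourfold but dilutes the spike below the detection threshold.

```python
from statistics import mean

# Hypothetical raw readings: 24 values with one anomalous spike (the "needle").
raw = [10.0] * 24
raw[13] = 100.0

def aggregate(values, bucket_size):
    """Reduce data size by averaging consecutive fixed-size buckets."""
    return [mean(values[i:i + bucket_size]) for i in range(0, len(values), bucket_size)]

def anomalies(values, threshold):
    """Return the indexes of values above the threshold."""
    return [i for i, v in enumerate(values) if v > threshold]

six_hour = aggregate(raw, 6)      # 24 readings -> 4 bucket averages: [10.0, 10.0, 25.0, 10.0]
print(anomalies(raw, 50.0))       # [13]  the spike is visible in the raw data
print(anomalies(six_hour, 50.0))  # []    the spike is averaged away
```

Aggregating with `max` instead of `mean` would keep this particular spike, which is why the choice of aggregate, and not just the decision to aggregate, determines which needles survive.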

Page 24

BIG DATA at the Hill
• My three suggestions:
– What Congress Should Do to Help Big Data
• Allow access to confidential data, like the Census Data Centers
• Allow sharing between statistical agencies
• Have a Chief Data Officer who promotes a Federal Data Science Community of Data Scientists and Statisticians
– The Federal Government Should First Focus on the Value of Big Data
• Hadoop projects are costing 50 times more than expected
• DHS failed with a Big Data in the Cloud project, but it failed fast and at less cost
• Semantic Medline on the Cray Graph Computer is an example of a Federal Data Science Team project with value
– The Federal Government Should Foster Real Innovation with Government Data
• Encourage private industry to add value to government data
• Consider having the Federal Government's Chief Statistician be the Chief Data Officer
• Empower the Government's Data Scientists and Statisticians to Analyze Big Data and Statistical Data

http://semanticommunity.info/AOL_Government/BIG_DATA_at_the_Hill