Big Data Cloud June 3rd Meetup - Presentation by Mark Davis
Big Data Cloud Meetup
Big Data & Cloud Computing - Help, Educate & Demystify.
June 3rd 2011
Kitenga, Mark Davis CTO
Unlocking Big Data through Analytics and Search
June 3rd 2011 Meetup
Big Data
Enormous transactional data
Enormous unstructured information
Too big for databases
New tools are needed
Unstructured data explosion
Multimedia Content: Text, Imagery, Audio, Video, Sensor Streams, Biometric data, 3D
Text: Email, Documents, Web pages, Tweets, Posts
Structured Enterprise Data: Data warehouse, CDRs, Financial records, Access logs
<5%
Big Data
Trillions of user interactions/transactions == Big Data
<1M: Open source, MySQL, PHP
<10M: Data warehousing, Parallel SQL, Big hardware
>100M: NoSQL, Hadoop/MapReduce, HBase/Hive
Traditional (DBMS-based) solutions
Emerging technologies
The Structured/Unstructured Chasm
SQL, RDBMS, Transactional Data, BI Tools
Search, Documents, Text Classification, Taxonomies, Ontologies
Unstructured Analytics: Surfacing Metadata
Information Extraction
Parts-of-Speech Tagging
Tokenization
Lemmatization
Finite State Transducer
Finite State Transducer
Finite State Transducer
Machine-Learning
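Two of the stages listed on this slide, tokenization and a finite state transducer, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the talk: the capitalization rule, the state names `OUT`/`IN`, the labels `B-ENT`/`I-ENT`/`O`, and the sample sentence are all hypothetical.

```python
import re

# Transducer table: (state, input symbol) -> (next state, output label).
# States and labels are illustrative assumptions, not taken from the slides.
TRANSITIONS = {
    ("OUT", "cap"): ("IN", "B-ENT"),    # first capitalized token starts an entity
    ("OUT", "other"): ("OUT", "O"),
    ("IN", "cap"): ("IN", "I-ENT"),     # further capitalized tokens continue it
    ("IN", "other"): ("OUT", "O"),
}


def tokenize(text):
    """Split text into alphanumeric tokens, preserving case."""
    return re.findall(r"[A-Za-z0-9]+", text)


def symbol(token):
    """Map a token to the transducer's input alphabet."""
    return "cap" if token[0].isupper() else "other"


def transduce(tokens):
    """Run the transducer, emitting one output label per input token."""
    state = "OUT"
    tagged = []
    for tok in tokens:
        state, label = TRANSITIONS[(state, symbol(tok))]
        tagged.append((tok, label))
    return tagged


def entities(tagged):
    """Collect contiguous B-ENT/I-ENT runs into entity strings."""
    spans, current = [], []
    for tok, label in tagged:
        if label in ("B-ENT", "I-ENT"):
            current.append(tok)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


sample = "analysts at Santa Clara met Mark Davis today"
found = entities(transduce(tokenize(sample)))
```

A real extractor would add the other stages on the slide (lemmatization, part-of-speech tagging, machine-learned models); a capitalization rule alone misfires on sentence-initial words.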
Search + Analytics
Query Language
Metadata Extraction
Indexing
Facet Browsing
Facet Charting
Resource Integration
Autosuggest
Spellcheck
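How indexing, querying, and facet browsing fit together can be illustrated with a toy in-memory inverted index. This is a minimal sketch, not how Solr or Lucene is implemented; the documents, field names (`source`), and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical documents with extracted metadata fields.
DOCS = [
    {"id": 1, "text": "hadoop cluster sizing report", "source": "wiki"},
    {"id": 2, "text": "hadoop search with solr", "source": "blog"},
    {"id": 3, "text": "solr facet search tuning", "source": "wiki"},
    {"id": 4, "text": "hadoop and solr indexing", "source": "wiki"},
]


def build_index(docs):
    """Indexing: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc["text"].lower().split():
            index[term].add(doc["id"])
    return index


def search(index, query):
    """Query: return ids of documents containing every query term (AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


def facet_counts(docs, hits, field):
    """Faceting: count the values of a metadata field across the hits."""
    return Counter(doc[field] for doc in docs if doc["id"] in hits)


index = build_index(DOCS)
hits = search(index, "solr")
facets = facet_counts(DOCS, hits, "source")
```

The facet counts are what a facet browser or facet chart would display next to the result list, letting the user narrow a search by metadata value.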
Defense Intelligence
Analyst support staff needs to convert raw data into actionable intelligence
Situation Reports
Geo-tagged Imagery
Named Entity Extraction
Image tagging
Video analytics
Linkage Analysis
Network Visualization
Search
Hadoop/MapReduce, GPUs, HDFS, HBase, Solr
Improve Force Effectiveness
US Army, Navy, DHS, NSA
CASE STUDY: US ARMY
Analysis Bottlenecks
200 data feeds
Unacceptable response time
Analysts avoid complete searches
Basic entity extraction
Slow analysis cycles
Distribution by PowerPoint
Enabling technologies: Oracle and custom thick clients
The Solution
>200 data feeds
<0.5s queries
Fast analysis cycles
Machine Learning
Analytics
Biometrics
Linkage Analysis
Face recognition
Video tagging
Collaborative systems
Enabling technologies: GPU clouds, Hadoop/MapReduce, Katta, Lucene, NoSQL, HBase
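One common way to approach linkage analysis is to link entities that are mentioned in the same document and weight each link by how often that happens. The sketch below assumes entity extraction has already run; the entity names and function names are hypothetical, and the slides do not say which linkage method was actually used.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(doc_entities):
    """Count, for each unordered entity pair, the documents mentioning both."""
    links = Counter()
    for ents in doc_entities:
        # sorted(set(...)) gives each pair once per document, in a stable order
        for a, b in combinations(sorted(set(ents)), 2):
            links[(a, b)] += 1
    return links


def neighbors(links, entity):
    """Return the entities linked to `entity`, with link weights."""
    linked = {}
    for (a, b), weight in links.items():
        if a == entity:
            linked[b] = weight
        elif b == entity:
            linked[a] = weight
    return linked


# Hypothetical output of entity extraction over three reports.
reports = [
    ["Alice", "Bob", "Site A"],
    ["Alice", "Bob"],
    ["Bob", "Site B"],
]
links = cooccurrence_links(reports)
bob_links = neighbors(links, "Bob")
```

The weighted pairs form the edge list of a graph, which is the input a network visualization would draw.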
Pharma Bioinformatics
Increase speed of drug discovery
Patents
Genetic Sequence Data
Journal Articles
ZettaVox
Biological Named Entity Extraction
Author Name Extraction and Normalization
Linkage Analysis
Timelines
Faceted Search
Hadoop/MapReduce, HDFS, HBase, GPUs, Solr
Faster Discovery
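Author name normalization can be illustrated by reducing each name variant to a "surname plus first initial" key, so that variants of the same author group together. This is a deliberately crude sketch, not the method used in the talk: the key format and function names are assumptions, it handles ASCII letters only, and it merges distinct authors who share a surname and initial.

```python
import re
from collections import defaultdict


def normalize_author(name):
    """Reduce 'Last, First M.' or 'First M. Last' to a 'last f' key."""
    name = name.strip()
    if not name:
        return ""
    if "," in name:
        last, _, given = name.partition(",")
    else:
        parts = name.split()
        last, given = parts[-1], " ".join(parts[:-1])
    # Keep ASCII letters only (a simplification; real names need Unicode care).
    last = re.sub(r"[^a-z]", "", last.lower())
    initial = re.sub(r"[^a-z]", "", given.lower())[:1]
    return f"{last} {initial}".strip()


def group_authors(names):
    """Group raw name strings under their normalized key."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize_author(raw)].append(raw)
    return dict(groups)


groups = group_authors(["Davis, Mark W.", "Mark W. Davis", "M. Davis", "J. Smith"])
```

In practice the grouping would be refined with extra evidence such as affiliations or co-authors before names are merged.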
Pharma Treemap
Demo
Summary
• Big Data spans unstructured and structured data
• Effective tools for managing both depend on understanding how the two differ and what they share
• Bridging the chasm between them means merging search and analytics
Questions?
Contact Info
[email protected]
http://www.kitenga.com
Kitenga, Inc.
2953 Bunker Hill Lane, Suite 400
Santa Clara, CA 95054
1-(408)-462-KITE
1-(253)-541-6799 (FAX)