Copyright © 2014 Splunk Inc.
Asya Kamsky, Principal Developer Advocate, MongoDB
Analyze Data in MongoDB with the Hunk App
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Speaker Bio
• Principal Community Advocate – Helping users get the most out of their MongoDB deployments
• Over 20 years of industry experience ranging from big companies to start-ups
• Her career has spanned work in database technologies, security, software testing, networking, and the web
MongoDB
Business Agility: Dynamic Data Model
Scale Fast: Operational Flexibility
Low TCO: Commodity Hardware
Fastest Growing Database of 2013*
*DB-Engines
Horizontally Scalable - Sharding
Agile Flexible
High Performance & Strong Consistency
Application
Highly Available - Replica Sets
{ author: "roger", date: new Date(), text: "Spirited Away", tags: ["Tezuka", "Manga"] }
MongoDB
Relational Data Models: Hard to Change
[Figure: a relational table with columns Name, Age, Phone, Email, annotated with "New Column" and "New Table"]
Documents are Easier
Relational | MongoDB
{
  givenName: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
No SQL But Still Flexible Querying
Rich Queries
• Find all Mark's policies
• Find everybody who purchased a policy last month
Geospatial
• Find all customers that live within 10 miles of NYC
Text Search
• Find all tweets that mention the company within the last 2 days
Aggregation
• What's the total value of Mark's policies
Map Reduce
• How many customers that bought an auto policy got a home policy within three months
{
  customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  policies: [
    {
      policy_number : 13,
      type : "auto",
      deductible: 500
    },
    {
      policy_number : 14,
      type : "life",
      beneficiaries: […]
    }
  ]
}
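To make the "rich queries" idea concrete, here is a minimal Python sketch of how the first query above might be written as a MongoDB filter document. The `matches` and `values_at` helpers are hypothetical toys that only imitate equality matching with dot notation; they are not part of MongoDB or any driver. The `beneficiaries` field is left out because the slide elides its contents.

```python
# The customer document from the slide, as a Python dict.
customer = {
    "customer_id": 1,
    "first_name": "Mark",
    "last_name": "Smith",
    "city": "San Francisco",
    "policies": [
        {"policy_number": 13, "type": "auto", "deductible": 500},
        {"policy_number": 14, "type": "life"},
    ],
}

# Filter documents in the shape MongoDB expects.
marks_policies = {"first_name": "Mark"}          # "Find all Mark's policies"
auto_policy_holders = {"policies.type": "auto"}  # dot notation reaches into the array


def values_at(doc, dotted_path):
    """Collect values at dotted_path, descending into lists of subdocuments."""
    current = [doc]
    for key in dotted_path.split("."):
        next_level = []
        for item in current:
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_level.append(candidate[key])
        current = next_level
    return current


def matches(doc, query):
    """Toy equality matcher (assumption: equality conditions only)."""
    return all(expected in values_at(doc, path) for path, expected in query.items())


print(matches(customer, marks_policies))
print(matches(customer, auto_policy_holders))
```

With a real driver such as pymongo, the same filter documents would be passed to `collection.find(...)` and evaluated by the server rather than in Python.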
The Accelerating Pace of Data
Volume | Velocity | Variety | Variability
GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
Machine data is the fastest growing, most complex, most valuable area of big data
Industry Leading Big Data Product Portfolio
Real-time indexing Real-time search
Splunk Apps Vibrant and passionate developer community
IT Ops.
Security & Compliance
Digital Intelligence
App Dev & App Mgmt.
Business Analytics
Splunk Hadoop Connect ODBC DB Connect
Ad hoc analytics of historical data in Hadoop
Developers building big data apps on top of Hadoop
360° Customer View
Complete Security Analytics
Product and Service Analytics
Powerful Developer Platform with Familiar Tools
JavaScript Java Python PHP C# Ruby
REST API
Add New UI components
Integrate into Existing Systems
With Known Languages and Frameworks
Components of Hunk Server
64-bit Linux OS
REST API COMMAND LINE
Explore Analyze Visualize Dashboards Share
ODBC
splunkd
Hadoop Interface • Hadoop Client Libraries • JAVA
Streaming Resource Libraries • NoSQL & Other Stores
splunkweb Web and Application server
Virtual Indexes
Python, AJAX, CSS, XSLT, XML
Search Head C++, Web Services
Where is Hunk Used?
FINANCIAL SERVICES
• Detect / prevent fraud
• Model and manage risk
• Personalize banking & insurance products
WEB / SOCIAL / MOBILE
• Product and customer analytics
• Sentiment analysis, personalization
• Web log, image and video analysis
RETAIL
• Behavior analysis
• Cross-selling, recommendation engine
• Optimize pricing, placement, design
• Optimize inventory and distribution
GOVERNMENT
• Detect and prevent fraud
• Security and intelligence
• Support open data initiatives
MANUFACTURING
• Simulation, analysis and design
• Improve service via product sensor data
• "Digital factory" for lean manufacturing
HEALTHCARE
• Drug pedigree and supply chain
• Patient monitoring
• Compliance, archival, text search
• Data-driven research
Virtual Indexes
• Enables seamless use of almost the entire Splunk stack on data
• Automatically handles query execution to Mongo, Hadoop, etc.
• Technology is patent pending
Hunk Search Head >
Examples of Virtual Indexes
External System 1
External System 2
External System 3
index = syslog (/home/syslog/…)
index = apache_logs
index = sensor_data
index = twitter
Hunk applies schema for all fields – including transactions – at search time
Hunk Applies Schema on the Fly
• Structure applied at search time
• No brittle schema to work around
• Automatically find patterns and trends
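The idea of applying structure at search time can be illustrated with a small Python sketch. This is only an analogy under stated assumptions, not how Hunk implements it: the raw events, the `key=value` extraction rule, and the `search` helper are all hypothetical.

```python
import re

# Hypothetical raw events: stored as plain text, with no schema declared up front.
RAW_EVENTS = [
    "2014-10-06T12:00:01 action=purchase user=alice amount=30",
    "2014-10-06T12:00:05 action=login user=bob",
    "2014-10-06T12:00:09 action=purchase user=bob amount=12 coupon=FALL",
]

# Assumed extraction rule: any key=value token becomes a field.
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw_event):
    """Apply structure at search time by pulling key=value pairs from raw text."""
    return dict(KEY_VALUE.findall(raw_event))


def search(events, **criteria):
    """Return the extracted fields of events that match every criterion."""
    results = []
    for raw in events:
        fields = extract_fields(raw)
        if all(fields.get(name) == value for name, value in criteria.items()):
            results.append(fields)
    return results


purchases = search(RAW_EVENTS, action="purchase")
print(purchases)
```

Because fields are derived when the search runs, the third event can carry a new `coupon` field without any schema change to the stored data.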
Hunk Search Architecture
Query per Index/Virtual Index
Search Processor
Hunk Search Head >
Splunk Distributed Search
Hadoop External Results Provider
MongoDB Streaming Resource Library
MongoDBProvider
MongoDB
MongoDB
MongoDB
JSON Config
Results Reduction
Install via Command Line
• Go to <apps.splunk.com URL>
• Download MongoDBProvider.spl
• Either:
  – Copy MongoDBProvider.spl to $SPLUNK_HOME/etc/apps
  – tar -zxvf MongoDBProvider.spl
Configure Indexes.conf – Overview
• Indexes.conf defines indexes, both physical and virtual
• Two configuration items are needed: a provider and a virtual index
  – There should be one provider per MongoDB server (1:1)
  – There can be multiple virtual indexes per provider
• Indexes.conf can live in any Splunk app; it is probably easiest to put it in the MongoDBProvider folder
Configure Indexes.conf
Provider
[provider:local-mongodb]
vix.family = mongodb_erp_family
vix.splunk.search.debug = 0
vix.mongodb.host = localhost:27017

• [provider:local-mongodb]: Provider Name (referenced in Virtual Indexes)
• vix.family: Family
• vix.splunk.search.debug: Disable Debugging
• vix.mongodb.host: Hostname:Port
Virtual Index 1
[mongodb_vix]
vix.provider = local-mongodb
vix.mongodb.db = hunk
vix.mongodb.collection = test
vix.mongodb.field.time = _id
vix.mongodb.field.time.format = ObjectId

• [mongodb_vix]: Name of the Virtual Index (used by users)
• vix.provider: Provider Name (matches earlier stanza)
• vix.mongodb.db: MongoDB DB Name
• vix.mongodb.collection: MongoDB Collection Name
• vix.mongodb.field.time: Field to extract time from
• vix.mongodb.field.time.format: Format of the Field to Extract Time From (Valid Options are ObjectID, Date, or Epoch)
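The ObjectId time format works because the first four bytes of a MongoDB ObjectId encode its creation time as seconds since the Unix epoch. The Python sketch below shows that extraction for an ObjectId given as a 24-character hex string. The `objectid_time` helper is hypothetical and illustrates the encoding only; it is not the MongoDBProvider's own code.

```python
from datetime import datetime, timezone


def objectid_time(oid_hex):
    """Return the creation time embedded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) are a big-endian Unix timestamp in seconds.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Example ObjectId value chosen for illustration.
print(objectid_time("507f1f77bcf86cd799439011").isoformat())
```

This is why `_id` can serve as the time field for collections that store no explicit timestamp, with the caveat that it reflects when the ObjectId was generated rather than any business event time.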
Configure Indexes.conf
Virtual Index 2
[wocorders]
vix.provider = local-mongodb
vix.mongodb.db = demo
vix.mongodb.collection = wocorders
vix.mongodb.field.time = timestamp
vix.mongodb.field.time.format = date

• [wocorders]: Name of the Virtual Index (used by users)
• vix.provider: Provider Name (matches earlier stanza)
• vix.mongodb.db: MongoDB DB Name
• vix.mongodb.collection: MongoDB Collection Name
• vix.mongodb.field.time: Field to extract time from
• vix.mongodb.field.time.format: Format of the Field to Extract Time From (Valid Options are ObjectID, Date, or Epoch)
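The relationship between the provider stanza and the virtual index stanzas can be checked with a short Python sketch that parses the configuration shown on these slides. It uses the standard-library `configparser` as a stand-in for Splunk's own .conf parsing, which is an assumption, and `load_virtual_indexes` is a hypothetical helper.

```python
import configparser

# The stanzas from the slides, combined into one indexes.conf.
INDEXES_CONF = """
[provider:local-mongodb]
vix.family = mongodb_erp_family
vix.splunk.search.debug = 0
vix.mongodb.host = localhost:27017

[mongodb_vix]
vix.provider = local-mongodb
vix.mongodb.db = hunk
vix.mongodb.collection = test
vix.mongodb.field.time = _id
vix.mongodb.field.time.format = ObjectId

[wocorders]
vix.provider = local-mongodb
vix.mongodb.db = demo
vix.mongodb.collection = wocorders
vix.mongodb.field.time = timestamp
vix.mongodb.field.time.format = date
"""


def load_virtual_indexes(conf_text):
    """Map each virtual index to the host, db and collection it resolves to."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)
    providers = {
        name.split(":", 1)[1]: parser[name]
        for name in parser.sections()
        if name.startswith("provider:")
    }
    resolved = {}
    for name in parser.sections():
        if name.startswith("provider:"):
            continue
        provider_name = parser[name]["vix.provider"]
        if provider_name not in providers:
            raise KeyError(f"virtual index {name!r} references unknown provider {provider_name!r}")
        resolved[name] = {
            "host": providers[provider_name]["vix.mongodb.host"],
            "db": parser[name]["vix.mongodb.db"],
            "collection": parser[name]["vix.mongodb.collection"],
        }
    return resolved


resolved = load_virtual_indexes(INDEXES_CONF)
for index_name, target in resolved.items():
    print(index_name, target)
```

Both virtual indexes resolve to the same provider, which matches the overview: one provider per MongoDB server and any number of virtual indexes on top of it.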
How to Query Mongo
index=mongodb (foo=xyz OR other=val) | fields foo, bar, baz
• index=mongodb: Query your MongoDB Virtual Index
• (foo=xyz OR other=val): Match any fields by specifying the field name and matching parameters
• | fields foo, bar, baz: Minimize results returned by projecting down only the fields you want returned
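As a rough illustration of what the search above asks MongoDB for, the Python sketch below builds the equivalent filter and projection documents. The `build_find_args` helper is hypothetical and the exact translation performed by the MongoDBProvider is not documented here, so treat this as an assumption about the shape of the query rather than its actual output.

```python
def build_find_args(or_terms, fields):
    """Build a MongoDB filter and projection from OR-ed equality terms and a field list."""
    filter_doc = {"$or": [{name: value} for name, value in or_terms]}
    projection = {name: 1 for name in fields}
    return filter_doc, projection


# index=mongodb (foo=xyz OR other=val) | fields foo, bar, baz
filter_doc, projection = build_find_args(
    or_terms=[("foo", "xyz"), ("other", "val")],
    fields=["foo", "bar", "baz"],
)
print(filter_doc)
print(projection)
```

With a driver such as pymongo, these two documents would be passed as `collection.find(filter_doc, projection)`.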
Mongo Specific Integration Highlights
index=mongodb foo=xyz | timechart avg(bar) by baz
Predicate Pushdown
• Filtering terms are processed on the MongoDB side, so only results where the field foo matches xyz are returned
Projections
• Only the fields mentioned in the particular search are returned, in this case _time, bar and baz
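The benefit of pushing the predicate and the projection down to the data source can be seen in a small simulation. The Python sketch below uses a hypothetical in-memory list in place of a MongoDB collection and simply counts what would be returned with and without pushdown; the `payload` field and all helper names are invented for illustration.

```python
# Hypothetical in-memory stand-in for a MongoDB collection.
COLLECTION = [
    {"_time": 1, "foo": "xyz", "bar": 10, "baz": "a", "payload": "x" * 100},
    {"_time": 2, "foo": "abc", "bar": 20, "baz": "a", "payload": "x" * 100},
    {"_time": 3, "foo": "xyz", "bar": 30, "baz": "b", "payload": "x" * 100},
]


def fetch_everything(collection):
    """No pushdown: every document, with every field, is returned to the caller."""
    return [dict(doc) for doc in collection]


def fetch_with_pushdown(collection, predicate, wanted_fields):
    """Pushdown: filter and project at the source before returning results."""
    return [
        {name: doc[name] for name in wanted_fields if name in doc}
        for doc in collection
        if all(doc.get(key) == value for key, value in predicate.items())
    ]


def field_count(docs):
    """Total number of fields across all returned documents."""
    return sum(len(doc) for doc in docs)


full = fetch_everything(COLLECTION)
reduced = fetch_with_pushdown(COLLECTION, {"foo": "xyz"}, ["_time", "bar", "baz"])
print(len(full), field_count(full))
print(len(reduced), field_count(reduced))
```

The search head still computes `timechart avg(bar) by baz`, but it does so over fewer, smaller documents.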
Get The Bits!
• Hunk – http://splunk.com/download
• MongoDB App – http://apps.splunk.com/app/1810/
  – Or search for "MongoDB" on apps.splunk.com