Webinar: Elevate Your Enterprise Architecture with In-Memory Computing
TRANSCRIPT
Tweet #MongoDBWebinar and follow @mongodb
Elevate Your Enterprise Architecture with an In-Memory Computing Strategy
Dylan Tong
Principal Solutions [email protected]
In-Memory Computing
How can we process data as fast as possible by leveraging in-memory speed at its best?
What are the possibilities if we could?
High-frequency trading (HFT) is a program trading platform that uses powerful computers to transact a large number of orders at very fast speeds. It uses complex algorithms to analyze multiple markets and execute orders based on market conditions.
Typically, the traders with the fastest execution speeds are more profitable than traders with slower execution speeds.
Source: Investopedia
Speed Matters…
Amazon found that it increased revenue by 1% for every 100 ms of improvement. [Source: Amazon]
A 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and a 7% loss in conversions. [Source: Aberdeen Group]
A study found that 27% of participants who shopped on mobile devices were dissatisfied because the experience was too slow. [Source: Forrester Consulting]
How Fast?
Storage | Latency unit | Normalized to 1 s
RAM access | 100s of ns | ~6 min
SSD access | 100s of µs | ~6 days
HDD access | 10s of ms | ~12 months
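The slide does not say which latency is scaled to 1 s. The figures are consistent with treating a single CPU cycle of about 0.3 ns as 1 s, so the sketch below assumes that reference, plus representative latencies of 100 ns (RAM), 150 µs (SSD), and 10 ms (HDD) picked from the ranges in the table. All four values are assumptions, not figures from the webinar.

```javascript
// Assumed reference: one CPU cycle of ~0.3 ns (roughly a 3.3 GHz clock) is scaled to 1 s.
const REFERENCE_NS = 0.3;

// Representative access latencies in nanoseconds (assumed values within the table's ranges).
const latenciesNs = {
  RAM: 100,        // "100s of ns"
  SSD: 150000,     // "100s of µs" (150 µs)
  HDD: 10000000,   // "10s of ms" (10 ms)
};

const SECONDS_PER_MINUTE = 60;
const SECONDS_PER_DAY = 86400;
const SECONDS_PER_MONTH = 30.44 * SECONDS_PER_DAY; // average month length

// Scale a real latency so that REFERENCE_NS corresponds to exactly 1 second.
function normalizedSeconds(latencyNs) {
  return latencyNs / REFERENCE_NS;
}

// Express each normalized latency in the unit used on the slide.
const normalized = {
  RAM: normalizedSeconds(latenciesNs.RAM) / SECONDS_PER_MINUTE, // minutes
  SSD: normalizedSeconds(latenciesNs.SSD) / SECONDS_PER_DAY,    // days
  HDD: normalizedSeconds(latenciesNs.HDD) / SECONDS_PER_MONTH,  // months
};

console.log(`RAM ~${normalized.RAM.toFixed(1)} min`);
console.log(`SSD ~${normalized.SSD.toFixed(1)} days`);
console.log(`HDD ~${normalized.HDD.toFixed(1)} months`);
```

Run under Node.js, it prints roughly 5.6 min, 5.8 days, and 12.7 months, which round to the ~6 min, ~6 days, and ~12 months in the table.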
Why Now?
Last 10 Years… “Generally affordable”
[Chart: average RAM price ($/GB), 2005 to 2015, axis $0 to $200*]
Year | Average $/GB*
1980 | $6,328,125
1985 | $859,375
1990 | $103,880
1995 | $30,875
2000 | $1,107
2005 | $189
2010 | $12.37
2013 | $5.50
2015 | $4.37
*http://www.statisticbrain.com/average-historic-price-of-ram/
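To put the price trend in a single number, the sketch below computes the compound annual rate of decline from the slide's own $/GB figures. The function name is illustrative; the data are the statisticbrain.com averages quoted above.

```javascript
// Average RAM price in $/GB by year, as listed on the slide (source: statisticbrain.com).
const ramPricePerGB = {
  1980: 6328125,
  1985: 859375,
  1990: 103880,
  1995: 30875,
  2000: 1107,
  2005: 189,
  2010: 12.37,
  2013: 5.5,
  2015: 4.37,
};

// Compound annual rate of change between two years: (end / start)^(1 / years) - 1.
// A negative result means prices fell.
function compoundAnnualChange(startYear, endYear) {
  const start = ramPricePerGB[startYear];
  const end = ramPricePerGB[endYear];
  return Math.pow(end / start, 1 / (endYear - startYear)) - 1;
}

const tenYearFactor = ramPricePerGB[2005] / ramPricePerGB[2015];
const tenYearRate = compoundAnnualChange(2005, 2015);
const fiveYearRate = compoundAnnualChange(2010, 2015);

console.log(`2005 to 2015: ${tenYearFactor.toFixed(0)}x cheaper, ${(-tenYearRate * 100).toFixed(1)}% lower per year`);
console.log(`2010 to 2015: ${(-fiveYearRate * 100).toFixed(1)}% lower per year`);
```

With these figures, RAM fell about 43x between 2005 and 2015, roughly 31% per year, and about 19% per year from 2010 to 2015.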
Why Now?
Last 5 Years… “An Option at Scale”
[Chart: average RAM price ($/GB), 2010 to 2015, axis $0 to $14: 2010 $12.37, 2013 $5.50, 2015 $4.37*]
*http://www.statisticbrain.com/average-historic-price-of-ram/
"This will process these data using algorithms for machine learning and artificial intelligence before sending the data back to the car. The zFAS board will in this way continuously extend its capabilities to master even complex situations increasingly better," Audi stated. "The piloted cars from Audi thus learn more every day and with each new situation they experience."
Source: T3.com
The possibilities…
Challenges: Scale
Challenges: Cost Viability
One server as shown = $34,777/yr; ~$1.74M/yr for the infrastructure to support 100 TB.
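The slide gives two figures without naming the configuration behind the first. Assuming $34,777/yr is the cost of one server, the ratio implies how many servers the 100 TB estimate rests on and how much data each would hold. The even-split assumption ignores replicas and memory headroom.

```javascript
// Figures from the slide. Treating the first as a per-server cost is an assumption.
const costPerServerPerYear = 34777;   // $/yr
const totalCostPerYear = 1.74e6;      // ~$1.74M/yr
const targetTB = 100;

// Number of servers implied by the two figures.
const impliedServers = Math.round(totalCostPerYear / costPerServerPerYear);

// Data each server would hold if 100 TB were spread evenly (no replicas, no headroom).
const impliedTBPerServer = targetTB / impliedServers;

console.log(`${impliedServers} servers, ~${impliedTBPerServer} TB each`);
```

The ratio comes out at 50 servers, or about 2 TB of data per server.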
Challenges: Cost Viability
Storage Type | Avg. Cost ($/GB) | Cost at 100 TB ($)
RAM | 5.00 | 500K
SSD | 0.47 to 1.00 | 47K to 100K
HDD | 0.03 | 3K
http://www.statisticbrain.com/average-cost-of-hard-drive-storage/
http://www.myce.com/news/ssd-price-per-gb-drops-below-0-50-how-low-can-they-go-70703/
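The table's last column is $/GB multiplied by capacity. The sketch reproduces it, assuming decimal units (100 TB = 100,000 GB), which is what the 500K, 47K to 100K, and 3K figures imply.

```javascript
// Average $/GB from the table (SSD as a low-to-high range; the others as a single value).
const mediaCostPerGB = {
  RAM: [5.0, 5.0],
  SSD: [0.47, 1.0],
  HDD: [0.03, 0.03],
};

// 100 TB in decimal gigabytes (1 TB = 1,000 GB), matching the table's totals.
const CAPACITY_GB = 100 * 1000;

// Cost range for storing CAPACITY_GB on the given medium.
function costAtCapacity(media) {
  const [low, high] = mediaCostPerGB[media];
  return [low * CAPACITY_GB, high * CAPACITY_GB];
}

for (const media of Object.keys(mediaCostPerGB)) {
  const [low, high] = costAtCapacity(media);
  console.log(`${media}: $${Math.round(low)} to $${Math.round(high)}`);
}
```

At the same capacity, RAM costs 5x to roughly 10x more than SSD and about 167x more than HDD.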
Challenges: Durability
Volatile Memory
• What happens when things fail, and what data may be lost?
• How does the system synchronize with your durable storage? Does it do this well, and is it simple to implement?
Challenges: Design Still Matters
on RAM
Scenario: eCommerce Modernization Initiative
Business Problems | Technology Limitations
Customer experience is suffering during high-traffic events.
Too expensive to scale the system to support spike events.
Scaling the system is hard, and engineering teams can’t react fast enough to unexpected growth.
A caching solution was implemented, but it mostly helps only with read performance; synchronizing writes has been a development nightmare.
Lack of mobile customers in Europe and Asia has been attributed to latency issues.
Difficult to extend the data architecture globally, so the effort has been put on hold.
Scenario: eCommerce Modernization Initiative
Business Problems | Technology Limitations
Below-industry conversion rate performance has been attributed partly to poor personalization.
Customer info is siloed across the enterprise, and it’s too complicated to bring this data together so that effective models can be built to drive personalization.
A “Big Data” project to bring data together to drive machine learning and cognitive capabilities in the platform failed: data scientists reported that the platform was too slow to develop on and its performance was impractical.
Business analysts have siloed views of the eCommerce channel, and information isn’t getting to them fast enough
Related to limitations above
Integrating data into data warehouse is slow and hard to maintain
[Diagram: current platform. eCommerce datastores (RDBMS, NoSQL) hold Orders, Product Catalog, Customer Data (Profile, Sessions, Carts, Personalization), and Inventory, behind Platform Services and a Platform API. Dependent external data sources and integrations: CRM, ERP, PIM, data warehouse, BI tools, …]
Scenario: eCommerce Modernization Initiative
[Diagram: siloed data sources. Customer Data (Profile, Sessions, Carts, Personalization), Product Catalog, RDBMS, NoSQL, CRM, ERP, PIM, partner sources (supplier databases, etc.), legacy mainframe]
Siloed Data Sources Problem
SLOW AND POOR SCALABILITY
[Diagram: RDBMS, NoSQL, CRM, ERP, PIM, partner sources (supplier databases, etc.), and legacy mainframe feed an Operational Single View holding Customer Data (Profile, Sessions, Carts, Personalization) and the Product Catalog]
Operational Single View
Operational Single View
MongoDB Enterprise Data Hub
Reference: MetLife Wall Presentation
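The single-view idea can be sketched as folding every source's records into one document per customer. The sources, field names, and merge rules below (profile fields merged, orders collected into an array, the last session record kept) are hypothetical choices for illustration; a production pipeline would also need identity matching and conflict resolution, which this sketch omits.

```javascript
// Hypothetical extracts from three siloed systems that share a customerId key.
const crm = [
  { customerId: "c1", name: "Ana Silva", email: "ana@example.com" },
  { customerId: "c2", name: "Li Wei", email: "li@example.com" },
];
const erp = [
  { customerId: "c1", orderId: "o100", total: 42.5 },
  { customerId: "c1", orderId: "o101", total: 19.0 },
];
const sessions = [
  { customerId: "c2", cart: ["sku-7"], device: "mobile" },
];

// Fold every source into one document per customer.
function buildSingleView(sources) {
  const views = new Map();
  // Create the customer's document on first sight, then reuse it.
  const viewFor = (id) => {
    if (!views.has(id)) {
      views.set(id, { _id: id, profile: {}, orders: [], session: null, sources: new Set() });
    }
    return views.get(id);
  };

  for (const { customerId, ...profile } of sources.crm) {
    const view = viewFor(customerId);
    Object.assign(view.profile, profile); // merge profile fields
    view.sources.add("CRM");
  }
  for (const { customerId, ...order } of sources.erp) {
    const view = viewFor(customerId);
    view.orders.push(order); // collect orders into an embedded array
    view.sources.add("ERP");
  }
  for (const { customerId, ...session } of sources.sessions) {
    const view = viewFor(customerId);
    view.session = session; // the last session record in the input wins
    view.sources.add("eCommerce");
  }

  // Convert each Set to an array so the result is plain, document-shaped data.
  return [...views.values()].map((view) => ({ ...view, sources: [...view.sources] }));
}

const singleView = buildSingleView({ crm, erp, sessions });
console.log(JSON.stringify(singleView, null, 2));
```

Each resulting object has the shape of a document that could be stored in one collection, so an application reads a customer's profile, orders, and session in a single lookup.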
{ product_name: 'Acme Paint', color: ['Red', 'Green'], size_oz: [8, 32], finish: ['satin', 'eggshell'] }
{ product_name: 'T-shirt', size: ['S', 'M', 'L', 'XL'], color: ['Heather Gray' … ], material: '100% cotton', wash: 'cold', dry: 'tumble dry low' }
{ product_name: 'Mountain Bike', brake_style: 'mechanical disc', color: 'grey', frame_material: 'aluminum', no_speeds: 21, package_height: '7.5x32.9x55', weight_lbs: 44.05, suspension_type: 'dual', wheel_size_in: 26 }
Documents in the same product catalog collection in MongoDB
Dynamic Schema
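Because the three documents share a collection without sharing a schema, a query on one field simply skips documents that lack it. The sketch below is a simplified, in-process model of MongoDB equality matching, written for illustration: a scalar field must equal the value, and an array field matches if any element equals it. It does not model dotted paths, null matching missing fields, or query operators, and the T-shirt's color list is truncated as on the slide.

```javascript
// The three product documents from the slide (T-shirt colors truncated, as in the source).
const products = [
  { product_name: "Acme Paint", color: ["Red", "Green"], size_oz: [8, 32], finish: ["satin", "eggshell"] },
  { product_name: "T-shirt", size: ["S", "M", "L", "XL"], color: ["Heather Gray"], material: "100% cotton", wash: "cold", dry: "tumble dry low" },
  { product_name: "Mountain Bike", brake_style: "mechanical disc", color: "grey", frame_material: "aluminum", no_speeds: 21, package_height: "7.5x32.9x55", weight_lbs: 44.05, suspension_type: "dual", wheel_size_in: 26 },
];

// Simplified equality match: arrays match if they contain the value,
// scalars must be strictly equal, and a missing field never matches.
function matchesEquality(doc, field, value) {
  const fieldValue = doc[field];
  if (Array.isArray(fieldValue)) return fieldValue.includes(value);
  return fieldValue === value;
}

// Return the documents that satisfy every field/value pair in the filter.
function find(docs, filter) {
  return docs.filter((doc) =>
    Object.entries(filter).every(([field, value]) => matchesEquality(doc, field, value))
  );
}

console.log(find(products, { color: "Red" }).map((d) => d.product_name));      // [ 'Acme Paint' ]
console.log(find(products, { color: "grey" }).map((d) => d.product_name));     // [ 'Mountain Bike' ]
console.log(find(products, { wheel_size_in: 26 }).map((d) => d.product_name)); // [ 'Mountain Bike' ]
```

Against a real deployment, the first query would correspond to db.products.find({ color: 'Red' }), where the collection name products is an assumption.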
Flexible Data Model: facilitates agile development and continuous delivery methodologies
Scalability: scale out dynamically as demand grows
Still Agile, Scalable and Simple
In-Memory Storage Engine
High Performance:
• More predictable and lower latency on less in-memory infrastructure.
Infrastructure Optimization:
• Assign a data subset to the In-Memory SE via Zone Sharding.
• Optimize on cost vs. performance without silos.
Rich Query Capability:
• Full MongoDB query and indexing support.
[Diagram: IN-MEMORY SE NODES and WIREDTIGER NODES]
[Diagram: zone-sharded cluster spanning WEST and EAST, with an Update flowing to the local shards. SHARD 1 (TAG: WEST, IN_MEM), SHARD 2 (TAG: WEST, WT), SHARD 3 (TAG: EAST, IN_MEM), SHARD 4 (TAG: EAST, WT)]
Local Read/Write with Strong Consistency
Session data geographically localized, with in-memory engine latency
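MongoDB ties each shard-key range to exactly one zone, while a shard may belong to several zones. To express the slide's tag pairs (region plus engine), the sketch assumes combined zone names such as EAST_IN_MEM and a hypothetical sessions collection sharded on { region: 1, userId: 1 }. It only models which shards are eligible to hold a key; in a real cluster the zones and ranges would be created with sh.addShardToZone() and sh.updateZoneKeyRange(), and the balancer would move the chunks.

```javascript
// Assumed shard-to-zone membership, encoding the slide's tag pairs as combined zone names.
// The WiredTiger shards (shard2, shard4) stay available for data zoned to WEST_WT / EAST_WT.
const shardZones = {
  shard1: ["WEST_IN_MEM"],
  shard2: ["WEST_WT"],
  shard3: ["EAST_IN_MEM"],
  shard4: ["EAST_WT"],
};

// Zone ranges for the hypothetical sessions collection. Each range is [min, max);
// -Infinity / Infinity stand in for MongoDB's MinKey / MaxKey.
const sessionZoneRanges = [
  { min: { region: "EAST", userId: -Infinity }, max: { region: "EAST", userId: Infinity }, zone: "EAST_IN_MEM" },
  { min: { region: "WEST", userId: -Infinity }, max: { region: "WEST", userId: Infinity }, zone: "WEST_IN_MEM" },
];

// Compare compound shard keys field by field, in shard-key order.
function compareKeys(a, b) {
  if (a.region !== b.region) return a.region < b.region ? -1 : 1;
  if (a.userId !== b.userId) return a.userId < b.userId ? -1 : 1;
  return 0;
}

// Find the zone whose range contains the key, or null if none does.
function zoneForKey(key) {
  const range = sessionZoneRanges.find(
    (r) => compareKeys(key, r.min) >= 0 && compareKeys(key, r.max) < 0
  );
  return range ? range.zone : null;
}

// Shards allowed to hold the chunk containing this key.
function eligibleShards(key) {
  const zone = zoneForKey(key);
  const allShards = Object.keys(shardZones);
  // Keys outside every zone range are not pinned and may live on any shard.
  if (zone === null) return allShards;
  return allShards.filter((shard) => shardZones[shard].includes(zone));
}

console.log(eligibleShards({ region: "EAST", userId: 42 })); // [ 'shard3' ]
console.log(eligibleShards({ region: "WEST", userId: 7 }));  // [ 'shard1' ]
```

Because region is the leading shard-key field, every session for an eastern user lands on the eastern in-memory shard, which is what keeps reads and writes local.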
Durability and Fault Tolerance:
• Mixed replica sets allow data to be replicated from the In-Memory SE to the WiredTiger (WT) SE.
• Full high availability: automatic failover, across geographies.
In-Memory Storage Engine
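One way to read the mixed replica set bullet: electable members run the in-memory engine, and a WiredTiger member keeps a durable copy but is never elected primary. The sketch builds such a configuration object (host names are hypothetical) in the shape accepted by rs.initiate() and checks the intended layout. The storage engine is a per-mongod startup option (for example --storageEngine inMemory on MongoDB Enterprise), so it is tracked in a separate, assumed map rather than in the config.

```javascript
// Hypothetical three-member replica set: two members whose mongod is assumed to run
// with --storageEngine inMemory, plus one WiredTiger member holding a durable copy.
const replicaSetConfig = {
  _id: "sessionsRS",
  members: [
    { _id: 0, host: "inmem-a.example.net:27017" },
    { _id: 1, host: "inmem-b.example.net:27017" },
    // priority 0 keeps this member from becoming primary; hidden keeps it out of
    // application reads (MongoDB requires hidden members to have priority 0).
    { _id: 2, host: "wt-durable.example.net:27017", priority: 0, hidden: true },
  ],
};

// The storage engine is chosen when each mongod starts, not in the replica set
// config, so the assumed engine per host is recorded separately here.
const storageEngineByHost = {
  "inmem-a.example.net:27017": "inMemory",
  "inmem-b.example.net:27017": "inMemory",
  "wt-durable.example.net:27017": "wiredTiger",
};

// Replica set members default to priority 1 when none is given.
const priorityOf = (member) => (member.priority === undefined ? 1 : member.priority);

// Verify the intended layout: every electable member is in-memory, at least one
// durable WiredTiger member exists, none of them can be elected, and hidden members have priority 0.
function checkMixedLayout(config, engines) {
  const electable = config.members.filter((m) => priorityOf(m) > 0);
  const durable = config.members.filter((m) => engines[m.host] === "wiredTiger");
  return {
    electableInMemory: electable.length > 0 && electable.every((m) => engines[m.host] === "inMemory"),
    durableNonElectable: durable.length > 0 && durable.every((m) => priorityOf(m) === 0),
    hiddenMembersValid: config.members.every((m) => !m.hidden || priorityOf(m) === 0),
  };
}

// In the mongo shell, the same object would be passed to rs.initiate(replicaSetConfig).
console.log(checkMixedLayout(replicaSetConfig, storageEngineByHost));
```

If both in-memory members fail, the WiredTiger member still holds the data replicated to it so far, which is the durability answer to the questions raised under "Challenges: Durability".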
[Diagram: platform databases (RDBMS, NoSQL) and dependent external data sources and integrations (CRM, ERP, PIM, partner sources such as supplier databases, legacy mainframe) feed an Operational Unified View]
Advanced Personalization
1. TRAIN/RE-TRAIN ML MODELS
2. APPLY MODELS TO REAL-TIME STREAM OF INTERACTIONS
3. DRIVE TARGETED CONTENT, RECOMMENDATIONS…ETC.
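The three steps can be illustrated with a deliberately simple stand-in for the ML models: item co-occurrence counts trained on past carts, then applied to each incoming interaction to pick recommendations. This swaps counting in for real machine learning, and the carts and events are made up; in the architecture described here, the training step would run in Spark against the unified view.

```javascript
// Step 1 (train): count how often each pair of products appears in the same cart.
function trainCoOccurrence(carts) {
  const model = new Map(); // product -> Map(otherProduct -> count)
  for (const cart of carts) {
    const items = [...new Set(cart)]; // ignore duplicates within one cart
    for (const a of items) {
      for (const b of items) {
        if (a === b) continue;
        if (!model.has(a)) model.set(a, new Map());
        const row = model.get(a);
        row.set(b, (row.get(b) || 0) + 1);
      }
    }
  }
  return model;
}

// Steps 2 and 3 (apply, then drive content): for an incoming interaction,
// return the k products most often bought with the viewed product (ties by name).
function recommend(model, event, k = 2) {
  const row = model.get(event.productId);
  if (!row) return [];
  return [...row.entries()]
    .sort((x, y) => y[1] - x[1] || x[0].localeCompare(y[0]))
    .slice(0, k)
    .map(([product]) => product);
}

const historicalCarts = [
  ["paint", "brush", "tape"],
  ["paint", "brush"],
  ["bike", "helmet"],
  ["paint", "tape"],
  ["paint", "brush", "roller"],
];
const model = trainCoOccurrence(historicalCarts);

// A hypothetical stream of real-time interactions.
const interactionStream = [
  { userId: "u1", productId: "paint" },
  { userId: "u2", productId: "bike" },
];
for (const event of interactionStream) {
  console.log(event.userId, recommend(model, event));
}
```

Re-training (step 1) is just re-running trainCoOccurrence on fresh carts, while the scoring path in recommend stays cheap enough to run per interaction.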
Why Spark?
Speed. By exploiting in-memory optimizations, Spark has shown up to 100x higher performance than MapReduce running on Hadoop.
Simplicity. Easy-to-use APIs for operating on large datasets, including a collection of sophisticated operators for transforming and manipulating semi-structured data.
Unified Framework. Packaged with higher-level libraries, including support for SQL queries, machine learning, stream processing, and graph processing. These standard libraries increase developer productivity and can be combined to create complex workflows.
Operational Single View + Spark Connector
• Native Scala connector, certified by Databricks
• Exposes all Spark APIs & libraries
• Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations
• Locality awareness to reduce data movement
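Predicate pushdown means the filter travels to the database instead of the data travelling to Spark. The sketch below models that difference in-process with a tiny evaluator for an equality-only $match and a $project, on made-up order documents. It illustrates the data-volume effect only; it is not the connector's API, and it ignores the secondary-index and aggregation optimizations the slide also lists.

```javascript
// Hypothetical orders collection (a stand-in for data stored in MongoDB).
const orders = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  region: i % 4 === 0 ? "EU" : "US",
  price: (i % 50) + 1,
}));

// Minimal evaluator for the two stages used here: $match (equality only) and $project (inclusion only).
function runPipeline(docs, pipeline) {
  return pipeline.reduce((current, stage) => {
    if (stage.$match) {
      return current.filter((d) => Object.entries(stage.$match).every(([k, v]) => d[k] === v));
    }
    if (stage.$project) {
      const fields = Object.keys(stage.$project).filter((k) => stage.$project[k]);
      return current.map((d) => Object.fromEntries(fields.map((f) => [f, d[f]])));
    }
    throw new Error(`unsupported stage: ${JSON.stringify(stage)}`);
  }, docs);
}

// Without pushdown: every document is shipped to Spark, then filtered there.
const shippedWithout = runPipeline(orders, []);
const euWithout = shippedWithout.filter((d) => d.region === "EU");

// With pushdown: filtering and column pruning run inside the database,
// so only matching, projected documents cross the network.
const pushedPipeline = [{ $match: { region: "EU" } }, { $project: { _id: 1, price: 1 } }];
const shippedWith = runPipeline(orders, pushedPipeline);

console.log(shippedWithout.length, shippedWith.length); // 1000 250
```

Both paths end with the same 250 EU orders, but the pushed-down pipeline moves a quarter of the documents, each with only the fields Spark needs.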
Locality Awareness
[Diagram: DRIVER PROGRAM (SPARK CONTEXT) and CLUSTER MANAGER distributing five tasks]
Operational Single View + Spark Connector
Blend client data from multiple internal and external sources to drive real-time campaign optimization.
MongoDB+Spark at China Eastern
180 million fare calculations & 1.6 billion searches per day
Oracle database peaked at 200 searches per second.
Radically re-architect their fare engine to meet the required 100x growth in search traffic.
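The two figures on the slide already imply the scale of the gap. Assuming the 1.6 billion daily searches were spread evenly across 24 hours (real traffic peaks higher), the average rate alone is far beyond the old peak.

```javascript
// Figures from the slide.
const searchesPerDay = 1.6e9;
const oraclePeakPerSecond = 200;

const SECONDS_PER_DAY = 86400;

// Average rate needed to serve the daily volume, assuming traffic spread evenly over 24 h.
const averagePerSecond = searchesPerDay / SECONDS_PER_DAY;

// How far beyond the old peak that average already is.
const growthFactor = averagePerSecond / oraclePeakPerSecond;

console.log(Math.round(averagePerSecond), growthFactor.toFixed(1));
```

That average is about 18,500 searches per second, roughly 93 times the Oracle peak, which is consistent with the 100x growth target once peaks above the average are taken into account.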
ETL
(Yesterday’s) Data at the Speed of Thought?
BI Connector
db.orders.aggregate( [ { $group: { _id: null, total: { $sum: "$price" } } }] )
SELECT SUM(price) AS total FROM orders
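The two queries above compute the same number. The sketch below evaluates both semantics in plain JavaScript over a few made-up orders: $group with _id: null collapses every document into one group and $sum adds the prices (skipping non-numeric values), while SQL SUM() over the whole table adds every non-NULL value. The helper names are hypothetical; the BI Connector itself handles the translation between the two forms.

```javascript
// Hypothetical sample orders.
const orders = [
  { _id: 1, item: "paint", price: 12.5 },
  { _id: 2, item: "t-shirt", price: 20 },
  { _id: 3, item: "bike", price: 450 },
];

// In-process equivalent of { $group: { _id: null, total: { $sum: "$price" } } }:
// _id: null puts every document in one group; $sum ignores non-numeric values.
function groupSumAll(docs, field) {
  const total = docs.reduce((acc, d) => acc + (typeof d[field] === "number" ? d[field] : 0), 0);
  return [{ _id: null, total }];
}

// In-process equivalent of SELECT SUM(price) AS total FROM orders:
// SUM skips NULL (here: null or missing) values.
function sqlSum(rows, column) {
  const total = rows.reduce((acc, r) => (r[column] === null || r[column] === undefined ? acc : acc + r[column]), 0);
  return { total };
}

console.log(groupSumAll(orders, "price")[0].total, sqlSum(orders, "price").total); // 482.5 482.5
```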
Resources for You
Spark Connector
• Download: Spark Packages, GitHub
• Documentation
• Whitepaper: Turning Analytics into Real-Time Action
• Education: M233: Getting Started with Spark and MongoDB
In-Memory Storage Engine
• Download: Enterprise Server
• Documentation
BI Connector
• Download: BI Connector
• Documentation