meetup#2: building responsive symbology & suggest webservice

Building responsive Symbology & Suggest web service with MongoDB Andrei Palchys, @apalchys Alex Kosau, @alexkosau

Upload: minsk-mongodb-user-group

Post on 14-Jul-2015


TRANSCRIPT

Building responsive Symbology & Suggest web service with MongoDB

Andrei Palchys, @apalchys

Alex Kosau, @alexkosau

Introduction

• Customer: Thomson Reuters

• Business domain: Financial markets

• Goal: Implement Next-Gen financial web services

• The project started: July 2011

• The project finished: (Dec 2011)

• Team: 1 team lead, 5+1 developers, 2 QA

Web services

• Symbology Web Service: provides reference data about financial instruments, looked up via symbols, codes or instrument names

• Suggest Web Service

Architecture

[Diagram: two pipelines. Old: Sources, ETL, Search Engine, Web services, Front End, Desktop. New: Sources, ETL, The New Web Services, Desktop.]

Reasons to write the new web services

• Bad performance

• Expensive for scaling or extending

• Not easy to manage some types of data

Requirements for the new web services

• Performance: 95% of Symbology requests should complete within 50 ms, and 95% of Suggest requests within 25 ms.

• Use normalized data

• Use as little memory as possible

• Fast data loading into DB

• Windows environment and .NET platform
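The latency requirement can be checked against measured response times with a percentile calculation. The sketch below uses the nearest-rank method; the sample values, function names and the choice of method are assumptions for illustration, not measurements or tooling from the project.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 gives the 1-based rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def meets_target(latencies_ms, budget_ms, pct=95):
    """True if pct% of requests completed within budget_ms."""
    return percentile(latencies_ms, pct) <= budget_ms


# Hypothetical latency samples in milliseconds (not project data).
symbology_ms = [12, 18, 22, 25, 30, 31, 35, 40, 44, 48,
                49, 20, 21, 23, 27, 29, 33, 38, 41, 120]

print(percentile(symbology_ms, 95))              # 49
print(meets_target(symbology_ms, budget_ms=50))  # True
```

A single slow outlier (120 ms here) does not break the target, which is why the requirement is stated at the 95th percentile rather than as a maximum.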

What we considered from commercial databases

• Microsoft SQL Server

  – 13 ms, too slow

• Oracle TimesTen

  – Relational

  – Completely in-memory: guaranteed latency but slow startup

  – Expensive

• McObject’s eXtremeDB

  – Object DB

  – Native C interface: designed for performance

  – Ultra reliability

  – Still expensive

What we considered from free databases

• Redis

• HBase

• Cassandra

• RavenDB

Each of these databases misses one of the requirements

MongoDB

• Document-oriented

• Simple to use (a decent .NET interface is available)

• Simple to maintain (monitoring, replication, sharding)

• Data is kept in memory once it has been accessed

• 1 ms average response time

• Cross-platform (native Windows support)

Databases

• Symbology DB – about 30GB of data

• Suggest DB – >22 GB of data

[Diagram: Symbology WS, Symbology DB, Suggest WS, Suggest DB]

Deployment (planned)

• 6 “clusters” around the world (in TR data centers), in a replica set

• Each “cluster”: 3 servers (replica set + sharding) + 1 arbiter

• 2 of them are also used to load data

• 128 GB of memory per server

Symbology DB: challenge

• Fast search by full key

• Minimize the space taken by the data, since it needs to fit into RAM

• Data is text only (no pictures etc.)

• The full document is always required

• Only some fields are used to query data, and these fields are short (3 to 10 characters)

• New fields should be easy to add to the “queryable” list

• Composite queries are sometimes needed

  – AB and CD and not EF or GH

• Fast data loading

Symbology DB: solution

Map the names of the document fields to ints

RIC -> 1

Name -> 2

{ "1": "GOOG.O", "2": "Google" }
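Mapping field names to ints, as in the RIC -> 1 example, can be sketched as a pair of lookup tables applied on write and on read. Only the two mappings shown on the slide are taken from the source; the table and function names are hypothetical.

```python
# Only RIC -> 1 and Name -> 2 come from the slide; a real table would be larger.
FIELD_IDS = {"RIC": 1, "Name": 2}
FIELD_NAMES = {field_id: name for name, field_id in FIELD_IDS.items()}


def shorten(doc):
    """Replace readable field names with their numeric ids before storing."""
    return {str(FIELD_IDS[name]): value for name, value in doc.items()}


def expand(doc):
    """Restore readable field names after reading a stored document."""
    return {FIELD_NAMES[int(key)]: value for key, value in doc.items()}


stored = shorten({"RIC": "GOOG.O", "Name": "Google"})
print(stored)          # {'1': 'GOOG.O', '2': 'Google'}
print(expand(stored))  # {'RIC': 'GOOG.O', 'Name': 'Google'}
```

Because every document repeats its field names, shortening them saves space in proportion to the number of documents.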

Symbology DB: solution

Unite all queryable fields into arrays

• Query syntax is the same

• Single index, so less space is occupied

• Easy to add new searchable data

"s": [
  { "k": 1, "v": "MSFT.O" },
  { "k": 2, "v": "Microsoft Inc." }
]
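A minimal sketch of the key/value array layout follows. It builds the "s" array and a filter in MongoDB query syntax using `$elemMatch`, which requires the key and the value to match within the same array element. The field ids, function names and the small in-memory matcher are assumptions for illustration; against a real server, one compound index on `s.k` and `s.v` would presumably serve all queryable fields.

```python
# Hypothetical ids for queryable fields (RIC -> 1 and Name -> 2 as on the slides).
QUERYABLE = {"RIC": 1, "Name": 2}


def to_search_array(fields):
    """Collect all queryable fields into one array of {k, v} pairs."""
    return [{"k": QUERYABLE[name], "v": value}
            for name, value in fields.items() if name in QUERYABLE]


def elem_match_filter(field, value):
    """Build a filter document in MongoDB query syntax."""
    return {"s": {"$elemMatch": {"k": QUERYABLE[field], "v": value}}}


def matches(doc, flt):
    """In-memory stand-in for $elemMatch with equality conditions only."""
    wanted = flt["s"]["$elemMatch"]
    return any(all(elem.get(k) == v for k, v in wanted.items())
               for elem in doc["s"])


doc = {"s": to_search_array({"RIC": "MSFT.O", "Name": "Microsoft Inc."})}
print(matches(doc, elem_match_filter("RIC", "MSFT.O")))   # True
print(matches(doc, elem_match_filter("Name", "MSFT.O")))  # False
```

Adding a new searchable field only means adding another entry to the array, so no new index is needed.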

Symbology DB: solution

Combine key and value properties

• Takes less space

• Use regex /^a../

• No performance decrease: MongoDB can use the index for a regex anchored at the start with ^

"s": [
  "MSFT.O|1",
  "Microsoft Inc.|2"
]

Query: { s: { $regex: "^MSFT\\.O\\|" } }
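The combined "value|key" strings and the anchored regex can be sketched as below. The helper names are hypothetical. The value is escaped so that the dot in MSFT.O and the pipe separator are matched literally, and Python's `re` module stands in for the server-side match.

```python
import re


def combine(value, key_id):
    """Store value and field id as one string, e.g. 'MSFT.O|1'."""
    return f"{value}|{key_id}"


def prefix_regex(value):
    """Anchored pattern matching the full value under any field id."""
    return "^" + re.escape(value) + r"\|"


def exact_regex(value, key_id):
    """Anchored pattern matching the full value under one specific field id."""
    return "^" + re.escape(combine(value, key_id)) + "$"


s = [combine("MSFT.O", 1), combine("Microsoft Inc.", 2)]
pattern = prefix_regex("MSFT.O")
query = {"s": {"$regex": pattern}}  # filter document in MongoDB query syntax

print(s)        # ['MSFT.O|1', 'Microsoft Inc.|2']
print(pattern)  # ^MSFT\.O\|
print(any(re.search(pattern, item) for item in s))  # True
```

Keeping the pattern anchored with `^` and free of leading wildcards is what lets it be answered as a range scan over the index on "s".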

Symbology DB: solution

Compress non-queryable data and store it as a single field (binary data)

• Encode with Protocol Buffers or MsgPack

  – In our case, MsgPack was 2x faster than Protobuf

• Compress with Snappy: very fast compression and decompression

{
  "b" : BinData(0,"CgcxMDkwMzcwEgZ1cztJQk0xAAAAAAAA8D86A05ZU0IXTmV3IFlvcmsgRXhjaGFuZ2VZAAAAAAAA8D9gAXABeAGJAQAAAAAAAPA/ogEFNDc0MU6qAQU0NzQxTrI…")
}
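The encode-then-compress step for non-queryable data can be sketched with the Python standard library. Note the substitution: `json` stands in for MsgPack and `zlib` stands in for Snappy, purely so the example runs without third-party packages. The payload fields and function names are hypothetical.

```python
import json
import zlib


def pack(fields):
    """Serialize non-queryable fields and compress them into one binary blob.

    json and zlib are stand-ins here for MsgPack and Snappy.
    """
    raw = json.dumps(fields, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return zlib.compress(raw)


def unpack(blob):
    """Reverse of pack(): decompress, then deserialize."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical non-queryable payload for one instrument.
payload = {"exchange": "New York Stock Exchange", "currency": "USD", "lot": 1.0}

# Queryable data stays in "s"; everything else lives in the binary field "b".
doc = {"s": ["IBM|1"], "b": pack(payload)}

print(type(doc["b"]).__name__)      # bytes
print(unpack(doc["b"]) == payload)  # True
```

The trade-off is that the blob is opaque to the database, which is acceptable here because the full document is always returned and only the "s" entries are ever queried.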

Symbology DB: solution

Change the ETL output format to JSON and insert it directly into MongoDB

This cut loading time from 9 h to 1 h.
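Loading JSON output directly can be sketched as reading documents and inserting them in batches. The sketch assumes the ETL writes one JSON document per line, which the slides do not state; the function names and batch size are also hypothetical.

```python
import json
from itertools import islice


def read_json_lines(lines):
    """Parse one JSON document per non-empty line (assumed ETL output format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batches(items, size):
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def load(lines, insert_many, batch_size=1000):
    """Send documents to `insert_many` batch by batch; return the total count."""
    total = 0
    for chunk in batches(read_json_lines(lines), batch_size):
        insert_many(chunk)
        total += len(chunk)
    return total


# A list stands in for the database so the sketch runs without a server.
received = []
etl_output = ['{"1": "GOOG.O"}', "", '{"1": "MSFT.O"}', '{"1": "IBM.N"}']
print(load(etl_output, received.append, batch_size=2))  # 3
```

With PyMongo, a collection's `insert_many` method could be passed as the `insert_many` argument, so that each batch becomes one round trip to the server.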

Suggest DB: challenge

• Fast search by partial text

• Keep only the top 50 entities per term

• Generate Suggest DB from the existing Symbology DB

Suggest DB: solution

Use an “inverted” index for fast search by partial text

{ "term": "g", "references": […] },
{ "term": "go", "references": […] },
{ "term": "goo", "references": […] },
{ "term": "goog", "references": […] }
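The inverted index of prefixes can be sketched as follows: every prefix of a searchable text becomes a term, and each term lists the entities it leads to. Lowercasing, the function names and the sample entities are assumptions for illustration.

```python
from collections import defaultdict


def prefixes(text):
    """All prefixes of the text, shortest first: 'goog' -> g, go, goo, goog."""
    text = text.lower()
    return [text[:n] for n in range(1, len(text) + 1)]


def build_index(names):
    """Turn {reference: searchable text} into term documents."""
    index = defaultdict(list)
    for ref, text in names.items():
        for term in prefixes(text):
            index[term].append(ref)
    return [{"term": term, "references": refs}
            for term, refs in sorted(index.items())]


# Hypothetical entities keyed by reference.
docs = build_index({"GOOG.O": "goog", "GS.N": "gs"})
for doc in docs:
    print(doc)
```

A lookup for what the user has typed so far is then a single exact-match query on "term", which is much cheaper than scanning for partial text at request time.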

Suggest DB: solution

Generate Suggest DB from the existing Symbology DB

• About 750 million temporary documents

• MongoDB Map Reduce is too slow

• All MongoDB-based algorithms take a lot of time

Use Amazon Elastic MapReduce!

10 h -> 40 min

Practical usage Amazon Elastic MapReduce (Viktar Basharymau)

http://bit.ly/usage_mapreduce
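The shape of such a map/reduce job can be sketched in plain Python: the mapper emits one record per prefix, the records are grouped by term, and the reducer keeps only the best entries. The slides do not say how entities are ranked, so the `score` field is a hypothetical ranking signal, as are the function names and sample data. The actual job ran on Amazon Elastic MapReduce, not in a single process like this.

```python
import heapq
from collections import defaultdict

TOP_N = 50  # "top 50 entities per term" from the slides


def mapper(entity):
    """Emit (term, (score, reference)) for every prefix of the entity name."""
    name = entity["name"].lower()
    for n in range(1, len(name) + 1):
        yield name[:n], (entity["score"], entity["id"])


def reducer(term, scored_refs, top_n=TOP_N):
    """Keep the top_n highest-scoring references for one term."""
    best = heapq.nlargest(top_n, scored_refs)
    return {"term": term, "references": [ref for _, ref in best]}


def run(entities, top_n=TOP_N):
    """Single-process stand-in for the map, shuffle and reduce phases."""
    groups = defaultdict(list)
    for entity in entities:
        for term, scored_ref in mapper(entity):
            groups[term].append(scored_ref)
    return [reducer(term, refs, top_n) for term, refs in sorted(groups.items())]


# Hypothetical entities with a made-up ranking score.
entities = [
    {"id": "GOOG.O", "name": "Google", "score": 9},
    {"id": "GS.N", "name": "Goldman", "score": 7},
    {"id": "GE.N", "name": "GE", "score": 5},
]
suggest = {doc["term"]: doc["references"] for doc in run(entities, top_n=2)}
print(suggest["g"])  # ['GOOG.O', 'GS.N']
```

The mapper output is where the large number of temporary documents comes from: each entity produces one record per prefix, and only the reducer brings the volume back down.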

.NET MongoDB driver

- Use the IBsonSerializer interface instead of BsonElement attributes

- The driver performs well: we have not found any bottlenecks.

Questions?