Mail Search As A Service: Presented by Rishi Easwaran, Aol

OCTOBER 13-16, 2016 AUSTIN, TX

Upload: lucidworks

Post on 16-Apr-2017

Mail Search As A Service

Rishi Easwaran

Principal Software Engineer, Aol

Agenda

•  Overview

•  Multicore Architecture

•  Multicore Pain Points

•  Hybrid Cloud Architecture

•  Hybrid Cloud Benefits

•  Search As A Service

•  Future Work

•  Q & A

Overview

[Diagram: mail system components]

•  Metadata Storage

•  Index Storage

•  Mail Core

•  Bulk Storage

•  Protocol Handlers (LMTP, POP3, IMAP, SOAP, WCAP, AOL API)

•  Sharding Layer

•  Directory Service

Solr Production Metrics

Index size range: 1 KB to 100 GB

Number of hosts: > 200

Complex size: > 400 TB

Number of Solr indexes: > 50 million

Avg update requests/day: > 1.2 billion

Avg search requests/day: > 70 million

Current availability: 99.99%
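As a rough sanity check on the figures above, the daily averages can be converted to per-second rates and the availability figure to a yearly downtime budget. This is a back-of-the-envelope sketch only: it assumes load is spread evenly across the day and a 365-day year, neither of which the slides state, and because the slides give the request counts as lower bounds (">"), the rates are lower bounds too.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def per_second(requests_per_day: float) -> float:
    """Average requests per second, assuming an even spread over the day."""
    return requests_per_day / SECONDS_PER_DAY


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Yearly downtime budget implied by an availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


updates_per_s = per_second(1.2e9)            # > 1.2 billion updates/day
searches_per_s = per_second(70e6)            # > 70 million searches/day
downtime = downtime_minutes_per_year(99.99)  # 99.99% availability

print(f"updates/s  > {updates_per_s:,.0f}")
print(f"searches/s > {searches_per_s:,.0f}")
print(f"downtime budget ~ {downtime:.1f} min/year")
```

Under these assumptions the update rate works out to roughly 13,900 requests per second, the search rate to roughly 810 per second, and the downtime budget to a little under an hour per year.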

Multicore Architecture

http://wiki.apache.org/solr/LotsOfCores

Multicore Architecture Pain Points

•  Non-availability of search.

    ➢  No backups available for user Solr indexes.

    ➢  High re-indexing time.

    ➢  Disk commits every 5 minutes.

•  High variance in response times.

•  Large hardware footprint (> 1000 hosts).

•  Load balancing and frequent hot spots required manual intervention.

Benefits of Upgrading Multicore to Solr 4.2

•  75% reduction in search response times.

•  50% reduction in disk busy.

•  15% reduction in CPU usage.

•  50% reduction in total GC stop time.

    ➢  Application throughput into the 99.9% range.

Solr Hybrid Cloud Architecture

Cloud Archiving Tool (a.k.a. CAT)

•  Split-merge logic caused CPU spikes and slowness on live instances.

•  We can passively split and merge.

•  We can run split-merge on a cloud shard once a day.

•  The split-merge process can be restricted to off-peak/maintenance hours.

•  Minimal impact on live production users and systems.
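The scheduling constraints listed above (at most one split-merge run per shard per day, and only during off-peak hours) can be sketched as a simple gate. The slides do not describe how CAT actually decides when to run, so everything here is hypothetical: the 01:00 to 05:00 window and the function names are illustrative assumptions.

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical off-peak window; the slides do not give the real hours.
OFF_PEAK_START = time(1, 0)
OFF_PEAK_END = time(5, 0)


def in_off_peak(now: datetime) -> bool:
    """True if `now` falls inside the assumed maintenance window."""
    return OFF_PEAK_START <= now.time() < OFF_PEAK_END


def may_run_split_merge(now: datetime, last_run: Optional[datetime]) -> bool:
    """Allow a run only off-peak and at most once per calendar day."""
    if not in_off_peak(now):
        return False
    return last_run is None or last_run.date() < now.date()


# A shard last processed yesterday is eligible at 02:00 today.
print(may_run_split_merge(datetime(2016, 10, 14, 2, 0),
                          datetime(2016, 10, 13, 2, 0)))
```

Keeping the gate separate from the split-merge work itself would let the window be tuned per data centre without touching the archiving logic.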

Hybrid Cloud Architecture Benefits

•  Cost savings of ~30%.

•  SSD drives handle the newest data (updates 10 ms, searches 50 ms).

•  NRT availability of indexed messages (1 s commit).

•  Solr is not a single point of failure in our system.

Operation    Multicore    Hybrid Cloud
Inserts      20 ms        7 ms
Deletes      20 ms        7 ms
Searches     60 ms        55 ms
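To put the latency comparison above in relative terms, the short calculation below derives the percentage reduction for each operation. It uses only the numbers from the comparison; the function name is illustrative.

```python
# (multicore_ms, hybrid_cloud_ms) taken from the latency comparison above.
LATENCY_MS = {
    "Inserts": (20, 7),
    "Deletes": (20, 7),
    "Searches": (60, 55),
}


def reduction_percent(before_ms: float, after_ms: float) -> float:
    """Relative latency reduction, as a percentage of the original value."""
    return (before_ms - after_ms) / before_ms * 100


for operation, (multicore, hybrid) in LATENCY_MS.items():
    print(f"{operation}: {reduction_percent(multicore, hybrid):.1f}% lower")
```

Inserts and deletes come out 65% faster, while searches improve by only about 8%, so most of the latency gain is on the write path.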

Hybrid Cloud Architecture Pain Points

•  Disk space issues.

    ➢  Deleted document clean-up and recovery of SSD space.
        http://lucene.472066.n3.nabble.com/Solr-Cloud-reclaiming-disk-space-from-deleted-documents-td4200506.html

    ➢  Multiple index.timestamp directories filling up SSD space.
        http://lucene.472066.n3.nabble.com/Multiple-index-timestamp-directories-using-up-disk-space-td4201098.html

    ➢  60% free space required for optimal operation.

•  Solr Overseer node and Overseer overflow issue.

•  CloudSolrJ 10 s hardcoded ZK timeout at initialization.

•  Search infrastructure tightly coupled with the mail system.

    ➢  External clients should not care about the underlying indexing service.
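The 60% free-space requirement listed above is the kind of threshold a monitoring check can enforce. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the function names are assumptions, and the slides do not say how the threshold was actually monitored.

```python
import shutil

REQUIRED_FREE_FRACTION = 0.60  # "60% free space" from the slide


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Fraction of the volume that is currently free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def has_enough_headroom(path: str,
                        required: float = REQUIRED_FREE_FRACTION) -> bool:
    """True if the volume holding `path` meets the free-space threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return free_fraction(usage.total, usage.free) >= required


if __name__ == "__main__":
    # "." stands in for the SSD mount that holds the index directory.
    print("enough headroom:", has_enough_headroom("."))
```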

Search As A Service

Service REST APIs

<host>:<port>/addDocument?id=<id>&applicationId=<id>&responseFormat=xml/json&document=<doc>

xml

    <response>
        <statusCode/>
        <statusText/>
        <statusDetailText/>
    </response>

json

    {"response":{"statusCode":"200","statusText":"OK","statusDetailText":"Details"}}
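A client for the addDocument call above might build the request URL and read the JSON status roughly as follows. This is a sketch under stated assumptions: the `http://` scheme, the example host, port, and ids, and both helper names are hypothetical. Only the endpoint name, the four query parameters, and the response fields come from the slide.

```python
import json
from urllib.parse import urlencode


def build_add_document_url(host: str, port: int, doc_id: str,
                           application_id: str, document: str,
                           response_format: str = "json") -> str:
    """Build an addDocument URL with properly escaped query parameters."""
    query = urlencode({
        "id": doc_id,
        "applicationId": application_id,
        "responseFormat": response_format,
        "document": document,
    })
    # Assumption: plain HTTP; the slide shows only <host>:<port>.
    return f"http://{host}:{port}/addDocument?{query}"


def parse_json_status(body: str):
    """Extract (statusCode, statusText, statusDetailText) from a response."""
    response = json.loads(body)["response"]
    return (int(response["statusCode"]),
            response["statusText"],
            response["statusDetailText"])


url = build_add_document_url("search.example.com", 8080,
                             "msg-1", "mail", "hello world")
status = parse_json_status(
    '{"response":{"statusCode":"200","statusText":"OK",'
    '"statusDetailText":"Details"}}'
)
print(url)
print(status)
```

Escaping the document through `urlencode` matters here because message content passed in a query string can contain characters such as `&` or spaces that would otherwise corrupt the request.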

Issues

•  Dispersion of problem

•  Slow node impact felt at a broader scale

Solution

•  Tracing a user request across multiple systems (specific exception logging).

•  Incorporate Hystrix in HttpSolrServer for latency and fault tolerance.
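Hystrix is a Java library, and the slides do not show how it was wired into HttpSolrServer. The sketch below is therefore not Hystrix's API; it is a simplified, language-neutral illustration of the circuit-breaker pattern Hystrix is built around, which is one way to stop a slow or failing node from affecting every caller. All class and parameter names are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the backend."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=5.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock          # injectable so tests can fake time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                # Fail fast instead of waiting on an unhealthy node.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open)
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In use, each call to a search node would go through `breaker.call(...)`, so that after repeated failures callers get an immediate error they can handle (for example by returning a degraded result) rather than queuing behind a slow node.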

Future Work

•  Solr Cloud Cross Data Centre Deployment.

•  Upgrade to Solr 5.3.1 and remove overlapping customization.

•  Fault Tolerance Tuning for different sub-systems.

•  Attachment analysis and indexing with Tika.

Q & A

•  [email protected]

•  www.linkedin.com/in/rishieaswaran/