曾勇 elastic search-intro

Upload: shaoning-pan

Post on 26-Jan-2015

Category:

Technology


DESCRIPTION

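#LAMP人# Session 12, "Challenges and Optimization of Next-Generation Internet Behavioral Targeting Advertising Technology" (品友互动 special session): anyshare talk by 曾勇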

TRANSCRIPT

Page 1: 曾勇 Elastic search-intro

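LAMP人 Topic Sharing Meetup

Session 12: "Challenges and Optimization of Next-Generation Internet Behavioral Targeting Advertising Technology"

- 品友互动 special session

www.LAMPER.cn

QQ group: 83304912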

http://weibo.com/lampercn

Page 2: 曾勇 Elastic search-intro

ElasticSearch

A search engine “ready to fly”

Medcl/2012/2/18

Page 3: 曾勇 Elastic search-intro

About me

• Medcl

• medcl@sina

• medcl@github

• [email protected]

• log.medcl.net

Page 4: 曾勇 Elastic search-intro

Why I am here?

• Good things should be shared with everyone!

Page 5: 曾勇 Elastic search-intro

What’s elasticsearch

• “Distributed, (Near) Real Time, Search Engine”

• Open source (Apache 2.0)

• RESTful

• Schema-free (dynamic)

• Multi-tenant

• Scalable

• Highly available

• Rich search features

• Good extensibility

• … …

Page 6: 曾勇 Elastic search-intro

first impression

Page 7: 曾勇 Elastic search-intro
Page 8: 曾勇 Elastic search-intro
Page 9: 曾勇 Elastic search-intro

Let’s start the trip

Page 10: 曾勇 Elastic search-intro

Debug Tools

Page 11: 曾勇 Elastic search-intro

Index a document

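curl -XPOST http://localhost:9200/myindex/share/1 -d '{
  "url": "http://www.lamper.cn/",
  "date": "2012-02-18 13:00:00",
  "location": "beijing, 北京"
}'

http://localhost:9200/myindex/share/1: the RESTful URL

The body passed with -d: the content of the document to index, in JSON format

Each field is a pair of field name and field value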
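The same index request can be prepared from Python's standard library. This is only a sketch: the helper name `build_index_request` is hypothetical, the request is constructed but never sent (so no running node is needed), and the `Content-Type` header is an assumption that is not on the slide.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Prepare (but do not send) the HTTP request that indexes one document."""
    # URL layout from the slide: /<index>/<type>/<id>
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    # The document travels as a UTF-8 encoded JSON body.
    body = json.dumps(document, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumption: not shown on the slide; declares the body as JSON.
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "http://localhost:9200",
    "myindex",
    "share",
    "1",
    {
        "url": "http://www.lamper.cn/",
        "date": "2012-02-18 13:00:00",
        "location": "beijing, 北京",
    },
)
print(request.get_method(), request.full_url)
# To actually send it, a node must be listening on localhost:9200:
# urllib.request.urlopen(request)
```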

Page 12: 曾勇 Elastic search-intro

Index Response

{ "ok": true, "_index": "myindex", "_type": "share", "_id": "1", "_version": 1}

Page 13: 曾勇 Elastic search-intro

Explain the url

http://localhost:9200/myindex/share/1

localhost: server IP address (or host name)

9200: HTTP port

myindex: index name

share: type name

1: unique ID of the indexed document
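The anatomy of that URL can be checked mechanically. The sketch below splits a document URL into the five parts labelled on the slide; the function name `explain_document_url` is hypothetical, and it assumes the path has exactly the three segments index/type/id.

```python
from urllib.parse import urlsplit


def explain_document_url(url):
    """Split a document URL into host, port, index, type, and document ID."""
    parts = urlsplit(url)
    # Assumes the path is exactly /<index>/<type>/<id>.
    index, doc_type, doc_id = parts.path.strip("/").split("/")
    return {
        "host": parts.hostname,
        "port": parts.port,
        "index": index,
        "type": doc_type,
        "id": doc_id,
    }


info = explain_document_url("http://localhost:9200/myindex/share/1")
print(info)
```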

Page 14: 曾勇 Elastic search-intro

Query the document

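curl -XGET 'http://localhost:9200/myindex/share/_search?q=location:beijing'

http://localhost:9200: ES server address

myindex: index name

share: type name

_search: the RESTful search endpoint

q=: the parameter that carries the query condition

location:beijing: the query condition, written as field name:value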

Page 15: 曾勇 Elastic search-intro

Search Response

{
  "took": 12,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.5,
    "hits": [
      {
        "_index": "myindex",
        "_type": "share",
        "_id": "1",
        "_score": 0.5,
        "_source": {
          "url": "http://www.lamper.cn/",
          "date": "2012-02-18 13:00:00",
          "location": "beijing, 北京"
        }
      }
    ]
  }
}
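A client usually cares about only a few parts of this response: the shard counts, the total, and each hit's ID, score, and source document. A minimal sketch of pulling those out with the standard `json` module, using the response from the slide as a literal string (the helper name `summarize_hits` is hypothetical):

```python
import json

# The search response from the slide, pasted as a literal string.
response_text = """
{ "took": 12, "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 1, "max_score": 0.5, "hits": [
    { "_index": "myindex", "_type": "share", "_id": "1", "_score": 0.5,
      "_source": { "url": "http://www.lamper.cn/",
                   "date": "2012-02-18 13:00:00",
                   "location": "beijing, 北京" } } ] } }
"""


def summarize_hits(response):
    """Return (id, score, url) for every hit in a search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"]["url"])
        for hit in response["hits"]["hits"]
    ]


response = json.loads(response_text)
summary = summarize_hits(response)
# True when every shard answered the query successfully.
all_shards_ok = response["_shards"]["failed"] == 0
print(response["hits"]["total"], summary, all_shards_ok)
```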

Page 16: 曾勇 Elastic search-intro

Queries

http://localhost:9200/myindex/share/_search?q=beijing

http://localhost:9200/myindex/share,conf/_search?q=beijing

http://localhost:9200/myindex/_search?q=beijing

http://localhost:9200/myindex,myindex2/_search?q=beijing

http://localhost:9200/_search?q=beijing
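The five URLs follow one pattern: optional comma-separated index names, optional comma-separated type names, then `_search`. A small sketch that generates them; the function `build_search_url` is hypothetical and only covers the combinations shown above.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, indices=(), types=()):
    """Build a URI search URL across zero or more indices and types."""
    if types and not indices:
        # The slide only shows types nested under an index.
        raise ValueError("types require at least one index")
    segments = []
    if indices:
        segments.append(",".join(indices))
    if types:
        segments.append(",".join(types))
    segments.append("_search")
    path = "/".join(segments)
    # Keep ':' readable so field:value queries look like the slide's examples.
    query_string = urlencode({"q": query}, safe=":")
    return f"{base_url}/{path}?{query_string}"


base = "http://localhost:9200"
urls = [
    build_search_url(base, "beijing", ["myindex"], ["share"]),
    build_search_url(base, "beijing", ["myindex"], ["share", "conf"]),
    build_search_url(base, "beijing", ["myindex"]),
    build_search_url(base, "beijing", ["myindex", "myindex2"]),
    build_search_url(base, "beijing"),
]
for url in urls:
    print(url)
```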

Page 17: 曾勇 Elastic search-intro

QueryDSL

curl -XPOST http://localhost:9200/myindex/_search -d '{
  "query": {
    "term": { "location": "beijing" }
  }
}'

Why QueryDSL?

Filters, Caching, Highlighting, Facets, Complex Queries, … …
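Because a QueryDSL request is plain JSON, it can be assembled as an ordinary data structure and extended piece by piece, which a `q=` string cannot do. The sketch below builds the term query from the slide and optionally adds a `highlight` section; the helper name `term_search_body` is hypothetical, and highlighting the queried field is only an illustrative choice.

```python
import json


def term_search_body(field, value, highlight=False):
    """Build a QueryDSL body containing a single term query."""
    body = {"query": {"term": {field: value}}}
    if highlight:
        # Illustrative assumption: highlight the same field that is queried.
        body["highlight"] = {"fields": {field: {}}}
    return body


plain_body = term_search_body("location", "beijing")
body = term_search_body("location", "beijing", highlight=True)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be passed to curl with -d.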

Page 18: 曾勇 Elastic search-intro

Scalability&HA

Page 19: 曾勇 Elastic search-intro

Distributed Lucene Directory

• Each index is fully sharded with a configurable number of shards.

• Each shard can have zero or more replicas.

• Read / search operations can be performed on any replica of a shard.
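A toy model can make the shard/replica layout concrete. This is a simplified sketch, not how elasticsearch is implemented: the class `ShardedIndex` is hypothetical, CRC32 is a stand-in hash chosen only because it is deterministic, and reads simply rotate over the copies of a shard.

```python
import itertools
import zlib


class ShardedIndex:
    """Toy index: documents are hashed to a shard, each shard has copies."""

    def __init__(self, number_of_shards, number_of_replicas):
        self.number_of_shards = number_of_shards
        # copies[s] holds the primary (position 0) plus its replicas.
        self.copies = [
            [{} for _ in range(1 + number_of_replicas)]
            for _ in range(number_of_shards)
        ]
        self._read_counter = itertools.count()

    def shard_for(self, doc_id):
        # Stand-in hash (assumption): any stable hash of the ID would do.
        return zlib.crc32(doc_id.encode("utf-8")) % self.number_of_shards

    def index(self, doc_id, document):
        # A write goes to every copy of the owning shard.
        for copy in self.copies[self.shard_for(doc_id)]:
            copy[doc_id] = document

    def get(self, doc_id):
        # A read may be served by any copy; rotate to spread the load.
        copies = self.copies[self.shard_for(doc_id)]
        copy = copies[next(self._read_counter) % len(copies)]
        return copy.get(doc_id)


idx = ShardedIndex(number_of_shards=5, number_of_replicas=1)
idx.index("1", {"location": "beijing"})
first_read = idx.get("1")   # served by one copy
second_read = idx.get("1")  # served by the other copy
print(idx.shard_for("1"), first_read, second_read)
```

Adding replicas in this model adds read capacity without changing which shard owns a document.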

Page 20: 曾勇 Elastic search-intro

Automatic shard allocation

From: http://www.slideshare.net/elasticsearch/elasticsearch-at-berlinbuzzwords-2010#

Page 21: 曾勇 Elastic search-intro

Scalability

• There are nodes that hold data and nodes that do not.

• There is no need for a load balancer in elasticsearch: each node can receive a request and, if it can’t handle it, automatically delegates it to the appropriate node(s).

• If you want to scale out search, you can simply add more replicas per shard.

Page 22: 曾勇 Elastic search-intro

Transaction log

• An indexed / deleted doc is fully persistent

• No need for a Lucene IndexWriter#commit

• Managed using a transaction log / WAL

• Full single-node durability (kill -9)

• Utilized when doing hot relocation of shards

• Periodically “flushed” (calling IndexWriter#commit)
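The write-ahead-log idea behind these bullets can be shown in a few lines. The sketch below is a toy, not elasticsearch's translog: the class `TinyShard` and both file names are hypothetical, a JSON file stands in for the committed Lucene index, and recovery simply replays the log on top of the last flush.

```python
import json
import os
import tempfile


class TinyShard:
    """Toy shard: every operation is logged and fsynced before it is applied."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "translog.jsonl")
        self.commit_path = os.path.join(directory, "committed.json")
        self.docs = {}
        self._recover()

    def _recover(self):
        # Load the last flushed state, then replay operations logged since.
        if os.path.exists(self.commit_path):
            with open(self.commit_path, encoding="utf-8") as f:
                self.docs = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, op):
        if op["op"] == "index":
            self.docs[op["id"]] = op["doc"]
        else:
            self.docs.pop(op["id"], None)

    def _log(self, op):
        # Durable once fsync returns, even if the process is killed afterwards.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(op)

    def index(self, doc_id, doc):
        self._log({"op": "index", "id": doc_id, "doc": doc})

    def delete(self, doc_id):
        self._log({"op": "delete", "id": doc_id})

    def flush(self):
        # Stand-in for the periodic commit: persist state, then empty the log.
        tmp_path = self.commit_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.docs, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.commit_path)
        open(self.log_path, "w", encoding="utf-8").close()


with tempfile.TemporaryDirectory() as directory:
    shard = TinyShard(directory)
    shard.index("1", {"location": "beijing"})
    shard.index("2", {"location": "shanghai"})
    shard.flush()
    shard.delete("2")  # logged but not yet flushed
    # Opening the same directory again simulates a restart after kill -9.
    recovered_docs = TinyShard(directory).docs
    print(recovered_docs)
```

The delete issued after the flush survives the simulated restart because it is replayed from the log, which is why no explicit commit is needed per operation.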

Page 23: 曾勇 Elastic search-intro

BASE

• Each document you index is there once the index operation is done.

• No need to commit or something similar to get everything persisted.

• A shard can have 1 or more replicas for HA.

• Gateway persistence is done in the background in an async manner.

Page 24: 曾勇 Elastic search-intro

Not Mentioned Here…

• Versioning

• Template

• River

• Percolator

• Partial update

• Routing

• Parent-child type

• Scripting

• … …

That’s too much; discover it yourself

Page 25: 曾勇 Elastic search-intro

Community&Support

• http://github.com/elasticsearch

• http://groups.google.com/group/elasticsearch

• IRC: #elasticsearch@freenode

• QQ group: 190605846

• http://doc.elasticsearch.cn

• http://s.medcl.net/

Page 26: 曾勇 Elastic search-intro

BTW

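• We are hiring in:
  - Distributed systems
  - High performance
  - Massive data processing
  - Personalized recommendation
  - Search engines

• If you are interested in any of the above:
  - You are welcome to join our gang!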

My Company

Page 27: 曾勇 Elastic search-intro

Thank you!