elasticsearch meetup 30 - 10 - 2014

ElasticSearch lessons learned Alberto Paro, October 30, 2014

Upload: alberto-paro

Post on 13-Apr-2017

TRANSCRIPT

Page 1: ElasticSearch Meetup 30 - 10 - 2014

ElasticSearch

lessons learned

Alberto Paro, October 30, 2014

Page 2: ElasticSearch Meetup 30 - 10 - 2014

Agenda
  Introduction
  ElasticSearch
  Common Pitfalls
  Questions

Page 3: ElasticSearch Meetup 30 - 10 - 2014

About me
  Alberto Paro, @aparo77
  My motto: “always learning”
  CTO at Big Data Technologies
  Freelance Consulting
    International companies (Italy, Switzerland, Austria, USA)
    Web Development on Big Data Solutions
    NLP/Spark/Lucene/SOLR/ElasticSearch implementations & training
    Reactive and Functional Programming (Scala, Akka, Spray.io, Play)

Page 4: ElasticSearch Meetup 30 - 10 - 2014

About me
  Packt Publishing book author and reviewer
    ElasticSearch Cookbook (Author, Dec 2013)
    ElasticSearch Server (Review, Apr 2014)
    ElasticSearch Cookbook – Second Edition (Author, Dec 2014)
  Using ElasticSearch since 2010 ~ version 1.10
  PyES – ElasticSearch Python driver used by CERN, IBM, …
  ElasticSearch MongoDB river
  Django ElasticSearch Engine
  For companies I developed up to 4 ORMs for ElasticSearch (.Net, Python, Scala) and several plugins

Page 5: ElasticSearch Meetup 30 - 10 - 2014

ElasticSearch
  Apache Lucene
  Started in 2010 by Shay Banon
  Open Source – Apache License
  A company was formed in 2012: ElasticSearch
    Training, support and development

@kimchy

Page 6: ElasticSearch Meetup 30 - 10 - 2014

ElasticSearch
  Scalable
    Distributed, Node Discovery
    Automatic sharding
    Query distribution
  RESTful, HTTP API
    With API wrappers for .Net, Ruby, Java, Scala, …
    JSON in, JSON out -> JSON Coast-to-Coast
  Document Model
    Maps JSON to Object
    “schemaless” -> field type recognition
    Keeps source, keeps ‘version’ number, keeps timestamp, …

Page 7: ElasticSearch Meetup 30 - 10 - 2014

ElasticSearch
  Field types and analyzers
    String, numeric, geo, …
    Custom types: attachments, IP, IBAN, …
    Arrays, subdocuments, nested documents
  Integrated Aggregations
    Your big data insights
      Terms
      Min/Max/Avg/Sum
      Top hit
      Geo Distance
      And more
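To illustrate the aggregations listed on this slide, here is a minimal sketch of a search request body that combines a terms aggregation with an avg sub-aggregation. The field names `tag` and `pages` follow the sample book document shown later in the deck; the aggregation names `by_tag` and `avg_pages` and the helper `tag_aggregation` are assumptions made up for this example. The body would be sent to an index's `_search` endpoint with any HTTP client.

```python
import json


def tag_aggregation(size=10):
    """Build a search body: top tags, with the average page count per tag."""
    return {
        "size": 0,  # return only aggregation results, no hits
        "aggs": {
            "by_tag": {  # hypothetical aggregation name
                "terms": {"field": "tag", "size": size},
                "aggs": {
                    "avg_pages": {"avg": {"field": "pages"}},
                },
            }
        },
    }


print(json.dumps(tag_aggregation(), indent=2))
```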

Page 8: ElasticSearch Meetup 30 - 10 - 2014

DBMS -> ElasticSearch

DBMS       ElasticSearch   MongoDB
Database   Index           Database
Table      Type            Collection
Field      Field           Field
Record     Document        Document

Users must rethink their models.

Page 9: ElasticSearch Meetup 30 - 10 - 2014

DBMS -> ElasticSearch
  Data modelling is the same as Entity-Relationship, plus:
    Multi values
    Embedding
    Mutable/Immutable data
    Three alternatives to foreign keys:
      Term query
      Parent/Child
      Nested

{
  "book" : {
    "isbn" : "9781782166627",
    "name" : "ElasticSearch Cookbook",
    "author" : { "first_name" : "Alberto", "last_name" : "Paro" },
    "pages" : 430,
    "tag" : ["elasticsearch", "java", "python", "Rest"]
  }
}
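As a sketch of the "Nested" alternative to foreign keys, the book document above could be mapped with `author` declared as a nested object and then queried through a nested query. The mapping below is an assumption written in ElasticSearch 1.x syntax (`string` type, `not_analyzed` index option); the helper name `author_query` is hypothetical. Nested objects pay off mostly when a book has several authors, because each author is then matched as a separate sub-document.

```python
import json

# Hypothetical mapping for the "book" type (ElasticSearch 1.x syntax assumed).
book_mapping = {
    "book": {
        "properties": {
            "isbn": {"type": "string", "index": "not_analyzed"},
            "name": {"type": "string"},
            "author": {
                "type": "nested",  # each author is indexed as its own sub-document
                "properties": {
                    "first_name": {"type": "string"},
                    "last_name": {"type": "string"},
                },
            },
            "pages": {"type": "integer"},
            "tag": {"type": "string", "index": "not_analyzed"},
        }
    }
}


def author_query(last_name):
    """Build a nested query that matches books by the author's last name."""
    return {
        "query": {
            "nested": {
                "path": "author",
                "query": {"match": {"author.last_name": last_name}},
            }
        }
    }


print(json.dumps(author_query("Paro")))
```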

Page 10: ElasticSearch Meetup 30 - 10 - 2014

Common Pitfalls
  Schema(less)?
    Automatic field type recognition
      Can miss types
      Strict about types: only some types can be upgraded
    Check the datetime:
      UNIX (epoch from …) (the standard world)
      ISO 8601 -> “yyyy-MM-ddTHH:mm:ssZ”
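One way to avoid surprises with automatic date detection is to normalise dates on the client and declare the field explicitly. The sketch below converts a timezone-aware datetime to the two representations named on the slide; the field name `created` and the mapping format name `dateOptionalTime` are assumptions (ElasticSearch 1.x naming), and epoch values are assumed to be expressed in milliseconds.

```python
from datetime import datetime, timezone


def to_iso8601(dt):
    """Format an aware datetime as yyyy-MM-ddTHH:mm:ssZ, normalised to UTC."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_epoch_millis(dt):
    """Convert an aware datetime to milliseconds since the UNIX epoch."""
    return int(dt.timestamp() * 1000)


# Hypothetical explicit mapping so the field is never guessed as a string or long.
created_mapping = {"created": {"type": "date", "format": "dateOptionalTime"}}

meetup = datetime(2014, 10, 30, 18, 30, 0, tzinfo=timezone.utc)
print(to_iso8601(meetup))
print(to_epoch_millis(meetup))
```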

Page 11: ElasticSearch Meetup 30 - 10 - 2014

Common Pitfalls
  What’s the best transport protocol?
    On the JVM, prefer the native protocol
      Faster
      Extra bonus
    HTTP is best behind a load balancer
    Thrift is best for performance
      Faster than HTTP
      Charset “safe”

Page 12: ElasticSearch Meetup 30 - 10 - 2014

Common Pitfalls
  Never, never publish your ElasticSearch server outside the DMZ
    Security problems with scripting
    Simple HTTP can destroy your server
      Or simply drain your money on Amazon Cloud
    ElasticSearch has a lot of problems with URL security
    Vulnerabilities

Page 13: ElasticSearch Meetup 30 - 10 - 2014

Common Pitfalls
  Very fast indexing
    Bulk indexing:
      Set up without replicas (replicas = 0, not 1)
      Play with bulk size (300-500-1000-5000-10000)
      Performance depends on data complexity
    Before indexing (disable refresh):

      curl -XPUT localhost:9200/test/_settings -d '{ "index" : { "refresh_interval" : "-1" } }'

    After indexing (restore refresh):

      curl -XPUT localhost:9200/test/_settings -d '{ "index" : { "refresh_interval" : "1s" } }'
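The "play with bulk size" advice can be sketched as follows: split the documents into batches and build one newline-delimited bulk body per batch, so the batch size is a single number to tune. The helper names `chunks` and `bulk_body`, the index `test`, the type `book` and the sample documents are assumptions for illustration; each body would be POSTed to the `_bulk` endpoint with any HTTP client.

```python
import json


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_body(docs, index, doc_type):
    """Build a bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Made-up sample data: 1200 small documents, sent in batches of 500.
docs = [(str(i), {"name": "book %d" % i, "pages": 100 + i}) for i in range(1200)]
for batch in chunks(docs, 500):
    body = bulk_body(batch, "test", "book")
    print(len(batch), "documents,", len(body), "characters")
```

Timing each batch size against real data is the only reliable way to pick one, since, as the slide says, performance depends on data complexity.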

Page 14: ElasticSearch Meetup 30 - 10 - 2014

Common Pitfalls
  ElasticSearch uses a lot of memory and file descriptors!
    Tune them in /etc/security/limits.conf

      elasticsearch soft nofile 32000
      elasticsearch hard nofile 32000
      elasticsearch - memlock unlimited

    Set the ES_HEAP_SIZE
    ElasticSearch config file conf/elasticsearch.yml

      bootstrap.mlockall: true

Page 15: ElasticSearch Meetup 30 - 10 - 2014

Common Pitfalls
  Wait for the yellow status
  Are you using ElasticSearch as primary datastore?
    It can replace both a DBMS and MongoDB, but it depends on your data
    Cron snapshots
    Don’t abuse flush
    (Be reactive)
  Prefer “update” to re-posting the same object
  Use the “version”, Luke!
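The last two points can be sketched together: send only the changed fields through the update endpoint, and pass the version you last read so that a concurrent change is rejected as a conflict instead of being silently overwritten. The helper name `partial_update_request` is hypothetical, and the path layout and `version` query parameter assume the ElasticSearch 1.x update API.

```python
import json
from urllib.parse import urlencode


def partial_update_request(index, doc_type, doc_id, changes, version=None):
    """Return (method, path, body) for a partial update of one document."""
    path = "/%s/%s/%s/_update" % (index, doc_type, doc_id)
    if version is not None:
        # Only apply the update if the stored document is still at this version.
        path += "?" + urlencode({"version": version})
    body = json.dumps({"doc": changes})
    return "POST", path, body


method, path, body = partial_update_request(
    "test", "book", "9781782166627", {"pages": 431}, version=3
)
print(method, path, body)
```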

Page 16: ElasticSearch Meetup 30 - 10 - 2014

Common Pitfalls
  If possible, don’t use rivers
    Hard to debug
    Reduce your server responsiveness
    Can crash your server
    They will be removed (2.0?)
    (Prefer Spark SchemaDDL)
  Use scripts
    The easy way to extend ElasticSearch for trivial functionalities
    Prefer Groovy (or native Java for performance)
    Don’t use inline scripts, if possible
      Prefer indexed or file scripts with parameters
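The "with parameters" part of this advice can be sketched on its own. When a value is baked into the script source, every distinct value produces a different script text; when it is passed as a parameter, the script text stays constant and only the parameters change. The two helpers below are hypothetical and build update request bodies in an assumed ElasticSearch 1.x syntax; they do not show how to register an indexed or file script.

```python
import json


def add_pages_inline(count):
    """Discouraged: the value is embedded, so each count is a new script text."""
    return {"script": "ctx._source.pages += %d" % count}


def add_pages_with_params(count):
    """Preferred: constant script text, value supplied through params."""
    return {
        "script": "ctx._source.pages += count",
        "params": {"count": count},
    }


print(json.dumps(add_pages_inline(10)))
print(json.dumps(add_pages_with_params(10)))
```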

Page 17: ElasticSearch Meetup 30 - 10 - 2014

Common Pitfalls
  Use plugins
    If it’s not available, write a new one
  Always back up before upgrading
    Snapshots can save your life!
      Bug in 1.3.x
    Check your plugins for compatibility
    Read the ElasticSearch changelog
  Sometimes you MUST upgrade your cluster
  Use at least 3 nodes (if possible)

Page 18: ElasticSearch Meetup 30 - 10 - 2014

Conclusions
  ElasticSearch benefits
    Easy to set up
    Very clever architecture
  Drawbacks
    Changing the sharding of a full index is non-trivial
    Pay attention when upgrading
  ElasticSearch
    Clever architecture, fast, stable, extendable
    Does exactly what you need

Page 19: ElasticSearch Meetup 30 - 10 - 2014

Thank you

[email protected]
@aparo77

Questions?