real time bom explosions with apache solr and spark
TRANSCRIPT
REAL TIME BOM EXPLOSIONS WITH APACHE SOLR AND SPARKAndreasZitzelsberger
BILLS OF MATERIAL (BOMS) EXPLAINED
BOMS ARE NEEDED FOR…
ProductionPlanning ForecastingDemand Scenario-BasedPlanningRunningSimulations
DemandResolver
THE BIG PICTURE
Parts and abstract demands
Orders / Independent Demands
Production
Planning
Analytics
BOMs / Dependent demands
ORDERS / INDEPENDENT DEMANDSOrderData CarConfiguration
Id,OrderTime,… TYPE VARIANT UPHOLSTERY PAINT OPTION_1,OPTION_2,…
… …
… …
Customer‘s
orders
Predicted
Orders
Stockcars
Image:EnergySolutionsInternationalInc
PARTS AND ABSTRACT DEMANDSId Description Attributes TermsTR_1 Automatictransmission,supplierA … TYPE_1OR
VAR_2ANDNOTOPT_2
…
TR_2 Automatictransmission,supplierB … TYPE_2ANDVAR_1OR
OPT_5OROPT_6
DT_1 150hp,automatictransmissiondrivetrain …
LI_1 Xenonlights … OPT_3ORVAR_9OR…
BT_1 SmallBattery … 8<=OPT_3*200+VAR_9*5+
OPT_4*2…<=12
Theconstructiveapproachisimpossibleduetocombinatoryexplosion.Therulesneedtobeevaluatedduringdemandresolution.
DemandResolver
WE NEED TO OVERCOME THREE CORE PROBLEMS…
Parts and abstract demands
Orders / Independent Demands
Production
Planning
Analytics
BOMs / Dependent demands
DemandResolver
1. HOW TO RESOLVE DEMANDS AS FAST AS POSSIBLE?
Parts and abstract demands
Orders / Independent Demands
BOMs / Dependent demands
2. HOW TO JOIN LARGE DATA SETS?
Parts and abstract demands
Analytics
BOMs / Dependent demands
Orders / Independent Demands
3. ANALYTICS ON LARGE DATA SETS?
Parts and abstract demands
Analytics
BOMs / Dependent demands
Orders / Independent Demands
DemandResolver
1. HOW TO RESOLVE DEMANDS AS FAST AS POSSIBLE?
Parts and abstract demands
Orders / Independent Demands
Ontoproblem#1…
BOMs / Dependent demands
THE SIMPLE SOLUTION…
A BETTER SOLUTION…
2millionparts
15millionabstract
demands
OR: ONLY CALCULATE WHAT IS STRICTLY NECESSARY
Step2:REFINE
Refinethesupersetby
evaluatingtheusagetermsin
O(n).
Actualdemands
Post-
Filter
Pre-
Filter
AbstractDemands
Step1:FILTER
Pre-filterdemandswithafast
indexinO(logn).
Yieldsasupersetofthe
requiredparts.
~6000possible
demands
~3000actualdemands
PRE-FILTER USING CANDIDATE AND A PROSCRIPTION SETS
■ Consider((OPT_1)OR(OPT_2))ANDNOT(OPT_3OROPT_4)
■ Candidateset: C(t)={OPT_1,OPT_2}
■ Proscriptionset: P(t)={OPT_3,OPT_4}
■ Preselectbythecriterium:
■ AtleastoneconditionfromC(t)issatisfied
■ NoneoftheconditionsfromP(t)aresatisfied
EXAMPLE CANDIDATE AND PROSCRIPTION SETS
16
Candidateset Proscriptionset
Apartcanonlybeusedtogetherwith
“white”or“blue”paint.
Entry:PART_1,C-PAINT:[white,blue]
Query:C-PAINT=blue
Apartcannotbeusedwith“yellow”paint.
Entry:PART_2,P-PAINT:[yellow]
Query:NOTP-PAINT=yellow
SolrServer
SOLR IMPLEMENTATION
PART_ID:“…”
//Partattributes
//….
TERMS:[…]
TYPE_C:[…]
TYPE_P:[…]
VARIANT_C:[…]
VARIANT_P:[…]
OPTIONS_C:[…]
OPTIONS_P:[…]
Part
Ifevaluate(TERMS)
accept
Else
drop
Postfilter
BulkSolrquery
forNorders
Bulkexportof
thesearch
results
Dependent
Demands
(Solr)
DemandResolver
WHY APACHE SOLR■ Reallyfast■ Powerfulfaceting■ Solrcloud■ Extensible
■Mostimportantly,itjustworks
BOMs / Dependent demands
DemandResolver
1. HOW TO RESOLVE DEMANDS AS FAST AS POSSIBLE?
Parts and abstract demands
Orders / Independent Demands
2. HOW TO JOIN TWO LARGE DATA SETS?
Parts and abstract demands
Analytics
BOMs / Dependend demands
Orders / Independent Demands
let‘scontinuewith
problem#2…
FORWARD AND BACKWARD JOINS ON PARTSOrders/
IndependentDemands
PartUsages/ DependendDemands
Parts/
AbstractDemands
Given: Ordercriteria(TYPE=XY12) andpartscriteria(PART_ID=4711OR4712)
Task: Findallordersthatsatisfytheordercriteria ANDcontainatleastonepart,thatsatisifiesthepartcriteria Returnallordersandthepartsforeachorderthatsatisfythepartcriteria.
Id … Id …OrderId PartId
7.000.000
2.000.000
21.000.000.000
■ Nobodywantstojoin7millionx21billionx2millionrecords.
THIS IS ACTUALLY HARD
■ Nobodywantstostore21billionrecords.
PARTIAL DENORMALIZATION TO THE RESCUE
Orders
<10Mio.recordsinsteadof21billion
Ids Attributes PartIds
1 … 1,3,786,…
2 … 2,6,786,…
3 .. 3,75,95,..
Ids Attributes OrderIds
1 … 54,23321,..
2 … 1,4,8652,..
3 … 864,23452,…
Parts
SIMPLE, BUT NOT EASY: CUSTOM JOIN LOGIC IN SOLR
Step1Pre-filtertheparts
usinganindexsearch
Step2Pre-filtertheorders
usinganindex
search
Step3Computethe
intersectionofboth
supersets
Allparts
Searchedpartswith
attributeA=XY
Allorders
OrderstobebuiltinQ4with
attributeA
Intersectrecords
basedontechnicalidsOrderstobebuiltin
Q4
Final
Result
• CustomSearchComponents
• SolrSearchComponents
WE ONLY EXECUTE THE JOIN IF WE MUST
Nofilter Filterbyorderattributes Filterbypartattributes Filterbyorderandpartattributes
Facetsonparts Index JOIN Index JOIN
Facectsonorders Index Index JOIN JOIN
Listparts JOIN JOIN JOIN JOIN
Listorders Index Index JOIN JOIN
Resultsetsmaybeunwieldinglylarge.
Earlyterminationforinteractivequeries
iftheresultgrowstoolarge.
Orders / Independent Demands
BOMs / Dependend demands
2. HOW TO JOIN TWO LARGE DATA SETS?
Parts and abstract demands
Analytics
3. ANALYTICS ON VERY LARGE DATA SETS?
Parts and abstract demands
Analytics
BOMs / Dependend demands
Orders / Independent Demands
Whataboutproblem
#3?
WHY APACHE SPARK■ Fast■ High-LevelfunctionalAPI■Wellintegrated
■ Batchsemantics
■WorkswellwithSolr(Ifyouuse
yourownconnector…)
TECHNICAL ARCHITECTURE
ApacheSolr
ApacheSpark
DemandResolver
APIFacade
Loader
SourceSystems
HardwareNode1
SolrInstance
Shard20
HardwareNode2
SolrInstance
Shard30
SparkWorker SparkWorker
Shard21Shard11
SparkMaster
ZooKeeper
... ...
SparkMaster
ZooKeeper(Leader)
APIFacadeAPIFacade
Load-Balancer
Jenkins+Loader(Leader)Jenkins+Loader(Backup)
HardwareNoden
SolrInstance
Shardn
Shardn
...
SparkWorker
Addnodesasrequired
OPERATIONS ARCHITECTURE
APIFacade
BULK EXPORT WITH A SHARED OFF-HEAP CACHE
LuceneSegment
LuceneSegment
LuceneSegment
Shared Off-Heap Cache
1
DocIdOrder
DocIdPart
Web interface
2
3
BusinessFacadeObject
data
4
http://chronicle.software/products/chronicle-map/
Orders / Independent Demands
BOMs / Dependend demands
3. ANALYTICS ON VERY LARGE DATA SETS?
Parts and abstract demands
Analytics
LESSONS LEARNEDhttps://en.wikipedia.org/wiki/Melk_Abbey
OPTIMIZE LATE, BUT DO OPTIMIZE
©2015MaxSiedentopf
SOME OF OUR OPTIMIZATIONS
36
Bulk RequestHandler
Binary DocValue support
Boolean interpreter as postfilter
Mass data binary response format
Search components with custom JOIN algorithm
Solving thousands of orders with one request
Be able to store data effective using our own JOIN implementation.
Speed up the access to persisted data dramatically using binary doc values.
0111 0111
Use the standard Solr cinary codec with an optimized data-model that reduce the amount of data by a factor of 8.
Computing BOM
explosions
Enable Solr with custom post filters to filter documents using stored bool expessions .
LOW LEVEL OPTIMIZATIONS CAN YIELD GREAT RETURNS
37
October 14 January 15 May 15 October 15
4,9 ms
0,28 ms
24 ms
Tim
e to
cal
cula
te th
e B
oM fo
r one
ord
er
0,08 ms
Scoring (-8%)
Solr–Query Parser (-25%)
Stat-Cache (-8%)
Using String DocValues (-28%)
Development of the processing time Demand Calulation Service PoC Profiling result and the some improvements to reduce the query time.
DON’T TRUST PAPER, DO POCS
IN CONCLUSION
THANK YOU
@andreasz82 [email protected]