real time bom explosions with apache solr and spark

40
REAL TIME BOM EXPLOSIONS WITH APACHE SOLR AND SPARK Andreas Zitzelsberger

Upload: andreas-zitzelsberger

Post on 14-Feb-2017

364 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Real Time BOM Explosions With Apache Solr and Spark

REAL TIME BOM EXPLOSIONS WITH APACHE SOLR AND SPARKAndreasZitzelsberger

Page 2: Real Time BOM Explosions With Apache Solr and Spark

BILLS OF MATERIAL (BOMS) EXPLAINED

Page 3: Real Time BOM Explosions With Apache Solr and Spark

BOMS ARE NEEDED FOR…

ProductionPlanning ForecastingDemand Scenario-BasedPlanningRunningSimulations

Page 4: Real Time BOM Explosions With Apache Solr and Spark

DemandResolver

THE BIG PICTURE

Parts and abstract demands

Orders / Independent Demands

Production

Planning

Analytics

BOMs / Dependent demands

Page 5: Real Time BOM Explosions With Apache Solr and Spark

ORDERS / INDEPENDENT DEMANDSOrderData CarConfiguration

Id,OrderTime,… TYPE VARIANT UPHOLSTERY PAINT OPTION_1,OPTION_2,…

… …

… …

Customer‘s

orders

Predicted

Orders

Stockcars

Image:EnergySolutionsInternationalInc

Page 6: Real Time BOM Explosions With Apache Solr and Spark

PARTS AND ABSTRACT DEMANDSId Description Attributes TermsTR_1 Automatictransmission,supplierA … TYPE_1OR

VAR_2ANDNOTOPT_2

TR_2 Automatictransmission,supplierB … TYPE_2ANDVAR_1OR

OPT_5OROPT_6

DT_1 150hp,automatictransmissiondrivetrain …

LI_1 Xenonlights … OPT_3ORVAR_9OR…

BT_1 SmallBattery … 8<=OPT_3*200+VAR_9*5+

OPT_4*2…<=12

Theconstructiveapproachisimpossibleduetocombinatoryexplosion.Therulesneedtobeevaluatedduringdemandresolution.

Page 7: Real Time BOM Explosions With Apache Solr and Spark

DemandResolver

WE NEED TO OVERCOME THREE CORE PROBLEMS…

Parts and abstract demands

Orders / Independent Demands

Production

Planning

Analytics

BOMs / Dependent demands

Page 8: Real Time BOM Explosions With Apache Solr and Spark

DemandResolver

1. HOW TO RESOLVE DEMANDS AS FAST AS POSSIBLE?

Parts and abstract demands

Orders / Independent Demands

BOMs / Dependent demands

Page 9: Real Time BOM Explosions With Apache Solr and Spark

2. HOW TO JOIN LARGE DATA SETS?

Parts and abstract demands

Analytics

BOMs / Dependent demands

Orders / Independent Demands

Page 10: Real Time BOM Explosions With Apache Solr and Spark

3. ANALYTICS ON LARGE DATA SETS?

Parts and abstract demands

Analytics

BOMs / Dependent demands

Orders / Independent Demands

Page 11: Real Time BOM Explosions With Apache Solr and Spark

DemandResolver

1. HOW TO RESOLVE DEMANDS AS FAST AS POSSIBLE?

Parts and abstract demands

Orders / Independent Demands

Ontoproblem#1…

BOMs / Dependent demands

Page 12: Real Time BOM Explosions With Apache Solr and Spark

THE SIMPLE SOLUTION…

Page 13: Real Time BOM Explosions With Apache Solr and Spark

A BETTER SOLUTION…

Page 14: Real Time BOM Explosions With Apache Solr and Spark

2millionparts

15millionabstract

demands

OR: ONLY CALCULATE WHAT IS STRICTLY NECESSARY

Step2:REFINE

Refinethesupersetby

evaluatingtheusagetermsin

O(n).

Actualdemands

Post-

Filter

Pre-

Filter

AbstractDemands

Step1:FILTER

Pre-filterdemandswithafast

indexinO(logn).

Yieldsasupersetofthe

requiredparts.

~6000possible

demands

~3000actualdemands

Page 15: Real Time BOM Explosions With Apache Solr and Spark

PRE-FILTER USING CANDIDATE AND A PROSCRIPTION SETS

■ Consider((OPT_1)OR(OPT_2))ANDNOT(OPT_3OROPT_4)

■ Candidateset: C(t)={OPT_1,OPT_2}

■ Proscriptionset: P(t)={OPT_3,OPT_4}

■ Preselectbythecriterium:

■ AtleastoneconditionfromC(t)issatisfied

■ NoneoftheconditionsfromP(t)aresatisfied

Page 16: Real Time BOM Explosions With Apache Solr and Spark

EXAMPLE CANDIDATE AND PROSCRIPTION SETS

16

Candidateset Proscriptionset

Apartcanonlybeusedtogetherwith

“white”or“blue”paint.

Entry:PART_1,C-PAINT:[white,blue]

Query:C-PAINT=blue

Apartcannotbeusedwith“yellow”paint.

Entry:PART_2,P-PAINT:[yellow]

Query:NOTP-PAINT=yellow

Page 17: Real Time BOM Explosions With Apache Solr and Spark

SolrServer

SOLR IMPLEMENTATION

PART_ID:“…”

//Partattributes

//….

TERMS:[…]

TYPE_C:[…]

TYPE_P:[…]

VARIANT_C:[…]

VARIANT_P:[…]

OPTIONS_C:[…]

OPTIONS_P:[…]

Part

Ifevaluate(TERMS)

accept

Else

drop

Postfilter

BulkSolrquery

forNorders

Bulkexportof

thesearch

results

Dependent

Demands

(Solr)

DemandResolver

Page 18: Real Time BOM Explosions With Apache Solr and Spark

WHY APACHE SOLR■ Reallyfast■ Powerfulfaceting■ Solrcloud■ Extensible

■Mostimportantly,itjustworks

Page 19: Real Time BOM Explosions With Apache Solr and Spark

BOMs / Dependent demands

DemandResolver

1. HOW TO RESOLVE DEMANDS AS FAST AS POSSIBLE?

Parts and abstract demands

Orders / Independent Demands

Page 20: Real Time BOM Explosions With Apache Solr and Spark

2. HOW TO JOIN TWO LARGE DATA SETS?

Parts and abstract demands

Analytics

BOMs / Dependend demands

Orders / Independent Demands

let‘scontinuewith

problem#2…

Page 21: Real Time BOM Explosions With Apache Solr and Spark

FORWARD AND BACKWARD JOINS ON PARTSOrders/

IndependentDemands

PartUsages/ DependendDemands

Parts/

AbstractDemands

Given: Ordercriteria(TYPE=XY12) andpartscriteria(PART_ID=4711OR4712)

Task: Findallordersthatsatisfytheordercriteria ANDcontainatleastonepart,thatsatisifiesthepartcriteria Returnallordersandthepartsforeachorderthatsatisfythepartcriteria.

Id … Id …OrderId PartId

7.000.000

2.000.000

21.000.000.000

Page 22: Real Time BOM Explosions With Apache Solr and Spark

■ Nobodywantstojoin7millionx21billionx2millionrecords.

THIS IS ACTUALLY HARD

■ Nobodywantstostore21billionrecords.

Page 23: Real Time BOM Explosions With Apache Solr and Spark

PARTIAL DENORMALIZATION TO THE RESCUE

Orders

<10Mio.recordsinsteadof21billion

Ids Attributes PartIds

1 … 1,3,786,…

2 … 2,6,786,…

3 .. 3,75,95,..

Ids Attributes OrderIds

1 … 54,23321,..

2 … 1,4,8652,..

3 … 864,23452,…

Parts

Page 24: Real Time BOM Explosions With Apache Solr and Spark

SIMPLE, BUT NOT EASY: CUSTOM JOIN LOGIC IN SOLR

Step1Pre-filtertheparts

usinganindexsearch

Step2Pre-filtertheorders

usinganindex

search

Step3Computethe

intersectionofboth

supersets

Allparts

Searchedpartswith

attributeA=XY

Allorders

OrderstobebuiltinQ4with

attributeA

Intersectrecords

basedontechnicalidsOrderstobebuiltin

Q4

Final

Result

• CustomSearchComponents

• SolrSearchComponents

Page 25: Real Time BOM Explosions With Apache Solr and Spark

WE ONLY EXECUTE THE JOIN IF WE MUST

Nofilter Filterbyorderattributes Filterbypartattributes Filterbyorderandpartattributes

Facetsonparts Index JOIN Index JOIN

Facectsonorders Index Index JOIN JOIN

Listparts JOIN JOIN JOIN JOIN

Listorders Index Index JOIN JOIN

Resultsetsmaybeunwieldinglylarge.

Earlyterminationforinteractivequeries

iftheresultgrowstoolarge.

Page 26: Real Time BOM Explosions With Apache Solr and Spark

Orders / Independent Demands

BOMs / Dependend demands

2. HOW TO JOIN TWO LARGE DATA SETS?

Parts and abstract demands

Analytics

Page 27: Real Time BOM Explosions With Apache Solr and Spark

3. ANALYTICS ON VERY LARGE DATA SETS?

Parts and abstract demands

Analytics

BOMs / Dependend demands

Orders / Independent Demands

Whataboutproblem

#3?

Page 28: Real Time BOM Explosions With Apache Solr and Spark
Page 29: Real Time BOM Explosions With Apache Solr and Spark

WHY APACHE SPARK■ Fast■ High-LevelfunctionalAPI■Wellintegrated

■ Batchsemantics

■WorkswellwithSolr(Ifyouuse

yourownconnector…)

Page 30: Real Time BOM Explosions With Apache Solr and Spark

TECHNICAL ARCHITECTURE

ApacheSolr

ApacheSpark

DemandResolver

APIFacade

Loader

SourceSystems

Page 31: Real Time BOM Explosions With Apache Solr and Spark

HardwareNode1

SolrInstance

Shard20

HardwareNode2

SolrInstance

Shard30

SparkWorker SparkWorker

Shard21Shard11

SparkMaster

ZooKeeper

... ...

SparkMaster

ZooKeeper(Leader)

APIFacadeAPIFacade

Load-Balancer

Jenkins+Loader(Leader)Jenkins+Loader(Backup)

HardwareNoden

SolrInstance

Shardn

Shardn

...

SparkWorker

Addnodesasrequired

OPERATIONS ARCHITECTURE

APIFacade

Page 32: Real Time BOM Explosions With Apache Solr and Spark

BULK EXPORT WITH A SHARED OFF-HEAP CACHE

LuceneSegment

LuceneSegment

LuceneSegment

Shared Off-Heap Cache

1

DocIdOrder

DocIdPart

Web interface

2

3

BusinessFacadeObject

data

4

http://chronicle.software/products/chronicle-map/

Page 33: Real Time BOM Explosions With Apache Solr and Spark

Orders / Independent Demands

BOMs / Dependend demands

3. ANALYTICS ON VERY LARGE DATA SETS?

Parts and abstract demands

Analytics

Page 34: Real Time BOM Explosions With Apache Solr and Spark

LESSONS LEARNEDhttps://en.wikipedia.org/wiki/Melk_Abbey

Page 35: Real Time BOM Explosions With Apache Solr and Spark

OPTIMIZE LATE, BUT DO OPTIMIZE

©2015MaxSiedentopf

Page 36: Real Time BOM Explosions With Apache Solr and Spark

SOME OF OUR OPTIMIZATIONS

36

Bulk RequestHandler

Binary DocValue support

Boolean interpreter as postfilter

Mass data binary response format

Search components with custom JOIN algorithm

Solving thousands of orders with one request

Be able to store data effective using our own JOIN implementation.

Speed up the access to persisted data dramatically using binary doc values.

0111 0111

Use the standard Solr cinary codec with an optimized data-model that reduce the amount of data by a factor of 8.

Computing BOM

explosions

Enable Solr with custom post filters to filter documents using stored bool expessions .

Page 37: Real Time BOM Explosions With Apache Solr and Spark

LOW LEVEL OPTIMIZATIONS CAN YIELD GREAT RETURNS

37

October 14 January 15 May 15 October 15

4,9 ms

0,28 ms

24 ms

Tim

e to

cal

cula

te th

e B

oM fo

r one

ord

er

0,08 ms

Scoring (-8%)

Solr–Query Parser (-25%)

Stat-Cache (-8%)

Using String DocValues (-28%)

Development of the processing time Demand Calulation Service PoC Profiling result and the some improvements to reduce the query time.

Page 38: Real Time BOM Explosions With Apache Solr and Spark

DON’T TRUST PAPER, DO POCS

Page 39: Real Time BOM Explosions With Apache Solr and Spark

IN CONCLUSION

Page 40: Real Time BOM Explosions With Apache Solr and Spark

THANK YOU

@andreasz82 [email protected]