Transcript
  • 1. Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduce Framework
  - Biao Xu, Ruairí de Fréin, Eric Robson, Mícheál Ó Foghlú
  - Telecommunications Software & Systems Group, Waterford Institute of Technology
  - ICFCA 2012, Leuven, Belgium

  2. Outline
  - 1 Motivation: The Basic Problems of Current FCA Algorithms; Related Work
  - 2 Our Solution: Adopt Iterative MapReduce Framework; FCA Algorithms Adaptation
  - 3 Evaluation
  - 4 Future Work

  3. Outline (section divider: Motivation)

  4. The Basic Problems of Current FCA Algorithms
  - Applying FCA algorithms in real-world applications is time-consuming for large, high-dimensional data.
  - Table: Execution time of traditional FCA algorithms (in seconds).
      Dataset       mushroom     anon-web      census-income
      Size          8124 x 125   32711 x 294   103950 x 133
      NextClosure   618          14671         18230
      CloseByOne    2543         656           7465
  - Distributed databases are hard to handle: data volume, communication, privacy, security.

  5. Outline (section divider: Related Work)

  6. Related Work
  - There is little existing work on distributed FCA algorithms.
  - A distributed version of CloseByOne based on Hadoop MapReduce: Petr Krajca et al., "Distributed Algorithm for Computing Formal Concepts Using Map-Reduce Framework", IDA 2009.
  - Differences in our work: we use an iterative MapReduce runtime, Twister, and mine formal concepts in the fewest iterations.

  7. Outline (section divider: Our Solution, Adopt Iterative MapReduce Framework)

  8. Features of the MapReduce Framework
  - Divide-and-conquer strategy: a map function plus a reduce function.
  - Table: Partitioned datasets S1 and S2. S1, written (O_S1, P, I_S1), holds objects 1-3 over attributes a-g; S2, written (O_S2, P, I_S2), holds objects 4-6 over the same attributes.
  - Algorithms are moved to the nodes rather than moving the datasets.
  - Utilize a whole cluster, not only a single machine.
  - Fault tolerance.

  9. MapReduce Data Flow
  - Figure: input splits (Split 0, Split 1, Split 2) are processed by map tasks; intermediate results are sorted, copied and merged; reduce tasks on each node produce the output parts (Part 0, Part 1).

  10. Twister: an Iterative MapReduce Runtime
  - A lightweight MapReduce runtime developed by Indiana University.
  - Efficient support for iterative MapReduce computations.
  - Table: Comparison between Twister and Hadoop.
      Twister                          Hadoop
      Long-running map/reduce tasks    Single-step map/reduce
      Iterative support                Job chaining
      Static & dynamic data            Static data only

  11. Twister Architecture
  - Figure: a master node runs the main program and the Twister driver; worker nodes run Twister daemons with worker pools hosting cacheable map and reduce tasks; local disks are used for data distribution, collection and partition file creation; nodes communicate over a pub/sub broker network.

  12. Outline (section divider: Our Solution, FCA Algorithms Adaptation)

  13. Decompose the FCA Algorithm
  - The map phase produces local concepts (closures), F_Y^Si, one set per partition.
  - The reduce phase generates global concepts by merging the local concepts from the mappers.
  - Theorem: given the closures F_Y^S1, ..., F_Y^Sn computed from n disjoint partitions, F_Y^S = F_Y^S1 ∩ ... ∩ F_Y^Sn.
  - Our algorithms are named with an MR prefix: MRCbo, MRGanter, MRGanter+.
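  The decomposition theorem on slide 13 can be illustrated with a minimal, single-machine Java sketch. It is not the paper's Twister implementation; the context rows and the class and method names (ClosureDecomposition, closure) are hypothetical. For an attribute set Y, the closure is computed on each object partition separately ("map") and the results are intersected ("reduce"); this agrees with the closure computed on the full context.

      import java.util.*;

      public class ClosureDecomposition {

          // Y'' on one set of rows: intersect the intents of all objects that
          // contain every attribute in Y; if no object qualifies, the closure
          // is the full attribute set.
          static Set<Character> closure(List<Set<Character>> intents,
                                        Set<Character> y,
                                        Set<Character> allAttributes) {
              Set<Character> result = new TreeSet<>(allAttributes);
              for (Set<Character> intent : intents) {
                  if (intent.containsAll(y)) {
                      result.retainAll(intent);
                  }
              }
              return result;
          }

          public static void main(String[] args) {
              Set<Character> attrs = new TreeSet<>(Arrays.asList('a','b','c','d','e','f','g'));
              // Two disjoint object partitions S1 and S2 (hypothetical intents).
              List<Set<Character>> s1 = Arrays.asList(
                  new TreeSet<>(Arrays.asList('a','c','e','g')),
                  new TreeSet<>(Arrays.asList('b','d','f')),
                  new TreeSet<>(Arrays.asList('c','g')));
              List<Set<Character>> s2 = Arrays.asList(
                  new TreeSet<>(Arrays.asList('b','c','f','g')),
                  new TreeSet<>(Arrays.asList('a','d','e','f')),
                  new TreeSet<>(Arrays.asList('d','e')));
              List<Set<Character>> full = new ArrayList<>(s1);
              full.addAll(s2);

              Set<Character> y = new TreeSet<>(Collections.singleton('g'));
              Set<Character> local1 = closure(s1, y, attrs);    // map on S1
              Set<Character> local2 = closure(s2, y, attrs);    // map on S2
              Set<Character> merged = new TreeSet<>(local1);
              merged.retainAll(local2);                         // reduce: intersect
              Set<Character> global = closure(full, y, attrs);  // closure on full context

              System.out.println("F1=" + local1 + "  F2=" + local2);
              System.out.println("merged=" + merged + "  full-context=" + global);
          }
      }

  With these rows, Y = {g} gives F1 = {c,g} and F2 = {b,c,f,g}, and the merged closure {c,g} equals the closure obtained from the undivided context.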
  14. MRGanter Work Flow
  - Figure: each map task runs computeClosure() on its data split (static data, labelled S) and emits (attribute, localClosure) pairs; each reduce task runs merging() and check() over the collected local closures; the resulting closure (dynamic data, labelled D) is fed back, and the driver repeats runMapReduce() while !isLastClosure(Closure).
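  The loop in the figure can be written out as a compact driver. The sketch below is a single-machine Java simulation of that workflow under stated assumptions, not the Twister-based implementation: localClosure plays the role of computeClosure() in the map phase, closure merges the partition results by intersection as in the reduce phase, and nextClosure performs the NextClosure-style canonicity check; the context rows are hypothetical.

      import java.util.*;

      public class MRGanterSimulation {
          static final int M = 7;   // attributes 0..6 stand for a..g

          // Map phase on one partition: intersect the intents of objects
          // containing Y; the full attribute set if no object qualifies.
          static BitSet localClosure(List<BitSet> partition, BitSet y) {
              BitSet result = new BitSet(M);
              result.set(0, M);
              for (BitSet intent : partition) {
                  BitSet missing = (BitSet) y.clone();
                  missing.andNot(intent);
                  if (missing.isEmpty()) {
                      result.and(intent);
                  }
              }
              return result;
          }

          // Reduce phase: merge the local closures of all partitions by intersection.
          static BitSet closure(List<List<BitSet>> partitions, BitSet y) {
              BitSet result = new BitSet(M);
              result.set(0, M);
              for (List<BitSet> p : partitions) {
                  result.and(localClosure(p, y));
              }
              return result;
          }

          // Check phase: Ganter's canonicity test yields the lectically next closure;
          // returns null when the current closure is the last one.
          static BitSet nextClosure(List<List<BitSet>> partitions, BitSet a) {
              for (int i = M - 1; i >= 0; i--) {
                  if (a.get(i)) continue;
                  BitSet y = a.get(0, i);          // a restricted to attributes < i
                  y.set(i);
                  BitSet b = closure(partitions, y);
                  BitSet added = (BitSet) b.clone();
                  added.andNot(a);
                  if (added.nextSetBit(0) >= i) {  // no new attribute smaller than i
                      return b;
                  }
              }
              return null;
          }

          static BitSet intent(int... attrs) {
              BitSet b = new BitSet(M);
              for (int x : attrs) b.set(x);
              return b;
          }

          public static void main(String[] args) {
              // Two object partitions S1 and S2 (hypothetical rows).
              List<BitSet> s1 = Arrays.asList(intent(0, 2, 4, 6), intent(1, 3, 5), intent(2, 6));
              List<BitSet> s2 = Arrays.asList(intent(1, 2, 5, 6), intent(0, 3, 4, 5), intent(3, 4));
              List<List<BitSet>> partitions = Arrays.asList(s1, s2);

              BitSet a = closure(partitions, new BitSet(M));   // first closure: Y = {}
              while (a != null) {                              // driver loop
                  System.out.println(a);                       // one concept intent
                  a = nextClosure(partitions, a);              // stop at the last closure
              }
          }
      }

  In MRGanter itself one runMapReduce() round evaluates the candidates for all attributes at once, with each map emitting (attribute, localClosure) pairs as in the figure; the sketch evaluates the candidates one attribute at a time, but the per-partition map step, the intersection-based reduce step and the termination check are the same, and only the small dynamic closure moves between iterations while the static splits stay in place.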
  15. Running Example of MRGanter and MRGanter+
  - Table: each iteration computes, for every candidate attribute p_i, the local closures F1 (from partition S1) and F2 (from partition S2) and intersects them to obtain the global closure F. Excerpt of the first iteration:
      p_i   F1 from S1    F2 from S2    F
      g     {c,g}         {b,c,f,g}     {c,g}
      f     {b,d,f}       {f}           {f}
      e     {a,c,e,g}     {d,e}         {e}
      d     {b,d,f}       {d,e}         {d}
      c     {c,g}         {b,c,f,g}     {c,g}
      b     {b,d,f}       {b}           {b}
      a     {a}           {a,d,e,f}     {a}
  - Subsequent iterations repeat this step for the current closures (e.g. {f}, {e}, {d}, {cg}) until the last closure is reached.

  16. Efficiency of MR
  - Table: Execution time in seconds; for each distributed algorithm the fastest time is reported, with the number of machines achieving it in round brackets.
      Dataset       mushroom     anon-web     census-income
      Concepts      219010       129009       96531
      Density       17.36%       1.03%        6.7%
      NextClosure   618          14671        18230
      CloseByOne    2543         656          7465
      MRCbo         241 (11)     693 (11)     803 (11)
      MRGanter      20269 (5)    20110 (3)    9654 (11)
      MRGanter+     198 (9)      496 (9)      358 (11)

  17. Scalability of MR (1)
  - Figure (CPU time in seconds, log scale 10^2 to 10^5, versus number of nodes, 0-12): Mushroom dataset: comparison of MRGanter+, MRCbo and MRGanter. MRGanter+ outperforms MRCbo and MRGanter when dense data is processed.

  18. Scalability of MR (2)
  - Figure (same axes): Anon-web dataset: comparison of MRGanter+, MRCbo and MRGanter. MRGanter+ is faster when more than 3 nodes are used.

  19. Scalability of MR (3)
  - Figure (same axes): Census dataset: comparison of MRGanter+, MRCbo and MRGanter. MRGanter+ is fastest when a large dataset is processed.

  20. Future Work
  - Explore the effect of data distribution between cluster nodes.
  - Examine MR performance with larger dataset sizes.
  - Extend our approach by reducing the size of intermediate data.

  21. Thank You
  - Questions?

