

Parallel Computing

Benson Muite

[email protected]

http://kodu.ut.ee/~benson

https://courses.cs.ut.ee/2016/paralleel/fall/Main/HomePage

24 October 2016

Juku from http://muuseum.at.mt.ut.ee/kogu/165.html. Available under CC-BY license.


Clustering, Accelerators and OpenCL


Clustering

Given a data set, decompose it into groups of similar items
Parallel computing is useful for large data sets
Many possible algorithms
Will look at the K-means algorithm
Presentation follows F. Nielsen, “Introduction to HPC with MPI for Data Science”


K-means

Consider grouping points in N-dimensional space
As an example, consider the dimensions of passenger road vehicles
May wish to split them into cars, vans, buses, trains
What dimensions would be most useful?


K-means

Having chosen the dimensions, need an algorithm
Have already decided on 4 categories
Typically do not know anything more about the data
For simplicity, assume there is at least one representative in each category


K-means

a) Pick 4 cluster centroids randomly
i) Calculate the distance of each point to each centroid
ii) Put each point in the cluster of the centroid it is closest to
iii) Calculate the centroid of each cluster
iv) Repeat i-iii until the sum of distances from centroids stops decreasing
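
The steps on this slide can be sketched in plain Python. This is a minimal illustration, not code from the course: the function names are hypothetical, the number of clusters is a parameter `k` rather than fixed at 4, the initial centroids are drawn from the data points (one common choice, assumed here), the squared Euclidean distance is used, and an empty cluster simply keeps its previous centroid.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Steps i-ii: index of the closest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Step iii: mean of each cluster (an empty cluster keeps its old centroid)."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def energy(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))


def kmeans(points, k, seed=0, max_iter=100):
    """Run K-means on a list of tuples; assumes max_iter >= 1."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step a: k distinct data points
    energies = []
    for _ in range(max_iter):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
        energies.append(energy(points, labels, centroids))
        # Step iv: stop once the energy no longer decreases.
        if len(energies) > 1 and energies[-1] >= energies[-2]:
            break
    return centroids, labels, energies
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2)` returns the two centroids, one label per point, and the energy recorded after each iteration.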


K-means

Typically use the square of the Euclidean distance
Can use other distances, depending on the application; some may be better than others
Method converges, because at each iteration the “energy”, or sum of squares of Euclidean distances, never increases and is bounded below by zero (fixed point theorem)
Example at http://shiny.rstudio.com/gallery/kmeans-example.html
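
The convergence argument can be written out as a short derivation. The symbols are introduced here for illustration and do not appear on the slide: $x_i$ are the data points, $C_k$ the clusters, $\mu_k$ their centroids, and $K$ the number of clusters.

```latex
% Energy of a clustering
E(C, \mu) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2 \;\ge\; 0

% Assignment step (centroids fixed): each point moves to its nearest centroid,
% so no term of the sum can grow.
E(C^{(t+1)}, \mu^{(t)}) \le E(C^{(t)}, \mu^{(t)})

% Update step (clusters fixed): the mean minimizes the sum of squared distances.
\mu_k^{(t+1)} = \frac{1}{\lvert C_k^{(t+1)} \rvert} \sum_{x_i \in C_k^{(t+1)}} x_i
  = \arg\min_{\mu} \sum_{x_i \in C_k^{(t+1)}} \lVert x_i - \mu \rVert^2

% Combining both steps:
0 \le E(C^{(t+1)}, \mu^{(t+1)}) \le E(C^{(t+1)}, \mu^{(t)}) \le E(C^{(t)}, \mu^{(t)})
```

Since there are only finitely many ways to partition the points into $K$ clusters, the non-increasing energy must stop changing after finitely many iterations. The result is a local minimum of $E$, not necessarily the global one.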


K-means

Parallelization
Can parallelize calculating distances from centroids, no communication
Can parallelize calculating centroids, reduction and broadcast communications needed
Reduction and broadcast also needed to check for convergence
Can use parallel IO
Should show good weak and strong scaling
Examples at http://rbigdata.github.io/documentation/pmclust/01-pmclust_pkmeans.html and https://github.com/RBigData/pmclust/blob/master/demo/ex_kmeans.r
Can later compare speed to own code
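
The communication pattern described on this slide can be illustrated without any MPI library by simulating the ranks in a single process. In the sketch below, which is an assumption-laden illustration rather than the pmclust implementation, each chunk of points plays the role of one rank, `local_partials` is the communication-free part, and `reduce_partials` stands in for the sum reduction. In an MPI code the reduced result would then be broadcast to every rank, or both steps would be combined in a single all-reduce. All function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def local_partials(chunk, centroids):
    """Work of one rank on its own points: needs no communication."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]  # per-cluster coordinate sums
    counts = [0] * len(centroids)            # per-cluster point counts
    local_energy = 0.0                       # energy w.r.t. the given centroids
    for p in chunk:
        dists = [squared_distance(p, c) for c in centroids]
        k = dists.index(min(dists))
        counts[k] += 1
        local_energy += dists[k]
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts, local_energy


def reduce_partials(partials):
    """Stand-in for a sum reduction over all ranks."""
    n_clusters = len(partials[0][1])
    dim = len(partials[0][0][0])
    sums = [[0.0] * dim for _ in range(n_clusters)]
    counts = [0] * n_clusters
    total_energy = 0.0
    for part_sums, part_counts, part_energy in partials:
        for k in range(n_clusters):
            counts[k] += part_counts[k]
            for d in range(dim):
                sums[k][d] += part_sums[k][d]
        total_energy += part_energy
    return sums, counts, total_energy


def parallel_step(chunks, centroids):
    """One K-means iteration over data split into chunks (one chunk per rank)."""
    partials = [local_partials(chunk, centroids) for chunk in chunks]
    sums, counts, total_energy = reduce_partials(partials)
    new_centroids = [
        tuple(s / counts[k] for s in sums[k]) if counts[k] else tuple(centroids[k])
        for k in range(len(centroids))
    ]
    # In MPI, new_centroids and total_energy would now be broadcast to all ranks;
    # total_energy is what each rank compares between iterations for convergence.
    return new_centroids, total_energy
```

Only the per-cluster sums, counts and one energy value cross rank boundaries, so the communicated data depends on the number of clusters and dimensions, not on the number of points. That is the reason the slide expects good scaling.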


Accelerators

Heterogeneous architectures for high performance with low energy consumption
Many different kinds of hardware
Many programming models


Accelerators

Graphics Processing Units (GPU)
Field Programmable Gate Arrays (FPGA)
Xeon Phi
Massively Parallel Processor Array (MPPA)
Other specialized processing units, for example for encryption and signal processing


Nvidia GPUs

http://www.nvidia.com
https://en.wikipedia.org/wiki/Nvidia_Tesla

2 Tflop double precision performance
Programming APIs: CUDA, CUDA Fortran, OpenCL, OpenACC
For compute and graphics


AMD FirePro GPUs

http://www.amd.com
https://en.wikipedia.org/wiki/AMD_FirePro

2 Tflop double precision performance
Programming APIs: OpenCL, OpenACC, HCC and HSAIL
For compute and graphics


Intel Xeon Phi

http://www.intel.com
https://en.wikipedia.org/wiki/Xeon_Phi

1 Tflop double precision performance
Programming APIs: OpenCL (old versions), OpenMP, MPI, CILK, Fortran, C
Latest versions can be self hosted
For compute and graphics


Parallella

http://www.parallella.org/
https://en.wikipedia.org/wiki/Adapteva

Programming APIs: OpenCL, C, pthreads
Embedded applications
Energy efficient computing: 50 single precision Gflops/Watt
Latest version has 1024 cores https://www.parallella.org/blog/


Nvidia Tegra K1 and X1

http://www.nvidia.com/
https://en.wikipedia.org/wiki/Tegra#Tegra_K1

0.19 Tflops double precision
Programming APIs: CUDA, OpenCL


AMD APU

http://www.amd.com
https://en.wikipedia.org/wiki/AMD_Accelerated_Processing_Unit

0.700 Tflops single precision
Programming APIs: OpenCL, OpenACC, Fortran, C
For compute and graphics


Intel HD graphics

http://www.intel.com
https://en.wikipedia.org/wiki/Intel_HD_and_Iris_Graphics

Programming API: OpenCL
For compute and graphics


Others

Massively Parallel Processor Array (Kalray, Pezy)
Field Programmable Gate Array (Xilinx, Altera)


Accelerators

Pattern Matching Example
OpenCL parallelization of Naive, Knuth-Morris-Pratt and Boyer-Moore-Horspool pattern matching algorithms
Report at http://ds.cs.ut.ee/courses/course-files/DS-seminar-Andrii-Rozumnyi.pdf
Code at https://github.com/JaakTree/pattern_matching/tree/test
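
As a serial point of reference for two of the algorithms named above, here is a minimal Python sketch of naive and Boyer-Moore-Horspool matching. It is not the OpenCL code from the report, and Knuth-Morris-Pratt is left out for brevity. The function names are hypothetical. The comment about work-items describes one plausible way to expose parallelism in the naive algorithm, stated as an assumption rather than as the report's method.

```python
def match_at(text, pattern, start):
    """Does pattern occur in text beginning at index start?"""
    return text[start:start + len(pattern)] == pattern


def naive_search(text, pattern):
    """Check every start position; returns all match positions."""
    if not pattern:
        return []
    n_positions = len(text) - len(pattern) + 1
    # Each start position is checked independently of the others, so
    # (as an assumption) each could be handled by a separate work-item.
    return [s for s in range(max(n_positions, 0)) if match_at(text, pattern, s)]


def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: skip ahead using a bad-character shift table."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift for a character = distance from its rightmost occurrence
    # (excluding the last pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Characters not in the table allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

Both functions return the same list of positions; the Horspool variant usually inspects fewer windows because the shift table lets it skip start positions that cannot match.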

Possible project based on work by Handre Elias http://kodu.ut.ee/~handre/

Possible projects related to machine learning http://www.oi.ut.ee/en/studies/towards-robot-judges


References

Barlas, G. “Multicore and GPU Programming: An Integrated Approach” Morgan Kaufmann 2015

Nielsen, F. “Introduction to HPC with MPI for Data Science” Springer 2016

pbdR https://rbigdata.github.io/

Rozumnyi, A. http://ds.cs.ut.ee/courses/course-files/DS-seminar-Andrii-Rozumnyi.pdf

Elias, H. “Wave simulation in a computer game” Proc. European Seminar on Computing 2016 http://www.esco2016.femhub.com/media/ESCO2016_Book_of_Abstracts.pdf

Elias, H. “Simulation game in a web browser” http://comserv.cs.ut.ee/ati_thesis/datasheet.php?id=53653&year=2016
