Parallel Computing

Benson Muite

benson.muite@ut.ee
http://kodu.ut.ee/~benson

https://courses.cs.ut.ee/2016/paralleel/fall/Main/HomePage

24 October 2016

Juku from http://muuseum.at.mt.ut.ee/kogu/165.html. Available under CC-BY license.

Clustering, Accelerators and OpenCL

Clustering

Given a data set, decompose it into groups of similar items
Parallel computing is useful for large data sets
Many possible algorithms
Will look at the K-means algorithm
Presentation follows F. Nielsen, “Introduction to HPC with MPI for Data Science”

K means

Consider grouping points in N-dimensional space
As an example, consider dimensions of passenger road vehicles
May wish to split them into cars, vans, buses, trains
What dimensions would be most useful?

K means

Having chosen the dimensions, need an algorithm
Have already decided on 4 categories
Typically do not know anything more about the data
For simplicity, assume there is at least one representative in each category

K means

a) Pick 4 cluster centroids randomly
i) Calculate the distance of each point to each centroid
ii) Put each point in the cluster of the centroid it is closest to
iii) Calculate the centroid of each cluster
iv) Repeat i-iii until the sum of distances from centroids stops decreasing
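The steps above can be sketched in plain Python. This is a minimal serial sketch, not code from the course: the names `kmeans` and `squared_distance`, the `seed` and `max_iterations` parameters, the choice of random data points as initial centroids, and the rule that an empty cluster keeps its old centroid are all assumptions made for illustration. It uses the square of the Euclidean distance.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Steps i) and ii): label each point with its closest centroid."""
    labels = [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]
    energy = sum(squared_distance(p, centroids[c]) for p, c in zip(points, labels))
    return labels, energy


def update(points, labels, centroids):
    """Step iii): move each centroid to the mean of its cluster."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, c in zip(points, labels) if c == j]
        if members:
            new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            new_centroids.append(old)  # assumption: empty cluster keeps its centroid
    return new_centroids


def kmeans(points, k, max_iterations=100, seed=0):
    """Return (centroids, labels, energy) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step a): random data points as centroids
    labels, energy = assign(points, centroids)
    for _ in range(max_iterations):
        new_centroids = update(points, labels, centroids)
        new_labels, new_energy = assign(points, new_centroids)
        if new_energy >= energy:  # step iv): stop when the sum stops decreasing
            break
        centroids, labels, energy = new_centroids, new_labels, new_energy
    return centroids, labels, energy


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(data, k=2))
```

For the vehicle example, `points` would hold one tuple of measurements per vehicle and `k` would be 4.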

K means

Typically use the square of the Euclidean distance
Can use other distances, depending on the application; some may be better than others
Method converges because at each iteration the “energy”, the sum of squares of Euclidean distances, never increases and is bounded below by zero (fixed point theorem)
Example at http://shiny.rstudio.com/gallery/kmeans-example.html
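The convergence argument can be written out as follows. The notation is an assumption introduced here, not taken from the slides: $C_k$ is cluster $k$, $\mu_k$ its centroid, $K$ the number of clusters and $E^{(t)}$ the energy after iteration $t$.

```latex
\begin{align*}
E &= \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2 \;\ge\; 0, \\
\text{assignment step:}\quad
  & x \in C_k \iff k = \arg\min_{j} \lVert x - \mu_j \rVert^2
  \quad\text{(no term of } E \text{ increases)}, \\
\text{update step:}\quad
  & \mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x \in C_k} x
  = \arg\min_{\mu} \sum_{x \in C_k} \lVert x - \mu \rVert^2, \\
\text{hence}\quad
  & 0 \le E^{(t+1)} \le E^{(t)} .
\end{align*}
```

A non-increasing sequence that is bounded below converges, so the energy settles, although in general only at a local minimum that depends on the initial centroids.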

K means

Parallelization
Can parallelize calculating distances from centroids, no communication
Can parallelize calculating centroids, reduction and broadcast communications needed
Reduction and broadcast also needed to check for convergence
Can use parallel IO
Should show good weak and strong scaling
Examples at http://rbigdata.github.io/documentation/pmclust/01-pmclust_pkmeans.html and https://github.com/RBigData/pmclust/blob/master/demo/ex_kmeans.r
Can later compare speed to own code
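The communication pattern described above can be illustrated with a small Python sketch. It is an assumption-laden stand-in, not the pmclust implementation: threads play the role of MPI ranks, each owning one chunk of the data, and the names `local_step` and `parallel_kmeans_step` are hypothetical. In an MPI code the "broadcast" and "reduction" comments would correspond to collective operations such as `MPI_Bcast` and `MPI_Allreduce`.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def local_step(chunk, centroids):
    """Work done by one worker on its own chunk; needs no communication.

    Returns per-cluster coordinate sums, per-cluster counts and local energy.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    energy = 0.0
    for p in chunk:
        distances = [squared_distance(p, c) for c in centroids]
        j = distances.index(min(distances))  # closest centroid
        counts[j] += 1
        energy += distances[j]
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts, energy


def parallel_kmeans_step(chunks, centroids):
    """One iteration: local work, reduction of partial results, new centroids."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # "Broadcast": every worker receives the same centroids.
        partials = list(pool.map(lambda chunk: local_step(chunk, centroids), chunks))
    k, dim = len(centroids), len(centroids[0])
    # "Reduction": add up the partial sums, counts and energies.
    total_sums = [
        [sum(part[0][j][d] for part in partials) for d in range(dim)]
        for j in range(k)
    ]
    total_counts = [sum(part[1][j] for part in partials) for j in range(k)]
    total_energy = sum(part[2] for part in partials)
    new_centroids = [
        tuple(s / total_counts[j] for s in total_sums[j])
        if total_counts[j]
        else centroids[j]  # assumption: empty cluster keeps its centroid
        for j in range(k)
    ]
    return new_centroids, total_energy


if __name__ == "__main__":
    chunks = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 2.0), (10.0, 12.0)]]
    print(parallel_kmeans_step(chunks, [(0.0, 0.0), (10.0, 10.0)]))
```

Only the small per-cluster sums, counts and energy cross worker boundaries, never the points themselves, which is why the method is expected to scale well. Python threads will not give a real speedup here; the sketch only shows which data has to be communicated.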

Accelerators

Heterogeneous architectures for high performance with low energy consumption
Many different kinds of hardware
Many programming models

Accelerators

Graphics Processing Units (GPU)
Field Programmable Gate Arrays (FPGA)
Xeon Phi
Massively Parallel Processor Array (MPPA)
Other specialized processing units, for example for encryption and signal processing

Nvidia GPUs

http://www.nvidia.com
https://en.wikipedia.org/wiki/Nvidia_Tesla

2 Tflops double precision performance
Programming APIs: CUDA, CUDA Fortran, OpenCL, OpenACC
For compute and graphics

AMD FirePro GPUs

http://www.amd.com
https://en.wikipedia.org/wiki/AMD_FirePro

2 Tflops double precision performance
Programming APIs: OpenCL, OpenACC, HCC and HSAIL
For compute and graphics

Intel Xeon Phi

http://www.intel.com
https://en.wikipedia.org/wiki/Xeon_Phi

1 Tflops double precision performance
Programming APIs: OpenCL (old versions), OpenMP, MPI, Cilk, Fortran, C
Latest versions can be self hosted
For compute

Parallella

http://www.parallella.org/
https://en.wikipedia.org/wiki/Adapteva

Programming APIs: OpenCL, C, pthreads
Embedded applications
Energy efficient computing, 50 single precision Gflops/Watt
Latest version has 1024 cores
https://www.parallella.org/blog/

Nvidia Tegra K1 and X1

http://www.nvidia.com/
https://en.wikipedia.org/wiki/Tegra#Tegra_K1

0.19 Tflops double precision
Programming APIs: CUDA, OpenCL

AMD APU

http://www.amd.com
https://en.wikipedia.org/wiki/AMD_Accelerated_Processing_Unit

0.700 Tflops single precision
Programming APIs: OpenCL, OpenACC, Fortran, C
For compute and graphics

Intel HD graphics

http://www.intel.com
https://en.wikipedia.org/wiki/Intel_HD_and_Iris_Graphics

Programming API: OpenCL
For compute and graphics

Others

Massively Parallel Processor Array (Kalray, Pezy)
Field Programmable Gate Array (Xilinx, Altera)

Accelerators

Pattern Matching Example
OpenCL parallelization of Naive, Knuth-Morris-Pratt and Boyer-Moore-Horspool pattern matching algorithms
Report at http://ds.cs.ut.ee/courses/course-files/DS-seminar-Andrii-Rozumnyi.pdf
Code at https://github.com/JaakTree/pattern_matching/tree/test
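As a reminder of what two of these algorithms do, here is a short serial Python sketch. It is not taken from the report or the linked repository; the function names are hypothetical, and Knuth-Morris-Pratt is left out for brevity. The comment about work-items describes one natural decomposition and is an assumption, not a description of the linked OpenCL code.

```python
def matches_at(text, pattern, i):
    """Check a single alignment; independent of all other alignments."""
    return text[i:i + len(pattern)] == pattern


def naive_search(text, pattern):
    """Naive matching: test every alignment of the pattern against the text.

    Each alignment is independent, so (as an assumption about how this could
    be parallelized) one OpenCL work-item could evaluate one value of i.
    """
    if not pattern:
        return []
    return [
        i
        for i in range(len(text) - len(pattern) + 1)
        if matches_at(text, pattern, i)
    ]


def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: skip ahead using a bad-character shift table."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift for a character: distance from its last occurrence in the pattern
    # (ignoring the final position) to the end of the pattern.
    shift = {ch: m - 1 - idx for idx, ch in enumerate(pattern[:-1])}
    hits = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        # Characters absent from the table allow a full shift of m.
        i += shift.get(text[i + m - 1], m)
    return hits


if __name__ == "__main__":
    print(naive_search("abracadabra", "abra"))
    print(horspool_search("abracadabra", "abra"))
```

Both functions return the start index of every occurrence, including overlapping ones, so they can be checked against each other.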

Possible project based on work by Handre Elias
http://kodu.ut.ee/~handre/

Possible projects related to machine learning
http://www.oi.ut.ee/en/studies/towards-robot-judges

References

Barlas, G. “Multicore and GPU Programming: An Integrated Approach” Morgan Kaufmann 2015

Nielsen, F. “Introduction to HPC with MPI for Data Science” Springer 2016

PBD R https://rbigdata.github.io/

Rozumnyi, A. http://ds.cs.ut.ee/courses/course-files/DS-seminar-Andrii-Rozumnyi.pdf

Elias, H. “Wave simulation in a computer game” Proc. European Seminar on Computing 2016 http://www.esco2016.femhub.com/media/ESCO2016_Book_of_Abstracts.pdf

Elias, H. “Simulation game in a web browser” http://comserv.cs.ut.ee/ati_thesis/datasheet.php?id=53653&year=2016
