TRANSCRIPT
University of Tartu, Institute of Computer Science
Basics of Grid and Cloud Computing (Gridi ja pilvetehnoloogia alused)
(http://courses.cs.ut.ee/2012/cloud)
2011/12 Spring
2 Practical Information
Lectures: Wed 10:15 Liivi 2 - 111
Lectures 1-8: Eero Vainikko – Grid Computing
Lectures 9-16: Satish Narayana Srirama – Clouds
Computer Classes:
• group 4: Mon 10:15 Liivi 2 - 205 ;
– Grid: Hardi Teder [email protected]
– Cloud: Reimo Rebane [email protected]
• group 3: Tue 8:15 Liivi 2 - 205; Pelle Jakovits [email protected]
• group 1: Thu 10:15 Liivi 2 - 205; Pelle Jakovits [email protected]
• group 2: Thu 14:15 Liivi 2 - 205; Riivo Talviste [email protected]
3 Practical Information
• Final grade:
– Active participation at lectures (ca 10%)
* Devising questions for an on-line study questionnaire within 24h after each lecture
– Solution of Computer Class exercises
– Cloud project
– Written exam (Wed, 30 May 2012) 50%
NB! It is crucial to keep the deadlines for all home assignments!
4 Syllabus
Lectures (1-8):
• Introduction to the subject (HPC history, supercomputers, clusters, Grid; examples, visions, projects...)
• Grid architecture
• Grid Security concepts (PKI, Authorisation, CA, etc.)
• Globus Toolkit (what a virtual organisation is, how to achieve it using GT, etc.), OGSA, WSRF
• Other Grids (UNICORE, LCG2, SunGE, ...)
• Condor, OpenPBS, Sun GE, LFS.
• NorduGrid, BalticGrid, Estonian Grid.
• Desktop-Grids (MiG, F2F)
• Examples of different grid solutions in the world
5 Syllabus
Computer Classes (preliminary) schedule:
Exercises on Grid computing (Hardi Teder, Pelle Jakovits, Riivo Talviste)
• 6-9.02 Python intro
• 13-16.02 Hello, Grid! Grid information systems, submitting first grid job
• 20-23.02 Grid security. Breaking RSA code
• 27.02-1.03 Data management on grid
• 5-8.03 Job management on grid
• 12-15.03 Grid user interfaces and tools. POV-Ray rendering
• 19-22.03 Grids and clouds, the road ahead.
• 26-29.03 TBA
6 Syllabus
Cloud Lectures (9-16):
• by: Dr Satish Narayana Srirama
Exercises on Cloud computing (Pelle Jakovits, Riivo Talviste, Reimo Rebane)
• 5-9.04 Amazon EC2, Amazon S3, Elastic Fox, Google AppEngine
• 12-16.04 Eucalyptus, SciCloud, Auto Scaling & special features in EC2
• 19-23.04 Hadoop
• 26-30.04 Hadoop continued & Selecting the mini project topic
• 3-7.05
• 10-14.05 Preliminary results of project
• 17-21.05
• 24-28.05 Project delivery
7 Literature
1. Fran Berman, Geoffrey C. Fox and Anthony J. G. Hey, Grid Computing: Making the Global Infrastructure a Reality, John Wiley & Sons, 2003 (Grid Computing (http://www.grid2002.org/)).
2. Ian Foster and Carl Kesselman (eds.), The Grid: Blueprint for a New Computing Infrastructure, 2nd edition, Morgan Kaufmann Publishers, 2004.
3. Michael Di Stefano, Distributed Data Management for Grid Computing, John Wiley & Sons, 2005.
4. F. Travostino, J. Mambretti, G. Karmous-Edwards (eds.), Grid Networks: Enabling Grids with Advanced Communication Technology, John Wiley & Sons, 2006.
5. Vladimir Silva, Grid Computing For Developers, Charles River Media, 2006.
6. A. Chakrabarti, Grid Computing Security, Springer 2007.
8 Literature
7. R. Prodan, T. Fahringer, Grid Computing: Experiment Management, Tool Integration, and Scientific Workflows, Springer, 2007.
8. Yang Xiao, Security in Distributed, Grid, Mobile, and Pervasive Computing, Auerbach Publications, 2007.
9. Introduction to Grid Computing (http://www.redbooks.ibm.com/redbooks/pdfs/sg246778.pdf),
10. Open Grid Forum (http://www.ogf.org).
11. The Globus Alliance (http://www.globus.org/).
12. Nordugrid (http://www.nordugrid.org/).
13. Estonian Grid (http://grid.eenet.ee/).
14. Baltic Grid (http://www.balticgrid.org).
9 Literature
Python:
1. Jeffrey Elkner, Allen B. Downey, and Chris Meyers, How to Think Like a Computer Scientist: Learning with Python, 2nd edition, book homepage (http://openbookproject.net/thinkcs/python/english2e/).
2. Hans Petter Langtangen, A Primer on Scientific Programming with Python, Springer, 2009, book webpage (http://vefur.simula.no/intro-programming/).
3. Hans Petter Langtangen, Python Scripting for Computational Science, 3rd edition, Springer, 2008, book homepage (http://folk.uio.no/hpl/scripting/).
4. Neeme Kahusk, Sissejuhatus Pythonisse (Introduction to Python, in Estonian) (http://www.cl.ut.ee/inimesed/nkahusk/sissejuhatus-pythonisse/)
5. Python Documentation (http://www.python.org/doc/); to start, see the Python Tutorial (http://docs.python.org/tut/tut.html)
10 Literature
6. Mark Lutz and David Ascher, Learning Python, O’Reilly Media Inc. 2004,
7. Mark Lutz, Learning Python (4th edition), O’Reilly Media, Inc. (and SafariBooks), 2009
8. Travis E. Oliphant, Guide to NumPy (http://www.tramy.us), TrelgolPublishing 2006.
Some lecture slides:
1. Kent Engström, Python Introduction (slides) (http://www.nsc.liu.se/ngssc-grid/python-engstrom.pdf), NGSSC course in grid computing, January 10-18, 2005.
2. Chris Meers, An introduction to Python, with application to scientific computing (slides (http://hughm.cs.ukzn.ac.za/~murrellh/bio/lit/pysci.pdf)), Cornell Theory Center.
3. Introduction to Scientific Computing with Python (slides (http://www.physics.rutgers.edu/grad/509/python1.pdf))
11 Past and future of the course; related courses
About this course:
• First time: Spring 2005, as Basics of Grid and Cluster Computing
• Since 2009 (Spring): Basics of Grid and Cloud Computing
– Second part: Basics of Cloud Computing (3 eap)
Other related courses:
MTAT.08.022 Parallel Programming Languages (6 eap) (2011)
• Distributed Systems Seminar Wed 14:15 (Fri 14:15) Liivi 2 - 315
– MSc students: MTAT.08.024 12eap (3+3+3+3)
– Bachelor students: MTAT.08.014 8eap (2+2+2+2)
– PhD students: Distributed Systems Research Seminar MTAT.08.019 20eap (5+5+5+5)
12 Past and future of the course; related courses
• Parallel Computing: MTAT.08.007 6eap Autumn 2012
• Scientific Computing: MTAT.08.010 6eap Spring 2014
• Introduction to Scientific Computing: MTAT.08.025 3eap April-May 2012(University-wide elective course for PhD students)
13 Introduction 1.1 Driving forces of computational science
1 Introduction
1.1 Driving forces of computational science
High Performance Computing (HPC)
• Environment simulation; some examples:
– Climate changes
– Prediction of the amount of fish in Norwegian fjords
– Ice glacier flow simulation
• Solving fluid dynamics problems
– Weather predictions
– Design of hypersonic airplanes
– Design of more efficient cars
14 Introduction 1.1 Driving forces of computational science
– Extremely quiet submarines
– Design of efficient and safe nuclear power stations
* solution bisection, turbulence
• Simulation of nuclear explosions
• Satellite data analysis
• Data analysis of DNA-sequences
• Simulation of 3D protein molecules
• Simulation of global economic processes
• etc. in more and more fields
15 Introduction 1.1 Driving forces of computational science
Common to all examples: the need for a larger than usual set of resources:
• CPU cycles
• data volume
• special devices producing data
=> parallel processing
=> questions:
• how to store data?
– Data repositories
– Data repository services
• how to move data?
– Networks
– Internet and private networks
• which algorithms can be used?
– Theory and practice of parallel algorithms
16 Introduction 1.2 History of HPC
1.2 History of HPC
Pre-history (arrays of human computers):
1929 – parallelisation of weather predictions
A bit similar: ≈1940 – Russian war defense – parallel computing (tank T40 calculations)
Some experts' predictions:
1947 – computer engineer Howard Aiken: the USA will need at most 6 computers in the future!
1977 – Seymour Cray: the computer Cray-1 will attract potentially only ca 100 clients
Reality: how many Cray-1s' worth of computing power do you carry with you today?
17 Introduction 1.2 History of HPC
Gordon E. Moore's law (1965): the number of switches doubles every second year.
1975 refinement of the above: the number of switches (the performance) of a CPU doubles every 18 months.
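The 1975 formulation turns into a quick back-of-the-envelope calculation. The sketch below is illustrative only; the starting transistor count (2,300, the Intel 4004 of 1971) is an assumption chosen for the example, not a figure from these slides:

```python
# Moore's law sketch: a quantity that doubles every 18 months.
def moores_law(start, years, doubling_months=18):
    """Return the projected count after `years` of exponential growth."""
    doublings = years * 12 / doubling_months
    return start * 2 ** doublings

# 40 years of doubling from the Intel 4004's 2,300 transistors (1971):
print(f"{moores_law(2300, 40):.3e}")
```

Note how the doubling period dominates: shortening it from 24 to 18 months changes the 40-year projection by several orders of magnitude.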
18 Introduction 1.2 History of HPC
first processors: 10^2 – 100 Flops
modern desktop computers: 10^9 – Gigaflops (GFlops)
modern supercomputers: 10^12 – Teraflops (TFlops)
about to be achieved soon: 10^15 – Petaflops (PFlops)
next step: 10^18 – Exaflops (EFlops)
History of Computers (http://smashinghub.com/history-of-computers.htm)
Supercomputers → Clusters → Grids → Clouds
19 Introduction 1.3 Data Challenges
1.3 Data Challenges
How large is 1 petabyte?
• Some high resolution pictures about each person on the Earth
• (5 years ago, an example of a petabyte storage device:
– a train wagon full of high resolution magnetic tapes
– about 3 years to read through with a fast tape-reader)
• Today: the largest tape drives store 5TB of unpacked data (StorageTek T10000) => 1PB takes ca 205 tapes. Reading one tape takes ca 6.1h => 52 days to read them all. These tapes would pile up into a tower about 5.2m in height, weighing less than 60 kg; in volume, ca 40% of it would fit into hand-baggage on a plane.
* Largest commercial databases today ≈ a few terabytes (10^12 bytes)
Science's needs in the near future (an example): particle physics experiments produce around 10 Petabytes a year
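The tape arithmetic above can be checked in a few lines of Python. It comes out as stated if one assumes binary units, i.e. 1 PB = 2^50 bytes and 5 TB = 5·2^40 bytes (an assumption of this sketch; with decimal units the count would be 200 tapes):

```python
import math

PETABYTE = 2**50            # 1 PB in bytes (binary units)
TAPE_CAPACITY = 5 * 2**40   # StorageTek T10000: 5 TB unpacked
HOURS_PER_TAPE = 6.1        # time to read one tape sequentially

tapes = math.ceil(PETABYTE / TAPE_CAPACITY)
days = tapes * HOURS_PER_TAPE / 24

print(tapes)        # -> 205
print(round(days))  # -> 52
```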
20 Introduction 1.3 Data Challenges
Prediction for the needs: around the year 2015 there will be a need for Exabyte (10^18 bytes) storage databanks and Petaflops processing power
How large is an Exabyte?
All the information generated in 1999: 2 Exabytes
All words ever spoken by all people: 5 Exabytes!
One of the most challenging problems – data updates!
21 Introduction 1.4 Classification of Parallel Computers
1.4 Classification of Parallel Computers
• Architecture
Flynn’s classification
                       Single Data stream   Multiple Data stream
Single Instruction           SISD                  SIMD
Multiple Instruction        (MISD)                 MIMD
Abbreviations:
S - Single
M - Multiple
I - Instruction
D - Data
For example: Single Instruction Multiple Data stream (SIMD)
– Single processor computer
– Multicore processor
– distributed system
– shared memory system
• Network
– topology
* ring, array, hypercube...
– properties
* bandwidth, latency
• Memory access
– shared, distributed, hybrid
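The stream distinctions in Flynn's classification above can be sketched in plain Python. This is a conceptual illustration only; real SIMD hardware (or a library such as NumPy) would apply the single instruction to the whole data stream in lockstep rather than in an interpreted loop:

```python
# SISD: one instruction stream working on one datum at a time, sequentially.
def sisd(data):
    out = []
    for x in data:
        out.append(x * 2.0)     # same instruction, one element per step
    return out

# SIMD (conceptually): the SAME instruction over a whole data stream;
# here map() stands in for a vector unit.
def simd(data):
    return list(map(lambda x: x * 2.0, data))

# MIMD (conceptually): DIFFERENT instructions on different data,
# as independent processors in a cluster would run them.
def mimd(instructions, data):
    return [f(x) for f, x in zip(instructions, data)]

stream = [1.0, 2.0, 3.0]
assert sisd(stream) == simd(stream) == [2.0, 4.0, 6.0]
print(mimd([lambda x: x + 1, lambda x: x * x, abs], [1.0, 2.0, -3.0]))
# -> [2.0, 4.0, 3.0]
```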
22 Introduction 1.4 Classification of Parallel Computers
• operating system
– UNIX,
– LINUX,
– (OpenMosix)
– WIN*
• Algorithm realisation
– using only hardware modules
– mixed modules (hardware and software)
• Control type
– synchronous
– dataflow-driven
– asynchronous
• scope
– supercomputing
– distributed computing
– real time systems
– mobile systems
– grid and cloud computing
– etc
23 Introduction 1.5 Supercomputers
But it is impossible to ignore (implicit or explicit) parallelism in a computer or a set of computers
1.5 Supercomputers
• Last word in computer hardware
– one step ahead in technology
– Note: today’s supercomputers are tomorrow’s commodity systems!
• expensive
• shipped with OS
• Supercomputers→ Clusters
24 Introduction 1.6 Computer Clusters
1.6 Computer Clusters
Workstation groups connected with a LAN, running uniform software.
Example: Linux clusters (Beowulf clusters); the University of Tartu HPC cluster aurumasin
Special network solutions
• Myrinet (Clos-networks)
• Scali
• *-Ethernet
• Infiniband
25 Introduction 1.6 Computer Clusters
Top 500
Top500 (http://www.top500.org)
• Also, http://www.bbc.co.uk/news/10187248