bsc-microsoft research center overview
Post on 05-Jul-2015
684 Views
Preview:
TRANSCRIPT
BSC-Microsoft ResearchCentre Overview
Osman Unsal
Barcelona 26.March.2010
BSC Departments
• Computer Science• Computer Architecture
• Life Sciences• Earth Sciences• Computer Applications
• Mission– Investigate, develop and manage technology to
facilitate the advancement of science.
• Objectives– Operate national supercomputing facility
– R&D in Supercomputing and Computer Architecture.
– Collaborate in R&D e-Science
• Consortium– the Spanish Government (MEC)– the Catalonian Government (DURSI)– the Technical University of Catalonia (UPC)
Barcelona Supercomputing Center
Location
Private/Public Partnership in Research• Two main models for company/university lablets
– Closed – Open
Closed
Open
• Finding the appropriate balance is key for successclosed open
?Students highly motivatedGenerated IP is private
CompanyUniversity
Higher visibility for researchers?Requires more effort from company
CompanyUniversity
BSCMSRC experience (I)
• Open model– No patents, public IP, papers and open source main focus
• Making positive impact in the community• Contributing to standard setting bodies
– Participation of Microsoft Cambridge Senior Researchers– Similar research agenda with parallel computing centers in US
• Berkeley• Illinois• Rice• Stanford
Combining efforts is absolutely necessary
BSCMSRC experience (II)
• Internships/stays:– UPC students exposed to Microsoft Research through internships– Future extension: longer stays of Cambridge researchers in
Barcelona
• Collaboration between BSCMSRC and other Microsoft institutes– Cosbi in planning phase
• BSC coordinating 9 partner, 5M Euro EC VELOX project – Microsoft Cambridge catalyst in introducing potential partners– Microsoft observers to project (as is Intel and SUN)
BSCMSR Centre: general overview
• Microsoft-BSC joint project– Combining research expertise
• Computer Architectures for Parallel Paradigms at BSC: computer architecture
• Microsoft Research Cambridge: programming systems– Kickoff meeting in April 2006– Total Effort:
• 2 Senior BSC researchers, 23 PhD students– Support and part-time involvement of senior researchers and
faculty from UPC
• BSCMSRC inaugurated January 2008
BSCMSR Centre: research approach
• Processor design historically bottom-up:– Build the chip and let the software guys sort it out– Difficult to defend in the era of multi- and many-core
• We prefer a top-down approach:– Let software developments guide hardware design– “Coger el toro por los cuernos”– Utilize feedback from the OS, compiler, programmer language
experts about the features they would like to see in future processors
The motivation“It is better for Intel to get involved in this now so when we get to the point of having 10s and 100s of cores we will have the answers.
There is a lot of architecture work to do to release the potential, and we will not bring these products to market until we have good solutions to the programming problem”
Justin Rattner Intel CTO
“Now, the grains inside these machines more and more will be multi-core type devices, and so the idea of parallelization won't just be at the individual chip level, even inside that chip we need to explore new techniques like transactional memorythat will allow us to get the full benefit of all those transistors and map that into higher and higher performance.”Bill Gates, Supercomputing 05 keynote
Marenostrum
Most beautiful supercomputerFortune magazine, Sept. 2006
#1 in Europe, #5 in the World
100's of TeraFlopswith general purpose Linux supercluster of commodity PowerPC-based Blade Servers
Source: Intel
•80 processors in a die of 300 square mm.•Terabytes per second of memory bandwidth•Note: The barrier of the Teraflops was obtained by Intel in 1991 using 10.000 Pentium Pro processors contained in more than 85 cabinets occupying 200 square meters •This will be possible in 5 years from now
• Top-down architecture, include:– Application– Debugging– Programming models– Programming languages– Compilers– Operating Systems– Runtime environment
• As design drivers
TOP-DOWN COMPUTER ARCHITECTURE
BSC Contributors
Full-timeSerbiaPhD StudentVladimir Gajinov
Full-TimeTurkeyPhD StudentGokcen Kestor
Full-TimeIranPhD StudentAzam Seyedi
Full-TimeSerbiaPhD StudentNikola Markovic
Full-timeTurkey/USADoctorIbrahim Hur
Full-timeArgentinaDoctorAdrian Cristal
Full-timeTurkeyDoctorOsman Unsal
Part-timeSpainProfessorEduard Ayguade
Full-timeIndiaPhD StudentSutirtha Sanyal
Full-timeTurkeyPhD StudentNehir Sonmez
Full-timeSerbiaPhD StudentSasa Tomic
Full-timeBulgariaPhD StudentFerad Zyulkyarov
Full-timeSerbiaPhD StudentSrdjan Stipic
Part-timeSpainPhD StudentEnrique Vallejo
Part-timeSpainProfessorMateo Valero
Full-timeSerbiaPhD StudentMilovan Djuric
Full-timeIrelandPhD StudentTimothy Hayes
Full-timeSerbiaPhD StudentIvan Ratkovic
Full-timeSerbiaPhD StudentNikola Bezanic
Full-timeSerbiaPhD StudentMilan Stanic
Full-timeSpainPh.D. StudentOscar Palomar
Full-timeGreeceM.S. StudentVasilis Karakostas
Full-TimeTurkeyPhD StudentGulay Yalcin
Full-TimeSerbiaPhD StudentVesna Smiljovic
Full-TimeSerbiaPhD StudentNebosja Miletic
InternSpainM.S. StudentOriol Arcas
Full-TimeSpainPhD StudentAdria Armejach
InternSpainM.S. StudentOtto Pflucker
Microsoft- BSC Project: A quick snapshot
• Project started in April 2006
• Two-year initial duration
• Initial topic:Transactional Memory
• BSC – Microsoft Research Centre inaugurated January 2008
Transactions vs. Locks
• Non-blocking synchronization
• Deadlock free• Composable• Easy programmable• Efficiency of fine
grain locks
• Blocking synchronization
• Deadlock risk• Non-composable• Coarse grain locks
limit TLP• Fine grain locks are
difficult to program
Transactions Locks
B1
Slide 13
B1 Osman addedBSC-CNS; 02/09/2007
What is Transactional Memory (TM) ?
• TM optimistically runs transactions in parallel, in the hope that they do not perform conflicting memory accesses.
• If the transactions do not conflict then the optimism has paid off.
• If transactions do attempt conflicting accesses, then the TM must delay or abort the work of one or the other.
• The partial effects of aborted transactions are "rolled back" before they are re-executed .
Transaction Execution• The TM system lets different threads to
execute the atomic regions speculatively.• The TM system guarantees:
– Atomicity – all tentative memory changes become visible to the other threads simultaneously at the time when a transaction commits.
– Isolation – while transaction executes the tentative memory updates are not visible by the other threads.
Readset / Writeset
• Readset: The set of all the distict memorylocations read by a transaction
• Writeset: The set of all the distinct memorylocations written by a transaction
atomic { atomic {… …read (a); read (a);read (b); write (c);… …write (a); }
}
Versioning and Conflictresolution
• Basic implementation requirements– Data versioning– Conflict detection & resolution
• Conflicts happen if– One transaction (attemps to) reads a data item while another
one tries to write to the item– At least two transactions (attempt to) write a data item
• If conflict is detected one of the transactions could be aborted
atomic { atomic {… …read (a); read (a);read (b); write (c);… …write (a); }
}
Conflict Detection andResolution
• Detect and handle conflicts between transaction– Read-Write and (often) Write-Write conflicts– For detection, a transactions tracks its read-set and write-set
1. Eager detection •Check for conflicts during loads or stores
HW: check through coherence lookupsSW: checks through locks and/or version numbers
•Use contention manager to decide to stall or abortVarious priority policies to handle common case fast
2.Lazy detection•Detect conflicts when a transaction attempts to commit
HW: write-set of committing transaction compared to read-set of others–Committing transaction succeeds; others may abort
SW: validate write-set and read-set using locks and version numbers
atomic { atomic {… …read (a); read (a);read (b); write (c);… …write (a); }
}
Versioning• Manage uncommited(new) and commited(old) versions of data for
concurrent transactions
1.Eager (undo-log based)•Update memory location directly; maintain undo info in a log+Faster commit, direct reads (SW) –Slower aborts, no fault tolerance, weak atomicity (SW)
2.Lazy (write-buffer based)•Buffer writes until commit; update memory location on commit +Faster abort, fault tolerance, strong atomicity (SW)–Slower commits, indirect reads (SW)
atomic { atomic {… …read (a); read (a);read (b); write (c);… …write (a); }
}
App
licat
ions
Architecture
Transactional Memory
STM HTM
Func
tiona
lIm
pera
tive
Pro
gram
min
gm
odel
Transactional Memory• Advent of multicore chips• Parallel programmingcommonplace
• Simplicity vs. wide applicability• Full performance not theinitial driving force, but thefinal objective
• Advent of multicore chips• Parallel programmingcommonplace
• Simplicity vs. wide applicability• Full performance not theinitial driving force, but thefinal objective
MSRC
BSC
App
licat
ions
Architecture
Transactional Memory
STM HTM
Func
tiona
lIm
pera
tive
Pro
gram
min
gm
odel
Research onTM•Haskell Transactional Benchmark•RMS applications•Quake-TM•Atomic-Quake•Wormbench
•Haskell Transactional Benchmark•RMS applications•Quake-TM•Atomic-Quake•Wormbench
MSRC
BSC
App
licat
ions
Architecture
Transactional Memory
STM HTM
Func
tiona
lIm
pera
tive
Pro
gram
min
gm
odel
Research on TM•Haskell STM•OpenMP + TM•Haskell STM•OpenMP + TM
MSRC
BSC
App
licat
ions
Architecture
Transactional Memory
STM HTM
Func
tiona
lIm
pera
tive
Pro
gram
min
gm
odel
Research on TM•HW acceleration of STM •Simulators•Dynamic filtering •Eazy HTM
•HW acceleration of STM •Simulators•Dynamic filtering •Eazy HTM
MSRC
BSC
Selected Publications on TMF. Zyulkyarov, T. Harris, O. Unsal, A. Cristal, M. Valero,” Debugging Programs that use Atomic Blocks and
Transactional Memory “, 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10), January 2010
S. Tomić, C. Perfumo, C. Kulkarni, A. Armejach, A. Cristal, O. Unsal, T. Harris, M. Valero, “EazyHTM, Eager-Lazy Hardware Transactional Memory”, 42nd International Symposium on Microarchitecture, Micro'09, December 2009
V. Gajinov, F. Zyulkyarov, A. Cristal, O. Unsal, T. Harris, M. Valero, “QuakeTM: Parallelizing a Complex Serial Application Using Transactional Memory”, 23rd International Conference on Supercomputing (ICS 2009), June 2009.
N. Sonmez, A. Cristal, T. Harris, O. Unsal, M. Valero, “Taking the heat off transactions: dynamic selection of pessimistic concurrency control”, 23th International Parallel and Distributed Systems Symposium, (IPDPS 2009), May 2009,
F. Zyulkyarov , V. Gajinov, O. Unsal , A. Cristal , E. Ayguade, T. Harris , M. Valero, “Atomic Quake: Use Case of Transactional Memory in an Interactive Multiplayer Game Server”, 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2009.
C. Perfumo, N. Sonmez, S. Stipic, O. Unsal, A. Cristal, T. Harris, M. Valero, “The Limits of Software Transactional Memory (STM): Dissecting Haskell STM Applications on a Many-Core Environment”, ACM International Conference on Computing Frontiers, May 2008
E. Vallejo, T. Harris, A. Cristal, O. Unsal, M. Valero, “Hybrid Transactional Memory to Accelerate Safe Lock-based Transactions”, Third ACM SIGPLAN Workshop on Transactional Computing TRANSACT, February 2008
M. Milovanović, R. Ferrer, O. Unsal, A. Cristal, X. Martorell, E. Ayguadé, J. Labarta and M. Valero, “Transactional Memory and OpenMP”, International Journal of Parallel Programming.
C. Perfumo, N. Sonmez, O. Unsal, A. Cristal, M. Valero, T. Harris, “Dissecting Transactional Executions in Haskell”, Second ACM Workshop on Transactional Computing TRANSACT, August 2007
T. Harris, A. Cristal, O. Unsal, E. Ayguade, F. Gagliardi, B. Smith, M. Valero, “Transactional Memory: An Overview”, IEEE Micro, May-June 2007
FP7 TM-Project VELOX• Goal: Unified TM system stack, Duration: 2008-2011, Budget: 5M Euros• BSC involvement initiated through BSC-MSR Centre • Project participants:
– BSC– EPFL– University of Neuchatel– Technical University of Dresden– Tel Aviv University– Chalmers Institute of Technology– AMD– Red-Hat– VirtualLogix
• Project observers:– Microsoft– Intel– SUN
Empowering TM across the system stackTM-based Applications and Benchmarks
(Legacy software migration, TM-aware development, etc.)TM-based Applications and Benchmarks
(Legacy software migration, TM-aware development, etc.)
Application Environments(Application servers, event processing engine, etc.)
Application Environments(Application servers, event processing engine, etc.)
Programming Language(Java/C/C++ integration, exception handling, etc.)
Programming Language(Java/C/C++ integration, exception handling, etc.)
Compiler(Code generation, bytecode instrumentation, etc.)
Compiler(Code generation, bytecode instrumentation, etc.)
TM Libraries(Word-based, object-based, PLS, TBB, etc.)
TM Libraries(Word-based, object-based, PLS, TBB, etc.)
Native Libraries(System libraries)
Native Libraries(System libraries)
Operating System(Low-level TM support)Operating System(Low-level TM support)
TM Hardware and Adaptation Layer (Multi-core CPUs, HW+SW transactions, etc.)TM Hardware and Adaptation Layer (Multi-core CPUs, HW+SW transactions, etc.)
1
2
6
5
C
7
4
3 Inte
grat
ion
(Ada
ptiv
e TM
, hy
brid
app
roac
hes,
etc
.)In
tegr
atio
n(A
dapt
ive
TM,
hybr
id a
ppro
ache
s, e
tc.)
Theo
ry(C
onte
ntio
n, is
olat
ion,
alg
orit
hms,
etc
.)Th
eory
(Con
tent
ion,
isol
atio
n, a
lgor
ithm
s, e
tc.)
A B
http://www.velox-project.eu/
Thank you!
http://www.bsc.eshttp://www.bscmsrc.eu
http://www.velox-project.eu
top related