K-computer and Supercomputing Projects in Japan

Makoto Taiji
Computational Biology Research Core, RIKEN Planning Office for the Center for Computational and Quantitative Life Science
& Processor Research Team, RIKEN Advanced Institute for Computational Science

1. K-computer and Supercomputing Projects in Japan
Makoto Taiji
Computational Biology Research Core, RIKEN Planning Office for the Center for Computational and Quantitative Life Science
& Processor Research Team, RIKEN Advanced Institute for Computational Science

Agenda
- Advanced Institute for Computational Science
- High Performance Computing Infrastructure
- My own perspective in future HPC, and MDGRAPE-4 (in short)

2. My Backgrounds
- Physics
- Special-purpose computers for scientific simulations (1986~)
  - Monte Carlo simulations of spin systems (1986, m-TIS I)
  - FPGA-based reconfigurable machine (1990, m-TIS II)
  - Gravitational N-body problems (1992~96, GRAPE-4,5)
  - Molecular Dynamics simulations (1994~, MD-GRAPE, MDM, MDGRAPE-3,4)
  - Dense Matrix Calculation, quasi-general-purpose machine (MACE, 2000)
- Ultrafast laser spectroscopy (1987~92)
  - Conjugated Polymers
  - Rhodopsin and Bacteriorhodopsin
- Learning process as dynamical systems, multi-agent dynamics (1996~2002)
- Physical Random Number Generator (1997~2004)

3. World situation of HPC (Top 500)
[Figure: Top 500 share by country]
- Share of Japan: Down to 6th position

4. Next-Generation Supercomputer Project
- National project to develop a leading general-purpose supercomputer in Japan
  - Not for single purpose (cf. Earth Simulator)
- Location: Kobe Port Island
- Developer: Fujitsu
- Linpack: 10 PetaFLOPS
- Partial operation: Spring 2011
- Full service: Autumn 2012

5. K computer system (CG)
[Figure: computer graphics image of the K computer system]

6. Location of K computer
[Figure: aerial photo of Kobe (Photo: June, 2006). Labels: Mt. Rokko, Sannomiya, Port Island, Kobe Sky Bridge, Portliner, Shinkansen Line, Shin-Kobe Station, Ashiya, Kobe Airport, Kobe Medical Industry Development Project Core Facilities, K-computer & Advanced Institute for Computational Sciences, To Osaka, To Akashi / Awaji Island]
- About 5 km from Sannomiya
- 12 min. by Portliner

7. RIKEN Advanced Institute for Computational Science
- National Center to cover wide fields of computational science and engineering

8. Formation of Central Hub in Kobe
[Figure: organization diagram. Labels:]
- Advanced Institute for Computational Science (Director: Dr. Kimihiko Hirao)
  - Operation and sophistication of the supercomputer
  - Interdisciplinary research: Computational Sciences, Computer Science
- Registered Organization
  - Selection of applications
  - User Support
- Use
  - Strategic Use: Strategic Regions
  - Public Use: Academia, Industry

9. RIKEN Advanced Institute for Computational Science
[Figure labels: Computational Science Research, Computer Science Research]

10. Grand Challenge Applications

Next-Generation Integrated Nano-Science Simulation Software (2006-2011)
- To create next-generation nano-materials (new semiconductor materials, etc.) by integrating theories (such as quantum chemistry, statistical dynamics and solid electron theory) and simulation techniques in the fields of new-generation information functions/materials, nano-biomaterials, and energy
- Base site: Institute for Molecular Science
[Figure labels, nano-science diagram:]
- Next-Generation Energy: solar energy fixation, fuel alcohol, fuel cells, electric energy storage
- Next-Generation Nano Biomolecules: polio virus, liposome, capsulation, protein folding, water molecules inside lysozyme cavity; medicines, new drugs, and DDS
- Next-Generation Information Function Materials: one-dimensional crystal of silicon, orbiton (orbital waves), ferromagnetic half-metals, optical switch (off/on, light), nonlinear optical devices, nano quantum devices, spin electronics, ultra high-density storage devices, integrated electronic devices
- Mesoscale structure of Nafion membrane (Nafion, water, 15 nm), self-assembly
- Doping of fullerene and carbon nanotubes; self-organized magnetic nanodots (5 nm); electronic conduction in integrated systems
- Domains: electrons (electron theory of solids, quantum chemistry), electrons and molecules (molecular dynamics), condensed matter, semi-macroscopic molecular assembly, integrated system

Next-Generation Integrated Life-Science Simulation Software (2006-2012)
- To provide new tools for breakthroughs against various problems in life science by means of petaflops-class simulation technology, leading to comprehensive understanding of biological phenomena and the development of new drugs/medical devices and diagnostic/therapeutic methods
- Base site: RIKEN Wako Institute
[Figure labels, life-science diagram:]
- Size axis from macro to micro: whole body, organs, tissues, cells, proteins/DNA (tick labels 10^0, 10^-1, 10^-3~-2, 10^-5~-4, 10^-8~-6)
- Microscopic approach (molecular scale, cellular scale): MD / first principle / quantum chemistry simulations; achievement of chemical reactions, molecular network analysis, protein structural analysis, drug response analysis
- Macroscopic approach (organ and body scale): continuous entity simulations; cardiovascular system, vascular system modeling, skeleton model; fluids, heat, structures
- Toward therapeutic technology: High Intensity Focused Ultrasound, drug development, tailor-made medicine, Drug Delivery System, regenerative medicine, surgical procedures, catheters, micromachines, hyperthermia
- Labels of unclear placement: viruses, anticancer drugs, protein control, nano processes for DDS, light, 27 nm, 46 nm

[Figure label, continued from the life-science diagram: Brain function]

11. Appointment of Strategic Regions
- Computational resources and budget will be allocated for the following regions
- A strategic organization will organize the research
  - Region 1. Foundations for predictive life sciences, medical care, and drug design
  - Region 2. Innovation of new materials and new energies
  - Region 3. Prediction of global change for disaster prevention and reduction
  - Region 4. Next-generation manufacturing
  - Region 5. Origin and structure of matter and the universe
- 2009-2010: Feasibility Studies
- 2011-2015: Strategic Researches

12. Schedule of Project
[Figure: schedule chart, FY2006 to FY2012. Rows and phases:]
- System (processing unit, front-end unit (total system software), shared file system): conceptual design; basic design; detailed design; prototype and evaluation; production and evaluation; production, installation, and adjustment; tuning and improvement
- Applications (Next-Generation Integrated Nanoscience Simulation, Next-Generation Integrated Life Simulation): development, production, and evaluation; verification
- Buildings (computer building, research building): design; construction
- Strategic Researches: feasibility studies; preparatory researches; research promotion
- Partial operation within FY2010; full operation starts from FY2012

13. Features of K computer
- K means 10^16
- High Performance: Linpack 10 PFLOPS
- Massive Parallelization: > 80,000 Processors, > 640,000 Cores
- SPARC64 VIIIfx: Processor designed for HPC
  - VISIMPACT / HPC-ACE extensions
- 16GB / node, 2GB / core
- ~20MW

14. K-Computer System
- Number of nodes: > 80,000
- Number of Processors: > 80,000
- Number of Cores: > 640,000
- Peak Performance: > 10 PFLOPS
- Memory Capacity: > 1PB (16GB/node)
- Network: Tofu interconnect (6-dim. Torus)
  - User view: 3D-Torus
  - Bandwidth: 5GB/s bidirectional for each of the six directions
  - 4 Simultaneous Communication
  - Bisection Bandwidth: > 30TB/s (bidirectional, nominal peak)
[Figure: CPU block diagram. CPU: 128 GFLOPS (8 cores); each core: SIMD (4FMA), 16 GFLOPS; L2$: 5MB; 64GB/s; MEM: 16GB]
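The headline figures on the two slides above follow from the per-core numbers by simple multiplication. The sketch below only reproduces that arithmetic; the function names are illustrative, the node count is the slide's lower bound ("> 80,000"), and one CPU per node is inferred from the equal node and processor counts.

```python
# Figures quoted on the slides; "> 80,000" is treated as a lower bound.
CORES_PER_CPU = 8          # slide: "CPU: 128GFLOPS (8 Core)"
GFLOPS_PER_CORE = 16.0     # slide: "16GFLOPS" per core
NODES = 80_000             # slide: "> 80,000" nodes, assumed one CPU per node
MEM_PER_NODE_GB = 16       # slide: "16GB / node"


def cpu_peak_gflops(cores=CORES_PER_CPU, per_core=GFLOPS_PER_CORE):
    """Peak of one CPU: cores times per-core peak."""
    return cores * per_core


def system_peak_pflops(nodes=NODES):
    """Lower bound on system peak, in PFLOPS (1 PFLOPS = 1e6 GFLOPS)."""
    return nodes * cpu_peak_gflops() / 1e6


def system_memory_pb(nodes=NODES, per_node_gb=MEM_PER_NODE_GB):
    """Lower bound on total memory, in decimal PB (1 PB = 1e6 GB)."""
    return nodes * per_node_gb / 1e6


def total_cores(nodes=NODES, cores=CORES_PER_CPU):
    return nodes * cores


if __name__ == "__main__":
    print("CPU peak   :", cpu_peak_gflops(), "GFLOPS")
    print("System peak:", system_peak_pflops(), "PFLOPS (lower bound)")
    print("Memory     :", system_memory_pb(), "PB (lower bound)")
    print("Cores      :", total_cores())
    print("Mem / core :", MEM_PER_NODE_GB / CORES_PER_CPU, "GB")
```

With the lower-bound node count this gives 128 GFLOPS per CPU, a little over 10 PFLOPS and 1 PB for the system, and 2 GB per core, consistent with the "> 10 PFLOPS", "> 1PB" and "2GB / core" entries.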

[Figure: 3D-Torus Network with x, y, z axes; each of the six links is labeled "5GB/s x Bidirectional"]

15. Cabinet of K computer
- 24 boards/cabinet
- 192 CPUs
- 24 TFLOPS
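The 3D-torus user view described for the network can be illustrated with a small neighbour calculation. This is a minimal sketch of a generic 3D torus with wrap-around links, not of the Tofu hardware itself (which the slides describe as 6-dimensional underneath); the function names and the 4 x 4 x 4 shape are hypothetical.

```python
def torus_neighbors(coord, shape):
    """Return the six neighbours of a node in the order x+, x-, y+, y-, z+, z-.

    coord and shape are 3-tuples; links wrap around at the edges.
    """
    neighbors = []
    for axis in range(3):
        for step in (+1, -1):
            n = list(coord)
            n[axis] = (n[axis] + step) % shape[axis]
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, shape):
    """Minimum hop count between two nodes, taking the shorter way round each ring."""
    return sum(min((x - y) % s, (y - x) % s) for x, y, s in zip(a, b, shape))


if __name__ == "__main__":
    shape = (4, 4, 4)  # hypothetical torus size, for illustration only
    print(torus_neighbors((0, 0, 0), shape))
    print(torus_hops((0, 0, 0), (3, 3, 3), shape))
```

Because of the wrap-around links, the corner node (3, 3, 3) is only three hops from (0, 0, 0) in this example, which is the property that keeps worst-case distances low in a torus.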

16. What is special in K computer?
- Network
  - High Bandwidth, Low Latency
- Processor for HPC
  - VISIMPACT
    - Shared Cache & Hardware Barrier
    - Multi-core parallelization of inner loop
  - HPC-ACE
    - Register Extension
    - SIMD 2FMA, 2 issue/cycle (4FMA/Core)
    - Instructions for special functions (trigonometric, inverse, square-root, inverse square-root, etc.)
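The "SIMD 2FMA, 2 issue/cycle (4FMA/Core)" entry can be turned into a per-core peak with one more input, the clock frequency. The slides do not state the clock; the sketch below assumes 2.0 GHz, which is labeled as an assumption and happens to reproduce the 16 GFLOPS per core shown on the CPU diagram. The names are illustrative.

```python
FLOPS_PER_FMA = 2        # one fused multiply-add counts as a multiply plus an add
FMA_PER_SIMD_INSTR = 2   # slide: "SIMD 2FMA"
ISSUE_PER_CYCLE = 2      # slide: "2 issue/cycle"
CLOCK_GHZ = 2.0          # ASSUMPTION: clock frequency is not given on the slides


def fma_per_cycle():
    """FMA operations completed per cycle per core."""
    return FMA_PER_SIMD_INSTR * ISSUE_PER_CYCLE


def core_peak_gflops(clock_ghz=CLOCK_GHZ):
    """Peak floating-point rate of one core in GFLOPS."""
    return fma_per_cycle() * FLOPS_PER_FMA * clock_ghz


if __name__ == "__main__":
    print("FMA per cycle :", fma_per_cycle())
    print("Core peak     :", core_peak_gflops(), "GFLOPS")
```

A different clock scales the result linearly, so `core_peak_gflops(1.0)` would give half the figure.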

17. [Figure] T. Maruyama, Proc. Hot Chips 2009.

18. Software
- OS: Linux
- Compiler
  - Fujitsu compiler will support
    - Fortran (2003), C (1999), C++ (2003)
    - GNU C/C++ extensions
    - Automatic vectorization for SPARC64 VIIIfx
    - OpenMP 3.0
  - MPI-2.1
- gcc may also be available. However, it cannot generate CPU-specific instructions (e.g. SIMD), so poor performance is expected.

19. How to use it?
- Five Strategic Regions have been selected.
  - For these fields, MEXT will fund some research budget, and machine time will be delivered.
- General Use
  - For general use, a registered organization will control distribution of machine time.
- Commercial Use

- Basically, RIKEN is not responsible for the usage of the machine.

20. HPCI: High Performance Computing Infrastructure
- System to utilize academic supercomputers in Japan
- 2012~
- User Communities
  - 5 strategic regions, Industrial Consortiums, National Universities and Institutes
- Computing Resource Provider
  - RIKEN AICS, University Centers, National Institutes

21. Basic Idea of HPCI
[Figure labels: Logical Structure, Physical Structure, 25 Organizations, 13 Organizations]

22. Problem in Future of HPC Hardware
- If the problem can be parallelized, computing performance is cheap.
- However, in every aspect, data movement dominates costs.
[Figure: data-movement hierarchy. Core, Cache, Cache, Main Memory, Node, Node, Node, Disk, System, System/Apparatus/Internet]

23. Future Processors for HPC
- Gap between top-end HPC processors and commodity processors will increase
- What is needed for HPC
  - Many-core processors, Accelerators for dense problems
  - Chip stacking for bandwidth
  - Network integration
- Network will be the most important factor in HPC

24. Future Directions (1)
- Network integration is essential both for general-purpose machines and special-purpose ones
- Platform for Accelerators
  - General-purpose processor cores
  - Cache or local memory
  - Fast, low-latency on-chip and off-chip networks
[Figure labels: PU, Accelerator, Memory; Network > 30GB/s; Memory 100GB/s; On-chip Network > 100GB/s/router]

25. Future Directions (2)
- High Memory Bandwidth System
  - Single-chip BlueGene/L
  - by System-on-Chip or Chip stacking by TSV
  - B/F 1
  - B/F 0.1 for remote node
[Figure labels: PU > 500GFLOPS; Memory > 500GB/s; Network > 50GB/s]

26. Problem in Network
- Molecular Dynamics: Strong Scaling is important
- 50,000 FLOP/particle/step
- N = 10^5: 5 GFLOP/step
- 5 TFLOPS effective performance
  - 1 msec/step = 170 nsec/day
  - Rather Easy
- 5 PFLOPS effective performance
  - 1 μsec/step = 200 μsec/day ???
  - Difficult, but important

27. Anton
- D. E. Shaw Research
- Special-purpose pipeline + General-purpose core + Dedicated Network
- By decreasing communication latency, it can achieve high sustained performance even for small systems

R. O. Dror et al., Proc. Supercomputing 2009, in USB memory.

28. MDGRAPE-4
- Special-purpose computer for molecular dynamics simulations
- Test bed for future HPC hardware
- FY2010-FY2012
- System-on-Chip
  - Accelerator
  - Memory
  - General-purpose processor
  - Network
- ~4 Tflops / chip
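The strong-scaling estimate on the "Problem in Network" slide can be reproduced as a short calculation: 50,000 FLOP per particle per step with N = 10^5 particles gives 5 GFLOP per step, and the time per step then follows from the effective performance. Converting steps per day into simulated time needs an integration timestep, which the slide does not state; the sketch assumes 2 fs, and the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
FLOP_PER_PARTICLE_STEP = 50_000   # slide: "50,000 FLOP/particle/step"
TIMESTEP_S = 2e-15                # ASSUMPTION: 2 fs timestep, not given on the slide


def flop_per_step(n_particles):
    """Floating-point work of one MD step."""
    return n_particles * FLOP_PER_PARTICLE_STEP


def seconds_per_step(n_particles, effective_flops):
    """Wall-clock time of one step at a given sustained rate (FLOP/s)."""
    return flop_per_step(n_particles) / effective_flops


def simulated_seconds_per_day(n_particles, effective_flops, timestep_s=TIMESTEP_S):
    """Simulated physical time covered in one day of wall-clock time."""
    steps_per_day = SECONDS_PER_DAY / seconds_per_step(n_particles, effective_flops)
    return steps_per_day * timestep_s


if __name__ == "__main__":
    n = 10**5
    for label, rate in (("5 TFLOPS", 5e12), ("5 PFLOPS", 5e15)):
        print(label,
              "step time [s]:", seconds_per_step(n, rate),
              "simulated time per day [s]:", simulated_seconds_per_day(n, rate))
```

Under the 2 fs assumption this gives 1 msec/step and roughly 170 nsec/day at 5 TFLOPS, and 1 μsec/step and roughly 170 μsec/day at 5 PFLOPS; the slide quotes the latter as a rounder 200 μsec/day. The point of the slide is the step time itself: at 1 μsec per step, communication latency rather than arithmetic becomes the limit.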

29. Fin

[Figure labels, Basic Idea of HPCI diagram:]
- HPCI Consortium: users' opinions, Steering Committee
- User Community Hubs: 5 Strategic Regions (Life Science, Material Science, Environment, Manufacturing, Basic Physics), Field A, Field B, Private Company
- Computing Resources Providers: RIKEN AICS, University Center A, University Center B, Institute Center A, Institute Center B
- HPCI: K Computer, HPCI Storage, computers at A Univ., B Univ., A Inst., B Inst. and a company, a computer at each hub, and other computers, connected by the Network Infrastructure (NII)

[Figure labels, MDGRAPE-4 board diagram:]
- ASIC
  - Hi-speed serial links in six directions (X+, X-, Y+, Y-, Z+, Z-): 8 bit x 6.5 Gbps each
  - Hi-speed serial links for torus bypass (XL, YL, ZL), optional
  - Hi-speed serial links to FPGA L1: 4 bit x 3.125 Gbps, or 2 bit x 6.5 Gbps?, or 4 bit x 6.5 Gbps?
- FPGA: SSTL_15 etc., 256 bit x 500-800 MHz
- Gigabit Ethernet
- DDR3 (micro)DIMM: SSTL_15 etc., 64 bit x 800 MHz