EECE 571R: Data-intensive computing systems


  • EECE 571R: Data-intensive computing systems

    Matei Ripeanu (matei at ece.ubc.ca)

  • Contact Info
    Email: matei @ ece.ubc.ca
    Office: KAIS 4033
    Office hours: by appointment (email me)
    Course page: http://www.ece.ubc.ca/~matei/EECE571/

  • EECE 571R: Course Goals
    Primary:
    - Gain a deep understanding of the fundamental issues that affect the design of data-intensive systems and, more generally, large-scale distributed systems
    - Survey the main current research themes
    - Gain experience with distributed systems research (research on federated systems, networks)
    Secondary:
    - By studying a set of outstanding papers, build knowledge of how to do & present research
    - Learn how to read papers & evaluate ideas

  • What I'll Assume You Know
    - Basic Internet architecture: IP, TCP, DNS, HTTP
    - Basic principles of distributed computing:
      - Asynchrony (cannot distinguish between communication failures and latency)
      - Incomplete & inconsistent global state knowledge (cannot know everything correctly)
      - Failures happen (in large systems, even rare failures of individual components aggregate to high failure rates)
    If there are things that don't make sense, ask!
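To make the last point concrete, here is a back-of-the-envelope sketch; the per-node failure rate and system size are illustrative assumptions, not figures from the course:

```python
# Back-of-the-envelope: even rare per-node failures aggregate to frequent failures.
# The per-node rate and system size below are illustrative assumptions only.
def p_any_failure(p_node: float, nodes: int) -> float:
    """Probability that at least one of `nodes` fails, assuming independent failures."""
    return 1 - (1 - p_node) ** nodes

# A node failing ~once every 3 years (p ~ 0.001/day) in a 10,000-node system:
print(f"{p_any_failure(0.001, 10_000):.5f}")  # ~0.99995: expect some failure nearly every day
```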

  • Outline
    Case study (and project ideas):
    - Volunteer computing: SETI@home / BOINC
    - Virtual Data System
    - Batch-Aware Distributed File System
    Administrative

  • How does it work? SETI@home: master-worker architecture
    Characteristics:
    - Fixed-rate data processing task
    - Low bandwidth/computation ratio
    - Independent parallelism
    - Error tolerance
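A minimal sketch of the master-worker pattern referred to above (names and structure are my own illustration, not SETI@home/BOINC code): the master hands out independent work units and collects the results the workers return.

```python
# Minimal master-worker sketch (illustrative only; not actual SETI@home/BOINC code).
from queue import Queue
from threading import Thread

def analyze(work_unit):
    # Placeholder for the fixed-rate signal-processing step.
    return sum(work_unit)

def worker(tasks, results):
    """Worker: repeatedly fetch an independent work unit, process it, return the result."""
    while (wu := tasks.get()) is not None:    # None is the "no more work" sentinel
        results.put(analyze(wu))

def master(work_units, num_workers):
    """Master: central point of control; hands out work units and collects results."""
    tasks, results = Queue(), Queue()
    for wu in work_units:
        tasks.put(wu)
    for _ in range(num_workers):
        tasks.put(None)                       # one sentinel per worker
        Thread(target=worker, args=(tasks, results), daemon=True).start()
    return [results.get() for _ in range(len(work_units))]

print(master([[1, 2], [3, 4], [5, 6]], num_workers=2))   # e.g. [3, 7, 11] in some order
```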

  • SETI@home Operation
    (diagram; visible label: data recorder)

  • History and Statistics
    - Conceived 1995, launched April 1999
    - Millions of users, hosts
    - No ET signals yet, but other results

  • Millions of individual contributors! (Problems)
    - Server scalability
    - Dealing with excess CPU time
    - Untrusted environment: bad user behavior
      - Cheating
      - Team recruitment by spam
      - Sale of accounts on eBay
    - Malfunctions of individual components
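A common defense against cheating in volunteer computing is to issue the same work unit to several hosts and accept a result only when enough of them agree (BOINC-style redundant computation). The sketch below is my own illustration, not project code:

```python
# Sketch: validate volunteer results by redundancy and quorum (illustrative only).
from collections import Counter

def validate(results, quorum=2):
    """results: answers returned for the same work unit by different hosts.
    Accept the most common answer if at least `quorum` hosts agree; otherwise reject."""
    if not results:
        return None
    answer, votes = Counter(results).most_common(1)[0]
    return answer if votes >= quorum else None

print(validate([42, 42, 17]))   # -> 42   (two hosts agree; the bad result is outvoted)
print(validate([42, 17]))       # -> None (no quorum; reissue the work unit)
```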

  • SETI@home: Summary
    The characteristics of the problem:
    - Massive (embarrassing) parallelism
    - Low bandwidth/computation ratio
    - Fixed-rate data processing task
    make possible a solution that operates in an unfriendly environment:
    - Wide-area distribution; huge scale
    - High failure rates
    - Untrusted/malicious components

    Solution: master-worker design
    - Master = central point of control
      - Single point of failure
      - Performance bottleneck

  • Outline
    Case study (and project ideas):
    - Volunteer computing: SETI@home / BOINC
    - Virtual Data System
    - Batch-Aware Distributed File System
    Administrative

  • Virtual Data System

    Context: big science
    Motivation/goals: support the science process, i.e., track all aspects of data capture, production, transformation, and analysis
    Requirements: ability to define complex workflows, and to reliably & efficiently execute workflows in heterogeneous, multi-domain environments
    Derived benefits: helps to audit, validate, reproduce, and/or rerun with corrections various data transformations

  • The European Organisation for Nuclear Research
    CERN builds particle accelerators for particle physics research
    BIG Science!

  • Data Handling and Computation for Physics Analysis
    (diagram by les.robertson@cern.ch; labeled stages: detector, event filter (selection & reconstruction), raw data, event summary data, processed data, event reprocessing, event simulation, batch and interactive physics analysis, analysis objects extracted by physics topic)

  • CMS Grid Hierarchy
    (diagram: the Online System at the experiment feeds the Tier 0 CERN Computer Center (> 20 TIPS), which feeds Tier 1 national centers (France, Italy, UK, USA), Tier 2 centers, Tier 3 institutes, and Tier 4 workstations and other portals; link bandwidths range from 0.1-1 Gbits/sec at the edges to 2.5-10 Gbits/sec near the core, with a 10-40 Gbits/sec physics data cache; bunch crossings every 25 ns, ~100 triggers per second, ~1 MByte per event; 2500 physicists in 40 countries; 10s of petabytes/yr by 2008)

  • Motivations (1)
    "I've detected a calibration error in an instrument and want to know which derived data to recompute."
    (diagram: Transformation, Derivation, and Data Product, linked by product-of, execution-of, and consumed-by/generated-by relationships)

  • Motivations (2)
    - Data track-ability and result audit-ability
    - Repair and correction of data: rebuild data products (cf. make)
    - Workflow management: a new, structured paradigm for organizing, locating, specifying, and requesting data products
    - Performance optimizations: ability to re-create data rather than move it
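A toy sketch of the make-like idea behind both motivation slides (my own illustration, not Virtual Data System code): record which inputs each data product was derived from, then, given a bad input such as the mis-calibrated instrument above, walk the graph to find every downstream product that must be recomputed.

```python
# Sketch: derivation tracking a la make (illustrative only; not the Virtual Data System).
from collections import defaultdict

derived_from = {                      # data product -> inputs it was derived from
    "summary.dat":  ["raw.dat"],
    "analysis.dat": ["summary.dat", "calib.cfg"],
    "plot.png":     ["analysis.dat"],
}

consumers = defaultdict(list)         # invert: input -> products directly derived from it
for product, inputs in derived_from.items():
    for inp in inputs:
        consumers[inp].append(product)

def stale_products(bad_input):
    """Everything transitively derived from `bad_input` must be recomputed."""
    stale, frontier = set(), [bad_input]
    while frontier:
        for product in consumers[frontier.pop()]:
            if product not in stale:
                stale.add(product)
                frontier.append(product)
    return stale

print(stale_products("calib.cfg"))    # -> {'analysis.dat', 'plot.png'}
```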

  • Requirements
    - Express complex multi-step workflows (perhaps 100,000s of individual tasks)
    - Operate on heterogeneous distributed data (different formats & access protocols)
    - Harness many computing resources (parallel computers and/or distributed Grids)
    - Execute workflows reliably (despite diverse failure conditions)
    - Enable reuse of data & workflows (discovery & composition)
    - Support many users, workflows, resources (policy specification & enforcement)
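To make the first and fourth requirements concrete, here is a toy sketch (my own, not VDS/Pegasus/DAGman code) of a multi-step workflow expressed as a task DAG and executed in dependency order with simple retry on failure:

```python
# Toy multi-step workflow: a task DAG executed in dependency order, with retries.
# Illustrative only; not VDS, Pegasus, or DAGman code.
import random

workflow = {                          # task -> tasks it depends on
    "extract":   [],
    "transform": ["extract"],
    "simulate":  ["extract"],
    "analyze":   ["transform", "simulate"],
}

def run_task(name):
    """Stand-in for dispatching a task to a remote resource; may fail transiently."""
    if random.random() < 0.2:
        raise RuntimeError(f"{name} failed")
    print(f"ran {name}")

def run_workflow(dag, max_retries=3):
    done = set()
    while len(done) < len(dag):
        ready = [t for t, deps in dag.items()
                 if t not in done and all(d in done for d in deps)]
        for task in ready:
            for attempt in range(1, max_retries + 1):
                try:
                    run_task(task)
                    done.add(task)
                    break
                except RuntimeError as err:
                    print(f"attempt {attempt} of {task} failed: {err}")
            else:
                raise RuntimeError(f"giving up on {task}")

run_workflow(workflow)
```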

  • Virtual Data System
    (diagram: "Create Execution Plan" feeds "Grid Workflow Execution"; labeled components: DAG, local planner, statically partitioned DAG, dynamically planned DAG, job planner, job cleanup, DAGman, DAGman & Condor-G)

  • VDS Software Stack
    - Express complex multi-step workflows (perhaps 100,000s of individual tasks): VDL
    - Operate on heterogeneous distributed data (different formats & access protocols): XDTM
    - Harness many computing resources (parallel computers and/or distributed resources) and execute workflows reliably & efficiently (despite diverse failure conditions): Pegasus, DAGman, Globus
    - Enable reuse of data & workflows (discovery & composition): VDC
    - Support many users, workflows, resources (policy specification & enforcement): TBD

  • Outline
    Case study (and project ideas):
    - Volunteer computing: SETI@home / BOINC
    - Virtual Data System
    - Batch-Aware Distributed File System
    Administrative

  • Batch-aware Distributed File System

  • Motivating question: Are existing distributed file systems adequate for batch computing workloads?
    NO. Internal decisions (caching, consistency, replication) are inappropriate.
    A solution: combine scheduling knowledge with external storage control
    - Detailed information about the workload is known
    - The storage layer allows external control
    - An external scheduler makes informed storage decisions
    Combining information and control results in:
    - Improved performance
    - More robust failure handling
    - Simplified implementation
    Explicit Control in a Batch-Aware Distributed File System, John Bent, Douglas Thain, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Miron Livny (NSDI '04)

  • Outline
    - Batch computing: systems, workloads, environment
    - Why not DFS?
    - Solution: BAD-FS (design, experimental evaluation)

  • Batch computing
    (diagram; labels: home storage, Internet)

  • Batch computing
    - Not interactive; a compute loop
    - Users submit jobs (job description languages); the system itself executes them
    - Results are copied back to the user's system
    - Many existing batch systems: Condor, LSF, PBS, Sun Grid Engine
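A toy illustration of that loop (the Job fields and function names are my own; this is not the description language of Condor, LSF, PBS, or SGE): describe a job, run it in a scratch sandbox, and copy its output back to home storage.

```python
# Toy batch-computing loop: describe a job, run it in a scratch sandbox, copy results home.
# Illustrative only; real systems (Condor, LSF, PBS, SGE) have their own description languages.
import shutil, subprocess, tempfile
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class Job:
    executable: str
    arguments: list = field(default_factory=list)
    output: str = "out.txt"                           # file to copy back to home storage

def submit_and_run(job, home_storage):
    scratch = Path(tempfile.mkdtemp())                # execution sandbox on a remote node
    with open(scratch / job.output, "w") as out:
        subprocess.run([job.executable, *job.arguments], stdout=out, check=True)
    shutil.copy(scratch / job.output, home_storage / job.output)   # results back to the user

home = Path(tempfile.mkdtemp())                       # stand-in for the user's home storage
submit_and_run(Job(executable="echo", arguments=["hello, batch world"]), home)
print((home / "out.txt").read_text())                 # -> hello, batch world
```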

  • Batch computing
    (diagram; labels: Internet, scheduler, home storage, steps 1-4)

  • Batch workloads
    General properties:
    - Large number of processes
    - Process and data dependencies
    - I/O intensive
    Different types of I/O: endpoint, batch, pipeline
    Usage: mainly scientific workloads, but also video production, data mining, electronic design, financial services, graphic rendering
    Pipeline and Batch Sharing in Grid Workloads, Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny. HPDC 12, 2003.
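A rough way to picture the three I/O types (a toy model of the taxonomy from the paper above; the file names are made up): endpoint data moves between home storage and the batch system, batch data is input shared by many pipelines, and pipeline data flows only between stages of one pipeline.

```python
# Toy model of the three batch-workload I/O types (illustrative only; file names made up).
from enum import Enum

class IOType(Enum):
    ENDPOINT = "endpoint"   # initial input / final output shipped to or from home storage
    BATCH    = "batch"      # input shared read-only across many pipelines
    PIPELINE = "pipeline"   # intermediate data passed between stages of one pipeline

workload = {
    "config.in":      IOType.ENDPOINT,   # submitted by the user
    "calibration.db": IOType.BATCH,      # read by every pipeline
    "stage1.tmp":     IOType.PIPELINE,   # produced by stage 1, consumed by stage 2
    "result.out":     IOType.ENDPOINT,   # shipped back to the user
}

# Only endpoint data has to cross the wide-area link; the rest can stay inside the cluster.
wide_area = [name for name, kind in workload.items() if kind is IOType.ENDPOINT]
print(wide_area)   # -> ['config.in', 'result.out']
```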

  • Batch workloads
    (diagram; labels: Endpoint and Pipeline, repeated across a set of parallel pipelines)

  • Cluster-to-cluster (c2c)
    - Not quite p2p: more organized, less hostile, more homogeneity
    - Each cluster is autonomous, run and managed by different entities
    - An obvious bottleneck is the wide-area network
    Q: How to manage the flow of data into, within, and out of these clusters?
    (diagram; labels: Internet, home store)

  • Why not a traditional Distributed File System?
    A distributed file system (DFS) would be ideal:
    - Easy to use
    - Uniform name space

    But . . .
    - Designed for wide-area networks
    - Not practical
    - Embedded decisions are wrong
    (diagram; labels: Internet, home store)

  • Distributed file systems make bad decisions
    - Caching: must guess what and how to cache
    - Consistency:
      - Output: must guess when to commit
      - Input: needs a mechanism to invalidate the cache
    - Replication: must guess what to replicate

  • BAD-FS makes good (i.e., informed) decisions
    Removes the guesswork:
    - The scheduler has detailed workload knowledge
    - The storage layer is designed to allow external control
    - The scheduler makes informed storage decisions
    - Manages data as well as computations
    Retains the simplicity of distributed file systems
    Practical and deployable
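A toy illustration of such informed decisions (my own sketch, not BAD-FS code; the roles follow the endpoint/batch/pipeline taxonomy from the workload slides): because the scheduler knows each file's role, it can dictate placement and write-back policy instead of letting a generic file system guess.

```python
# Sketch: a scheduler using workload knowledge to direct storage (illustrative only; not BAD-FS code).
def storage_plan(files):
    """files: {name: role}, role in {'batch', 'pipeline', 'endpoint'}.
    Return an explicit decision per file instead of letting a generic DFS guess."""
    plan = {}
    for name, role in files.items():
        if role == "batch":        # shared input: cache once per cluster for all jobs
            plan[name] = "replicate into the cluster's cooperative cache"
        elif role == "pipeline":   # intermediate data: keep local, never write home
            plan[name] = "pin on local storage; delete when the consumer job finishes"
        else:                      # endpoint output: the only data committed to home storage
            plan[name] = "write back to home storage when the pipeline completes"
    return plan

for name, decision in storage_plan({"calibration.db": "batch",
                                    "stage1.tmp": "pipeline",
                                    "result.out": "endpoint"}).items():
    print(f"{name}: {decision}")
```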

  • Outline
    - Introduction
    - Batch computing: systems, workloads, environment
    - Why not DFS?
    - One solution: BAD-FS (design, experimental evaluation)

  • Solution: BAD-FS
    Practical and deployable:
    - User-level; requires no privilege
    - Packaged as a modified batch system: a new batch system which includes BAD-FS
    - General: will work on all batch systems
    (diagram; labels: Internet, SGE clusters)
