A.R.M.S. Active Resource Management Services
Presentation One
2/21/2013



Outline, Introductions, Societal Issue Examined
Michael Rajs

Outline
- Group Members and Roles
- Introduce Mentor
- Societal Issue
- History
- Case Study
- Problem Statement
- Computer Components Identified
- Major Functional Component Diagram
- Current Process Flow
- Solution Statement
- Objectives
- Improved Process Flow
- Competition Identified
- Benefits of Solution
- Problems with Solution
- Conclusion
- References

Group Members and Roles
- Michael Rajs (Group Manager)
- Adam Willis (Research Specialist)
- Sybil Acotanza (Visualization Engineer)
- Scott Pardue (Team Leader)
- Jordan Heinrichs (Marketing Analyst)
- David Crook (Documentation Specialist)

Mentor: Yaohang Li
Yaohang Li is an Associate Professor in the Department of Computer Science at Old Dominion University. His research interests are Computational Biology, Markov Chain Monte Carlo (MCMC) methods, and Parallel Distributed Grid Computing.

What is the societal issue being faced?
How do researchers handle the massive amounts of data they are collecting?

Historical Background
Adam Willis

Collection of Data
- 1890: Census recorded with an electric machine [1]
- 1935: Social Security Act [2]
- 1974: Privacy Act [3]
- 1989: World Wide Web [4]
- 1997: Big Data [5]
- 2011: IBM's Watson [6]
- Now: "Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone." [7]

Examples of Big Data
- Large Hadron Collider [8]
  - 150 million sensors report 40 million times per second
- Facebook [9]
  - 2.5 billion content items shared
  - 2.7 billion Likes
  - 300 million photos uploaded
- Walmart [8]
  - 1 million customer transactions
  - 2.5 petabytes of data

Big Data Analysis Hardware
Cluster Computing [10]
- A cluster consists of many nodes (computers).
- Big data can be generated and analyzed more quickly by spreading the workload among the nodes.
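
The idea of spreading a workload among nodes can be sketched on a single machine, with worker threads standing in for cluster nodes. This is only an illustration of dividing work and combining partial results, not the Dinosolve cluster setup: the function names, the node count, and the toy sum-of-squares workload are all assumptions made for the example, and threads here model the division of work rather than deliver the speedup that separate machines would.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Toy per-node work (hypothetical): sum of squares of the chunk."""
    return sum(x * x for x in chunk)


def run_on_nodes(items, n_nodes=4):
    """Give each chunk to one worker (a stand-in for a node), then combine."""
    chunks = split_into_chunks(items, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)
```

Calling `run_on_nodes(list(range(10)), n_nodes=3)` splits the ten items into chunks of 4, 3, and 3, processes each chunk independently, and adds the three partial sums.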

Managing the Cluster
Distributed Resource Management Systems (D-RMS)
- Job management subsystem
- Physical resource management subsystem
- Scheduling and queuing subsystem
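
How the three D-RMS subsystems listed above fit together can be sketched in a few lines. This is a deliberately tiny model, not how Sun Grid Engine or any real D-RMS is implemented: the class name `TinyDRMS`, the notion of "slots", and the strict first-in-first-out policy are assumptions chosen to keep the example short.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    slots_needed: int
    state: str = "queued"  # job management: queued -> running -> done
    node: str = ""


class TinyDRMS:
    """Hypothetical toy D-RMS with the three subsystems in one class."""

    def __init__(self, free_slots):
        self.free_slots = dict(free_slots)  # resource management: node -> free slots
        self.queue = deque()                # queuing: jobs waiting, in arrival order
        self.jobs = {}                      # job management: job_id -> Job
        self._next_id = 1

    def submit(self, slots_needed):
        """Register a job and place it at the back of the queue."""
        job = Job(self._next_id, slots_needed)
        self._next_id += 1
        self.jobs[job.job_id] = job
        self.queue.append(job)
        return job.job_id

    def schedule(self):
        """Start queued jobs in FIFO order while the head job fits on a node."""
        while self.queue:
            job = self.queue[0]
            node = next((name for name, free in self.free_slots.items()
                         if free >= job.slots_needed), None)
            if node is None:
                break  # strict FIFO: later jobs wait behind the head job
            self.queue.popleft()
            self.free_slots[node] -= job.slots_needed
            job.state, job.node = "running", node

    def finish(self, job_id):
        """Mark a running job done, return its slots, and reschedule."""
        job = self.jobs[job_id]
        if job.state != "running":
            raise ValueError("only a running job can finish")
        self.free_slots[job.node] += job.slots_needed
        job.state = "done"
        self.schedule()
```

Real systems use richer policies (priorities, fair share, backfilling); the strict FIFO rule here means a large job at the head of the queue can hold back smaller jobs that would otherwise fit.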

Case Study
Sybil Acotanza

Dinosolve Case Study
- Bioinformatics
- Disulfide bond prediction program
(Cronk, 2012)

Dinosolve Users
Who will use it?
- Drug and antibody design
- Bio-energy development
- Genetic mapping [11]
Why will they use it?
- 2% accuracy improvement [12]

Dinosolve Web Site
(Li & Yaseen, http://hpcr.cs.odu.edu/dinosolve/)

Dinosolve Possible Problems
- Hard resources for computation
  - CPU cycles
  - Memory
  - Disk space
  - Network bandwidth
- Server crashes

Problem Statement, Components of Hardware and Software, Current Process Flow
Scott Pardue

What is the problem?
Processing big data sets is computationally expensive, and as the volume of queries grows, system performance will progressively degrade until the system fails.

What are the components of our current system?
The current system uses the following software and hardware.

Software
- Unix operating system installed on the Dinosolve cluster
- Dinosolve algorithm
- Sun Grid Engine, which will be our Distributed Resource Management System (D-RMS), installed on the cluster
- MySQL (database software)
- Web-based user interface (website)

Hardware
- MySQL database server
- A computer cluster to run the Dinosolve algorithm
- Web server for our web-based user interface
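
The problem statement above, that performance drops progressively as query volume grows, can be illustrated with a toy backlog model. This is a simplified sketch and not a measurement of the Dinosolve cluster: the fixed per-step arrival and capacity figures are hypothetical, and real query traffic is neither constant nor uniform.

```python
def simulate_backlog(arrivals_per_step, capacity_per_step, steps):
    """Return the number of queries left waiting after each time step.

    Each step, `arrivals_per_step` new queries arrive and the system can
    complete at most `capacity_per_step` of the queries that are waiting.
    """
    backlog = 0
    history = []
    for _ in range(steps):
        backlog = max(0, backlog + arrivals_per_step - capacity_per_step)
        history.append(backlog)
    return history
```

With arrivals below capacity, for example `simulate_backlog(8, 10, 5)`, the backlog stays at zero. Once arrivals exceed capacity, as in `simulate_backlog(12, 10, 5)`, the backlog grows at every step without bound, which is the behavior a resource manager is meant to control by queuing and scheduling work.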

Major Functional Component Diagram
(diagram not captured in the transcript)

Current Process Flow
(diagram not captured in the transcript)

Solution Statement, Objectives, Improved Process Flow, Competition Identified
Jordan Heinrichs

How will we correct the problem?
We aim to configure a distributed resource management system (D-RMS), in this case Sun Grid Engine (SGE), to handle resource allocation on the Dinosolve cluster.

Objectives
- Interpret and visualize current usage statistics
- Configure, utilize, and optimize the SGE
- Aesthetically pleasing and professional user interface
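
As a rough sketch of what using SGE for resource allocation might look like from the web server's side, the helper below assembles a `qsub` submission command with explicit resource requests. The script name, job name, and queue name are hypothetical, and whether `h_rt` and `h_vmem` can be requested depends on how a given cluster is configured. The function only builds the command; it does not submit anything.

```python
import shlex


def build_qsub_command(script, job_name, runtime="01:00:00", memory="2G", queue=None):
    """Build an SGE qsub command line as a list of arguments.

    -N names the job, -cwd runs it from the current working directory,
    and each -l requests a resource limit (wall-clock time, memory).
    """
    cmd = [
        "qsub",
        "-N", job_name,
        "-cwd",
        "-l", f"h_rt={runtime}",
        "-l", f"h_vmem={memory}",
    ]
    if queue is not None:
        cmd += ["-q", queue]  # optional: target a specific queue
    cmd.append(script)
    return cmd


def as_shell_line(cmd):
    """Render the argument list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(part) for part in cmd)
```

For example, `as_shell_line(build_qsub_command("run_dinosolve.sh", "dinosolve_1"))` produces a line that could be reviewed before being passed to the scheduler. Keeping the arguments as a list avoids shell-quoting mistakes if the command is later run with `subprocess.run`.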

Process Flow with Solution
(diagram not captured in the transcript)

Competing Distributed Resource Management Systems
- Sun Grid Engine (SGE)
- Portable Batch System (PBS)
- Load Sharing Facility (LSF)

Competing Resource Management Systems (Reference 31)

Features of systems                             PBS                LSF                SGE
Supported platforms                             Unix               Unix & NT          Unix
Multi-cluster support                           Yes                Yes                No
System level checkpoint restart                 Yes                Yes                Yes
User level checkpoint restart                   No                 Yes                Yes
Large computational grid support                No                 No                 No
Massive scalability                             Yes                Yes                Yes
Parallel job support with Sun HPC ClusterTools  Loose Integration  Tight Integration  Loose Integration
Distribution format of end product              Source             Binary only        Binary and Source
Free?                                           Yes                No                 Yes
POSIX 1003.2d compliance                        Yes                No                 Yes

Competing Protein Prediction Servers (References 19, 20, and 21)

Criterion                     Dinosolve  DiANNA  Scratch Protein Predictor
Accuracy                      90.8%      81%     87%
Usability                     X          X       X
508.22 compliance percentage  67%        85%     67%
Professional

Benefits of Solution, Problems with Solution, Conclusion
David Crook

What benefits will come from attaining our goals?
- Efficient utilization of available resources
- Increased throughput of the cluster
- An intuitive and professional user interface
- Rise in popularity due to excellent accuracy, efficiency, and professional design

Problems with Solution
- Improper synchronization of cluster resources can lead to a deadlock in the system
- Race conditions between the HPCR cluster and the MySQL database
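
Both risks listed above have standard mitigations that can be sketched with threads. In the example below, each worker acquires the resources it needs in one fixed (sorted) order, which rules out the circular wait behind a deadlock, and a separate lock protects the shared counter so that concurrent updates are not lost. The resource names and the `ResourcePool` class are hypothetical and are not taken from the Dinosolve system or from SGE.

```python
import threading


class ResourcePool:
    """Hypothetical pool of named resources, each guarded by its own lock."""

    def __init__(self, names):
        self.locks = {name: threading.Lock() for name in names}

    def acquire_all(self, needed):
        """Acquire locks in sorted order so no two workers wait on each other."""
        ordered = sorted(needed)
        for name in ordered:
            self.locks[name].acquire()
        return ordered

    def release_all(self, held):
        """Release in reverse order of acquisition."""
        for name in reversed(held):
            self.locks[name].release()


def run_jobs(pool, requests, rounds=200):
    """Run one worker per request list; return the total completed rounds."""
    completed = 0
    count_lock = threading.Lock()  # guards the shared counter against races

    def worker(needed):
        nonlocal completed
        for _ in range(rounds):
            held = pool.acquire_all(needed)
            try:
                with count_lock:
                    completed += 1
            finally:
                pool.release_all(held)

    threads = [threading.Thread(target=worker, args=(req,)) for req in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed
```

Two workers requesting `["cpu", "disk"]` and `["disk", "cpu"]` would risk deadlock if each locked in the order given; sorting the names first means both lock `cpu` before `disk`. For the database side, the analogous tool is a transaction or row-level lock rather than an in-process lock.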

Conclusion
With the updated user interface and a correctly configured Sun Grid Engine, we hope to establish a reputable disulfide bonding prediction server.

References for history
1. http://www.columbia.edu/cu/computinghistory/hh/index.html
2. http://query.nytimes.com/gst/abstract.html?res=F50C11FE385D13728DDDAE0A94DA415B868FF1D3
3. http://www.census.gov/history/pdf/kraus-natdatacenter.pdf
4. http://www.bbc.co.uk/history/historic_figures/berners_lee_tim.shtml
5. http://dl.acm.org/citation.cfm?id=266989.267068&coll=DL&dl=GUIDE
6. http://www.nytimes.com/2012/08/12/business/how-big-data-became-so-big-unboxed.html?_r=1
7. http://www-01.ibm.com/software/data/bigdata/
8. http://en.wikipedia.org/wiki/Big_data
9. http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/
10. http://en.wikipedia.org/wiki/Computer_cluster

References for case study
11. Li, Y. (2010, September 1). CAREER: Novel Sampling Approaches for Protein Modeling Applications [Abstract]. National Science Foundation Award Abstract #1066471.
12. Li, Y., & Yaseen, A. (2012). Enhancing Protein Disulfide Bonding Prediction Accuracy with Context-based Features. Biotechnology and Bioinformatics Symposium.
13. bioinformatics. (2011). In Merriam-Webster.com. Retrieved February 15, 2013, from http://www.merriam-webster.com/dictionary/bioinformatics
14. Cronk, J. D. (2012). Disulfide Bond. Retrieved February 15, 2013, from Biochemistry Dictionary: http://guweb2.gonzaga.edu/faculty/cronk/biochem/D-index.cfm?definition=disulfide_bond
15. Yan, Y., & Chapman, B. (2008). Comparative Study of Distributed Resource Management Systems - SGE, LSF, PBS Pro, and LoadLeveler. Technical Report, CiteSeerX.
16. Li, Y., & Yaseen, A. (2012). Dinosolve. Retrieved from http://hpcr.cs.odu.edu/dinosolve/

References for competition
17. Arvind Krishna, Why Big Data? Why Now?, IBM, 2011. URL: http://almaden.ibm.com/colloquium/resources/Why%20Big%20Data%20Krishna.PDF
18. Yonghong Yan, Barbara M. Chapman, Comparative Study of Distributed Resource Management Systems - SGE, LSF, PBS Pro, and LoadLeveler, Department of Computer Science, University of Houston, May 2005 (pdf)
19. Dr. Li's site: http://hpcr.cs.odu.edu/dinosolve/
20. Scratch Predictor: http://scratch.proteomics.ics.uci.edu/
21. DiANNA server: http://clavius.bc.edu/~clotelab/DiANNA/

Portable Batch System (PBS)
22. http://resources.altair.com/pbs/documentation/support/PBSProUserGuide12-2.pdf
23. http://www.pbsworks.com/SupportDocuments.aspx?AspxAutoDetectCookieSupport=1
24. http://resources.altair.com/pbs/documentation/support/PBSProRefGuide12-2.pdf
25. http://resources.altair.com/pbs/documentation/support/PBSProAdminGuide12-2.pdf
26. http://www.pbsworks.com/(S(tykrsyqbemmlf3o5zwrmjrgf))/images/solutions-en-US/PBS-Pro_Datasheet-USA_WEB.pdf
27. http://agendafisica.files.wordpress.com/2011/05/pbs.pdf

Moab HPC Suite
28. http://www.adaptivecomputing.com/publication/420/wppa_open/

IBM Platform LSF
29. http://public.dhe.ibm.com/common/ssi/ecm/en/dcd12354usen/DCD12354USEN.PDF

Apache Hadoop with Zookeeper
30. http://zookeeper.apache.org/doc/current/zookeeperOver.html
31. http://www.cloud-net.org/~swsellis/tech/solaris/performance/doc/blueprints/0102/jobsys.pdf
