Biology 559R: Introduction to Phylogenetic Comparative Methods

Topics for this week (April 7 & April 9):

• Cluster fun:
    Intro to BYU supercomputer
    Intro to UNIX
    Running programs and jobs
• First round of presentations

Presentations and Report

• Presentations: this Thursday (April 9th) and April 14th
• You have 15 minutes (12 min for the presentation and 3 min for questions)
• Send me or bring a PDF of your presentation (so I can load it on my computer)
• Final report: no more than 4500 words (10 pages), in the format of a 'Brief Communication' of your project, due April 20th:
    Title: 150 characters (including spaces)
    Abstract: 300 words maximum
    Introduction: a brief background introduction and your main question or hypothesis
    Materials and Methods: summary of the methods and data used
    Results: main results, including figures and tables
    Discussion: the relevance of the main results in the light of other evidence
    References: follow the Evolution journal guidelines http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%291558-5646/homepage/ForAuthors.html

Computer Clusters

• A computer cluster is an array of connected computers with similar attributes that can be used individually or work together so that, in many respects, they can be viewed as a single system.
• Most clusters run a UNIX-like operating system, so users need at least a basic knowledge of the UNIX command line.
• Phylogenetic and comparative methods might require computer clusters to reduce computing time and to meet the memory requirements of large datasets.

Computer Clusters at BYU

• The Fulton Supercomputing Lab: https://marylou.byu.edu/

• For reference (a desktop workstation): Mac Pro with 12 cores, 64 GB memory, and 1 TB storage

• An introductory video to the Fulton Supercomputing Lab (6:37 min): https://www.youtube.com/watch?v=i1r9BxHBG0I

Getting an Account on the BYU Clusters

• Request an account from the Fulton Supercomputing Lab: https://marylou.byu.edu/account/create/

UNIX

• UNIX is a family of computer operating systems widely used by programmers and very common on computer clusters.
• For users, the UNIX operating system is characterized by:
    1) Command-line interaction with the computer
    2) Plain text for storing and inputting data
    3) A large collection of software tools that the user calls from the command line
• Several Unix-like operating systems (e.g., Linux) are free, open-source software, which has facilitated their distribution and popularity.

Intro to UNIX

• For Mac users, the Terminal application is the main tool to access and connect to the BYU cluster.
• You can move into any directory on your computer by typing cd followed by a space and then dragging the folder you want to explore into the Terminal window (this pastes the folder's path).

Intro to UNIX

• On the course website, you can find a 'Unix-cheat-sheet' with some of the most common tools.
• Using Terminal, you can log in to your BYU cluster account with ssh (secure shell).
• Then, you can list the contents of any folder with ls and move up and down directories with cd.
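The login-and-explore steps might look like the sketch below. The helper fsl_login and the FSL_USER variable are hypothetical conveniences (not FSL tools); the host name ssh.fsl.byu.edu is the one given in the FileZilla settings of these notes, and mydirectory is only an example folder. The navigation commands behave the same on the cluster and on your own computer.

```shell
# Hypothetical helper: log in to the BYU cluster over ssh.
# Set FSL_USER to your own cluster username first.
fsl_login() {
  ssh "${FSL_USER:?set FSL_USER to your cluster username}@ssh.fsl.byu.edu"
}

# Exploring folders (same commands on the cluster or on your computer)
cd ~                  # go to your home directory
mkdir -p mydirectory  # make an example folder (no error if it already exists)
ls                    # list the files and folders here
ls -l                 # long listing: permissions, owner, size, date
cd mydirectory        # move down into a folder
pwd                   # print the path of the current directory
cd ..                 # move up one directory
```

For example, after FSL_USER=user_name, typing fsl_login does the same as ssh user_name@ssh.fsl.byu.edu.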

Intro to UNIX

• One of the most common ways to abbreviate file names is to use 'wildcard' characters (e.g., * and ?). This significantly reduces the amount of typing.
• By pressing the up (↑) arrow, you get the previous command that you typed. Keep pressing the up arrow to go further back through earlier commands.
• By pressing the down (↓) arrow, you move forward again to more recent commands.
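A small sketch of the three standard shell wildcards, run in a scratch folder. The file names are made up (think of the .nex files as tree files and traits.csv as a trait table):

```shell
# Make a scratch folder with a few example files (hypothetical names)
mkdir -p wildcard_demo
cd wildcard_demo
touch tree1.nex tree2.nex tree10.nex traits.csv notes.txt

ls *.nex          # * matches any characters: tree1.nex, tree2.nex, tree10.nex
ls tree?.nex      # ? matches exactly one character: tree1.nex, tree2.nex
ls tree[12].nex   # [12] matches one character from the set: tree1.nex, tree2.nex
ls *.csv *.txt    # several patterns at once: traits.csv, notes.txt
```

The shell expands the pattern before ls runs, so the same wildcards work with any command (cp, mv, rm, scp).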

Intro to UNIX

• You will likely also spend a significant amount of time copying files and making new directories.
• Note the scp (secure copy) command, which lets you copy files and folders from your computer to the BYU cluster and vice versa.
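A sketch of copying and making directories, plus two hypothetical helpers (fsl_put, fsl_get) that only wrap scp with the cluster host used in these notes. The folder names and the FSL_USER variable are assumptions for illustration; scp -r is needed to copy whole folders.

```shell
# Make a project folder with subfolders (hypothetical names)
mkdir -p myproject/data myproject/results
echo "species,trait" > traits.csv      # a tiny example file
cp traits.csv myproject/data/          # copy a file into a folder
cp -r myproject myproject_backup       # copy a whole folder (-r = recursive)

# Hypothetical helpers around scp; set FSL_USER to your cluster username.
# fsl_put: computer -> cluster (default destination: your cluster home directory)
fsl_put() {
  scp -r "$1" "${FSL_USER:?set FSL_USER}@ssh.fsl.byu.edu:${2:-.}"
}
# fsl_get: cluster -> computer (default destination: the current local folder)
fsl_get() {
  scp -r "${FSL_USER:?set FSL_USER}@ssh.fsl.byu.edu:$1" "${2:-.}"
}
```

Run from your own computer, fsl_put myproject is the same as scp -r myproject user_name@ssh.fsl.byu.edu:. (the trailing . is your home directory on the cluster).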

Intro to UNIX

• Removing (deleting) and renaming files and folders can be tedious, and you have to be very careful with these commands. Once deleted, files and folders are lost: there is no 'trash bin' from which you can recover them.
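A sketch of renaming, moving, and deleting with mv and rm, using throwaway example files. Note that mv both renames and moves, and that rm -r removes a folder with all of its contents immediately:

```shell
# Set up a few example files and a folder (hypothetical names)
mkdir -p old_runs
touch run1.log run2.log draft.txt

mv draft.txt report.txt   # rename a file (mv = move)
mv run1.log old_runs/     # move a file into a folder
rm run2.log               # delete a file: there is no trash bin to undo this
rm -r old_runs            # delete a folder AND everything inside it
# Safer habit when typing by hand: rm -i asks for confirmation before each deletion
```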

Intro to UNIX

• Most text editing should be done on your own computer using 'TextWrangler', 'Notepad++', or any other plain-text editor. However, you might need to inspect files on the cluster (e.g., to check that you are getting the correct output, or to read error logs).
• Note the VIM editor: it is a very powerful text editor available on the cluster, but using it requires significant practice. If you are interested in more details about this software, see: http://www.fprintf.net/vimCheatSheet.html
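Quick, non-editing ways to inspect output and log files are often enough. The sketch below uses head, tail, wc, and grep on a made-up output file; less and vim appear only as comments because they open interactive screens:

```shell
# Make a small example output file (hypothetical content)
printf 'step %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > run.out

head -n 3 run.out       # first 3 lines
tail -n 3 run.out       # last 3 lines (e.g., did the run finish?)
wc -l run.out           # number of lines
grep -i "error" run.out || echo "no errors found"   # search a log for errors

# Interactive viewers (they open a full-screen view, so not run here):
#   less run.out   -> scroll with the arrow keys, quit with q
#   vim run.out    -> quit with :q, save and quit with :wq
```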

Intro to UNIX

• Most analyses that require a computer cluster involve large files that you must copy from your computer to the cluster and vice versa. For this reason, files are usually compressed before the transfer and expanded after it.
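A sketch of the compress, transfer, and expand cycle with tar and gzip (both standard on macOS and Linux). The folder names are examples, and the transfer itself (e.g., with scp or FileZilla) is not shown:

```shell
# Example project folder to compress (hypothetical names)
mkdir -p myproject/data
echo "species,trait" > myproject/data/traits.csv

# Before the transfer: pack and compress the folder into one file
tar -czf myproject.tar.gz myproject   # c = create, z = gzip, f = archive file name
tar -tzf myproject.tar.gz             # t = list the contents without extracting

# After the transfer: expand it into a destination folder
mkdir -p unpacked
tar -xzf myproject.tar.gz -C unpacked # x = extract, -C = change to this folder first

# A single file can also be compressed with gzip and expanded with gunzip
gzip myproject/data/traits.csv        # replaces traits.csv with traits.csv.gz
gunzip myproject/data/traits.csv.gz   # restores traits.csv
```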

Submitting Jobs to the BYU Clusters

• With the exception of some short scripts (e.g., testing scripts), most jobs are run only after a job script has been submitted to the cluster.
• Currently, the BYU clusters accept job scripts submitted with sbatch, a command of the Slurm (Simple Linux Utility for Resource Management) Workload Manager.
• Slurm is an open-source job scheduler that allocates resources (compute nodes) to users for some duration of time so they can perform work.
• Slurm also provides a framework for starting, executing, and monitoring work on a set of allocated nodes.
• Finally, Slurm arbitrates contention for resources by managing a queue of pending jobs.
• Here is an intro video by FSL: https://www.youtube.com/watch?v=U42qlYkzP9k&index=4&list=PL326A5EB4E3B16FED

Creating Job Scripts

• Make a text file (e.g., mytest.job).
• Note: you can also keep a working folder in your home directory:
    cd /bluehome3/user_name/mydirectory
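A minimal job script might look like the sketch below, written with a heredoc so the whole example runs from the command line (you could equally type the same lines into mytest.job with a text editor). The #SBATCH options are standard sbatch options, but the time, memory, and core values are placeholders, not FSL recommendations:

```shell
# Write a minimal Slurm job script to mytest.job (placeholder resource values)
cat > mytest.job <<'EOF'
#!/bin/bash
# Wall-clock time limit (hh:mm:ss)
#SBATCH --time=00:10:00
# Number of tasks (processor cores) and nodes
#SBATCH --ntasks=1
#SBATCH --nodes=1
# Memory per processor core
#SBATCH --mem-per-cpu=1024M
# Job name shown by squeue
#SBATCH -J "mytest"

# Everything below runs on the compute node once the job starts
echo "Job started on $(uname -n) at $(date)"
EOF
```

On the cluster, submit it with sbatch mytest.job. Because bash treats #SBATCH lines as ordinary comments, running bash mytest.job on your own computer is a quick way to test the script's commands before submitting it.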

Creating Job Scripts

• Batch job with modules (list of the software modules available on the cluster): https://marylou.byu.edu/documentation/apps/softwareModuleList
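A sketch of a batch job that uses software modules. It assumes the cluster provides the usual module command (module purge, module load, module list); the module name r and the resource values are hypothetical, so check the FSL software module list for the real names:

```shell
# Write a job script that loads software modules before running a program.
# "r" is a hypothetical module name: look up real names in the FSL module list.
cat > modules_test.job <<'EOF'
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=1024M
#SBATCH -J "modules_test"

# Start from a clean software environment, then load what the job needs
module purge
module load r
# Record the loaded modules in the job's output file
module list

# Programs provided by the loaded module can now be called, e.g.:
R --version
EOF
```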

Creating Job Scripts

• Running R on the cluster: this requires an R script (i.e., a set of commands ready to run, as if you copied and pasted them into R).

• More info on installing R packages in your own (local) library: https://www.osc.edu/documentation/howto/install-local-R-packages
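A sketch of the two files needed to run R on the cluster: an R script and a job script that runs it non-interactively with Rscript. The R commands, the file names, the module name r, and the personal library path in R_LIBS_USER are all assumptions for illustration:

```shell
# 1) An R script: the commands you would otherwise type or paste into R.
#    (Hypothetical example: summarize 100 random numbers and save a CSV.)
cat > myscript.R <<'EOF'
x <- rnorm(100)
res <- data.frame(mean = mean(x), sd = sd(x))
write.csv(res, "summary.csv", row.names = FALSE)
EOF

# 2) A job script that runs the R script with Rscript.
#    "r" is a hypothetical module name; check the FSL module list.
cat > myR.job <<'EOF'
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=2048M
#SBATCH -J "myR"

module load r
# Optional: personal package library in your home directory (example path)
export R_LIBS_USER="$HOME/R/library"
Rscript myscript.R
EOF
```

Submit it with sbatch myR.job; summary.csv is written to the folder the job runs in.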

Copying Folders and Files to the Cluster

• Copy your files and job scripts to the BYU cluster. You can use 'FileZilla', a free, cross-platform FTP (File Transfer Protocol) client: https://filezilla-project.org/ Binaries are available for Windows, Linux, and Mac OS X.
• If this is your first time, you need to set up the connection (port 22 is the SSH port, so FileZilla transfers files with SFTP, the SSH File Transfer Protocol):
    Host: ssh.fsl.byu.edu
    Username: user_name
    Password: your password
    Port: 22 (SSH Remote Login Protocol)
• Then, you can 'drag and drop' your folders and files (including your job files) to the BYU cluster.

Submitting and Monitoring Jobs on the Cluster

• To submit a job, go to the folder that contains your job file (e.g., mytest.job) and run:
    sbatch mytest.job
• Your job will be scheduled and run by the job scheduler (see video: https://www.youtube.com/watch?v=h8TZokyI6yo&list=PL326A5EB4E3B16FED&index=2).
• You can check the status of your jobs with:
    squeue -u user_name
• If, for some reason, you want to cancel a job, find its job ID with squeue and then cancel it:
    squeue -u user_name
    scancel jobnumber
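A sketch of the submit, monitor, and cancel cycle. The helper submit_job is hypothetical; it relies on sbatch --parsable, which prints only the job ID (followed by ;clustername on multi-cluster setups), so the ID can be reused with squeue -j and scancel:

```shell
# Hypothetical helper: submit a job file and print only its numeric job ID.
submit_job() {
  local out
  out=$(sbatch --parsable "$1") || return 1
  echo "${out%%;*}"    # keep the part before any ";" (cluster name)
}

# Typical session on the cluster (replace user_name and mytest.job):
#   jobid=$(submit_job mytest.job)
#   squeue -u user_name     # all your jobs (ST column: PD = pending, R = running)
#   squeue -j "$jobid"      # only this job
#   scancel "$jobid"        # cancel this job
```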
