
August 29, 2008 11:13 WSPC - Proceedings Trim Size: 9.75in x 6.5in GIW2008˙master


PREFACE

This book contains papers presented at the Nineteenth International Conference on Genome Informatics (GIW 2008), held on the Gold Coast, Queensland, Australia on December 1st to 3rd, 2008.

The GIW series provides an international forum for presentation and discussion of original research papers on all aspects of bioinformatics, computational biology, and systems biology. Its scope includes biological sequence analysis, protein structure prediction, gene regulatory networks, clustering algorithms, comparative genomics, text mining, and many other areas. GIW has a history of 19 years and is the longest running international bioinformatics conference. The first GIW was held at Kikai Shinko Kaikan, Tokyo during December 3-4, 1990 as an open workshop just before the Japanese Human Genome Project started in 1991. GIW 2008 was the first time the conference has been held in Australia. This year it was hosted by Bioinformatics Australia, representing the bioinformatics community in Australia, and incorporated the annual Bioinformatics Australia conference. Bioinformatics Australia is organized within AusBiotech, the national peak body for biotechnology in Australia.

The Program Committee of GIW 2008 received a total of 55 submissions from authors in 16 different countries around the world. Each submitted paper was peer-reviewed by at least three members of the Program Committee. Based on their reports, 18 papers (33%) were accepted for presentation at the conference. These 18 papers appear in this book and are indexed in Medline. In addition, this book contains abstracts from the six invited speakers: Sean Grimmond, University of Queensland (Australia), Eugene Koonin, National Center for Biotechnology Information (USA), Ming Li, University of Waterloo (Canada), Yixue Li, Shanghai Jiaotong University (China), John Mattick, University of Queensland (Australia), and Eric Schadt, Rosetta Inpharmatics (USA).

The electronic versions of all the papers in this issue are also publicly available from the website of the Japanese Society for Bioinformatics (JSBi) (http://www.jsbi.org/journal.html).

Jonathan Arthur and See-Kiong Ng
GIW 2008 Program Committee Co-Chairs

Mark Ragan
GIW 2008 Conference Chair


ACKNOWLEDGMENTS

We thank all the authors for their efforts in preparing their manuscripts. We also appreciate the great efforts made by the Program Committee members in rigorously reviewing the manuscripts. The high quality of the papers presented by the authors provided a challenging task in selecting the very best for acceptance. We greatly appreciate the time and effort of both the authors and the Program Committee, in their respective contributions, to continuing the GIW tradition of a high quality, engaging scientific program.

We also acknowledge Bioinformatics Australia (within AusBiotech Ltd) for hosting GIW 2008 as well as the assistance of the National Organizing Committee, the Local Organizing Committee, and the Conference Organisers (Martin Lack and Associates) for the coordination of the conference.

We are grateful for the support of the Department of Innovation, Industry, Science and Research, the Queensland State Government, and:

AIST Computational Biology Research Center
ARC Research Network in Enterprise Information Infrastructure
Australian Centre for Plant Functional Genomics
Australian Genome Research Facility
CSIRO
NICTA
Queensland Cyber Infrastructure Foundation
SGI
Sydney Bioinformatics
University of Queensland

Finally, we give special thanks to those who presented papers or posters at GIW 2008, and those who attended the conference. GIW 2008 would not be a complete success without their enthusiastic participation.


PROGRAM COMMITTEE

Jonathan Arthur – University of Sydney, Australia; Co-Chair
See-Kiong Ng – Institute for Infocomm Research, Singapore; Co-Chair
Cathy Abbott – Flinders University, Australia
Gary Bader – University of Toronto, Canada
Vladimir Bajic – South African National Bioinformatics Institute, South Africa
Christopher Baker – Institute for Infocomm Research, Singapore
Guillaume Bourque – Genome Institute of Singapore, Singapore
Jung-Hsien Chiang – National Cheng Kung University, Taiwan
Francis YL Chin – University of Hong Kong, Hong Kong
Peter Clote – Boston College, USA
Aaron Darling – University of Queensland, Australia
Bhaskar DasGupta – University of Illinois, USA
Colin Dewey – University of Wisconsin, USA
Chris Ding – University of Texas at Arlington, USA
Roland Dunbrack – Fox Chase Cancer Center, USA
Jenny Graves – Australian National University, Australia
Win Hide – South African National Bioinformatics Institute, South Africa
Tamas Horvath – University of Bonn and Fraunhofer IAIS, Germany
Wen-Lian Hsu – Academia Sinica, Taiwan
Seiya Imoto – University of Tokyo, Japan
Lars Jermiin – University of Sydney, Australia
Minoru Kanehisa – Kyoto University, Japan
George Karypis – University of Minnesota, USA
Uri Keich – Cornell University, USA
Daisuke Kihara – Purdue University, USA
Edda Klipp – Max Planck Institute for Molecular Genetics, Germany
Stefen Kramer – Technische Universitat Munchen, Germany
Dong-Yup Lee – Bioprocessing Institute & National University of Singapore, Singapore
Sang Yup Lee – KAIST, Korea


Ming Li – University of Waterloo, Canada
Frederique Lisacek – Swiss Institute of Bioinformatics, Switzerland
Hiroshi Mamitsuka – Kyoto University, Japan
Aleksandar Milosavljevic – Baylor College of Medicine, USA
Satoru Miyano – University of Tokyo, Japan
Bernard Moret – Swiss Federal Institute of Technology, Switzerland
Shin-ichi Morishita – University of Tokyo, Japan
Pablo Moscato – University of Newcastle, Australia
William Stafford Noble – University of Washington, USA
Laxmi Parida – IBM T. J. Watson Research Center, USA
Ron Pinter – Technion, Israel
Shoba Ranganathan – Macquarie University, Australia
Allen Rodrigo – University of Auckland, New Zealand
Rintaro Saito – Keio University, Japan
Yasubumi Sakakibara – Keio University, Japan
Christian Schonbach – Nanyang Technological University, Singapore
Tetsuo Shibuya – University of Tokyo, Japan
Mona Singh – Princeton University, USA
Wing Kin Sung – National University of Singapore, Singapore
Koji Tsuda – Max Planck Institute for Biological Cybernetics, Germany
Alfonso Valencia – Universidad Autonoma, Spain
Gabriel Valiente – Technical University of Catalonia, Spain
Jean-Philippe Vert – Ecole des Mines de Paris, France
Lusheng Wang – The City University of Hong Kong, Hong Kong
Marc Wilkins – University of New South Wales, Australia
Michael Wise – University of Western Australia, Australia
Ying Xu – University of Georgia, USA
Gwan-Su Yi – Information & Communications University, Korea
Mohammed J. Zaki – Rensselaer Polytechnic Institute, USA

CO-REVIEWERS

Satya Arjunan
Hong-Jie Dai
Kevin DeRonne
Jun-tao Guo
Kosuke Hashimoto
Rajaraman Kanagasabai
Chris Kauffman
Ian Menz
Nini Rao
Tadahiko Sakiyama
Michael Shmoish
Michihiro Tanaka
Haibao Tang
Katsuyuki Yugi


STEERING COMMITTEE

Minoru Kanehisa – Kyoto University, Japan
Satoru Miyano – University of Tokyo, Japan
Mark Ragan – University of Queensland, Australia
Toshihisa Takagi – University of Tokyo, Japan
Limsoon Wong – National University of Singapore, Singapore

CONFERENCE CHAIR

Mark Ragan – University of Queensland, Australia

NATIONAL ORGANIZING COMMITTEE

Cathy Abbott – Flinders University, Australia
Jonathan Arthur – University of Sydney, Australia
Tim Bailey – University of Queensland, Australia
Mark Baker – Australian Proteome Analysis Facility, Australia
Jeremy Barker – Queensland Facility for Advanced Bioinformatics, Australia
Matthew Bellgard – Murdoch University, Australia
Kevin Burrage – University of Queensland, Australia
Phoebe Chen – Deakin University, Australia
Ross Coppel – Monash University, Australia
Brian Dalrymple – CSIRO Livestock Industries, Australia
Simon Easteal – Australian National University, Australia
Dave Edwards – Australian Centre for Plant Functional Genomics, Australia
Sue Forrest – Australian Genome Research Facility, Australia
Bruno Gaeta – University of New South Wales, Australia
Jenny Graves – Australian National University, Australia
David Hansen – Australian e-Health Research Centre, Australia
James Hogan – Queensland University of Technology, Australia
Jonathan Keith – Queensland University of Technology, Australia
Vladimir Likic – University of Melbourne & Bio21, Australia


Tim Littlejohn – IBM Australia, Australia
John Mattick – University of Queensland, Australia
Geoff McLachlan – University of Queensland, Australia
Annette McGrath – Australian Genome Research Facility, Australia
David Mitchell – CSIRO CMIS, Australia
Pablo Moscato – University of Newcastle, Australia
Tuan Pham – James Cook University, Australia
Michael Poidinger – Johnson & Johnson, Australia
Mark Ragan – University of Queensland, Australia
Shoba Ranganathan – Macquarie University, Australia
Allen Rodrigo – University of Auckland, New Zealand
Rohan Teasdale – University of Queensland, Australia
Mervyn Thomas – Emphron Informatics, Australia
Matthew Wakefield – Walter & Eliza Hall Institute of Medical Research, Australia
Marc Wilkins – University of New South Wales, Australia
Sue Wilson – Australian National University, Australia
Michael Wise – University of Western Australia, Australia
Xiaofang Zhou – University of Queensland, Australia
Albert Zomaya – University of Sydney, Australia

LOCAL ORGANIZING COMMITTEE

Mark Ragan – University of Queensland, Australia
Tim Bailey – University of Queensland, Australia
Mikael Boden – University of Queensland, Australia
Brian Dalrymple – CSIRO Livestock Industries, Australia
Dave Edwards – Australian Centre for Plant Functional Genomics, Australia
James Hogan – Queensland University of Technology, Australia
Rohan Teasdale – University of Queensland, Australia


CONTENTS

Preface

Acknowledgments

Committees

Part A Full Papers

An Approach to Transcriptome Analysis of Non-Model Organisms Using Short-Read Sequences
    L. J. Collins, P. Biggs, C. Voelckel & S. Joly

Factoring Local Sequence Composition in Motif Significance Analysis
    P. Ng & U. Keich

A New Model of Multi-Marker Correlation for Genome-Wide Tag SNP Selection
    W.-B. Wang & T. Jiang

Phenotype Profiling of Single Gene Deletion Mutants of E. coli Using Biolog Technology
    Y. Tohsato & H. Mori

Improved Algorithms for Enumerating Tree-Like Chemical Graphs with Given Path Frequency
    Y. Ishida, L. Zhao, H. Nagamochi & T. Akutsu

BSAlign: A Rapid Graph-Based Algorithm for Detecting Ligand-Binding Sites in Protein Structures
    Z. Aung & J. C. Tong

Protein Complex Prediction Based on Mutually Exclusive Interactions in Protein Interaction Network
    S. H. Jung, W.-H. Jang, H.-Y. Hur, B. Hyun & D.-S. Han

On the Reconstruction of the Mus musculus Genome-Scale Metabolic Network Model
    L.-E. Quek & L. Nielsen

Predicting Differences in Gene Regulatory Systems by State Space Models
    R. Yamaguchi, S. Imoto, M. Yamauchi, M. Nagasaki, R. Yoshida, T. Shimamura, Y. Hatanaka, K. Ueno, T. Higuchi, N. Gotoh & S. Miyano

Exploratory Simulation of Cell Ageing Using Hierarchical Models
    M. Cvijovic, H. Soueidan, D. J. Sherman, E. Klipp & M. Nikolski

Inferring Differential Leukocyte Activity from Antibody Microarrays Using a Latent Variable Model
    J. W. K. Ho, R. Koundinya, T. S. Caetano, C. G. dos Remedios & M. A. Charleston

Assessing and Predicting Protein Interactions Using Both Local and Global Network Topological Metrics
    G. Liu, J. Li & L. Wong

Modelling the Evolution of Protein Coding Sequences Sampled from Measurably Evolving Populations
    M. Goode, S. Guindon & A. Rodrigo

A Phylogenomic Approach for Studying Plastid Endosymbiosis
    A. Moustafa, C. X. Chan, M. Danforth, D. Zear, H. Ahmed, N. Jadhav, T. Savage & D. Bhattacharya

Cis-Regulatory Element Based Gene Finding: An Application in Arabidopsis thaliana
    Y. Li, Y. Zhu, Y. Liu, Y. Shu, F. Meng, Y. Lu, B. Liu, X. Bai & D. Guo

Using Simple Rules on Presence and Positioning of Motifs for Promoter Structure Modeling and Tissue-Specific Expression Prediction
    A. Vandenbon & K. Nakai

Improving Gene Expression Cancer Molecular Pattern Discovery Using Nonnegative Principal Component Analysis
    X. Han

Simulation Analysis for the Effect of Light-Dark Cycle on the Entrainment in Circadian Rhythm
    N. Mitou, Y. Ikegami, H. Matsuno, S. Miyano & S. T. Inouye

Part B Keynote Addresses

Sequencing the Transcriptome in toto
    S. Grimmond

Modern Homology Search
    M. Li

Modeling Human Genome-Wide Combinatorial Regulatory Networks Initiated by Transcription Factors and microRNAs Using Forward and Reverse Engineering
    Y.-X. Li

Reconstructing the Circuits of Disease: From Molecular States to Physiological States
    E. E. Schadt

The Emerging Generalizations of Prokaryotic Genomics
    E. V. Koonin

A New Understanding of the Human Genome
    J. Mattick

Author Index


PART A

Full Papers


AN APPROACH TO TRANSCRIPTOME ANALYSIS OF NON-MODEL ORGANISMS USING SHORT-READ SEQUENCES

Transcriptome analysis using high-throughput short-read sequencing technology is straightforward when the sequenced genome is from the same species as, or is extremely similar to, the reference genome. We present an analysis approach for when the sequenced organism does not have an already sequenced genome that can be used as a reference, as will be the case for many non-model organisms. As proof of concept, data from Solexa sequencing of the polyploid plant Pachycladon enysii were analysed using our approach, with its nearest model reference genome being that of the diploid plant Arabidopsis thaliana. By using a combination of mapping and de novo assembly tools we could determine duplicate genes belonging to one or other of the genome copies. Our approach demonstrates that transcriptome analysis using high-throughput short-read sequencing need not be restricted to the genomes of model organisms.

1. Manuscript Information

The camera-ready text for this paper can be found in 01 Collins.pdf. The paper contains 12 pages in total.


Factoring local sequence composition in motif significance analysis

We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders [16]. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler [18] with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis [23] of the Harbison genome-wide binding location data [9]. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

1. Manuscript Information

The camera-ready text for this paper can be found in 06 Ng.pdf. The paper contains 12 pages in total.


A New Model of Multi-Marker Correlation for Genome-Wide Tag SNP Selection

Tag SNP selection is an important problem in computational biology and genetics because a small set of tag SNP markers may help reduce the cost of genotyping and thus genome-wide association studies. Several methods for selecting a smallest possible set of tag SNPs based on different formulations of tag SNP selection (block-based or genome-wide) and mathematical models of marker correlation have been investigated in the literature. In this paper, we propose a new model of multi-marker correlation for genome-wide tag SNP selection, and a simple greedy algorithm to select a smallest possible set of tag SNPs according to the model. Our experimental results on several real datasets from the HapMap project demonstrate that the new model yields more succinct tag SNP sets than the previous methods.
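The abstract does not spell out the multi-marker correlation model, so the following is not the authors' method. It is only a minimal Python sketch of the general idea of greedy tag SNP selection, using a simpler, commonly used pairwise criterion as a stand-in: a SNP counts as covered when its squared correlation (r^2) with some chosen tag reaches a threshold. The function names `r_squared` and `greedy_tag_snps`, the 0.8 threshold, and the toy haplotypes are all illustrative assumptions.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length 0/1 allele vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return cov * cov / (var_x * var_y)


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedily pick tag SNPs until every SNP is covered.

    haplotypes: dict mapping SNP name -> list of 0/1 alleles, one per haplotype.
    Assumption (not from the paper): a SNP is covered by a tag when their
    pairwise r^2 is at least `threshold`; every SNP covers itself.
    """
    snps = list(haplotypes)
    covers = {s: {s} for s in snps}
    for a, b in combinations(snps, 2):
        if r_squared(haplotypes[a], haplotypes[b]) >= threshold:
            covers[a].add(b)
            covers[b].add(a)

    uncovered = set(snps)
    tags = []
    while uncovered:
        # Pick the SNP that covers the most still-uncovered SNPs; ties go to
        # the SNP listed first, which keeps the output deterministic.
        best = max(snps, key=lambda s: len(covers[s] & uncovered))
        tags.append(best)
        uncovered -= covers[best]
    return tags


# Toy data (hypothetical): snp1, snp2 and snp3 are perfectly correlated,
# while snp4 is only weakly correlated with the others.
toy = {
    "snp1": [0, 0, 1, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1],
    "snp3": [1, 1, 0, 0, 1, 0],
    "snp4": [0, 1, 0, 1, 0, 1],
}
print(greedy_tag_snps(toy))  # → ['snp1', 'snp4']
```

This is the classic greedy set-cover heuristic, so it gives a small tag set but not necessarily the smallest one. A multi-marker model would replace the pairwise coverage test with one that lets several tags jointly predict a SNP.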

1. Manuscript Information

The camera-ready text for this paper can be found in 19 Wang.pdf. The paper contains 15 pages in total.


This is Page 15.

Page 55: Proceedings Trim Size: 9.75in x 6.5in Text Area: 8in ... · CSIRO NICTA Queensland Cyber Infrastructure Foundation SGI Sydney Bioinformatics University of Queensland Finally, we give

August 29, 2008 11:13 WSPC - Proceedings Trim Size: 9.75in x 6.5in GIW2008˙master

42

PHENOTYPE PROFILING OF SINGLE GENE DELETION MUTANTS OF E. COLI USING BIOLOG TECHNOLOGY

Phenotype MicroArray (PM) technology is a high-throughput phenotyping system [1] that is directly applicable to assaying the effects of genetic changes in cells. In this study, we performed a comprehensive PM analysis of single gene deletion mutants of central metabolic pathway and related genes. To elucidate the structure of the central metabolic network in Escherichia coli K-12, we focused on 288 different PM conditions of carbon and nitrogen sources and performed bioinformatic analysis. For data processing, we employed noise reduction procedures. The distance between each pair of mutants was defined by the Manhattan distance, and Ward’s agglomerative hierarchical method was applied for clustering analysis. As a result, five clusters were revealed, which corresponded to activation or repression of cellular respiratory activities. Furthermore, the results suggest that glyceraldehyde-3P may play a key role as a molecular switch of the central metabolic network.
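The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes that each mutant is represented by a numeric profile over the PM conditions and that the Lance-Williams update for Ward's criterion is applied to squared Manhattan distances; the abstract does not state which variant was used. The names `manhattan`, `ward_clusters` and the toy profiles are hypothetical.

```python
from itertools import combinations


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ward_clusters(profiles, n_clusters):
    """Agglomerate profiles (dict: name -> list of floats) into n_clusters sets.

    Assumption: the Lance-Williams update for Ward's criterion is applied
    to squared Manhattan distances; the paper may use a different variant.
    """
    names = list(profiles)
    clusters = {i: {name} for i, name in enumerate(names)}
    # Squared dissimilarity between every pair of current clusters.
    d2 = {
        frozenset((i, j)): manhattan(profiles[names[i]], profiles[names[j]]) ** 2
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > n_clusters:
        pair = min(d2, key=d2.get)  # closest pair of clusters
        i, j = tuple(pair)
        ni, nj = len(clusters[i]), len(clusters[j])
        dij = d2.pop(pair)
        updated = {}
        for m in clusters:
            if m in (i, j):
                continue
            nm = len(clusters[m])
            dim = d2.pop(frozenset((i, m)))
            djm = d2.pop(frozenset((j, m)))
            # Lance-Williams formula for Ward's minimum-variance criterion.
            updated[frozenset((next_id, m))] = (
                (ni + nm) * dim + (nj + nm) * djm - nm * dij
            ) / (ni + nj + nm)
        d2.update(updated)
        clusters[next_id] = clusters.pop(i) | clusters.pop(j)
        next_id += 1
    return list(clusters.values())


# Hypothetical respiration profiles of four mutants over three PM conditions.
profiles = {
    "mutA": [1.0, 1.0, 0.9],
    "mutB": [0.9, 1.1, 1.0],
    "mutC": [0.1, 0.2, 0.0],
    "mutD": [0.0, 0.1, 0.1],
}
clusters = ward_clusters(profiles, 2)
```

In this toy input the two mutants with high values end up in one cluster and the two with low values in the other. Ward's criterion is formally defined for Euclidean distances, so combining it with the Manhattan distance should be read as a heuristic.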

1. Manuscript Information

The camera-ready text for this paper can be found in 36 Tohsato.pdf. The paper contains 11 pages in total. This is Page 1.

44 Tohsato & Mori

IMPROVED ALGORITHMS FOR ENUMERATING TREE-LIKE CHEMICAL GRAPHS WITH GIVEN PATH FREQUENCY

This paper considers the problem of enumerating all non-isomorphic tree-like chemical graphs with a given path frequency, where “tree-like” means that the graph can be viewed as a tree if multiple edges (i.e., edges with the same end points) and a benzene ring are treated as one edge and one vertex, respectively, and “path frequency” is a vector of the numbers of specified vertex-labeled paths that must be realized in every output. This and related problems have several potential applications, such as classification of chemical compounds, structure determination using mass spectra and/or NMR, and design of novel chemical compounds. Several studies have addressed this problem. Recently, Fujiwara et al. (2008) presented two formulations and, for each of them, gave a branch-and-bound algorithm that combines efficient enumeration of non-isomorphic trees with bounding operations based on the path frequency and the atom-atom bonds to avoid generating invalid trees. In this paper, based on their work and a result of Nagamochi (2006), we introduce two new bounding operations, the detachment-cut and the Hcut, to further reduce the size of the search space. We performed computational experiments to compare our proposed algorithms with those of Fujiwara et al. (2008) using chemical compound data obtained from the KEGG LIGAND database (http://www.genome.jp/kegg/ligand.html). The results show that our proposed algorithms are much faster than theirs.
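To make the notion of path frequency concrete, the following sketch counts the label sequences of all simple paths with at most a given number of edges in a small vertex-labeled tree. It is only an illustration under assumptions: each path is counted once per reading direction, hydrogens are omitted, and the function name `path_frequency` is hypothetical; the exact convention used in the paper is not given in the abstract.

```python
from collections import Counter


def path_frequency(labels, edges, max_len):
    """Count label sequences of simple paths with at most max_len edges.

    labels: dict vertex -> atom label; edges: list of (u, v) pairs.
    Assumption: a path is counted once for each direction it can be read in.
    """
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    freq = Counter()

    def extend(path):
        # Record the label sequence of the current path.
        freq[tuple(labels[v] for v in path)] += 1
        if len(path) - 1 == max_len:
            return
        for w in adj[path[-1]]:
            if w not in path:  # keep the path simple
                extend(path + [w])

    for v in labels:
        extend([v])
    return freq


# Heavy-atom skeleton C-C-O (ethanol without hydrogens), paths of up to one edge.
freq = path_frequency({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)], max_len=1)
```

The enumeration problem in the paper runs in the opposite direction: given such a vector, list every tree-like chemical graph that realizes it.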

1. Manuscript Information

The camera-ready text for this paper can be found in 10 Ishida.pdf. The paper contains 12 pages in total. This is Page 1.

54 Ishida, Zhao, Nagamochi & Akutsu

BSAlign: A Rapid Graph-Based Algorithm for Detecting Ligand-Binding Sites in Protein Structures

Detection of ligand-binding sites in protein structures is a crucial task in structural bioinformatics, and has applications in important areas like drug discovery. Given the knowledge of the site in a particular protein structure that binds to a specific ligand, we can search for similar sites in other protein structures to which the same ligand is likely to bind. In this paper, we propose a new method named “BSAlign” (Binding Site Aligner) for rapid detection of potential binding site(s) in the target protein(s) that is/are similar to the query protein’s ligand-binding site. We represent both the binding site and the protein structure as graphs, and employ a subgraph isomorphism algorithm to detect the similarities of the binding sites in a very time-efficient manner. Preliminary experimental results show that the proposed BSAlign binding site detection method is about 14 times faster than a well-known method called SiteEngine, while offering the same level of accuracy. Both BSAlign and SiteEngine achieve 60% search accuracy in finding adenine-binding sites from a data set of 126 proteins. The proposed method can be a useful contribution towards speed-critical applications such as drug discovery, in which a large number of proteins need to be processed. The program is available for download at: http://www1.i2r.a-star.edu.sg/ azeyar/BSAlign/.
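The idea of matching a binding-site graph inside a protein graph can be sketched with a plain backtracking search for a label-preserving subgraph embedding. This is a generic textbook-style matcher, not the BSAlign algorithm itself; the node labels (here one-letter residue codes), the meaning of an edge, and the name `find_embedding` are assumptions made for illustration.

```python
def find_embedding(q_labels, q_edges, t_labels, t_edges):
    """Map every query node to a distinct target node with the same label
    such that every query edge is also an edge in the target.

    Returns a dict query node -> target node, or None if no embedding exists.
    """
    q_adj = {u: set() for u in q_labels}
    for u, v in q_edges:
        q_adj[u].add(v)
        q_adj[v].add(u)
    t_adj = {u: set() for u in t_labels}
    for u, v in t_edges:
        t_adj[u].add(v)
        t_adj[v].add(u)
    order = list(q_labels)

    def backtrack(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        u = order[len(mapping)]  # next unmapped query node
        for cand in t_labels:
            if cand in mapping.values() or t_labels[cand] != q_labels[u]:
                continue
            # Every already-mapped neighbour of u must be adjacent to cand.
            if all(mapping[n] in t_adj[cand] for n in q_adj[u] if n in mapping):
                mapping[u] = cand
                found = backtrack(mapping)
                if found is not None:
                    return found
                del mapping[u]
        return None

    return backtrack({})


# Hypothetical query site (three residues) and target protein (five residues).
site = find_embedding(
    {"a": "D", "b": "K", "c": "G"},
    [("a", "b"), ("b", "c")],
    {1: "G", 2: "K", 3: "D", 4: "A", 5: "K"},
    [(3, 2), (2, 1), (1, 4), (4, 5)],
)
```

Plain backtracking is exponential in the worst case; a practical tool would rely on pruning and on geometric constraints that this sketch leaves out.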

1. Manuscript Information

The camera-ready text for this paper can be found in 50 Aung.pdf. The paper contains 12 pages in total. This is Page 1.

66 Aung & Tong

PROTEIN COMPLEX PREDICTION BASED ON MUTUALLY EXCLUSIVE INTERACTIONS IN PROTEIN INTERACTION NETWORK

The increasing amount of available Protein-Protein Interaction (PPI) data enables scalable methods for protein complex prediction. A protein complex is a group of two or more proteins formed by interactions that are stable over time, and it generally corresponds to a dense sub-graph in a PPI Network (PPIN). However, dense sub-graphs correspond not only to stable protein complexes but also to sets of proteins that include dynamic interactions. As a result, conventional graph-theoretic clustering methods based on the PPIN alone have high false positive rates in protein complex prediction. In this paper, we propose an approach to predict protein complexes based on the integration of PPI data and mutually exclusive interaction information drawn from structural interface data of protein domains. The extraction of the Simultaneous Protein Interaction Cluster (SPIC) is the essence of our approach, which excludes interaction conflicts in network clusters by enforcing mutual exclusion among interactions. The concept of SPIC was applied to conventional graph-theoretic clustering algorithms, MCODE and LCMA, to evaluate the density of clusters for protein complex prediction. The comparison with the original graph-theoretic clustering algorithms verified the effectiveness of our approach: SPIC-based methods refined false positives of the original methods into true positive complexes, without any loss of the true positive predictions yielded by the original methods.
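One way to picture mutually exclusive interactions is sketched below. It assumes that each interaction is annotated with the interface it occupies on each partner, and that two interactions conflict when they need the same interface of the same protein. The abstract does not give the precise definition or the SPIC extraction procedure, so the data layout, the greedy selection, and the names `conflicts` and `simultaneous_subset` are hypothetical.

```python
from itertools import combinations


def conflicts(interactions):
    """Return pairs of interactions that compete for the same interface.

    Each interaction is a pair of (protein, interface) endpoints.
    Assumption: sharing an endpoint means the two cannot occur at once.
    """
    return [
        (a, b)
        for a, b in combinations(interactions, 2)
        if set(a) & set(b)
    ]


def simultaneous_subset(interactions):
    """Greedily keep interactions that do not conflict with those already kept."""
    kept = []
    for inter in interactions:
        if not any(set(inter) & set(k) for k in kept):
            kept.append(inter)
    return kept


# Hypothetical cluster: P1 binds P2 and P3 through the same interface dA,
# and P4 through a different interface dB.
ppi = [
    (("P1", "dA"), ("P2", "dX")),
    (("P1", "dA"), ("P3", "dY")),
    (("P1", "dB"), ("P4", "dZ")),
]
exclusive_pairs = conflicts(ppi)
compatible = simultaneous_subset(ppi)
```

Here the first two interactions exclude each other, so a cluster containing P1, P2 and P3 would appear denser in the raw network than any set of interactions that can hold simultaneously.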

1. Manuscript Information

The camera-ready text for this paper can be found in 23 Jung.pdf. The paper contains 12 pages in total. This is Page 1.

78 Jung et al.

ON THE RECONSTRUCTION OF THE MUS MUSCULUS GENOME-SCALE METABOLIC NETWORK MODEL

Genome-scale metabolic modeling is a systems-based approach that attempts to capture the metabolic complexity of the whole cell, for the purpose of gaining insight into metabolic function and regulation. This is achieved by organizing the metabolic components and their corresponding interactions into a single context. The reconstruction process is a challenging and laborious task, especially during the stage of manual curation. For the mouse genome-scale metabolic model, however, we were able to rapidly reconstruct a compartmentalized model from well-curated online metabolic databases. The prototype model was comprehensive. Apart from minor compound naming and compartmentalization issues, only nine additional reactions without gene associations were added during model curation before the model was able to simulate growth in silico. Further curation led to a metabolic model that consists of 1399 genes mapped to 1757 reactions, with a total of 2037 reactions compartmentalized into the cytoplasm and mitochondria, capable of reproducing metabolic functions inferred from the literature. The reconstruction is made more tractable by developing a formal system to update the model against online databases. Effectively, we can focus our curation efforts on establishing better model annotations and gene-protein-reaction associations within the core metabolism, while relying on genome and proteome databases to build new annotations for peripheral pathways, which may bear less relevance to our modeling interest.

1. Manuscript Information

The camera-ready text for this paper can be found in 32 Quek.pdf. The paper contains 12 pages in total. This is Page 1.

90 Quek & Nielsen

Predicting Differences in Gene Regulatory Systems by State Space Models

We propose a statistical strategy to predict differentially regulated genes of case and control samples from time-course gene expression data by leveraging the unpredictability of the expression patterns from the underlying regulatory system inferred by a state space model. The proposed method can screen out genes that show different patterns but are generated by the same regulation in both samples, since these patterns can be predicted by the same model. Our strategy consists of three steps. First, a gene regulatory system is inferred from the control data by a state space model. Then the obtained model for the underlying regulatory system of the control sample is used to predict the case data. Finally, by assessing the significance of the difference between the case and predicted-case time-course data of each gene, we are able to detect the unpredictable genes that are candidates for the key differences between the regulatory systems of case and control cells. We illustrate the whole process of the strategy with an actual example, where human small airway epithelial cell gene regulatory systems were generated from novel time courses of gene expression following treatment with (case) or without (control) the drug gefitinib, an inhibitor of the epidermal growth factor receptor tyrosine kinase. In the gefitinib response data, we succeeded in finding unpredictable genes that are candidates for the specific targets of gefitinib. We also discuss differences in the regulatory systems for the unpredictable genes. The proposed method could be a promising tool for identifying biomarkers and drug target genes.
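The three-step logic (fit on control, predict case, score the mismatch) can be illustrated with a deliberately simplified stand-in. The sketch below does not implement a state space model: it fits a first-order autoregression per gene, x[t+1] ≈ a·x[t], on the control series, uses it for one-step-ahead prediction of the case series, and reports the mean squared prediction error in place of a significance test. The function names and toy series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares coefficient a for x[t+1] ~ a * x[t] (no intercept).

    Stand-in for step 1; the paper infers a state space model instead.
    """
    num = sum(x0 * x1 for x0, x1 in zip(series, series[1:]))
    den = sum(x0 * x0 for x0 in series[:-1])
    return num / den


def unpredictability(control, case):
    """Mean squared one-step-ahead error of the control model on the case data."""
    a = fit_ar1(control)                      # step 1: learn from control
    preds = [a * x for x in case[:-1]]        # step 2: predict the case series
    errors = [(p - y) ** 2 for p, y in zip(preds, case[1:])]
    return sum(errors) / len(errors)          # step 3: score the difference


# Toy time courses for one gene (hypothetical values).
control = [1.0, 2.0, 4.0, 8.0]
same_rule = [3.0, 6.0, 12.0, 24.0]   # different levels, same dynamics
new_rule = [3.0, 3.0, 3.0, 3.0]      # dynamics changed in the case sample
score_same = unpredictability(control, same_rule)
score_new = unpredictability(control, new_rule)
```

The first case series differs from the control in level but follows the same rule, so the control model predicts it exactly; only the second series, whose dynamics changed, receives a large score. This mirrors the abstract's point that differently shaped patterns produced by the same regulation are screened out.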

1. Manuscript Information

The camera-ready text for this paper can be found in 48 Yamaguchi.pdf. The paper contains 13 pages in total. This is Page 1.

102 Yamaguchi et al.

Exploratory simulation of cell ageing using hierarchical models

Thorough knowledge of the model organism S. cerevisiae has fueled efforts in developing theories of cell ageing since the 1950s. Models of these theories aim to provide insight into the general biological processes of ageing, as well as to have predictive power for guiding experimental studies such as cell rejuvenation. Current efforts in in silico modeling are frustrated by the lack of efficient simulation tools that admit precise mathematical models at both cell and population levels simultaneously. We developed a novel hierarchical simulation tool that allows dynamic creation of entities while rigorously preserving the mathematical semantics of the model. We used it to expand a single-cell model of protein damage segregation to a cell population model that explicitly tracks mother-daughter relations. Large-scale exploration of the resulting tree of simulations established that daughters of older mothers show a rejuvenation effect, consistent with experimental results. The combination of a single-cell model and a simulation platform permitting parallel composition and dynamic node creation has proved to be an efficient tool for in silico exploration of cell behavior.
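
The idea of a population model that creates cells dynamically and tracks mother-daughter relations can be sketched as a small tree simulation. This is a deterministic toy under stated assumptions, not the authors' tool or their damage segregation model: the `Cell` class, the `damage_rate` and `retention` parameters, and the rule that every cell divides once per step are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    damage: float = 0.0
    divisions: int = 0
    daughters: list = field(default_factory=list)

def step(cell, damage_rate=1.0, retention=0.75):
    """Advance one generation: accumulate damage, then divide asymmetrically."""
    # Advance existing daughters first, so a newborn is not stepped in the same call.
    for daughter in list(cell.daughters):
        step(daughter, damage_rate, retention)
    cell.damage += damage_rate
    # The mother keeps `retention` of the damage; the new daughter inherits the rest.
    daughter = Cell(damage=cell.damage * (1.0 - retention))
    cell.damage *= retention
    cell.divisions += 1
    cell.daughters.append(daughter)  # dynamic node creation in the lineage tree

def count_cells(cell):
    """Size of the lineage tree rooted at `cell`."""
    return 1 + sum(count_cells(d) for d in cell.daughters)

founder = Cell()
for _ in range(4):
    step(founder)
print(count_cells(founder), founder.divisions, founder.damage)
```

Because each daughter is stored under its mother, quantities such as the damage inherited by daughters of young versus old mothers can be read directly off the tree.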

1. Manuscript Information

The camera-ready text for this paper can be found in 31 Cvijovic.pdf. The paper contains 12 pages in total. This is Page 1.

116 Cvijovic et al.

INFERRING DIFFERENTIAL LEUKOCYTE ACTIVITY FROM ANTIBODY MICROARRAYS USING A LATENT VARIABLE MODEL

Recent development of cluster of differentiation (CD) antibody arrays has enabled expression levels of many leukocyte surface CD antigens to be monitored simultaneously. Such membrane-proteome surveys have provided a powerful means to detect changes in leukocyte activity in various human diseases, such as cancer and cardiovascular diseases. The challenge is to devise a computational method to infer differential leukocyte activity among multiple biological states based on antigen expression profiles. Standard DNA microarray analysis methods cannot accurately infer differential leukocyte activity because they often fail to take the cell-to-antigen relationships into account. Here we present a novel latent variable model (LVM) approach to tackle this problem. The idea is to model each cell type as a latent variable, and represent the class-to-cell and cell-to-antigen relationships as an LVM. Once the parameters of the LVM are learned from the data, differentially active leukocytes can be easily identified from the model. We describe the model formulation and the assumptions that lead to an efficient expectation-maximization algorithm. Our LVM method was applied to re-analyze two cardiovascular disease datasets. We show that our results match existing biological knowledge better than other methods such as gene set enrichment analysis. Furthermore, we discuss how our approach can be extended to become a general framework for gene set analysis for DNA microarrays.

1. Manuscript Information

The camera-ready text for this paper can be found in 18 Ho.pdf. The paper contains 12 pages in total. This is Page 1.

128 Ho et al.

Assessing and Predicting Protein Interactions Using Both Local and Global Network Topological Metrics

High-throughput protein interaction data, with ever-increasing volume, are becoming the foundation of many biological discoveries. However, high-throughput protein interaction data are often associated with high false positive and false negative rates. It is desirable to develop scalable methods to identify these errors. In this paper, we develop a computational method to identify spurious interactions and missing interactions from high-throughput protein interaction data. Our method uses both local and global topological information of protein pairs, and it assigns a local interacting score and a global interacting score to every protein pair. The local interacting score is calculated based on the common neighbors of the protein pairs. The global interacting score is computed using globally interacting protein group pairs. The two scores are then combined to obtain a final score called LGTweight, which indicates how likely two proteins are to interact. We tested our method on the DIP yeast interaction dataset. The experimental results show that the interactions ranked top by our method have higher functional homogeneity and localization coherence than those ranked top by existing methods, and that our method also achieves higher sensitivity and precision under 5-fold cross validation than existing methods.
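
The abstract states only that the local interacting score is based on the common neighbors of a protein pair. As a minimal sketch of that idea, the code below assumes a Jaccard-style overlap of the two neighborhoods. This is a stand-in chosen for illustration and is not the LGTweight formula, which is not given here. The function names and the toy network are hypothetical.

```python
from itertools import combinations

def neighbors(edges):
    """Build an undirected adjacency map from a list of interaction pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbor_score(adj, u, v):
    """Jaccard overlap of the two neighborhoods, excluding u and v themselves."""
    nu = adj.get(u, set()) - {v}
    nv = adj.get(v, set()) - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

# Hypothetical interaction network with five proteins.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("D", "E")]
adj = neighbors(edges)

# Score every pair, whether or not an interaction was reported for it.
scores = {pair: common_neighbor_score(adj, *pair)
          for pair in combinations(sorted(adj), 2)}
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 2))
```

Scoring unreported pairs as well as reported ones is what lets a topological score address both kinds of error: a reported pair with a low score is a candidate spurious interaction, and an unreported pair with a high score is a candidate missing interaction.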

1. Manuscript Information

The camera-ready text for this paper can be found in 49 Liu.pdf. The paper contains 12 pages in total. This is Page 1.

140 Liu, Li & Wong

Modelling the evolution of protein coding sequences sampled from Measurably Evolving Populations

Models of nucleotide or amino acid sequence evolution that implement homogeneous and stationary Markov processes of substitutions are mathematically convenient but are unlikely to represent the true complexity of evolution. With the large amounts of data that next generation sequencing promises, appropriate models of evolution are important, particularly when data are collected from ancient and sub-fossil remains, where changes in evolutionary parameters are the norm and not the exception. In this paper, we describe a new codon-based model of evolution that applies to Measurably Evolving Populations (MEPs). A MEP is defined as a population from which it is possible to detect a statistically significant accumulation of substitutions when sequences are obtained at different times. The new model of codon evolution permits changes to the substitution process, including changes to the intensity of selection and the proportions of sites undergoing different selective pressures. In our serial model of codon evolution, changes in the selective regime occur simultaneously across all lineages. Different regions of the protein may also evolve under distinct selective patterns. We illustrate the application of the new model to a dataset of HIV-1 sequences obtained from an infected individual before and after the commencement of antiretroviral therapy.

1. Manuscript Information

The camera-ready text for this paper can be found in 16 Goode.pdf. The paper contains 15 pages in total. This is Page 1.

152 Goode, Guindon & Rodrigo

A PHYLOGENOMIC APPROACH FOR STUDYING PLASTID ENDOSYMBIOSIS

Gene transfer is a major contributing factor to functional innovation in genomes. Endosymbiotic gene transfer (EGT) is a specific instance of lateral gene transfer (LGT) in which genetic materials are acquired by the host genome from an endosymbiont that has been engulfed and retained in the cytoplasm. Here we present a comprehensive approach for detecting gene transfer within a phylogenetic framework. We applied the approach to examine EGT of red algal genes into Thalassiosira pseudonana, a free-living diatom for which a complete genome sequence has recently been determined. Out of 11,390 predicted protein-coding sequences from the genome of T. pseudonana, 124 (1.1%, clustered into 80 gene families) are inferred to be of red algal origin (bootstrap support 75%). Of these 80 gene families, 22 (27.5%) encode novel, unknown functions. We found 21.3% of the gene families to putatively encode non-plastid-targeted proteins. Our results suggest that EGT of red algal genes provides a relatively minor contribution to the nuclear genome of the diatom, but the transferred genes have functions that extend beyond photosynthesis. This assertion awaits experimental validation. Whereas the current study is focused within the context of secondary endosymbiosis, our approach can be applied to large-scale detection of gene transfer in any system.
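
The proportions quoted in this abstract can be checked with a few lines of arithmetic. The counts are taken from the text above. The final line converts the reported 21.3% back into a number of gene families, which is an inference from a rounded percentage and not a figure stated in the abstract.

```python
predicted_genes = 11390   # predicted protein-coding sequences in T. pseudonana
red_algal_genes = 124     # sequences inferred to be of red algal origin
gene_families = 80        # families into which those sequences cluster
novel_families = 22       # families encoding novel, unknown functions

share_red_algal = f"{red_algal_genes / predicted_genes:.1%}"
share_novel = f"{novel_families / gene_families:.1%}"
# Implied family count behind the reported 21.3% (an inference, see lead-in).
implied_non_plastid = round(0.213 * gene_families)

print(share_red_algal, share_novel, implied_non_plastid)
```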

1. Manuscript Information

The camera-ready text for this paper can be found in 15 Moustafa.pdf. The paper contains 12 pages in total. This is Page 1.

166 Moustafa et al.

CIS-REGULATORY ELEMENT BASED GENE FINDING: AN APPLICATION IN ARABIDOPSIS THALIANA

Gene expression is largely controlled at the transcriptional level through the interactions between transcription factors and cis-regulatory elements. Using cis-regulatory motifs known to regulate plant osmotic stress response, an artificial neural network model was built to identify other functionally related genes involved in the same process. Gene Ontology enrichment analysis on the 500 top-scoring predictions showed that, except for those un-annotated ORFs (40%), 60% of the enriched GO classification was related to stress response and ABA response. RT-PCR analysis showed that 27 of the 41 tested top-scoring predictions exhibited altered expression under various stress treatments. We expect that a similar approach is widely applicable to inferring gene function in various cellular processes in different species.

1. Manuscript Information

The camera-ready text for this paper can be found in 05 Li.pdf. The paper contains 11 pages in total.


USING SIMPLE RULES ON PRESENCE AND POSITIONING OF MOTIFS FOR PROMOTER STRUCTURE MODELING AND TISSUE-SPECIFIC EXPRESSION PREDICTION

Regulation of transcription is controlled by sets of transcription factors binding specific sites in the regulatory regions of genes. It is therefore believed that regulatory regions driving similar expression profiles share some common structural features. We here introduce a computational approach for finding a small set of rules describing the presence and positioning of motifs in a set of promoter sequences. This rule set is subsequently used for finding promoters that drive similar expression profiles from a genomic set of sequences. We applied our approach to muscle-expressed genes in Caenorhabditis elegans. We obtained a high average performance, and in the best case we found that almost 50% of true positive test genes scored higher than 90% of the true negative test genes. High-scoring non-training sequences were enriched for muscle-expressed genes, and predicted motifs fitting the rules showed a significant tendency to be present in experimentally verified regulatory regions. Our model is more general than existing cis-regulatory module models, as rules selected by our model contain a variety of information, including not only proximal but also distal positioning of pairs of motifs, positioning with regard to the translation start site, and the simple presence of motifs. We believe our model can help to increase our understanding of transcription factor cooperation and transcription initiation.
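The best-case figure quoted in the abstract (almost 50% of true positive test genes scoring higher than 90% of the true negative test genes) can be read as a sensitivity at a fixed score threshold. The Python sketch below shows one way such a figure could be computed from two lists of scores. The function name, the nearest-rank choice of threshold and the toy scores are illustrative assumptions, not the authors' evaluation code.

```python
def fraction_above_negatives(pos_scores, neg_scores, percent=90):
    """Fraction of positive scores that exceed `percent`% of the negative scores.

    Hypothetical helper: the threshold is the nearest-rank percentile of the
    negative scores, which is only one of several possible conventions.
    """
    ranked = sorted(neg_scores)
    # Nearest-rank percentile: smallest score with at least `percent`% of the
    # negatives at or below it (integer ceiling division avoids float rounding).
    rank = -(-percent * len(ranked) // 100)
    threshold = ranked[max(rank, 1) - 1]
    return sum(score > threshold for score in pos_scores) / len(pos_scores)


# Toy scores, invented purely for illustration.
negatives = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
positives = [9.5, 12, 3, 8]
print(fraction_above_negatives(positives, negatives))
```

With these toy values the threshold is 9, so two of the four positives pass and the function returns 0.5. Other percentile conventions would shift the threshold slightly on small samples.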

1. Manuscript Information

The camera-ready text for this paper can be found in 29 Vandenbon.pdf. The paper contains 12 pages in total.


IMPROVING GENE EXPRESSION CANCER MOLECULAR PATTERN DISCOVERY USING NONNEGATIVE PRINCIPAL COMPONENT ANALYSIS

Robust cancer molecular pattern identification from microarray data not only plays an essential role in modern clinical oncology, but also presents a challenge for statistical learning. Although principal component analysis (PCA) is a widely used feature selection algorithm in microarray analysis, its holistic mechanism prevents it from capturing the latent local data structure in the subsequent cancer molecular pattern identification. In this study, we investigate the benefit of enforcing non-negativity constraints on PCA and propose a nonnegative principal component analysis (NPCA) based classification algorithm for cancer molecular pattern analysis of gene expression data. This novel algorithm conducts classification by classifying meta-samples of input cancer data with support vector machines (SVM) or other classic supervised learning algorithms. The meta-samples are low-dimensional projections of the original cancer samples in a purely additive meta-gene subspace generated from the NPCA-induced nonnegative matrix factorization (NMF). We report strongly leading classification results from the NPCA-SVM algorithm in cancer molecular pattern identification for five benchmark gene expression datasets under 100 trials of 50% hold-out cross-validation and leave-one-out cross-validation. We demonstrate the superiority of the NPCA-SVM algorithm by direct comparison with seven classification algorithms (SVM, PCA-SVM, KPCA-SVM, NMF-SVM, LLE-SVM, PCA-LDA and k-NN) on the five cancer datasets in classification rates, sensitivities and specificities. Our NPCA-SVM algorithm overcomes the over-fitting problem associated with SVM-based classification of gene expression data under a Gaussian kernel. As a more robust high-performance classifier, NPCA-SVM can be used to replace the general SVM and k-NN classifiers in cancer biomarker discovery to capture more meaningful oncogenes.

1. Manuscript Information

The camera-ready text for this paper can be found in 17 Han.pdf. The paper contains 12 pages in total.


Simulation Analysis for the Effect of Light-Dark Cycle on the Entrainment in Circadian Rhythm

Circadian rhythms of living organisms are 24 hr oscillations found in behavior, biochemistry and physiology. Under constant conditions, the rhythms continue with their intrinsic period length, which is rarely exactly 24 hr. In this paper, we examine the effects of light on the phase of the gene expression rhythms derived from the interacting feedback network of a few clock genes, taking advantage of a computer simulation with Cell Illustrator. The simulation results suggested that the interacting circadian feedback network at the molecular level is essential for the phase dependence of the light effects observed in mammalian behavior. Furthermore, the simulation reproduced the biological observations that the range of entrainment to light-dark cycles shorter or longer than 24 hr is limited, centering around 24 hr. Application of our model to inter-time zone flight successfully demonstrated that 6 to 7 days are required to recover from jet lag when traveling from Tokyo to New York.

1. Manuscript Information

The camera-ready text for this paper can be found in 07 Mitou.pdf. The paper contains 12 pages in total.


PART B

Keynote Addresses


SEQUENCING THE TRANSCRIPTOME IN TOTO

SEAN M. GRIMMOND

[email protected]

Expression Genomics Laboratory

Institute for Molecular Bioscience

University of Queensland, AUSTRALIA

Abstract

Since the sequencing of the mouse and human genomes, there has been a concerted effort to define their complete transcriptional output. EST, full-length cDNA sequencing, and transcriptome annotation efforts by FANTOM, ENCODE and other consortia surveyed mammalian expression space, revealing that loci on average generate 6-10 transcripts. Alternative promoters, splicing and 3' UTRs are commonplace.

While these data have provided an excellent atlas of what can be generated from mammalian genomes, we have not had, until recently, the right genomic tools to place this transcriptional complexity into a biological context. Array-based profiling has been an excellent tool for assessing overall gene activity, but lacks the sensitivity and resolution required to study complete transcriptome content.

RNA sequencing (RNAseq) has recently been demonstrated in several eukaryotic species and is redefining our understanding of mRNA transcriptome content and mRNA dynamics, all at single-nucleotide resolution. We have developed methods for performing multi-gigabase shotgun sequencing of human and mouse transcriptomes, have developed approaches to assess locus activity, and have demonstrated the improved sensitivity of RNAseq relative to the current "gold standard" array platforms. We also use RNAseq to assess the expression levels of variant transcripts via diagnostic sequences. Thirdly, we are able to perform genome-wide transcriptome discovery. Finally, we have also established approaches to identify alterations to the reference sequence content, allowing us to search for expressed polymorphisms, mutations or events such as RNA editing.

These data are combined with RNAseq surveys of other fractions of the transcriptome (i.e. small RNA and polysome-associated RNAs) to gain a fuller picture of coding and functional RNA content. This is being used to define, at unprecedented resolution, the transcriptional networks driving specific biological states.


References

[1] Cloonan N, Forrest ARR, Kolle G, Gardiner BBA, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G et al.: Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Meth 2008, 5(7):613–619.


MODERN HOMOLOGY SEARCH

MING LI

[email protected]

School of Computer Science

University of Waterloo

Waterloo, CANADA

Abstract

Dynamic programming [1] has full sensitivity, but is too slow for large-scale homology search. FASTA / BLAST type heuristics [2] trade sensitivity for speed. Can we have both sensitivity and speed?

We present the mathematical theory of optimized spaced seeds, which allows modern homology search to achieve high sensitivity and high speed simultaneously. The spaced seed methodology is implemented in our PatternHunter software [3, 4], as well as in many other modern homology search programs, serving thousands of queries daily.
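To make the spaced seed idea concrete, the Python sketch below indexes a target sequence by the characters found at the "1" positions of a seed pattern and reports query positions that agree with the target at those positions only. It is a minimal illustration under stated assumptions: the function names, the toy seed `110101` and the toy sequences are invented here, and the code is not the PatternHunter implementation, nor is the seed an optimized one.

```python
from collections import defaultdict


def seed_key(seq, start, seed):
    """Characters of seq at the '1' positions of seed, for a window at start."""
    return "".join(seq[start + i] for i, flag in enumerate(seed) if flag == "1")


def spaced_seed_hits(query, target, seed="110101"):
    """Return (query_pos, target_pos) pairs matching at every '1' of seed.

    Positions marked '0' are "don't care" positions and may mismatch.
    """
    span = len(seed)
    # Index every window of the target by its spaced key.
    index = defaultdict(list)
    for t in range(len(target) - span + 1):
        index[seed_key(target, t, seed)].append(t)
    # Look up every window of the query in that index.
    hits = []
    for q in range(len(query) - span + 1):
        for t in index.get(seed_key(query, q, seed), []):
            hits.append((q, t))
    return hits


# Toy sequences that differ at two of six positions.
print(spaced_seed_hits("ACGTAC", "ACTTCC", seed="110101"))  # spaced, weight 4
print(spaced_seed_hits("ACGTAC", "ACTTCC", seed="1111"))    # contiguous, weight 4
```

In this toy case the spaced seed reports a hit at (0, 0) while the contiguous seed of the same weight reports none, because the two mismatches fall on "don't care" positions. In a real search, each hit would then be extended and scored by alignment.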

The theory is then extended and implemented in ZOOM [5] to perform fast genome-scale read mapping for second-generation sequencers.

Joint work with Bin Ma, John Tromp, D. Kisman, Hao Lin, and Zefeng Zhang.

References

[1] T.F. Smith, M.S. Waterman. Identification of common molecular subsequences. J Mol Biol, 147(1981), 195–197.

[2] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman. Basic local alignment search tool. J Mol Biol, 215:3(1990), 403–410.

[3] B. Ma, J. Tromp, M. Li. PatternHunter: faster and more sensitive homology search. Bioinformatics, 18:3(2002), 440–445.

[4] M. Li, B. Ma, D. Kisman, J. Tromp. PatternHunter II: highly sensitive and fast homology search. J. Bioinformatics and Computational Biology, 2:3(2004), 417–440.

[5] H. Lin, Z. Zhang, M.Q. Zhang, B. Ma, M. Li. ZOOM! Zillions of oligos mapped. Bioinformatics. In press. 2008.


MODELING HUMAN GENOME-WIDE COMBINATORIAL REGULATORY NETWORKS INITIATED BY TRANSCRIPTION FACTORS AND MICRORNAS USING FORWARD AND REVERSE ENGINEERING

YI-XUE LI

[email protected]

Shanghai Center for Bioinformation Technology

and

Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, CHINA

Abstract

MicroRNAs are short endogenous non-coding transcripts which regulate their target mRNAs by translational inhibition or mRNA degradation. Recent microRNA transfection experiments show strong evidence that microRNAs influence not only their target but also non-target genes, but how the regulatory signals are transduced from microRNAs to the downstream genes remains to be elucidated. We suspect that primary and secondary regulatory mechanisms, initially triggered by microRNAs, form refined local networks in the cell. In light of this hypothesis, a comprehensive strategy was developed to reconstruct combinatory networks of primary and secondary microRNA regulatory cascades, using microRNA target and non-target gene expression profiles and information on microRNA-regulated transcription factors (TFs) and TF-regulated genes. This strategy was then applied to 53 microRNA transfection expression datasets and led to the discovery of combinatorial regulatory networks triggered by 20 microRNAs. Many of these networks were enriched with genes whose functional roles were consistent with known regulatory roles of microRNAs. More importantly, a tumor-related regulatory network and related pathways were discovered, in which novel discoveries were integrated with existing knowledge on the regulatory mechanisms of four microRNAs. In the network, by activating the mir-34 family, the tumor suppressor gene p53 can inhibit five target oncogenes, four of which have never been reported. Our approach was carried out on a sizeable number of public microRNA transfection experiment datasets, enabling a global view of combinatory regulatory networks triggered by microRNAs. By reconstructing microRNA-triggered combinatory regulatory networks, the work helps identify the true degradation targets of mammalian microRNAs and, more importantly, aids the fundamental understanding of microRNA-related biological functional processes.


RECONSTRUCTING THE CIRCUITS OF DISEASE: FROM MOLECULAR STATES TO PHYSIOLOGICAL STATES

ERIC E. SCHADT

eric [email protected]

Department of Genetics

Rosetta Inpharmatics, LLC/Merck Research Labs, USA

Abstract

Common human diseases and drug response are complex traits that involve entire networks of changes at the molecular level driven by genetic and environmental perturbations. Efforts to elucidate disease and drug response traits have focused on single dimensions of the system. Studies focused on identifying changes in DNA that correlate with changes in disease or drug response traits, changes in gene expression that correlate with disease or drug response traits, or changes in other molecular traits (e.g., metabolite, methylation status, protein phosphorylation status, and so on) that correlate with disease or drug response are fairly routine and have met with great success in many cases. However, to further our understanding of the complex network of molecular and cellular changes that impact disease risk, disease progression, severity, and drug response, these multiple dimensions must be considered together. Here I present an approach for integrating a diversity of molecular and clinical trait data to uncover models that predict complex system behavior. By integrating diverse types of data on a large scale I demonstrate that some forms of common human diseases are most likely the result of perturbations to specific gene networks that in turn cause changes in the states of other gene networks, both within and between tissues, that drive biological processes associated with disease. These models not only elucidate primary drivers of disease and drug response, but also provide a context within which to interpret biological function, beyond what could be achieved by looking at one dimension alone. That some forms of common human diseases are the result of complex interactions among networks has significant implications for drug discovery: designing drugs or drug combinations to impact entire network states rather than designing drugs that target specific disease-associated genes.


THE EMERGING GENERALIZATIONS OF PROKARYOTIC GENOMICS

EUGENE V. KOONIN

[email protected]

National Center for Biotechnology Information

National Library of Medicine

National Institutes of Health, Bethesda MD, USA

Abstract

The first bacterial genome was sequenced in 1995, and the first archaeal genome in 1996. Soon after these breakthroughs, an exponential rate of genome sequencing was established, with a doubling time of approximately 18 months for bacteria and approximately 34 months for archaea. Comparative analysis of the hundreds of sequenced bacterial and dozens of archaeal genomes leads to several generalizations on the principles of genome organization and evolution. A crucial finding that enables functional characterization of the sequenced genomes and evolutionary reconstruction is that the majority of archaeal and bacterial genes have conserved orthologs in other, often distant, organisms. However, comparative genomics also shows that horizontal gene transfer (HGT) is a dominant force of prokaryotic evolution, along with the loss of genetic material resulting in genome streamlining. A crucial component of the prokaryotic world is the mobilome, the enormous collection of viruses, plasmids and other selfish elements which are in constant exchange with more stable chromosomes and serve as HGT vehicles. Thus, the prokaryotic genome space is a tightly connected, although compartmentalized, network, a new notion that undermines the “Tree of Life” model of evolution and requires a new conceptual framework and tools for the study of prokaryotic evolution.


A NEW UNDERSTANDING OF THE HUMAN GENOME

JOHN MATTICK

[email protected]

Institute for Molecular Bioscience

University of Queensland, AUSTRALIA

Abstract

It appears that the genetic programming of mammals and other complex organisms has been misunderstood for the past 50 years, because of the assumption, largely true in prokaryotes but not in complex eukaryotes, that most genetic information is transacted by proteins. The numbers of protein-coding genes do not change appreciably across the metazoa, whereas the relative proportion of non-protein-coding sequences increases markedly. Moreover, while only a tiny fraction encodes proteins, it is now evident that the majority of the mammalian genome is transcribed in a developmentally regulated manner, and that most complex genetic phenomena in eukaryotes are RNA-directed. Evidence will be presented that (i) regulatory information scales quadratically with functional complexity and hence the majority of the genomes of the higher organisms comprises regulatory information; (ii) there are thousands of non-protein-coding transcripts in mammals that are dynamically expressed during differentiation and development, including in embryonal stem cell and neuronal cell differentiation, and T-cell and macrophage activation, among others, many of which show precise expression patterns and subcellular localization in the brain; (iii) many 3’UTRs are not only linked to but are also expressed in a regulated manner separately from their associated protein-coding sequences to transmit genetic information in trans; (iv) there are large numbers of small RNAs, including new classes, expressed from the human and mouse genomes, that may be discerned from bioinformatic analysis of genomic and deep sequencing transcriptomic datasets; and (v) much, if not most, of the mammalian genome may not be evolving neutrally, but rather is composed of different types of sequences (including transposon-derived sequences) that are evolving at different rates under different selection pressures and different structure-function constraints. There is also genome-wide evidence of editing of noncoding RNA sequences, especially in the brain and especially in humans (Alu elements), which may constitute a key part of the molecular basis of memory and cognition. Taken together, these and other observations suggest that the majority of the human genome is devoted to a very sophisticated RNA regulatory system that directs developmental trajectories and mediates gene-environment interactions via the control of chromatin architecture and epigenetic memory, transcription, splicing, RNA modification and editing, mRNA translation and RNA stability.


AUTHOR INDEX

Ahmed, H., 165
Akutsu, T., 53
Aung, Z., 65

Bai, X., 177
Bhattacharya, D., 165
Biggs, P. J., 3

Caetano, T. S., 126
Chan, C. X., 165
Charleston, M. A., 126
Collins, L. J., 3
Cvijovic, M., 114

Danforth, M., 165
dos Remedios, C. G., 126

Goode, M., 150
Gotoh, N., 101
Grimmond, S. M., 227
Guindon, S., 150
Guo, D., 177

Han, D.-S., 77
Han, X., 200
Hatanaka, Y., 101
Higuchi, T., 101
Ho, J. W. K., 126
Hur, H.-Y., 77
Hyun, B., 77

Ikegami, Y., 212
Imoto, S., 101
Inouye, S. T., 212
Ishida, Y., 53

Jadhav, N., 165
Jang, W.-H., 77
Jiang, T., 27
Joly, S., 3
Jung, S. H., 77

Keich, U., 15
Klipp, E., 114
Koonin, E. V., 232
Koundinya, R., 126

Li, J., 138
Li, M., 229
Li, Y., 177
Li, Y.-X., 230
Liu, B., 177
Liu, G., 138
Liu, Y., 177
Lu, Y., 177

Matsuno, H., 212
Mattick, J., 233
Meng, F., 177
Mitou, N., 212
Miyano, S., 101, 212
Mori, H., 42
Moustafa, A., 165

Nagamochi, H., 53
Nagasaki, M., 101
Nakai, K., 188
Ng, P., 15
Nielsen, L., 89
Nikolski, M., 114

Quek, L.-E., 89

Rodrigo, A., 150

Savage, T., 165
Schadt, E. E., 231
Sherman, D. J., 114
Shimamura, T., 101
Shu, Y., 177
Soueidan, H., 114

Tohsato, Y., 42
Tong, J. C., 65

Ueno, K., 101

Vandenbon, A., 188
Voelckel, C., 3

Wang, W.-B., 27
Wong, L., 138

Yamaguchi, R., 101
Yamauchi, M., 101
Yoshida, R., 101

Zear, D., 165
Zhao, L., 53
Zhu, Y., 177