human genome center laboratory of genome database … · 2020. 6. 2. · the kegg system...

39
DNA, RNA, and proteins are the basic molecular building blocks of life, but the liv- ing cell contains additional molecules, including water, ions, small chemical com- pounds, glycans, lipids, and other biochemical molecules, without which the cell would not function. We are developing bioinformatics methods to integrate differ- ent types of data and knowledge on various aspects of the biological systems to- wards basic understanding of life as a molecular interaction/reaction system and also for practical applications in medical and pharmaceutical sciences. 1. KEGG DRUG and KEGG DISEASE Minoru Kanehisa and Michihiro Araki KEGG is a database of biological systems that integrates genomic, chemical, and systemic func- tional information. It is widely used as a refer- ence knowledge base for understanding higher- order functions and utilities of the cell or the or- ganism from genomic information. Although the basic components of the KEGG resource are de- veloped in Kyoto University, this Laboratory in the Human Genome Center is responsible for the applied areas of KEGG, especially in medi- cal and pharmaceutical sciences. We develop KEGG DRUG (http://www.genome.jp/kegg/ drug/), which is a chemical structure database for all approved drugs, associated with target information in the context of KEGG pathways, efficacy information in the context of hierarchi- cal drug classifications, etc. We also develop KEGG DISEASE (http://www.genome.jp/kegg/ disease/) as a new addition to the KEGG suite of databases. Each disease entry consists of a list of diseases genes and other lists of molecules such as environmental factors, markers, drugs, etc. Both DRUG and DISEASE are highly inte- grated with other KEGG databases including PATHWAY, BRITE, GENES, and COMPOUND, and also with other Internet resources. 2. Automatic assignments of orthologs and paralogs in complete genomes Toshiaki Katayama, Shuichi Kawashima , Akihiro Nakaya and Minoru Kanehisa The increase in the number of complete genomes has provided clues to gain useful in- sights to understand the evolution of the gene universe. Among the KEGG suites of databases, the GENES database contains more than 2.5 mil- Human Genome Center Laboratory of Genome Database Laboratory of Sequence Analysis ゲノムデータベース分野 シークエンスデータ情報処理分野 Professor Minoru Kanehisa, Ph.D. Assistant Professor Toshiaki Katayama, M.Sc. Assistant Professor Shuichi Kawashima, M.Sc. Lecturer Tetsuo Shibuya, Ph.D. Assistant Professor Michihiro Araki, Ph.D. 理学博士 理学修士 理学修士 理学博士 薬学博士 104

Upload: others

Post on 06-Mar-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

DNA, RNA, and proteins are the basic molecular building blocks of life, but the liv-ing cell contains additional molecules, including water, ions, small chemical com-pounds, glycans, lipids, and other biochemical molecules, without which the cellwould not function. We are developing bioinformatics methods to integrate differ-ent types of data and knowledge on various aspects of the biological systems to-wards basic understanding of life as a molecular interaction/reaction system andalso for practical applications in medical and pharmaceutical sciences.

1. KEGG DRUG and KEGG DISEASE

Minoru Kanehisa and Michihiro Araki

KEGG is a database of biological systems thatintegrates genomic, chemical, and systemic func-tional information. It is widely used as a refer-ence knowledge base for understanding higher-order functions and utilities of the cell or the or-ganism from genomic information. Although thebasic components of the KEGG resource are de-veloped in Kyoto University, this Laboratory inthe Human Genome Center is responsible forthe applied areas of KEGG, especially in medi-cal and pharmaceutical sciences. We developKEGG DRUG (http://www.genome.jp/kegg/drug/), which is a chemical structure databasefor all approved drugs, associated with targetinformation in the context of KEGG pathways,efficacy information in the context of hierarchi-cal drug classifications, etc. We also develop

KEGG DISEASE (http://www.genome.jp/kegg/disease/) as a new addition to the KEGG suiteof databases. Each disease entry consists of a listof diseases genes and other lists of moleculessuch as environmental factors, markers, drugs,etc. Both DRUG and DISEASE are highly inte-grated with other KEGG databases includingPATHWAY, BRITE, GENES, and COMPOUND,and also with other Internet resources.

2. Automatic assignments of orthologs andparalogs in complete genomes

Toshiaki Katayama, Shuichi Kawashima,Akihiro Nakaya and Minoru Kanehisa

The increase in the number of completegenomes has provided clues to gain useful in-sights to understand the evolution of the geneuniverse. Among the KEGG suites of databases,the GENES database contains more than 2.5 mil-

Human Genome Center

Laboratory of Genome DatabaseLaboratory of Sequence Analysisゲノムデータベース分野シークエンスデータ情報処理分野

Professor Minoru Kanehisa, Ph.D.Assistant Professor Toshiaki Katayama, M.Sc.Assistant Professor Shuichi Kawashima, M.Sc.Lecturer Tetsuo Shibuya, Ph.D.Assistant Professor Michihiro Araki, Ph.D.

教 授 理学博士 金 久 實助 教 理学修士 片 山 俊 明助 教 理学修士 川 島 秀 一講 師 理学博士 渋 谷 哲 朗助 教 薬学博士 荒 木 通 啓

104

Page 2: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

lion genes from over 600 organisms as of Octo-ber 2007. Sequence similarities among thesegenes are calculated by all-against-all SSEARCHcomparison and stored them in the SSDB data-base. Based on those databases, the ORTHOL-OGY database has been manually constructed tostore the relationships among the genes sharingthe same biological function. However, in thisstrategy, only the well known functions can beused for annotation of newly added genes, thusthe number of annotated genes is limited. Toovercome this situation, we developed a fullyautomated procedure to find candidate ortholo-gous clusters including whose current functionalannotation is anonymous. The method is basedon a graph analysis of the SSDB database, treat-ing genes as nodes and Smith-Waterman se-quence similarity scores as weight of edges. Thecluster is found by our heuristic method forfinding quasi-cliques, but the SSDB graph is toolarge to perform quasi-clique finding at a time.Therefore, we introduce a hierarchy (evolution-ary relationship) of organisms and treat theSSDB graph as a nested graph. The automaticdecomposition of the SSDB graph into a set ofquasi-cliques results in the KEGG OC (OrthologCluster) database. A web interface to search andbrowse genes in clusters is made available athttp://dev.kegg.jp/oc/.

3. EGENES: Transcriptome-Based Plant Data-base of Genes with Metabolic Pathway In-formation and Expressed Sequence TagIndices in KEGG

Ali Masoudi-Nejad, Susumu Goto, RuyJauregui, Masumi Itoh, Shuichi Kawashima,Yuki Moriya, Takashi R. Endo and MinoruKanehisa

EGENES is a knowledge-based database forefficient analysis of plant expressed sequencetags (ESTs) that was recently added to theKEGG suite of databases. It links plant genomicinformation with higher order functional infor-mation in a single database. It also providesgene indices for each genome. The genomic in-formation in EGENES is a collection of EST con-tigs constructed from assembled ESTs by usingEGassembler. EGassembler is a web server,which provides an automated as well as a user-customized analysis tool for cleaning, repeatmasking, vector trimming, organelle masking,clustering and assembling of ESTs and genomicfragments. Due to the extremely large genomesof plant species, the bulk collection of data suchas ESTs is a quick way to capture a completerepertoire of genes expressed in an organism.EGENES and EGassembler are publicly available

at http://www.genome.jp/kegg-bin/create_kegg_menu?category5plants_egenes and http://egassembler.hgc.jp/respectively.

4. SOAP/WSDL interface for the KEGG sys-tem

Shuichi Kawashima, Toshiaki Katayama andMinoru Kanehisa

KEGG (Kyoto Encyclopedia of Genes andGenomes) is a suite of databases and associatedsoftware, integrating our current knowledge ofmolecular interaction/reaction pathways andother systemic functions (PATHWAY and BRITEdatabases), the information about the genomicspace (GENES database), and information aboutthe chemical space (LIGAND and DRUG data-bases). To facilitate large-scale applications ofthe KEGG system programatically, we havebeen developing and maintaining the KEGGAPI as a SOAP/WSDL based web service. Re-cent improvements includes retrieval of the ref-erence information from the KEGG PATHWAYdatabase and utilization of the newly developedKEGG DRUG database. We are refining ourservice to compromise standardization of thebioinformatics web services and checking of op-erations in various computer languages. TheKEGG API is available at http://www.genome.jp/kegg/soap/.

5. Comprehensive repository for communitygenome annotation

Toshiaki Katayama, Mari Watanabe andMinoru Kanehisa

KEGG DAS is an advanced genome databasesystem providing DAS (Distributed AnnotationSystem) service for all organisms in theGENOME and GENES databases in KEGG(Kyoto Encyclopedia of Genes and Genomes).Currently, KEGG DAS contains over 8 millionannotations assigned to the genome sequencesof 615 organisms (increased from 440 organismsin last year). The KEGG DAS server providesgene annotations linked to the KEGG PATH-WAY and LIGAND databases. In addition to thecoding genes, information of non-coding RNAspredicted using Rfam database is also providedto fill the annotation of the intergenic regions ofthe genomes. We have been developing theserver based on open source software includingBioRuby, BioPerl, BioDAS and GMOD/GBrowseto make the system consistent with the existingopen standards. We are also responsible to theJapanese localization of the GBrowse genomebrowser. The contents of the KEGG DAS data-

105

Page 3: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

base can be accessed graphically in a webbrowser using GBrowse GUI (graphical user in-terface) and also programatically by the DAS(XML over HTTP) protocol. Users are able toadd their own annotations onto the KEGG DASserver by connecting other DAS servers or bysimply uploading their own data as a file. Thisfunctionality enables researchers to share the“community annotation.” The KEGG DAS isweekly updated and freely available at http://das.hgc.jp/.

6. Constructing a database of full lengthcDNA of pathogenetic arthropods

Toshiaki Katayama, Shuichi Kawashima,Junichi Watanabe, Yutaka Suzuki, SumioSugano and Minoru Kanehisa

Anopheles mosquito, tsetse fly, tsutsugamushi-mite, dust mite are arthropods which are knownas medically important because these eithertransmit various infectious disease includingmalaria, Japanese river fever, or cause allergysuch as asthma and dermatitis. Because of seri-ous medical problems they cause, their genomesare being extensively analyzed recently. Wehave produced libraries of the four organismsand are constructing their databases for thefunctional genome analysis. It will be availableon the site http://fullarth.hgc.jp/

7. AAindex: Amino Acid index database

Shuichi Kawashima, Toshiaki Katayama andMinoru Kanehisa

AAindex is a database of numerical indicesrepresenting various physicochemical and bio-chemical properties of amino acids and pairs ofamino acids. We have added a collection of pro-tein contact potentials to the AAindex as a newsection. Accordingly AAindex consists of threesections now: AAindex1 for the amino acid in-dex of 20 numerical values, AAindex2 for theamino acid mutation matrix and AAindex3 forthe statistical protein contact potentials. All dataare derived from published literature. The data-base can be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.jp/dbget-bin/www_bfind?aaindex) ordownloaded by anonymous FTP (ftp://ftp.genome.jp/pub/db/community/aaindex/).

8. Comparative pair-wise domain-combinationfor screening the clade specific domain-architectures in metazoan genomes

Shuichi Kawashima, Takeshi Kawashima,

Hiroshi Wada and Minoru Kanehisa

In the evolution of the eukaryotic genome,exon or domain shuffling has produced a vari-ety of proteins. On the assumption that each fu-sion event between two independent protein-domains occurred only once in the evolution ofmetazoans, we can roughly estimate when thefusion events were happened. For this purpose,we made phylogenetic profiles of pair-wisedomain-combinations of metazoans. The phylo-genetic profiles can be expected to reflect theprotein evolution of metazoan. Interestingly, thephylogenetic tree of metazoans, derived fromthe profiles, supported the “Ecdysozoa hypothe-sis” that is one of the major hypotheses formetazoan evolution. Further, the phylogeneticprofiles showed the candidates of genes thatwere required for each clade-specific features inmetazoan evolution. We propose that compara-tive proteome analysis focusing on pair-wisedomain-combinations is a useful strategy for re-searching the metazoan evolution. Additionally,we found that the extant ecdysozoans shareonly fourteen domain-combinations in our pro-files. Such a small number of ecdysozoan-specific domain-combinations is consistent withthe extensive gene losses through the evolutionof ecdysozoans.

9. SSS: a sequence similarity search service

Toshiaki Katayama, Kazuhiro Ohi and MinoruKanehisa

There are various services in the world to findsimilar sequences from the database, such as thefamous BLAST service provided at NCBI. How-ever, the method to search and the database tobe searched could not be added from outside.To provide our super computer resources at theHuman Genome Center to the research commu-nity, we started to develop a new service for thesequence similarity search, SSS. In SSS, user canselect the search algorithm from BLAST, FASTA,SSEARCH, TRANS and EXONERATE. This va-riety of options is unique among the publicservices. Then user is prompted to select appro-priate database depending on the algorithm se-lected and the search is executed. On the back-end, we implemented the search system on theSun Grid Engine to provide efficient resourceson distributed computers. As a result, we areable to provide time consuming services such asTRANS and EXONERATE in addition to thepopular algorithms. The SSS service is freelyavailable at http://sss.hgc.jp/.

106

Page 4: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

10. High performance database entry re-trieval system

Toshiaki Katayama, Shuichi Kawashima,Kenta Nakai and Minoru Kanehisa

Recently, the number of entries in biologicaldatabases is exponentially increasing year byyear. For example, there were 10,106,023 entriesin the GenBank database in the year 2000, whichhas now grown to 83,051,496 (Release 162+daily updates). In order for such a vast amountof data to be searched at a high speed, we havedeveloped a high performance database entryretrieval system, named HiGet. The HiGet sys-tem is constructed on the HiRDB, a commercialORDBMS (Object-oriented Relational DatabaseManagement System) developed by Hitachi,Ltd. It is publicly accessible on the Web page athttp://higet.hgc.jp/ or SOAP based web serviceat http://higet.hgc.jp/soap/. HiGet can executefull text search on various biological databases.In addition to the original plain format, the sys-tem contains data in the XML format in order toprovide a field specific search facility. When acomplicated search condition is issued to thesystem, the search processing is executed effi-ciently by combining several types of indices toreduce the number of records to be processedwithin the system. Current searchable databasesare GenBank, UniProt, Prosite, OMIM, PDB andRefSeq. We are planning to include other valu-able databases and also planning to develop aninter-database search interface and a complexsearch facility combining keyword search andsequence similarity search.

11. PSGST: A New Data Structure for Index-ing Protein Structures

Tetsuo Shibuya

Protein structure analysis is one of the mostimportant research issues in the post-genomicera, and faster and more accurate index datastructures for such 3-D structures are highly de-sired for research on proteins. The geometricsuffix tree, which is also proposed by our group,is a very sophisticated index structure that en-ables fast and accurate search on protein 3-Dstructures. By using it, we can search from 3-Dstructure databases for all the substructureswhose RMSDs (root mean square deviations) toa given query 3-D structure are not larger thana given bound. We proposed a new data struc-ture based on the geometric suffix tree whosequery performance is much better than theoriginal geometric suffix tree. We call the modi-fied data structure the prefix-shuffled geometric

suffix tree (or PSGST for short). According toour experiments, the PSGST outperforms thegeometric suffix tree in most cases. The PSGSTshows its best performance when the databasedoes not have many substructures similar to thequery. The query is sometimes 100 times fasterthan the original geometric suffix trees in suchcases.

12. Compact Geometric Suffix Tree

Tetsuo Shibuya

Protein structure analysis is one of the mostimportant research issues in the post-genomicera, and faster and more accurate index datastructures for such 3-D structures are highly de-sired for research on proteins. The geometricsuffix tree, which is also a data structure pro-posed by our group, is a very sophisticated in-dex structure that enables fast and accuratesearch on protein 3-D structures. By using it, wecan search from 3-D structure databases for allthe substructures whose RMSDs (root meansquare deviations) to a given query 3-D struc-ture are not larger than a given bound. But thegeometric suffix tree requires rather large mem-ory, i.e., about 65n byte for a protein of lengthn. We developed a new data structure called“compact geometric suffix tree” which requiresonly 23n byte, which is almost about 1/3 of theoriginal geometric suffix tree, while the queryspeed is almost the same as the original geomet-ric suffix tree.

13. Protein Function Prediction based on 3-DStructure Motifs

Chia-Han Chu, Hiroki Sakai, and TetsuoShibuya

Protein functions are said to be determined byits 3-D structures, but not all functions havebeen known to be related to some 3-D structuremotifs. The geometric suffix tree, a data struc-ture for indexing 3-D protein structures, whichis also developed by us, enables comprehen-sively enumeration of all the possible structuralmotifs among given set of proteins. We are de-veloping a new algorithm based on the supportvector machine that decides protein’s functionfrom the 3-D structure of a protein. This algo-rithm utilizes all the possible 3-D motifs foundby using the geometric suffix tree.

14. Fast Hinge Detection Algorithm in ProteinStructures

Tetsuo Shibuya

107

Page 5: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

Analysis of conformational changes is one ofthe keys to the understanding of protein func-tions and interactions. For the analysis, we oftencompare two protein structures, taking flexibleregions like hinge domains into consideration.The RMSD (Root Mean Square Deviation) is themost popular measure for comparing two pro-tein structures, but it is only for rigid structureswithout hinge domains. In this paper, we pro-pose a new measure called RMSDh (Root MeanSquare Deviation considering hinges) and itsvariant RMSDh(k) for comparing two flexibleproteins with hinge domains. We also proposenovel efficient algorithms for computing them,which can detect the hinge positions at the sametime. The RMSDh is suitable for cases wherethere is one small hinge domain in each of thetwo target structures. The new algorithm forcomputing the RMSDh runs in linear time,which is same as the time complexity for com-puting the RMSD and is faster than any of pre-vious algorithms for hinge detection. TheRMSDh(k) is designed for comparing structureswith more than one hinge domain. The RMSDh(k) measure considers at most k small hinge do-mains, i.e., the RMSDh(k) value should be smallif the two structures are similar except for atmost k hinge domains. To compute the value,we propose an O(kn2)-time and O(n)-space algo-rithm based on a new dynamic programmingtechnique. We also test our measures againstboth flexible protein structures and non-flexibleprotein structures, and show that the hinge po-sitions can be correctly detected by our algo-rithms.

15. Fast Flexible Protein Structure Alignment

Kohichi Suematsu and Tetsuo Shibuya

The Hinge Detection Algorithm described insection 14 only considered rigid hinge points,but the hinges are sometimes bends a little by it-self, which sometimes leads to inaccurate pre-diction of hinge positions. Thus we incorporatedthe notion ‘bending hinge’ to detect such hingepositions. We developed a very efficient heuris-tic algorithm for finding such bending hinges, asthe exact algorithm for this problem requires ex-ponential time.

16. Fast Algorithms for Fast Range RMSDQuery

Tetsuo Shibuya

Protein structure analysis is a very importantresearch topic in the molecular biology of thepost-genomic era. The RMSD (root mean square

deviation) is the most frequently used measurefor comparing two protein 3-D structures. Wedeal with two fundamental problems related tothe RMSD. We first deal with a problem calledthe ‘range RMSD query’ problem. Given analigned pair of structures, the problem is tocompute the RMSD between two aligned sub-structures of them without gaps. This problemhas many applications in protein structureanalysis. We propose a linear-time preprocess-ing algorithm that enables constant-time RMSDcomputation. Next, we consider a problemcalled the ‘substructure RMSD query’ problem,which is a generalization of the above rangeRMSD query problem. It is a problem to com-pute the RMSD between any substructures oftwo unaligned structures without gaps. Basedon the algorithm for the range RMSD problem,we propose an O(nm) preprocessing algorithmthat enables constant-time RMSD computation,where n and m are the lengths of the givenstructures. Moreover, we propose O(nm log r/r)-time and O(nm/r)-space preprocessing algo-rithm that enables O(r) query, where r is an ar-bitrary integer such that 0<r<min(n, m). Wealso show that our strategy also works for an-other measure called the URMSD (unit-vectorroot mean square deviation), which is a variantof the RMSD.

17. Suffix Array Construction with a LazyScheme

Ben Hachimori and Tetsuo Shibuya

The suffix array is one of the most importantindexing data structures for alphabet strings, in-cluding DNA sequences, RNA sequences, pro-tein sequences, web pages, Medline database,and so on. But even the most sophisticated algo-rithm for constructing the suffix array requires alot of time. We developed a new efficient lazyalgorithm that computes the suffix array onlyafter we get the query. By doing so, we have tocompute only the necessary part of the suffix ar-ray.

18. Genotype Clustering based on HiddenMarkov Models

Ritsuko Onuki, Tetsuo Shibuya and MinoruKanehisa

Haplotype clustering is important for genemapping of human disease. Although its impor-tance for the analysis, it is difficult to obtainhaplotype data from present experiment for itscost and error rate. Instead of haplotypes, geno-types are much easier to obtain. In this work,

108

Page 6: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

we propose a new method for clustering geno-types. In the algorithm. we first infer the multi-ple haplotype candidates from the genotype,and next we calculate the distance between thegenotypes based on the results of the haplotypeinference. Then we perform genotype clusteringbased on the distances. We evaluated our algo-rithm by applying our algorithm against severalactual genotype data.

19. Analysis of the Chemical Diversity of Me-dicinal Natural Products

Michihiro Araki, Tetsuo Shibuya, KohichiSuematsu, and Minoru Kanehisa

The diverse activities of medicinal naturalproducts are explained by extensive diversitiesin the chemical structures. Although several hy-potheses have been proposed to understand theorigin of the chemical diversity, no systematicanalyses have been done so far. In order to elu-cidate how the chemical diversities are designedin the biosynthetic circuits, we defined molecu-lar building blocks required for describing thechemical information of natural products andperformed the systematic analyses with thetransformational and combinatorial informationof the building blocks. We further reconstructedthe metabolic network using the chemical andthe genomic information to identify alternativebiosynthetic pathways as well as new chemicalentities.

20. Extraction of Chemical Modification Pat-terns from Drug Developmental Historyand Its Application

Daichi Shigemizu, Michihiro Araki, ShujiroOkuda, Susumu Goto and Minoru Kanehisa

The chemical modifications in the history ofdrug discovery are expected to provide a wealthof the medicinal chemists’ knowledge and it isof importance to extract the information on thechemical modifications in the form of distinctpatterns. KEGG DRUG structure map collectedsuch information by illustrating the develop-mental relationships among the drug structures.Here we focused on 288 drug pairs in the KEGGDRUG structure map to extract the chemicalmodification patterns. We developed a methodfor taking the transformations of core chemo-types and modified fragments into considera-tions. A set of the pairs of the atoms on the corechemotypes and the fragments was collected asthe indicators of the chemical modifications,which identified 347 chemical modification pat-terns in the structure maps. The transformation

patterns were then applied for in silico drugmodifications. We developed a scheme to pre-dict potential drug-like compounds, whichcould demonstrate the reconstruction of thedrug developmental maps.

21. Comprehensive Analysis of DistinctivePolyketide and Nonribosomal PeptideStructural Motifs Encoded in MicrobialGenomes

Yohsuke Minowa, Michihiro Araki andMinoru Kanehisa

We developed a highly accurate method topredict polyketide (PK) and nonribosomal pep-tide (NRP) structures encoded in the microbialgenome. PKs/NRPs are polymers of carbonyl orpeptidyl chains synthesized by polyketide syn-thases (PKS) and nonribosomal peptide syn-thetases (NRPS). We analyzed domain se-quences corresponding to specific substrates andphysical interactions between PKSs/NRPSs inorder to predict which substrates are selectedand assembled into highly ordered chemicalstructures. The predicted PKs/NRPs were repre-sented as the sequences of carbonyl/peptidylunits to extract the structural motifs efficiently.The structural sequences were compared usingthe Smith-Waterman algorithm to extract 33structural motifs that are significantly relatedwith their bioactivities. The integrative analysisof genomic and chemical information here willprovide a strategy to predict the chemical struc-tures, the biosynthetic pathways, and the bio-logical activities of PKs/NRPs, which is usefulfor the rational design of novel PKs/NRPs.

22. Prediction of Drug-target Interaction Net-works from the Integration of Chemicaland Genomic Spaces

Yoshihiro Yamanishi, Michihiro Araki, AlexGutteridge, Wataru Honda and MinoruKanehisa

The identification of interactions betweendrugs and target proteins is a key area in drugdiscovery. We characterize four classes of drug-target interaction networks in humans involvingenzymes, ion channels, GPCRs, and nuclear re-ceptors, and reveal significant correlations be-tween drug structure similarity, target sequencesimilarity, and the drug-target interaction net-work topology. We then develop new statisticalmethods to predict unknown drug-target inter-action networks from chemical structure andgenomic sequence information simultaneouslyon a large scale. As a result, we demonstrate the

109

Page 7: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

usefulness of our proposed method for the pre-diction of the four classes of drug-target interac-tion networks. Our comprehensively predicteddrug-target interaction networks enable us to

suggest many potential drug-target interactionsand to increase research productivity towardgenomic drug discovery.

Publications

Kanehisa, M., Araki, M., Goto, S., Hattori, M.,Hirakawa, M., Itoh, M., Katayama, T.,Kawashima, S., Okuda, S., Tokimatsu, T., andYamanishi, Y. KEGG for linking genomes tolife and the environment. Nucleic Acids Re-search 36: 2008, in press.

Kawashima, S., Pokarowski, P., Pokarowska, M.,Kolinski, A., Katayama, T., Kanehisa, M.AAindex: amino acid index database, progressreport 2008. Nucleic Acids Research 36: 2008, inpress.

Lapp, H., Bala, S., Balhoff, J.P., Bouck, A., Goto,N., Holder, M., Holland, R., Holloway, A.,Katayama, T., Lewis, P.O., Mackey, A., Os-borne, B.I., Piel, W.H., Kosakovsky Pond, S.L.,Poon, A., Qiu, W.-G, Stajich, J.E., Stoltzfus, A.,Thierer, T., Vilella, A.J., Vos, R.A., Zmasek, C.M., Zwickl, D. and Vision, T.J. The 2006 NES-Cent Phyloinformatics Hackathon: A field re-port, Evolutionary Bioinformatics, 2008, in press.

Kawashima, S., Kawashima, T., Putnam, N.H.,Rokhsar D.S., Wada, H. and Kanehisa, M.Comparative pair-wise domain-combinationsfor screening the clade specific domain-architectures in metazoan genomes, GenomeInformatics, 19: 50-60, 2007

Hu, Z., Ng, D.M., Yamada, T., Chen, C.,Kawashima, S., Mellor, J., Linghu, B., Kane-hisa, M., Stuart, J.M. and DeLisi, C. VisANT3.0: new modules for pathway visualization,editing, prediction and construction. NucleicAcids Research , 35: W625-32, 2007

Masoudi-Nejad, A., Goto, S., Jauregui, R., Ito,M., Kawashima, S., Moriya, Y., Endo, T.R. andKanehisa, M. EGENES: transcriptome-basedplant database of genes with metabolic path-way information and expressed sequence tagindices in KEGG. Plant Physiology , 144: 857-866, 2007

Okamoto, S., Yamanishi, Y. , Ehira, S. ,Kawashima, S., Tonomura, K. and Kanehisa,M. Prediction of nitrogen metabolism-relatedgenes in Anabaena by kernel-based networkanalysis. Proteomics, 7: 900-909, 2007

Okuda, S., Kawashima, S., Kobayashi, K.,Ogasawara, N., Kanehisa, M. and Goto, S.Characterization of relationships betweentranscriptional units and operon structures inBacillus subtilis and Escherichia coli, BMC

Genomics, 8: 48, 2007Huang, J., Kawashima, S. and Kanehisa, M.

New amino acid indices based on residue net-work topology, Genome Informatics, 18, 2007,in press

Shibuya, T., Combitorial Pattern Matching forProtein 3-D Structures, SIG FPAI, 2008, to ap-pear.

Minowa, Y., Araki, M., Kanehisa, M. Compre-hensive Analysis of Distinctive Polyketide andNonribosomal Peptide Structural Motifs En-coded in Microbial Genomes, Journal of Mo-lecular Biology, 368: 1500-1517, 2007.

Shibuya, T., Efficient Substructure RMSD QueryAlgorithms, Journal of Computational Biol-ogy, Vol. 14, No. 9., 2007, pp. 1201-1207.

Shibuya, T., Prefix-Shuffled Geometric SuffixTree, Proc. 14th String Processing and Infor-mation Retrieval Symposium (SPIRE 2007),LNCS 4726, 2007, pp. 300-309.

Shibuya, T., Current Bioinformatics and Its Fu-ture, Journal of the Institute of Electronics, In-formation and Communication Engineers, vol.90, no. 2, 2007, pp. 145-147.

Onuki, R., Shibuya, T., and Kanehisa, M.,Haplotype Inference Probability-based Geno-type Clustering, Proc. 7th Annual Workshopon Bioinformatics and Systems Biology (IBSB),2007, pp. 26-27.

Shibuya, T., Efficient Substructure RMSD QueryAlgorithms, IPSJ SIG Notes SIGAL 114-8,2007, pp. 57-64.

Shibuya, T., Fast and Accurate Algorithms forProtein Hinge Detection, IPSJ SIG Notes SIG-BIO 10-4, 2007, pp. 25-32.

Shibuya, T., Prefix-Shuffled Geometric SuffixTree, IPSJ SIG Notes SIGAL 112-1, May 11,2007, pp. 1-8.

Onuki, R., Shibuya, T., and Kanehisa, M., Geno-type Clustering based on Haplotype Inference,Japanese Society of Human Genetics, 2007.

Onuki, R., Shibuya, T., and Kanehisa, M., ANew Method for Genotype Clustering basedon Haplotype Inference, Biochemistry andMolecular Biology (BMB 2007), 1P-1071, 2007.渋谷哲朗,坂内英夫(訳),Neil C. Jones, Pavel

Pevzner(著),バイオインフォマティクスのためのアルゴリズム入門,共立出版,2007.

110

Page 8: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

The original aim of the research at this laboratory is to establish computationalmethodologies for discovering and interpreting information of nucleic acid se-quences, proteins and some other experimental data arising from researches inGenome Science. The recent advances in biomedical research have been produc-ing large-scale, ultra-high dimensional, ultra-heterogeneous data. Due to thesepost-genomic research progresses, our current mission is to create computationalstrategy for systems biology and medicine towards translational bioinformatics.With this mission, we have been developing computational methods for under-standing life as system and applying them to practical issues in medicine and biol-ogy.

1. Computational Systems Biology

a. Cell System Ontology: Representation formodeling, visualizing, and simulating bio-logical pathways

Euna Jeong, Masao Nagasaki, Ayumu Saito,Satoru Miyano

With the rapidly accumulating knowledge ofbiological entities and networks, there is a grow-ing need for a general framework to understandthis information at a system level. In order tounderstand life as system, a formal descriptionof system dynamics with semantic validationwill be necessary. Within the context of biologi-cal pathways, several formats have been pro-posed, e.g., SBML, CellML, and BioPAX. Unfor-tunately, these formats lack the formal defini-tions of each term or fail to capture the systemdynamics behavior. Thus, we have developed anew system dynamics centered ontology calledCell System Ontology (CSO). As an exchange

format, the ontology is implemented in the WebOntology Language (OWL), which enables se-mantic validation and automatic reasoning tocheck the consistency of biological pathwaymodels. The features of CSO are as follows: (1)manipulation of different levels of granularityand abstraction of pathways, e.g., metabolicpathways, regulatory pathways, signal transduc-tion pathways, and cell-cell interactions; (2) cap-ture of both quantitative and qualitative aspectsof biological function by using hybrid functionalPetri net with extension (HFPNe); and (3) en-coding of biological pathway data related tovisualization and simulation, as well as model-ing. The new ontology also predefines maturecore vocabulary, which will be necessary for cre-ating models with system dynamics. In addition,each of the core terms has at least one standardicon for easy modeling and accelerating the ex-changeability among applications. In order todemonstrate the potential of CSO-based path-way modeling, visualization, and simulation, wepresent an HFPNe model for the ASEL and

Human Genome Center

Laboratory of DNA Information AnalysisDNA情報解析分野

Professor Satoru Miyano, Ph.D.Associate Professor Seiya Imoto, Ph.D.Assistant Professor Masao Nagasaki, Ph.D.Project Lecturer Rui Yamaguchi, Ph.D.

教 授 理学博士 宮 野 悟准教授 博士(数理学) 井 元 清 哉助 教 博士(理学) 長 � 正 朗特任講師 博士(理学) 山 口 類

111

Page 9: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

ASER regulatory networks in Caenorhabditis ele-gans .

b. Conversion from BioPAX to CSO for sys-tem dynamics and visualization of biologi-cal pathway

Euna Jeong, Masao Nagasaki, Satoru Miyan

The vast accumulation of biological pathwaydata scattered in various sources presents chal-lenges in the exchange and integration of thesedata. Major new standards for representation ofpathway data and the ability to check inconsis-tency in pathways are inevitable for the devel-opment of a reliable pathway data repository.Within the context of biological pathways, thecell system ontology (CSO) had been developedas a general framework to model system dy-namics and visualization of diverse biologicalpathways. CSO provides an excellent environ-ment for modeling, visualizing, and simulatingcomplex molecular mechanisms at different lev-els of details. This paper examines whether CSOaddresses the integration capability of pathwaydata with system dynamics. We present a con-version tool for converting BioPAX to CSO.Transforming the data from BioPAX to CSO notonly allows an analysis of the dynamic behav-iors in molecular interactions but also allows theresults to be stored for further biological investi-gations, which is not possible in BioPAX. Theconversion is done using simple inference algo-rithms with the addition of view- andsimulation-related properties. We demonstratehow CSO can be used to build a complete andconsistent pathway repository and enhance theinteroperability among applications.

c. An efficient grid layout algorithm for bio-logical networks utilizing various biologi-cal attributes

Kaname Kojima, Masao Nagasaki, EunaJeong, Mitsuru Kato, Satoru Miyano

Clearly visualized biopathways provide agreat help in understanding biological systems.However, manual drawing of large-scalebiopathways is time consuming. We proposed agrid layout algorithm that can handle gene-regulatory networks and signal transductionpathways by considering edge-edge crossing,node-edge crossing, distance measure betweennodes, and subcellular localization informationfrom Gene Ontology. Consequently, the layoutalgorithm succeeded in drastically reducingthese crossings in the apoptosis model. How-ever, for larger-scale networks, we encountered

three problems: (i) the initial layout is often veryfar from any local optimum because nodes areinitially placed at random, (ii) from a biologicalviewpoint, human layouts still exceed automaticlayouts in understanding because except subcel-lular localization, it does not fully utilize bio-logical information of pathways, and (iii) it em-ploys a local search strategy in which the neigh-borhood is obtained by moving one node ateach step, and automatic layouts suggest that si-multaneous movements of multiple nodes arenecessary for better layouts, while such exten-sion may face worsening the time complexity.We propose a new grid layout algorithm. To ad-dress problem (i), we devised a new forcedi-rected algorithm whose output is suitable as theinitial layout. For (ii), we considered that an ap-propriate alignment of nodes having the samebiological attribute is one of the most importantfactors of the comprehension, and we defined anew score function that gives an advantage tosuch configurations. For solving problem (iii),we developed a search strategy that considersswapping nodes as well as moving a node,while keeping the order of the time complexity.Though a naive implementation increases byone order, the time complexity, we solved thisdifficulty by devising a method that caches dif-ferences between scores of a layout and its pos-sible updates. Layouts of the new grid layout al-gorithm are compared with that of the previousalgorithm and human layout in an endothelialcell model, three times as large as the apoptosismodel. The total cost of the result from the newgrid layout algorithm is similar to that of thehuman layout. In addition, its convergence timeis drastically reduced (40% reduction).

d. Modelling and simulation of signal trans-ductions in an apoptosis pathway by us-ing timed Petri nets

Chen Li, Qi-Wei Ge1, Mitsuru Nakata1, HiroshiMatsuno1, Satoru Miyano: 1Yamaguchi Univer-sity

This paper first presents basic Petri net com-ponents representing molecular interactions andmechanisms of signalling pathways, and intro-duces a method to construct a Petri net modelof a signalling pathway with these components.Then a simulation method of determining thedelay time of transitions, by using timed Petrinets, i.e. the time taken in firing of each transi-tion is proposed based on some simple princi-ples that the number of tokens flowed into aplace is equivalent to the number of tokensflowed out. Finally, the availability of proposedmethod is confirmed by observing signalling

112

Page 10: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

transductions in biological pathways throughsimulation experiments of the apoptosis signal-ling pathways as an example.

e. Understanding endothelial cell apoptosis:What can the transcriptome glycome andproteome reveal?

Muna Affara2 , Benjamin Dunmore2 ,Christopher Savoie3, Seiya Imoto, YoshinoriTamada3, Hiromitsu Araki3, D. StephenCharnock-Jones2, Satoru Miyano, CristinPrint4: 2Cambridge University, 3GNI, Ltd. 4Uni-versity of Auckland

Endothelial cell (EC) apoptosis may play animportant role in blood vessel development, ho-meostasis and remodelling. In support of thisconcept, EC apoptosis has been detected withinremodelling vessels in vivo, and inactivation ofEC apoptosis regulators has caused dramaticvascular phenotypes. EC apoptosis has also beenassociated with cardiovascular pathologies.Therefore, understanding the regulation of ECapoptosis, with the goal of intervening in thisprocess, has become a current research focus.The protein-based signalling and cleavage cas-cades that regulate EC apoptosis are wellknown. However, the possibility that pro-grammed transcriptome and glycome changescontribute to EC apoptosis has only recentlybeen explored. Traditional bioinformatic tech-niques have allowed simultaneous study ofthousands of molecular signals during the proc-ess of EC apoptosis. However, to progress fur-ther, we now need to understand the complexcause and effect relationships among these sig-nals. In this article, we will first review currentknowledge about the function and regulationof EC apoptosis including the roles of the pro-teome transcriptome and glycome. Then, we as-sess the potential for further bioinformaticanalysis to advance our understanding of ECapoptosis, including the limitations of currenttechnologies and the potential of emerging tech-nologies such as gene regulatory networks.

f. Weighted Lasso in graphical Gaussianmodeling for large gene network estima-tion based on microarray data

Teppei Shimamura , Seiya Imoto , RuiYamaguchi, Satoru Miyano

We propose a statistical method based ongraphical Gaussian models for estimating largegene networks from DNA microarray data. Inestimating large gene networks, the number ofgenes is larger than the number of samples, we

need to consider some restrictions for modelbuilding. We propose weighted lasso estimationfor the graphical Gaussian models as a model oflarge gene networks. In the proposed method,the structural learning for gene networks isequivalent to the selection of the regularizationparameters included in the weighted lasso esti-mation. We investigate this problem from aBayes approach and derive an empirical Baye-sian information criterion for choosing them.Unlike Bayesian network approach, our methodcan find the optimal network structure and doesnot require to use heuristic structural learningalgorithm. We conduct Monte Carlo simulationto show the effectiveness of the proposedmethod. We also analyze Arabidopsis thalianamicroarray data and estimate gene networks.

g. A structure learning algorithm for infer-ence of gene networks from microarraygene expression data using Bayesian net-works

Kazuyuki Numata, Seiya Imoto, SatoruMiyano

Estimation of gene networks based on mi-croarray gene expression data is an importantproblem in systems biology. In this paper weuse Bayesian networks as a mathematical modelfor reverse-engineering gene networks from mi-croarray data. In such a case, structural learningof Bayesian networks is known as an NP-hardproblem and we need to use heuristic algo-rithms to find better network structures. Re-cently, several algorithms have been proposedto estimate optimal Bayesian network structure,but the number of genes included in the net-work is limited less than 30 or so. In order toapply Bayesian network approach to drug targetgene discovery, we need to consider gene net-works with several hundreds of genes. There-fore we need to develop more efficient algo-rithms to learn Bayesian network structurebased on observed data. In this paper we pro-pose an efficient structural learning algorithmfor Bayesian networks by extending K2 algo-rithm that is one of the standard learning algo-rithms in Bayesian networks. We conduct MonteCarlo simulations to examine the effectiveness ofthe proposed algorithm by comparing withgreedy hill-climbing algorithm. We also showthe application of yeast gene network estimationbased on the proposed algorithm.

h. Clustering samples characterized by timecourse gene expression profiles using themixture of state space models

113

Page 11: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

Osamu Hirose, Ryo Yoshida5, Rui Yamaguchi,Seiya Imoto, Tomoyuki Higuchi5, SatoruMiyano: 5Institute of Statistical Mathematics

We propose a novel method to classify sam-ples where each sample is characterized by atime course gene expression profile. By exploit-ing the mixture of state space model, the pro-posed method addresses the following tasks: (1)clustering samples according to temporal pat-terns of gene expressions, (2) automatic detec-tion of genes that discriminate identified clus-ters, (3) estimation of a restricted autoregressivecoefficient for each cluster. We demonstrate theproposed method along with the cluster analysisof 53 multiple sclerosis patients under recombi-nant interferon βtherapy with the longitudinaltime course expression profiles.

i. Identification of activated transcription fac-tors from microarray gene expression dataof Kampo medicine-treated mice

Rui Yamaguchi, Masahiro Yamamoto6, SeiyaImoto, Masao Nagasaki, Ryo Yoshida5, KenjiTsuiji6, Atsushi Ishige6, Hiroaki Asou7, KenjiWatanabe2, Satoru Miyano: 6Keio UniversitySchool of Medicine, 7Tokyo Metropolitan Insti-tute of Gerontology

We propose an approach to identify activatedtranscription factors from gene expression datausing a statistical test. Applying the method, wecan obtain a synoptic map of transcription factoractivities which helps us to easily grasp the sys-tem’s behavior. As a real data analysis, we use acase-control experiment data of mice treated bya drug of Kampo medicine remedying degradedmyelin sheath of nerves in central nervous sys-tem. Kampo medicine is Japanese traditionalherbal medicine. Since the drug is not a singlechemical compound but extracts of multiple me-dicinal herb, the effector sites are possibly multi-ple. Thus it is hard to understand the actionmechanism and the system’s behavior by inves-tigating only few highly expressed individualgenes. Our method gives summary for the sys-tem’s behavior with various functional annota-tions, e.g. TFAs and gene ontology, and thus of-fer clues to understand it in more holistic man-ner.

j. AYUMS: an algorithm for completely auto-matic quantitation based on LC-MS/MSproteome data and its application to theanalysis of signal transduction

Ayumu Saito, Masao Nagasaki, MasaakiOyama, Hiroko Kozuka-Hata, Kentaro Semba,

Sumio Sugano, Tadashi Yamamoto, Satoru Mi-yano

Comprehensive description of the behavior ofcellular components in a quantitative manner isessential for systematic understanding of bio-logical events. Recent LC-MS/MS (tandem massspectrometry coupled with liquid chromatogra-phy) technology, in combination with the SILAC(Stable Isotope Labeling by Amino acids in Cellculture) method, has enabled us to make rela-tive quantitation at the proteome level. The re-cent report by Blagoev et al . (Nat. Biotechnol.,22, 1139-1145, 2004) indicated that this methodwas also applicable for the time-course analysisof cellular signalling events. Relative quatitationcan easily be performed by calculating the ratioof peak intensities corresponding to differen-tially labeled peptides in the MS spectrum. Ascurrently available software requires some GUIapplications and is time-consuming, it is notsuitable for processing large-scale proteomedata.

To resolve this difficulty, we developed an al-gorithm that automatically detects the peaks ineach spectrum. Using this algorithm, we devel-oped a software tool named AYUMS that auto-matically identifies the peaks corresponding todifferentially labeled peptides, compares thesepeaks, calculates each of the peak ratios inmixed samples, and integrates them into onedata sheet. This software has enabled us to dra-matically save time for generation of the finalreport.

AYUMS is a useful software tool for compre-hensive quantitation of the proteome data gen-erated by LC-MS/MS analysis. This softwarewas developed using Java and runs on Linux,Windows, and Mac OS X. Please contact [email protected] if you are interested in theapplication. The project web page is http://www.csml.org/ayums/

k. Modeling gene expression regulatory net-works with the sparse vector autoregres-sive model

Andre Fujita, Joao R Sato8, Humberto M.Garay-Malpartida8, Rui Yamaguchi, SatoruMiyano, Mari C. Sogayar8, Carlos E Ferreira8:

~8University of Sao Paulo

To understand the molecular mechanisms un-derlying important biological processes, a de-tailed description of the gene products networksinvolved is required. In order to define and un-derstand such molecular networks, some statisti-cal methods are proposed in the literature to es-timate gene regulatory networks from time-

114

Page 12: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

series microarray data. However, several prob-lems still need to be overcome. Firstly, informa-tion flow need to be inferred, in addition to thecorrelation between genes. Secondly, we usuallytry to identify large networks from a large num-ber of genes (parameters) originating from asmaller number of microarray experiments(samples). Due to this situation, which is ratherfrequent in Bioinformatics, it is difficult to per-form statistical tests using methods that modellarge genegene networks. In addition, most ofthe models are based on dimension reductionusing clustering techniques, therefore, the result-ing network is not a gene-gene network but amodule-module network. Here, we present theSparse Vector Autoregressive model as a solu-tion to these problems. We have applied theSparse Vector Autoregressive model to estimategene regulatory networks based on gene expres-sion profiles obtained from time-series microar-ray experiments. Through extensive simulations,by applying the SVAR method to artificial regu-latory networks, we show that SVAR can infertrue positive edges even under conditions inwhich the number of samples is smaller thanthe number of genes. Moreover, it is possible tocontrol for false positives, a significant advan-tage when compared to other methods de-scribed in the literature, which are based onranks or score functions. By applying SVAR toactual HeLa cell cycle gene expression data, wewere able to identify well known transcriptionfactor targets. The proposed SVAR method isable to model gene regulatory networks in fre-quent situations in which the number of sam-ples is lower than the number of genes, makingit possible to naturally infer partial Granger cau-salities without any a priori information. In ad-dition, we present a statistical test to control thefalse discovery rate, which was not previouslypossible using other gene regulatory networkmodels.

2. Statistical and Computational KnowledgeDiscovery

a. Statistical absolute evaluation of gene on-tology terms with gene expression datas

Pramod K. Gupta, Ryo Yoshida5, Seiya Imoto,Rui Yamaguchi, Satoru Miyano

We propose a new testing procedure for theautomatic ontological analysis of gene expres-sion data. The objective of the ontological analy-sis is to retrieve some functional annotations, e.g. Gene Ontology terms, relevant to underlyingcellular mechanisms behind the gene expressionprofiles, and currently, a large number of tools

have been developed for this purpose. The mostexisting tools implement the same approach thatexploits rank statistics of the genes which areordered by the strength of statistical evidences,e.g. p-values computed by testing hypotheses atthe individual gene level. However, such an ap-proach often causes the serious false discovery.Particularly, one of the most crucial drawbacksis that the rank-based approaches wrongly judgethe ontology term as statistically significant al-though all of the genes annotated by the ontol-ogy term are irrelevant to the underlying cellu-lar mechanisms. In this paper, we first point outsome drawbacks of the rank-based approachesfrom the statistical point of view, and then, pro-pose a new testing procedure in order to over-come the drawbacks. The method that we pro-pose has the theoretical basis on the statisticalmeta-analysis, and the hypothesis to be tested issuitably stated for the problem of the ontologi-cal analysis. We perform Monte Carlo experi-ments for highlighting the disadvantages of therank-based approach and the advantages of theproposed method. Finally, we demonstrate theapplicability of the proposed method along withthe ontological analysis of the gene expressiondata of human diabetes.

b. Computational discovery of aberrantsplice variations with genome-wide exonexpression profiles

Ryo Yoshida5, Kazuyuki Numata, Seiya Imoto,Masao Nagasaki, Atsushi Doi3, Kazuko Ueno,Satoru Miyano

Alternative splicing plays a prominent role ineukaryotic gene regulations that allow a singlegene to generate the multiple mRNA products.The recent advent of GeneChip Human Exon 1.0ST Array enables us to measure the exon ex-pression profiles of human cells on a genome-wide scale. With this advent, analysis of func-tional gene regulation could be extended to de-tect not only differentially expressed genes, butalso specific splicing events that occur in targetcells, but not in normal controls. We addresssome statistical issues for the identification ofbiomarker splice variations with exon expres-sion data. The proposed method involves thefollowing steps: (1) Whole transcript analysiswith the nonparametric analysis of variance(ANOVA) to identify potential biomarkers thatpresent specific splice variations. (2) Meta-analysis for discriminating non-specific splicevariations that are caused by clinical heterogene-ity in the collected samples. In the analysis ofhuman cells, controlling non-specific splicingfactors is essential for success in the detection of

115

Page 13: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

biomarker splice variations because splice pat-terns are possibly affected by inter-individualdifferences in the collected samples. We demon-strate its utility and perform a whole transcriptanalysis of exon expression profiles of colorectalcarcinoma.

c. Performance improvement in protein N-myristoyl classification by BONSAI with in-significant indexing symbol

Manabu Sugii1, Ryo Okada1, Hiroshi Matsuno1,Satoru Miyano

Many N-myristoylated proteins play key rolesin regulating cellular structure and function. Inthe previous study, we have applied the ma-

chine learning system BONSAI to predict pat-terns based on which positive and negative ex-amples could be classified. Although BONSAIhas helped establish 2 interesting rules regard-ing the requirements for N-myristoylation, theaccuracy rates of these rules are not satisfactory.This paper suggests an enhancement of BONSAIby introducing an “insignificant indexing sym-bol”and demonstrates the efficiency of this en-hancement by showing an improvement in theaccuracy rates. We further examine the perform-ance of this enhanced BONSAI by comparingthe results of classification obtained the pro-posed method and an existing public methodfor the same sets of positive and negative exam-ples.

Publications

1. Affara, M., Dunmore, B., Savoie, C.J., Imoto,S., Tamada, Y., Araki, H., Charnock-Jones,D.S., Miyano, S., Print, C. Understanding en-dothelial cell apoptosis: What can the tran-scriptome glycome and proteome reveal?Philosophical Transactions of Royal Society.362 (1484): 1469-1487, 2007.

2. Akutsu, T., Bannai, H., Miyano, S., Ott, S.On the complexity of deriving position spe-cific score matrices from positive and nega-tive sequences. Discrete Applied Mathemat-ics. 155: 676-685, 2007.

3. Fujita, A., Sato, J.R., Garay-Malpartida, H.M., Sogayar, M.C., Ferreira, C.E., Miyano, S.Modeling nonlinear gene regulatory net-works from time series gene expressiondata. J. Bioinformatics and ComputationalBiology. In press.

4. Fujita, A., Sato, J.R., Garay-Malpartida, H.M., Yamaguchi, R., Miyano, S., Sogayar, M.C., Ferreira, C.E. Modeling gene expressionregulatory networks with the sparse vectorautoregressive model. BMC Systems Biology2007, 1: 39 (doi:10.1186/1752-0509-1-39)

5. Gupta, P.K., Yoshida, R., Imoto, S., Yama-guchi, R., Miyano, S. Statistical absoluteevaluation of gene ontology terms with geneexpression data. Lecture Notes in Bioinfor-matics. 4463: 146-157, 2007.

6. Hirose, O., Yoshida, R., Imoto, S., Yama-guchi, R., Higuchi, T., Charnock-Jones, D.S.,Print, C., Miyano, S. Statistical inference oftranscriptional module-based gene networksfrom time course gene expression profiles byusing state space models. Bioinformatics. Inpress.

7. Hirose, O., Yoshida, R., Yamaguchi, R.,Imoto, S., Higuchi, T., Miyano, S. Clustering

with time course gene expression profilesand the mixture of state space models.Genome Informatics. 18: 258-266, 2007.

8. Imoto, S. Knowledge discovery of causal re-lations among genes from microarray geneexpression data, Journal of Japan StatisticalSociety. 37 (1): 55-70, 2007.

9. Imoto, S., Miyano, S. Bayesian network ap-proach to estimate gene networks. A. Mittal,A. Kassim and T. Tan (Eds.), Bayesian Net-work Technologies : Applications andGraphical Models, Idea Group Publishers,USA. 269-299, 2007.

10. Imoto, S., Tamada, Y., Savoie, C.J., Miyano,S., Analysis of gene networks for drug tar-get discovery and validation. Methods inMolecular Biology. 360: 33-56, 2007.

11. Jeong, E., Nagasaki, M., Miyano, S. Conver-sion from BioPAX to CSO for system dy-namics and visualization of biological path-way. Genome Informatics. 18: 225-236, 2007.

12. Jeong, E., Nagasai, M., Saito, A., Miyano, S.Cell System Ontology: Representation formodeling, visualizing, and simulating bio-logical pathways. In Silico Biology 7, 0055,2007.

13. Kojima, K., Nagasaki, M., Jeong, E., Kato,M., Miyano, S. An efficient grid layout algo-rithm for biological networks utilizing vari-ous biological attributes. BMC Bioinformat-ics. 8: 76, 2007.

14. Li, C., Ge, Q.-W., Nakata, M., Matsuno, H.,Miyano, S. Modeling and simulation of sig-nal transductions in an apoptosis pathwayby using timed Petri nets. J. Biosciences. 32(1): 113-125, 2007.

15. Miyano, S., DeLisi, C., Holzhütter, H.-G.,Kanehisa, M. (Eds.). Genome Informatics. 18,

116

Page 14: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

2007.16. Numata, K., Imoto, S., Miyano, S.A structure

learning algorithm for inference of gene net-works from microarray gene expression datausing Bayesian networks. Proc. IEEE 7th In-ternational Symposium on Bioinformatics &Bioengineering. 1280-1284, 2007. (BIBE2007:Refereed conference; Digital Object Identifier10.1109/BIBE.2007.4375731)

17. Saito, A., Nagasaki, M., Oyama, M., Kozuka-Hata, H., Semba, K., Sugano, S., Yamamoto,T., Miyano, S. AYUMS: an algorithm forcompletely automatic quantitation based onLC-MS/MS proteome data and its applica-tion to the analysis of signal transduction.BMC Bioinformatics. 8: 15, 2007.

18. Shimamura, T., Yamaguchi, R., Imoto, S.,Miyano, S. Weighted lasso in graphicalGaussian modeling for large gene networkestimation based on microarray data.Genome Informatics. 19: 142-153, 2007.

19. Sugii, M., Okada, R., Matsuno, H., Miyano,S. Performance improvement in protein N-myristoyl classification by BONSAI with in-significant indexing symbol. Genome Infor-matics. 18: 277-286, 2007.

20. Termier, A., Tamada, Y., Numata, K., Imoto,

S., Washio, T., Higuchi, T., DIGDAG, a firstalgorithm to mine closed frequent embed-ded sub-DAGs. Proc. 5th InternationalWorkshop on Mining and Learning withGraphs, in press, 2007.

21. Yamaguchi, R., Yamamoto, M., Imoto, S.,Nagasaki, M., Yoshida, R., Tsujii, K., Ishiga,A., Asou, H., Watanabe, K., Miyano, S. Iden-tification of activated transcription factorsfrom microarray gene expression data ofKampo-medicine treated mice. Genome In-formatics. 18: 119-129, 2007.

22. Yamaguchi, R., Yoshida, R., Imoto, S.,Higuchi, T., Miyano, S. Finding module-based gene networks with state-space mod-els? Mining high-dimensional and shorttime-course gene expression data. IEEE Sig-nal Processing Magazine, 24 (1): 37-46, 2007.

23. Yoshida, R., Numata, K., Imoto, S., Na-gasaki, M., Doi, A., Ueno, K., Miyano, S.Computational discovery of aberrant splicevariations with genome-wide exon expres-sion profiles. Proc. IEEE 7th InternationalSymposium on Bioinformatics & Bioengi-neering. 715-722, 2007. (IEEE BIBE2007:Refereed conference; Digital Object Identifier10.1109/BIBE.2007.4375639)

117

Page 15: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

The major goal of our group is to identify genes of medical importance, and to de-velop new diagnostic and therapeutic tools. We have been attempting to isolategenes involving in carcinogenesis and also those causing or predisposing to vari-ous diseases as well as those related to drug efficacies and adverse reactions. Bymeans of technologies developed through the genome project including a high-resolution SNP map, a large-scale DNA sequencing, and the cDNA microarraymethod, we have isolated a number of biologically and/or medically importantgenes, and are developing novel diagnostic and therapeutic tools.

1. Genes playing significant roles in humancancer

Toyomasa Katagiri, Yataro Daigo, HidewakiNakagawa, Hitoshi Zembutsu, Koichi Matsuda,Ryo Takata, Mitsugu Kanehira, Koji Taka-hashi, Atsushi Takano, Nobuhisa Ishikawa,Tatsuya Kato, Satoshi Hayama, Chie Suzuki,Akira Togashi, Kazuhito Morioka, TomohideKidokoro, Chizu Tanikawa, Asahi Hishida,Miki Akiyama, Sachiko Dobashi, Meng-LayLin, Jae-Hyun Park, Tomomi Ueki, YosukeHarada, Chikako Fukukawa, Toshihiko Nishi-

date, Koji Ueda, Nguyen Minh-Hue, YuriaMano, Masaya Taniwaki, Ryohei Nishino,Daizaburo Hirata, Takumi Yamabuki, NagatoSato, Junkichi Koinuma, Banu Turel, KenjiTamura, Masayo Hosokawa, Su-Youn Chung,Motohide Uemura, Akio Takehara, ArataShimo, Fabio Pittella Silva and Yusuke Naka-mura

(1) Lung cancer

MAPJD (Myc-associated protein with JmjCdomain)

Human Genome Center

Laboratory of Molecular MedicineLaboratory of Genome TechnologyDivision of Advanced Clinical Proteomicsゲノムシークエンス解析分野シークエンス技術開発分野先端臨床プロテオミクス共同研究ユニット

Professor Yusuke Nakamura, M.D., Ph.D.Associate Professor Toyomasa Katagiri, Ph.D.

Hidewaki Nakagawa, M.D., Ph.D.Assistant Professor Ryuji Hamamoto, Ph.D.

Koichi Matsuda, M.D., Ph.D.Hitoshi Zembutsu, M.D., Ph.D.

Project Associate Professor Yataro Daigo, M.D., Ph.D.

教 授 医学博士 中 村 祐 輔准教授 医学博士 片 桐 豊 雅准教授 医学博士 中 川 英 刀助 教 理学博士 浜 本 隆 二助 教 医学博士 松 田 浩 一助 教 医学博士 前 佛 均特任准教授 医学博士 醍 醐 弥太郎

118

Page 16: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

Through genome-wide expression profileanalysis for non-small cell lung carcinomas(NSCLCs), we found over-expression of aMAPJD (Myc-associated protein with JmjC do-main) gene in the great majority of NSCLCcases. Induction of exogenous expression ofMAPJD into NIH3T3 cells conferred growth-promoting activity. Concordantly, in vitro sup-pression of MAPJD expression with siRNA ef-fectively suppressed growth of NSCLC cells inwhich MAPJD was over-expressed. We foundfour candidate MAPJD-target genes, SBNO1,TGFBRAP1, RIOK1 , and RASGEF1A, whichwere the most significantly induced by exoge-nous MAPJD expression. Through interactionwith MYC protein, MAPJD transactivates a setof genes including kinases and cell signal trans-ducers that are possibly related to proliferationof lung cancer cells. As our data imply thatMAPJD is a novel member of the MYC tran-scriptional complex and its activation is a com-mon feature of lung-cancer, selective suppres-sion of this pathway could be a promisingtherapeutic target for treatment of lung cancers.

CDCA8 (cell-division-associated 8)

We found in the great majority of lung-cancersamples co-transactivation of cell-division-associated 8 (CDCA8) and aurora kinase B(AURKB ) that were considered to be compo-nents of the vertebrate chromosomal passengercomplex. Immunohistochemical analysis of lung-cancer tissue microarrays demonstrated thatover-expression of CDCA8 and AURKB was sig-nificantly associated with poor prognosis oflung-cancer patients. AURKB directly phospho-rylated CDCA8 at Ser-154, Ser-219, Ser-275, andThr-278, and appeared to stabilize CDCA8 pro-tein in cancer cells. Suppression of CDCA8 ex-pression with siRNA against CDCA8 signifi-cantly suppressed the growth of lung cancercells. In addition, functional inhibition of inter-action between CDCA8 and AURKB by a cell-permeable peptide corresponding to 20 amino-acid sequence of a part of CDCA8 (11R-CDCA8261-280), which included two phosphorylationsites by AURKB, significantly reduced phospho-rylation of CDCA8 and resulted in growth sup-pression of lung cancer cells. Our data implythat selective suppression of the CDCA8-AURKB pathway could be a promising thera-peutic strategy for treatment of lung cancer pa-tients.

HJURP (Holliday Junction-Recognizing Pro-tein)

We additionally identified a novel gene en-

coding HJURP (Holliday Junction-RecognizingProtein) whose activation appeared to play apivotal role in immortality of cancer cells.HJURP was shown to be a possible downstreamtarget of ATM signaling and increased by DNAdouble-strand breaks (DSBs). HJURP is involvedin the homologous recombination (HR) pathwayin the DSBs repair process, interacting withhMSH5, or a member of a MRN protein com-plex NBS1. HJURP formed nuclear foci in cellsat S-phase or in those after having DNA dam-age. In vitro assays implied that HJURP directlybound to Holliday Junction and rDNA array.Treatment of cancer cells with siRNA againstHJURP caused abnormal chromosomal fusions,and led to genomic instability and senescence.In addition, HJURP over-expression was ob-served in the majority of lung cancers and asso-ciated with poorer prognosis. We suggest thatHJURP is an indispensable factor for chromoso-mal stability in immortalized cancer cells andcould be a novel therapeutic target for develop-ment of anti-cancer drugs.

FGFR1OP (fibroblast growth factor receptor 1oncogene partner)

We found an elevated expression of FGFR1OP(fibroblast growth factor receptor 1 oncogenepartner) in the great majority of lung cancers.Immunohistochemical staining using tumor tis-sue microarrays consisting of 372 archived non-small cell lung cancer (NSCLC) specimens re-vealed positive staining of FGFR1OP in 334(89.8%) of 372 NSCLCs. We also found that thehigh level of FGFR1OP expression was signifi-cantly associated with shorter tumor-specificsurvival times (P<0.0001 by log-rank test).Moreover multivariate analysis determined thatFGFR1OP was an independent prognostic factorfor surgically-treated NSCLC patients (P<0.0001). Treatment of lung cancer cells, in whichendogenous FGFR1OP was over-expressed, us-ing FGFR1OP siRNA, suppressed its expressionand resulted in inhibition of the cell growth.Furthermore, induction of FGFR1OP increasedthe cellular motility, invasion, and growth-promoting activity of mammalian cells. To in-vestigate its function, we searched for FGFR1OP-interacting proteins in lung cancer cells andidentified the ABL1 (Abelson murine leukemiaviral oncogene homolog 1) and WRNIP1(Werner helicase interacting protein 1), whichwas known to be involved in cell cycle progres-sion. ABL1 appeared to suppress DNA synthesisby directly phosphorylating tyrosine resides ofWRNIP1, while FGFR1OP significantly reducedABL1-dependent phosphorylation of WRNIP1and resulted in the promotion of DNA synthesis

119

Page 17: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

of cancer cells. Since our data imply that FGFR1OP is likely to play a significant role in lungcancer growth and progression, FGFR1OPshould be useful as a prognostic biomarker andprobably as a therapeutic target for lung cancer.

(2) Breast Cancer

MELK (maternal embryonic leucine-zipperkinase)

Cancer therapy directing at specific moleculartargets in signaling pathways of cancer cellssuch as Tamoxifen, aromatase inhibitors andtrastuzumab has been proven its usefulness fortreatment of advanced breast cancers. However,increases of the risk of endometrial cancer bylong-term tamoxifen administration as well asthose of bone fracture due to osteoporosis inpostmenopausal women with aromatase inhibi-tor prescription are recognized as their side ef-fects. Due to the emergence of these side effectsand also drug resistance, it is necessary tosearch novel targets for molecularly-orientateddrugs on the basis of well-characterized mecha-nisms of action. Using the accurate genome-wide expression profiles of breast cancers, wefound maternal embryonic leucine-zipper kinase(MELK ) that was significantly overexpressed inthe great majority of breast cancer cells. To as-sess a possible role of MELK in mammary car-cinogenesis, we knocked down the expression ofendogenous MELK in breast cancer cell-lines bymeans of the mammalian vector-based RNA in-terference (RNAi) technique. Furthermore, weidentified a long isoform of Bcl-G (Bcl-GL), apro-apoptotic member of Bcl-2 family, as a pos-sible substrate(s) for the MELK kinase by pull-down assay with wild-type- and kinase-dead-MELK recombinant proteins. Finally, we per-formed TUNEL assay and FACS analysis tomeasure the proportions of sub-G1 populationto investigate MELK is involved in apoptosiscascade through the Bcl-GL-related pathway. Themultiple human tissues- and cancer cell lines-northern blot analyses demonstrated that MELKwas overexpressed at a significantly high levelin a great majority of breast cancers and cancercell lines, but not expressed in normal vital or-gans (heart, liver, lung, hidney). Suppression ofMELK expression with small-interfering RNAsignificantly inhibited growth of human breastcancer cells. We also found that MELK proteinphysically interacted with Bcl-GL protein, a pro-apoptotic member of the Bcl-2 family throughits N-terminal region. Subsequent immunocom-plex kinase assay showed Bcl-GL was specificallyphosphorylated by MELK in vitro. TUNEL assayand FACS analysis revealed that overexpression

of WT-MELK suppressed Bcl-GL-induced apop-tosis, while that of D150A-MELK did not. Ourfindings suggest that kinase activity of MELK islikely to be involved in mammary carcinogene-sis through inhibition of pro-apoptotic functionof Bcl-GL. The kinase activity of MELK shouldbe a promising target for development ofmolecular-targeting therapy for patients withbreast cancers.

KIF2C/MCAK (Kinesin family member 2C/Mi-totic centromere-associated kinesin)

We also demonstrated functional significanceof KIF2C/MCAK (Kinesin family member 2C/Mitotic centromere-associated kinesin) in growthof breast cancer cells. Northern-blot and immu-nohistochemical analyses confirmed KIF2C/MCAK overexpression in breast cancer cells,and indicated its undetectable level of expres-sion in normal human tissues except testis, sug-gesting KIF2C/MCAK to be a cancer-testis anti-gen. Western-blot analysis using breast cancercell-lines revealed a significant increase of endo-genous KIF2C/MCAK protein level and itsphosphorylation in G2/M phase. Treatment ofbreast cancer cells with small-interfering RNAs(siRNAs) against KIF2C/MCAK effectively sup-pressed KIF2C/MCAK expression and inhibitedthe growth of breast cancer cell-lines, T47D andHBC5 cells. We also demonstrated that KIF2C/MCAK expression was significantly suppressedby ectopic introduction of p53. These findingssuggest that overexpression of KIF2C/MCAKmight be involved in breast carcinogenesis andis a promising therapeutic target for breast can-cers.

(3) Bladder cancer

MPHOSPH1 (M-phase phosphoprotein 1)

To disclose molecular mechanism of bladdercancer, the second most common genitourinarytumor, we had previously performed genome-wide expression profile analysis of 26 bladdercancers by means of cDNA microarray repre-senting 27,648 genes. Among genes that weresignificantly up-regulated in the majority ofbladder cancers, we here report identification ofMPHOSPH1 (M-phase phosphoprotein 1) as a can-didate molecule for drug development for blad-der cancer. Northern blot analyses usingmRNAs of normal human organs and cancercell-lines indicated this molecule to be a novelcancer / testis antigen . Introduction ofMPHOSPH1 into NIH3T3 cells significantly en-hanced cell growth at in vitro and in vivo condi-tions. We subsequently found an interaction be-

120

Page 18: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

tween MPHOSPH1 and PRC1 (Protein Regula-tor of Cytokinesis 1), which was also up-regulated in bladder cancer cells. Immunocyto-chemical analysis revealed co-localization of en-dogenous MPHOSPH1 and PRC1 proteins inbladder cancer cells. Interestingly, knockdownof either of MPHOSPH1 or PRC1 expressionwith specific siRNAs caused significant increaseof multi-nuclear cells and subsequent cell deathof bladder cancer cells. Our results imply thatthe MPHOSPH1/PRC1 complex is likely to playa crucial role in bladder carcinogenesis and thatinhibition of the MPHOSPH1/PRC1 expressionor their interaction should be novel therapeutictargets for bladder cancers.

DEPDC1 (DEP domain containing 1)

We also found a novel gene, DEPDC1 (DEPdomain containing 1), whose over-expression wasconfirmed in bladder cancers by northern-blotand immunohistochemical analyses. Immunocy-tochemical staining analysis detected strongstaining of endogenous DEPDC1 protein in thenucleus of bladder cancer cells. Since DEPDC1expression was hardly detectable in any of 24normal human tissues we examined except thetestis, we considered this gene-product to be anovel cancer/testis antigen. Suppression ofDEPDC1 expression with small-interfering RNA(siRNA) significantly inhibited growth of blad-der cancer cells. Taken together, these findingssuggest that DEPDC1 might play an essentialrole in the growth of bladder cancer cells, andwould be a promising molecular-target for noveltherapeutic drugs or cancer peptide-vaccine tobladder cancers.

(4) Pancreatic cancer

KIAA0101

Pancreatic ductal adenocarcinoma (PDAC)shows the worst mortality among the commonmalignancies, with a 5-year survival rate of only4%, and the majority of PDAC patients are di-agnosed at an advanced stage, in which no ef-fective therapy is available at present. Althoughthe proportion of curable cases is still not sohigh, surgical resection of early-staged PDACs isthe only way to cure the disease. Hence, estab-lishment of a screening strategy to detect early-staged PDACs by novel serological markers isurgently required, and development of novelmolecular therapies for PDAC treatment is alsoeagerly expected. To isolate novel diagnosticmarkers and therapeutic targets for pancreaticcancer, we earlier performed expression profileanalysis of pancreatic cancer cells using a

genome-wide cDNA microarray combined withlaser microdissection. Among dozens of trans-activated genes in pancreatic cancer cells, thisstudy focused on KIAA0101 whose overexpres-sion in pancreatic cancer cells was validated byimmunohitochemical analysis. KIAA0101 waspreviously identified as p15PAF (PCNA-associatedfactor) to bind with PCNA (proliferating cell nu-clear antigen), however its function remains un-known. To investigate for the biological signifi-cance of KIAA0101 overexpression in cancercells, we knocked down KIAA0101 by siRNA inpancreatic cancer cells, and found that the re-duced expression by siRNA caused drastic at-tenuation of their proliferation as well as signifi-cant decrease in DNA replication rate. Concor-dantly, exogenous over-expression of KIAA0101enhanced cancer cell growth and NIH3T3-derivative cells expressing KIAA0101 revealedin vivo tumor formation, implying its growthpromoting and oncogenic property. We alsodemonstrated that the expression of KIAA0101was regulated tightly by p53-p21 pathway. Toconsider the KIAA0101/PCNA interaction as atherapeutic target, we designed the cell-permeable 20-amino-acid dominant-negativepeptide and found that it could effectively in-hibit the KIAA0101/PCNA interaction and re-sulted in the significant growth suppression ofcancer cells. Our results clearly implicated thatsuppression of the KIAA0101 and PCNA onco-genic activity, or the inhibition of KIAA0101/PCNA interaction is likely to be a promisingstrategy to develop novel cancer therapeuticdrugs.

GABRP (GABA receptor π subunit)

GABA (γ-amino-butyric acid) functions pri-marily as an inhibitory neurotransmitter in themature central nervous system, and GABA/GABA receptors are also present in non-neuraltissues, including cancer, but their precise func-tion in non-neuronal or cancerous cells is poorlydefined so far. We identified over-expression ofGABA receptor π subunit (GABRP) in pancreaticductal adenocarcinoma (PDAC) cells. We alsofound the expression of this peripheral-typeGABAA receptor subunit, GABRP, in some adulthuman organs. Knockdown of endogenousGABRP expression in PDAC cells by siRNA at-tenuated PDAC cell growth, suggesting its es-sential role in PDAC cell viability. Notably, ad-dition of GABA into the cell-culture mediumpromoted the proliferation of GABRP-expressingPDAC cells, but not GABRP-negative cells, andGABAA receptor antagonists inhibited thisgrowth-promoting effect by GABA. The HEK293cells constitutively expressing exogenous

121

Page 19: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

GABRP revealed the growth-promoting effect byGABA treatment. Furthermore, GABA treatmentin GABRP-positive cells increased intracellularCa2+ level and activated MAPK (mitogen-activated protein kinase) cascade. Clinical PDACtissues contained a higher level of GABA ligandthan normal pancreas tissues due to the up-regulation of GAD1 (glutamate decarboxylase 1)expression, suggesting their autocrine/paracrinegrowth-promoting effect in PDACs. These find-ings imply that GABA ligand and GABRP couldplay important roles in PDAC development andprogression, and this pathway can be a promis-ing molecular target for development of newtherapeutic strategies for PDAC.

(4) Prostate cancer

One of the most critical issues in prostate can-cer clinic is emerging hormone-refractory pros-tate cancers (HRPCs) and their management.Prostate cancer is usually androgen-dependentand responds well to androgen ablation therapy.However, at a certain stage they eventually ac-quire androgen-independent phenotype andshow very poor response to any anti-cancertherapies that are presently available. To charac-terize the molecular features of clinical HRPCs,we analyzed gene-expression profiles of 25 clini-cal HRPCs and 10 hormone-sensitive prostatecancers (HSPCs) by genome-wide cDNA mi-croarrays combining with laser microbeam mi-crodissection. An unsupervised clustering analy-sis clearly distinguished expression patterns ofHRPC cells from those of HSPC cells. In addi-tion, primary and metastatic HRPCs from indi-vidual patients were closely clustered regardlessto metastatic organs. A supervised clusteringanalysis identified 36 up-regulated genes and 70down-regulated genes in HRPCs, comparedwith HSPCs (P<0.0001, gap>1.5). We observedover-expression of AR, ANLN , and SNRPE , anddown-regulation of NR4A1 , CYP27A1 , andHLA-A antigen in HRPC progression. Suchgenes were considered to be related to theandrogen-independent and more aggressivephenotype of HRPCs, and in fact, knockdown ofsome over-expressing genes by siRNA resultedin drastic attenuation of prostate cancer cellgrowth. Our precise microarray analysis ofHRPC cells should provide useful informationto understand the molecular mechanism ofHRPC progression as well as to identify molecu-lar targets for development of treatment ofHRPCs.

SRD5A3

We then identified one novel gene, SRD5A2L ,

encoding a putative 5α-steroid reductase thatproduces the most potent androgen DHT (5α-dihydrotestosterone) from testosterone. LC-MS/MS analysis following in vitro 5α-steroid reduc-tase reaction validated its ability to produceDHT from testosterone as similar to type-1 5α-steroid reductase. Since two types of 5α-steroidreductases were previously reported, we termedthis novel 5α-steroid reductase to be SRD5A3(type-3 5α-steroid reductase). RT-PCR andnorthern blot analyses confirmed its over-expression in HRPC cells, and indicated no orlittle expression in normal adult organs. Knock-down of SRD5A3 expression by siRNA in pros-tate cancer cells resulted in significant decreaseof DHT production and drastic reduction oftheir cell viability. These findings implicate thata novel 5α-steroid reductase, SRD5A3, is associ-ated with DHT production and maintenance ofthe AR pathway activation in HRPC cells, andthat this enzymatic activity should be a promis-ing molecular target for prostate cancer therapy.

(5) Chemosensitivity

Bladder Cancer

To predict the efficacy of the M-VAC neoadju-vant chemotherapy for invasive bladder cancers,we previously established the method to calcu-late the prediction score on the basis of expres-sion profiles of 14 predictive genes. This scoringsystem had clearly distinguished the respondergroup from the non-responder group. To furthervalidate the clinical significance of the system,we applied it to 22 additional cases of bladdercancer patients and found that the scoring sys-tem correctly predicted clinical response for 19of the 22 test cases. The group of patients withpositive predictive scores had significantlylonger survival than that with negative scores.When we compared our results with the previ-ous report describing the prognosis of the pa-tients with cystectomy alone, the results implythat patients with positive scores are likely tohave benefit by having M-VAC neoadjuvantchemotherapy, but that the chemotherapywould shorten lives of patients with negativescores. We are confident that our prediction sys-tem to M-VAC therapy should provide opportu-nities for achieving better prognosis and im-proving quality of life of patients.

Philadelphia chromosome-positive acute lym-phoblastic leukemia (Ph+ALL)

Philadelphia chromosome-positive acute lym-phoblastic leukemia (Ph+ALL) reveals verypoor prognosis due to high incidence of relapse

122

Page 20: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

when treated with standard chemotherapy. Al-though more than 96 % of patients with Ph+ALL achieved complete remission (CR) withimatinib-combined chemotherapy in a phase IIclinical trial conducted by the Japan Adult Leu-kemia Study Group (JALSG), 26 % of them ex-perienced hematological relapse in a short timeafter achievement of CR. In this study, to estab-lish a prediction system for risk of relapse, weanalyzed gene expression profiles of 23 bonemarrow samples from patients with Ph+ALLusing cDNA microarray consisting of 27,648cDNA sequences. Using the 19 randomly-selected test cases, we identified 16 genes thatwere expressed significantly differently betweenpatients with (n=8) and without (n=11) con-tinuous response; from the list of 16 genes, weselected the 6 “predictive” genes and con-structed a numerical scoring system by whichthe two groups were clearly separated, withpositive scores for the former and the negativescores for the latter. Scoring of 4 cases that werereserved from the original 23 cases predictedcorrectly their clinical responses. In addition,three cases whose BCR-Abl transcript levelsfailed to reduce sufficiently after induction ther-apy, also revealed negative scores. We also de-veloped a quantitative reverse transcription-PCR-based prediction system that could be feasiblefor routine clinical use. Our results suggest thatachievement of continuous response withimatinib-combined chemotherapy can be pre-dicted by expression patterns in this set ofgenes, leading to achievement of “personalizedtherapy” for treatment of this disease.

(6) Biomarker

DKK1 (Dickkopf-1)

Gene-expression profile analysis of lung andesophageal carcinomas revealed that Dickkopf-1(DKK1) was highly transactivated in the greatmajority of lung cancers and esophagealsquamous-cell carcinomas (ESCCs). Immunohis-tochemical staining using tumor tissue microar-rays consisting of 279 archived non-small celllung cancers (NSCLCs) and 280 ESCC speci-mens demonstrated that a high level of DKK1expression was associated with poor prognosisof patients with NSCLC as well as ESCC, andmultivariate analysis confirmed its independentprognostic value for NSCLC. In addition, weidentified that exogenous expression of DKK1increased the migratory activity of mammaliancells, suggesting that DKK1 may play a signifi-cant role in progression of human cancer. Weestablished an ELISA system to measure serumlevels of DKK1 and found that serum DKK1 lev-

els were significantly higher in lung andesophageal cancer patients than in healthy con-trols. The proportion of the DKK1-positive caseswas 126 (70.0%) of 180 NSCLC, 59 (69.4%) of 85SCLC, and 51 (63.0%) of 81 ESCC patients,while only 10 (4.8%) of 207 healthy volunteerswere falsely diagnosed as positive. A combinedELISA assays for both DKK1 and CEA increasedsensitivity, and classified 82.2% of the NSCLCpatients as positive while only 7.7% of healthyvolunteers were falsely diagnosed to be positive.The use of both DKK1 and ProGRP increasedsensitivity to detect SCLCs up to 89.4%, whilefalse positive rate in healthy donors were only6.3%. Our data imply that DKK1 should be use-ful as a novel diagnostic/prognostic biomarkerin clinic and probably as a therapeutic target forlung and esophageal cancer.

KIF4A

To identify molecules that might be useful asdiagnostic/prognostic biomarkers and as targetsfor the development of new molecular therapies,we screened genes that were highly transacti-vated in a large proportion of 101 lung cancersby means of a cDNA microarray representing27,648 genes. We found a gene encoding KIF4A,a kinesin family member 4A, as one of such can-didates. Tumor-tissue microarray was applied toexamine the expression of KIF4A protein and itsclinicopathological significance in archival non-small cell lung cancer (NSCLC) samples from357 patients. A role of KIF4A in cancer cellgrowth and/or survival was examined by smallinterfering RNA (siRNA) experiments. Cellularinvasive activity of KIF4A on mammalian cellswas examined using Matrigel assays. Immuno-histochemical staining detected positive KIF4Astaining in 127 (36%) of 357 NSCLCs and 19 (66%) of 29 SCLCs examined. Positive im-munostaining of KIF4A protein was associatedwith male gender ( P = 0.0287 ) , non-adenocarcinoma histology (P=0.0097), andshorter survival for patients with NSCLC (P=0.0005), and multivariate analysis confirmed itsindependent prognostic value (P=0.0012).Treatment of lung cancer cells with siRNAs forKIF4A suppressed growth of the cancer cells.Furthermore, we found that induction of exoge-nous expression of KIF4A conferred cellular in-vasive activity on mammalian cells. These datastrongly implied that targeting the KIF4A mole-cule might hold a promise for the developmentof anti-cancer drugs and cancer vaccines as wellas a prognostic biomarker in clinic.

123

Page 21: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

LY6K (lymphocyte antigen 6 complex, locus K)

We revealed that a member of low molecularweight GPI-anchored molecule-like protein, lym-phocyte antigen 6 complex, locus K (LY6K) wastransactivated in the great majority of NSCLCsand ESCCs. Northern-blot analysis detected ex-pression of LY6K gene only in testis among nor-mal tissues examined. Immunohistochemicalstaining using tumor tissue microarrays consist-ing of 413 archived NSCLC and 271 ESCC speci-mens confirmed that a high level of LY6K ex-pression was associated with poor prognosis ofpatients with NSCLC as well as ESCC, and mul-tivariate analysis confirmed its independentprognostic value for NSCLC. In addition, treat-ment of NSCLC cells with small interferingRNAs against LY6K knocked-down its expres-sion and resulted in growth suppression of thecancer cells. Serum levels of LY6K were signifi-cantly higher in NSCLC and ESCC patients thanin healthy controls. The proportion of the serumLY6K-positive cases defined by our criteria was38 (33.9%) of 112 NSCLC and 26 (32.1%) of 81ESCC, while only 3 (4.1%) of 74 healthy volun-teers were falsely diagnosed as positive. A com-bined assay using both LY6K and CEA in-creased sensitivity, and judged 64.7% of thelung adenocarcinoma patients as positive while9.5% of healthy volunteers were falsely diag-nosed to be positive. The use of both LY6K andCYFRA 21-1 increased sensitivity to detect lungsquamous-cell carcinomas up to 70.4%, whilefalse positive rate in healthy donors were only6.8%. Our data imply LY6K to be a cancer-testisantigen and suggest that LY6K should be usefulas a diagnostic/prognostic biomarker and prob-ably as a therapeutic target for development ofnew molecular-targeted agents and/or immuno-therapy of lung and esophageal cancers.

Proteomics

The recent progress in various proteomic tech-nologies allows us to screen serum biomarkerincluding carbohydrate antigens. However, onlya limited number of proteins could be detectedby current conventional methods such asshotgun-proteomics, primarily because of theenormous concentration distribution of serumproteins and peptides. To circumvent this diffi-culty and isolate potential cancer-specificbiomarkers for diagnosis and treatment, we es-tablished a new screening system consisting ofthe sequential steps of (1) immuno-depletion of6 high abundant proteins, (2) targeted enrich-ment of glycoproteins by lectin column chroma-tography, and (3) the quantitative proteomeanalysis using 12C6- or 13C6-NBS ( 2-

nitrobenzensulfenyl) stable isotope labeling fol-lowed by MALDI-QIT-TOF mass spectrometricanalysis. Through this systematic analysis forfive serum samples derived from patients withlung adenocarcinoma, we identified as candi-date biomarkers 34 serum glycoproteins that re-vealed significant difference in α1,6-fucosylationlevel between lung cancer and healthy control,clearly demonstrating that the carbohydrate-focused proteomics could allow for the detectionof serum components with cancer-specific fea-tures. In addition, we developed more simpli-fied and practical technique, mass spectrometry-based glycan structure analysis and lectin blot-ting, in order to validate glycan structure of can-didate biomarkers that could be applicable inclinical use. Our new glycoproteomic strategywill provide highly-sensitive and quantitativeprofiling of specific glycan structures on multi-ple proteins, which should be useful for serumbiomarker discovery.

2. Common diseases

(1) Cerebral infarction

Michiaki Kubo1,2,4, Jun Hata1,2,4, Toshiharu Ni-nomiya1,2, Koichi Matsuda4, Koji Yonemoto1,Toshiaki Nakano2,3, Tomonaga Matsushita2,4,Keiko Yamanaka-Yamazaki4, Yozo Ohnishi5,Susumu Saito5, Takanori Kitazono2, SetsuroIbayashi2, Katsuo Sueishi3, Mitsuo Iida2,Yusuke Nakamura4, and Yutaka Kiyohara1.:1Department of Environmental Medicine, 2De-partment of Medicine and Clinical Science,3Pathophysiological and Experimental Pathol-ogy, Graduate School of Medical Sciences, Ky-ushu University, Fukuoka 812-8582, Japan.4Laboratory of Molecular Medicine, HumanGenome Center, Institute of Medical Science,University of Tokyo, Tokyo 108-8639, Japan.5Laboratory for Genotyping, SNP ResearchCenter, the Institute of Physical and ChemicalResearch (RIKEN), Yokohama 230-0045, Ja-pan.

PRKCH

Cerebral infarction is the most common typeof stroke and often causes long-term disability.To investigate the genetic contribution to cere-bral infarction, we conducted a case-controlstudy using 52,608 gene-based tag-SNPs selectedfrom JSNP database. Here we reported that onenon-synonymous SNP in a member of proteinkinase C (PKC) family, PRKCH , was signifi-cantly associated with lacunar infarction in twoindependent Japanese samples (p=3.2×10-7,crude odds ratio of 1.39). This SNP was likely to

124

Page 22: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

affect the PKC activity. Furthermore, a 14-yearfollow-up cohort study in Hisayama (Fukuoka,Japan) supported involvement of this SNP forthe development of cerebral infarction (p=0.03,age- and sex-adjusted hazard ratio of 2.83). Wealso found that PKCη was mainly expressed invascular endothelial cells and foamy macro-phages in human atherosclerotic lesions, and itsexpression was enhanced as the lesion type pro-gressed. Our results support a role for PRKCHin the pathogenesis of cerebral infarction.

AGTRL1 (angiotensin receptor like-1)

Comparison of allele frequencies between1,112 cases with brain infarction and age- andsex-matched control subjects of the same num-ber found a SNP in the 5’-flanking region of an-giotensin receptor like-1 (AGTRL1) gene (rs9943582, -154G/A) to have a significant associa-tion with brain infarction (Odds ratio=1.30, 95% confidence interval (CI)=1.14-1.47, P=0.000066). We also found the binding of Sp1transcription factor to the region including thesusceptible G allele, but not the non-susceptibleA allele. Luciferase assay and RT-PCR analysisdemonstrated that exogenously-introduced Sp1induced transcription of AGTRL1 and its ligand,apelin as well, indicating direct regulation ofapelin/APJ pathway by Sp1. Furthermore, a 14-year follow-up cohort study in a Japanese com-munity in Hisayama town, Japan revealed thatthe homozygote of the susceptible G allele ofthis particular SNP had significantly higher riskof brain infarction (Hazard ratio=2.00, 95 % CI=1.22-3.29, P=0.006). Our results indicate thatthe SNP in the AGTRL1 gene is associated withthe susceptibility to brain infarction.

(2) Kawasaki disease

Yoshihiro Onouchi1, Tomohiko Gunji1,2, JaneC. Burns3, Chisato Shimizu3, Jane W. New-burger4, Mayumi Yashiro5, Yoshikazu Naka-mura5, Hiroshi Yanagawa6, Keiko Wakui7,Yoshimitsu Fukushima7, Fumio Kishi8, Kuni-hiro Hamamoto9, Masaru Terai10, YoshitakeSato11, Kazunobu Ouchi12, Tsutomu Saji13, Aki-yoshi Nariai14, Yoichi Kaburagi15, TetsushiYoshikawa16, Kyoko Suzuki17, Takeo Tanaka18,Toshiro Nagai19, Hideo Cho20, Akihiro Fujino21,Akihiro Sekine22, Reiichiro Nakamichi23, Tat-suhiko Tsunoda23 , Tomisaku Kawasaki24 ,Yusuke Nakamura25 & Akira Hata1: 1Labora-tory for Gastrointestinal Diseases, SNP Re-search Center, RIKEN, Yokohama, Kanagawa,230-0045, Japan. 2Department of Hard TissueEngineering, Graduate School Tokyo Medicaland Dental University, 113-8549, Tokyo, Ja-

pan. 3Department of Pediatrics, University ofCalifornia San Diego, School of Medicine, LaJolla, CA, and Rady Children’s Hospital SanDiego, CA, USA. 3Department of Cardiology,Boston Children’s Hospital, Boston, MA, USA.4Department of Public Health, Jichi MedicalSchool, Minamikawachi, Tochigi, Japan. 4Sai-tama Prefectural University, Koshigaya, Sai-tama, Japan. 5Department of Preventive Medi-cine, Shinshu University School of Medicine,Matsumoto, Japan. 6Department of MolecularBiology, Kawasaki Medical School, Kurashiki,Okayama, Japan. 7Department of Pediatrics,Fukuoka University School of Medicine,Fukuoka, Fukuoka, Japan. 8Department of Pe-diatrics, Tokyo Women’s Medical UniversityYachiyo Medical Center, Yachiyo, Chiba, Ja-pan. 9Department of Pediatrics, Fuji Heavy In-dustries LTD. Health Insurance Society Gen-eral Ohta Hospital, Ohta, Gunma, Japan. 10De-partment of Pediatrics, Kawasaki MedicalSchool, Kurashiki, Okayama, Japan. 11Depart-ment of Pediatrics, Toho University School ofMedicine, Tokyo, Japan. 12Department of Pedi-atrics, Yokohama Minami Kyousai Hospital,Yokohama, Kanagawa, Japan. 15Department ofPediatrics, National Hospital OrganizationYokohama Medical Center, Yokohama, Kana-gawa, Japan. 16Department of Pediatrics, FujitaHealth University, Toyoake, Aichi, Japan. 17De-partment of Pediatrics, Toyokawa Citizen’sHospital, Toyokawa, Aichi, Japan. 18Depart-ment of Pediatrics, National Hospital Organi-zation Kure Medical Center, Kure, Okayama,Japan. 19Department of Pediatrics, DokkyoMedical University, Koshigaya Hospital, Koshi-gaya, Saitama, Japan. 20Department of Pediat-rics, Kawasaki Municipal Hospital, Kawasaki,Japan. 21Department of Surgery, Keio Univer-sity School of Medicine, Tokyo, Japan. 22Labo-ratory for Genotyping, SNP Research Center,RIKEN, Yokohama, Kanagawa, Japan. 23Labo-ratory for Medical Informatics, SNP ResearchCenter, RIKEN, Yokohama, Kanagawa, Japan.24Japan Kawasaki Disease Research Center,Tokyo, Japan. 25Laboratory for MolecularMedicine, Human Genome Center, Institute ofMedical Science, University of Tokyo, Tokyo,Japan

Kawasaki disease (KD; OMIM 300530) is anacute, self-limited vasculitis of infants and chil-dren. KD is characterized by prolonged feverunresponsive to antibiotics, polymorphous skinrash, erythema of the oral mucosa, lips andtongue, erythema of the palms and soles, bilat-eral conjunctival injection, and cervical lympha-denopathy. Coronary artery aneurysms developin 15-25% of untreated patients making KD the

125

Page 23: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

leading cause of acquired heart disease in chil-dren in developed countries. Treatment with in-travenous immunoglobulin (IVIG) abrogates theinflammation in approximately 80% of patientsand reduces the aneurysm rate to less than 5%.Cardiac sequelae of the aneurysms includeischemic heart disease, myocardial infarction,and sudden death. Epidemiological featuressuch as seasonality and clustering of cases sug-gest an infectious trigger, although no pathogenhas been isolated and the etiology remains un-known. Several lines of evidence suggest the im-portance of genetic factors in disease susceptibil-ity and outcome. First, the incidence of KD is 10to 20 times higher in Japan than in Westerncountries4. Second, the risk of KD in siblings ofaffected children is 10 times higher than thegeneral population (λs=10) and the incidence ofKD in children born to parents with a history ofKD is twice as high as the general population.Familial aggregation of the disease has also beenobserved. Although association studies haveidentified candidate genes that may influenceKD susceptibility, a systematic genetic approachhas not been previously performed. Recently,we conducted affected sib pair analysis of KDthat demonstrated linkage in several chromoso-mal regions including chromosome 19. Here weshow the results of linkage disequilibrium map-ping performed on 19q13.2 which identified afunctional SNP in intron 1 of ITPKC signifi-cantly associated with the risk of KD and theformation of coronary artery aneurysms. Wealso characterized ITPKC as a novel negativeregulator of Ca2+/NFAT signaling pathway in T-cells. We identified a functional SNP (itpkc_3) ofthe inositol 1,4,5-trisphosphate 3-kinase C(ITPKC ) gene on chromosome 19q13.2 which issignificantly associated with susceptibility to KDboth in Japanese and U.S. children, as well aswith an increased risk of coronary artery lesions.Transfection experiments showed that the C-allele of itpkc_3 reduces splicing efficiency ofthe ITPKC mRNA. ITPKC acts as a negativeregulator of T-cell activation through the Ca2+/NFAT signaling pathway and the C-allele maycontribute to immune hyper-reactivity in KD.This finding provides new insights into themechanisms of immune activation in KD andemphasizes the importance of activated T-cellsin the pathogenesis of this vasculitis.

(3) Systemic Lupus Erythematosus

Tetsuya Oishi1,2, Aritoshi Iida3, Shigeru Ot-subo1,2, Yoichiro Kamatani1, Masayuki Usami1,Takashi Takei2, Keiko Uchida2, Ken Tsuchiya2,Susumu Saito3, Yozo Ohnisi1, Katsushi Toku-naga4, Kosaku Nitta2,Yasushi Kawaguchi5,

Naoyuki Kamatani5, Yuta Kochi6, Kenichi Shi-mane6, Kazuhiko Yamamoto6, Yusuke Naka-mura1, Wako Yumura2, and Koichi Matsuda1*:1Laboratory of Molecular Medicine, HumanGenome Center, Institute of Medical Science,4Department of Human Genetics, GraduateSchool of Medicine, University of Tokyo, To-kyo 108-8639, Japan. 2Department of Medicine,Kidney Center, 5Institute of Rheumatology, To-kyo Women’s Medical University, Tokyo 162-8666, Japan. 3Laboratory for Genotyping,6Laboratory for Rheumatic Diseases, SNP Re-search Center, The Institute of Physical andChemical Research (RIKEN), Kanagawa 230-0045, Japan.

SLE (OMIM #152700) is one of the commonautoimmune diseases that predominantly afflictwomen of child-bearing age. The clinical fea-tures and serological abnormalities observed inpatients with SLE are remarkably diverse, andmake clinical assessment and treatment very dif-ficult. Anti-inflammatory drugs and immuno-suppressive drugs such as cortisone, azathio-prine, methotrexate, and cyclophosphamidehave been widely used for treatment of this dis-ease and contribute to significant improvementin prognosis of patients with SLE; although the4-year survival was estimated to be ~50% inthe 1950s (Merrell and Shulman 1955), the 15-year survival rate is at present estimated to beapproximately 80%. Despite the improved prog-nosis in the majority of SLE patients, a signifi-cant proportion of the patients still suffer fromthe disease and/or severe adverse reactionscaused by these drugs. For example, syntheticglucocorticoid causes adverse events includinghypertension, hyperglycemia, diabetes, cataract,glaucoma, infection, psychosis, osteoporosis, andosteonecrosis. To improve quality of life for theSLE patients, it is one of the critical steps to elu-cidate the molecular mechanism causing SLE. Toidentify a gene(s) susceptible to SLE, we per-formed a case-control association study usinggenome-wide gene-based single nucleotide poly-morphisms (SNPs) in Japanese population. Herewe report that an SNP (rs3748079) located in apromoter region of the inositol 1,4,5-triphosphate receptor type 3 (ITPR3) gene onchromosome 6p21 was significantly associatedwith SLE in two independent Japanese case-control samples (p=0.0000000178 with odds ra-tio of 1.88, 95% Confidence Interval (CI) of 1.51-2.35). This particular SNP also revealed associa-tions with rheumatoid arthritis (RA) (p=0.0084with odds ratio of 1.23, 95% CI of 1.05-1.43) andwith Graves’ disease (GD) (p=0.00036 withodds ratio of 1.57, 95% CI of 1.23-2.02). Wefound the binding of NKX2.5 specific to the

126

Page 24: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

non-susceptible T allele in the region includingthis SNP. Furthermore, an SNP in NKX2.5 alsorevealed an association with SLE (p=0.0058with odds ratio of 1.74, 95% CI of 1.16-2.58). In-dividuals with risk genotype of both ITPR3 andNKX2.5 loci have higher risk for SLE (Odds ra-tio=5.77). Our data demonstrate that geneticand functional interactions of ITPR3 and NKX2.5 play a crucial role in the pathogenesis ofSLE.

We also identified that an SNP, rs3130342, ina 5’ flanking region of the TNXB gene revealeda significant association with SLE (p=0.000000930, odds ratio (OR) of 3.11 with 95%confidence interval (95%CI) of 1.89-5.28) inJapanese population. This association was repli-

cated by independent 203 cases and 294 controls(p=0.0440, OR of 1.52 with 95%CI of 1.01-2.78).Although a copy number variation (CNV) of theC4 gene adjacent to the TNXB gene was re-ported to be associated with SLE, our analysison this CNV revealed that the association ofCNV of the C4 gene was weaker than the SNPin the TNXB gene and likely to reflect the link-age disequilibrium between C4 CNV and thisparticular SNP. Stratified analysis also revealedthat the association of SNP rs3130342 with SLEwas independent of HLA-DRB1*1501 allele thathas been shown to be associated with SLE. Ourfindings strongly imply that the TNXB gene is acandidate gene susceptible to SLE in Japanesepopulation.

Publications

1. Kato, T., Hayama, S., Yamabuki, T.,Ishikawa, N., Miyamoto, M., Ito, T.,Tsuchiya, E., Kondo, S., Nakamura, Y. andDaigo, Y. Increased expression of insulin-like growth factor-II messenger RNA-binding protein 1 is associated with tumorprogression in patients with lung cancer.Clin Cancer Res. 13: 434-442, 2007.

2. Kanehira, M., Katagiri, T., Shimo, A., Takata,R., Shuin, T., Miki, T., Fujioka, T. and Naka-mura, Y. Oncogenic role of MPHOSPH1, acancer-testis antigen specific to human blad-der cancer. Cancer Res. 67: 3276-3285, 2007.

3. Kubo, M., Hata, J., Ninomiya, T., Matsuda,K., Yonemoto, K., Nakano, T., Matsushita,T., Yamazaki, K., Ohnishi, Y., Saito, S., Kita-zono, T., Ibayashi, S., Sueishi, K., Iida, M.,Nakamura, Y. and Kiyohara, Y.A nonsyn-onymous SNP in PRKCH increases the riskof cerebral infarction. Nat Genet. 39: 212-217, 2007.

4. Suzuki, C., Takahashi, K., Hayama, S.,Ishikawa, N., Kato, T., Ito, T., Tsuchiya, E.,Nakamura, Y. and Daigo, Y. Identification ofMyc-assocaited protein with JmjC domain asa novel therapeutic target oncogene for lungcancer. Mol Cancer Ther. 6: 542-551, 2007.

5. Yamabuki, T., Takano, A., Hayama, S.,Ishikawa, N., Kato, T., Miyamoto, M., Ito, T.,Ito, H., Miyagi, Y., Nakayama, H., Fujita, M.,Hosokawa, M., Tsuchiya, E., Kohno, N.,Kondo, S., Nakamura, Y. and Y. Daigo.Dikkopf-1 as a novel serologic and prognos-tic biomarker for lung and esophageal carci-nomas. Cancer Res. 67: 2517-2525, 2007.

6. Saigusa, K., Imoto, I., Tanikawa, C., Aoyagi,M., Ohno, K., Nakamura, Y. and Inazawa, J.RGC32, a novel p53-inducible gene, is lo-cated on centrosomes during mitosis and re-

sults in G2/M arrest. Oncogene. 26: 1110-1121, 2007.

7. Ebana, Y., Ozaki, K., Inoue, K., Sato, H.,Iida, A., Lwin, H., Saito, S., Mizuno, H.,Takahashi, A., Nakamura, T., Miyamoto, Y.,Ikegawa, S., Odashiro, K., Nobuyoshi, M.,Kamatani, N., Hori, M., Isobe, M., Naka-mura, Y. and Tanaka, T. A functional SNPin ITIH3 is associated with susceptibility tomyocardial infarction. J Hum Genet. 52: 220-229, 2007.

8. Lin, M.-L., Park, J.-H., Nishidate, T., Naka-mura, Y. and Katagiri, T. MELK, maternalembryonic leucine zipper kinase, involved inmammary carcinogenesis through interac-tion with Bcl-G, a pro-apoptotic member ofBcl-2 family. Breast Cancer Res. 9: R17, 2007.

9. Takata, R., Katagiri, T., Kanehira, M., Shuin,T., Miki, T., Namiki, M., Kohri, K., Tsunoda,T., Fujioka, T. and Nakamura, Y. Validationstudy of the prediction system for clinicalresponse of M-VAC neoadjuvant chemother-apy. Cancer Sci. 98: 113-117, 2007.

10. Hosokawa, M., Takehara, A., Matsuda, K.,Eguchi, H., Ohigashi, H., Ishikawa, O., Shi-nomura, Y., Imai, K., Nakamura, Y. andNakagawa, H. Oncogenic role of KIAA0101interacting with proliferating cell nuclear an-tigen in pancreatic cancer. Cancer Res. 67:2568-2576, 2007.

11. Hata, J., Matsuda, K., Ninomiya, T., Yo-nemoto, K., Matsushita, T., Ohnishi, Y.,Saito, S., Kitazono, T., Ibayashi, S., Iida, M.,Kiyohara, Y., Nakamura, Y. and Kubo, M.Functional SNP in a Sp1-binding site ofAGTRL1 gene is associated with susceptibil-ity to brain infarction. Hum Mol Genet. 16:630-639, 2007.

12. Osawa, N., Koga, D., Araki, S., Uzu, T.,

127

Page 25: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

Tsunoda, T., Kashiwagi, A., Nakamura, Y.and Maeda, S. Combinational effect of genesfor the renin-angiotensin system in confer-ring susceptibility to diabetic nephropathy. JHum Genet. 52: 143-151, 2007.

13. Onouchi, Y., Tamari, M., Takahashi, A.,Tsunoda, T., Yashiro, M., Nakamura, Yo.,Yanagawa, H., Wakui, K., Fukushima, Y.,Kawasaki, T., Nakamura, Yu. and Hata, A.A gennomewide linkage analysis ofKawasaki disease:evidence for linkage tochromosome 12. J Hum Genet. 52: 179-190,2007.

14. Miyamoto, Y., Mabuchi, A., Shi, D., Kubo,T., Takatori, Y., Saito, S., Fujioka, M., Sudo,A., Uchida, A., Yamamoto, S., Ozaki, K.,Takigawa, M., Tanaka, T., Nakamura, Y., Ji-ang, Q. and Ikegawa, S. A functional poly-morphism in the 5’ UTR of GDF5 is associ-ated with susceptibility to osteoarthritis. NatGenet. 39: 529-533, 2007.

15. Kanehira, M., Harada, Y., Takata, R., Shuin,T., Miki, T., Fujioka, T., Nakamura, Y. andKatagiri, T. Involvement of upregulation ofDEPDC1 (DEP domain containing 1) inbladder carcinogenesis. Oncogene. 26: 6448-6455, 2007.

16. Hayama, S., Daigo, Y., Yamabuki, T., Hirata,D., Kato, T., Miyamoto, M., Ito, T., Tsuchiya,E., Kondo, S. and Nakamura, Y. Phospho-rylation and activation of cell division cycleassociated 8 by aurora kinase B plays a sig-nificant role in human lung carcinogenesis.Cancer Res. 67: 4113-4122, 2007.

17. Tamura, K., Furihata, M., Tsunoda, T.,Ashida, S., Takata, R., Obara, W., Yoshioka,H., Daigo, Y., Nasu, Y., Kumon, H., Konaka,H., Namiki, M., Tozawa, K., Kohri, K., Tanji,N., Yokoyama, M., Shimazui, T., Akaza, H.,Mizutani, Y., Miki, T., Fujioka, T., Shuin, T.,Nakamura, Y. and Nakagawa, H. Molecularfeatures of hormone-refractory prostate can-cer cells by genome-wide gene expressionprofiles. Cancer Res. 67: 5117-5125, 2007.

18. Bourdon, A., Minai, L., Serre, V., Jais, J.-P.,Sarzi, E., Aubert, S., Chretien, D., Lonlay, deP., Paquis-Flucklinger, V., Arakawa, H.,Nakamura, Y., Munnich, A. and Rotig, A.Mutation of RRM 2 B , encoding p 53-controlled ribonucleotide reductase (p53R2),causes severe mitochondrial DNA depletion.Nat Genet. 39: 776-780, 2007.

19. Giacomini, K. M., Krauss, R. M., Roden, D.M., Eichelbaum, M., Hayden, M. R. andNakamura, Y. When good drugs go bad(commentary). Nature. 446: 975-977, 2007.

20. Zembutsu, H., Yanada, M., Hishida, A.,Katagiri, T., Tsuruo, T., Sugiura, I.,Takeuchi, J., Usui, N., Naoe, T., Nakamura,

Y. and Ohno, R. Prediction of risk of diseaserecurrence by genome-wide cDNA microar-ray analysis in patients with Philadelphiachromosome-positive acute lymphoblasticleukemia treated with imatinib-combinedchemotherapy. Int J Oncol. 31: 313-322, 2007.

21. Fujikawa, M., Katagiri, T., Tugores, A.,Nakamura, Y. and Ishikawa, F. ESE-3, anEts family transcription factor, Is up-regulated in cellular senescence. Cancer Sci.98: 1468-1475, 2007.

22. Hosono, N., Kubo, M., Tsuchiya, Y., Sato,H., Kitamoto, T., Saito, S., Ohnishi, Y. andNakamura, Y. Multiplex PCR-based real-time Invader assay (mPCR-RTIARETINA): anovel SNP-based detection method for du-plicated regions copy number polymor-phisms. Hum Mutat. 29: 182-189, 2007.

23. Kato, T., Sato, N., Hayama, S., Yamabuki, T.,Ito, T., Miyamoto, M., Kondo, S., Nakamura,Y. and Daigo, Y. Activation of HollidayJunction-Recognizing Protein involved in thechromosomal stability and immortality ofcancer cells. Cancer Res. 67: 8544-8553, 2007.

24. Ueda, K., Katagiri, T., Shimada, T., Irie, S.,Sato, T., Nakamura, Y. and Daigo, Y. Com-parative profiling of serum glycoproteomeby sequential purification of glycoproteinsand 2-nitrobenzensulfenyl (NBS) stable iso-tope labeling: a new approach for the novelbiomarker discovery for cancer. J ProteomeRes. 6: 3475-3483, 2007.

25. Taniwaki, M., Takano, A., Ishikawa, N.,Yasui, W., Inai, K., Nishimura, H., Tsuchiya,E., Kohno, N., Nakamura, Y. and Daigo, Y.Activation of KIF4A as a prognosticbiomarker and therapeutic target for lungcancer. Clin Cancer Res. 13: 6624-6631, 2007.

26. Suda, T., Tsunoda, T., Daigo, Y., Nakamura,Y. and Tahara, H. Identification of HLA-A24-restricted epitope-peptides derived fromgene products up-regulated in lung andesophageal cancers as novel targets for im-munotherapy. Cancer Sci. 98: 1803-1808,2007.

27. Mano, Y., Takahashi, K., Ishikawa, N.,Takano, A., Yasui, W., Inai, K., Nishimura,H., Tsuchiya, E., Nakamura, Y. and Daigo,Y. Growth factor receptor 1 oncogene part-ner as a noveprognostic biomarker andtherapeutic target for lung cancer. CancerSci. 98: 1902-1913, 2007.

28. Senju, S., Suemori, H., Zembutsu, H., Ue-mura, Y., Hirata, S., Fukuma, D., Mat-suyoshi, H., Shimomura, M., Haruta, M.,Fukushima, S., Matsunaga, Y., Katagiri, T.,Nakamura, Y., Furuya, M., Nakatsuji, N.and Nishimura, Y. Genetically manipulatedhuman embryonic stem cell-derived den-

128

Page 26: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

dritic cells with immune regulatory function.Stem Cells. 25: 2720-2729, 2007.

29. Cha, P.-C., Mushiroda, T., Takahashi, A.,Saito, S., Shimomura, H., Wanibuchi, Y.,Suzuki, T., Kamatani, N. and Nakamura, Y.High-resolution SNP and haplotype maps ofthe human gamma-glutamyl carboxylasegene (GGCX) and association study betweenpolymorphisms in GGCX with warfarinmaintenance-dose requirement of the Japa-nese population. J Hum Genet. 52: 856-864,2007.

30. Yamazaki, K., Onouchi, Y., Takazoe, M.,Kubo, M., Nakamura, Y. and Hata, A. Asso-ciation analysis of genetic variants in IL23R,ATG16L1 and 5p13.1 loci with Crohn’s dis-ease in Japanese patients. J Hum Genet. 52:575-583, 2007.

31. Tsukumo, Y., Tomida, A., Kitahara, O.,Nakamura, Y., Asada, S., Mori, K. and Tsu-ruo, T. Nucleobindin 1 controls the unfoldedprotein response by inhibiting ATF6 activa-tion. J Biol Chem. 282: 29264-29272, 2007.

32. Takehara, A., Hosokawa, M., Eguchi, H.,Ohigashi, H., Ishikawa, O., Nakamura, Y.and Nakagawa, H. GABA stimulates pancre-atic cancer growth through over-expressingGABAA receptor pai subunit (GABRP). Can-cer Res. 67: 9704-9712, 2007.

33. Kunizaki, M., Hamamoto, R., Silva, F.P.,Yamaguchi, K., Nagayasu, T., Shibuya, M.,Nakamura, Y. and Furukawa, Y. The lysine831 of VEGFR1 a novel target of methyla-tion by SMYD3. Cancer Res. 67: 10759-10765,2007.

34. Onouchi, Y., Gunji, T., Burns, J.C., Shimizu,C., Newburger, J.W., Yashiro, M., Naka-mura, Yo., Yanagawa, H., Wakui, K.,Fukushima, Y., Kishi, F., Hamamoto, K.,Terai, M., Sato, Y., Ouchi, K., Saji, T., Nariai,A., Kaburagi, Y., Yoshikawa, T., Suzuki, K.,Tanaka, T., Nagai, T., Cho, H., Fujino, A.,Sekine, A., Nakamichi, R., Tsunoda, T.,Kawasaki, T., Nakamura, Yu. and Hata, A.A functional polymorphism in ITPKC is as-sociated with Kawasaki disease susceptibil-ity and formation of coronary artery aneu-rysms. Nat Genet. 40: 35-42, 2008.

35. Kudo, S., Konda, R., Obara, W., Kudo, D.,Tani, K., Nakamura, Y. and Fujioka, T. Inhi-bition of tumor growth through suppressionof angiogenesis by brain-specific angiogene-sis inhibitor 1 gene transfer in murine renalcell carcinoma. Oncol Rep. 18: 785-791, 2007.

36. Silva, F.P., Hamamoto, R., Kunizaki, M.,Tsuge, M., Nakamura, Y. and Furukawa, Y.Enhanced methyltransferase activity ofSMYD3 by the cleavage of its N-terminal re-gion in human cancer cells. Oncogene, En-

hanced methyltransferase activity of SMYD3by the cleavage of its N-terminal region inhuman cancer cells. Oncogene. Nov 12;[Epub ahead of print], 2007.

37. The International HapMap Consortium. Asecond generation human haplotype map ofover 3.1 million SNPs. Nature. 449: 851-861,2007.

38. Sabeti, P.C., Varilly, P., Fry, B., Lohmueller,J., Hostetter, E., Cotsapas, C., Xie, X., Byrne,E.H., McCarroll, S.A., Gaudet, R., Schaffner,S.F., Lander, E.S. and The International Hap-Map Consortium: Genome-wide detectionand characterization of positive selection inhuman populations. Nature. 449: 913-918,2007.

39. Ishikawa, N., Takano, A., Yasui, W., Inai, K.,Nishimura, H., Ito, H., Miyagi, Y.,Nakayama, H., Fujita, M., Hosokawa, M.,Tsuchiya, E., Kohno, N., Nakamura, Y. andDaigo, Y. Cancer-testis antigen, LY6K is aserologic biomarker and a therapeutic targetfor lung and esophageal carcinomas. CancerRes. 67: 11601-11611, 2007.

40. Kato, M., Miya, F., Kanemura, Y., Tanaka,T., Nakamura, Y. and Tsunoda, T. Recombi-nation rates of genes expressed in humantissues. Hum Mol Genet. Nov 13; [Epubahead of print], 2007.

41. Kidokoro, T., Tanikawa, C., Furukawa, Y.,Katagiri, T., Nakamura, Y. and Matsuda, K.CDC20, a potential cancer therapeutic target,is negatively regulated by p53. Oncogene.Sep 17; [Epub ahead of print], 2007.

42. Brunet, J., Pfaff, A.W., Abidi, A., Unoki, M.,Nakamura, Y., Guinard, M., Klein, J.-P.,Candolfi, E. and Mousli, M. Toxoplasmagondii exploits UHRF1 and induces host cellcycle arrest at G2 to enable its proliferation.Cell Microbiol. Dec 6; [Epub ahead of print],2007.

43. Shimo, A., Tanikawa, C., Nishidate, T., Mat-suda, K., Lin, M.-L., Park, J.-H., Ohta, T.,Hirata, K., Fukuda, M., Nakamura, Y. andKatagiri, T. Involvement of KIF2C/MCAKoverexpression in mammary carcinogenesis.Cancer Sci. 99: 62-70, 2008.

44. Leung, A.A.C.C., Wong, V.C.L., Yang, L.C.,Chan, P.L., Daigo, Y., Nakamura, Y., Qi, R.Z., Miller, L., Liu, E.T.-K., Wang, L.D.J.-L., S.Law, Tsao, W. and Lung, M.L. Frequent de-creased expression of candidate tumor sup-pressor gene, DEC1, and its anchorage-independent growth properties and impacton global gene expression in esophageal car-cinoma. Int J Cancer. 122: 587-594, 2008.

45. Uemura, M., Tamura, K., Chung, S., Honma,S., Okuyama, A., Nakamura, Y. and Naka-gawa, H. A novel 5-steroid reductase (SRD5

129

Page 27: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

A3, type-3) is overexpressed in hormone-refractory prostate cancer. Cancer Sci. 99: 81-86, 2008.

46. Kamatani, Y., Matsuda, K., Ohishi, T., Oht-subo, S., Yamazaki, K., Iida, A., Hosono, N.,Kubo, M., Yumura, W., Nitta, K., Katagiri,T., Kawaguchi, Y., Kamatani, N. and Naka-mura, Y. Identification of a significant asso-ciation of an SNP in TNXB with SLE inJapanese population. J Hum Genet. 53: 64-73, 2008.

47. Obama, K., Satoh, S., Hamamoto, R., Sakai,Y., Nakamura, Y. and Furukawa, Y. En-hanced expression of RAD51AP1 is involvedin the growth of intrahepatic cholangiocarci-noma cells. Clin Cancer Res. in press, 2008.

48. Fukukawa, C., Hanaoka, H., Nagayama, S.,Tsunoda, T., Toguchida, J., Endo, K., Naka-mura, Y. and Katagiri, T. Radioimmunother-apy of human synovial sarcoma using amonoclonal antibody against FZD10. CancerSci. in press, 2008.

49. Kato, N., Miyata, T., Tabara, Y., Katsuya, T.,Yanai, K., Hanada, H., Kamide, K., Nakura,J., Kohara, K., Takeuchi, F., Mano, H., Yasu-nami, M., Kimura, A., Kita, Y., Ueshima, H.,

Nakayama, T., Soma, M., Hata, A., Fujioka,A., Kawano, Y., Nakao, K., Sekine, A.,Yoshida, T., Nakamura, Y., Saruta, T., Ogi-hara, T., Sugano, S., Miki, T. and Tomoike,H. High-Density Association Study andNomination of Susceptibility Genes for Hy-pertension in the Japanese National Project.Hum Mol Genet. in press, 2008.

50. Oishi, T., Iida, A., Otsubo, S., Kamatani, Y.,Usami, M., Takei, T., Uchida, K., Tsuchiya,K., Saito, S., Ohnishi, Y., Tokunaga, K.,Nitta, K., Kawaguchi, Y., Kamatani, N., Ko-chi, Y., Shimane, K., Yamamoto, K., Naka-mura, Y., Yumura, W. and Matsuda, K. Afunctional SNP in the NKX2.5-binding siteof ITPR3 promoter is associated with sus-ceptibility to Systemic Lupus Erythematosusin Japanese population. J Hum Genet. 53:151-162, 2008.

51. Daigo, Y. and Nakamura, Y. From cancergenomics to the thoracic oncology: Newbiomarker and therapeutic target discoveryfor lung and esophageal carcinomas. (Re-view Article) Gen Thorac and CardiovascSurg. 56: 43-53, 2008.

130

Page 28: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

Genetic heterogeneity of human beings is one of the most important targets ofpost-genomic research. Genome-wide association studies are being actively car-ried out using the genetic polymorphism markers to identify disease-related loci.We focus on the development of new methods to interpret the heterogeneity andto map the disease-associated loci and collaborate with research groups for data-mining of their genetic epidemiology studies.

1. The development of new methods to mapdisease-associated loci with genetic poly-morphisms.

Ryo Yamada

Genome-wide association (GWA) studies areresulting in many useful findings. The scale ofsuch studies is increasing along with rapid pro-gress in genotyping technology. This increase inscale necessarily increases the degree of depend-ence among individual tests in GWA studies.The inter-test dependence is problematic be-cause almost all the conventional statisticalmethods assume independence among multipletests. Besides the multiple sources of inter-testdependency, the variable inflation of test statis-tics due to biased sampling from structuredpopulation is one of the unavoidable conse-quences of enlarged sample size. These prob-lems that complicate the interpretation of dataof GWA studies are mutually related and thereis no straight-forward solution of them all to-gether. We decompose the difficulty into parts,i.e., the problem of linkage disequilibrium (LD),population structure, multiple genetic models,study design and characterize their problem andpropose solution of the individual problems at

the beginning and also attempt to improve theinterpretation of data of GWA studies as awhole.

a. Multiple testing correction

One of the major sources of inter-test depend-ence is allelic association due to LD and popula-tion structure. This problem has been wellaware of since SNP-based LD mapping becamepopular. There are some other origins of inter-test dependence that are less noticed but simi-larly troublesome as allelic association; the inter-test dependence due to application of multiplegenetic models to individual markers and dueto study designs, such as the usage of commoncontrols and subjects with comorbidity for multi-phenotype studies, which is currently commonparticularly in the gigantic study design. Thisyear we developed a method to estimate the ef-fective number of independent tests from GWAstudies in the structured population.

b. Test statistics correction for data of struc-ture population

Because the genetic epidemiology studies oncomplex genetic traits target relatively weak fac-

Human Genome Center

Laboratory of Functional Genomicsゲノム機能解析分野

Visiting Professor Gregory Mark Lathrop, Ph.D.Associate Professor Ryo Yamada, M.D., Ph.D.

客員教授 理学博士 グレゴリー・マーク・ラスロップ准教授 医学博士 山 田 亮

131

Page 29: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

tors, which means sample size of them shouldbe more than thousands and subsequentlymakes idealistic random sampling from homo-geneous population impossible. The test statis-tics of the studies in the heterogeneous popula-tion, in other words, structured population,tends to give false positive results. One of themethods to correct the increase in the false posi-tives is genomic control method for chi-squaredistribution. We modified the genomic controlmethod so that it could correct the Fisher’s exacttest statistics in 2007.

c. Characterization of exact trend test forSNP case-control association test data

The trend test is the test of the choice for 2×3contingency table test of SNP data. The defini-tion of the two-sided exact trend test is contro-versial. We investigated the suitability of multi-ple ways of the two-sided exact trend test for 2×3 SNP data tables this year.

2. The development of new methods to inter-pret the genetic heterogeneity.

Ryo Yamada

As a compound in nature, the DNA sequenceis under pressure to maximize the heterogeneityof the sequence. Under the most random condi-tion, all bases of the sequence would be poly-morphic, and all bases and all sets of bases aremutually independent. At the other extreme, un-der the least random condition, all DNA mole-cules would be clones. In living organisms, thenumber of polymorphic sites in the DNA se-quence is limited due to the requirements for re-production and as a result of selection and ge-netic drift, against which opposite forces act toincrease heterogeneity (e.g., mutation and re-combination). A major research target followingthe completion of the genome sequence is theinvestigation of intra-species variations, amongwhich diallelic single nucleotide polymorphismsare the most common.

a. Quantitation of linkage disequilibrium ofmultiple markers

Genetic variations within a population giverise to LD, and the use of the genetic history ofthe population and LD mapping is a very prom-ising method for identifying genetic back-grounds of various phenotypes. LD is a measureof inter-marker dependence. Although the inter-marker dependence exist among any set ofmarkers, only the pair-wise inter-marker de-pendence is utilized for quantitation of the ge-

netic heterogeneity and for genetic epidemiol-ogy studies usually. We develop a new methodto quantify the heterogeneity and complexity ofpopulation of DNA sequence with SNPs so thatvarious researches based on genetic heterogene-ity will benefit in 2007.

b. Geometric expression of haplotype popu-lations

Haplotypes are consisted of alleles of multiplemarkers. We attempted to deal the haplotypedata from combination theory standpoint andinvestigated the utility of polyhedral handlingof the combinatorial aspects of haplotypes.

3. Collaboration with genetic epidemiologyresearch groups.

Gregory Mark Lathrop and Ryo Yamada

Besides the development of new methods toanalyze genetic polymorphism data in the con-text of population genetics and genetic statistics,we collaborate with multiple research groups inand out of the IMS-UT, including Kyoto Univer-sity, Kyoto, The University of Tokyo Hospital,Tokyo, Laboratory for Rheumatic Diseases, SRC,RIKEN, Yokohama, National Hospital Organiza-tion Sagamihara National Hospital, Sagamihara,and The Centre National de Génotypage, Evry,France, for the interpretation of genetic epidemi-ology data with the conventional statisticalmethods.

4. Public distribution of population geneticsand genetic association study tools.

Ryo Yamada

Because the designs of genetic epidemiologystudies have been changing, the analysis toolshave to be updated all the time. The number ofgenetic epidemiology study groups is muchmore than the groups on genetic statistics in theworld and also in Japan. We opened the website that distributes basic tool of linkage dise-quilibrium mapping for public use. This distri-bution is supported by the grant from Japan So-ciety for the Promotion of Science on the permu-tation test.Web-site URL: http://func-gen.hgc.jp/

132

Page 30: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

Publications

1. Yamamoto, K. and Yamada, R. Lessons froma Genomewide Association Study of Rheuma-toid Arthritis. N. Engl. J. Med. 357: 1250-1251,2007

2. Yamada, R. and Yamamoto, K. Mechanismsof disease: genetics of rheumatoid arthritis--ethnic differences in disease-associated genes.Nat. Clin. Pract. Rheumatol. 3: 644-50, 2007

5. Suzuki, A., Yamada, R. and Yamamoto, K.Citrullination by peptidylarginine deiminase

in rheumatoid arthritis. Ann. N.Y. Acad. Sci.1108: 323-39, 2007

6. Ryo Yamada, and F. Matsuda. A novelmethod to express SNP-based genetic hetero-geneity, Ψ, and its use to measurelinkage disequilibrium for multiple SNPs, Dg,and to estimate absolute maximum of haplo-type frequency. Genetic Epidemiol. 31: 709-726, 2007

133

Page 31: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

The mission of our laboratory is to conduct computational ( “in silico”) studies onthe functional aspects of genome information. Roughly speaking, genome informa-tion represents what kind of proteins/RNAs are synthesized on what conditions.Thus, our study includes the structural analysis of molecular function of each geneproduct as well as the analysis of its regulatory information, which will lead us tothe understanding of its cellular role represented by the networks of inter-gene in-teraction.

1. Transcription factor binding site co-occurrence in firmicutes

Nicolas Sierro and Kenta Nakai

Preliminary investigation of binding site co-occurrence in firmicutes highlighted several spe-cific combinations of transcription factors ap-pearing with a higher-than-expected frequencyin some genomes. Such combinations could in-dicate the presence of different regulationmechanisms for specific pathways and in spe-cific strains. For example, two LexA sites sepa-rated by 16 or 17 nucleotides are found about150 nucleotides upstream of lexA and a genesimilar to uvrX in Bacillus anthracis and clausii,Listeria monocytogenes and Staphylococcus aureus,epidermidis , haemolyticus and saprophyticusstrains. Both of these genes are repressed by theSOS DNA-repair genes repressor LexA. How-ever, with the notable exception of the Staphylo-coccus strains for which a similar pattern with adistance of 31 nucleotides is found in front ofrecA , no other gene of the SOS DNA-repairregulon is preceded by such combinations. Thisspecific co-occurrence pattern could be relatedto the two-step SOS DNA-repair responsemechanism, where the presence of multiple

LexA sites upstream of some of the genes in-volved in the first response may be necessary toensure a tight repression of potentially harmfulgenes despite the weaker binding site.

2. Promoter structure modeling for expres-sion pattern prediction

Alexis Vandenbon and Kenta Nakai

Initiation of transcription in eukaryotes isregulated by transcription factors binding cis-regulatory elements in the regulatory regions ofgenes. We can therefore assume that regulatoryregions containing similar sets of cis-regulatorymotifs will drive similar expression patterns atthe transcriptional level. We have developed aMarkov chain-based promoter structure modelthat includes positional preference, order, andorientation of motifs in upstream regulatory re-gions. The model is trained using promoter se-quences of a set of co-regulated genes, and issubsequently used to predict genes having simi-lar expression profiles as the input gene set. Weapplied our model on a dataset of genes ex-pressed in pharyngeal muscle cells of Caenorhab-ditis elegans, and on a dataset of muscle ex-pressed genes of Ciona intestinalis. Both compu-

Human Genome Center

Laboratory of Functional Analysis In Silico機能解析イン・シリコ分野

Professor Kenta Nakai, Ph.D.Associate Professor Kengo Kinoshita, Ph.D.

教 授 理学博士 中 井 謙 太准教授 理学博士 木 下 賢 吾

134

Page 32: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

tational and experimental verifications indicatedthat this model is capable of predicting candi-date promoters driving similar expression pat-terns as the input-regulatory sequences. We arefurther developing our promoter structuremodel, and we are also working on a number ofprojects to find common features in sets of co-regulated genes. We believe that our approachescan be useful for finding promising candidategenes for wet-lab experiments and for increasingour understanding of transcriptional regulation.

3. DBTSS: database of transcription startsites, progress report 2008

Hiroyuki Wakaguri1, Riu Yamashita, YutakaSuzuki1, Sumio Sugano1, and Kenta Nakai:1Graduate School of Frontier Sciences, U. To-kyo

One of the major topics in the post-sequenceera is the analysis of transcription regulationnetwork. In order to study this problem, wehave constructed a database, DataBase of Tran-scription Starts Site (DBTSS), which included anumber of 5’-end sequence (ESTs) produced bythe oligo-capping method. We have recently re-leased DBTSS version 6, extended with threemajor additional features. The first new featureis the new data resource by SOLEXA. NowDBTSS has not only 1,540,411 mapped se-quences from oligo-capped clones, but also21,981,890 SOLEXA sequences from MCF7 andHEK193 cells. The second feature is a viewer ofevolutional conservation of the promoter re-gions. A user can compare the sequence aroundthe promoter regions from three species amonghuman, mouse, rat, chimp, and macaque. Thethird feature is a tool for the transcriptomeanalysis of promoter region with expression andpredicted transcription factor binding sites. Thistool searches for putative transcription factorbinding sites commonly occurring in the pro-moters, which show similar behaviors on vari-ous drug perturbations. DBTSS can be accessedat http://dbtss.hgc.jp.

4. Predicting transcriptional activities of hu-man promoters from putative transcriptionfactor binding sites by using support vec-tor machines

Hirokazu Chiba, Riu Yamashita, Kenta Nakai

Bindings of specific transcription factors (TFs)to promoter sequences play a central role intranscription regulation. However, the relation-ship between the combination of TFs and tran-scription activities remains to be deciphered. To-

ward the understanding the relationship, weanalyzed human promoter sequences and theirtranscriptional activities in specific tissues,which were defined by large collection of full-length cDNAs. We used support vector machineto predict the transcriptional activities from pu-tative TF binding sites (TFBSs) in the promotersequences. Among 38 tissues tested, best predic-tion performances were obtained in the cases ofbrain and testis. Furthermore, we performed thesame scheme of prediction by dividing the pro-moter sequences into two parts at -200 of tran-scriptional start sites (TSSs). The prediction per-formances were improved, suggesting that theTFBSs function differently according to the posi-tion, and the boundary is -200. We also per-formed cross-species comparison of promoter se-quences between human and mouse, and exam-ined the average identities at various positionsfrom the TSSs. The result supports the func-tional boundary at -200 of TSSs. These resultswill promote the understanding of the promoterarchitecture of mammals.

5. Retrotransposition as a source of new pro-moters

Kohji Okamura and Kenta Nakai

The fact that promoters are essential for thefunction of all genes presents the basis of thegeneral idea that retrotranspositions give rise toprocessed pseudogenes. However, recent studieshave demonstrated that some retrotransposedgenes are transcriptionally active. Because pro-moters are not thought to be retrotransposedalong with exonic sequences, these transcription-ally active genes must have acquired a func-tional promoter by mechanisms that are yet tobe determined. Hence, comparison between aretrotransposed gene and its source gene ap-pears to provide a unique opportunity to inves-tigate the promoter creation for a new gene. Weidentified 29 gene pairs in the human genome,consisting of a functional retrotransposed geneand its parental gene, and compared their re-spective promoters. In more than half of thesecases, we unexpectedly found that a large partof the core promoter had been transcribed,reverse-transcribed, and then integrated to beoperative at the transposed locus. This observa-tion can be ascribed to the recent discovery thattranscription start sites tend to be interspersedrather than situated at one specific site. Thispropensity could confer retrotransposability topromoters per se. Accordingly, the retrotrans-posability can explain the genesis of some alter-native promoters.

135

Page 33: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

6. Investigation of motif finding perform-ances

Keishin Nishida and Kenta Nakai

Computational prediction of nucleotide bind-ing specificity for transcription factors remains afundamental and largely unsolved problem. Oneof the prediction methods is pattern discoveryin unaligned DNA sequences, called “motiffinding”. Despite many studies about motiffinding, this problem is far from being resolved.To provide an appropriate motif finding over-view for experiment design, we tried largerscale analysis of wide input conditions. We fo-cused on the Gibbs Motif Sampler, one of themost famous motif finding programs. Back-ground sequences are generated randomly withany GC content. Motif information is down-loaded from the JASPAR database to plant ex-perimentally valid motif into the background se-quences. Our results show an expected ten-dency; longer sequences are a more difficultproblem, and more mismatches further increasethe difficulty. We find that the motif findingperformance is worse for larger input sequencesthan for smaller ones. Indicating that the defaultprogram parameters are not optimal for larges.Based on this result, we will investigate suitableGibbs Motif Sampler parameters for handlingsuch large size sequence datasets. In additionwe will investigate other motif models andother motif finding programs, and produce simi-lar guidelines.

7. Analysis of Nucleosomal DNA sequences

Yoshiaki Tanaka, Riu Yamashita and KentaNakai

Nucleosomes limit accessibility of some regu-latory factor binding sites, and thus play an im-portant role in transcription and replication. Ithas been reported that the nucleosome occu-pancy rate in the regions upstream of transcrip-tion start sites (TSSs) is lower than that in otherregions. It is also known that the binding of nu-cleosomes on DNA is sequence dependent. Mo-tifs recognized by nucleosomes are categorizedinto two groups, periodic sequence motifs (ex.10bp AA/TT repeat) and consecutive sequencemotifs (ex. poly(A/T)). Recently some research-ers succeeded in the computational prediction ofnucleosome positions in the S. cerevisiae genomeusing some of these motifs. However, theirmethods are not as efficient in higher eukaryoticgenomes. We therefore compared nucleosomeand linker DNA sequences from the genomessuch as H. sapiens, C. elegans and S. cerevisiae. In

this analysis, different tendencies are observedfor each organism. For example, 10bp AA/TTperiodic motifs are observed in C. elegans and S.cerevisiae, but not in H. sapiens. These tendencieswill be helpful for developing a new predictionapproach of nucleosome positioning. Now weare also investigating other important tendenciesfor nucleosome positioning in higher eukaryoticgenomes.

8. Computational analysis of trans-splicing inC. intestinalis gene expression

Shuang Li, Riu Yamashita, Nicolas Sierro andKenta Nakai

Trans-splicing, in which the 5’-ends of somepre-mRNAs are cut and replaced by the 5’-terminal sequence of a specific donor mRNAcalled spliced-leader (SL), has been observed insix phyla. In chordates, Ciona intestinalis is theonly one reported as performing trans-splicingsystematically. We have studied the trans-splicing of C. intestinalis with 5’-EST data. Thisyear, we compared the two populations (trans-splicing positive or negative) of genes, aiming atfinding the biological meaning of trans-splicing.We discovered that the expression ratio betweenthe two gene groups varies with tissues and de-velopmental stages, ranging from 1:3.1 at the‘juvenile’ stage to 1:1.1 in ‘blood’. We also ob-served for instance that ribosome related genesare more likely to fall into the non-trans-splicedgroup while mitochondria related genes show apreference for the trans-spliced group. Ouranalysis indicates that in C. intestinalis, althoughthere may not exist strong fundamental require-ments for genes to generate trans-splicing pre-mRNAs, the different populations of genes arelikely to be spatially and temporally regulateddifferently.

9. Computational Description of Gene Net-works by Direct Regulatory Interactions inAscidian Early Development

Xuyang Yuan, Nicolas Sierro, Kohji Okamura,and Kenta Nakai

The ascidian Ciona is a good model system toelucidate gene regulatory networks in chordatedevelopment. Accordingly, comprehensive insitu hybridization assays have identified a num-ber of regulatory genes with localized expres-sion pattern. Subsequent knockdown assayshave illuminated thousands of combinations ofgene expression profiles in the early embryo, de-picting a blueprint for its early development.However, the blueprint cannot demonstrate di-

136

Page 34: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

rect molecular mechanisms occurring in eachcell because an interaction between two geneswas detected by whether each transcriptiontakes place or not. Thus, we are computationallyconstructing gene networks consisting of onlydirect interactions. For trans-acting factorswhose binding sites are unknown, we are pre-dicting them by motif finding programs and ex-amination of orthologs to complete the wholenetwork. Our result will help to understand theprecise molecular mechanisms during the devel-opment and will provide suggestion for furtherexperimental analyses.

10. Docking of protein molecular surfaceswith evolutionary trace analysis

Eiji Kanamori2,3, Yoichi Murakami4, YukoTsuchiya, Daron M Standley4, Haruki Naka-mura4, and Kengo Kinoshita: 2Japan BiologicalInformation Research Center, 3Hitachi Soft-ware Engineering Co., Ltd., 4Protein ResearchInstitute, Osaka University

Protein-protein interaction is the first step toconstruct interaction network of proteins, whichis a key to understand the complex biologicalfunctions. To identity the reliable interactions,structural information of proteins are useful, sothe method to predict the complex structure ofproteins is require. Many methods have beendeveloped by several groups, but the predictionaccuracy is not so high at the moment. There-fore, we tried to develop a new method to pre-dict the complex structure of proteins using thedifferent view of proteins, that is, molecular sur-face with considering the physicochemical andevolutional information. With this method, weparticipated in blind prediction contest, criticalassessment of prediction of interactions (CAPRI)and have achieved the good results.

11. COXPRESdb: a database of coexpressedgene networks in mammals

Takeshi Obayashi, Shinpei Hayashi5, MasayukiShibaoka1, Motoshi Saeki5, Hiroyuki Ohta5,6,Kengo Kinoshita: 5Graduate School of Infor-mation Science and Engineering, Tokyo Insti-tute of Technology, 6Center of Biological Re-sources and Informatics, Tokyo Institute ofTechnology

The amount of publicly available gene expres-sion data is the most abundant in mouse andhuman, which is tenth or twentieth larger thanthat for Arabidopsis. However there is no coex-pressed gene database as like ATTED-II forArabidopsis, although the information is valu-

able to predict gene function. We are thus con-structing new database named COXPRESdb (co-expression database) (http://coxpresdb.hgc.jp)for coexpressed genes in mouse and humanfrom such publicly available gene expressiondata. The information of gene coexpression iscalculated from thousands of oligonucleotidemicroarray (GeneChip) data and then repre-sented as gene lists and gene networks. Whenthe networks became too big for the static pic-ture on the web in GO networks or in tissuenetworks, we used Google Maps API to visual-ize them interactively. This information of genecoexpression will widely promote experimentalresearches for mouse and human.

13. Identification of transient hub proteinsand the possible structural basis for theirmultiple interactions

Miho Higurashi, Takashi Ishida and Kengo Ki-noshita

Proteins that can interact with multiple part-ners play central roles in the important biologi-cal processes, such as signal transduction. Al-though it is well known that stable complexesand transient complexes have different struc-tural features, the hub proteins defined by pre-vious study were identified as proteins withmultiple partners, regardless of whether the in-teractions were transient or stable. As a result,so-called hub proteins can be the components ofstable supramolecules as opposed to the normalconcepts of hub proteins. In this study, we havedeveloped a method to identify transient hubproteins using PDB, and then perform statisticalanalyses of the structural features of these pro-teins. As a result, we found that the main differ-ence between sowciable and non-sociable pro-teins is not the abundance of disordered regions,in contrast to the previous studies, but ratherthe structural flexibility of the entire protein, re-sulting from less number of hydrogen bonds be-tween core residues.

14. PrDOS: prediction of disordered proteinregions from amino acid sequence

Takashi Ishida and Kengo Kinoshita

Identification of disordered regions in pro-teins is important for the functional annotationof proteins and for high-throughput structuraldetermination. However, the cost of experimen-tal determination of disordered regions is ex-pensive. Thus, we developed a computationalmethod to predict disordered protein regionsfrom their amino acid sequences and imple-

137

Page 35: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

mented web-interface of this prediction method.Our system is composed of two predictors, thatis, a predictor based on the local amino acid se-quence, and one based on template proteins.The performance of the method was evaluatedin the blind benchmark by the structural biologycommunity. The method achieved high accuracy(>90% with the sensitivity of 0.56), especiallyfor short disordered regions.

15. eF-seek: prediction of the functional sitesof proteins by searching for similar elec-trostatic potential and molecular surfaceshape

Kengo Kinoshita, Yoichi Murakami4 andHaruki Nakamura4

Molecular function of proteins are determinedby their three dimensional structures, thus thesimilarity of protein structure can give someclues to infer their functions. In many cases, themolecular functions are begun with the molecu-lar interaction with small molecules (ligands).Therefore, to find the putative ligands is thefirst step to identify the molecular function ofproteins. For the purpose, we have developed aweb server to identify a putative ligand uponthe structure of proteins. The web server, eF-seek, accepts a coordinate file with PDB formatfile and returns complex structures predicted. tosearch for the similar ligand binding sites forthe uploaded coordinate file with PDB format.The representative binding sites in eF-site data-base are search by our own algorithm based onthe clique search algorithm.

16. Structural diversity of ligand bindingsites in proteins

Megumi Okamoto, Matsuyuki Shirota, andKengo Kinoshita

Molecule recognition is the first step in realiz-ing the function of proteins. Therefore, the un-derstanding of the molecular recognition is animportant step to reveal the structure-functionrelation in proteins. To observe the structure-function relationship of known complexes, wecarried out the classification of AMP, GMP,CMP, UMP binding sites in PDB. We analyzedvariety of binding sites in three levels, which aresecondary structure elements, spatial arrange-ment of atoms and shape of three-residue frag-ments. As a result, we found that binding sites,which are various in the view of secondarystructure and atomic configuration, have com-mon fragment combinations. And same frag-ment combination constitutes structurally differ-

ent binding sites by different spatial arrange-ments.

17. Molecular dynamics simulation ofcholesterol-induced change in a mem-brane environment

Naoya Fujita, Takashi Ishida and Kengo Ki-noshita

Simulation in molecular level of lipid raft,where many biological functions are realized, isimportant to understand membrane heterogene-ity, function of proteins and effects of raft struc-ture to proteins’ function. To reveal effects ofcholesterol, which is one of main components inthe raft domain, we compared a cholesterol-mixed raft-like membrane and a pure lipid, di-palmitoylphosphatidylcholine (DPPC), mem-brane. By analyzing the simulation trajectories, asignificant variation of membrane width wasfound on the pure DPPC membrane but not onthe cholesterol-mixed membrane. Such a differ-ence on the membrane environment alsochanged stability and mobility on an embeddedprotein, alamethicin.

18. Domain size effect on amino acid compo-sition

Matsuyuki Shirota, Takashi Ishida, and KengoKinoshita

The size effect of proteins on amino acid com-position, that is whether the frequencies of resi-dues change systematically with increasing pro-tein size, has been tested several times underthe assumption that the frequencies of hydro-phobic residues would increase and those of hy-drophilic residues would decrease if protein sizeincreased. However, such systematic change inthe frequency of hydrophobic or hydrophilicresidues has not yet been detected. Here, takingadvantage of recent large database of proteinstructures, we examined the change in occur-rence of each amino acid with increasing proteinsize, and identified definite correlation betweenprotein size and occurrence of several amino ac-ids. Contrary to the generally accepted expecta-tion that the incidence of hydrophilic amino ac-ids would decrease, only two charged aminoacids-lysine and glutamic acid-among all hydro-philic residues had decreased occurrence withincreasing protein size. Since these two aminoacids are the most rarely found in the proteincore, this decrease of lysine and glutamic acidcan be considered to be induced by the increaseof protein size due to the decrease of relativesurface area.

138

Page 36: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

Publications

Vandenbon, A., Miyamoto, Y., Takimoto, N.,Kusakabe, T., and Nakai, K. Markov chain-based promoter structure modeling for tissue-specific expression pattern prediction. DNARes ., in press.

Genome Information Integration Project and H-invitational 2 Consortium, The H-InvitationalDatabase (H-InvDB), a comprehensive annota-tion resource for human genes and tran-scripts. Nucl. Acids Res., in press.

Sierro, N., Makita, Y., de Hoon, M., and Nakai,K. DBTBS: a database of transcriptional regu-lation in Bacillus subtilis containing upstreamintergenic conservation information. Nucl. Ac-ids Res ., in press.

Wakaguri, H., Yamashita, R., Suzuki, Y.,Sugano, S., and Nakai, K. DBTSS: DataBase ofTranscription Start Sites, progress report 2008.Nucl. Acids Res., in press.

Ikura, T., Kinoshita, K., and Ito, N.A cavity withan appropriate size is the basis of the PPIaseactivity. PEDS , in press.

Obayashi, T., Hayashi, S., Shibaoka, M., Saek,M., Ohta, H., and Kinoshita, K. COXPRESdb:a database of coexpressed gene networks inmammals. Nucl. Acids Res, in press.

Higurashi, M., Ishida, T., and Kinoshita, K.Identification of transient hub proteins andthe possible structural basis for their multipleinteractions. Protein Sci , 17: 72-78, 2008.

Kanamori, E., Murakami, Y., Tsuchiya, Y., Stan-dley, D.M., Nakamura, H., and Kinoshita, K.Docking of protein molecular surfaces withevolutionary trace analysis. Proteins 69: 832-838, 2007.

Okumura, T., Makiguchi, H., Makita, Y.,Yamashita, R., and Nakai, K. Melina II: a webtool for comparisons among several predictivealgorithms to find potential motifs from pro-moter regions. Nucl. Acids Res. 35: W227-W231, 2007.

Horton, P., Park, K.-J., Obayashi, T., Fujita, N.,Harada, H., Adams-Collier, C.J., and Nakai,K. WoLF PSORT: protein localization predic-tor. Nucl. Acids Res. 35: W585-W587, 2007.

Tsuritani, K., Irie, T., Yamashita, R., Wakaguri,H., Kanai, A., Mizushima-Sugano, J., Sugano,S., Nakai, K., and Suzuki, Y. Distinct class ofputative “non-conserved” promoters in hu-mans: comparative studies of alternative pro-moters of human and mouse genes. GenomeRes . 17(7): 1005-1014, 2007.

Sakakibara, Y., Irie, T., Suzuki, Y., Yamashita,R., Wakaguri, H., Kanai, A., Chiba, J., Takagi,T., Mizushima-Sugano, J., Hashimoto, S.,

Nakai, K., and Sugano, S. Intrinsic promoteractivities of primary DNA sequences in thehuman genome. DNA Res. 14(2): 71-77, 2007.

Nakai, K. and Horton, P. Chapter 29: Computa-tional prediction of subcellular localization (inM. van der Giezen ed., Protein Targeting Proto-cols (2nd ed.), Methods in Molecular Biology ,vol. 390). pp. 429-466, Humana Press, Totowa,2007, (ISBN 13 978-1-58829-702-0).

Obayashi, T., Kinoshita, K., Nakai, K., Shibaoka,M., Hayashi, S., Saeki, M., Shibata, D., Saito,K., and Ohta, H. ATTED-II: a database of co-expressed genes and cis elements for identify-ing co-regulated gene groups in Arabidopsis.Nucl. Acids Res. 35: D863-D869, 2007.

Koike, R., Kinoshita, K., and Kidera, A. Proba-bilistic alignment detects remote homology ina pair of protein sequences without homolo-gous sequence information. Proteins 66: 655-63, 2007.

Kinoshita, K., Murakami, Y., and Nakamura, H.eF-seek: prediction of the functional sites ofproteins by searching for similar electrostaticpotential and molecular surface shape. NucleicAcids Res . 35: W398-402, 2007.

Ishida, T. and Kinoshita, K. PrDOS: predictionof disordered protein regions from amino acidsequence, Nucleic Acids Res. 35: W460-464,2007.

Hosaka, Y., Iwata, M., Kamiya, N., Yamada, M.,Kinoshita, K., Fukunishi, Y., Tsujimae, K.,Hibino, H., Aizawa, Y., Inanobe, A., Naka-mura, H., and Kurachi, Y. Mutational analysisof block and facilitation of HERG current by aclass III anti-crrhythmic agent, nifekalant.Channels 1: 198-208, 2007.蒔田由布子,中井謙太.第2章10.転写因子とシスエレメントの相互作用解析に役立つデータベースとウェブツール.礒辺・中山・伊藤編,分子間相互作用解析ハンドブック(実験ハンドブックシリーズ).羊土社.pp.210―217,2007.鈴木穣,山下理宇,中井謙太,菅野純夫.転写制御領域(プロモーター領域)の同定と解析~DBTSS/TRANSFAC~.中村・礒合・石川・平川・坊農編,バイオデータベースとウェブツールの手とり足とり活用法 改訂第2版.羊土社.pp.65―75,2007.ポール・ホートン,中井謙太.PSORT(ピーソート).中村・礒合・石川・平川・坊農編,バイオデータベースとウェブツールの手とり足とり活用法 改訂第2版.羊土社.pp. 108―114,2007.木下賢吾.タンパク質間相互作用に関するデータベースの構築.生体の科学.58:334―337,2007.

139

Page 37: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

Department of Public Policy is a new and the first social science section at theIMSUT. We work for three major missions; public policy studies on translationalresearch, its application to healthcare and its impact on social security; practicaladvices and survey for research projects to build public trust; and “minority-centered” scientific communication. We have conducted a comparative politicalstudy on genetic testing business in East Asia and supported for “BioBank Japan”project from ethical, legal and social standpoints.

1. A comparative political study on genetictesting business in East Asia

To examine broader social and cultural agen-das on industrialization of genetic tests and sug-gest policy implications in East Asia, we’ve beenconducting this research. In the first phase, wehave surveyed political and legal positions re-garding genetic testing business, especially ge-netic tests related to lifestyle and multi-factorialcommon diseases directly to the public or viaclinics in Japan, China, Taiwan and South Ko-rea. We’ve interviewed main players in thisarea; the relevant authorities, bioindustries, phy-sicians, academics and patients support groups.We also conducted literature reviews regardingregulations.

One of the key preliminary findings is thecontrary regulative differences between SouthKorea and Japan. After the fabrication of HwangWoo-suk’s stem cell cloning and unethical hu-man egg collection, bioethics law has been dis-cussed for revision and the government seeksmore strict regulation towards life science andhealthcare. Then, the government takes strict po-sitions towards bioindustries which provide ge-netic tests for children directly to the public orvia clinics. The Korean National Bioethics Advi-sory Commission suggested the first genetic

testing guidelines in 2006, which suggests ban-ning certain types of genetic tests for multi-factorial common diseases with poor scientificevidences. Finally, the Ministry banned 20 typesof genetic tests to be supplied for the clinicaland business purposes, which had been previ-ously supplied by Korean bioindustries. Thebanned genetics tests include obesity, all can-cers, Alzheimer’s, high blood pressures, diabe-tes, osteoporosis, and asthma. Genetic testing forresearch purposes are out of scopes of thesepolicies. South Korean bioethics law will be re-vised within a year or two and add statutes forregulating genetic testing. With regards to labo-ratory quality assurance, the Ministry of Healthand Welfare requires all genetic testing labs inSouth Korea to submit applications about thelist of genetic tests they provide and other infor-mation to qualify their laboratory quality to ob-tain licenses from the Ministry. The licensed labshave to accept laboratory inspection by the Ko-rean Institute of Genetic Testing Evaluation(KIGTE).

On the contrary, in Japan, we don’t have anylegal regulation regarding genetic informationand genetic tests. Ministry of Health and Wel-fare (MHLW) hasn’t taken any action towardsgenetic testing markets and leave self-regulationby bioindustries. The Council for Protection of

Human Genome Center

Department of Public Policy公共政策研究分野

Associate Professor Kaori Muto, Ph.D. 准教授 保健学博士 武 藤 香 織

140

Page 38: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

Individual Genetic Information (CPIGI), whichhas been consisted of 25 bioindustries, publishedtheir first guidelines for standardization andbusiness ethics on genetic tests marketing in2007, supported by the Ministry of InternationalTrade and Industry (MITI). The MHLW hasmade metabolic syndrome countermeasures andregular health check-ups for employees over 40will be changed to stratify individual risks formetabolic syndromes since April 2008. It is obvi-ous that bioindustries are ready to enter into thefast-growing market with genetic testing forobesity, which is banned in South Korea forclinical use. We could further compare the socialand cultural difference between these neighbors,such as notions of individual responsibility andindividuals’ right to obtain genetic informationfor their healthcare management. We’re nowstudying Taiwan and China’s political approachfor genetic testing business to obtain wider im-plications in East Asia. Some part of this projecthas been financially supported by the Japan Sci-ence and Technology Agency (JST) and the Min-istry of Education, Culture, Sports, Science andTechnology (MEXT).

2. Ethical, legal and social support for “Bio-Bank Japan” project

For supporting “BioBank Japan” project, ledby Professor Yusuke Nakamura of Laboratory ofMolecular Medicine of IMSUT, we’ve conductedthree types of surveys and issued newslettersfor participants. By the end of 2007, the projecthas obtained 200,000 written consent forms byresearch coordinators called Medical Coordina-tors (MC). The project trained nurses or phar-macists as MCs for obtaining fully informedconsent from participants. This consent processhad been well-worked out in advance and iscomplied with the government ethical guide-lines for genetic/genomic research. However, re-cent publications show that the long and tediousconsent process may not contribute to partici-pants’ understanding the overview of the re-search, may be unethical rather than ethical. Ifwe long for “personalized medicine”, we shouldthink further about the construction of “person-

alized consent process” and we have to changethe relationship between participants and re-searchers, from one-time informed consent tolong lasting public trust.

Obtaining feedbacks from participants is alsoeffective to keep incentives to participation andprevent dropout of participants from researchprocess. We conducted three kinds of surveys toevaluate and improve the consent process andexplore what the project should do for public in-volvement; questionnaire surveys towards re-search participants, a web-based questionnairesurvey towards all MCs and focus group inter-views with chief MCs to triangulate the consentprocess. The preliminary results show that par-ticipants are basically satisfied with the consentprocess and highly evaluate MCs’ attitudes to-wards them. Most MCs also responded thatthey have made their original efforts to maketheir explanation easier and understandable spe-cifically towards the elderly. However, certainamounts of participants have already forgottenabout what for they have donated their DNAand serums and the experience of watching theDVD or the leaflet about the project overview.MCs explains that this project doesn’t have anyplans to disclose personal genotyped data toeach participant, but a certain amount of partici-pants responded that they now want to see theirown genotyped data or tentative research feed-backs, while others are just satisfied with theircontribution to genomic research without anyrewards. Even though participants should forgetthe fact that they gave consent for research,MCs explain, encourage and appreciate partici-pants at each time and participants recall theirwill for contribution.

To appreciate participants’ and MCs’ contri-bution to the project, we had issued “BioBanknewsletters No.1” in August 2006 and “No.2” inNovember 2006. We will explore more methodsand opportunities to communicate with partici-pants. Because the current forms of BioBanknewsletters are available only for the sightedwith good eyesight, we make efforts for person-alized information security to meet with dis-abilities of participants.

Publications

1. 武藤香織.遺伝病ピアサポートの可能性と課題.遺伝診療をとりまく社会―その科学的・倫理的アプローチ.水口修紀・吉田雅幸監修,吉田雅幸・小笹由香編集.ブレーン出版.149―158,2007.

2. 武藤香織.神経難病当事者団体のネットワーキング.神経難病のすべて~症状・診断から

最先端治療,福祉の実態まで~.阿部康二編著.新興医学出版社.187―192,2007.

3. 武藤香織.「オーダーメイド医療」からみえる科学性と不確実性.遺伝子技術の社会学.柘植あづみ・加藤秀一編著.文化書房博文社.177―181,2007.

4. 柊中智恵子,武藤香織.遺伝に対する相談へ

141

Page 39: Human Genome Center Laboratory of Genome Database … · 2020. 6. 2. · the KEGG system programatically, we have been developing and maintaining the KEGG API as a SOAP/WSDL based

の対応.難病医療専門員による難病患者のための難病相談ガイドブック.吉良潤一編.九州大学出版会.64―87,2008.

5. Izumi Ishiyama, Akiko Nagai, Kaori Muto,Akiko Tamakoshi, Minori Kokado, KyokoMimura, Tetsuro Tanzawa, Zentaro Yama-gata. Relationship between Public Attitudestoward Genomic Studies Related to Medicineand Their Level of Genomic Literacy in Ja-

pan. American Journal of Medical Genetics,2008 (in press).

6. Kaori Muto; Truth telling and predictive ge-netic testing―Huntington’s Disease in Japan,In Predictive & Genetic Testing in Asia:social-science perspectives on the ramificationof choice. (University of Amsterdam Press,The Netherlands), 2008 (in press).

142