structure generation on the basis of bct ......structure analysis are in£erence of constituent...
TRANSCRIPT
lS E-T R-81-21
ダ燕.唇 婆
A-fillX: /二こAx
STRUCTURE GENERATION ON THE BASIS OF BCT
REPRESENTATION OF CHEMICAL STRUCTURES
by
Takashi Nakayama
and
Yuzuru Fujiwara“
July 22,1981
Structure Generation on the Basis of BCT
Representation of Chemical Structuxes
abstract
A method of structure generation based on BCT
(bユock ・一 cu’ヒpoint tree ) represen’ヒation of chemical
structure has been developed. The generation pro-
gram is a part of the automatic structure analysis
system of mass spectra( ASASMAS ), and is used
when a set of the inierred substructures are given
as input data. The input substructures are repre-
sented by means of BCT.
1NTRODUCTZON
The major parts of the process of chemical
structure analysis are in£erence of constituent
substructureS and structure generation by cornbin-
ing those infe:rred subst:ructures。 A me’ヒhod of
structure generation £rom substructures already
inferred is described in this paper. Various
schemes for stxuctuye generation have been devis一一
1-10 each of these methods.is based onNa rnethodedr
of representation of chemical s’ヒruc’ヒures, and it
may be said that ’ヒhe meヒhod of representa’ヒion of
chemical structures determines the method oE
struc’ヒure g’enera’ヒio:n. AIl of these me・ヒhods i:n-
cluding the present one pursue the govexning prin-
ciple of reducing the number of combinationsi. :, n
this paperr the chemical structure is represented
in terms・f BCT(b1。ck-cu七P。int tree),U which
clarifies the hierarchy in chemical structures 3wholely so that the idea of the connectivity stack
and the supedr-atom5 are included naturally in our
method presented. Combinatorial problems that
occur in the process of s●ヒruc’ture generation are
thus partitioned into 一qubproblems and’are classifi-
ed into strages.
一 1 一
Struc・ヒure generation is regarded as a problem
in combi:natorial analysis. The:re must be neither
duplications nor.omissions in the genera’ヒed struc一
コturesr and the method of st:ructur母 gene:rat::Lo・n
should be e:Eficien’t∫ processing ti:me shou:Ld be
short in prac’ヒice. The prqcessing ’ヒime depends
::Largely on the method of representation of chemi-
cal structures. The constituent unit of ’ヒhe BCT
representation of ohemical struc’ヒures is a block
( abiconnected component of a g:raph )r and this
makes it possible to reduce ’ヒhe nu皿ber of combina一
サtions greatly by omitting atom-by-atoln processユ:ng.
Another feature of the rne’ヒhod of s・ヒructure genera・一
tion described here is the availabi].ity of a graph
database. The various graphs which appear in ’ヒhe
process of s七ruc・ヒure generation are no’ヒ generated
for each case, but ins’ヒead are.:re●ヒrieved from the
g:raph da剛ヒabase.
一 2 一一
REPRESENTAT1ON OF CHEb41CAL STRUCTURES
Chemical structures are represented by means
of graphs, and atoms and bonds of’a chemical
structure correspond to vertices and edges of a
graph respectively. However, it is inefficient to
process chemical struc’ヒures at the atomic leveユ,
and sometimes a superatom/ring assembZy is used
as the processing unit. Xn ASASMAS r the concept
of ’ヒhe superat:om is ex’ヒend,ed ’ヒ。 rep:resent the
chemicaZ structures in a way that is consistent
with BCT.・ The following is a brief description of
12BCT rep:resentation of chemical s・ヒructures.
Let u∈V be a ver七ex of a connected graph
G=( V,E ). A vertex u is a ”cutpoint” if the
rernoval of u yields the disconnectivi’ヒy of the
graph G. A ”block” of a graph G is a maxirnai sub一一
graph of G which contains no cutpoints. Nowr
bc(G) ( biock-cutpoint graph of G ) is defined:
T=( W,F ) is a bc(G) if (1) W=AuB is a set of
vertices where A=’ o ai, ... ,an } is a set of
all cutpoints of G, and B =’ o bl, e” ,bm } iS a
sev of all biocks of G, and (2) F i{ flr ’“’ ,fz }
i{( ai,bj .)1 ai E b」r ai E Ar bj E B }, where
ai Ebゴmeans that a cutp・int ai is a member。f a
一 3 一
set of vertices of block bj e A bc(G) has the
following properties: 〈i) it is a bipartite graph
of subsets A and B, (2) it is a tree regardless of
the original structurer 〈3) terminal vertices
correspond to biocks of G, and (4) the distance
between any pair of 『ヒe:rminal vert二ces is an even
number. A bc(G) is called a block一一cutpoint tree
( BCT ) because of property (2). A tree is a BCT
if and onlY if it possesses p:roper’ヒy (4), and this
property is used for generating BCT. Examples of
the BCT representa’ヒion of chernical s’ヒ:rucut:re・s are
shown in Figu:re l. The in’ヒernal structure of each
block is fUed in the block dictionary.
一 4 一
GRAPH DATABASE
The graph generation p:rocedu:re which wouユd
usually be performed in the process of struc’ヒure
genera’ヒion is :replaced by the ret:rieval of object
graphs from the graph database. As it is impossi-
ble to prepare aU of the graphs, a graph which
canno’ヒ be found in ’ヒhe graph database is generated
when it is first required and this generated graph
is also added ’ヒ。 the database. There are five ma-
jor fUes in ASAS!YLAS: (i) a compound file, (2) a
table’ 盾?@correspondence between substructures and
mass spectral components ( CSSC ), (3) a bloek
file, (4) a BCT fiie, and (5) a spectrai data file.
Acompound file and a CSSC con’ヒain’ヒhe chemical
st:ructures or substruc’ヒures in the form of BCT
representation r and are Used for inferring sub-
s七ruc’ヒures in ’ヒhe process of s’ヒruc’ヒure analysis.
The i:n七ernal structures of block in tha’ヒ BCT rep-
resentation are stored in a block fiie. A BCT
fUe is a set of BCTs which are organized accord-
ing to parameters ( mrn )r where rn is the number
of blocks and n is the number of cutpoints of a
BCT r and is used when needed du:ring ’ヒhe process of
structure generation. 1Vhe records of a BCT fUe
一 5 一
are trees which possess the BCT’s property (4)
described previous sec.tion and furthermore satisfy
・ヒhe condition;deg u≦4, where u is a cutpoint of
a BCT t because the degree of u cannot exceed the
valence of u which is considered to be an atorp
such as carbQn, oxygen, or nit:rogen・ The ou’ヒユine
of ’ヒhe system ASASMAS (especially the rela’ヒion
between programs and these five data files ) is
shown in Figure 2.
一 6 一
Dv!ETHOD OF STRUCTIL]RE
9ty!12i1ne一..gS.s1i11Lg1iig!z2一一gguergS!9一.utline of structure t
GENERAT 1 ON
Prograrn
GENERAIr工ON is initia’ヒed when a se’ヒ of subs’ヒruc一・
tures bl, 一e e. tbm is given as input data, where
blr ’” , bm are simple blocks that have been
i:nferred by program ANALYS工S. GENERAT工ON genera’ヒes
all the possibZe eombinations of blocks bl, e’e r
bm with neither onission nor duplication. The
outline of the procedure.is: .(i) all the BC[Vs of
mblocks and n cutpoin’ヒs a:re gene:radヒed, whe:re
lsn一くm-17 (2) blocks blr ... rbm are assigned
to block vertices of a generated BCT ( labeling );
(3) cutpoints are assigned to vertices in each
block of ’ヒhe labeled BCT: and (4) obゴect s’ヒ:ruc一一
’ヒure『 are obtained by connec’ヒing blocks according
to the cutpoint assignment.
Generation of ( rn,n )BCT.. A BC7’ of m blocks
and n cutpoints i.s denoted by ( mrn’ )BCT. The
first. step of・ structure generation which generates
( m,n )BCT is for the most part accoMplished by
retrieving them from the BCT f“e. This reduces
the pxocessing time of structure generation.. When.
m blocks are given, the number of cutpoints n var-
ies from l ’ヒ。 m-1, i.e., 1≦n≦m’一1。 The case of
一一 7 一一
n=l corresponds to a ’1 star te whose center is a
cutp。int’and n=m闘l is the case that the der
gree of every cutpoint is equal to two・.
The procedu:re of (m,n )BCT generation is to
collect・ヒhe trees that ha▽e m block vertices and
n cutpoint vertices. Let :N-tree denote the tree
コ ロ ロ
wユth N vert■ces, where N=・m+n. An N ・一’ヒree ユs
genera’ヒed from an (N-1)一tree by adding one ▽ertex.
N。w’let T={tl’t2’…’tZ}beaset。fall
(N-1)一trees’where ti represents an(N-1)一tree’
and le七 T. be a set. of trees which are generated =L
fr・m ti・T by adding。ne vertex’T’ = Tl u [D2 u
・∴uTZ is a set。f N-trees・T#eref。re the
generati。P pr。cedure。f N-trees is divided int。
tw。 parts・the generati。n。f Ti(i=1’…rZ)
and the genera’ヒion of the union T 1..
(1)Generati。n。f Ti・Let ti =(V’E)’ti・T・
:Fi:rst a set of▽ertices V is divided in七〇〇rbits
of per:muta’ヒion g:roup on V. The element of this
permutati・n gr。up is aut。m。rphism。f ti’that is
to sayt the perrnutation on a vertex se’ヒ▽which
preserves the adjacency relationship among ver一
コt:Lces:
一一 8 一
V=VI U’V2 U’@’” U Vr, V垂獅u早=早@for any p¢q.
The vertices in a subset▽j(ゴ=・’∴・rr)are
topologically equivalent. Secondv an’ m-tree is.
generated from ti by adding one vertex to the
arbitrary ve「tex in Vゴ(」 :1’… C’「)・The「e-
fore the number of generated N一一trees is the number
of oxbits, iee., iTi l=re
(2) Generation Qf T’. T’=Tl u T2 U “’“ U 1Vz
gives七he t。tal N騨t「ees’but Ti∩Tゴφd。se n。t
always hold。 The duplications ( i.e., ’ヒhe e1-
ements of T. n T. ) should be eliminated. This ユ コ
problem is so1▽ed by introducing linear order in’ヒo
the set Of N-t:rees; an :N-tree is coded in.ヒ。 numer一一
ical sequence by Edmonds’ rnethod,13 and is stored
in the appropriate vertex of binary tree which
should represent the set of N一一trees.14 Duplica-
tion is detected when a generated N-tree is found
in七he ve:rt:ex in which i七 should be s’ヒ。:red. The
set of N-trees is generated in ’ヒhis :manne:r, but it
consists of BCTs and non一一BCTs. An N-tree is a BCT
if and only if it satisfies the BCT’s property (4)
described in sec’ヒion 2. An :N-BCT is a bipar’ヒite
graph w:hose vertex se’ヒ consists of block and
一 9 一
cutpoint二ver’ヒices, and are classified according to
the parameter ( mfn )r where m and n are the num-
bers of block ver’ヒices and cu’ヒpoint ve:rtices re-
spectively. The BCTs which are stored in the BCCr
file are restricted to those in which the degrees
of all cutpoint vertices do not exceed four r
because the ceiling of the free vaience may be set
to four for most organic compounds. These are
calユed ▽aユid BCTs. Table 工 shows the number of
BCTs in each class corresponding to ( mrn ). The
valid ( 6,n )BCTs ( n=2, e.. ,5 ) are shown in
:Figure 3. The to.ヒal number of valid (.6,n )BCTs
is 20.
Labeling vertices of ( mrn )BCT. A BCT is
a bipa:tite graph whose vertex set consists of
b10Ck and CUqヒpOin’ヒ Ve:r’ヒiCeS, and the セe:rminal
ver’ヒices are all block ve:rtices. Therefore, it is
easy to distinguish block vertices from cutpoint
vertices. The second step of the structure genp.
eration is to assign the elemehts bl, ’” ,bm
of the given block Set to the block vertices of the
( mrn )BCT. This procedure is called labeling.
Now let Cl, ’” rCk denote the colors of
blocks, and let n, be the number of blocks whose ユ
一 io 一一
coior is Ci ( ’堰≠P,... , k ), where it is assumed
thaV nl S n2 S’”・Snk without ioss of generality.
Labeユ.ing is in block number orde:r; the first is C lr
the second C and so on. 2r
(1) if nl=n2=’”=nk-i=i, Ci (i=lr...r-
k-1 ) is assigned to ’ヒhe :represen’ヒative block
verteX of a class partitioned on the basis of.
’ヒopochromatic equivale:ncy. .If block ▽ertices of
an ( mrn )BCT are partitioned into Z classes
gl,’” r gz, there are Z ways of assignments of ci.
When’ bi is assigned to the representative of a
class, b梶@r the foUowing condition should be sat-
isfied between ’ヒhe degree of ・ヒhe representa・ヒive in
the ( m,n )BCT ( deg bj(Ci) ) and the number of
the constituent vertices of the block Ci ( I Ci l・ ):
deg bj(ci) s ICi l (A)
If there are no classes which satisfy the condi一一
tion (A), it means that the assigrment of
Clr’”,Ci一.1 Which resulted in the assignment of
the Ci should be forbidden. After the assignment
Of Ci,’”rCk’D.P is finishedr Ck is assigned to
’ヒhe remaining block vertices、under condition (A).
一 il 一
This labeling Procedu:re can be :represented in the
f。rm。f a tree’and an assignrnent・f C1’●●●’Ck
corresponds ’ヒ。 a path in ヒhe tree which connects
’ヒhe roo’ヒ with a 七er:minal ve:rtex. Therefo:re the
number of labeling methods is equivalent to the
number of terminal ▽er’ヒices of dep’ヒh (m一ユ ) i:n
・ヒhe ・ヒree, and there a:re no duplica’ヒes. This セree
is calユed a genera’ヒion ’ヒ:ree・
(2) When N.≧2 for some ゴ<k, the ve:r‘ヒices ⊃
j-1 j ・一l
whose dep七h’s fe.m p三、np t。(nゴ’+pl、np)
f。rゴ≧2’。rfr。mOt・(nゴ1)f。「ゴ= lra「e
the same coユor in the genera’ヒion tree, so if ’ヒhe
procedure described abo▽e is applied, there will
be duplica’ヒion. To avoid the duplication r color
Cゴis assigned nj times t。 bl。ck ve「tices:Fi「st’
the block ver’ヒices to which colors are no’ヒ ye’ヒ
assigned are pa:rtitioned according to topochromat-
ic equi▽a:Lency. This partition is not limited to
・ヒhe simple elassifica・ヒion based on orbits, but is
further subdivided。 Those classes are organi2:ed
hierarchically. The subdi▽ision is pe:rfor:med
ロ ロaccording to ’ヒhe deg』ree of consangu■n■ty a:mong
vertices. If the dis’ヒance between vertices u and
一 ユ2 一
v of the same orbit in a rooted ’ヒree is k, ・ヒhe
degree of consanguinity between u and v is k.
This is denoted by san( u,v )=k. The relation
introduced into vertices of the sa.me orbit as the
degree of consanguinity within k ( san( u,v )sk )
is an equivalence relation ( denoted by ”:” ).
The following apply if u, v a:nd w are ver’ヒices
which be工ong to ’ヒhe same orbit: :L. a reflexive
law: u 1 u ( san( u,u )=O ). 2. a syrnmetric law:
if u E vr then v i u. 3. a transitive law: if
u≡▽ and v…wr’ヒhen u…w. (The transit=i▽e law
is shown as follows: assuming that u:wr and ×
is the closest co㎜on ancestor of u and w,七here
should exis七 two paths which connec’ヒvwi●ヒh y or z
(xx) which are on the path between u and w. Then
’ヒhere.exis’ヒs a cycle which contain verヒices x, y
and z, and’ヒhis contradicts that x, y and z a:re
’ヒhe ver.ヒices、of a tree. ) If partitions of
san(u,v)skl and of san(urv)一くk2 are ob一一
tained f。r kl and k2 when klくk2’the f。rmer must
be a subdi▽ision of the la’ヒter. The:refo:re 七he
hierarchical partitioning structure is obtained
based on the relation of t!ne degree of consanguin-
ity ( see Figure 4 ).
一 13 一
This par・ヒi七ioning s’ヒructure can be :represent-
ed in ’ヒhe fo:rm of a tree (called a part:itioni:ng
tree, as in Figure 4 ). Then ’ヒhe :Labeling problem
at this step is equivalent to enumerating all the
ways。f assignment。f the same things(c。1。「Cゴ)
・ヒ。 ’ヒhe rooms (vertices in a leve1 ) stepwise
in order of dep’ヒh. The capacity of each.roo:m
( i.e., the rnaximum number of colors to be adlni’ピー
’ヒed ) is de’ヒermined・for every depth. That is to
say r the capacity of a room is equivalent to the
suTin of the nurnbers o£ members of classes corres-
ponding ‘ヒ。 ’ヒhe descendant lea▽es of the ver’ヒex
( a roorn ).
Afte「c。1。「Cゴis assigned t。 the leaves。f
a partitioning tree, specific block vertices to
which color C」 is to be assigned are seiected
arbitrarily from leaf classes by the .number of
assigned Inembers. In the genera’ヒion t:ree t label-
inq paths a「e g「。wn by c。nnec七ing Nゴmembe「s se.
lected seriaily with the paxent. This probZem is
equivalent to the problem of partition of an in-
teger under some restriction.
The example of iabeling one of the ( 6 r 4 )BCT
is as follows: A ( 6,4 )BCT and block colors Cl,
一 14 一
C2 and C3 which sh。uld be assigned. t。 it are sh・wn
in F4gure 5(a)・ Block vertices of the ( 6,4 )BCT
a「enumbe「edfr。rP l t。6・As nゴ1’n2’=2
and n3=3(i・e・’n1<n2<n3)’Cl’C2 and C3 are
assigned in this order. And
#n。de(C・)== 3’#node(C2)=6’#n。de(93)=2
and
degl=deg2=deg4 t deg5=deg6 = 1;deg3=4
so c。lo「s Cl and C3 cann。一t be assigned t。 vertex 3
(condi’ヒion (A) does no’ヒhold ).
(1)Assignment。f Cl・Bユ。ck vertices are parti-
ti。ned int。 three・。rbitS’gl i{3}’92 =’{1’2}
and 93={4・5’6}’but gl is rejected by・c・ndi-
ti。n(A)・τheref。re there、 are tw。・kinds・f assign-
ment。f Cl ・’92 and g3・It is・arbitrary t。 select
コ ロ
representative vertユces from each class, and the
selecting vertex l fr・m 92 and vertex 4 fr。m g3
c。「「esDP。nds to the r。。ts。f generati・n trees Tl
and T2 sh。wn in Figure 5(b)・
(2)Assignment。f C2・In case。f generati。n tree
Tlr the remaining vertices are partiti。ned int。
three・rbits 91 =’ o2}’92 ={3}and
-15 一
g3={ 4,5,6 }・ Since further subdivision based on
the degree of consanguinity dose not exist for
these classesr n2 (= two) of the color C2 are
assigned to gl, g2 and g3. There are four sets of
assignments, { glrg2 },{ gl,g3 }t{ g2,g3 } and
{ g3 }. The number of members assigned to each
class ( each element of a se’ヒ calculated above )
are { 1,1 } r { 1,1 } ,{ 1,1 } and { 2 }.
And finallyr selection of specific members
from each class is arbitrary, so we get { 2 r 3 } ,
{ 2,4 },{ 3,4 } and { 4,5 } as examples.
Procedures for generation tree T2 are simi1ar
and straightforward.
(3) Assignrnent of C3. This is the final color to
be assigned, and the only w.ay is to assign C3 to
ail the rernaining vertices. However r it should be
noted that some paths are blocked by condition (A)
upon assigning C3; color Cl and C3 cannot be as-
sig:ned ’ヒ。 ver’ヒex 3, and ’ヒhis is shown by X in
Figure 5(d). Final generation ’ヒ:rees are ob七ained
as in Figure 5(d) when C3’s are assigned.
Specific labelings corresponding to paths
connecting a root with leaves of depth 5 are shown
in Figure 5(e)
一 16 一
一 工tis not clearhow those labeled biock vertices are connected at
the atomic level at this stage. The detailed
structures whSch specify the connectivity among
blocks at ’ヒhe a’ヒomic level are generated by alloc-
ating cu・ヒpoints in each block of ’ヒhe labeled BCT.
:For examp二Le, in the ( 6,4 )BCT in Figure 5(a),
cutpoints al from biock 1 and 2, cutpoints al, a2,
a3 and a4 frorn block 3, cutpoint a2 frorn b1ock 4,
cutp。int a3 fr。m bユ。ck 5’and cutp。int a4 fr。m
block 6 are selected and the connectivity among
blocks at the atomic level is determined.
The procedure of selecting cutpoints
al, ’” ,ap from a block bi is ’≠刀@follows: let
j= 1.
(1) The constituent vertices of bi are partitioned
into orbits: gi, ’” , gq’
(2) An arbitrary vertex of gk (k=1, e e. , q ) is
selected as a candida七e fo「aゴ・The「e a「e q kinds
of selections of aj for a given selection of
alr e1’@la梶|le
(3) 1f j=pr then end, Otherwise, go to (4).
(4) The selected vertex is regarded as a verヒex
which is given a new coior a」; go to (i) with
一 17 一
ゴ←」+1・
This procedure is als・o implemented i:n the
fo:rm of genera’ヒion tree described in the p:revious
Xab61ing procedu:re fo:【’BCT。
The example of cutpoint assignment for a ben-
zene ring is shown in Figure 6。
コ ロ
Connecting blocks accord:Lng to cutpo:Ln’ヒ as.
コ ロsignment. Using the me’ヒhod of cutpoint ass■gn一
:ment for each block ve:r’ヒex of a labeied BCT,
structures are genera・ヒed by connecting blocks with
othe:r blocks in turn, fusing the cutpoints which
should be ’ヒhe same atom in ’ヒhe gene:rated s’ヒruc・・一・
tures.
工f there are no symme’ヒries in a labeled BCT
ヨ
( i.e., if ’ヒhere are no block vertices which are
equivalent topochromatically ), al]_the cOmbina-
tions of cutpoin・ヒ assignmen・ヒs for blOck vertices
gi▽e differen’ヒ assemb二Led structures, i.e., there
are no duplicates among them. The number of
ロ
。。mbinati。ns・f cutp。int assignment is ll Zi’ i=1
where there are Z. kinds of cutpoint assignment ユ.
in bユ。ck bi・H・weVer’if there is a sy㎜etry in
a labeled BCT, ’ヒhere are duplicates in
一 18 一
ロ
IT Zi structures・T・a▽・id these duplicates’the i=1
p:roblem of connecting the bユocks is conside:red in
the same way as the proble:m of coloring blook
vertices. This is equivalent・to selecting a kind
of cutpoin’ヒ assignment for each block vertex,
where the kinds.of colors are equivalent to the
kinds of cutpoin’ヒ assignme:nt inherent ●ヒ。 the
blocks. The:refore, ’t二his problern is solved sirn-
ilarly ’ヒ。 the problem of BCT labeling. The struc-
tures gene:rated by connec’ヒing blocks of Figure 5(e)ノ
are shown in :Figure 7. Six structures shown in
the lowes’ヒ part of tt j gure 7 correspond 七〇 the
first and last three assignments for six-ring (C2)
shown above theln. There is only one labeled BCT
which could be cormect:ed at ’ヒhe a’ヒomic level.
The other th:ree BCTs are excluded by the condition
thatl 狽??@degree。f a cutp。int must n。t be greater
than four.
Partitioning Ve:rtices in’ヒ。 o:rbits。 When
block ▽er七:ices of ( m,n )BCT’are labeled, or cu・ヒー
points are selec’ヒed from cons’ヒituent vertices of
a block, it is necessary to partition these ver-
tices into orbits according to topochromatic
一 19 一
equivalency. Vertices u and v of a labeled
graph ,, G are equivalent toporochromatieally if
f(u)=v fdr some au・ヒomorphism f of G, where an
au・tomorphism of a graph G is a permu’ヒa’ヒion on a
set of vertices of G which prese:rves the adゴa-
cency among vertices. A set of aZl automorphisms
of G forms a group. The orbits of this group
are the classes partitioned according to topo-
chromatic equivalency.
Therefore, if all the autornorphisrns are
found, it is easy to par’ヒi’ヒion vertices. In prac-
tice, it is easy to find all the automorph±srns for
a g:raph ’ヒha’ヒ is not highly symmetrical by using
backtrack search. The procedure of finding auto-
morphisms is as follows:
:Le・ヒ V={1,2』, … ,n} be a se・ヒof ver・ヒices
of G. The procedure checks whether the adjacency
relationship among vertices is preserved for a
permutation7
f=〈1’i i’2 111 2’n>
Let f(k)=( il, ’” , ik ) denot.e the left por一
・ヒion of a permu・ヒatiOn f correspoDding ’ヒ。 k
一 20 一
ロ ロ
vert■ces, =L。e.,
k+1 … n
f= f(:k) コ
■k÷1 ’●● ㌔
Assulning that f(k) is given, f(k+1) is de『ヒer-
mined as f。ユ1。ws・(1)ik+1・V 一’{il’…’ik}・
(2)B。th(k+1)and ik+1 bel。ng t。 the same
orbit. (.3) The、 adjacency relationship formed by
▽er’ヒices ’{ 1, … ,k,k+1 } shOuld be formed by
{i、’∵・’ik’ik+、}’whete it is guaranteed that
the adゴacency rela・ヒidn白hip formed by ’{ 1, … ,
k}is f。rmed bジ{il’…’ik}’s・it。nly has
’ヒ。 be guaran’ヒeed that the adゴacency be’ヒween、 i :k+l
and{il’…’ik}is the same as the。neわe-
tween ( k+1 ) and ’{ 1, 。・・ ,k }. Au・ヒomorphisms
are obtained when the procedure reaches f(n).
If the:re are no ▽er ti ce$ that satisfy the condi一
・ヒions (1), (2) and (3), then ’ヒhere are no auto-
morphisms which contain f(k) as a part of a
permutation(see Figure 8).
It is not required necessariユy to find alユ
’ヒhe au’ヒomorphisms fo:r partitioning vertices into
orbits, because verdヒices can be classified into
some classes ( these a:re integ:rated ・ヒ。 some orbits )
一 21 一
when an autpmorphism is found, and the autornpr-
phism that should be found next is conditioned
’ヒhat it has to con’ヒain a’ヒ least one cycle that
maps a vertex ( member ) in some ciass to another
cZass’s member. Tf there is no automorphism that
satisfies ’ヒhe condition, ’ヒhe :resulted classes give
the orbits. The program ’ヒha’ヒ is i:mplemented ac-
co:rding to this algori’ヒhm is also prepared, and is
useful for getting orbits for highly symmetrical
graphs.
partitioning vertices o£ a tree into. p.rb-i.一t.一s.!
The pr.ograms that give partitioning. of vertices
are prepared for both highly and not-so-highly.
sy皿metrical graphs by finding au’ヒomorphisms de-
fined on a set of vertices. On the other handr
thd particular partitioning progedure that dose
not use automorphisms explici’ヒely is prepared for
trees.
This parti’ヒioninq is accomplished by using
Edmonds’ canonical form of a tree. Assuming that
there is one ”center” in a given tree. Then the
onユy vertex ’ヒhat is topologically equiva].ent to
the center is itself. Regarding this tree as a
rooted tree whose root is ’ヒhe center vertex r
一 22 一
vertices u and v are equivalent if and only if
they satisfy the following three condi’ヒions:
(i) Vertices u and v are the same distance frorn
・ヒhe center.
(2) The parents of u and v are the sarRe vertex r or
an equivalent one.
(3) The subtrees whose roots are u or v are iso一一
morphic, where a subtree is defined as a subgraph
七hat consis’しs of a ver’ヒex ( u or・▽ ) and its all
the descendants in the given rooted tree. Since
these conditions are checked inmediately in terms
of Ed:monds, canonical form of a t:ree, par’ヒitioning
vertices is given by investigating vertices for
e▽ery dis’ヒance f:rom the :root to leaves of a maxi-
rltal distance. The classes partitioned in this
fashion agree wi’ヒh ’ヒhe orbits of .ヒhe automorphism
group described previously‘
When there are two centers in a given tree,
it can be coped with by inserting a dummy vertex
between ’the two vertices, converting the case of
one’ モ?獅狽?秩@( see Figure 9 ).
一 23 ・一
CONCLUSION
Ame’ヒhod to genera’ヒe chemical s’ヒ:ructu:res is
p:resen’ヒedr and it is based on BCT represen’ヒa’ヒニLon
of chemical s・ヒructu:resr on labeling nodes of bi・一#
partite graphs ( BCT ) corresponding to blocks and
on allocating cutpoints in the blocks.
The programs were im,plemented in FORTRAN on a
ACOS一一900 cornputer, and the spectraZ data were
edi’ヒed from EPA/N工H Mass Spec’ヒral Database ( 1975
edition, abou’ヒ l1300 records )。 BCT file is or-
ganized according to parame’ヒe:r (m,n ) as shown in
TabZe 1. The execution speed of structure genera-
tion depends on the times of orbi’ヒ compu七a’ヒion in
the generation s’ヒages such as BCT labelingr cu’ヒ・一
point assignmeht and then connecting blocks.
Apparen’ヒly o:rbits should be computed fo:ビ each
node of a generation ’ヒree except lea▽es, so ’ヒhe
numbe:r of orbi’ヒ computa’ヒion equals 七〇 ’ヒhe 孕umber
of nodes ( not leaves ) of generation trees.
The execution time of orbi’ヒ computa七ion depends on
the nuruber of vertices and symmetric degree o£ the
graph, so we prepared three types of programs for
orbit computation. This rnethod of structure gen-
eration is useful for autornatic structure elucida一一
一 24 一
’ヒユon, and consti’ヒutes a part of ASASbtEAS。
一 25 一
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(1 O)
(u)
REFERENCES
NelsonrD.B.; MunkrM.E.; GashrK.B.;
Hexald,D.L.Jr. 」.Org.Chem. 1969, 34, 3800.
AntoszrF.J.; NelsontD.B.; Herald.D.L.Jr.;
Munk,M.E. J.A:n.Chem.Soc. i970r 92, 4933.
ShellyrC.A.7 MunkrT“1.E. Anal.Chim.Acta.
1978, 103r 245・
Shelly,C.A.7 Munk,M.E. 」。Chem.工nf。Comput.
Sci. 1979, 19, 247.
Kudo,Y.; Sasaki,S. J.Chem.Doc. 1974, 14r
200.
KudorY.; Sasaki,S. 」.Chern.Znf.Comput.Sci.
1976r 16r 43・
Masinter,L.M.; Sridharan,N.S.; Lederberg,」.;
SMithrDeH・ J・AM・CheM・SOC・ 1974r 96, 7702・
rylasinter,L.M.; Sridharan,N.S.7 CarhartrR.E.;
Smith,D.H. J.Arn.Chern.Soc. 1974, 96r 7714.
Carhart,R.E.; SmithrD.H.; Brown rH.;
DjerassirC. 」.Am.Chern.Soc. 1975, 97, 5755.
Carhart,R.E.; SmithrD.H.; Brown,H.;
Sridharan,N.S. J.Chem.rnf.Cornput.Sci.
:L975, ユ.5r l24.
Nakayama r T.7 Fuj iwara,Y. 」.Chem.1nf.Comput.
SCi’. ’1980r 20r 23・
一一 26 一
t
(1 2)
(Z 3)
(1 4)
Harary r F・ ”Graph Theory”; Addison-Wesley:
Reading, Massachusetts, i969.
Busacker,R.G.; Saaty,T.L. tv Fini’ヒe G:raphs and
:Netwo:rks: An 工nt:roduction wi・ヒh ApPlications ev,
McGraw-Hill, 1965; Chapter 6.
Aho,A.V.7 Hopcroft r J.E.; Ullman r J.D. ”The
Design and Analysis of Computer趾gorit㎞”,
Addison-Wesiey: Reading, b4assachusetts,
1974e
一 27 一
Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.
Figure 6.
F工GURE LEGENDS
BCT representation of chemical s’ヒruc-
tures。
Block diagra:m of the au’ヒomatic struc-
ture analysis system of mass spectra
(ASASMAS )7 GENERAMON is the program
set for s’ヒ:ructu:re generatユon・
‘▽alid (m,:n)BCTs for m=6.
Par’ヒitioning of ▽ertices of a t:ree
コ ロincluding『一ヒhe degree of consangu■n:L’ヒy.
(a)As:kele●ヒon of a(6,4)BCT and
blocks.
(b)Assignment。f Cl t・(6’4)BCT and
roots of corresponding generation
trees Tl and T2・
(c) Grow七h of generation ’ヒrees ’ヒhrough
assignment of C2・
(d)C。mpユeti。n。f generati。n tree Tl
a:ndT2・
(e).F・ur kinds。f labeling c。rresp。nd鱒
ing 七〇 four pa’ヒhs of genera’ヒion
’ヒrees.
(a) Parti・ヒioning of vertices・
(b)Assignment。f Cl and f。U。wing ’
一 28 一
ロ:Fユqure
ロ:F:Lqure
7.
8.
Figure 9.
Tabユe 工.
par七i・ti on,in9.
(c)Assigrment・f C2 and f・ll。wing
の ロ ロ partユtユon■ng・
(d)Assignment。f C3・
Connecting blocks at the atomic le▽el e’
Examp:Le of ▽er’ヒex par’ヒitioning. There
are 32 au’ヒomo:rphisms on the set of
verdヒices of ’ヒhis graph.
Example of tree ve:rtex par’ヒitioning.
The:re are 9! au’tomorphisms on the se’ヒ
of ▽er七ices of this tree. In prac’ヒice,
it is irnpossible and unnecessary to get
all of them for ’ヒhe pu:rpose of par’ヒi一
コ ロ
のtユon:Lng vert■ces.
N umber of BCTs classified according to
parameter(ln,:n).
一29 一
一・・一j〉
一一”堰
.
oe一.p一一〇一一e一一〇一“H)
Figure i
GENERATION
ANALYSIS
structurat data
compound data
B C T
btock dictionary
しEARNING
c s s c
spectral data
Figure 2
(m,n)=(6,2)
(m,n)=(6,3)
(m,n)=(6,4)
(m,n)=(6,5) o{ro-eK>・e>一〇一e-o一)>o
Figure 3
1 2 4 5 8 10111 3141 161718
sah《。,.)≦4 sah(…)≦2 san(u,v) S 6
一一一一一一一一一一一一 san(v,v)〈一6
{1,2.’r’ .18 }
一一一一一 san(u.v)〈一4
〈1 .’”・6}・ {7・ ’” s12}’ e {13ih・18}
一一一一 san(u.v) (一2
{i.2.3}.{4.5.6},…,{16.i7,18}
tree representation ot partitioning structure
Figure 4
1
2
4
3fila2 E
al Y a3 a4
6
cl CuP
c2く!)P
C3 O一一一〇
ni = 1
n2= 2
n3= 3
Figure 5(a)
1
2
C13
T1
4
6
51
2
3
e) 4
T2
6
5
(i) ”””’Cl一・“一一一@
Figure 5(b)
4
5 3
T1
1
2
””M“””””@Cl e一一””””..”“”
3) 一一… C2 一一一一一一一 Cl
‘c,) (4) 一一一一 C2 一一一 (2) (3
T2
4
5
3
5
5
6
Figure 5(c)
Tl T2
1 一一e一一一一一一一一一一一一一 Cl 一“一e一一ee一一e-ee一一一一 4
4 2 3 “ee-eC2 一一一一一一一一 1 3 5
5 3 4 4 魯●●●りC2 ’・一. 2 3 5 5 6
x4
x2 e一一“e
c3 一一・””K.
2x
1x
5 5 一一ee一C3 ・一一一一一・一 5 2
’
6 6 一一e一一C3 ”’‘一・一”.. 6 6
Figure 5(d)
C3
Cl Cl
C2 C3 c C3
C3
C3
Cl Cl
C2
C2 c
C3 C3
C3
Figure 5(e)
1
1〔つ;
1
Cl
l(e)1
3
Figure 6(a) Figure 6(b)
Cl
l〔ゴ
3
Cl
l(慧 Cl
l(g)1
C2
Figure 6(c)
Cl
C2C3
Cl
C3
C2
Cl
ce3 2
Cl
Cl
C3
Cl
“,2
C3
Cl Cl
Cl
C2
Cl
Q,3
C2
Figure 6(d)
病
1
2
3
4
一一 v
cC’o・・一一一一・
…αく
一一bh一
口〇一
C3
c
Cl
C3
c
c
2. .41<=>
3
163 3. .21く=S・
4i 一2
1《隠
3. .4
iO2
2i 21b“ 3’ 3 2A 2i〈 :>3
4 4
3. L2 3
く}1、
く:S・ i
3 4
,b2 ie 4 4
Q, ,
,b,
督 3
〈=〉・
2. 一4 2
3
之8“ 4. .2
103 1
4 2
1タ>1
・4 一3・
1<=>21
1( >34
2. i3
04く季・
2
“3 4
く⇒・
2
1擬
づ・
2
1費1つ・
Figure 7
4
4
3
3
3
3
2
2
1 5
6
6
7
7
7
7
8
8
Figure 8
a112
6
133 2
11
5G
3
2
2
7Q
1
1
4
3
2
Y8
3
103
al 9
nodecode
ロ ヨ
12567389104111213
.13 411141114111 」一■一一圏一」L-r■一一」一
SI S2 S3
S1竃S2翼S3 罵〉‘2翼3耀4
sim鷲arly
5嵐6■7口8■9臓10ロ11●12●13
Figure 9
n霜 ユ 2 3 4 5 6 7
2 ユ
3 1 ユ
4 ユ 1 2
5 1 2 3 3
6 ユ 2 6 7 6
7 ユ 3 9 17 ユ8 ユユ
8 1 3 ユ3 30 51 44 23
9 ユ 4 ユ7 53 109 ユ48
ユ0 1、 4 23 79 2ユ3
11 1 5 28 ユユ9
ユ2 1 5 35
ユ3 1 6
ユ4 1
Tabie 1. Niumber of BCTs classified according
to paraneter ( m t n ).
INSTITUTE OF INFORMATION SCIENCES AND ELECTRONICS UNIVERSITY OF TSUKUBA
SAKURA-nURA, 麗II}一{ARI・一GUN, 18ARA}く1 305 JAPAN
REP(⊃R丁D(X:二Ufl ENTAT I ON PAG EREPOR[if NU蟹BER
工SE-TR-81-21
TZTLE Structure Generation on Chemical Structures
the Basis of BCT Representation of
AUTHOR (s)
Yuzuru Fuコ■wara
Takashi Nakayama
[工nstitute of 工nformation SciencestUniversity of Tsukuba]
[Central Research Laboratory,Kuraray Co. Ltd.]
REPQRユl DA〔でE
July 22, 1981
MA工N CATEGORY
3.7 lnformation Retrieval
NTUtYIBER eF
37
PAGES
CR CATEGORIES
3.79
KEY WORD S
Graph database/graph generation/BCT graph/ASASMAS/learning system
ABS[rRACT
A method of structure generation based on BCT (block cutpoint tree)
representation of chemical strucVure .has been developed. The generation
program is a part of the automatic structure analysis system of mass
spectra (ASASMAS), and is used when a set of the inferred substructures
are given as imput data. The input substructures are represented by
means of BCT.
SUPPLEMENTARY NOtll]ES