study of graph patterns classification using …42 study of graph patterns classification using...

7
42 Study of graph patterns classification using maximum contrast methods Study of graph patterns classification using maximum contrast methods Naohiko Kinoshita,Toru Takiguchi ,Kousuke Takano,Asuka Namiduka Major in (Field of) Health Informatics and Business Administration, Niigata University of Health and Welfare (NUHW) Introduction Maximum contrast methods (MCMs) are one of the comparison methods used for trend analysis of dose response relationships (DRRs) in clinical stud- ies, toxicological tests, and so on. In terms of multi- ple comparisons, Dunnett's test 1) and William test 2) have been used in the past. Nishiyama et al 3) . proposed a composite MCM as an analysis of dose response data in in-vitro toxicity tests. Fur- thermore, Hamada et al 4) . showed that MCMs had the best detection performance in the evaluation of statistical analyses of outliers and dose-response 著者連絡先950-3198 新潟県新潟市北区島見町 1398 番地 新潟医療福祉⼤学 木下直彦 TEL:025-257-4490 FAX:025-257-4490 E-mail:[email protected] 受付⽇:2018 11 15 受理⽇:2018 12 14 Abstract: Maximum contrast methods (MCMs) are developed by assuming various types of contrast vec- tors in the dose response relationship (DRR). Then, the maximum goodness of fitof the DRR pattern is identified using contrast statistics. MCM is generally used for data analyses in toxicolog- ical tests and is well-known that the method have the for its high detection power (1-β) for differ- ent types of DRRs. In the study on DRRs, researchers set various types of contrast vectors in advance and identified both dose and changing time for the effect (toxicity). However, the pattern consists of only an increase or a stagnation (non-reaction), there is no analysis method that consid- ers a decreasing pattern in openly-available programs. In this study, one-way ANOVA data with a concave or convex type pattern (namely the trend) shows an increase or stagnation tendency, and a decreasing trend is also assumed. The focus herein is to classify a typical graph pattern with an integer ratio by setting both the number of groups and the increase-decrease displacement as a parameter. The classification of graph patterns as concave or convex by generating contrast vec- tors is considered. The aim of the study is to create a mathematical approach and program mod- ules using contrast vectors corresponding to graph patterns categorized by the above-mentioned method. As a result, an improved binary-direction MCM (bd-MCM) prototype was developed. Bd- MCM presented herein can consider an increasing or decreasing tendency as well as a decreasing trend. Key words : maximum contrast methods, comparison methods, classification

Upload: others

Post on 04-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Study of graph patterns classification using …42 Study of graph patterns classification using maximum contrast methods Study of graph patterns classification using maximum contrast

42

Study of graph patterns classification using maximum contrast methods

Study of graph patterns classification using maximumcontrast methods

Naohiko Kinoshita,Toru Takiguchi,Kousuke Takano,Asuka NamidukaMajor in (Field of) Health Informatics and Business Administration, Niigata University of Health and Welfare (NUHW)

IntroductionMaximum contrast methods (MCMs) are one of

the comparison methods used for trend analysis ofdose response relationships (DRRs) in clinical stud-ies, toxicological tests, and so on. In terms of multi-ple comparisons, Dunnett's test1) and Williamtest2) have been used in the past. Nishiyama etal3). proposed a composite MCM as an analysis ofdose response data in in-vitro toxicity tests. Fur-thermore, Hamada et al4). showed that MCMs hadthe best detection performance in the evaluationof statistical analyses of outliers and dose-response

【著者連絡先】〒950-3198 新潟県新潟市北区島見町 1398番地新潟医療福祉⼤学木下直彦TEL:025-257-4490 FAX:025-257-4490E-mail:[email protected]受付⽇:2018年 11⽉ 15⽇ 受理⽇:2018年 12⽉ 14⽇

Abstract:Maximum contrast methods (MCMs) are developed by assuming various types of contrast vec-tors in the dose response relationship (DRR). Then, the maximum “goodness of fit” of the DRRpattern is identified using contrast statistics. MCM is generally used for data analyses in toxicolog-ical tests and is well-known that the method have the for its high detection power (1-β) for differ-ent types of DRRs. In the study on DRRs, researchers set various types of contrast vectors inadvance and identified both dose and changing time for the effect (toxicity). However, the patternconsists of only an increase or a stagnation (non-reaction), there is no analysis method that consid-ers a decreasing pattern in openly-available programs. In this study, one-way ANOVA data with aconcave or convex type pattern (namely the trend) shows an increase or stagnation tendency, anda decreasing trend is also assumed. The focus herein is to classify a typical graph pattern with aninteger ratio by setting both the number of groups and the increase-decrease displacement as aparameter. The classification of graph patterns as concave or convex by generating contrast vec-tors is considered. The aim of the study is to create a mathematical approach and program mod-ules using contrast vectors corresponding to graph patterns categorized by the above-mentionedmethod. As a result, an improved binary-direction MCM (bd-MCM) prototype was developed. Bd-MCM presented herein can consider an increasing or decreasing tendency as well as a decreasingtrend.

Key words : maximum contrast methods, comparison methods, classification

Page 2: Study of graph patterns classification using …42 Study of graph patterns classification using maximum contrast methods Study of graph patterns classification using maximum contrast

patterns. MCMs are used when the observationdata are one-way ANOVA data with ordinal scale,such as DRR or the data exhibits a linear increasetrend after a certain time. The approximate pat-tern type can be selected by setting the maximumvalue of the contrast statistic as the test statisticfor various linear.

This method can be regarded as a kind of classi-fication problem in which data trend graphs areclassified into a number of patterns and the best“goodness of fitness”is selected from themwhen a change in observation data is representedby the graph.“Supervised learning”is a class of methodsapplied to classification tasks in the field ofmachine learning. These methods involve classify-ing observational data into an appropriate classusing various classifiers. Clustering of so-calledunsupervised learning is employed as the methodto classify observational data without making anassumed class. Researchers must establish thecontrast vector that shows the shape of more thanone assumed DRR beforehand when MCM isapplied to DRRs.

There are no clear solutions on what kind ofmatrix must be made or how it must be made,when this contrast vector group is expressed as amatrix (hereinafter referred to as the definitionmatrix). Nishiyama5)said that selecting and apply-ing the definition matrix when the dose-responseshape is unknown is unclear and indicated that thepower of the test (1-β) should be compared. Eachpower of the test of five definition matrixes wascompared, and an effective result was producedfor the definition matrix of DRR.

The definition matrix verified by Nishiyama etal5). considers only DRR ; therefore, only the con-trast vector with an increasing pattern is assumedin the aforementioned study. The contrast vectorwith the decrease pattern is not prepared in DRR.

Hence, the development of the bi-directional MCM(bd-MCM) corresponding to both increasing pat-terns, including DRR, and decreasing patterns isconsidered here. Specifically, the possibility of high-ly precise class separation was challenged using amethod for obtaining the definition matrix with acontrast vector that corresponds to everythingwith the typified chart pattern having an increas-ing and decreasing pattern and the observationaldata was classified.

The purpose of this study is to develop a mathe-matical model to obtain a typified model patternfrom the number of groups and the extent of fluc-tuation of displacement in this study and to devel-op a program module that generates a definitionline automatically for the chart pattern of every-thing. In this paper, we first explain the theory ofbd-MCM and then the typing of graph patterns tobe incorporated into the definition matrix is dis-cussed.

MethodHerein, the classification problem based on the

features of bd-MCM, the formulation of the mathe-matical models based on integer programmingapproach, and the development of a prototype pro-gram module was achieved based on a methodolo-gy for solving points. To develop the prototype,visual basic for applications (VBA) in MicrosoftExcel was used due to its general versatility andscalability.

The theory of bd-MCMRegarding the original MCM, the definition of

Nishiyama et al3). was applied and the minimalquotation of the points are explained below. Theobservational data assumed in this paper are one-way data with an ordinal scale. At each level ofthe ordinal scale multiple observations exist, andeach level is called a“group”. Let k be the num-

43

ヘルスサイエンス・ヘルスケア Volume 18,No.2(2018)

Page 3: Study of graph patterns classification using …42 Study of graph patterns classification using maximum contrast methods Study of graph patterns classification using maximum contrast

ber of groups. There are ni observation valuesyij;j = 1, 2, ..., ni in each of the i- group (i = 1, 2, ...,K) and { yij;i = 1, 2, ..., K, j = 1,2, ... ni} that conformto N (μi, σ2) independently of each other. Giventhat the order of (μ1, μ2,... ΜK) does not change,it is assumed that a graph pattern is formed bythe increase / decrease ratio with the adjacentgroup. For the remainder of this paper, it isassumed that there is no trend as a null hypothe-sis, that is, assume H0 =μ1 =μ2 = ... =μK. The veri-fication of these tendencies are then considered. Inthis paper, this null hypothesis H0 is called a“nullhypothesis”following the definition of Yoshida6).

For the constants c1, c2, ..., cK that satisfy the statistical quantity defined by the for-

mula (1) is called the“contrast statistics for thecontrast vector: c = (c1, c2, c3, ..., cK)”. The super-script ( ) ' indicates transposition.

Where M is a diagonal matrix with i diagonalelements of 1 / ni. It is well-known that this is thebest test statistic for testing the null hypothesisH0: c'μ = 0 against the alternative hypothesisH1:c'μ> 0. However, μ = (μ1, μ2, ... μK) '.

Consider the contrast vectors c1, c2, ..., cm corre-sponding to the types for graph patterns with var-ious types, prepare comparative statistics t1, t2, ...,tm for them(4),to the test statistic is called MCM.Furthermore, a contrast coefficient matrix C

obtained by arranging multiple comparison vec-tors to be used in the form of an m × k matrixas shown in formula (5) is the definition matrix ofthis MCM.

Classification of graph patternWhen the accuracy is increased, infinite contrast

vectors are generated in the classification of graphpatterns. In this research, classification is per-formed by focusing on graph patterns. It is neces-sary to classify typical graph patterns in advancefor this reason. Therefore, the ratio between a cer-tain group and the adjacent group (hereinafterreferred to as "ratio of adjacent group (R-AG)") islimited to an integer ratio, and the increase /decrease ratio (D) of the ratio is set as an integerconstant. The ratio of adjacent group can beobtained using the following formula:

R-AG: Ratio of adjacent group (r) = {(μ2 - μ1) :(μ3 - μ2): ...: (μK - 1 - μK)}

As an example, consider a graph pattern inwhich the group number K = 3 and the increase /decrease ratio D = 3.

In this case, the number of graph patterns is3×3×3=27 as shown in Figure. 1.

Among these graphs, the graphs with the sameR-AG have the same contrast vector. For this rea-son, the group ratio is into a graph pattern withrelatively prime R-AG with the same condition asthat shown in Figure. 1. The graph pattern is thencategorized into 13 patterns. (Figure 2)

Furthermore, the definition matrix: C×( -1 ) issame as that of the definition matrix C owing tothe following property of the contrast vectors : c1

+ c2 + ... cm = 0. The graph corresponding to thecontrast vector obtained by multiplying C ×( -1 )is a linear moving average model (x - axis) alongthe x axis. This indicates that these two graphshave the same contrast statistics. Figure 3 showsthe graph patterns of the same contrast statistics.

44

Study of graph patterns classification using maximum contrast methods

∑ki=1

=0,ci

However,

Page 4: Study of graph patterns classification using …42 Study of graph patterns classification using maximum contrast methods Study of graph patterns classification using maximum contrast

45

ヘルスサイエンス・ヘルスケア Volume 18,No.2(2018)

Fig. 1 All graph patterns(K=3,D=3)

Fig. 2 classifying in graph patterns(K=3,D=3)

Page 5: Study of graph patterns classification using …42 Study of graph patterns classification using maximum contrast methods Study of graph patterns classification using maximum contrast

When contrast statistics are considered, thegraph patterns are categorized into seven cate-gories; however, the linear shift of the graphsalong the x axis is meaningless and become thesame class in the classification focusing onincrease and decrease . Therefore, it can be saidthat it is reasonable to adopt the linear catego-rized into 13 types shown in Figure 2. For thisreason, it is necessary to determine which graphpattern is based on the positive / negative infor-mation of the adjacent group of observation datafor the linearly shifted graph pattern with thesame contrast statistics.

Mathematical modelA mathematical model based on integer pro-

gramming method was developed for classificationof graph patterns using the result of typification ofgraph patterns

Number of groups: K (t1 ... tK)Increase / decrease width: D (maximum = D, min-imum = - D). * D is an integerR-AG: Ratio of adjacent group: r = (r1: r2: ...: rK - 1), *r is an integer satisfying - D <r <D* r is relatively prime.Combination of integer numbers satisfying follow-ing constraint condition- (D-1) ≦r1 + r2 +・・・+r (K - 1) ≦ D – 1

As an example, Figure 4 shows the rejectionarea of the R-AG when K = 3 and D = 3. All areasexcept for the rhomboid area in Figure 4 arerejected. For group number K = 3, given that R-AG is two pairs, it is possible to show the combi-nation region in a two-dimensional graph.

From the set of integers in the region shown inthe example (Figure 4), the combination ofincrease / decrease ratios is obtained by eliminat-ing non-disjoint combinations. Subsequently, allcontrast vectors are created by setting constantsthat satisfy for these combinations

46

Study of graph patterns classification using maximum contrast methods

Fig. 3 Group of same contrast statistics using MCMs(maximum contrast methods)

∑ki=1 =0,ci

Page 6: Study of graph patterns classification using …42 Study of graph patterns classification using maximum contrast methods Study of graph patterns classification using maximum contrast

Prototype of program moduleA prototype program was developed to auto-

matically generate all categorized graph patternsbased on mathematical models using an integerprogramming approach.

By specifying the number of groups (K) and theamount of Displacement (D), it is possible to gener-ate contrast vectors corresponding to automatical-ly categorized graph patterns. For categorizedgraph patterns, graphs are also automatically gen-erated such that the shape of the graph can bevisually recognized.

ConclusionIn this study, a high we attempted to develop a

highly versatile classifier for unexpected graphpatterns by setting exhaustive contrast vectors. Itbecame clear that it was impossible to identify the

47

ヘルスサイエンス・ヘルスケア Volume 18,No.2(2018)

Fig. 5 Prototype of program module

Fig. 4 Rejection area of ratio of adjacent group

Note:The part of the hatchet in Figure 4 shows the existencerange of the adjacency increase / decrease ratio value in thecase where the X axis and the Y axis are three levels respective-ly. Numerical values (the convex type is positive and the concavi-ty is negative) with the pattern irregularity information added tothe ratio of the distance between group 1 and group 2 and thedistance between group 2 and group 3.

Page 7: Study of graph patterns classification using …42 Study of graph patterns classification using maximum contrast methods Study of graph patterns classification using maximum contrast

pattern of linear shifted graphs using only bd-MCM. Additional processes were required todetermine which graph pattern would be based onthe positive / negative information.

By adding this process, it is possible to performclassification into a typed graph pattern; hence, itcan be used for classification.

It is desirable that the program module createdin this research be applied as that of Nishiyama etal7). However, this study focuses on the generationof categorized graph patterns; hence, it is a proto-type specialized for generating corresponding con-trast vectors. Regarding the release of the pro-gram module, it is necessary to check the qualityof detection and the sample size design using themaximum comparison method. However, this isout of the scope of this study. We are intend toverify the detection capabilities and sample sizedesign in the future and publish it as a package forprogramming languages such as R after improv-ing the quality of the current prototype.

References1)Dunnett, C.W.: A multiple comparison procedure forcomparing several treatments with a control, Journalof the American Statistical Association, 50: 1096-1121,1955.

2)Williams, D.A.: A test for differences between treat-ment means when several dose levels are comparedwith a zero dose control, Biometrics, 27:103-117, 1971.

3)Hiroshi, N., Isao, Y.: Proposal of the Composite Maxi-mum Contrast Method and its Application to Toxico-logical Data Analysis,計量⽣物学 Vol.25,No.1:1-18,2004

4) Hamada, C.: A performance comparison of maxi-mum contrast methods to detect dose dependency,Drug Information Journal,31:423-432,1997.

5)Nishiyama, H., Omori, T., Yoshimura, I.: A compositestatistical procedure for evaluating genotoxicity usingcell transformation assay data, Environmetrics, 14:183-192,2003.

6) Yasushi N. and Michihiro Y.: Basis of a statisticalmultiple comparison way. Scientist company, 1997. (inJapanese)

7) Satoru N., ya Hirokazu Hara and Isao Y.: TheSAS/IML program to utilize the biggest method ofanalog. Measure biology and 24: 57-70, 2004. (inJapanese)

48

Study of graph patterns classification using maximum contrast methods