full thesis in pdf - inflibnetshodhganga.inflibnet.ac.in/bitstream/10603/13622/10/10_chapter...

Discussion

Painting by: Ms. Hemalata Pradhan
Paphiopedilum hirsutissimum


5 DISCUSSION

DNA barcoding, on the basis of its initial success in lepidopteran insects, fishes and birds (Hebert et al. 2003a,b, 2004a,b), was projected as a powerful technique for species-level identification of all eukaryotes. The short DNA sequence proposed as the universal barcode was a 658 bp long region of the CO1 gene, commonly known as the ‘Folmer’ region. Subsequently, various DNA barcode programmes were initiated to generate species-specific molecular signatures for identifying animals and plants. It was soon realized that this region of the CO1 gene, suggested as a universal barcode for all organisms, might not work in plants, except in some macroalgae, as CO1 sequences in land plants are highly invariant (Chase et al. 2005, Kress et al. 2005). Moreover, in plants the events of hybridization, introgression and allopolyploidy are more pronounced than in animals, and species identification based on one locus was considered to be insufficient (Chase et al. 2005). Initially, taxonomists viewed the technique and applicability of DNA barcoding to plants with a great deal of suspicion and skepticism (Chase et al. 2005). Thus, investigations in various laboratories, using both in silico and experimental approaches, focused on the identification of a corresponding locus or multi-locus combination that could become a barcode for plants (Chase et al. 2005, Kress et al. 2005, Newmaster et al. 2006, Kress and Erickson 2007, Sass et al. 2007, Fazekas et al. 2008, Lahaye et al. 2008, CBOL Plant Working Group 2009, Seberg and Petersen 2009, Yao et al. 2010, Chen et al. 2010, Hollingsworth et al. 2011, China Plant BOL Group 2011). In 2007, when the present work was initiated, the search for a uniform barcode for plants was compared to the search for the “Holy Grail” (Rubinoff et al. 2006).
Thus, the present work was initiated to check the applicability of the concept of DNA barcoding to plants and to evaluate regions of the chloroplast or nuclear genome already in use as possible barcodes for orchids, either individually or in various combinations. Based on the earlier recommendations of the Plant Working Group of the Consortium for the Barcode of Life (CBOL), the five loci chosen to be tested as DNA barcodes were rpoB, rpoC1, rbcL, matK and ITS. These five candidate loci were evaluated in the present work according to the standard guidelines of CBOL (http://www.barcoding.si.edu/protocols.html) and BOLD [Barcode of Life Data Systems] (Ratnasingham and Hebert 2007). The selected loci were evaluated against three major criteria laid down by the CBOL Plant Working Group (2009): (i) Universality and robust amplification: the ability to retrieve the sequence of the targeted locus (or loci) using a single primer pair across species belonging to different genera of the family Orchidaceae and other families of vascular plants; (ii) Sequence quality and coverage: which loci are most amenable to the production of bidirectional sequences with few or no ambiguous base calls and require less manual editing of trace sequences; and (iii) Discrimination: which loci enable most species to be distinguished using genetic distances, phylogenetic trees and BLAST analysis. In the present investigation, species-specific barcodes were developed for members of the Orchidaceae based on the above criteria, and primer universality was tested by checking the amplification and sequencing of the five selected loci in species from diverse families of land plants. A few possible applications of DNA barcoding were also exemplified by analyzing the ability of the five candidate DNA barcode loci to distinguish endangered orchids (listed in Appendix I of CITES), and by developing DNA barcodes for medicinal orchid species and comparing these with samples available in markets. The details of the methodology followed and the results obtained are discussed further.
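Criterion (ii) above refers to sequences with "few or no ambiguous base calls". As a rough illustration only, the share of ambiguous positions in a read could be computed as sketched below. The function names and the 1% threshold are assumptions made for this example; they are not values prescribed by CBOL or used in the present study.

```python
# Illustrative sketch: fraction of ambiguous base calls in one sequence read.
# IUPAC nucleotide codes other than A, C, G and T are treated as ambiguous.
AMBIGUITY_CODES = set("RYSWKMBDHVN")


def ambiguous_fraction(sequence):
    """Return the share of positions that are IUPAC ambiguity codes."""
    bases = [b for b in sequence.upper() if b.isalpha()]
    if not bases:
        raise ValueError("sequence contains no bases")
    ambiguous = sum(1 for b in bases if b in AMBIGUITY_CODES)
    return ambiguous / len(bases)


def passes_quality(sequence, max_fraction=0.01):
    """Hypothetical cut-off: accept reads with at most 1% ambiguous calls."""
    return ambiguous_fraction(sequence) <= max_fraction
```

A read such as "ACGTN" would have an ambiguous fraction of 0.2 and would fail the assumed threshold.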

5.1 PLANT MATERIAL, SAMPLE COLLECTION, IDENTIFICATION AND DOCUMENTATION

The reasons for selecting the family Orchidaceae for the present investigations have already been described in the ‘Introduction’. One of the most critical steps in DNA barcoding is the collection, correct identification and documentation of specimens. The specimens for constructing a DNA barcode library can be obtained from various sources such as natural history museums, herbaria, botanical gardens/nurseries, frozen tissue collections, seed banks, type culture collections and other repositories of biological materials (http://www.barcoding.si.edu/protocols.html). Apart from these repositories of identified specimens, collections can also be made from the wild. In the present study, the majority of plant specimens (mainly orchid species) were collected from the wild; a few were obtained from type culture collections (e.g., Paphiopedilum species) and plant nurseries (e.g., Renanthera, Cymbidium, Holcoglossum and Hygrochilus). A conscious effort was made to collect species having wide phylogenetic coverage within the family Orchidaceae. Thus, the species assemblage investigated represented four of the five circumscribed sub-families of the Orchidaceae, along with taxon-based sampling within individual groups. The species other than orchids were mainly collected from the botanical garden of the Department, except Platanus orientalis, which was procured from Srinagar (Jammu and Kashmir). Orchids are highly specialized and able to grow on a variety of substrata. According to their habitat they are classified into three groups: (i) epiphytes, which grow on other plants, or a few on rocks and boulders (lithophytes/rupicolous species), the latter two, though not strictly epiphytes, also being included in this category; (ii) terrestrial orchids, which grow in soil (commonly called ground orchids); and (iii) saprophytes, which grow on dead organic matter (Chowdhery 1998). The epiphytic orchids grow mostly on branches of tall trees, so that they receive restricted light through the forest canopy. Sometimes they are perched on branches that are 10-30 feet above ground level. Therefore, the major constraint in collecting these orchid species from the wild is their difficult accessibility. The epiphytic orchids were collected using a customized long stick that could reach a height of approximately 20 feet and was fitted with a hook at one end. On the other hand, the terrestrial species have lower light tolerance, as they grow in the forest shade or in open grass or bush lands where they are protected from direct sun.
The tubers or rhizomes of terrestrial orchids store food and remain embedded in the soil during unfavorable conditions and drought in summer or winter (Chowdhery 1998). The plant grows above ground for a very short period during the favorable season and produces flowers and fruits (Chowdhery 1998). Thus, the ground orchids were available for only a very short span of time in which collections had to be made. The second major hindrance in terrestrial orchid collection was distinguishing the orchid species from other plant species in the vegetative state; at the flowering stage, however, they could be identified easily. Besides these, the overall major obstacle was that all orchid species, being threatened, are listed in Appendix II of CITES (http://www.cites.org, Chowdhery 1998). This necessitated utmost care in their collection. Therefore, collection of multiple specimens was avoided and only leaf tissue along with one plant as herbarium specimen was collected/procured for each species. For each species, 3-5 accessions were collected; of these, one was used for the preparation of the herbarium specimen and the rest were collected as either leaf or stem samples. The species listed in Appendix I (species of Paphiopedilum and Renanthera) were obtained from type collections or nurseries. Following collection of specimen/tissue, a voucher number was allotted to each, and all details about the site of collection, habit and habitat were recorded. Photographs were taken in the field. Correct documentation of each specimen is important in order to link the voucher number to the DNA barcode sequence obtained later and for correct identification, which is essential for developing species-specific barcodes. To ascertain the correct botanical identity up to species level, the plants collected in the flowering stage were identified either by comparison with available herbarium specimens or, most of the time, by consulting experts. Some of the plants collected in the vegetative state were made to flower and were then identified. A few specimens collected in the vegetative state could be identified only up to the generic level.

5.2 TISSUE PRESERVATION AND DNA EXTRACTION

The individuals, branches or tissues of the collected species were brought to the laboratory in their native state. These were preserved under the conditions where DNA damage was minimal. The plants or their parts were wrapped in aluminium foil and then sealed in plastic bags containing silica gel to keep the samples dry. Once in the laboratory, these were stored at -20°C in a deep freezer to minimize the degradation of DNA in the samples and to preserve them till DNA was extracted. The methodology followed in the present investigations was similar to the one that had been used by other groups for the safe transfer and preservation of the plants or their collected parts (Sass et al. 2007, Newmaster and Ragupathy 2009).

As reported in the literature, extraction of DNA for barcoding purposes has chiefly been carried out using genomic DNA purification kits, such as the DNeasy Plant Mini™ kit (Kress and Erickson 2007, Sass et al. 2007, Newmaster et al. 2008, Ferri et al. 2009, Hollingsworth et al. 2009) and the DNeasy 96 Plant Kit (Taberlet et al. 2007, Fazekas et al. 2008). The predominant CTAB method of DNA isolation (Doyle and Doyle 1987) has also proved successful with many plants (Bleeker et al. 2008, Lahaye et al. 2008). In the present study too, DNA was isolated either by the CTAB protocol or by the procedure detailed in the Genomic DNA Purification Kit (Fermentas, # K0512). DNA barcoding requires quick, high-throughput DNA isolation in order to identify species at a faster rate. Therefore, the Genomic DNA Purification Kit was used for rapid isolation and purification of high-quality DNA. The kit is based on selective detergent-mediated DNA precipitation from crude lysate. The entire procedure is rapid, taking only 20-25 min, with a typical yield of 2-10 µg genomic DNA from a small amount (100 mg) of tissue (www.fermentas.com). Furthermore, as the kit does not employ a silica-gel based column, there is no loss in DNA yield of the kind prevalent in other column-based extractions. For the accessions with fresh green leaves, genomic DNA was extracted using either the CTAB method (Doyle and Doyle 1987) or the genomic DNA kit with slight modifications. However, some orchid species, especially epiphytes, accumulate mucilage in order to conserve water and as a food reserve (Chowdhery 1998). The high mucilage (polysaccharide) content in such species was a major obstacle in DNA isolation. The polysaccharides co-precipitate with the DNA and hinder its complete dissolution (Barnwell et al. 1998), thus reducing the quantity of nucleic acid extracted.
Moreover, the presence of polysaccharides along with the dissolved DNA makes the DNA inaccessible to enzymes, thus inhibiting processes such as PCR amplification, digestion with restriction enzymes or in vitro labeling (Barnwell et al. 1998). The CTAB and kit methods were not successful in isolating mucilage-free DNA. Therefore, the modified CTAB method of Barnwell et al. (1998) was used, in which the concentration of CTAB is increased in a step-wise manner, finally resulting in the precipitation of nucleic acids free from contamination by polysaccharides and polyphenols. This facilitated the isolation of DNA from species of Oberonia, Vanda and others with high mucilage content, and has been used successfully with as little as 1 g fresh weight of tissue.

5.3 SELECTION OF LOCI

The loci to be tested as DNA barcodes for plants could be from the nuclear or the chloroplast genome, and from either coding or non-coding regions. However, to be suitable as a DNA barcode, a locus should possess certain predefined characteristics (www.kew.org, Kress et al. 2005). An ideal barcode sequence needs to have conserved flanking regions and should be short (600-800 bp), so that it can be amplified with universal primers and sequenced routinely in single-pass sequencing (Kress et al. 2005). Moreover, the primers used should not be prone to non-specific annealing that results in double bands or amplification of loci other than the targeted one (Kress et al. 2005, Ford et al. 2009). Most importantly, the locus should be capable of generating comparable data that enable species to be distinguished from one another. It should have adequate sequence variation among species, with no or low divergence within species, to provide a distinct barcode gap (Hebert et al. 2003a, Lahaye et al. 2008). It should be easy to align and should be recoverable from dry herbarium samples and parts/fragments (http://www.kew.org/barcoding/rationale.html). The initial in silico analysis and laboratory assessment of various barcode loci suggested regions mainly from the chloroplast genome and one from the nuclear genome as suitable for plant barcoding (Kress et al. 2005, Chase et al. 2005, Newmaster et al. 2006, Chase et al. 2007, Kress and Erickson 2007). Kress et al. (2005) recommended the use of the nuclear ribosomal ITS and the trnH-psbA spacer from the plastid genome for discriminating plant species. Chase et al. (2005), using an in silico approach, tested the utility of rbcL and ITS sequences available in GenBank for species identification by the BLAST method and suggested that a multi-locus combination with regions from the plastid genome and one from the nuclear genome would suffice as a universal barcode for plants. Newmaster et al.
(2006) advocated the use of rbcL as a DNA barcode for land plants. Chase et al. (2007) again proposed a multi-locus barcode and recommended that either of two 3-locus combinations, matK+rpoB+rpoC1 or matK+rpoC1+trnH-psbA, may be used as universal barcodes for plants. Kress and Erickson (2007), on the other hand, proposed the use of the coding rbcL in combination with a non-coding intergenic spacer, trnH-psbA, as a two-locus global DNA barcode for land plants. At the time of commencement of this work in 2007, the recommendations from the groups discussed above had amply highlighted the importance of four coding loci (matK, rbcL, rpoB and rpoC1) and one non-coding intergenic spacer (trnH-psbA) from the chloroplast genome, and one internal transcribed spacer (nrITS) from the nuclear genome, in DNA barcoding of plants. Therefore, in the present study, five loci viz., matK, rbcL, rpoB, rpoC1 and ITS were selected for investigating their suitability as DNA barcodes for orchid species in particular and for land plants in general. The trnH-psbA spacer was not tested due to its numerous limitations reported previously by various workers (Chase et al. 2005, Chang et al. 2006, Chase et al. 2007, Lahaye et al. 2008) and repeated failure to obtain bidirectional sequences from Dendrobium species, which were being studied in the laboratory for a separate Ph.D. programme (Singh 2012). The foremost limitation in the use of this locus as a barcode is its length, which varies from 300 to 1000 bp across plants. This poses problems in alignment of the generated sequences (Chase et al. 2007). Moreover, in many plants, especially in orchids and amaryllids, insertions of the rps19 and rpl22 genes within this spacer have been reported (Chase et al. 2005, Chang et al. 2006, Lahaye et al. 2008). The difficulty in amplification and sequencing of the trnH-psbA spacer has been attributed to the poly(A) repeats present within its sequence (Zhu et al. 2010, Hollingsworth et al. 2011). The reason for testing a locus from the nuclear genome along with those from the chloroplast genome was the realization that DNA barcodes based on uniparentally inherited markers could never adequately address the intricacy that exists in nature (Chase et al. 2005). Moreover, the chloroplast genome provides fewer variable sites for analysis than the nuclear genome, as nucleotide substitution rates in the former have been reported to be lower than those in the latter (Wolfe et al. 1987).
Thus, it has been predicted that loci from the nuclear genome are more likely to provide better species resolution than those from the chloroplast genome, especially in cryptic and sister species complexes and recently diverged taxa (Chase et al. 2005). Furthermore, in hybrid species, even after speciation, the chloroplast genome constitution of the newly evolved species remains similar to that of the donor parent (Sun and Lo 2011, Greiner et al. 2011); it is therefore difficult to identify such species with markers from the chloroplast genome alone, as it is uniparentally inherited (Raspé 2001). Likewise, the use of only the chloroplast genome for barcoding was not expected to resolve the ambiguities created by aneuploidy and introgression, which play an important role in speciation in plants (Soltis and Soltis 2009). Hence, for the present study, a locus from the nuclear genome, i.e., nrITS, along with various loci from the chloroplast genome, was selected for generating DNA barcodes of orchids, keeping in mind that in orchids events such as hybridization are very common and cryptic and sister species complexes are prevalent (Soliva et al. 2001, Pellegrino et al. 2005, Zitari et al. 2011).
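The barcode-gap requirement described in this section (adequate divergence between species, little or none within species) can be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length and uses the uncorrected p-distance as the genetic distance; the distance model, function names and toy sequences are assumptions for illustration and are not the analysis pipeline of the present study.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance over aligned sites, skipping gaps and Ns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # ignore sites that cannot be compared
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def barcode_gap(records):
    """records: list of (species_name, aligned_sequence) pairs.

    Returns (max intraspecific distance, min interspecific distance,
    True if the minimum interspecific distance exceeds the maximum
    intraspecific distance).
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        (intra if sp1 == sp2 else inter).append(d)
    max_intra = max(intra) if intra else 0.0
    min_inter = min(inter) if inter else None
    has_gap = min_inter is not None and min_inter > max_intra
    return max_intra, min_inter, has_gap


# Toy data (hypothetical species labels and sequences).
toy = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAC"),
    ("sp_B", "ACGTTCGTAA"),
    ("sp_B", "ACGTTCGTAA"),
]
result = barcode_gap(toy)
```

In this toy case the two hypothetical species differ at 2 of 10 aligned sites while accessions within each species are identical, so a gap is reported.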

5.4 AMPLIFICATION AND PURIFICATION OF THE TARGETED LOCI

The relative usefulness of each of the five tested loci, amplified from 191 species of plants (104 Orchidaceae species + 87 species from other families), was analyzed by comparing their amplification and sequencing success rates among the tested species. PCR amplifications were generally successful for the rbcL, rpoB and rpoC1 loci, with 95, 97 and 97% success rates, respectively. The rpoB (RNA polymerase β) and rpoC1 (RNA polymerase β′) loci had the highest universality in terms of amplification and sequencing in the present investigation. A high amplification rate of the rpoC1 locus has also been reported in other plants such as cycads (Sass et al. 2007), Araucaria, Inga and liverworts (Hollingsworth et al. 2009), species of African Podostemaceae (Kelly et al. 2010), species from five genera of the Lemnaceae (Wang et al. 2010), species belonging to 34 genera of bryophytes from North-Eastern China (Liu et al. 2010) and congeneric species of Dendrobium (Singh et al. 2012). The CBOL Plant Working Group (2009) reported a 90-98% amplification success rate for rpoC1 among 550 species representing all major lineages of land plants. The amplification success of this locus among 96 species belonging to 48 genera from 43 families of land plants was 83.3% (Kress and Erickson 2007), which was also considered good. The high amplification rate of rpoC1 could be attributed to the conserved flanking regions within the gene from which the primers were designed.

In the present study, the amplification rate for the rpoB locus was 97%, which was much higher than the amplification success obtained in other plant species (Kress and Erickson 2007, Sass et al. 2007, Liu et al. 2010, Kelly et al. 2010). Amplification success rates of 77.1% and 78.3% were obtained in 48 species pairs of land plants (Kress and Erickson 2007) and in African Podostemaceae members (Kelly et al. 2010), respectively. Likewise, a comparatively low amplification rate was reported in other groups of plants; e.g., cycads exhibited only 33% amplification (Sass et al. 2007), with only 7 species yielding a single band of amplicon out of 21 samples tested. The rest of the samples gave non-specific amplification with multiple bands. While investigating various species belonging to three taxonomically divergent genera, Hollingsworth et al. (2009) could not amplify rpoB from any species of liverworts, whereas this locus could be amplified in Araucaria and Inga with 100% and 95% success, respectively. Similarly, this locus could be amplified from only 6% of the 38 species of bryophytes from North-Eastern China (Liu et al. 2010). In contrast, this locus provided high amplification rates of 90-98% in 95 species of land plants (CBOL Plant Working Group 2009), 98% in samples of 31 species of Lemnaceae (Wang et al. 2010) and 99.2% in 36 species of Dendrobium (Singh et al. 2012). The varied and, in some groups, generally low amplification rates of this locus suggest that the region of the gene from which the primers were designed is not conserved across the plant kingdom.
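The percentages quoted in this section follow from a simple tally of PCR outcomes per sample. The sketch below reproduces the cycad rpoB figure cited above (7 single-band results out of 21 samples, the rest giving multiple bands), treating each of the 7 as one sample. The outcome labels and the function name are assumptions introduced for this illustration.

```python
from collections import Counter

OUTCOMES = ("single_band", "multiple_bands", "no_product")


def amplification_summary(outcomes):
    """Return the percentage of samples in each PCR outcome class."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples given")
    # Counter returns 0 for outcome classes that never occurred.
    return {k: round(100.0 * counts[k] / total, 1) for k in OUTCOMES}


# Cycad rpoB figures as cited in the text: 21 samples, 7 with a single band,
# the remaining 14 with non-specific multiple bands.
cycad_rpoB = ["single_band"] * 7 + ["multiple_bands"] * 14
summary = amplification_summary(cycad_rpoB)
```

Only the single-band class is counted as successful amplification here, which gives the roughly 33% quoted for cycads.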

The high amplification rate of the rbcL (RUBISCO large sub-unit) locus obtained in the present study is in agreement with most of the investigations, which report similar amplification rates of 95-100% (Kress et al. 2005, Kress and Erickson 2007, Fazekas et al. 2008, Hollingsworth et al. 2009, Kress et al. 2009, CBOL Plant Working Group 2009, Ebihara et al. 2010). Fazekas et al. (2008) reported a 100% recovery rate for rbcL. It was easily retrievable and well suited for recovery of high-quality bidirectional sequences across 93 land plant species from North America (Fazekas et al. 2008). Likewise, this locus could be successfully amplified in all the tested samples of 11 species of Ficus and 4 species of Gossypium (Roy et al. 2010). However, slightly lower success (97%) was obtained with the samples of 16 species of Berberis (Roy et al. 2010). This locus was easily amplified from Araucaria, Inga and liverworts with 93%, 100% and 98% success, respectively (Hollingsworth et al. 2009). Starr et al. (2009) could amplify rbcL sequences in 98% of the samples belonging to Carex, though there was weak amplification in one-third of the samples, and some of them even yielded double bands. In bryophytes, unlike rpoB, rbcL could be amplified with 100% success in 38 species from North-Eastern China (Liu et al. 2010). Wang et al. (2010) also observed 100% amplification success for rbcL in the 31 tested species belonging to five genera of the family Lemnaceae. Recently, the China Plant BOL Group (2011) reported 94.5% amplification success of rbcL among 6,286 individuals belonging to 1,757 species from 141 genera across 75 families of seed plants. The PCR success for rbcL was observed to be 96.91% in 36 species of Dendrobium (Singh et al. 2012). Besides its high universality, rbcL is also the most suitable gene for studying the phylogeny of various plant groups; therefore, a huge number of rbcL sequences are already available in the public databases. Newmaster et al. (2006) retrieved 10,000 rbcL sequences belonging to diverse groups of land plants and proposed it as a core barcode region for identification of land plants. Due to its high recoverability and ease of sequencing, rbcL frequently appears as one of the core loci in the various suggested barcode locus combinations (Kress and Erickson 2007, Fazekas et al. 2008, CBOL Plant Working Group 2009, China Plant BOL Group 2011).

The matK (maturase K) gene from the chloroplast genome has been reported to have high variation, due to which it is frequently used in studying phylogenetic relationships in the family Orchidaceae (Kores et al. 2001, Freudenstein et al. 2004, van den Berg et al. 2005). The amplification success rate for orchid species in the present study was 85.97%, which could be considered good. Xiang et al. (2011) reported a slightly higher PCR amplification rate of 92.3% for the matK locus in Holcoglossum species of the Orchidaceae. Likewise, in other reports on orchid species, i.e., the Costa Rican orchid data set and Dendrobium species, much higher amplification rates of 99% and 99.32% were reported by Lahaye et al. (2008) and Singh et al. (2012), respectively. In data set I, which included non-orchid species as well, the success rate decreased to 82.12%, which was the lowest amplification rate obtained among the loci from the chloroplast genome. The observed low rate of PCR amplification of the matK region is in agreement with the observations of Sass et al. (2007), Kress and Erickson (2007), Fazekas et al. (2008), Hollingsworth et al. (2009), Kress et al. (2009) and Liu et al. (2010), who also reported amplification rates in the range of 24-69% from different groups of plants. Among 46 species pairs of land plants tested by Kress and Erickson (2007), the amplification success rate for matK was 39.3%, which was considerably lower than that observed in the present study. Moreover, in cycads, it yielded single bands in only 24% of the tested samples, while 52% of the samples yielded multiple bands and 24% gave no amplification (Sass et al. 2007). However, matK could be easily amplified from Araucaria and Inga with 100% and 95% success, respectively, though there was no amplification in liverworts (Hollingsworth et al. 2009). Likewise, in the bryophyte species tested by Liu et al. (2010), only 20% of the samples yielded PCR products. Wang et al. (2010) could amplify matK in 71% of the tested samples from species of the family Lemnaceae. In 34 species of Carex, the primers for matK could amplify 98% of the analyzed species, but with weak bands in 21% of the samples (Starr et al. 2009). The CBOL Plant Working Group (2009) attained 90-98% successful amplification in angiosperms. However, in non-angiosperms, matK amplification was more problematic, with only 50% success in gymnosperms and cryptogams. The locus could be amplified in 95.7% of the samples from members of the Podostemaceae (Kelly et al. 2010). Similarly, this locus could be amplified in all the analyzed samples of 4 species of Gossypium, while in species of Berberis and Ficus, 76% and 85% of the samples could be amplified, respectively (Roy et al. 2010). Recently, the China Plant BOL Group (2011) reported 91% amplification success of matK from 6,286 individuals belonging to 1,757 species of seed plants. The low amplification success and the difficulties encountered in sequencing could be attributed to the higher variability of the matK region, especially in monocots (Chase et al. 2007, Fazekas et al. 2008). In the present investigation, out of ten Habenaria species, matK sequences could be retrieved from only six.
None of the accessions of the other four species yielded amplicons even after two to three PCR trials. Variability in the matK sequences might be the reason for the failure to obtain amplicons from four of the ten species of Habenaria using the same primer set. Likewise, none of the individuals of Goodyera repens and Pinalia spicata yielded matK sequences, whereas matK could be recovered from G. procera and P. mysorensis. Therefore, it can be inferred that successful matK amplification and sequencing might require not only family- or genus-specific primers but also species-specific primers in some cases. In fact, it was suggested by Hollingsworth et al. (2011) that, to increase matK recovery rates, either taxonomic group-specific primers, modified universal primers or a cocktail of primers should be used for cost-effective and efficient plant barcoding.

The ITS locus too exhibited a relatively low amplification rate of 84.73% for land
plants. A slightly higher amplification success rate of 88% for ITS was achieved in orchids in the present study. Kress et al. (2005) had also obtained the same value for ITS amplification in the plants investigated by them. In their study, ITS failed to amplify in 12% of the herbarium samples and yielded poor quality amplicons in many other samples (Kress et al. 2005). The amplification rate of ITS was only 60.4% in congeneric species pairs from 48 phylogenetically diverse plant genera investigated by Kress and Erickson (2007). The members of family Asteraceae exhibited 75% amplification efficiency (Gao et al. 2010b). The amplification rate reduced further to 42.3% when it was tested across 8,557 medicinal plants and closely related samples belonging to 5,905 species from 1,010 diverse genera of 219 families in 7 phyla [Angiosperms, Gymnosperms, Ferns, Mosses, Liverworts, Algae and Fungi] (Chen et al. 2010). However, the complete ITS could be easily amplified in 98.97% of the tested species of cycads (Sass et al. 2007). Likewise, this locus could be successfully amplified in all the tested samples of 11 species of Ficus and 4 species of Gossypium, though, the success was slightly lower (97%) among the samples of 16 species of Berberis (Roy et al. 2010). A high amplification rate of 98.97% for ITS was reported in congeneric species of Dendrobium (Singh et al. 2012). Similarly, ITS2 region could be successfully amplified in all the samples of 38 species of bryophytes investigated by Liu et al. (2010) and it had a relatively higher efficiency at 85% for Asteraceae species (Gao et al. 2010b).

Following amplification, the amplicons are generally sequenced directly using the cycle sequencing method or can be cloned in a vector before being sequenced. For barcoding, direct sequencing of the amplicons is advocated (http://www.barcoding.si.edu/protocols.html) as cloning requires additional time and cost. Nevertheless, if paralogous copies of a gene or spacer are present, cloning becomes an unavoidable step to obtain the sequence of all the copies. Direct PCR sequencing using the cycle sequencing method can be carried out as long as there is only one product present (http://www.biology.ualberta.ca). Prior to direct PCR sequencing, the products must be purified to obtain a single DNA template, and the primers and excess dNTPs must be removed. Other components such as competing enzymes or buffer components could also cause problems during sequencing and thus need to be removed (http://www.biology.ualberta.ca). There are several methods for purifying PCR products. These are (i) column purification, (ii) ethanol precipitation, (iii) enzymatic purification (Exo-SAP), (iv) gel purification and (v) PEG precipitation (http://www.biology.ualberta.ca). The first three methods are used only if a single band of amplicon is obtained after amplification of the targeted locus. The latter two methods are used if more than one PCR product/band is obtained. For barcoding purposes, the single band of amplicon was purified using the column-based QIAquick PCR purification kit (Qiagen, CA) by some of the investigators (Taberlet et al. 2007, Bleeker et al. 2008), while others have used the Exo-SAP method (Kress and Erickson 2007, Sass et al. 2007). In the present study, the Exo-SAP method was successfully used for the samples that yielded a single band of amplicon. However, for the samples in which multiple bands were obtained, the band having a molecular weight corresponding to the targeted locus was excised from the gel and purified using a gel extraction kit, as was done by Singh et al. (2012).

5.5 SEQUENCING AND QUALITY CHECK

Sanger’s di-deoxy chain termination (Sanger et al. 1977) and pyrosequencing (Ronaghi et al. 1998) are two major techniques that can be used to sequence DNA. The pyrosequencing technique is based on detection of pyrophosphate (PPi) molecules released on incorporation of every nucleotide in a growing chain on a template DNA molecule (Ronaghi et al. 1998). It uses a series of enzymes which results in generation of visible light on incorporation of each nucleotide. It is rapid but has an inherent limitation that it can produce sequence reads of only 300-500 bp in one reaction (Ronaghi et al. 1998). This renders the pyrosequencing technique unsuitable for the sequencing of DNA barcodes that are expected to have a length of 600-800 bp. In contrast, the modern automated ABI 3730 xl sequencers based on Sanger’s di-deoxy chain termination method can be used to sequence a ~1000 bp long DNA molecule in a single reaction (Chan 2005). Therefore, in the present investigation, a DNA sequencer based on Sanger’s di-deoxy chain termination method (Sanger et al. 1977) was used to sequence the amplicons obtained after PCR.

Among the five candidate loci tested in the present study across different families of land plants (data set I), the sequencing success of rpoC1 was the highest (91.78%), and it increased further to 93% for data set II, which contained only orchid species. The high sequencing success rate is in concordance with the observation of Singh et al. (2012), who also reported a sequencing rate of 95.58% for this locus from 36 Dendrobium species. However, in the African Podostemaceae members (Kelly et al. 2010) and 38 species of bryophytes from North-Eastern China (Liu et al. 2010), rpoC1 was successfully sequenced for all the samples. On the other hand, among 93 PCR samples obtained from 34 species of Carex, bidirectional sequences could be obtained for 70% of the samples, while the remaining 30% of the samples could be sequenced only in one direction because the reverse primer was probably priming at two different sites in those amplicons (Starr et al. 2009).

The sequencing success for rpoB in different species of land plants was 91.68%, which was similar to the highest value obtained with rpoC1 for the analyzed data set (I). A similar sequencing rate was obtained among 92 species of different land plants but with five sets of primers (Fazekas et al. 2008). In 36 Dendrobium species, a 93.93% sequencing rate was reported for this locus (Singh et al. 2012). However, a relatively high sequencing success rate for rpoB has been reported for other plant genera, such as Carex, in which sequencing success was 100% although around one-third of the samples had given weak amplification (Starr et al. 2009). At the other extreme, in members of African Podostemaceae none of the amplicons of rpoB could be sequenced (Kelly et al. 2010).

In the present investigation, involving 191 species (data set I), 89.9% of the rbcL amplicons could be sequenced, and in the orchid data set (II) this increased to 92.49% with a single set of primers. The sequencing success of rbcL can be considered good enough, although in another study 100% sequencing was achieved for 93 species of plants using two sets of primers (Fazekas et al. 2008). Recently, the China Plant BOL Group (2011) reported 97.8% sequencing success for this locus. Likewise, the amplicons of rbcL from Araucaria, Inga and liverworts could be sequenced with success rates of 93, 100 and 98%, respectively (Hollingsworth et al. 2009). Among the samples of 16 species of Berberis, 11 of Ficus and 4 of Gossypium, the sequencing success for rbcL was 95%, 98% and 100%, respectively (Roy et al. 2010). The locus could also be sequenced with 100% success in 38 species of bryophytes from North-Eastern China (Liu et al. 2010). However, in the case of Carex, it could be sequenced with only 18% success and all efforts to sequence rbcL in both directions failed, with 82% of sequences being entirely unreadable (Starr et al. 2009).

Evaluation of sequence quality and coverage demonstrated that high quality bidirectional sequences were obtained routinely for the above three loci (rbcL, rpoB and rpoC1) and that little or no manual editing was required for further analysis of these candidate barcode sequences. This could be attributed to the conserved nature of their flanking regions, from which the primers were designed.

The matK locus was sequenced successfully in 88.12% of the amplicons obtained in data set I, and in data set II a higher value of 91.17% was recorded. The high sequencing rate of matK is similar to the rate (92.07%) obtained in 36 species of Dendrobium (Singh et al. 2012). In 34 species of Carex, the percentage of matK amplicons sequenced was still higher (98%; Starr et al. 2009). Recently, the China Plant BOL Group (2011) reported 95.5% sequencing success for this locus amplified from 6,286 individuals belonging to 1,757 species of seed plants. However, this locus could be successfully sequenced only from 73.9% of the investigated samples from the Podostemaceae (Kelly et al. 2010). Fazekas et al. (2008) used 10 different primer sets to achieve an 87.60% sequencing success rate across diverse land plants. Roy et al. (2010) reported 85% sequencing success for matK in 11 Ficus species and 16 Berberis species. However, they could sequence all the 51 samples of four species of Gossypium. A lower bidirectional sequencing rate (~75%) of matK was observed in Holcoglossum (Xiang et al. 2011).


In the present study, the whole ITS, including the coding 5.8S region present between the ITS1 and ITS2 spacers, was sequenced with a success rate of 85.52% in data set I, and for the orchid data set (II) it increased to 88.97%. Kress and Erickson (2007) tested only ITS2 as a barcode and obtained a sequencing success of only 60.4%. However, in cycad species, the sequencing success rate of the complete ITS was 100% (Sass et al. 2007). Likewise, Roy et al. (2010) reported 100% sequencing success for ITS in 11 Ficus and 4 Gossypium species; however, the success was lower (94%) among the 16 tested Berberis species. The ITS2 region alone could be successfully sequenced in 78.95% of the samples belonging to 38 species of bryophytes investigated by Liu et al. (2010). The China Plant BOL Group (2011) could successfully sequence ITS in 88.9% of the tested seed plants.

Essentially, the distinction among species based on DNA barcoding relies on variation in a small number of positions in the nucleotide sequences of the selected loci. Therefore, the DNA sequences used for this purpose are required to be of high fidelity with unambiguous identification of each base (Kress and Erickson 2007). To ensure correct identification of each base in the trace sequence, commonly referred to as base calling (Ewing et al. 1998), electropherograms are subjected to base calling using the software Phred. Phred assigns a quality value (QV) or a score to each nucleotide after analyzing the corresponding peaks (Ewing et al. 1998). This score (Q), commonly referred to as the Phred score, is logarithmically related to the error probability (P) in base calling by the software (Ewing and Green 1998), where,

$Q = -10 \log_{10} P$ or $P = 10^{-Q/10}$

This implies that a quality score of 20 assigned to a base means that the chance of this base being called incorrectly is only 1%. As it is a logarithmic scale, a Phred quality score of 30 implies that the possibility of wrong base calling is 0.1% (Ewing and Green 1998). A quality score of 20 and above is considered acceptable for DNA barcoding (CBOL Plant Working Group 2009). Thus, in the present investigation, any sequence falling below a Phred quality score of 20 was not considered for further analysis. To ensure retrieval of a high number of sequences suitable for further analysis, each sequence was trimmed from both the ends to remove the bases with a Phred score below 20, before assembling the forward and reverse sequences with Sequencher, an assembly software used frequently along with other similar programmes, such as CodonCode Aligner, DNA Baser, etc. (CBOL Plant Working Group 2009, Chen et al. 2010).
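The relationship between the Phred score and the error probability, and the end-trimming step described above, can be illustrated with a minimal sketch. It is not the procedure implemented in Phred or Sequencher; the function names, the example read and its quality values are hypothetical and are used only to show the arithmetic.

```python
def phred_to_error_prob(q):
    """Convert a Phred score Q to the base-calling error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_low_quality_ends(bases, quals, min_q=20):
    """Remove bases with a quality score below min_q from both ends of a read.

    Only the ends are trimmed; low-quality bases inside the read are kept.
    """
    start, end = 0, len(quals)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]


# Hypothetical read with poor-quality bases at both ends.
read = "NNACGTACGTNN"
quals = [5, 12, 25, 30, 40, 38, 35, 33, 30, 22, 10, 8]

trimmed, trimmed_q = trim_low_quality_ends(read, quals)
print(phred_to_error_prob(20))  # error probability for Q20
print(trimmed)                  # read after trimming both ends at Q20
```

With a threshold of 20, every retained terminal base has an expected error probability of at most 1%, which corresponds to the acceptance criterion cited above.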

5.6 UNIVERSALITY OF BARCODE LOCI

To ascertain the universality of the selected loci and the primers, the amplification and sequencing success of the candidate loci were also evaluated in species belonging to families other than Orchidaceae. The majority of such species analyzed were from angiosperm families; however, three were gymnosperms and one was a pteridophyte. The criterion for assessing universality of the five candidate DNA markers involved the assessment of the region(s) which could be routinely amplified and sequenced in the maximum number of analyzed plants. Among the tested loci, rbcL, rpoB and rpoC1 had the highest recoverability, with amplification success rates above 95%. Although the remaining two loci, matK and ITS, exhibited lower amplification success than the other three, their recoverability was still higher than the reported amplification rates in some of the plant species (Kress and Erickson 2007, Chen et al. 2010, Roy et al. 2010). Moreover, the amplification success for the latter two loci increased further when congeneric species of family Orchidaceae were also included in the comparison. To achieve universality, not only the primers but also the PCR reaction conditions should preferably be the same. The Plant Working Group at Kew suggested one thermal cycle to be used for amplification of all tested loci (www.kew.org/protocols). Thus, a single thermal cycle, slightly different from that of the PWG, Kew protocols (www.kew.org/protocols), was standardized for PCR amplification of all the four loci from the chloroplast genome, and the same cycle was used for amplifying the orchid as well as the non-orchid species. Sass et al. (2007) followed the thermocycle available on the Kew website and Fazekas et al. (2008) used the protocols of the University of Guelph Genomic facility for amplification of the tested loci from the chloroplast genome. On the other hand, Kress and Erickson (2007), Gonzales et al. (2009), Hollingsworth et al. (2009) and Roy et al. (2010) modified the thermocycles according to the primers and locus/loci used for the amplification. The modifications were mainly changes in the annealing temperature, which varied from 46-53°C for different loci, in the number of cycles, and the addition of DMSO in the case of matK. Using a single thermal cycle for all the four chloroplast loci would facilitate multiplex PCR amplification of more than one locus in multi-locus combinations. Multiplexing would reduce the time and money spent on amplifying the various loci, thus hastening the process of barcoding and making it cost-effective.

5.7 INTRA- AND INTER-SPECIFIC VARIATIONS

The assessment of intra- and inter-specific variations is important for the correct identification of species and generation of DNA barcodes (Hebert et al. 2003a, b; Lahaye et al. 2008). The minimum inter-specific variation has to be higher than the maximum intra-specific variation (Meyer and Paulay 2005). The difference between the two is referred to as the barcode gap (Meyer and Paulay 2005). An ideal barcode must exhibit a barcode gap, so that the distributions of intra- and inter-specific divergences do not overlap (Lahaye et al. 2008). In the present investigation, the intra- and inter-specific divergences were evaluated in terms of K2P (Kimura-2-parameter) distances, as was also done by Chen et al. (2010). Another measure of the variation between two sequences in pair-wise comparison is the uncorrected p-distance, which has been used by the CBOL Plant Working Group (2009). A p-distance is the proportion (p) of nucleotide sites at which the two sequences being compared are different. It is obtained by dividing the number of nucleotide differences by the total number of nucleotides compared (Nei and Kumar 2000). It does not make any correction for multiple substitutions at the same site, differences in the transitional and transversional rates or differences in evolutionary rates among sites (Nei and Kumar 2000). On the other hand, the K2P model takes into account transitional and transversional substitution rates, while assuming that the four nucleotide frequencies are the same and that rates of substitution do not vary among sites (Nei and Kumar 2000).
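The difference between the two distance measures can be made concrete with a short sketch. It computes the uncorrected p-distance and the standard Kimura 2-parameter distance, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The function names and the two example sequences are hypothetical; skipping sites with gaps or ambiguous bases is an assumption of this sketch and may differ from the treatment applied by the software used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (sites compared, transitions, transversions) for two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return n, ts, tv


def p_distance(seq1, seq2):
    """Proportion of compared sites at which the two sequences differ."""
    n, ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / n


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    n, ts, tv = count_differences(seq1, seq2)
    p, q = ts / n, tv / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Hypothetical aligned fragments: one transition (A/G) and one transversion (C/A).
s1 = "ACGTACGTAC"
s2 = "ACGTGCGTAA"
print(p_distance(s1, s2))
print(k2p_distance(s1, s2))
```

For any pair of differing sequences the K2P value is larger than the p-distance, because the model corrects for substitutions that are hidden by multiple changes at the same site.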

The intra-specific variations could be evaluated in 78 orchid species that were represented by more than one individual. Intra-specific distances of variable range were obtained for all the five tested loci in different species, with the highest maximum divergence value in rpoB (among individuals of Eulophia spectabilis) and the lowest in matK (among individuals of Porpax reticulata and Geodorum densiflorum). The matK sequences showed intra-specific divergences in the largest number of species (9), whereas only three species showed such divergences in their rpoC1 sequences. A perfect barcode gap was observed in all the species with intra-specific variations for all the loci, except for the rbcL locus in E. spectabilis, where the intra-specific distance among its accessions and the inter-specific distance with E. flava were the same. Hence, due to overlapping intra- and inter-specific variations, these two species were not resolved correctly using the distance method. Furthermore, when BLAST analysis of Paphiopedilum villosum for ITS sequences was carried out, one of its individuals with intra-specific variations had 100% identity with other species of Paphiopedilum viz., P. insigne and P. gratrixianum. However, the remaining three individuals with no variation in their sequences matched with their own species. Likewise, for matK sequences, one Otochilus individual matched with Coelogyne fuscescens, while the remaining two showed similarity with the same genus. These results need to be viewed with caution as the sequences available in GenBank cannot be totally relied upon because of erratic identifications and uneven sequence quality (Nilsson et al. 2006). Thus, intra-specific variations might lead to wrong species identifications. Therefore, it is important to evaluate variations within individuals of a species prior to generating barcodes.

In the data set I matrix, containing orchid as well as non-orchid species, the maximum K2P distance among the different species was obtained with ITS sequences and this was slightly higher than the corresponding score for matK. High inter-specific divergence values among ITS sequences have also been reported by Kress et al. (2005) and Kress and Erickson (2007) in assemblages of plants with a floristic approach, in cycads (Sass et al. 2007) and in Dendrobiums (Singh et al. 2012). Among the chloroplast loci, matK exhibited the highest and rbcL the lowest K2P distances. The orchid data set (II) also revealed a similar trend, though the K2P distance for ITS was much higher as compared to the four chloroplast loci. The probable reason could be that ITS sequences are difficult to align among species from diverse taxonomic groups. Due to difficulties in alignment, a number of insertions and deletions are introduced that lead to decreasing K2P distances. In contrast, among species of the same taxonomic group, such as Orchidaceae in the present study, alignment of ITS sequences was easy and more variable sites were available for comparison. ITS has earlier been reported to be useful in reconstructing phylogenies down to tribe or sub-tribe levels but not suitable for levels above these (Cox et al. 1997). The inter-specific K2P distance of 0.066 for matK among orchid species was much higher than that reported (0.0125) by Lahaye et al. (2008) for 48 species of Orchidaceae. In the matrix, the synonymous species Pholidota imbricata and P. pallida (Govaerts et al. 2010) showed zero distance estimates in all the five tested loci. This provides an example of congruence between conventional taxonomy, predominantly based on morphological characters, and DNA taxonomy.

5.8 SPECIES DISCRIMINATION RATES AND EVALUATING DNA BARCODES

The three methods used for evaluating species resolution and selecting suitable DNA barcodes were genetic distance, the phylogenetic tree method and BLAST analysis. The first method employs assessment of intra-specific as well as inter-specific K2P distances. Ideal barcodes must not exhibit any overlap between inter-specific and intra-specific divergence. The lowest value for the former should be higher than the maximum of the latter, thus providing a perfect barcode (Hebert et al. 2003a, Lahaye et al. 2008). A perfect barcoding gap helps in assigning an unknown individual to its respective species correctly (Meyer and Paulay 2005) and in flagging potential new species. In the second method, a phylogenetic tree is constructed using sequences of the candidate locus and the per cent species resolution/monophyly is determined by cluster analysis (Lahaye et al. 2008, China Plant BOL Group 2011). The species for which all the individuals cluster together in a single clade are considered as unequivocally identified/monophyletic species, and those which cluster with the individuals of other species are treated as unresolved. In the present study, the species resolutions of the selected loci calculated using both these methods showed almost similar results except in a few cases. For example, the intra-specific distance within Eulophia spectabilis and the inter-specific distance between this species and another species of the same genus, E. flava, based on rbcL K2P distances overlapped and therefore the two species were not resolved. However, in the NJ tree of the same locus, two accessions each of these two species segregated in two distinct clusters, while three accessions of E. spectabilis co-segregated at the base of the tree between the two branches formed by the other accessions of the two species. In the case of Platanthera edgeworthii and P. latilabris, the matK sequences showed divergence based on K2P distance, while the accessions of the two species did not cluster separately in the NJ tree. The remaining four loci also could not discriminate the two species. This might be due to incorrect identification of the two P. latilabris accessions, in which case the K2P distance obtained for matK sequences would reflect intra-specific variation among the different accessions of P. edgeworthii. The tree-based analysis for species discrimination also provides a convenient method of viewing the data, as the unresolved species clusters can be identified easily. It also helps in identification of synonymous species e.g., Pholidota pallida and P. imbricata, collected as two different species, remained unresolved in the NJ trees of all the four tested loci. The two names were later found to be synonyms in the Kew checklist (Govaerts et al. 2010). The last method used for evaluating species discrimination is BLAST analysis (Ross et al. 2008). In this method, the barcode sequence of an unknown individual is BLAST-searched for a very similar or identical sequence in a database containing reference barcodes of correctly identified species, and the first hit with the maximum score is taken as the species to which the unknown specimen belongs.
In the present study, the species resolutions calculated using the BLAST method were higher than those arrived at by the other two methods for three (ITS, matK and rpoB) of the selected loci in data set I, while it had the highest discrimination ability for all the five tested loci in data set II. The China Plant BOL Group (2011) and Singh et al. (2012) also observed an increased species resolution using the BLAST method. This method is quite pertinent to the type of application that needs to be developed through barcodes i.e., accurate species identification.
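The distance-based criterion used in the first of these methods, namely that the lowest inter-specific distance must exceed the highest intra-specific distance, can be sketched as follows. The function name, the sample labels and the distance values are hypothetical and do not correspond to any accession or measurement of the present study; the example only mimics a situation in which the two values coincide and the barcode gap is therefore absent.

```python
from itertools import combinations


def barcode_gap(labels, dist):
    """Check for a barcode gap among a set of samples.

    labels: dict mapping sample id -> species name
    dist:   dict mapping (sample_a, sample_b), with sample_a < sample_b,
            to the pairwise distance (e.g. a K2P distance)
    Returns (max_intra, min_inter, has_gap).
    """
    intra, inter = [], []
    for a, b in combinations(sorted(labels), 2):
        d = dist[(a, b)]
        if labels[a] == labels[b]:
            intra.append(d)
        else:
            inter.append(d)
    max_intra = max(intra, default=0.0)
    min_inter = min(inter, default=float("inf"))
    # A gap exists only if the smallest between-species distance is strictly larger.
    return max_intra, min_inter, min_inter > max_intra


# Hypothetical samples: two accessions of species A and one of species B.
labels = {"A_1": "species A", "A_2": "species A", "B_1": "species B"}
dist = {
    ("A_1", "A_2"): 0.002,  # within species A
    ("A_1", "B_1"): 0.002,  # between species, equal to the within-species value
    ("A_2", "B_1"): 0.004,
}

print(barcode_gap(labels, dist))
```

In this hypothetical case the function reports no gap, since the minimum inter-specific distance is not strictly greater than the maximum intra-specific distance, so the two species would be treated as unresolved by the distance method.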

Another unique species discrimination method, though not tested in the present investigation, needs a special mention because of the innovative approach followed. This method, suggested by Tyagi et al. (2010), involves the comparison of Oligonucleotide Frequency Ranges (OFR) of the proposed barcode loci. The frequency of a particular di- or trinucleotide is calculated by dividing the number of its occurrences in a sequence by the total number of di- or trinucleotides i.e., n-1 and n-2 respectively, where n is the length of a particular sequence (Tyagi et al. 2010). The minimum and the maximum values of these di- or trinucleotide frequencies for a particular species are determined using the software OFR Generators. Two species are considered distinct if their OFRs do not overlap (Tyagi et al. 2010). The species discrimination obtained by this method, among an assemblage of 2,777 CO1 sequences from species ranging from fungi to mammals, 180 ITS sequences from plants and fungi, 251 matK sequences and 258 rbcL sequences of land plants, was higher than that arrived at by other methods e.g., the p-distance based method, Euclidean distance of oligonucleotide frequencies and the nucleotide-character based method (Tyagi et al. 2010).
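The calculation of oligonucleotide frequencies and their ranges can be illustrated with a minimal sketch. It is not the OFR Generators software of Tyagi et al. (2010); the function names and the toy sequences are hypothetical. In particular, the decision rule used here, that two species are treated as distinct when at least one oligonucleotide has non-overlapping frequency ranges, is an assumption of this sketch, as the exact rule is not spelled out above.

```python
from collections import Counter


def oligo_frequencies(seq, k=2):
    """Frequency of each k-mer: its count divided by the number of overlapping
    k-mers in the sequence (n - 1 for dinucleotides, n - 2 for trinucleotides)."""
    seq = seq.upper()
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: c / total for kmer, c in counts.items()}


def frequency_ranges(seqs, k=2):
    """(minimum, maximum) frequency of every k-mer across the sequences of one species."""
    per_seq = [oligo_frequencies(s, k) for s in seqs]
    kmers = set().union(*per_seq)
    return {
        m: (min(f.get(m, 0.0) for f in per_seq), max(f.get(m, 0.0) for f in per_seq))
        for m in kmers
    }


def ranges_distinct(r1, r2):
    """Assumed rule: distinct if any k-mer has non-overlapping ranges in the two species."""
    for m in set(r1) | set(r2):
        lo1, hi1 = r1.get(m, (0.0, 0.0))
        lo2, hi2 = r2.get(m, (0.0, 0.0))
        if hi1 < lo2 or hi2 < lo1:
            return True
    return False


# Hypothetical toy sequences for two species.
species_a = frequency_ranges(["AAAAAA", "AAAAAT"])
species_b = frequency_ranges(["CCCCCC", "CCCCCG"])
print(ranges_distinct(species_a, species_b))
```

A real application would use full-length barcode sequences from several individuals per species, so that the minimum and maximum of each frequency reflect the intra-specific variation.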

In the present investigation, among the tested loci, ITS exhibited the highest overall discrimination in both the data sets, while rpoC1 provided the lowest species discrimination. A species resolution of 90% was obtained when ITS sequences of only the orchid species were compared. At the genus level, ITS showed 100% resolution in most of the genera analyzed, except in Paphiopedilum and Platanthera. The high species discrimination rate of ITS is consistent with the earlier reports suggesting it or ITS2 as a potential DNA barcode for plants (Kress et al. 2005, Chen et al. 2010, Yao et al. 2010). The high discrimination ability of this region could be attributed to its high rate of evolution, leading to genetic changes that allow differentiation of closely related congeneric species (Kress et al. 2005, Sass et al. 2007, Liu et al. 2011a, Singh et al. 2012). Liu et al. (2011a) and Singh et al. (2012) too obtained 100% species resolution with ITS among 8 species of Taxus and 129 Dendrobium species, respectively. However, among 131 individuals belonging to 26 species of Alnus, ITS was able to discriminate only 76.9% of the species (Ren et al. 2010). Among the Mexican Cactaceae, 86% of the 87 species tested could be correctly identified using this locus alone (Yesson et al. 2011). Starr et al. (2009) achieved only 25% resolution among 8 species of Carex with ITS sequences. Likewise, in the present study too, ITS, in spite of having the highest sequence variation among the tested loci, could discriminate only 50% of the Indian Paphiopedilums as opposed to matK, which provided 100% species resolution (Parveen et al. 2012). Recently, the China Plant BOL Group (2011), based on their study involving comparison of ITS from 6,286 individuals belonging to 1,757 species, reported 79% discrimination rates for angiosperms. Furthermore, ITS in combination with any one of the tested plastid DNA markers (matK, rbcL or trnH-psbA) could achieve 69.9 - 79.1% species resolution (China Plant BOL Group 2011). Although ITS, with its high genetic variation and species discrimination power, seems to be the most suitable barcode for plants, there are some limitations which have so far restricted its use as a universal barcode. The major concerns are (i) incomplete concerted evolution leading to divergent paralogous copies within individuals, (ii) the possibility of amplification of ITS from microbial contaminants, especially fungi which establish mycorrhizal associations with many plants including orchids, (iii) its low amplification and sequencing rates in diverse taxa and (iv) difficulties in aligning the ITS sequences from species belonging to diverse taxonomic groups (Hollingsworth et al. 2011). However, in our analysis of a large data set with 530 individuals of 191 species, direct sequencing of single copy ITS sequences was successful in 85% of the sampled accessions, though the amplicons of this locus from three natural hybrids of Paphiopedilum could not be sequenced, probably because of the existence of multiple copies of ITS. Fungal ITS amplification along with genomic ITS is of common occurrence, especially in plants possessing endophytic fungi e.g., orchid species (Alvarez and Wendel 2003). However, the BLAST analysis of the investigated orchid species showed that none of the ITS sequences was of fungal origin.
Alignment of ITS sequences was straightforward and easier for species belonging to the family Orchidaceae (data set II), whereas in the larger data set (I) with species from diverse families of plants, alignment posed problems as the addition of several gaps was required. Likewise, in another analysis of a larger data set by the China Plant BOL Group (2011) with 6,286 individuals of 1,757 species in 141 genera of plants, single-copy ITS sequencing was successful in 75.5% of the sampled individuals, 7.4% yielded multiple copies and fungal contaminations were detected in 1.8% of the sampled species. Recently, Singh et al. (2012) also reported high amplification and sequencing with 100% species discrimination by the ITS locus among congeneric species of Dendrobium. The investigations in this thesis, along with the latter two reports, suggest that the limitations with ITS are not as persistent as previously estimated and therefore ITS, if not as a sole barcode then at least along with one or two loci from the chloroplast genome, could provide a universal multi-locus barcode for seed plants.

As the complete ITS was difficult to amplify and sequence in many cases, the second internal transcribed spacer (ITS2) has been suggested as an alternative (Chen et al. 2010, Yao et al. 2010). The ITS2 region, because of its small size, ease in sequencing and high species discrimination ability, has proved to be a useful barcode for species of Asteraceae and Euphorbiaceae (Gao et al. 2010a,b, Pang et al. 2010). Chen et al. (2010) tested its ability among 6,600 medicinal plants belonging to 4,800 species from 753 genera with a highly encouraging result of 92.7% discrimination at the species level. Yao et al. (2010) downloaded 50,790 and 12,221 ITS2 sequences belonging to plants and animals, respectively, from GenBank and observed that this locus could successfully discriminate 76.1% of dicotyledons, 74.2% of monocotyledons, 67.1% of gymnosperms, 88.1% of ferns, 77.4% of mosses and 91.7% of animals at the species level. Jeanson et al. (2011) tested four barcode loci (matK, rbcL, psbA-trnH and ITS2) on 39 individuals belonging to 26 palm species from 3 genera of the tribe Caryoteae (subfamily Coryphoideae) and achieved 92% resolution by ITS2 in combination with matK and rbcL. The lack of universal primers for ITS and ITS1 and the need for specific PCR conditions and additives for their amplification might be the reasons for the low amplification efficiency of ITS/ITS1 obtained in many groups (Kress et al. 2005, Kress and Erickson 2007, China Plant BOL Group 2011). Therefore, ITS2 has been suggested as a more suitable barcode (Chase et al. 2007, Kress and Erickson 2007, Chen et al. 2010). Although ITS2 possesses high universality, due to its short length (160-320 bp) the number of informative characters available for identification is less than that available in the entire ITS region. Thus, ITS2 alone may not be suitable to discriminate congeneric species. This has been well highlighted by the observations on the members of the family Euphorbiaceae (Pang et al. 2010). Using ITS2, the species discrimination rate was only 68% among congeneric species of the genus Glochidion, which was much lower than the overall 96% species resolution obtained within the family (Pang et al. 2010).

Among the chloroplast loci studied, matK provided the maximum species resolution of 82.6% with the distance and NJ tree methods, and 92.02% of the sampled species were identified correctly with BLAST analysis. Earlier, Lahaye et al. (2008), on the basis of their study involving more than 1,036 species of Mesoamerican orchids, reported that matK alone or in combination with trnH-psbA could correctly identify >90% of the species and thus proposed matK as a suitable DNA barcode for all land plants. However, the same investigators also highlighted the fact that the high resolving power of matK tended to decrease if the sampling was restricted to sister species rather than natural geographic assemblages of species. In the present study too, the species discrimination rate for the orchid data set (II) decreased to 80%, which is similar to that obtained with matK among 36 congeneric species of Dendrobium (Singh et al. 2012). In the present work, two species pairs, viz. Otochilus sp.-Coelogyne fuscescens and Nervilia gammieana-N. aragoana, showed inter-specific divergence and appeared distinct in data set I, comprising orchid as well as non-orchid species, but remained unresolved and exhibited zero distances in the matrix of only the orchid species (data set II). This example amply demonstrates that species resolution tends to decrease when species from the same taxonomic group are compared, as opposed to a floristic backdrop. At the generic level, among the congeneric species of Paphiopedilum, matK afforded 100% species resolution, as opposed to the 50% resolution provided by ITS (Parveen et al. 2012). In contrast, ITS resolved all the analyzed species of Nervilia, while matK yielded 50% resolution. In the remaining genera, both ITS and matK showed 100% resolution. Therefore, from the present analysis of the congeneric orchidaceous species, it can be concluded that both matK and ITS would probably be required for attaining 100% species resolution. 
However, these results might change if the number of species analyzed in each genus is increased. The decrease in species resolution with increase in sample size has been well highlighted by Singh et al. (2012). In their study, when only 10 species of Dendrobium were included, 100% resolution was obtained with all the tested loci including matK. However, in a larger data set having 36 Dendrobium species, the resolving power of matK decreased to 80%, and other chloroplast regions had still lower species resolution values (Singh et al. 2012).
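
The effect of sample size on distance-based resolution can be illustrated with a minimal sketch. It assumes that a species counts as unresolved when its aligned sequence has a zero pairwise distance to that of any other species; the sequences, species labels and function names below are hypothetical and are not data from the present study.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def resolution(seqs):
    """Percentage of species whose sequence differs from all other species."""
    unresolved = set()
    for (sp1, s1), (sp2, s2) in combinations(seqs.items(), 2):
        if p_distance(s1, s2) == 0.0:
            unresolved.update((sp1, sp2))
    return 100.0 * (len(seqs) - len(unresolved)) / len(seqs)

# Hypothetical aligned fragments, one per species (illustrative only).
small = {
    "sp_A": "ATGCCGTA",
    "sp_B": "ATGCCGTT",
    "sp_C": "ATGACGTA",
    "sp_D": "TTGCCGTA",
}
# A fifth congener that shares its sequence with sp_A.
large = dict(small, sp_E="ATGCCGTA")

print(resolution(small))  # 100.0
print(resolution(large))  # 60.0
```

In this toy case a single added congener with an identical sequence leaves two species unresolved, which is the kind of drop described above for the larger Dendrobium data set.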

The remaining three loci (rbcL, rpoB and rpoC1) from the chloroplast genome exhibited low and variable species discriminatory power (approx. 60-75%) using all three methods. rbcL resolved 74% of the species using distance and tree methods when species from diverse families were compared; this decreased to 55% in the orchid data set (II). Previous investigations had also projected rbcL as suitable for reconstructing phylogenies only at the family and sub-family level, as it had limited application at the species level (Cameron et al. 1999, Soltis and Soltis 1998). In the present study too, rbcL could distinguish all the species belonging to diverse families of land plants; among congeneric species, however, its resolving power was very low. The species discrimination rate for the rpoB locus was similar to that afforded by rbcL, whereas that for rpoC1 was lower than both rbcL and rpoB in data set II. In data set I, the resolving powers of both rpoB and rpoC1 were lower than that of rbcL. The low discriminatory rate of rpoB and rpoC1 has also been observed in a number of other investigations (Sass et al. 2007, Starr et al. 2009, Kelly et al. 2010, Parveen et al. 2012, Singh et al. 2012). Despite low discrimination rates, these loci show high primer universality. Universal application includes standard PCR amplification and sequencing primers as well as the ubiquitous presence of the locus in major land plant lineages. Therefore, these loci were earlier proposed for use in multi-locus and multiple-level barcodes, in combination with matK or trnH-psbA as the primary plant barcode (Chase et al. 2005, Kress and Erickson 2007, CBOL Plant Working Group 2009).

5.9 MULTI-LOCUS COMBINATION

An ideal barcode should be universal, robust and cost-effective, and should show high species discrimination (CBOL Plant Working Group 2009). As none of the barcode loci tested in the present investigation met all these criteria, various combinations of loci were tested for orchid species to achieve maximum resolution along with high universality. The multi-locus combination of rbcL+matK was suggested as the core barcode for plants by the CBOL Plant Working Group (2009), though its species discrimination in a floristic backdrop was only 72%. In the present analysis, this proposed core barcode produced a higher discrimination rate of 90.79%. However, this was lower than the value of 92% obtained with ITS alone or the combinations of
ITS with any other tested chloroplast locus, which exhibited 92-94% species resolution. Among the two-locus barcodes compared, the best species resolution (94.74%) was provided by the combination of matK and ITS. This was slightly higher than that afforded by rbcL+ITS (93.42%). For taxa where matK could not be amplified and sequenced, rbcL could be used as a back-up to replace matK in a two-tiered approach, as rbcL showed a 95% recovery rate for orchids. Moreover, addition of ITS to the above core barcode led to a 96% species discrimination rate. Further inclusion of rpoB, rpoC1 or both, as four- or five-locus combinations, did not increase species resolution. Therefore, the rbcL+matK+ITS combination seems suitable as the standard DNA barcode for orchid species. Recently, the China Plant BOL Group (2011) has also advocated the inclusion of ITS in the core barcode of matK+rbcL, earlier suggested as a possible universal barcode for plants by the CBOL Plant Working Group (2009). This proposition satisfies the various criteria desired of an ideal land plant DNA barcode. The combination has the added advantage of including a locus from the nuclear genome besides two from the chloroplast genome, thus allaying the apprehension of limited efficacy of DNA barcodes because of their uniparental inheritance (Chase et al. 2005). The efficacy of the three-locus combination can be demonstrated in different ways. For example, while barcoding unidentified material, if both ITS and matK sequences are obtained, maximal identification power is available for correct species identification, and even recently diverged or cryptic species could be distinguished easily. However, if the sequence of only one of these is obtained along with that of rbcL, the material may still be identified to an approximate taxonomic position (for example, species group or genus). 
During barcoding of recently formed hybrids, there is a likelihood of not obtaining ITS sequences, as was the case with Paphiopedilum hybrids in the present study. This could be because of the coexistence of orthologous copies of ITS in the hybrid genome (Parveen et al. 2012). In such a situation, cloning may be an unavoidable step in the DNA barcoding procedure, but only if the other two markers, individually or in combination, are unable to provide correct identification. In the present investigation, barcoding of natural hybrids of Paphiopedilum also identified the maternal parent using matK (Parveen et al. 2012). For congeneric species identification of orchids in the present study, both matK and ITS afforded 100% species resolution in most of the genera analyzed. However, in two
genera, Paphiopedilum and Nervilia, only one of these loci afforded 100% species discrimination. In Paphiopedilum, matK discriminated all analyzed species, whereas ITS resolved 50% of the species. Conversely, ITS discriminated all analyzed species of Nervilia, whereas matK provided 50% resolution. Therefore, to facilitate species-level identification by each of the loci in the suggested matK+rbcL+ITS core barcode, the databases would require reference sequences for all three markers from as many of the world's species as possible.
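
The complementary behaviour of matK and ITS can be sketched with toy data. The example below assumes that loci are combined by simple concatenation of aligned sequences and that two species are unresolved when their concatenated sequences are identical; the species labels, sequences and the function name are hypothetical and only mimic the pattern reported above, where each locus alone fails in a different genus.

```python
from itertools import combinations

def resolution(species, loci):
    """Percentage of species with a unique sequence over the chosen loci.

    species: dict mapping species name -> dict of locus name -> aligned sequence
    loci: list of locus names to concatenate
    """
    concat = {name: "".join(seqs[locus] for locus in loci)
              for name, seqs in species.items()}
    unresolved = set()
    for n1, n2 in combinations(concat, 2):
        if concat[n1] == concat[n2]:
            unresolved.update((n1, n2))
    return 100.0 * (len(concat) - len(unresolved)) / len(concat)

# Hypothetical sequences: genus P differs only at matK, genus N only at ITS.
toy = {
    "P1": {"matK": "AAGT", "ITS": "CCGA", "rbcL": "GGA"},
    "P2": {"matK": "AAGA", "ITS": "CCGA", "rbcL": "GGA"},
    "N1": {"matK": "TTGC", "ITS": "GGCA", "rbcL": "GGT"},
    "N2": {"matK": "TTGC", "ITS": "GGCT", "rbcL": "GGT"},
}

print(resolution(toy, ["matK"]))                 # 50.0
print(resolution(toy, ["ITS"]))                  # 50.0
print(resolution(toy, ["matK", "ITS"]))          # 100.0
print(resolution(toy, ["matK", "ITS", "rbcL"]))  # 100.0
```

Because concatenated sequences are identical only when every constituent locus is identical, a combination can never resolve fewer species than its best single locus under this criterion, and adding a locus with no additional variation (rbcL in the toy data) leaves the result unchanged.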

5.10 FLORISTIC VS. TAXONOMIC ASSEMBLAGES

Various approaches have been used to evaluate the performance of plant barcoding loci in different species assemblages (Hollingsworth et al. 2009). The first, the ‘species pairs’ approach, involves taking pairs of related species from multiple phylogenetically divergent genera (Kress and Erickson 2007). This provides a thorough assessment of the universality of the regions, but only limited insight into species-level resolution, as not all species of individual genera are sampled and the percentage of species that can be discriminated cannot be estimated accurately (Hollingsworth et al. 2009). The second, the ‘floristic’ approach, involves sampling multiple species belonging to diverse taxonomic groups within a given geographical region. This again can provide a sound assessment of universality and also represents an example of how barcoding might be applied in practice. However, the limitation of the ‘floristic’ approach is that the sampling includes species from diverse taxonomic classes but does not necessarily include the sister species of each species (Hollingsworth et al. 2009). This, along with multiple cases of a single species sampled per genus, might result in overestimated levels of species discrimination in floristic assemblages (Hollingsworth et al. 2009). Finally, the taxon-based approach involves sampling multiple species within a given taxonomic group. This provides limited insight into universality, but offers more definitive information on levels of species discrimination (Hollingsworth et al. 2009). The usefulness of various barcoding loci and their combinations has been tested mostly in floristic assemblages that are restricted to a particular geographic area (Kress and Erickson 2007, Fazekas et al. 2008, CBOL Plant Working Group 2009, Gonzalez et al. 2009, Kress et al. 2009, Chen et al. 2010, Ebihara
et al. 2010, Kress et al. 2010, Burgess et al. 2011, Piredda et al. 2011, China Plant BOL Group 2011). Besides these, a few reports are based on taxon-based sampling: Crocus (Seberg and Petersen 2009), Compsoneura (Newmaster et al. 2008), Asterella, Araucaria and Inga (Hollingsworth et al. 2009), Dendrobium (Yao et al. 2009, Singh et al. 2012), Berberis (Roy et al. 2010), Taxus (Liu et al. 2011) and Paphiopedilum (Parveen et al. 2012). The approach followed in the present investigation, however, combines broad phylogenetic coverage with taxon-based sampling within a single family of angiosperms. Though the sampling within groups is not exhaustive, species representing four of the five circumscribed sub-families of Orchidaceae could be collected. For testing the universality of candidate barcode loci, a limited number of species from different families of angiosperms, a few gymnosperms and one pteridophyte were also analyzed. The data set comprising species from diverse families of plants showed 100% species discrimination with rbcL sequences alone, while matK and ITS individually afforded 95% species resolution. Thus, in floristic assemblages with a single species representing each genus/family, even rbcL could be successful in achieving species-level discrimination. This was probably the reason for the repeated proposals to include rbcL in a multi-locus barcode for land plants, as the recommendations were mostly based on floristic studies (Kress and Erickson 2007, Fazekas et al. 2008, CBOL Plant Working Group 2009). The same locus had very low resolving power when congeneric species were compared. This example therefore highlights the importance of testing the discrimination power of any barcode locus at the taxon level rather than in floristic sampling.

5.11 POSSIBLE APPLICATIONS OF DNA BARCODING

The applications of DNA barcoding have already been described in ‘Introduction’. Among the numerous possible applications of DNA barcoding in various fields, one of the most widely projected is checking the illegal trade in endangered species of both plants and animals, or biopiracy (Eaton et al. 2010, Jeanson et al. 2011, Muellner et al. 2011, Yesson et al. 2011). All Orchidaceae species are threatened and are listed in Appendix II of CITES, and a few have even been included in Appendix I. In the latter category are all species of Paphiopedilum and Renanthera
imschootiana (http://www.cites.org). The species listed in Appendix I are highly endangered, and their collection from the wild and their trade are strictly prohibited. Many orchid species possess exquisitely beautiful flowers, which are sold commercially. One of the endangered orchid genera with high ornamental value is Paphiopedilum. Indiscriminate collection of Paphiopedilum species from the wild for their exotic ornamental flowers has rendered these plants endangered. Although the trade of these endangered species from the wild is strictly forbidden, it continues unabated in one form or another, eluding the current identification methods. DNA barcoding, which offers identification of a species even if only a small fragment of the organism at any stage of development is available, could be of great utility in scrutinizing the illegal trade of endangered plant species. Therefore, in this study DNA barcodes were developed for eight Indian species of Paphiopedilum (out of nine found in India), along with three of their natural hybrids, using loci from both the chloroplast and nuclear genomes. The matK locus emerged as the signature sequence for the identification of closely related endangered species of Indian Paphiopedilum. DNA barcodes of the three hybrids also reflected their parentage to some extent and thus helped in elucidating the parentage of these inter-specific hybrids (Parveen et al. 2012). The second endangered species, Renanthera imschootiana, could be distinguished from other orchids using both matK and ITS sequences. The barcodes of endangered species generated in the present work can be utilized by customs officers or other agencies responsible for checking their illicit trade, and can thus help in their conservation.

The second possible application that can be exemplified is the development of DNA barcodes of medicinal plants to enable accurate and rapid authentication of these plants and to check their substitution. Orchid species not only form an elite ornamental group of economically important plants but also have medicinal and therapeutic properties (as described in ‘Review of Literature’). The medicinal orchids are collected from the wild, as none is under cultivation in India. Many of these orchids face extreme danger of extinction due to over-exploitation and habitat destruction and hence need to be conserved. The important Astavarga component of the ayurvedic tonic Chyavanprash includes four orchid species, viz. Habenaria intermedia (riddhi), H. edgeworthii (vriddhi), Malaxis acuminata (rishbhak) and M. muscifera (jivak). All these four orchids
are distributed in the Western Himalayas, and H. intermedia is under threat of extinction due to indiscriminate collection (Chauhan et al. 2007). The population of this IUCN-listed endangered species has been dwindling for many years (Ved et al. 2003) and hence needs to be conserved. DNA barcoding has been projected as a useful technique for correct identification of medicinal plants and their allied species (Song et al. 2009, Chen et al. 2010, He et al. 2010, Gao et al. 2011) and for conservation of endangered plant species (Ogden et al. 2008, Jeanson et al. 2011, Muellner et al. 2011, Xiang et al. 2011, Parveen et al. 2012). Therefore, in the present study, DNA barcodes were developed for 26 medicinal orchid species, viz. Aerides odorata, A. multiflora, Acampe praemorsa, Arundina graminifolia, Coelogyne cristata, C. fuscescens, C. nitida, Cymbidium aloifolium, Eria spicata (accepted name: Pinalia spicata), Eulophia nuda (accepted name: E. spectabilis), Geodorum densiflorum, Goodyera repens, Habenaria intermedia, H. edgeworthii (accepted name: Platanthera edgeworthii), H. longicorniculata, H. roxburghii, Malaxis acuminata (accepted name: Crepidium acuminatum), Pholidota articulata, P. imbricata, Polystachya concreta, Rhynchostylis retusa, Satyrium nepalense, Vanda cristata, Vanda tessellata, Vanda testacea and Vanilla planifolia. These barcodes will not only help in easy and rapid identification of species in herbal formulations but will also help in their conservation by checking their illicit trade.

DNA barcoding has also been successfully used to identify adulterants and substitutions in herbal formulations. For example, in Black Cohosh (a herbal supplement containing the North American species Actaea racemosa), matK sequences revealed that Asian species of Actaea are used as substitutes in commercially available herbal formulations (Harmon 2010). Srirama et al. (2010) demonstrated the applicability of DNA barcoding in authenticating market samples of herbal medicines by procuring Phyllanthus samples, sold as Keezhanelli or Kirunelli (vernacular names), from 25 different shops located in Karnataka, Tamil Nadu and Kerala. Species-specific DNA barcode sequences of psbA-trnH revealed that six species of Phyllanthus are being sold as admixtures in southern India, among which Phyllanthus amarus is the predominant one (Srirama et al. 2010). In the present investigation, the authenticity of medicinal orchids, especially the four species in Astavarga, was checked by comparing samples available in the market with the corresponding barcodes available in
GenBank. The orchids were procured using their vernacular names from 2-3 different sources in the market. The BLAST analysis of ITS and matK sequences revealed that the pseudobulbs sold as Jivak and/or Rishbhak belonged to one species, i.e. Malaxis acuminata, whereas rbcL showed 99% similarity to Oreorchis species. The only sample of Jivak collected could not be analyzed, as only the ITS sequence could be generated from this sample and this turned out to be that of a fungal contaminant. On the other hand, the pseudobulbs of Riddhi and/or Vriddhi showed 99% identity to Thunia alba based on matK and rbcL sequences. The queried ITS sequences of the same samples were recovered as unique. This analysis showed that the plants sold as Jivak and Rishbhak belong to one species only, although in the literature these are two different species of the same genus, and that the Riddhi/Vriddhi samples belong to a different species altogether, with no resemblance to the two Habenaria species.
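
The comparison of a best database hit with the species expected under a vernacular name can be sketched as follows. This is only an illustration under stated assumptions: the 99% identity threshold, the function name and the sample records are hypothetical (the records are shaped like, but are not, the results reported above), and a real workflow would read the hits from actual BLAST output.

```python
# Species expected under each vernacular name, as listed in the text above.
EXPECTED = {
    "Riddhi": "Habenaria intermedia",
    "Vriddhi": "Habenaria edgeworthii",
    "Rishbhak": "Malaxis acuminata",
    "Jivak": "Malaxis muscifera",
}

def assess(label, top_hit, identity, threshold=99.0):
    """Classify one market sample from its best database hit.

    threshold is an assumed minimum percent identity for a confident match.
    """
    if identity < threshold:
        return "no confident match"
    if top_hit == EXPECTED[label]:
        return "consistent with label"
    return "possible substitute: " + top_hit

# Hypothetical (label, best-hit species, percent identity) records.
samples = [
    ("Rishbhak", "Malaxis acuminata", 99.5),
    ("Riddhi", "Thunia alba", 99.0),
    ("Vriddhi", "Thunia alba", 95.0),
]
for label, hit, identity in samples:
    print(label, "->", assess(label, hit, identity))
```

A match below the threshold is deliberately reported as inconclusive rather than as a substitution, since a missing reference sequence in the database would produce the same result.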

5.12 CONCLUSIONS EMERGING FROM THE STUDY

Based on the observations presented in the thesis, it can perhaps be concluded that the applicability of DNA barcoding to plants is amply validated. However, the quest for a universal barcode for plants, whether based on a single locus or multiple loci, that could provide 100% species resolution across the plant kingdom is as unrealistic as it was in 2007. Moreover, DNA barcoding, like any other technology, is not expected to be 100% perfect, though in some cases failure in species discrimination might have been due to inadequate taxonomic delimitation. If 100% species resolution is required within a taxonomic grouping, DNA barcodes would most likely have to be taxon-specific. This implies that the projection that DNA barcodes, once available for all described species, would provide a correct species-level identity for any unknown sample, whether available in vegetative, fragmented or DNA form, or would indicate total novelty leading to the discovery of new species after proper taxonomic studies, may not come true. Nevertheless, more than 90% success in species identification with a single locus or two-/three-locus combinations emphatically demonstrates the efficacy of the technique. Moreover, the instances of incorrect identification by DNA barcodes may encourage taxonomists to re-consider or re-investigate the taxa remaining unresolved by DNA barcoding.