˘ ˇ ˆ˘ - diva portal302236/fulltext01.pdf · exogenous spumaretroviruses or foamy viruses have...



To my family


List of papers

I Benachenhou, F., Jern, P., Oja, M., Sperber, G., Blikstad, V., Somervuo, P., Kaski, S. and Blomberg, J. Evolutionary conservation of orthoretroviral long terminal repeats (LTRs) and ab initio detection of single LTRs in genomic data. PLoS ONE 4 (2009b), p. e5179.

II Benachenhou, F., Blikstad, V. and Blomberg, J. The phylogeny of orthoretroviral long terminal repeats (LTRs). Gene 448 (2009a), pp. 134-8.

III Benachenhou, F., Sperber, G. and Blomberg, J. Universal structure and phylogeny of Long Terminal Repeats (LTRs). Manuscript.

IV Blomberg, J., Benachenhou, F., Blikstad, V., Sperber, G. and Mayer, J. Classification and nomenclature of endogenous retroviral sequences (ERVs): problems and recommendations. Gene 448 (2009), pp. 115-23.

V Blikstad, V., Benachenhou, F., Sperber, G.O. and Blomberg, J. Evolution of human endogenous retroviral sequences: a conceptual account. Cell Mol Life Sci 65 (2008), pp. 3348-65.


Contents

Introduction
  Retroviruses and LTR retrotransposons
    Background
    XRVs and ERVs
    Relation to other genomic elements
    LTRs
  Hidden Markov models
    Why HMMs?
    Generalities
    Building HMMs for LTRs
    Overfitting and regularisation
    Model surgery
    Scoring
    Evaluation
    SuperViterbi

Summary of papers
  Paper I
  Paper II
  Paper III
  Paper IV
  Paper V

Conclusions and future prospects

Acknowledgements

References


Abbreviations

env       Envelope gene
ERV       Endogenous retrovirus
gag       Group specific antigen gene
HERV      Human endogenous retrovirus
HML       Human MMTV-like retrovirus
HMM       Hidden Markov model
HTLV      Human T-cell leukemia virus
IN        Integrase domain
JSRV      Jaagsiekte sheep retrovirus
LTR       Long terminal repeats
MAP       Maximum a posteriori
ML        Maximum likelihood
MLV       Mouse leukemia virus
MMTV      Mouse mammary tumor virus
ORF       Open reading frame
PAS       Polyadenylation site
PBS       Primer binding site
PLE       Penelope like elements
pol       Polymerase gene
PPT       Polypurine tract
pro       Protease gene
pSIVgml   Grey mouse lemur prosimian immunodeficiency virus
R         Repeat region (in LTR)
RELIK     Rabbit endogenous lentivirus type K
RH        RNAse H domain
RT        Reverse transcriptase domain
TSD       Target site duplication
TSS       Transcription start site
U3        Unique 3’ region
U5        Unique 5’ region
XRV       Exogenous retrovirus
YR        Tyrosine recombinase


Introduction

The present work is devoted to the study of a specific part of retroviruses and related retrotransposons, the long terminal repeats (LTRs). The study of LTRs is important for two reasons. First, very few systematic investigations of LTRs have been published. This is partly because conserved motifs are scarce and partly because LTRs do not code for proteins, so standard alignment techniques such as ClustalW cannot be used. Second, LTRs are ubiquitous components of many eukaryotic genomes. In the human genome, for example, seven to eight percent is of retroviral origin (Lander et al., 2001).

The aim of the present work is threefold: i) to create reliable alignments of LTRs; ii) to study the evolutionary relationships between LTR retrotransposons using the alignments; iii) to detect LTRs in genomic DNA. This was done with a mathematical tool, hidden Markov models (HMMs).

Retroviruses and LTR retrotransposons

Background

Retroviruses are a family of RNA viruses which infect vertebrates (Coffin et al., 1997; Mager and Medstrand, 2003). Their genome consists of two identical positive-strand RNAs. One of their characteristic features is reverse transcription, the ability to make a DNA copy of their RNA genome. Retroviruses integrate into the host genome as part of their replication. This property, although unique among viruses, is shared with other retroelements, the LTR retrotransposons (Boeke and Stote, 1997; Eickbush and Jamburuthugoda, 2008).

It is remarkable that the Copia/Ty1 retrotransposons (Flavell et al., 1997; Peterson-Burch and Voytas, 2002), found in fungi, plants, invertebrates and some vertebrates, have almost the same genetic structure as retroviruses. The only difference is that they generally lack the env gene and therefore cannot leave the cell. They are, however, capable of reintegrating into the cellular genome and making multiple copies of their own genome. One lineage of Copia/Ty1, referred to as Agroviruses or Sireviruses (Peterson-Burch and Voytas, 2002; Llorens et al., 2009), does have an env-like gene.

The Gypsy/Ty3 elements (Kordis, 2005; Llorens et al., 2008), found in fungi, plants, invertebrates and some vertebrates, are even more similar to retroviruses. Like the Copia/Ty1 group they are generally non-infectious transposable elements, except for the errantividae of insects, which have infectious properties (Terzian et al., 2001) thanks to an env-like gene. The remaining group of LTR-containing retrotransposons is the Bel/Pao group in metazoans (Boeke and Stote, 1997).

Retroviruses and other env-containing retroelements can occasionally infect the germ line and be transmitted in a Mendelian fashion, thereby becoming endogenous retroviruses (ERVs, HERVs in the human genome).

The genome of LTR retrotransposons comprises four protein-coding genes (Coffin et al., 1997): the gag, pro, pol and env genes. Pol is the best conserved gene and encodes the reverse transcriptase (RT), the RNAse H (RH) and the integrase (IN) proteins. The env-encoded proteins allow for the fusion between retrotransposon and cell, thus conveying infectivity.

In the full-length genomic RNA, the four coding genes are flanked by two non-coding sequences. Both of them end in an identical region, the repeat region R, but they also contain two unique regions adjacent to R: the U5 region at the 5’ end and the U3 region at the 3’ end. After reverse transcription the flanking non-coding sequences become identical and are called LTRs. They consist of U3 followed by R and U5. Downstream of the 5’ LTR is the primer binding site (PBS), which pairs to the host tRNA that primes the synthesis of the first DNA strand in the reverse transcription reaction. The 3’ LTR is preceded by the polypurine tract (PPT), which primes the synthesis of the second DNA strand.

ERVs are often present as solo LTRs, lacking the internal genes. These were formed through homologous recombination between the two LTRs, with elimination of the internal sequences.

The RT protein has been used to classify all retroelements (Xiong and Eickbush, 1988; Xiong and Eickbush, 1990). Copia/Ty1 appears to be ancestral, with Gypsy/Ty3 and retroviruses being sister groups. Gypsy/Ty3 is by far the most diverse group, with at least ten subgroups (Malik and Eickbush, 1999; Kordis, 2005).


XRVs and ERVs

Retroviruses have traditionally been classified in two ways depending on perspective: as viruses or as genomic repetitive elements. As viruses, or XRVs, they are organised in seven genera: alpha-, beta-, gamma-, delta-, epsilon-, lenti- and spumaretroviruses, following the International Committee on Taxonomy of Viruses (ICTV). ERVs, being much more numerous than XRVs, are divided into broader groupings: Class I, II and III. This classification depends on the length of the target site duplication (TSD): Class I (or ERV1) has 4 bp, Class II (or ERV2) has 6 bp and Class III (or ERV3) has 5 bp.

Neither classification scheme is able to account for all ERVs and XRVs. For example, ERVs have been discovered which are intermediate between alpha- and betaretroviruses (Blikstad et al., 2008), and the newly discovered lentiviral ERVs (Katzourakis et al., 2007; Blikstad et al., 2008; Gifford et al., 2008) fall outside the three Classes, as do their exogenous relatives.

Furthermore, the ICTV and ERV classifications are somewhat incongruent. Whereas ERV1 corresponds to gammaretroviruses and ERV2 corresponds to betaretroviruses, ERV3 is only distantly related to spumaretroviruses. In addition, alpha-, delta- and lentiviruses cannot be assigned to any Class, even if they are phylogenetically closer to ERV2 (Jern et al., 2005). Similarly, epsilonretroviruses are related to but not part of ERV1 (Jern et al., 2005).

The PBS is useful in naming clades of retroviruses. For example, the most common gammaretroviral HERV uses histidine tRNA as its primer and is accordingly known as HERV-H.

Alpharetroviruses

Alpharetroviruses are found exclusively in birds, both as XRVs and ERVs.

Betaretroviruses

ERV2 are present in most vertebrate genomes but are especially numerous in mice. Some exogenous betaretroviruses such as MMTV in mice or JSRV in sheep have closely related endogenous counterparts. The human genome contains betaretroviral ERVs from a particular clade, called HML or HERV-K, which can be divided into ten subgroups, HML1-10 (Andersson et al., 1999). Most of the ten subgroups entered the primate genome relatively recently, after the split between New World and Old World monkeys about 35 million years ago. HML5 however seems to be older, around 55 million years old (Lavie et al., 2004).


Gammaretroviruses

Gammaretroviral ERVs are widely represented in vertebrates from fishes to humans. In humans, ERV1 consist of numerous clades, many of which are present only as solo LTRs or with no recognisable protein-coding genes (Mager and Medstrand, 2003). As with ERV2, most analysable ERV1 seem to have originated after the split between New World and Old World monkeys. Some clades such as HERV-E and HERV-T are relatively closely related to the exogenous murine leukemia virus, MLV.

Deltaretroviruses and epsilonretroviruses

Deltaretroviruses are only known as exogenous retroviruses. They infect mammals such as primates and bovines. Epsilonretroviruses are mainly found in fishes and amphibians.

Lentiviruses

Lentiviruses, exemplified by HIV, infect primates, carnivores and ungulates. Endogenous lentiviruses have recently been found in rabbits (Katzourakis et al., 2007) and lemurs (Gifford et al., 2008).

Spumaretroviruses

Exogenous spumaretroviruses or foamy viruses have a similar host range to lentiviruses. Endogenous spumaretroviruses were recently discovered in sloths (Katzourakis et al., 2009). Interestingly, foamy viruses have co-evolved with their hosts, showing that these viruses have circulated for more than 100 million years.

ERV3, distantly related to foamy viruses, are less intact than both ERV1 and ERV2 and appear to be older, 100-150 million years old (Mager and Medstrand, 2003).

Relation to other genomic elements

Retroviruses and LTR retrotransposons are but one lineage of mobile elements with reverse-transcribing ability. There are three other classes (Eickbush and Jamburuthugoda, 2008):

• The tyrosine recombinase (YR) retrotransposons, represented by DIRS and PAT (Poulter and Goodwin, 2005). In contrast to LTR retrotransposons they recombine into the host genome by means of their YR. They also differ by not having LTRs. DIRS has inverted terminal repeats while PAT has so-called split direct repeats, where the unrelated terminal sequences are repeated in the internal region. The YR retrotransposons have been discovered in many organisms, including primitive eukaryotes such as volvox and trypanosoma.

• The non-LTR retrotransposons such as LINE elements. They code for an endonuclease to cleave the target DNA, the released 3’ ends being used as primers in the reverse transcription of the element’s RNA into the new site. Non-LTR retrotransposons are present in high copy numbers in eukaryotes. LINEs, for example, account for 21% of the human genome (Lander et al., 2001).

• The Penelope-like (PLE) elements in eukaryotes, which have the most diverse structure. Some elements have LTRs, some have inverted terminal repeats (ITRs).

The PLE elements appear to be ancestral (Eickbush and Jamburuthugoda, 2008). The YR elements are nested within the LTR retrotransposons in most phylogenies (Poulter and Goodwin, 2005) but, due to various uncertainties, it is still possible that they are the original LTR retrotransposons.

Two RT-coding DNA viruses, the Hepadnaviruses (including the Hepatitis B virus) and the Caulimoviruses of plants, also cluster with LTR retrotransposons although they both lack LTRs (Xiong and Eickbush, 1988; Xiong and Eickbush, 1990).

Other sequences more distantly related to RT are group II mitochondrial introns, bacterial retrons which synthesise multicopy single-stranded DNA (ms-DNA) and eukaryotic telomerases (Xiong and Eickbush, 1988; Xiong and Eickbush, 1990). RNA-directed RNA polymerases are also distantly related to RT (Xiong and Eickbush, 1990).

The DDE integrases of LTR retrotransposons are present in the eukaryotic “cut and paste” DNA transposons and in the bacterial IS3 and IS481 insertion sequences. YR recombinases are found in other DNA transposons, the Cryptons, detected in Fungi.

As can be seen, the relationships between all RT-, YR recombinase- and DDE integrase-containing elements seem particularly complex and difficult to resolve.

LTRs

LTRs vary widely in structure among LTR retrotransposons (Leib-Mosch et al., 2005). Their lengths range from 200 bp to several thousand bp. Most LTRs start with TG and end with CA, forming short inverted repeats that serve as recognition signals for IN. IN binding interactions may extend 15-20 bp from the viral DNA ends (Brown, 1997). Relative to the provirus, the unintegrated viral DNA has an additional 2 bp of inverted repeat, which are important in the binding to IN but are eventually removed (Brown, 1997).

U3 contains enhancer and promoter elements. The TATA-box promoter is common in vertebrate LTRs. The U3 region of the longer LTRs may contain open reading frames (ORFs): examples are the sag of MMTV and the nef of primate lentiviruses such as HIV.

The most conserved motif in LTRs is the polyadenylation signal, the AATAAA motif, which is often located in the R region. R always starts at the transcription start site (TSS) and ends at the polyadenylation site (PAS). R is important for both transcription initiation and RNA processing, in addition to its role in reverse transcription.

Apart from the terminus ending in CA, U5 has few conserved motifs, but a T-rich area is present.
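The landmarks listed above (TG start, CA end, AATAAA signal) are simple enough to check with plain string operations. The following is a minimal sketch only; the function name `ltr_landmarks` and the toy sequence are hypothetical and are not taken from the papers.

```python
def ltr_landmarks(ltr: str) -> dict:
    """Report a few of the sequence landmarks described in the text for one LTR."""
    seq = ltr.upper()
    return {
        # Short inverted repeats at the termini: TG at the start, CA at the end.
        "starts_with_TG": seq.startswith("TG"),
        "ends_with_CA": seq.endswith("CA"),
        # 0-based start positions of every AATAAA polyadenylation signal.
        "AATAAA_positions": [
            i for i in range(len(seq) - 5) if seq[i:i + 6] == "AATAAA"
        ],
    }


# Toy sequence invented for illustration (not a real LTR).
toy_ltr = "TGTAGTCTATAAAGGCAATAAACCTTTTTTCA"
print(ltr_landmarks(toy_ltr))
```

Exact string matching of this kind only works for the few strictly conserved motifs; the rest of an LTR is too variable, which is the motivation for the probabilistic models introduced next.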


Hidden Markov models

Why HMMs?

As mentioned above, LTRs are heterogeneous. HMMs were chosen because:

• They are able to detect and align common core motifs in highly variable sequences.

• They are fully probabilistic models and can be used ab initio.

• They can be visualised because they provide alignments and consensuses.

There are some drawbacks:

• HMMs may not correctly align regions between the conserved core motifs of the family.

• Finding the best HMM for a given family is not straightforward.

• For detection in a database, HMMs tend to be slow compared to other methods.

Generalities

HMMs are widely used in pattern recognition, e.g. in speech recognition (Rabiner, 1989). Their task is to generate a stochastic succession of signals. In biology the signal can be a nucleotide or amino acid residue in a sequence.

A profile HMM (Krogh et al., 1994; Durbin, 1998) is a sequence of interconnected modules. Each module consists of a match state, corresponding to a conserved residue, and insert and delete states to model indels. The states are not observable because it is not known a priori which match or insert state a particular residue was emitted from; the underlying Markov process, i.e. the succession of states, is hidden.

There are three basic problems for HMMs:

• To calculate the probability of generating a given sequence from a given HMM. This problem is solved by the forward algorithm.

• To find the optimal path of a given sequence through a given HMM, i.e. to determine which insert or match state corresponds to which residue. This problem is solved by the Viterbi algorithm. If several sequences are aligned to the HMM, the resulting alignment is called a Viterbi alignment.

• To construct the HMM which best represents the family of sequences under study. The starting point is a set of sequences which is a good sample of the family, the training set. A maximum likelihood (ML) estimate of the parameters of the HMM would select the HMM which generates the training sequences with the highest likelihood. This optimisation problem has no general solution. However, local optima can be found by means of the Baum-Welch algorithm, which is an iterative procedure.
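The first two problems can be illustrated on a very small model. The sketch below is an assumption-laden toy: it uses a generic two-state HMM (a T-rich state "M" and a background state "I") rather than a profile HMM with match, insert and delete modules, and all parameter values are invented. It only shows how the forward algorithm sums over all state paths while the Viterbi algorithm keeps the single best one.

```python
import math


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_logprob(seq, start, trans, emit):
    """Forward algorithm: log P(seq | HMM), summed over all state paths."""
    states = list(start)
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for sym in seq[1:]:
        alpha = {
            t: logsumexp([alpha[s] + math.log(trans[s][t]) for s in states])
            + math.log(emit[t][sym])
            for t in states
        }
    return logsumexp(list(alpha.values()))


def viterbi_path(seq, start, trans, emit):
    """Viterbi algorithm: most probable state path and its log probability."""
    states = list(start)
    best = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    backpointers = []
    for sym in seq[1:]:
        pointers, new_best = {}, {}
        for t in states:
            prev = max(states, key=lambda s: best[s] + math.log(trans[s][t]))
            pointers[t] = prev
            new_best[t] = best[prev] + math.log(trans[prev][t]) + math.log(emit[t][sym])
        backpointers.append(pointers)
        best = new_best
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], best[last]


# Toy parameters (hypothetical): "M" emits mostly T, "I" emits uniformly.
start = {"M": 0.5, "I": 0.5}
trans = {"M": {"M": 0.9, "I": 0.1}, "I": {"M": 0.1, "I": 0.9}}
emit = {
    "M": {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    "I": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

print(forward_logprob("TTTT", start, trans, emit))
print(viterbi_path("TTTT", start, trans, emit))
```

The third problem (training with Baum-Welch) reuses the same forward quantities together with a backward pass and is not shown here.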

Building HMMs for LTRs

The parameters of the HMM have to be initialised. This can be done by using a single sequence and assuming that each residue corresponds to a match state. The parameters are then optimised by applying the Baum-Welch algorithm to the training set.

The training set must be designed with care. As an example, most available lentiviruses are from primates. An HMM trained on these LTRs would be biased towards primates. To reduce this bias one can either assign weights to the sequences or replace closely related sequences by their consensus (Durbin, 1998).

Overfitting and regularisation

Another difficulty arises when the HMM is not general enough: it generates the sequences found in the training set with appreciable probability, but not other related sequences (overfitting). If, for example, the training sequences are too few, the ML estimate leads to HMM parameters which are implausible.

To increase the generality of an HMM, a method to regularise it is needed. This is achieved by incorporating prior knowledge about the parameters of the HMM in a Bayesian statistical framework. The ML estimate of the HMM parameters is replaced by a maximum a posteriori (MAP) estimate, which maximises the a posteriori probability of the parameters given the training sequences, taking into account prior information about the parameters (Durbin, 1998).

A simple and elegant prior is the entropic prior introduced by Brand (1999). It favours high entropy (more random) HMMs or low entropy (less random) HMMs depending on the sign of a parameter z. The parameter z can be thought of as adding or removing random noise from the training data.
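The effect of a prior on sparse training data can be seen in a small numerical sketch. Note the substitution: the code below does not implement the entropic prior of Brand (1999); it uses the simpler and widely known pseudocount scheme, which corresponds to a Dirichlet prior on the emission probabilities, purely to show why an ML estimate from few sequences is implausible. The function names and counts are hypothetical.

```python
def ml_estimate(counts):
    """Maximum likelihood: relative frequencies of the observed counts."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def pseudocount_estimate(counts, pseudocount=1.0):
    """Regularised estimate: add the same pseudocount to every residue.

    This is the posterior mean under a symmetric Dirichlet prior (a stand-in
    for the entropic prior used in the papers, which is not implemented here).
    """
    total = sum(counts.values()) + pseudocount * len(counts)
    return {k: (v + pseudocount) / total for k, v in counts.items()}


# Hypothetical match-state column observed in only three training sequences.
column_counts = {"A": 0, "C": 0, "G": 0, "T": 3}

print(ml_estimate(column_counts))           # A, C and G get probability 0
print(pseudocount_estimate(column_counts))  # unseen residues stay possible
```

With the ML estimate, any related LTR carrying A, C or G at this position would receive probability zero; the regularised estimate keeps such sequences detectable.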

Model surgery

The number of modules of the initial HMM is equal to the length of the initialisation LTR, around 500 bp for a typical LTR. The number of conserved residues is however probably lower than this. It is therefore desirable to delete the modules with less used match states. This procedure is called model surgery and was introduced by Krogh et al. (1994).
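As a rough illustration of the pruning step, the sketch below counts how often each match state is visited in a set of Viterbi paths and keeps only the modules used by a sufficient fraction of sequences. The state labels ("M0", "I1", "D1"), the 0.5 threshold and the example paths are assumptions made for this sketch, not values taken from the papers.

```python
from collections import Counter


def match_usage(paths, n_modules):
    """Fraction of sequences whose Viterbi path visits each module's match state."""
    counts = Counter()
    for path in paths:
        for state in set(path):
            if state.startswith("M"):
                counts[int(state[1:])] += 1
    return [counts[i] / len(paths) for i in range(n_modules)]


def modules_to_keep(usage, min_fraction=0.5):
    """Indices of modules whose match state is used often enough to keep."""
    return [i for i, fraction in enumerate(usage) if fraction >= min_fraction]


# Hypothetical Viterbi paths of three training sequences through a 3-module HMM.
viterbi_paths = [
    ["M0", "M1", "M2"],
    ["M0", "D1", "M2"],
    ["M0", "D1", "I1", "M2"],
]

usage = match_usage(viterbi_paths, n_modules=3)
print(usage)
print(modules_to_keep(usage))
```

After pruning, the reduced model would normally be retrained, since the remaining parameters were estimated in the presence of the deleted modules.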

Scoring

The score of a sequence was calculated as the logarithm of its HMM probability minus the logarithm of the probability of the reverse sequence (Karplus et al., 2005). This scoring method has the advantage of being insensitive to the composition bias of the sequence, because the reverse sequence has the same composition as the original sequence. Another attractive feature is that random sequences score zero on average.
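The reverse-sequence score can be sketched with any model that assigns a probability to a sequence. The example below assumes a toy two-state, roughly left-to-right HMM with invented parameters (a GT-rich "head" state followed by a CA-rich "tail" state); it is not the LTR model of the papers. The sequence is reversed, not reverse-complemented.

```python
import math


def forward_logprob(seq, start, trans, emit):
    """Log probability of seq under a small HMM (forward algorithm, log space)."""
    def lse(values):
        m = max(values)
        return m + math.log(sum(math.exp(v - m) for v in values))

    states = list(start)
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for sym in seq[1:]:
        alpha = {
            t: lse([alpha[s] + math.log(trans[s][t]) for s in states])
            + math.log(emit[t][sym])
            for t in states
        }
    return lse(list(alpha.values()))


def reverse_null_score(seq, start, trans, emit):
    """log P(seq | HMM) minus log P(reversed seq | HMM)."""
    return forward_logprob(seq, start, trans, emit) - forward_logprob(
        seq[::-1], start, trans, emit
    )


# Toy direction-sensitive model (hypothetical parameters).
start = {"head": 0.99, "tail": 0.01}
trans = {
    "head": {"head": 0.8, "tail": 0.2},
    "tail": {"head": 0.01, "tail": 0.99},
}
emit = {
    "head": {"A": 0.1, "C": 0.1, "G": 0.4, "T": 0.4},
    "tail": {"A": 0.4, "C": 0.4, "G": 0.1, "T": 0.1},
}

print(reverse_null_score("TGTGCACA", start, trans, emit))  # fits the model's direction
print(reverse_null_score("ACACGTGT", start, trans, emit))  # same value, opposite sign
```

By construction a sequence and its reverse receive scores of equal magnitude and opposite sign, and a sequence that reads the same in both directions scores exactly zero.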

Evaluation

The most challenging part of HMM modelling of an LTR family was to select good models. A good HMM is capable of generating unseen LTRs belonging to the family of interest. In papers I-III many HMMs were constructed with an increasing number of modules (M) and with different values of z.

In papers I and III, the different HMMs were evaluated by designing a test set which contained LTRs sufficiently different from the training LTRs. The average score of the test LTRs was then plotted against the number of modules in the HMM. For a fixed z the score increased with M until it reached a plateau. The z-value with the highest plateau was preferred, but the choice of M was not unique: any value on the plateau would do.

Another evaluation method was tried in paper II. A Viterbi alignment of the training set was created by each HMM, and a phylogenetic tree was obtained from each alignment. The models could then be selected by searching for well-supported trees.

Manual inspection of the Viterbi alignments was also useful in searching for good models. Viterbi alignments displayed the conserved residues of an LTR family. However, the positions of longer insertions in U3, R and U5 were important information too.

SuperViterbi

Several models were combined by Viterbi aligning their HMM consensuses. This is a kind of alignment of alignments or “SuperViterbi”.


Summary of papers

Paper I

Five HMM models were built for different groupings of retroviral LTRs: HMLs, general betaretroviral, lentiviral, gammaretroviral and general vertebrate. The lentiviral and HML groups were the most homogeneous, followed by the gamma, the general beta and the general groups.

The HMMs were tested for detection on human chromosome 19 by comparing them to the RepeatMasker output. The HML model demonstrated high sensitivity and specificity (see Table 1). The gamma model was somewhat worse, as expected from the higher diversity of gammaretroviruses (see Table 1). The cutoff was selected to ensure a relatively high sensitivity.

Table 1: LTR HMM and RepeatMasker cross correlation. The table shows the number of LTRs detected in chromosome 19 as compared to the RepeatMasker output.

Model   Threshold   HMM+ REP+   HMM+ REP-   HMM- REP+   Sensitivity   Specificity
HML     5           395         18          57          0.87          0.96
Gamma   -1          631         1051        362         0.64          0.38
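The last two columns of Table 1 can be reproduced from the three count columns. The sketch below is inferred from the numbers rather than from a stated definition: the reported sensitivity matches the fraction of RepeatMasker LTRs also found by the HMM, and the reported specificity matches the fraction of HMM hits confirmed by RepeatMasker (a quantity often called positive predictive value). The function and variable names are hypothetical.

```python
def sensitivity(both, repeatmasker_only):
    """Fraction of RepeatMasker LTRs also detected by the HMM."""
    return both / (both + repeatmasker_only)


def specificity(both, hmm_only):
    """Fraction of HMM detections confirmed by RepeatMasker (as used in Table 1)."""
    return both / (both + hmm_only)


# Counts from Table 1: (HMM+ REP+, HMM+ REP-, HMM- REP+).
table1 = {
    "HML": (395, 18, 57),
    "Gamma": (631, 1051, 362),
}

for model, (both, hmm_only, rep_only) in table1.items():
    print(
        model,
        round(sensitivity(both, rep_only), 2),
        round(specificity(both, hmm_only), 2),
    )
```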

Table 2: “Jackknifing” the HML HMM: Removing one group from the training set and detecting the group removed in chromosome 19.

Model      # match states   Threshold   Sensitivity (including removed group) (%)   Specificity (%)   % detection of removed group
no_hml1    130              5           60                                          91                52
no_hml2    170              5           87                                          96                88
no_hml3    110              6           62                                          93                6
no_hml4    130              5           78                                          95                83
no_hml5    110              5           70                                          91                0
no_hml6    110              5           82                                          91                59
no_hml7    130              6           80                                          91                71
no_hml8    110              5           74                                          93                23
no_hml9    210              6           87                                          86                67
no_hml10   170              5           87                                          91                67


The generality of the HML models was also tested on chromosome 19 by removing the ten HML groups one at a time and trying to detect the removed group (jack-knifing). Each jack-knifed HML model could detect the missing group with fairly high sensitivity, except for the models trained without HML5 and without HML3 (see Table 2). The jack-knifed gamma models were less general than the HML models.

HMMs are very slow compared to BLAST searches, but this is somewhat compensated by their ability to detect more distant relatives.

The Viterbi alignments of the five groups contained a few conserved motifs. The common features were the TG start, the AATAAA motif, a T-rich area and the CA end. The TATA-box was seen in all groups except HML, which seems to have lost it. The lentiviral and gamma groups had a few other conserved promoter elements in U3. The lentiviral alignment is visualised as a weblogo in Fig 1A.

The R region had some conserved G-rich and C-rich areas in the HML, lentiviral and gamma groups. These areas base pair to form the stem in stem-loop structures in all three groups. The stem-loops have different functions in different retroviruses. The TAR-loop of HIV (Fig 1B) is important in transcription (Rabson and Graves, 1997) whereas the R-loop of HML facilitates polyadenylation (Baust et al., 2000).

The HML, beta, lentiviral, gamma and general groups were combined into a single “SuperViterbi” alignment to investigate their common structure. Three A-rich modules stood out, the first two providing TATA-boxes and the third one containing the polyadenylation signal, the AATAAA motif.


Paper II

Phylogenetic trees based on Viterbi LTR alignments were obtained for the HML, lentiviral and gammaretroviral groups. For each group, many LTR trees were generated from HMMs with varying M- and z-values. Fig 2 displays the average bootstrap support as a function of M and z for lentiviral LTRs. The average bootstrap support increases with M until it reaches a plateau.

[Fig 2 plot. x-axis: M (0 to 500). y-axis: mean bootstrap (0 to 800). Legend: z = -7, -6, -5, -4, -3, -2, -1, 0.]

Fig 2: Mean bootstrap support of neighbour-joining trees as a function of M and z of the corresponding HMM in the case of lentiviral LTRs. One thousand bootstrap trials were performed. The highlighted points have high bootstrap support.
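The idea of scoring each HMM setting by the mean bootstrap support of its tree, and of locating the plateau in M, can be sketched as follows. The function names, the 5% tolerance and the numbers in the toy curve are all assumptions for illustration; they are not values from Fig 2.

```python
from statistics import mean


def mean_bootstrap(node_supports):
    """Average bootstrap support over the internal nodes of one tree."""
    return mean(node_supports)


def plateau_start(curve, tolerance=0.05):
    """Smallest M whose mean support is within `tolerance` of the maximum.

    curve maps M -> mean bootstrap support for one fixed z.
    tolerance is a hypothetical fraction of the maximum support.
    """
    best = max(curve.values())
    return min(m for m, s in curve.items() if s >= (1 - tolerance) * best)


# Invented example curve (support counted out of 1000 bootstrap trials).
toy_curve = {100: 300.0, 200: 550.0, 300: 690.0, 400: 700.0, 500: 695.0}
m_plateau = plateau_start(toy_curve)
```

Choosing the smallest M on the plateau keeps the model as small as possible without giving up bootstrap support; whether this matches the selection actually used in paper II is not stated in this summary.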

The phylogenetic tree corresponding to the lentiviral LTR HMM with M equal to 350 and z equal to -6 is shown in Fig 3. For the primate lentiviruses except the lemur endogenous lentivirus pSIVgml, it agrees very well with the tree in (Gifford et al., 2008). Their tree was based on the Gag and Pol genes at the amino-acid level. For the non-primate lentiviruses (including the newly discovered rabbit endogenous lentivirus RELIK (Katzourakis et al., 2007)) and pSIVgml (Gifford et al., 2008), there is less agreement; the reason is probably that there are fewer non-primate lentiviral LTR sequences available for training the HMMs.

For the HML group, high bootstrap LTR trees were selected and compared to the corresponding pol trees at the nucleic acid level. The agreement was generally good, as can be seen in Fig 4. The main discrepancies are the branching orders of HML3 and HML6 and the placement of HML7. One explanation for these differences could be recombination. The HML LTR tree is consistent with the tRNA-usage of the PBS: methionine or isoleucine tRNA for HML5, arginine or asparagine tRNA for HML3 but lysine for the other eight HMLs (Lavie et al., 2004).

The homogeneity of the LTR groups was reflected in the resolution of the corresponding trees. The gammaretroviral LTR trees had lower bootstrap support than the HML and lentiviral LTR trees. Nevertheless, the gammaretroviral LTR trees were consistent with the known Pol phylogeny at the nodes with higher bootstrap supports.

In summary, the LTR and Pol phylogenies are in good agreement for the studied retroviruses, suggesting that the LTR and the pol genes have largely co-evolved.


Fig 3 Neighbour-joining tree of lentiviral LTRs.

• RELIK: Rabbit endogenous lentivirus type K. VISNA: Ovine maedi-visna virus. BIV: Bovine immunodeficiency virus. EIAV: Equine infectious anaemia virus. FIV: Feline immunodeficiency virus. pSIVgml: Grey mouse lemur prosimian immunodeficiency virus.

• COL: Guereza colobus.

• SUN: Sun-tailed macaque. LST: L'Hoest's monkey. MND: Mandrill.

• GSN: Greater spot-nosed monkey. MON: Mona's monkey. MUS: Moustached monkey. SYK: Sykes' monkey. DEN: Dent's Mona monkey. DEB: DeBrazza's monkey.

• O.BE, O.CM: HIV-1 type O. CPZ: Chimpanzee. SIV-VER: Vervet monkey. SAB: African green monkey, sabaeus subspecies. RCM: Red-capped mangabey. DRL: Drill monkey.


Paper III

The HMM-based methods of papers I and II were extended to LTRs from other retrotransposon groups. Fourteen groups belonging to Copia/Ty1, BEL, Gypsy/Ty3 and retroviruses were modelled by HMMs. The groups represented much of the diversity of LTRs.

A general HMM was trained on LTRs from all groups. The Weblogo of the corresponding Viterbi alignment is displayed in Fig 5. It consists of five conserved stretches: 1) The TGTT… start. 2) A poorly conserved AT-rich region. 3) A better conserved AT-rich region probably providing TATA-boxes. 4) A well conserved polyadenylation signal, the AATAAA motif. 5) The …AACA end. Long inserts are present between the conserved stretches; they carry host-specific sequence elements such as transcription factor binding sites and stem-loops in R and U5 (see paper I). The conserved LTR structure is essentially that found in paper I, which only considered retroviral LTRs.

In Copia/Ty1, the termini are palindromic over longer stretches. The Retrofit consensus, for example, begins with TGTTAGAGATAT and ends with ATATTTTCTAACA. It is possible that they are IN-binding motifs.

HMMs were built for each LTR group. They yielded group consensus sequences which were then combined into a single HMM. A neighbour-joining tree (optimised as in Paper II) based on the HMM is shown in Fig 6 together with the neighbour-joining tree obtained from a concatenated alignment of the three pol domains RT, RNAse H and IN from (Malik and Eickbush, 1999). The two trees agree fairly well. In particular the Lenti, HML and ERV3 retroviruses cluster together with relatively high bootstrap support.
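The relation between the TGTT… start and the …AACA end, and the longer terminal match in the Retrofit consensus quoted in this Paper III summary, can be checked by comparing the 3' end with the reverse complement of the 5' end. The sketch below is illustrative only; the function names are hypothetical and the comparison is ungapped, so it stops at the first mismatch.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat_length(start, end):
    """Count how many terminal bases of `end` match the reverse complement
    of `start`, reading inwards from the 3' end until the first mismatch."""
    rc = reverse_complement(start)
    n = 0
    for a, b in zip(reversed(rc), reversed(end.upper())):
        if a != b:
            break
        n += 1
    return n


short = inverted_repeat_length("TGTT", "AACA")
retrofit = inverted_repeat_length("TGTTAGAGATAT", "ATATTTTCTAACA")
```

With this ungapped comparison the four-base termini match completely, and the Retrofit termini match over their outermost seven bases before the first mismatch; an alignment that allows gaps would be needed to follow the similarity further inwards.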

Fig 5 Weblogo of Viterbi alignment of 136 LTRs from fourteen groups of LTR retrotransposons. The contiguous match states are underlined and the positions of longer inserts are shown.


Fig 6 (Left) Neighbour-joining tree based on a concatenated alignment of RT, RNAse H and IN sequences. (Right) Neighbour-joining tree generated by the LTR consensus HMM.


Paper IV

ERVs have different names depending on context. An example is HML2, which is also known as HERV-K10, HTDV/HERV-K etc. Part of the confusion is due to the fact that ERVs are at the same time retroviruses, repetitive elements and genetic loci.

Many HERV groups are named according to PBS-type. This is often appropriate, but it happens that closely related ERVs have different PBS-types or that distantly related HERVs use the same PBS-type. On the other hand, it is convenient that the names are informative.

Taxonomy should follow phylogeny, and phylogeny depends largely on sequence similarity. The pol gene is the most useful retroviral gene for sequence comparisons because it is the most conserved one. Pol protein alignments are preferable because an amino acid can be similar to another, in contrast to nucleotides.

The first step in classification is clustering ERVs sharing more than a fixed percentage Pol similarity. 80% similarity is reasonable based on previous experience. A complication is that older integrations will be split into several clusters because of postintegrational mutations. To compensate for this, a consensus can be calculated for each cluster and the consensuses can be clustered.

Classification can be further improved by exploiting structural features of retroviruses such as PBS-type, number of zinc fingers in Gag, translational strategy at the gag-pro or pro-pol junction (frameshifts), etc.

A naming convention was suggested. A name would contain:

• Information about host species, e.g. HS for Homo sapiens.

• A classification denominator with genus, e.g. alpha (A) or beta (B), and group, e.g. HML3. A new group name should be avoided if an older one has become accepted by the scientific community.

• Other information such as chromosomal position.
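The clustering step described in the Paper IV summary (group ERVs sharing more than a fixed percentage of Pol similarity, then build a consensus per cluster) can be sketched as below. Several things here are assumptions rather than the paper's method: percent identity over pre-aligned sequences stands in for amino-acid similarity, single-linkage clustering is one possible choice of algorithm, and all function names and toy sequences are invented.

```python
from collections import Counter


def percent_identity(a, b):
    """Percent identical positions in two pre-aligned, equal-length strings.
    Columns with a gap ('-') in either sequence are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def cluster(seqs, threshold=80.0, similarity=percent_identity):
    """Single-linkage clustering of named sequences (union-find)."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # "More than" the threshold, as in the text.
            if similarity(seqs[a], seqs[b]) > threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


# Invented toy "Pol" fragments, already aligned.
toy = {"a": "MKVLAAGTWY", "b": "MKVLAAGTWF", "c": "MKVLAAGTQF", "d": "QQQQQQQQQQ"}
clusters = cluster(toy)
```

In the toy data, `a` and `c` are exactly 80% identical and so are not linked directly, but both exceed the threshold against `b`, so single linkage places all three in one cluster. Clustering the per-cluster consensuses again would follow the same call with the consensus strings as input.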

Paper V

This review describes the bioinformatic tools used to study ERVs and gives an overview of ERV evolution.

A useful tool for detection and characterisation of ERVs is RetroTector (Sperber et al., 2007; Sperber et al., 2009). RetroTector searches for a chain of motifs from gag, pro, pol and env between two candidate LTRs. It also tries to reconstruct the retroviral proteins. It is unique in providing a detailed analysis of the ERV structure. It is also fast: the analysis of the human genome (3 billion bp) takes 1-2 days on a computer cluster.

ERVs are fossils of retroviruses and are therefore of great help in clarifying the evolution of exogenous retroviruses (XRVs). The retroviral tree has three main branches, comprising ERV1 (or gammaretroviruses) and epsilonretroviruses; spumaretroviruses (or foamy viruses) and ERV3; and alpha-, beta- (or ERV2), delta- and lentiviruses. Some retroviruses are transitional forms; as an example, some chicken ERVs are intermediate between alpha- and betaretroviruses.

Different ERV genera have different host ranges. Gamma- and betaretroviral ERVs are common to all vertebrates whereas alpharetroviral ERVs are restricted to birds.

The presence of structurally intact ERVs in many vertebrate genomes can be explained as recent integrations of infectious XRVs (horizontal transfers). Partially incongruent ERV and vertebrate host trees also support horizontal transfer. Examples of recent integrations of gammaretroviruses are HERV-Fc-related viruses in the dog genome and HERV-T-related viruses in the opossum genome.

The review finally discusses various defence mechanisms against retroviral infections and HERV evolution after endogenization, including the role of HERVs in cancers.


Conclusions and future prospects

It was demonstrated in this work that HMMs could successfully model LTRs.

• They detected LTRs in genomic DNA with reasonable sensitivities and specificities although the detection process was lengthy (paper I).

• They provided reliable Viterbi alignments. This could be verified by relating the conserved motifs of retroviral LTRs to known functions of XRVs and ERVs (paper I).

• They yielded phylogenetic trees largely congruent with the corresponding pol trees (papers II and III).

• They revealed the conserved structure of LTRs (papers I and III).

LTRs might be useful for retroviral taxonomy, together with other taxonomic markers, such as length of TSD. The methodology developed in this work could also be applied to other non-coding genetic elements.


Acknowledgements

To my supervisor Jonas Blomberg for sharing his knowledge with me and for support and encouragement.

To my co-supervisor Göran Sperber for his always quick help.

To Patric Jern for introducing me to Bioinformatics.

To my present and former colleagues: Christina Öhrmalm, Anders Malmsten, Lijuan Hu, Nahla Mohamed, Amal Elfaitoury, Egle Aukstuoliene, Shaman Muradrasoli, Yajin Song, Magnus Jobs, Ronnie Eriksson, Ylva Molin, Markus Klint, Marie Edvinsson, Hong Yin and Sultan Golbob: thank you for your company.

To Eva Haxton for her help and support.

Special thanks to Guma Abdeldaim for being a really good friend and for the good times we had together.

To my family: I cannot thank you enough.


References

Andersson, M.L., Lindeskog, M., Medstrand, P., Westley, B., May, F. and Blomberg, J.: Diversity of human endogenous retrovirus class II-like sequences. J Gen Virol 80 (Pt 1) (1999) 255-60.

Baust, C., Seifarth, W., Germaier, H., Hehlmann, R. and Leib-Mosch, C.: HERV-K-T47D-Related long terminal repeats mediate polyadenylation of cellular transcripts. Genomics 66 (2000) 98-103.

Blikstad, V., Benachenhou, F., Sperber, G.O. and Blomberg, J.: Evolution of human endogenous retroviral sequences: a conceptual account. Cell Mol Life Sci 65 (2008) 3348-65.

Boeke, J.D. and Stoye, J.P.: Retrotransposons, Endogenous Retroviruses, and the Evolution of Retroelements, Retroviruses. Cold Spring Harbor Laboratory Press, New York, NY, USA, 1997, pp. 343-436.

Brand, M.: Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction. Neural Comp. 11 (1999) 1155-1182.

Brown, P.O.: Integration, Retroviruses. Cold Spring Harbor Laboratory Press, 1997, pp. 161-203.

Coffin, J.M., Hughes, S.H. and Varmus, H.E.: Retroviruses. Cold Spring Harbor Laboratory Press, New York, 1997.

Durbin, R., Eddy, S.R., Krogh, A. and Mitchison, G.J.: Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge UK, 1998.

Eickbush, T.H. and Jamburuthugoda, V.K.: The diversity of retrotransposons and the properties of their reverse transcriptases. Virus Res 134 (2008) 221-34.

Flavell, A.J., Pearce, S.R., Heslop-Harrison, P. and Kumar, A.: The evolution of Ty1-copia group retrotransposons in eukaryote genomes. Genetica 100 (1997) 185-95.

Gifford, R.J., Katzourakis, A., Tristem, M., Pybus, O.G., Winters, M. and Shafer, R.W.: A transitional endogenous lentivirus from the genome of a basal primate and implications for lentivirus evolution. Proc Natl Acad Sci U S A 105 (2008) 20362-7.

Jern, P., Sperber, G.O. and Blomberg, J.: Use of endogenous retroviral sequences (ERVs) and structural markers for retroviral phylogenetic inference and taxonomy. Retrovirology 2 (2005) 50.

Karplus, K., Karchin, R., Shackelford, G. and Hughey, R.: Calibrating E-values for hidden Markov models using reverse-sequence null models. Bioinformatics 21 (2005) 4107-15.


Katzourakis, A., Gifford, R.J., Tristem, M., Gilbert, M.T. and Pybus, O.G.: Macroevolution of complex retroviruses. Science 325 (2009) 1512.

Katzourakis, A., Tristem, M., Pybus, O.G. and Gifford, R.J.: Discovery and analysis of the first endogenous lentivirus. Proc Natl Acad Sci U S A 104 (2007) 6261-5.

Kordis, D.: A genomic perspective on the chromodomain-containing retrotransposons: Chromoviruses. Gene 347 (2005) 161-73.

Krogh, A., Brown, M., Mian, I.S., Sjolander, K. and Haussler, D.: Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol 235 (1994) 1501-31.

Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J.P., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A., Sougnez, C., Stange-Thomann, N., Stojanovic, N., Subramanian, A., Wyman, D., Rogers, J., Sulston, J., Ainscough, R., Beck, S., Bentley, D., Burton, J., Clee, C., Carter, N., Coulson, A., Deadman, R., Deloukas, P., Dunham, A., Dunham, I., Durbin, R., French, L., Grafham, D., Gregory, S., Hubbard, T., Humphray, S., Hunt, A., Jones, M., Lloyd, C., McMurray, A., Matthews, L., Mercer, S., Milne, S., Mullikin, J.C., Mungall, A., Plumb, R., Ross, M., Shownkeen, R., Sims, S., Waterston, R.H., Wilson, R.K., Hillier, L.W., McPherson, J.D., Marra, M.A., Mardis, E.R., Fulton, L.A., Chinwalla, A.T., Pepin, K.H., Gish, W.R., Chissoe, S.L., Wendl, M.C., Delehaunty, K.D., Miner, T.L., Delehaunty, A., Kramer, J.B., Cook, L.L., Fulton, R.S., Johnson, D.L., Minx, P.J., Clifton, S.W., Hawkins, T., Branscomb, E., Predki, P., Richardson, P., Wenning, S., Slezak, T., Doggett, N., Cheng, J.F., Olsen, A., Lucas, S., Elkin, C., Uberbacher, E., Frazier, M., et al.: Initial sequencing and analysis of the human genome. Nature 409 (2001) 860-921.

Lavie, L., Medstrand, P., Schempp, W., Meese, E. and Mayer, J.: Human endogenous retrovirus family HERV-K(HML-5): status, evolution, and reconstruction of an ancient betaretrovirus in the human genome. J Virol 78 (2004) 8788-98.

Leib-Mosch, C., Seifarth, W. and Schön, U.: Influence of Human Endogenous Retroviruses on Cellular Gene Expression, Retroviruses and Primate Genome Evolution. Landes Bioscience, 2005, pp. 123-143.

Llorens, C., Fares, M.A. and Moya, A.: Relationships of gag-pol diversity between Ty3/Gypsy and Retroviridae LTR retroelements and the three kings hypothesis. BMC Evol Biol 8 (2008) 276.

Llorens, C., Munoz-Pomer, A., Bernad, L., Botella, H. and Moya, A.: Network dynamics of eukaryotic LTR retroelements beyond phylogenetic trees. Biol Direct 4 (2009) 41.

Mager, D.L. and Medstrand, P.: Retroviral repeat sequences, Nature encyclopedia of the human genome. Nature publishing group, London, UK, 2003.


Malik, H.S. and Eickbush, T.H.: Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol 73 (1999) 5186-90.

Peterson-Burch, B.D. and Voytas, D.F.: Genes of the Pseudoviridae (Ty1/copia retrotransposons). Mol Biol Evol 19 (2002) 1832-45.

Poulter, R.T. and Goodwin, T.J.: DIRS-1 and the other tyrosine recombinase retrotransposons. Cytogenet Genome Res 110 (2005) 575-88.

Rabiner, L.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77 (1989) 257-286.

Rabson, A.B. and Graves, B.J.: Tat, the Trans-activator of HIV, Retroviruses. Cold Spring Harbor Laboratory Press, 1997, pp. 225-230.

Sperber, G., Lövgren, A., Eriksson, N.E., Benachenhou, F. and Blomberg, J.: RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences. BMC Bioinformatics 10 (2009).

Sperber, G.O., Airola, T., Jern, P. and Blomberg, J.: Automated recognition of retroviral sequences in genomic data--RetroTector. Nucleic Acids Res 35 (2007) 4964-76.

Terzian, C., Pelisson, A. and Bucheton, A.: Evolution and phylogeny of insect endogenous retroviruses. BMC Evol Biol 1 (2001) 3.

Xiong, Y. and Eickbush, T.H.: Similarity of reverse transcriptase-like sequences of viruses, transposable elements, and mitochondrial introns. Mol Biol Evol 5 (1988) 675-90.

Xiong, Y. and Eickbush, T.H.: Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J 9 (1990) 3353-62.
