Dog introgression patterns in a South
European wolf population
MSc in Bioinformatics
Master’s Thesis
Daniel Gómez-Sánchez
Barcelona, 2014
Approval of the tutors
Signed,
Dr. Antonio Barbadilla Dr. Carles Lalueza-Fox
ACKNOWLEDGMENTS
First of all, it has been a pleasure working with the people of the Institut de Biologia
Evolutiva (CSIC-UPF). In particular, I’m very grateful to the Paleogenomics group for
the opportunity to work with a very professional team: to Dr. Carles Lalueza-Fox for
accepting me in his group, to Dr. Óscar Ramírez because this work could not have been
done without his help and ideas, to Iñigo Olalde and Federico Sánchez-Quinto for
sharing their bioinformatic skills with me, and to Federica Pierini.
Second, I’m thankful to the faculty of the Universitat Autònoma de Barcelona’s MSc in
Bioinformatics, in particular to my academic tutor Antonio Barbadilla, for the
bioinformatic knowledge and skills taught.
Third, I would like to express my gratitude to Dr. Carles Vilà, Dr. Robert K. Wayne, Dr.
Tomas Marques-Bonet and Dr. Jeffrey M. Kidd for the unpublished data used in this
Master’s Thesis. I would also like to acknowledge the help of Raphael Carrasco and Conrad
Enseñat for the donation of the Sierra Morena and Wolf EEP samples; Dr. Natalia
Sastre for the microsatellite analysis; and Dr. Adam Boyko, Dr. Bridgett vonHoldt and
Dr. Malgorzata Pilot for the information about the 48K dataset.
I’m very thankful to Dr. Carles Lalueza-Fox, Dr. Óscar Ramírez, Iñigo Olalde and Dr.
Antonio Barbadilla for the review and comments on the manuscript; and Patricia
Rodríguez and Jordi Antonio Pinzón García for their help in the linguistic revision.
Last but not least, I’m very grateful to all the people who have believed, and continue to
believe, in my early scientific career: to my parents Antonio and María Luisa, my
sisters Alba and Alicia, Patricia Rodríguez and the rest of my family for their patience
and affection; to my friends Alberto Segovia Sanz and Jordi Antonio Pinzón García for
their aid in the battle to conquer informatics; and to Dr. Juan Luis Santos and all the
people in the Cytogenetics lab at the Universidad Complutense de Madrid for my
initiation in the scientific world.
Thank you all, because without you this Master’s Thesis would never have been written.
INDEX
Contents
1. INTRODUCTION
1.1. Extinction risk factors
1.2. Molecular markers
2. OBJECTIVES
3. MATERIAL AND METHODS
3.1. Sampling and sequencing
3.2. Mapping
3.3. SNP calling
3.4. Diversity analysis and inbreeding
3.5. Ancestry analysis
3.6. Hybridization analysis
4. RESULTS
4.1. Heterozygosity and inbreeding
4.2. Hybridization patterns
5. DISCUSSION
6. CONCLUSION
7. REFERENCES
Appendixes
1. Bioinformatics’ discussion
2. Results for non-Iberian samples
3. Heterozygosity by chromosome
4. Heterozygosity distribution for non-Iberian samples
5. Principal components’ boxplots and PCA with component 4
6. Cross-validation error of the ADMIXTURE analysis
7. Linear model details of heterozygosity-percentage block analysis
List of Tables
Table 1. Results for Iberian samples
List of Figures
Figure 1. Iberian wolf distribution
Figure 2. Diversity distribution
Figure 3. Diversity analysis
Figure 4. Runs of homozygosity
Figure 5. Principal component analysis
Figure 6. ADMIXTURE analysis of the present-work's dataset
Figure 7. ADMIXTURE analysis of the 48K-merged dataset
Figure 8. STRUCTURE analysis of introgressed wolves
Figure 9. Ancestry across the chromosome
Figure 10. Analysis of haplotype blocks
Figure 11. Iberian shared alleles
1. INTRODUCTION
Grey wolves (Canis lupus) were historically distributed across Europe, Asia and
North America, but hunting by humans, deforestation and the loss of wild prey reduced
their populations during the past centuries (Boitani 2003). The species became fragmented
and confined to the southern European peninsulas (Iberia, Italy and the Balkans), Canada
and the northern USA (Mech 1970; Breitenmoser 1998). In the 1960s legal protection led
to a population expansion in the USA and Western Europe (Mech 1995); however, in
Eastern Europe and Northern Asia there was neither protection nor extinction risk
(Bibikov 1994). Differences in protection and threat between these populations make
this species a good model for conservation genetics and genomics.
In Europe, wolves have a discontinuous range in which three main populations can be
distinguished, corresponding spatially to different glacial refugia and demographic
histories (Pilot et al. 2014): large and interconnected populations under constant hunting
pressure in Eastern Europe, and two relatively smaller, isolated and bottlenecked
populations in Western Europe. Eastern European wolves are also connected with the
Asian populations; nevertheless, hunting causes multiple local demographic fluctuations
(for example, Ozolins & Andersone 2001; Sidorovich et al. 2003; Gomerčić et al.
2010). Currently, wolves in Western Europe are expanding from the partially protected
populations in Italy (including the Apennine Peninsula and the Western Italian Alps)
and the Iberian Peninsula to other regions such as Catalonia or France (Sastre 2011).
The Iberian Peninsula contains the largest wolf population in Western Europe (Boitani
2003; Silva et al. 2013). It has been isolated at least since the extinction of the wolves
of France and Central Europe at the end of the 19th century, and it suffered a reduction
due to human eradication campaigns (Valverde 1971). With new conservation policies, the
population underwent a subsequent expansion in range and size (Figure 1; Sastre 2011).
Although the figure is controversial, the wolf population in the Iberian Peninsula is
estimated at 2,200-2,500 individuals, concentrated in the Northwestern region (Silva et
al. 2013). South of the Duero river, however, only very fragmented and isolated
populations with a high extinction risk are present (Silva et al. 2013). In 1994, the
European Breeding of Endangered Species Programme (EEP) started a breeding program for
the Iberian wolf derived from 15 founders according to the studbook. The relatively high
number of independent founders and the subsequent genetic management have led to the
conservation of the original variability in the EEP population (Ramírez et al. 2006).
1.1. Extinction risk factors
Small and isolated populations have an increased risk of extinction due to genetic drift
and inbreeding, because the probability of mating between relatives increases (Frankham
2005; Wright et al. 2007). Mating between close relatives increases homozygosity in the
offspring, which may reduce their fitness through inbreeding depression (Wright
1977; Falconer & Mackay 1996). Furthermore, small isolated populations are also
known to have a higher risk of hybridizing (Godinho et al. 2011; Randi et al. 2014).
Figure 1. Iberian wolf distribution. Range in the Iberian Peninsula between the 19th and
20th centuries. Taken from Sastre 2011.
Wolf populations are therefore sensitive to both processes, given the biogeographical
history described above.
Population bottlenecks
Bottlenecks are demographic processes that consist of a severe reduction in
effective population size. The consequences can be a loss of genetic diversity and an
increase in consanguinity, deleterious mutations and genetic drift (Bouzat 2010),
which lead to reduced adaptability (Frankham et al. 1999) and an increased
extinction risk through genetic and demographic processes (Keller 2002). Founder
effects, over-exploitation by humans, diseases, starvation and other natural and
biological catastrophes cause population bottlenecks, and their genetic effects
depend on their strength, duration and level of isolation (Carmichael et al. 2001;
Busch et al. 2007). In the Italian and Iberian Peninsulas, isolated populations
suffered a severe bottleneck, and many studies have described the effects explained
above in the genetic landscape (Sastre et al. 2010; Sastre 2011; Pilot et al.
2014).
Population fragmentation
Fragmentation of populations due to habitat loss and modification is an
increasingly important threat in the conservation of endangered species, because
the diversity of a population can only increase through mutation or the exchange of
genes with neighbouring populations (Vilà et al. 2003a). Isolated and
fragmented populations have an increased extinction risk due to the lack of
migration and their lower effective population size (Frankham 2005). Under these
conditions, migration events may have important effects for the rescue of small
and inbred populations (Tallmon et al. 2004). Detecting population
fragmentation is therefore crucial for conservation management. European wolf
populations have been more fragmented than American ones; thus, Old World grey
wolves are strongly differentiated from one another due to the lack of gene flow
between populations and to genetic drift (Pilot et al. 2010).
Inbreeding depression
Founder effects and isolation might reduce allelic and genotypic diversity within
populations, increasing inbreeding and the probability of extinction. Inbreeding
leads to an increased frequency of deleterious alleles in the population, which in
turn reduces individual fitness. This phenomenon is known as inbreeding
depression and might decrease the short-term viability of a population due to the
loss of adaptive potential (Ouborg 2010; Ouborg et al. 2010). Wolves are prone
to inbreeding depression (Liberg et al. 2005; Räikkönen et al. 2006, 2009), but
large or fast-growing populations seem to avoid it thanks to selection for
heterozygotes (Randi 2011). However, the decline of genetic variability is
correlated with the effective population size, which is very small in wolves even in
the largest populations (Randi 2011), probably due to increased levels of
inbreeding together with decreased dispersal and immigration (Aspi et al. 2006).
Confirmed negative effects of inbreeding depression in wild wolves include
decreased over-winter survival of pups (Liberg et al. 2005) and congenital bone
deformities in isolated wolf populations (Räikkönen et al. 2006, 2009).
Hybridization
Hybridization between wild species and their domestic counterparts represents a
threat to natural populations, although at the same time it can introduce genetic
variation into isolated populations. The consequences could be the disruption of
local adaptation, increased genetic homogenization between populations and
extinction through introgressive hybridization (Rhymer & Simberloff 1996).
Grey wolves and domestic dogs possess identical karyotypes and can generate
fertile hybrids despite physiological and morphological differences (Wayne et
al. 1989; Vilà & Wayne 1999). Until now, population genetic studies have not
revealed large-scale introgression of dog genes into wolves with the markers used.
Nevertheless, several works have started to detect hybrids in natural populations
(Randi et al. 2000, 2014; Randi & Lucchini 2002; Andersone et al. 2002;
Verardi et al. 2006; Sundqvist 2008; Godinho et al. 2011; Hindrikson et al.
2012). With only a few molecular markers, backcrosses from past generations
cannot be detected (Randi 2008) and cryptic introgression is likely to go undetected
(Currat et al. 2008). Moreover, introgressed variants may be indistinguishable
from intraspecific variation (Caniglia et al. 2013). If introgression is sufficiently
frequent, small and fragmented wolf populations can lose specific adaptations
and subsequently become extinct. Wolf re-expansion waves are also at risk of being
contaminated by hybridization due to the number of free-ranging or feral dogs (Randi
2011).
1.2. Molecular markers
Evolutionary processes in natural populations have been extensively analysed
with classical population genetics (Ouborg et al. 2010). In grey wolves, many such
studies have been conducted using microsatellite loci (Verardi et al. 2006; vonHoldt
et al. 2010; Randi et al. 2014), MHC genes (Galaverni et al. 2013; Niskanen et al.
2014), mtDNA (Thalmann et al. 2013) and combinations of them (Ramírez et al. 2006;
Sastre et al. 2010; Godinho et al. 2011). Advances in Next Generation Sequencing
(NGS) technologies allow thousands of genetic markers to be examined, including indel
polymorphisms and single nucleotide polymorphisms (SNPs). Access to a large number
of loci permits researchers to overcome analytical limitations associated with the
analysis of a small number of genetic markers (Allendorf et al. 2010), including in
hybridization analysis (Twyford & Ennos 2012).
Many works have analysed the population processes of wolves and other canids using
NGS technologies (Boyko et al. 2010; vonHoldt et al. 2010, 2011; Pilot et al. 2014),
based on genotyping microarrays derived from the complete sequencing of the dog
genome (Lindblad-Toh et al. 2005). Although this kind of data enhances the
understanding of wolf populations (vonHoldt et al. 2011), microarrays designed from a
closely related species could introduce a bias, because they include only the
variation of that species. Until now, only a few works (Lindblad-Toh et al. 2005; Wang
et al. 2013; Axelsson et al. 2013; Freedman et al. 2014) have included whole-genome
sequencing of wolves; however, all of them focus on the study of domestication.
2. OBJECTIVES
The objective of this Master’s Thesis is to analyse, for the first time, a wolf population
using genome-wide sequencing, including three Northwestern Iberian samples, one of them
from the EEP, and the first individual from south of the Duero in the literature. These
samples cover the current diversity of the whole Iberian Peninsula. Only
one previous study (Godinho et al. 2011) has analysed the dynamics of wolf-dog
hybridization in this Northwestern population, where wolves use agricultural habitats
close to human settlements (Cuesta et al. 1991; Llaneza et al. 1996; Vos 2000; Blanco
& Cortés 2007), which is likely to favour contact with feral and free-ranging dogs
and may result in extensive hybridization (Petrucci-Fonseca 1982; Blanco et al.
1992).
The specific objectives of the present work are two:
To analyse the degree of inbreeding in two small wolf populations, one of
them on the edge of extinction.
To analyse the level of dog hybridization and the introgression patterns in wolves.
3. MATERIAL AND METHODS
The bioinformatic methodology is discussed in Appendix 1.
3.1. Sampling and sequencing
We generated the whole-genome sequences of 4 Iberian wolves: one captive (Wolf
EEP), two from the Northwestern population (Wolf Spain and Wolf Portugal) and one
from Southern Spain (Sierra Morena). The Illumina libraries were constructed following
the manufacturer's instructions and sequenced at the CNAG (Centre Nacional d’Anàlisi
Genòmica) and the BGI (Beijing Genomics Institute). In addition, we included the
genomes of 11 dogs (different breeds), 6 American and 6 Eurasian wolves (unpublished
data; Freedman et al. 2014) for comparison purposes. All wild samples derive from
animals killed or found dead for reasons other than this research and deposited in
scientific collections. The captive wolf sample, which originates from the Northwestern
Iberian population, comes from the Parc Zoologic of Barcelona.
3.2. Mapping
All the sequences were mapped to the dog reference genome (canFam3.1) using BWA
version 0.6.1 (Li & Durbin 2009) with the quality trimming parameter set to a Sanger
quality score of 15 and otherwise default parameters. Next, I used Picard tools version 1.70
(http://picard.sourceforge.net/) to remove PCR duplicates and GATK version 2.5
(McKenna et al. 2010) to perform indel realignment. The resulting files were used for
the SNP calling.
Then, the DepthOfCoverage tool implemented in GATK was applied to the autosomal
data to compute the average depth of coverage of this final set.
3.3. SNP calling
I produced a preliminary set of autosomal variants for wolves and dogs (19,640,837 SNPs)
using the GATK UnifiedGenotyper and VariantFiltration with the recommended
filtering parameters for the case in which Variant Quality Score Recalibration (VQSR)
is not available (Auwera et al. 2013; more details in Appendix 1).
To avoid low complexity regions and gaps, the mappable region was obtained using the
GEM mappability program (Derrien et al. 2012) version 1.315, and custom Perl scripts.
Keeping the variants that fell into these regions, I obtained a final dataset for all the
samples containing 18,956,547 confident SNPs.
3.4. Diversity analysis and inbreeding
To explore the genome-wide distribution of genetic variability in the Iberian samples, I
looked at the distribution of heterozygosity across the genome in overlapping 1 Mb
windows with a 200 kb sliding step, using in-house Perl and R scripts. For each
window, the number of heterozygous positions was computed and
divided by the number of callable positions. Only windows with a callable region of at
least 100 kb were considered. This approach has been used in
other studies (for example, Prado-Martinez et al. 2013) instead of the expected
heterozygosity (π, Tajima 1983), because the number of samples for each
population is small.
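The windowing scheme above can be illustrated with a minimal Python sketch. It is not the in-house Perl and R code used in this work; the function name, the input layout (a list of heterozygous positions and a list of half-open callable intervals for one chromosome of one sample) and the treatment of shorter windows at the chromosome end are assumptions made only for illustration.

```python
from typing import Iterator, List, Tuple

WINDOW = 1_000_000      # window size in bp (from the text)
STEP = 200_000          # sliding step in bp (from the text)
MIN_CALLABLE = 100_000  # minimum callable bases per window (from the text)


def window_heterozygosity(
    het_positions: List[int],
    callable_intervals: List[Tuple[int, int]],
    chrom_length: int,
    window: int = WINDOW,
    step: int = STEP,
    min_callable: int = MIN_CALLABLE,
) -> Iterator[Tuple[int, int, float]]:
    """Yield (start, end, het/bp) for every window with enough callable bases.

    het_positions: 0-based heterozygous positions (hypothetical input layout).
    callable_intervals: half-open [start, end) callable regions (hypothetical).
    Windows at the chromosome end are truncated (an assumption of this sketch).
    """
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)
        # Number of callable bases that overlap this window.
        n_callable = sum(
            max(0, min(e, end) - max(s, start)) for s, e in callable_intervals
        )
        if n_callable < min_callable:
            continue  # window discarded: callable region too small
        # Heterozygous sites inside the window that lie in a callable interval.
        n_het = sum(
            1
            for p in het_positions
            if start <= p < end and any(s <= p < e for s, e in callable_intervals)
        )
        yield start, end, n_het / n_callable
```

The sketch favours readability over speed; for whole genomes an interval index or sorted sweep would be preferable.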
To avoid coverage differences between samples, I removed variants in sample-specific
non-callable regions, obtained with the GATK CallableLoci tool with a minimum base
quality of 20 and minimum and maximum depths based on each sample's coverage
distribution, taken as the mean ± 5 of the autosomal read depth.
Runs of homozygosity (ROHs) are regions with a lower heterozygosity rate. For each
sample, ROHs were computed with a non-overlapping window size of 1 Mb. Depending
on their length, ROHs may be indicative of historical population demographics and
homozygosity by descent (Li et al. 2006; Hamzić 2011). Long ROHs (> 1 Mb) are
indicative of autozygosity, inbreeding or admixture (Boyko et al. 2010; Pilot et al.
2014). Due to this association with recent demography, I conservatively
called a ROH only when at least two consecutive windows (≥ 2 Mb) fell under a
heterozygosity cutoff of 0.0005 (based on the heterozygosity distribution; Figure 2,
Appendix 2).
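The ROH criterion just described (at least two consecutive 1 Mb windows under the 0.0005 cutoff) could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the input is a list of per-window heterozygosity values for one chromosome, and windows without enough callable bases (given as None) are assumed to interrupt a run.

```python
from typing import List, Optional, Tuple

HET_CUTOFF = 0.0005     # heterozygosity cutoff (from the text)
MIN_WINDOWS = 2         # at least two consecutive windows (from the text)
WINDOW_BP = 1_000_000   # non-overlapping window size in bp (from the text)


def find_rohs(
    window_het: List[Optional[float]],
    cutoff: float = HET_CUTOFF,
    min_windows: int = MIN_WINDOWS,
    window_bp: int = WINDOW_BP,
) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) for each run of consecutive low-het windows.

    window_het[i] is het/bp of the i-th non-overlapping window, or None when
    the window was discarded (assumed here to break a run).
    """
    rohs: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    # A trailing None acts as a sentinel that closes a run reaching the end.
    for i, het in enumerate(list(window_het) + [None]):
        low = het is not None and het < cutoff
        if low and run_start is None:
            run_start = i
        elif not low and run_start is not None:
            if i - run_start >= min_windows:
                rohs.append((run_start * window_bp, i * window_bp))
            run_start = None
    return rohs
```

For example, with windows `[0.0001, 0.0002, 0.002, 0.0003, 0.001, 0.0001, 0.0001, 0.0004]` the single low window at index 3 is ignored, while the runs covering indices 0-1 and 5-7 are reported.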
To calculate the inbreeding coefficient based on runs of homozygosity, the following
definition of FROH was applied (Keller et al. 2011):
$$F_{ROH} = \frac{\sum_{k} \mathrm{length}(ROH_k)}{L_j}$$
where ROHk is the kth ROH of individual j and Lj is the genome length of individual j.
The genome length of each sample was computed using the callable bases in the
considered windows.
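As a small worked illustration of this definition, the following Python sketch sums the ROH lengths of one individual and divides by its genome length. The function name and the example numbers are hypothetical; in this work the genome length corresponds to the callable bases in the considered windows.

```python
from typing import List, Tuple


def f_roh(rohs: List[Tuple[int, int]], genome_length: int) -> float:
    """Inbreeding coefficient: total ROH length divided by genome length.

    rohs: (start_bp, end_bp) half-open coordinates of each ROH (hypothetical layout).
    genome_length: L_j, here assumed to be the callable bases of the sample.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(end - start for start, end in rohs) / genome_length


# Hypothetical example: 5 Mb in ROHs over 20 Mb of callable genome.
example = f_roh([(0, 2_000_000), (5_000_000, 8_000_000)], 20_000_000)
```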
3.5. Ancestry analysis
For the ancestry analysis, non-biallelic and missing markers were removed with a
custom Python script, and the remaining markers were filtered by MAF (threshold 0.01)
and LD-pruned using PLINK version 1.07 (Purcell et al. 2007), with a sliding-window
size of 50 SNPs (10 overlap) and r2=0.5.
With this pruned dataset (4,558,774 SNPs), I performed an ADMIXTURE (Alexander
et al. 2009) analysis, which uses the same statistical model as STRUCTURE (Pritchard
et al. 2000). To assess the error, the program was run 5 times with K between 2 and 10
and 5-fold cross-validation (Alexander & Lange 2011). To visualize the relationships
in this genotype data, a Principal Component Analysis (PCA) was performed
using the smartPCA program implemented in the EIGENSOFT package version 5.0.1
(Price et al. 2006).
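The marker filters that precede LD pruning can be sketched in Python as below. This is not the custom script used in this work: the function names and the genotype encoding (number of alternative alleles per sample at an already biallelic SNP, None for a missing call) are assumptions, as is the reading of the MAF filter as discarding SNPs with MAF below 0.01.

```python
from typing import Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF of a biallelic SNP from diploid genotypes coded 0, 1 or 2."""
    n_alleles = 2 * len(genotypes)
    alt_freq = sum(genotypes) / n_alleles
    return min(alt_freq, 1.0 - alt_freq)


def keep_snp(genotypes: Sequence[Optional[int]], min_maf: float = 0.01) -> bool:
    """Keep a SNP only if no genotype is missing and MAF >= min_maf.

    Assumption: the SNP is already known to be biallelic, and the MAF
    filter removes SNPs whose MAF is below the threshold.
    """
    if not genotypes or any(g is None for g in genotypes):
        return False
    return minor_allele_frequency(genotypes) >= min_maf
```

Under this encoding a monomorphic site such as `[0, 0, 0, 0]` is dropped, as is any site with a missing call.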
To check and improve the ancestry results for the Iberian wolves, I combined the
present work’s data with a 48K dataset from previous works (Boyko et al. 2010;
vonHoldt et al. 2010, 2011) to increase the number of samples. These data come from
the Affymetrix Canine version 2 genome-wide SNP mapping array, which uses
CanFam2 assembly coordinates. For this reason, each sample was mapped to this
assembly and SNPs were called again as previously described. After joining both datasets,
filtering by MAF and LD-pruning with PLINK using the same parameters as above,
I obtained a set of 43,497 SNPs. This dataset was used to repeat the
ADMIXTURE (only 3 runs) and PCA analyses as described above.
3.6. Hybridization analysis
To confirm the high level of admixture between the Sierra Morena wolf and dogs, the
alleles shared between each Iberian wolf and the other samples were estimated. The
percentage of shared alleles for each sample pair was computed by dividing by the total
number of alleles present in both samples. For this analysis, I included all confident
SNPs called (unpruned dataset, 15,807,997 SNPs, without non-biallelic and missing
markers). For the Iberian wolf population I also computed the alleles shared between all
samples, drawing a four-set Venn diagram with the VennDiagram R package version 1.6.5
(Chen & Boutros 2011).
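One possible reading of the shared-allele percentage is sketched below in Python. The denominator is interpreted here as the number of distinct alleles observed in either sample of the pair, which is an assumption; the function names and the genotype layout (a mapping from site to the two called bases) are also hypothetical.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]          # (chromosome, position)
Allele = Tuple[str, int, str]   # (chromosome, position, base)


def sample_alleles(genotypes: Dict[Site, Tuple[str, str]]) -> Set[Allele]:
    """Set of distinct alleles carried by one sample (hypothetical layout)."""
    return {
        (chrom, pos, base)
        for (chrom, pos), pair in genotypes.items()
        for base in pair
    }


def percent_shared(a: Set[Allele], b: Set[Allele]) -> float:
    """Percentage of alleles present in both samples.

    Assumption: the total is the union of the alleles of the two samples.
    """
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


# Hypothetical two-site example.
wolf = sample_alleles({("chr1", 100): ("A", "G"), ("chr1", 200): ("C", "C")})
dog = sample_alleles({("chr1", 100): ("A", "A"), ("chr1", 200): ("C", "T")})
shared = percent_shared(wolf, dog)
```

In this toy example each sample carries three distinct alleles, two of which are common to both, out of four distinct alleles in total.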
To determine dog-introgressed regions in the highly admixed Iberian wolf samples (Sierra
Morena and Wolf Spain), PCAdmix version 1.0 (Brisbin et al. 2012) was used with a
50-SNP window size. Because this program needs phased genotypes, the complete
pruned dataset was phased using SHAPEIT version 2.644 (Delaneau et al. 2013). To
detect blocks of ancestry (haplotypes assigned to ancestral populations), PCAdmix was
run with the 11 dogs as one ancestral population and the 6 Eurasian wolves plus Wolf
Portugal and Wolf EEP as the second. To assess the result for both samples, the same
analysis was subsequently performed for Wolf EEP and Wolf Portugal, excluding from
the ancestral population only the sample treated as admixed. From the 50-SNP block
assignments of the four samples, the overall percentages of Dog/Dog, Wolf/Dog and
Wolf/Wolf haplotypes were computed. To analyse the relationship of each kind of block
with diversity, I fitted a linear model of the incidence of each class against the mean
heterozygosity in each chromosome using the lm function implemented in R (version 3.1.0).
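The two summary steps at the end of this paragraph can be sketched in Python. This is not the PCAdmix output parser or the R code used in this work: the per-block ancestry labels ("Dog" or "Wolf" for each of the two phased haplotypes), the function names, and the choice of heterozygosity as the response of the simple least-squares fit are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# Class name indexed by the number of dog-assigned haplotypes in a block.
CLASSES = ("Wolf/Wolf", "Wolf/Dog", "Dog/Dog")


def block_class_percentages(
    hap1: Sequence[str], hap2: Sequence[str]
) -> Dict[str, float]:
    """Percentage of blocks in each diploid ancestry class.

    hap1, hap2: ancestry label of every block on each phased haplotype
    (hypothetical layout; labels are "Dog" or "Wolf").
    """
    if len(hap1) != len(hap2) or not hap1:
        raise ValueError("haplotypes must be non-empty and of equal length")
    counts: Counter = Counter()
    for a, b in zip(hap1, hap2):
        n_dog = (a == "Dog") + (b == "Dog")
        counts[CLASSES[n_dog]] += 1
    return {name: 100.0 * counts[name] / len(hap1) for name in CLASSES}


def least_squares_fit(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """Slope and intercept of y ~ x by ordinary least squares.

    Assumption: x is the per-chromosome percentage of one block class and
    y the per-chromosome mean heterozygosity.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A positive slope for a given class would indicate that chromosomes richer in that class tend to be more heterozygous; significance testing, as provided by lm in R, is outside the scope of this sketch.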
For comparison purposes, a microsatellite analysis of Sierra Morena and Wolf Spain
was performed, including the genotyping of 10 autosomal markers following the protocol of
Sastre et al. (2010). Using a dataset that includes 31 Iberian wolves and 32 dog samples
(Sastre 2011), a Bayesian model-based clustering approach implemented in
STRUCTURE version 2.0 (Falush et al. 2007) was applied, running 100,000 Markov
chain Monte Carlo repetitions and a burn-in period of 10,000 iterations for K=2.
Figure 2. Diversity distribution. Density (a) and box plots (b) of heterozygosity in Iberian
samples using 1Mb 200kb-overlapping windows. Dotted lines mark the cutoff used to
define inbred windows.
4. RESULTS
4.1. Heterozygosity and inbreeding
Considering only Iberian wolves, Sierra Morena has the lowest mean heterozygosity
rate (0.00109 het/bp, Table 1), with 41.85% of sliding windows falling into inbred
regions (Figures 2 and 3, Appendix 3); Wolf Portugal seems to share the same pattern
(0.00118 het/bp). The mean heterozygosity rate observed in the genome sequences of
the Eurasian wolves is 0.0016 het/bp (except Wolf Italy, where it is the lowest;
Appendixes 2, 3 and 4), consistent with other genome-wide studies (Lindblad-Toh et al.
2005; Freedman et al. 2014). Dog samples have a reduced heterozygosity (0.00088 het/bp;
Appendixes 2, 3 and 4), which varies across breeds as previously described
(Lindblad-Toh et al. 2005; Freedman et al. 2014).
Table 1. Results for Iberian samples.
Sample         Population    Cov    Het         FROH  %Dog blocks
Sierra Morena  South Spain   43.94  0.00109271  0.42  31.88
Wolf Spain     Northwestern  22.68  0.00154275  0.15  14.30
Wolf Portugal  Northwestern  24.30  0.00118270  0.30   2.94
Wolf EEP       Northwestern  22.74  0.00146647  0.15   3.20
Cov: autosomal coverage; Het: heterozygosity (het/bp); FROH: inbreeding coefficient;
%Dog blocks: percentage of dog ancestry blocks
Figure 3. Diversity analysis. Heterozygosity in Iberian samples using 1Mb 200kb-overlapping
windows (blue lines). Dotted lines point out the median of each sample and red blocks are
runs of homozygosity (ROHs) using 1Mb non-overlapping windows. Details for each sample in
Appendix 3.
ROHs appear in all Iberian wolves (Figure 4), but Sierra Morena has chromosomes
that are almost entirely homozygous (Figure 3, more details in Appendix 3). This sample
shows the largest ROHs, at 40-60 Mbp, and its cumulative curve is the highest of the
Iberian samples. Although Wolf Portugal also has runs longer than 40 Mbp
(Figure 4b), its distribution is quite similar to that of the other Iberian wolves. Wolf
Spain and Wolf EEP show similar cumulative curves of ROH length (Figure 4a), and almost
all of their runs of homozygosity are shorter than 30 Mbp.
Figure 4. Runs of homozygosity. Cumulative (a) and total (b) counts for runs
of homozygosity (ROHs) in Iberian samples computed using 1Mb non-overlapping
windows. Note that inbred regions shorter than 2Mb are in the plot but
not considered as ROH.
The inbreeding coefficient analysis, calculated with FROH, leads to the same result: Sierra
Morena is the most inbred Iberian wolf (FROH = 0.42), followed by Wolf Portugal
(FROH = 0.30). Wolf Spain and Wolf EEP have an inbreeding coefficient (FROH = 0.15)
which is half that of the most inbred Northwestern Iberian sample (Table 1). The FROH of
Eurasian and American wolves (Appendix 2), except Wolf Italy (FROH = 0.51), Wolf
China (FROH = 0.23) and both Wolf Mexico samples (A and B, FROH = 0.70), is much
lower than that of Wolf Spain and Wolf EEP, with inbreeding coefficients between 0.01
and 0.09. On the other hand, dogs have a FROH in the range 0.20-0.44
(depending on the breed), higher than the Northwestern Iberian wolves and around
Sierra Morena’s value.
4.2. Hybridization patterns
The 48K-merged and the present work’s datasets yield similar results in the PCA.
Wolf Portugal and Wolf EEP cluster with Iberian wolves and near other Eurasian
populations (Figure 5). Nevertheless, Wolf Spain and Sierra Morena are shifted from
this cluster towards dogs along PC1, which differentiates American wolves,
Eurasian wolves and dogs well (Appendix 5). In the present work’s dataset, PC4
distinguishes Eurasian populations better, but shows the same pattern in the Iberian
wolves (Appendix 5).
The ADMIXTURE analysis shows the same pattern of hybridization with dogs (Figure
6). Because the 48K dataset (Boyko et al. 2010; vonHoldt et al. 2010, 2011) comes
from a microarray that maximizes dog variability and has more samples, I detect
dog ancestry in Wolf Spain better in the 48K-merged (Figure 7) than in the present
work’s dataset (Figure 6), although this component already appears at K=2. The
cross-validation error for both datasets (Appendix 6) reflects these differences: the
best-supported number of clusters is K=9 for the 48K-merged dataset and K=2 for the
present work’s dataset.
From the ADMIXTURE analysis at K=2, our samples have the following percentages for
the dog component (this study and 48K-merged datasets, respectively): Sierra Morena
31.51% and 36.94%, Wolf Spain 10.43% and 17.69%, Wolf Portugal 0.00% and 4.47%,
Wolf EEP 0.00% and 3.28%. Notably, the 48K-merged dataset detects a higher
percentage of introgression in every sample. It is likely that this bias is caused by the
microarray design, which takes into account only the variability of the dog genome.
In contrast, for the two admixed samples (Sierra Morena and Wolf Spain), the
STRUCTURE analysis using microsatellites yields values of 42.4% and 0.6% of dog
component (Figure 8), respectively. These results are very different from those obtained
with genomic data, suggesting wrong estimations due to the low number of markers.
Figure 5. Principal component analysis. Principal component analysis (PCA) of
Sierra Morena (red), Wolf Portugal (blue), Wolf Spain (green) and Wolf EEP
(orange) with the 48K-merged dataset samples (a) and samples from this
work (b, c). In c), SNPs from dog blocks in Sierra Morena and Wolf Spain (Figure 9)
are removed (note that in this case Iberian samples cluster closer).
Due to this displacement towards dogs, I analyse haplotype blocks of dog and Eurasian
ancestry in the admixed samples (Figure 9). The result shows that almost a third of
Sierra Morena’s genome (31.88 % of 50 SNP) comes from dogs, doubling the Wolf
Spain’s dog ancestry (14.30%; Table 1, Figure 9). Moreover, the ancestry pattern
between both samples is different: Sierra Morena has long dog haplotypes present at the
same region in both chromosomes, whereas in Wolf Spain they are shorter and both
chromosomes shows a different distribution (Figure 10a). Wolf Portugal and Wolf EEP
have only 3% dog ancestry (Table 1), a result that validates the method. These values are
close to the percentage of the ADMIXTURE dog component and indicate an accurate
estimation of the hybridization patterns with our genomic data, in contrast with
microsatellites.

Figure 6. ADMIXTURE analysis of the present-work’s dataset. Cross-validation error
(Appendix 6) shows that the best clustering is K=2, which differentiates dogs and wolves.
Nevertheless, this analysis also differentiates North American (K=3), South American (K=4),
Asian and European (K=5, K=7) and Iberian (K=8) wolves. Moreover, relationships between
dog breeds are reflected in K=6-9.

Figure 7. ADMIXTURE analysis of the 48K-merged dataset. Cross-validation error
(Appendix 6) shows that the best clustering is K=9, which differentiates dogs and
different wolf populations: Asian (ASW), Central European (CEW), Italian (ITW), Iberian
(IBW) and 3 different American (AMW) populations (left panel). On the right, zoom of
Iberian samples analysed in this work.

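The block-based ancestry percentages reported above can be illustrated with a minimal
sketch. It assumes that every window on the two phased haplotypes of a chromosome has
already been labelled as dog ("D") or wolf ("W"), for example by PCAdmix. The string
encoding, the function names and the toy values are hypothetical and are not the data
of this work.

```python
from collections import Counter


def classify_windows(hap_a, hap_b):
    """Label each window by the ancestry of its two haplotypes."""
    labels = []
    for a, b in zip(hap_a, hap_b):
        if a == "D" and b == "D":
            labels.append("Dog/Dog")
        elif a == "W" and b == "W":
            labels.append("Wolf/Wolf")
        else:
            labels.append("Wolf/Dog")
    return labels


def dog_ancestry_percentage(hap_a, hap_b):
    """Percentage of haplotype windows assigned to dog ancestry."""
    calls = list(hap_a) + list(hap_b)
    return 100.0 * calls.count("D") / len(calls)


# Toy chromosome with 8 windows per haplotype (invented labels).
hap_a = "DDDWWWWW"
hap_b = "DDWWWWWD"

class_counts = Counter(classify_windows(hap_a, hap_b))
dog_pct = dog_ancestry_percentage(hap_a, hap_b)
print(class_counts, dog_pct)
```

In this toy case, long Dog/Dog runs at the same position on both haplotypes would
correspond to the Sierra Morena pattern, whereas isolated Wolf/Dog windows would
correspond to the Wolf Spain pattern.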
The linear models between heterozygosity and haplotype classes (Dog/Dog, Wolf/Dog and
Wolf/Wolf) indicate that, in the hybrid samples, chromosomes with a higher percentage of
mixed-ancestry (Wolf/Dog) blocks tend to be more genetically variable (Figure 10b,
Appendix 7). This result suggests that the hybrid regions in Wolf Portugal increase
heterozygosity through introgression.
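As an illustration of this kind of model, the following minimal sketch fits an ordinary
least-squares line of heterozygosity on the percentage of Wolf/Dog blocks, with one
point per chromosome. The values are invented for the example and the function name is
hypothetical; the models actually fitted are detailed in Appendix 7.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Invented values: percentage of Wolf/Dog blocks and heterozygosity
# (arbitrary units) for five chromosomes of one sample.
wolf_dog_pct = [0.0, 10.0, 20.0, 30.0, 40.0]
heterozygosity = [1.0, 1.3, 1.3, 1.7, 1.7]

slope, intercept, r_squared = fit_line(wolf_dog_pct, heterozygosity)
print(slope, intercept, r_squared)
```

A positive slope means that chromosomes carrying more mixed-ancestry blocks are more
variable, which is the pattern described for the hybrid samples.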
After removing the SNP windows that present a dog haplotype in at least one of the
admixed samples, the PCA shows the same clusters as the previous one (Figure 5c). In
this case, Wolf Spain joins the Iberian cluster (Wolf Portugal and Wolf EEP), and Sierra
Morena is close to them. However, Sierra Morena remains displaced towards dogs, even on
PC4 (Appendix 5), which better explains the Eurasian variation.
Furthermore, using all confident markers, Sierra Morena shares around 70% of alleles with
dogs, which is almost 1% more than Wolf Spain and 3% more than the non-admixed samples
(Wolf Portugal and Wolf EEP) (Figure 11a). On the other hand, of all the alleles present
in the Iberian population (22,959,835 out of 31,616,032 in the dataset), Sierra Morena
has 5% singletons, compared with 3% in the rest (Figure 11b). Moreover, exclusive alleles
shared between Sierra Morena and Wolf Spain are slightly more frequent (around 0.5% more)
than those shared between Sierra Morena and the other Iberian wolves. Both results are
consistent with the high hybridization level of Sierra Morena and a small amount of dog
introgression in Wolf Spain.

Figure 8. STRUCTURE analysis of introgressed wolves. Probabilistic assignment to the
genetic clusters inferred by Bayesian analysis with K=2 of dog, Iberian wolves (IBW) and the
hybrid samples Wolf Spain (WS) and Sierra Morena (SM).

Figure 9. Ancestry across the chromosome. Ancestry blocks from dogs (red) and Eurasian
wolves (blue) in Sierra Morena (a), Wolf Spain (b), Wolf Portugal (c) and Wolf EEP (d). In
the legend, N represents the number of individuals used as ancestral population.

Figure 10. Analysis of haplotype blocks. Haplotype class block frequency (a) and
heterozygosity-percentage linear model (b, details in Appendix 7) by sample and class.
Each point represents a chromosome.

Figure 11. Iberian shared alleles. Shared alleles between samples in the present-work’s
dataset by pairs (a) and between the four Iberian samples (b). Percentage is calculated as
the number of shared alleles divided by the total number of alleles in the considered
samples.

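The shared-allele percentage defined in the caption of Figure 11 can be sketched as
follows. The sketch assumes that each sample is represented as a set of
(chromosome, position, allele) tuples; this representation, the function names and the
toy alleles are hypothetical and only illustrate the calculation.

```python
def shared_allele_percentage(*samples):
    """Shared alleles divided by all alleles present in the considered samples."""
    shared = set.intersection(*samples)
    total = set.union(*samples)
    return 100.0 * len(shared) / len(total)


def singleton_percentage(sample, others):
    """Alleles found only in `sample`, as a percentage of all alleles in the group."""
    total = set.union(sample, *others)
    private = sample.difference(*others)
    return 100.0 * len(private) / len(total)


# Invented alleles for two samples.
sample_a = {("chr1", 100, "A"), ("chr1", 100, "G"), ("chr1", 250, "T"), ("chr2", 40, "C")}
sample_b = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "C"), ("chr2", 40, "G")}

shared_pct = shared_allele_percentage(sample_a, sample_b)
singleton_pct = singleton_percentage(sample_a, [sample_b])
print(shared_pct, singleton_pct)
```

The same functions accept more than two samples, which corresponds to the comparison
between the four Iberian samples in panel (b).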
5. DISCUSSION
The wolf population of Northwestern Iberia has been extensively studied in many aspects
(Ramírez et al. 2006; Sastre et al. 2010; Godinho et al. 2011; Sastre 2011; vonHoldt et
al. 2011; Pilot et al. 2014), but this work includes for the first time the variability
present in the south of the Iberian Peninsula. Although only one sample from this
population was analysed, it could be the last individual, given the high extinction risk
of an isolated one-pack group (Padial et al. 2000; Silva et al. 2013) composed of a single
breeding pair, their offspring of the year and occasional older offspring (Randi 2011).
Because of this small size and the controversy about the existence of the South Iberian
wolf, I use “population” to refer to the data from the Sierra Morena individual.
Comparing the genetic patterns of both populations helps to understand the dynamics of
the Iberian wolf and its current conservation status. This study analyses the first NGS
data from wolves in the context of conservation and population genetics; thus we can
investigate heterozygosity, inbreeding and hybridization patterns in depth for each
individual.
A major concern in wolf conservation genetics is the extensive hybridization between
wolves and wild or domestic canids (Rhymer & Simberloff 1996; Randi 2011).
Hybridization is a documented threat to canids, including the Ethiopian wolf with dogs
(Gottelli et al. 1994), and the red wolf (Adams et al. 2003), the Great Lakes wolf
(Leonard & Wayne 2008) and other North American wolves with coyotes (Roy et al.
1994). In Europe, hybridization between declining or expanding wolf populations and
their domestic counterparts is an important threat (Randi 2011), and many hybrids have
been reported using a small number of genetic markers (Randi et al. 2000, 2014; Randi &
Lucchini 2002; Andersone et al. 2002; Verardi et al. 2006; Sundqvist 2008; Godinho et
al. 2011; Hindrikson et al. 2012). Only one previous study (Godinho et al. 2011)
provides information about hybridization between wolves and dogs in the Iberian
Peninsula, reporting a 4% hybridization occurrence (8 individuals) in the
Northwestern population using 42 autosomal markers. In Godinho et al. (2011), some of
the introgressed samples were selected because they presented dog phenotypic traits.
Most of the hybrid individuals show a 50% dog component and only two samples have a
component lower than 20% (specifically, 15.2% and 18.1%). Both individuals carry a
dog-like Y-chromosome, which indicates the direction of the cross and suggests that the
introgression is recent and thus more detectable. In the present work, one of the two
wild samples from the Northwestern population shows a minor level of hybridization
(around 15%), whereas Sierra Morena has a third of its genome introgressed by
dog (Table 1, Figure 9). Moreover, the mtDNA of both individuals (data not shown)
presents a w1 wolf haplotype (from Vilà et al. 1999), which supports the predominant
direction of wolf-dog hybridization detected in previous works (Vilà et al. 2003b;
Godinho et al. 2011). Nevertheless, with 10 microsatellite markers I cannot detect the
15% hybridization level obtained with whole-genome data for Wolf Spain (Figure 8).
Conversely, Sierra Morena’s dog component with microsatellites is almost 50%, far from
the proportion of around 30% obtained with ADMIXTURE and PCAdmix (Figures 6, 7 and 9).
In addition, Wolf Spain and Sierra Morena had no hybrid phenotypic characteristics,
contrary to the observations of Godinho et al. (2011). The results of this work suggest
that the use of microsatellite data might underestimate the hybridization incidence in
populations with at least 15% introgression; on the other hand, it might overestimate
this parameter in the most introgressed samples. Further whole-genome information from
the Iberian Peninsula will help to understand the proportion of hybridization in the
Northwestern population and to estimate the hybridization occurrence accurately.
Because of the coexistence with feral and domestic dogs, hybridization has an important
effect on small wolf populations like those of the Italian and Iberian Peninsulas
(Verardi et al. 2006; Randi 2008; Godinho et al. 2011; Randi et al. 2014) and also on
expanding populations in other European regions (Andersone et al. 2002; Sundqvist
2008; Hindrikson et al. 2012). Feral organisms might have an impact on the structure of
local communities, leading to loss of genetic diversity (Allendorf et al. 2001).
Moreover, introgression of dog genes can decrease the adaptive potential of the hybrids
and lead to extinction (Rhymer & Simberloff 1996). Introgressive hybridization would
enhance genetic homogenization, disrupting local genetic adaptation.
Habitat modification is increasing because of anthropogenic action, and this leads to
fragmentation and isolation of many populations. Individuals in these small and isolated
populations in contact with their domestic counterparts are more likely to hybridize
because of the difficulty of finding mates of the same species. This is very important in
the South Spain population, where the effective size is very small (Silva et al. 2013) and
the habitat is close to human settlements (Cuesta et al. 1991; Llaneza et al. 1996; Vos
2000; Blanco & Cortés 2007). When introgression occurs, a relatively greater fraction
of a small population would hybridize each generation, increasing the introgression
rate even more (Rhymer & Simberloff 1996).
On the other hand, genetic diversity is important in conservation genetics because of its
effects on inbreeding depression and the disruption of local adaptation
(Allendorf et al. 2010). Heterozygosity varies widely across the Iberian samples (Table 1,
Figures 2 and 3): in Wolf Spain it is similar to other European wolf populations
(Appendixes 2, 3 and 4), whereas Wolf Portugal has a lower rate; Wolf EEP, the captive
sample, is as variable as Wolf Spain, and the diversity of Sierra Morena is the
lowest, but very close to Wolf Portugal. Nevertheless, the four Iberian samples have a
higher inbreeding coefficient than the Eurasian populations (except Wolf China and
Wolf Italy; Appendix 2). Eurasian wolves have inbreeding coefficients between 0.01
and 0.09, except Wolf China (0.23) and Wolf Italy (0.51). The Wolf China sample was
previously described (Freedman et al. 2014) as showing the same diversity and ROHs; the
Italian wolf is known to have passed through a severe bottleneck with genetic effects
(Lucchini et al. 2004; Fabbri et al. 2007; Randi 2008) that explains its inbreeding
pattern. Genetic evidence of a bottleneck in the Iberian population has been reported
(Sastre et al. 2010; Sastre 2011), and it explains the inbreeding coefficients obtained
for the Iberian samples in the present work.
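For reference, an inbreeding coefficient based on runs of homozygosity (FROH) is
commonly computed as the fraction of the genome covered by ROHs. The following is a
minimal sketch of that calculation, not the exact procedure of this work: the segment
format, the minimum ROH length and the genome length are assumptions chosen only for
the example.

```python
def f_roh(roh_segments, genome_length, min_length=1_000_000):
    """Fraction of the genome covered by ROHs of at least `min_length` bp.

    `roh_segments` is an iterable of (chromosome, start, end) tuples in bp.
    The 1 Mb default threshold is an assumption for illustration only.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    total = 0
    for _chrom, start, end in roh_segments:
        length = end - start
        if length >= min_length:
            total += length
    return total / genome_length


# Invented ROH calls on a toy 20 Mb genome.
segments = [
    ("chr1", 0, 3_000_000),
    ("chr1", 10_000_000, 10_500_000),  # shorter than the threshold, ignored
    ("chr2", 5_000_000, 7_000_000),
]
toy_f_roh = f_roh(segments, genome_length=20_000_000)
print(toy_f_roh)
```

Under this definition, a sample such as Wolf Italy (0.51) would have about half of its
analysed genome inside ROHs.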
Although the EEP studbook indicates that Wolf EEP should have an inbreeding
coefficient near 0, because only a few generations of crosses have taken place, Wolf EEP
has the same inbreeding coefficient as Wolf Spain (FROH = 0.15). This value can be
explained by the past bottleneck, which reduced the diversity of Iberian individuals
(Sastre et al. 2010; Sastre 2011). An inbreeding coefficient of 0.125, close to the value
of Wolf Spain, is produced by matings between grandparent and grandchild, half-siblings,
or uncle and niece (assuming non-inbred parents). Wolf Portugal, with its lower
heterozygosity rate, has a higher inbreeding coefficient (0.30), which is likely to
involve matings between closely related wolves with the same inbreeding coefficient as
Wolf Spain and Wolf EEP. In a population with a small effective size, inbreeding
cannot be avoided (Randi 2011), as the very high FROH (0.42) of Sierra Morena
demonstrates. This value is close to that expected under continuous mating between very
close relatives (parent/offspring, full siblings and double first cousins in first
degree), so the number of individuals must be very small. The autozygosity of Sierra
Morena leads to a loss of genetic variation and to inbreeding depression, which can make
the population disappear, a threat that is likely to occur in the very inbred Iberian
population.
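The expected values quoted above follow from the fact that the inbreeding coefficient of
an individual equals the kinship coefficient of its parents. A minimal sketch of the
standard recursive kinship calculation on a small pedigree is given below. The pedigree,
the individual names and the helper function are hypothetical, and founders are assumed
to be unrelated and non-inbred.

```python
from functools import lru_cache


def make_kinship(pedigree):
    """Return a kinship function for a pedigree given as {name: (sire, dam)}.

    Assumptions: parents are listed before their offspring, and the unknown
    parents of founders are given as None.
    """
    order = {name: i for i, name in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0  # unknown founders are treated as unrelated
        if a == b:
            sire, dam = pedigree[a]
            return 0.5 * (1.0 + kinship(sire, dam))
        if order[a] < order[b]:
            a, b = b, a  # always expand the individual listed later
        sire, dam = pedigree[a]
        return 0.5 * (kinship(sire, b) + kinship(dam, b))

    return kinship


# Hypothetical pedigree: GF and GM are the parents of S1 and S2 (full siblings),
# H1 is a half sibling of S1, and N is the daughter of S2 (niece of S1).
pedigree = {
    "GF": (None, None), "GM": (None, None), "X": (None, None), "Y": (None, None),
    "S1": ("GF", "GM"), "S2": ("GF", "GM"),
    "H1": ("GF", "X"),
    "N": ("S2", "Y"),
}
kinship = make_kinship(pedigree)

# Expected inbreeding coefficient of the offspring of each type of mating.
expected_f = {
    "parent/offspring": kinship("GF", "S1"),
    "full siblings": kinship("S1", "S2"),
    "half siblings": kinship("S1", "H1"),
    "uncle/niece": kinship("S1", "N"),
    "grandparent/grandchild": kinship("GF", "N"),
}
for mating, f in expected_f.items():
    print(f"{mating}: F = {f}")
```

These are single-generation expectations; repeated close-relative matings over several
generations raise the coefficient further, because the parents are then inbred
themselves.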
In wolves, inbreeding affects the health of the population (Liberg et al. 2005;
Räikkönen et al. 2013). Loss of genetic diversity (inbreeding depression) reduces
reproduction and survival, increasing extinction risk (Frankham 2005). Inbreeding
depression affects the population’s ability to adapt to environmental change.
Although inbreeding depression can be avoided when selection removes the deleterious
alleles (purging), this effect is weak in small populations, and deleterious alleles of
small effect can drift to fixation (Frankham 2005). Consequently, these alleles increase
in frequency and reduce reproductive fitness (Wright et al. 2007). The South Spain
population has a very small effective size (only one pack; Silva et al. 2013), so
deleterious alleles are likely to be at high frequency, or fixed, because of inbreeding.
Although the estimated population size of Northwestern Iberia does not seem to endanger
its diversity (Silva et al. 2013) because of its recent growth, this population shows
inbreeding coefficients between 0.15 and 0.30 (Table 1), slightly higher than those of
other well-conserved Eurasian populations (Appendix 2). This increase in inbreeding
suggests that the effective size values for the Iberian Peninsula might be overestimated
(for example, by including juveniles; Vilà 2010), as the mating structure implied by the
inbreeding coefficients also suggests. More samples will be necessary to verify this
hypothesis.
Despite the differences in genetic variation, the three Northwestern wolves and Sierra
Morena cluster together when the dog component is removed (Figure 5). Moreover, among
the Northwestern wild individuals, the dog component only appears in the less inbred
one, leading to the conclusion that the increase in heterozygosity and the decrease in
inbreeding coefficient in this sample are due to hybridization (Figure 10). The genetic
integrity of the Iberian population is at risk due to hybridization with dogs, as shown
in the Sierra Morena and Wolf Spain samples. Nevertheless, the introgression patterns of
Sierra Morena and Wolf Spain are different (Figures 9 and 10): introgressed regions in
Wolf Spain are always Dog/Wolf, whereas in Sierra Morena there are also Dog/Dog
haplotypes. This suggests that hybridization in the South population is frequent, and
that almost all remaining individuals have introgression signals in their genomes. On the
other hand, Wolf Spain shows a pattern more likely related to a sporadic hybridization
event.
Here, the regional and continental genetic patterns of wolves detected by
vonHoldt et al. (2011) are confirmed using genome-wide sequences without bias. Shared
alleles show a geographical pattern (Figure 11), even considering only Iberian samples:
the shared percentage decreases with geographical distance, being highest with the
Central Europe sample. However, Sierra Morena has a reduced affinity with other Iberian
samples due to the introgression of dog alleles. Wolf Spain shows an increase of shared
alleles with dogs, although it conserves its affinities with other Iberian wolves. The
current Iberian population can be considered different from other European populations.
The Iberian wolf is known to represent a different subspecies (Cabrera 1907).
Morphometric (Vilà 1993) and genetic (Vilà et al. 1999; Lucchini et al. 2004) studies
describe a notable differentiation between Iberian and Eurasian wolves, which suggests
that they have been separated from all other European wolves for a long time.
Recognizing its evolutionary potential (Crandall et al. 2000), the Northwestern Iberian
population demands separate management. Although previous studies based on a few
molecular markers (Lucchini et al. 2004; Ramírez et al. 2006; Sastre et al. 2010; Sastre
2011) conclude that there is no severe reduction in genetic variability, here it is shown
that the Wolf Portugal sample has an important reduction in diversity (Table 1, Figures
2, 3 and 4), and that in Wolf Spain hybridization reduces the inbreeding coefficient
(Figure 10). These two lines of evidence indicate that the Northwestern Iberian
population is at risk, because of either inbreeding or introgression. Moreover,
compared with other European samples, the inbreeding coefficient is elevated
(Table 1, Appendix 2). If high levels of inbreeding or hybridization are a widespread
pattern in the Iberian Peninsula, these results suggest that the real effective
population size is lower than previous estimates.
6. CONCLUSION
In summary, the similarities between the Sierra Morena individual and the other Iberian
samples included in this Master’s Thesis show that the Northwestern population is at
risk for the same reasons as Sierra Morena. High inbreeding coefficients and
introgression are well-known conservation threats (Rhymer & Simberloff 1996; Frankham
2005; Ouborg 2010; Allendorf et al. 2010; Randi 2011). Both factors are detected in the
Northwestern samples, indicating that the population is not as well conserved as
previously described. Nevertheless, two different patterns have been detected in the two
individuals: Wolf Spain has a heterozygosity rate approximately equal to other Eurasian
populations, but dog introgression is present; on the other hand, Wolf Portugal has an
increased inbreeding coefficient and no hybridization. Analysis of more samples could
reveal the predominant pattern in the Northwestern Iberian population.
The conclusions derived from the present work are the following:
- The South Iberian wolf shows a loss of genomic diversity and extensive dog
  hybridization, which indicates an important extinction risk.
- The Northwestern Iberian wolf has higher diversity and less introgression than the
  South population, but the level still represents a threat to the population.
- Patterns of hybridization differ between the two populations: in the South,
  introgression is frequent and widespread; in the Northwest, it is an occasional event.
- The Northwestern population has an inbreeding coefficient slightly higher than other
  healthy grey wolves as a consequence of the bottleneck suffered in the Iberian
  Peninsula.
- Although more samples are needed, wolf population size seems to be
  overestimated in the Northwestern Iberian Peninsula.
7. REFERENCES
Adams JR, Kelly BT, Waits LP (2003) Using faecal DNA sampling and GIS to monitor
hybridization between red wolves (Canis rufus) and coyotes (Canis latrans).
Molecular ecology, 12, 2175–2186.
Alexander DH, Lange K (2011) Enhancements to the ADMIXTURE algorithm for
individual ancestry estimation. BMC bioinformatics, 12, 246.
Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry
in unrelated individuals. Genome Research, 19, 1655–1664.
Allendorf FW, Hohenlohe PA, Luikart G (2010) Genomics and the future of
conservation genetics. Nature reviews. Genetics, 11, 697–709.
Allendorf FW, Leary RF, Spruell P, Wenburg JK (2001) The problems with hybrids:
setting conservation guidelines. Trends in Ecology & Evolution, 16, 613–622.
Andersone Ž, Lucchini V, Ozoliņš J (2002) Hybridisation between wolves and dogs in
Latvia as documented using mitochondrial and microsatellite DNA markers.
Mammalian Biology - Zeitschrift für Säugetierkunde, 67, 79–90.
Aspi J, Roininen E, Ruokonen M, Kojola I, Vilà C (2006) Genetic diversity, population
structure, effective population size and demographic history of the Finnish wolf
population. Molecular ecology, 15, 1561–76.
Auwera GA Van Der, Carneiro MO, Hartl C et al. (2013) From FastQ Data to
High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline.
In: Current Protocols in Bioinformatics (eds Bateman A, Pearson WR, Stein LD,
Stormo GD, Yates JR), pp. 11.10.1–11.10.33. Hoboken, NJ, USA.
Axelsson E, Ratnakumar A, Arendt M-L et al. (2013) The genomic signature of dog
domestication reveals adaptation to a starch-rich diet. Nature, 495, 360–364.
Bibikov DI (1994) Wolf problem in Russia. Lutreola, 3, 10–14.
Blanco JC, Cortés Y (2007) Dispersal patterns, social structure and mortality of wolves
living in agricultural habitats in Spain. Journal of Zoology, 273, 114–124.
Blanco JC, Reig S, de la Cuesta L (1992) Distribution, status and conservation problems
of the wolf Canis lupus in Spain. Biological Conservation, 60, 73–80.
Boitani L (2003) Wolf conservation and recovery. In: Wolves. Behavior, Ecology, and
Conservation (eds Mech LD, Boitani L), pp. 317–344. The University of Chicago
Press, Chicago.
Bouzat JL (2010) Conservation genetics of population bottlenecks: the role of chance,
selection, and history. Conservation Genetics, 11, 463–478.
Boyko AR, Quignon P, Li L et al. (2010) A simple genetic architecture underlies
morphological variation in dogs. PLoS biology, 8, e1000451.
Breitenmoser U (1998) Large predators in the Alps: The fall and rise of man’s
competitors. Biological Conservation, 83, 279–289.
Brisbin A, Bryc K, Byrnes J et al. (2012) PCAdmix: principal components-based
assignment of ancestry along each chromosome in individuals with admixed
ancestry from two or more populations. Human biology, 84, 343–364.
Busch JD, Waser PM, Dewoody JA (2007) Recent demographic bottlenecks are not
accompanied by a genetic signature in banner-tailed kangaroo rats (Dipodomys
spectabilis). Molecular ecology, 16, 2450–62.
Cabrera A (1907) Los lobos de España. Boletín de la Real Sociedad Española de
Historia Natural, 7, 193–198.
Caniglia R, Fabbri E, Greco C et al. (2013) Black coats in an admixed wolf × dog pack
is melanism an indicator of hybridization in wolves? European Journal of Wildlife
Research, 59, 543–555.
Carmichael LE, Nagy JA, Larter NC, Strobeck C (2001) Prey specialization may
influence patterns of gene flow in wolves of the Canadian Northwest. Molecular
Ecology, 10, 2787–2798.
Chen H, Boutros PC (2011) VennDiagram: a package for the generation of highly-
customizable Venn and Euler diagrams in R. BMC bioinformatics, 12, 35.
Crandall KA, Bininda-Emonds ORP, Mace GM, Wayne RK (2000) Considering
evolutionary processes in conservation biology. Trends in Ecology & Evolution,
15, 290–295.
Cuesta L, Barcena F, Palacios F, Reig S (1991) The trophic ecology of the Iberian Wolf
(Canis lupus signatus Cabrera, 1907). A new analysis of stomach’s data.
Mammalia, 55, 239–254.
Currat M, Ruedi M, Petit RJ, Excoffier L (2008) The hidden side of invasions: massive
introgression by local genes. Evolution, 62, 1908–1920.
Delaneau O, Zagury J-F, Marchini J (2013) Improved whole-chromosome phasing for
disease and population genetic studies. Nature methods, 10, 5–6.
Derrien T, Estellé J, Marco Sola S et al. (2012) Fast computation and applications of
genome mappability. PloS one, 7, e30377.
Fabbri E, Miquel C, Lucchini V et al. (2007) From the Apennines to the Alps:
colonization genetics of the naturally expanding Italian wolf (Canis lupus)
population. Molecular ecology, 16, 1661–1671.
Falconer DS, Mackay TFC (1996) Quantitative genetics. Pearson Education Limited.
Falush D, Stephens M, Pritchard JK (2007) Inference of population structure using
multilocus genotype data: dominant markers and null alleles. Molecular ecology
notes, 7, 574–578.
Frankham R (2005) Genetics and extinction. Biological Conservation, 126, 131–140.
Frankham R, Lees K, Montgomery ME et al. (1999) Do population size bottlenecks
reduce evolutionary potential? Animal Conservation, 2, 255–260.
Freedman AH, Gronau I, Schweizer RM et al. (2014) Genome sequencing highlights
the dynamic early history of dogs. PLoS genetics, 10, e1004016.
Galaverni M, Caniglia R, Fabbri E, Lapalombella S, Randi E (2013) MHC variability in
an isolated wolf population in Italy. The Journal of heredity, 104, 601–612.
Godinho R, Llaneza L, Blanco JC et al. (2011) Genetic evidence for multiple events of
hybridization between wolves and domestic dogs in the Iberian Peninsula.
Molecular ecology, 20, 5154–5166.
Gomerčić T, Sindičić M, Galov A et al. (2010) High genetic variability of the grey wolf
(Canis lupus L.) population from Croatia as revealed by mitochondrial DNA
control region sequences. Zoological Studies, 49, 816–823.
Gottelli D, Sillero-Zubiri C, Applebaum GD et al. (1994) Molecular genetics of the
most endangered canid: the Ethiopian wolf Canis simensis. Molecular ecology, 3,
301–312.
Hamzić E (2011) Division of Livestock Sciences Levels of Inbreeding Derived from
Runs of Homozygosity: A Comparison of Austrian and Norwegian Cattle Breeds.
PhD thesis, University of Natural Resources and Life Sciences: Vienna.
Hindrikson M, Männil P, Ozolins J, Krzywinski A, Saarma U (2012) Bucking the trend
in wolf-dog hybridization: first evidence from Europe of hybridization between
female dogs and male wolves. PloS one, 7, e46465.
Keller L (2002) Inbreeding effects in wild populations. Trends in Ecology & Evolution,
17, 230–241.
Keller MC, Visscher PM, Goddard ME (2011) Quantification of inbreeding due to
distant ancestors and its detection using dense single nucleotide polymorphism
data. Genetics, 189, 237–249.
Leonard JA, Wayne RK (2008) Native Great Lakes wolves were not restored. Biology
letters, 4, 95–98.
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics (Oxford, England), 25, 1754–1760.
Li L, Ho S, Chen C et al. (2006) Long Contiguous Stretches of Homozygosity in the
Human Genome. Human mutation, 27, 1115–1121.
Liberg O, Andrén H, Pedersen H-C et al. (2005) Severe inbreeding depression in a wild
wolf (Canis lupus) population. Biology letters, 1, 17–20.
Lindblad-Toh K, Wade CM, Mikkelsen TS et al. (2005) Genome sequence,
comparative analysis and haplotype structure of the domestic dog. Nature, 438,
803–819.
Llaneza L, Fernández A, Nores C (1996) Dieta del lobo en dos zonas de Asturias
(España) que difieren en carga ganadera. Doñana Acta Vertebrata, 23, 201–214.
Lucchini V, Galov A, Randi E (2004) Evidence of genetic distinction and long-term
population decline in wolves (Canis lupus) in the Italian Apennines. Molecular
Ecology, 13, 523–536.
McKenna A, Hanna M, Banks E et al. (2010) The Genome Analysis Toolkit: a
MapReduce framework for analyzing next-generation DNA sequencing data.
Genome research, 20, 1297–1303.
Mech LD (1970) The wolf: the ecology and behavior of an endangered species
(Natural History Press, Ed,). Doubleday Publishing Co., N.Y.
Mech LD (1995) The challenge and opportunity of recovering wolf populations.
Conservation Biology, 9, 270–278.
Niskanen AK, Kennedy LJ, Ruokonen M et al. (2014) Balancing selection and
heterozygote advantage in major histocompatibility complex loci of the
bottlenecked Finnish wolf population. Molecular ecology, 23, 875–889.
Ouborg NJ (2010) Integrating population genetics and conservation biology in the era
of genomics. Biology letters, 6, 3–6.
Ouborg NJ, Pertoldi C, Loeschcke V, Bijlsma RK, Hedrick PW (2010) Conservation
genetics in transition to conservation genomics. Trends in genetics, 26, 177–187.
Ozolins J, Andersone Z (2001) Status of large carnivore conservation in the Baltic
States. Action plan for the conservation of wolf (Canis lupus) in Latvia. European
Commission: Strasbourg, T-PVS, 73, 1–32.
Padial JM, Contreras FJ, Pérez J, Ávila E, Barea JM (2000) Análisis de la situación y
problemática del lobo (Canis lupus signatus) en Sierra Morena Oriental (Sur de
España). Galemys, 12, 37–44.
Petrucci-Fonseca F (1982) Wolves and stray-feral dogs in Portugal. In: III International
Theriological Congress. Helsinki.
Pilot M, Branicki W, Jedrzejewski W et al. (2010) Phylogeographic history of grey
wolves in Europe. BMC evolutionary biology, 10, 104.
Pilot M, Greco C, vonHoldt BM et al. (2014) Genome-wide signatures of population
bottlenecks and diversifying selection in European wolves. Heredity, 112, 428–
442.
Prado-Martinez J, Hernando-Herraez I, Lorente-Galdos B et al. (2013) The genome
sequencing of an albino Western lowland gorilla reveals inbreeding in the wild.
BMC genomics, 14, 363.
Price AL, Patterson NJ, Plenge RM et al. (2006) Principal components analysis corrects
for stratification in genome-wide association studies. Nature genetics, 38, 904–
909.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of Population Structure Using
Multilocus Genotype Data. Genetics, 155, 945–959.
Purcell S, Neale B, Todd-Brown K et al. (2007) PLINK: A Tool Set for Whole-Genome
Association and Population-Based Linkage Analyses. American Journal of Human
Genetics, 81, 559–575.
Räikkönen J, Bignert A, Mortensen P, Fernholm B (2006) Congenital defects in a
highly inbred wild wolf population (Canis lupus). Mammalian Biology - Zeitschrift
für Säugetierkunde, 71, 65–73.
Räikkönen J, Vucetich JA, Peterson RO, Nelson MP (2009) Congenital bone
deformities and the inbred wolves (Canis lupus) of Isle Royale. Biological
Conservation, 142, 1025–1031.
Räikkönen J, Vucetich JA, Vucetich LM, Peterson RO, Nelson MP (2013) What the
Inbred Scandinavian Wolf Population Tells Us about the Nature of Conservation.
PloS one, 8, e67218.
Ramírez O, Altet L, Enseñat C et al. (2006) Genetic assessment of the Iberian wolf
Canis lupus signatus captive breeding program. Conservation Genetics, 7, 861–
878.
Randi E (2008) Detecting hybridization between wild species and their domesticated
relatives. Molecular ecology, 17, 285–293.
Randi E (2011) Genetics and conservation of wolves Canis lupus in Europe. Mammal
Review, 41, 99–111.
Randi E, Hulva P, Fabbri E et al. (2014) Multilocus detection of wolf x dog
hybridization in Italy, and guidelines for marker selection. PloS one, 9, e86409.
Randi E, Lucchini V (2002) Detecting rare introgression of domestic dog genes into
wild wolf (Canis lupus) populations by Bayesian admixture analyses of
microsatellite variation. Conservation Genetics, 3, 31–45.
Randi E, Lucchini V, Christensen MF et al. (2000) Mitochondrial DNA Variability in
Italian and East European Wolves: Detecting the Consequences of Small
Population Size and Hybridization. Conservation Biology, 14, 464–473.
Rhymer JM, Simberloff D (1996) Extinction by hybridization and introgression. Annual
Review of Ecology and Systematics, 27, 83–109.
Roy M, Geffen E, Smith D, Ostrander E, Wayne R (1994) Patterns of differentiation
and hybridization in North American wolflike canids, revealed by analysis of
microsatellite loci. Molecular biology and evolution, 11, 553–570.
Sastre N (2011) Genética de la conservación: el lobo gris (Canis lupus). PhD thesis,
Universidad Autónoma de Barcelona: Spain.
Sastre N, Vilà C, Salinas M et al. (2010) Signatures of demographic bottlenecks in
European wolf populations. Conservation Genetics, 12, 701–712.
Sidorovich VE, Tikhomirova LL, Jedrzejewska B (2003) Wolf Canis lupus numbers,
diet and damage to livestock in relation to hunting and ungulate abundance in
northeastern Belarus during 1990–2000. Wildlife Biol, 9, 103–111.
Silva JP, Toland J, Hudson T et al. (2013) LIFE and human coexistence with large
carnivores (The EU LIFE Programme - European Commission, Ed,). DG
Environment.
Sundqvist A (2008) Conservation Genetics of Wolves and their Relationship with Dogs.
PhD thesis, Uppsala University: Sweden.
Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations.
Genetics, 105, 437–460.
Tallmon DA, Luikart G, Waples RS (2004) The alluring simplicity and complex reality
of genetic rescue. Trends in ecology & evolution, 19, 489–96.
Thalmann O, Shapiro B, Cui P et al. (2013) Complete mitochondrial genomes of
ancient canids suggest a European origin of domestic dogs. Science, 342, 871–874.
Twyford AD, Ennos RA (2012) Next-generation hybridization and introgression.
Heredity, 108, 179–189.
Valverde JA (1971) El lobo español. Montes, 159, 228–241.
Verardi A, Lucchini V, Randi E (2006) Detecting introgressive hybridization between
free-ranging domestic dogs and wild wolves (Canis lupus) by admixture linkage
disequilibrium analysis. Molecular ecology, 15, 2845–2855.
Vilà C (1993) Aspectos morfológicos y ecológicos del lobo ibérico Canis lupus. PhD
thesis, Universidad de Barcelona: Spain.
Vilà C (2010) Viabilidad de las poblaciones ibéricas de lobos. Enseñanzas de la
genética para la conservación. In: Los lobos de la Península Ibérica. Propuestas
para el diagnóstico de sus poblaciones. (eds Fernández-Gil A, Álvares F, Vilà C,
Ordiz A), pp. 157–171. ASCEL, Palencia, Spain.
Vilà C, Amorim IR, Leonard JA et al. (1999) Mitochondrial DNA phylogeography and
population history of the grey wolf Canis lupus. Molecular Ecology, 8, 2089–2103.
Vilà C, Sundqvist A-K, Flagstad Ø et al. (2003a) Rescue of a severely bottlenecked
wolf (Canis lupus) population by a single immigrant. Proceedings. Biological
sciences / The Royal Society, 270, 91–97.
Vilà C, Walker C, Sundqvist A-K et al. (2003b) Combined use of maternal, paternal
and bi-parental genetic markers for the identification of wolf-dog hybrids.
Heredity, 90, 17–24.
Vilà C, Wayne RK (1999) Hybridization between Wolves and Dogs. Conservation
Biology, 13, 195–198.
vonHoldt BM, Pollinger JP, Earl DA et al. (2011) A genome-wide perspective on the
evolutionary history of enigmatic wolf-like canids. Genome Research, 21, 1294–1305.
vonHoldt BM, Pollinger JP, Lohmueller KE et al. (2010) Genome-wide SNP and
haplotype analyses reveal a rich history underlying dog domestication. Nature,
464, 898–902.
Vos J (2000) Food habits and livestock depredation of two Iberian wolf packs (Canis
lupus signatus) in the north of Portugal. Journal of Zoology, 251, 457–462.
Wang G, Zhai W, Yang H et al. (2013) The genomics of selection in dogs and the
parallel evolution between dogs and humans. Nature communications, 4, 1860.
Wayne RK, Van Valkenburgh B, Kat PW et al. (1989) Genetic and Morphological
Divergence among Sympatric Canids. J. Hered., 80, 447–454.
Wright S (1977) Evolution and the genetics of populations. Vol. 3. Evolution and the
genetics of populations. Univ. of Chicago Press, Chicago, IL.
Wright LI, Tregenza T, Hosken DJ (2007) Inbreeding, inbreeding depression and
extinction. Conservation Genetics, 9, 833–843.
APPENDIX 1
Bioinformatics’ discussion
In this appendix, I present the pipeline diagrams for the analyses and discuss specific
bioinformatics issues.
Pipelines
Diagrams for the three main analyses: mapping and variant calling,
diversity analysis and hybridization analysis. The symbols used in the
pipelines have the following meanings:
Mapping and variant calling
Diversity analysis
Hybridization analysis
Bioinformatics’ discussion
SNP validation
Due to the small size of the dataset, I validated the calls by hard filtering,
following the GATK Best Practices recommendations [1]. The variants that pass the
filters have:
Quality by depth (QD) higher than 2.
Root Mean Square (RMS) of the mapping quality (MQ) higher than 40.
Phred-scaled p-value from Fisher’s Exact Test for strand bias (FS) lower than 60.
Consistency of the site with two segregating haplotypes (HaplotypeScore) lower than 13.
u-based z-approximation from the Mann-Whitney Rank Sum Test for mapping qualities (MQRankSum) higher than -12.5.
u-based z-approximation from the Mann-Whitney Rank Sum Test for the distance from the end of the read for reads with the alternate allele (ReadPosRankSum) higher than -8.
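The filter list above can be sketched as a small Python check on the INFO column of a VCF record. This is only an illustration, not the GATK command used in this work. The function names are hypothetical, the annotation keys are the standard GATK ones, and the two rank-sum cutoffs are written as -12.5 and -8.0, which is an assumption about the intended sign based on the usual GATK hard-filtering recommendation. Annotations that are absent from a record are simply not evaluated, which is also an assumption.

```python
# Hard-filter thresholds: a variant passes when every present annotation
# satisfies its comparison. Negative rank-sum cutoffs are an assumption.
THRESHOLDS = {
    "QD": (">", 2.0),
    "MQ": (">", 40.0),
    "FS": ("<", 60.0),
    "HaplotypeScore": ("<", 13.0),
    "MQRankSum": (">", -12.5),
    "ReadPosRankSum": (">", -8.0),
}


def parse_info(info_field):
    """Turn a VCF INFO string such as 'QD=12.3;MQ=59.8' into a dict of floats."""
    values = {}
    for entry in info_field.split(";"):
        if "=" not in entry:
            continue  # flag-type annotation without a value
        key, value = entry.split("=", 1)
        try:
            values[key] = float(value)
        except ValueError:
            pass  # non-numeric annotation, not used by the filter
    return values


def passes_hard_filter(info_field):
    """Return True when no annotation present in the record fails its cutoff."""
    info = parse_info(info_field)
    for key, (operator, cutoff) in THRESHOLDS.items():
        if key not in info:
            continue  # absent annotation: not evaluated (assumption)
        value = info[key]
        if operator == ">" and not value > cutoff:
            return False
        if operator == "<" and not value < cutoff:
            return False
    return True
```

For example, `passes_hard_filter("QD=1.5;MQ=59.8;FS=1.2")` is rejected because of the low quality by depth.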
Format conversion
In the analysis of the dataset, I used several published programs that work with
different file formats. Although some open programs and scripts address this problem
(for example, VCFtools [2]), I wrote my own scripts to gain a better understanding of
the characteristics of the final dataset (https://github.com/magicDGS/bioConvert). For
instance, a VCF to TPED/TMAP converter written in Python (vcf2tplink.py) removes
non-biallelic SNPs and/or SNPs without GT information.
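As an illustration of this kind of conversion, the sketch below turns one VCF data line into one row of a PLINK transposed genotype (TPED) file. It is not the code of vcf2tplink.py. The function name is hypothetical, and dropping the whole site as soon as one sample lacks a diploid GT call is an assumption about the filtering behaviour.

```python
def vcf_line_to_tped(line):
    """Return a TPED row for one VCF data line, or None if the site is skipped."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # no FORMAT column or no samples
    chrom, pos, snp_id, ref, alt = fields[:5]
    # Keep only biallelic SNPs: one-base REF and exactly one one-base ALT.
    if len(ref) != 1 or len(alt) != 1 or alt == ".":
        return None
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        return None
    gt_index = format_keys.index("GT")
    alleles = {"0": ref, "1": alt}
    calls = []
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index].replace("|", "/").split("/")
        if len(genotype) != 2 or any(a not in alleles for a in genotype):
            return None  # missing or non-diploid genotype: drop the site
        calls.extend(alleles[a] for a in genotype)
    if snp_id == ".":
        snp_id = f"{chrom}_{pos}"  # build an identifier when the VCF has none
    # TPED columns: chromosome, SNP id, genetic distance (0 = unknown), position.
    return " ".join([chrom, snp_id, "0", pos] + calls)
```

A companion file describing the samples is also needed by PLINK; it is left out of this sketch.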
Analytical scripts
The custom Perl and Python scripts used in the analyses of this work are not available
in any repository because they are still being optimized. In addition, R analyses
were done using the plyr [3] and reshape2 [4] packages to manage the data and
ggplot2 [5] to visualize the results. Because I explored the data through the
command-line interface, no R scripts were written.
References
1. Auwera, G. A. Van Der et al. in Curr. Protoc. Bioinforma. (Bateman, A.,
Pearson, W. R., Stein, L. D., Stormo, G. D. & Yates, J. R.) 11.10.1–11.10.33
(2013).
2. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27,
2156–8 (2011).
3. Wickham, H. The Split-Apply-Combine Strategy for Data Analysis. J. Stat.
Softw. 40, 1–29 (2011).
4. Wickham, H. Reshaping Data with the reshape Package. J. Stat. Softw. 21, 1–20
(2007).
5. Wickham, H. ggplot2: elegant graphics for data analysis. (Springer New York,
2009).
APPENDIX 2
Results for non-Iberian samples
Table A2.1 shows the individual results and Table A2.2 the means for each population.
Means are computed considering dog population, Eurasian (excluding Wolf Italy),
North American and South American wolf populations.
Table A2.1. Individual results
Sample                       Species/Region  Population                  Cov    Het         FROH
Wolf Croatia                 Eurasian Wolf   Central/Eastern Europe      6.98   0.00147319  0.09
Wolf China                   Eurasian Wolf   Middle Eastern Europe/Asia  26.36  0.00148438  0.23
Wolf India                   Eurasian Wolf   Middle Eastern Europe/Asia  24.90  0.00181422  0.01
Wolf Iran                    Eurasian Wolf   Middle Eastern Europe/Asia  26.27  0.00178093  0.03
Wolf Israel                  Eurasian Wolf   Middle Eastern Europe/Asia  6.01   0.00150744  0.05
Wolf Italy                   Eurasian Wolf   Italy                       5.81   0.00032140  0.51
Airedale Terrier             Dog             Modern breed                7.33   0.00064358  0.44
Basenji                      Dog             Modern breed                1.35   0.00068557  0.34
Boxer                        Dog             Modern breed                29.33  0.00066418  0.41
Chinese Crested              Dog             Modern breed                19.17  0.00076284  0.41
Chinook                      Dog             Modern breed                7.84   0.00080670  0.39
English Cocker Spaniel       Dog             Modern breed                9.66   0.00104400  0.25
Kerry Blue Terrier           Dog             Modern breed                15.83  0.00068793  0.44
Labrador Retriever           Dog             Modern breed                10.80  0.00110546  0.20
Miniature Schnauzer          Dog             Modern breed                5.47   0.00076737  0.32
Soft Coated Wheaten Terrier  Dog             Modern breed                17.18  0.00070319  0.41
Standard Poodle              Dog             Modern breed                12.63  0.00101575  0.28
Wolf Great Lakes             American Wolf   North America               24.34  0.00183124  0.08
Wolf Yellowstone A           American Wolf   North America               25.73  0.00154630  0.18
Wolf Yellowstone B           American Wolf   North America               24.07  0.00158641  0.13
Wolf Yellowstone C           American Wolf   North America               5.41   0.00148466  0.09
Wolf Mexico A                American Wolf   South America               23.59  0.00003753  0.70
Wolf Mexico B                American Wolf   South America               5.23   0.00012047  0.70
Cov: autosomal coverage; Het: heterozygosity (het/bp); FROH: inbreeding coefficient
Table A2.2. Population results
Population                            N   Mean Het    Mean FROH
Eurasian Wolf (excluding Wolf Italy)  5   0.00161203  0.08
Dogs                                  11  0.00080787  0.35
North American Wolf                   4   0.00161215  0.12
South American Wolf                   2   0.00007900  0.70
Het: mean heterozygosity (het/bp); FROH: inbreeding coefficient
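The population means can be recomputed from the individual values of Table A2.1 with a few lines of Python. The sketch below is only a consistency check, with a hypothetical function name. The first group holds the five Eurasian wolves other than Wolf Italy, which is the grouping that reproduces the tabulated mean.

```python
from statistics import mean

# (Het, FROH) pairs copied from Table A2.1, grouped by population.
GROUPS = {
    "Eurasian wolves (excluding Wolf Italy)": [
        (0.00147319, 0.09), (0.00148438, 0.23), (0.00181422, 0.01),
        (0.00178093, 0.03), (0.00150744, 0.05),
    ],
    "Dogs": [
        (0.00064358, 0.44), (0.00068557, 0.34), (0.00066418, 0.41),
        (0.00076284, 0.41), (0.00080670, 0.39), (0.00104400, 0.25),
        (0.00068793, 0.44), (0.00110546, 0.20), (0.00076737, 0.32),
        (0.00070319, 0.41), (0.00101575, 0.28),
    ],
    "North American wolves": [
        (0.00183124, 0.08), (0.00154630, 0.18), (0.00158641, 0.13),
        (0.00148466, 0.09),
    ],
    "South American wolves": [(0.00003753, 0.70), (0.00012047, 0.70)],
}


def population_summary(groups):
    """Return {population: (N, mean Het, mean FROH)} with the table's rounding."""
    summary = {}
    for name, rows in groups.items():
        het = mean(h for h, _ in rows)
        froh = mean(f for _, f in rows)
        summary[name] = (len(rows), round(het, 8), round(froh, 2))
    return summary


for name, (n, het, froh) in population_summary(GROUPS).items():
    print(f"{name}\t{n}\t{het:.8f}\t{froh:.2f}")
```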
APPENDIX 3
Heterozygosity by chromosome
Heterozygosity in each sample using 1 Mb windows with a 200 kb overlap. Red dots
indicate windows below 0.0005 heterozygous sites per base pair (inbred windows).
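A minimal Python sketch of such a window scan is given below. It is not the script used for these plots. It assumes that consecutive 1 Mb windows share 200 kb (so each window starts 800 kb after the previous one) and that heterozygosity is the number of heterozygous sites divided by the window length rather than by the number of callable bases. All names are hypothetical.

```python
from bisect import bisect_left

WINDOW = 1_000_000   # window size in bp
OVERLAP = 200_000    # bp shared by consecutive windows (assumed reading)
CUTOFF = 0.0005      # het/bp below which a window is flagged as inbred


def window_heterozygosity(het_positions, chrom_length,
                          window=WINDOW, overlap=OVERLAP):
    """Yield (start, end, het_per_bp) for each window along one chromosome."""
    positions = sorted(het_positions)
    step = window - overlap
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)
        # Number of heterozygous positions falling in [start, end).
        count = bisect_left(positions, end) - bisect_left(positions, start)
        yield start, end, count / (end - start)


def is_inbred(het_per_bp, cutoff=CUTOFF):
    """True for windows that would be drawn as red dots."""
    return het_per_bp < cutoff
```

For a chromosome given as a list of 0-based heterozygous positions, `[w for w in window_heterozygosity(positions, length) if is_inbred(w[2])]` returns the flagged windows.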
APPENDIX 4
Heterozygosity distribution for non-Iberian samples
Density and box plots of heterozygosity in dogs, Eurasian wolves and American wolves,
using 1 Mb windows with a 200 kb overlap. Dotted lines mark the cutoff used to define
an inbred window.
Dogs
Eurasian wolves
American wolves
APPENDIX 5
Principal components’ boxplots and PCA with component 4
In the 48K-merged dataset, PC1 shows the differentiation between wolves and dogs,
whereas PC2 represents the geographical variation of wolves. In the dataset from the
present work, PC1 also shows the differentiation between wolves and dogs, whereas PC2
clusters American wolves together. The geographical differentiation of Eurasian wolves
is explained by PC4. The PCAs plotted below (using PC1 and PC4) include the samples
from this work with and without the dog blocks from Sierra Morena and Wolf Spain.
48K-merged dataset
Dataset from this work
PCA with dog blocks
PCA without dog blocks
APPENDIX 6
Cross-validation error of the ADMIXTURE analysis
Mean and standard deviation of the cross-validation error for the 5-run ADMIXTURE
analysis of the dataset from the present work and the 3-run analysis of the 48K-merged
dataset. Note that the variation in the cross-validation error is larger in the dataset
from the present work.
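The summary across runs can be sketched as follows. The example assumes that each run was executed with the cross-validation option and that its log contains lines of the form `CV error (K=3): 0.52341`. The function name is hypothetical and this is not the code used to produce the figure.

```python
import re
from collections import defaultdict
from statistics import mean, stdev

# Pattern of the cross-validation line in an ADMIXTURE log (assumed format).
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def summarise_cv(log_lines):
    """Return {K: (mean CV error, sample standard deviation)} over all runs."""
    errors_by_k = defaultdict(list)
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            errors_by_k[int(match.group(1))].append(float(match.group(2)))
    summary = {}
    for k, errors in sorted(errors_by_k.items()):
        # The standard deviation needs at least two runs for a given K.
        spread = stdev(errors) if len(errors) > 1 else 0.0
        summary[k] = (mean(errors), spread)
    return summary
```

The lines of all log files for one dataset are passed together, so each K collects one value per run.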
APPENDIX 7
Linear model details of heterozygosity-percentage block analysis
Summary statistics (Table A7.1) for each linear model shown in Figure 10. The
residual, Q-Q and leverage plots follow.
Table A7.1. Summary statistics
Sample         Haplotype  Slope    p-value  Adj. R²
Sierra Morena  Dog/Dog    -112.40  0.0657   0.0657
Sierra Morena  Wolf/Dog   332.32   0.0000   0.7331
Sierra Morena  Wolf/Wolf  -219.92  0.0049   0.1774
Wolf Spain     Dog/Dog    16.21    0.0252   0.1074
Wolf Spain     Wolf/Dog   443.56   0.0002   0.3029
Wolf Spain     Wolf/Wolf  -459.77  0.0002   0.2968
Wolf Portugal  Dog/Dog    -17.37   0.0000   0.4639
Wolf Portugal  Wolf/Dog   29.95    0.0000   0.3534
Wolf Portugal  Wolf/Wolf  -12.58   0.0942   0.0502
Wolf EEP       Dog/Dog    -7.19    0.2841   0.0049
Wolf EEP       Wolf/Dog   18.03    0.1870   0.0213
Wolf EEP       Wolf/Wolf  -10.84   0.5100   -0.0153
Haplotype: haplotype class on both chromosomes;
Slope: estimate of the slope (heterozygosity in het/bp);
Adj. R²: adjusted R² for the complete model
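For reference, the slope and adjusted R² of a one-predictor linear model can be computed as in the sketch below. It is written in plain Python rather than R and is not the code used for Table A7.1. Which variable is the predictor and which is the response is an assumption: x stands for the percentage of blocks of one haplotype class and y for the heterozygosity.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, adjusted R squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    # One predictor, so the residual degrees of freedom are n - 2.
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
    return slope, intercept, adj_r_squared
```

Because of the penalty for the number of predictors, the adjusted R² becomes negative when the model explains almost none of the variance, as in the Wolf/Wolf row of Wolf EEP.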
Dog/Dog haplotypes Sierra Morena
Wolf/Dog haplotypes Sierra Morena
Wolf/Wolf haplotypes Sierra Morena
Dog/Dog haplotypes Wolf Spain
Wolf/Dog haplotypes Wolf Spain
Wolf/Wolf haplotypes Wolf Spain
Dog/Dog haplotypes Wolf Portugal
Wolf/Dog haplotypes Wolf Portugal
Wolf/Wolf haplotypes Wolf Portugal
Dog/Dog haplotypes Wolf EEP
Wolf/Dog haplotypes Wolf EEP
Wolf/Wolf haplotypes Wolf EEP