15. Repetition / Overview
Dr. Werner Van Belle...
© Van Belle Werner, 25/11/2009, v1
1. Correlation
Mathematical definition of average, variance, standard deviation
Mathematical definition of linear Pearson (L.P.) correlation
Graphical interpretation of correlation
Implementing a correlation routine given the mathematical definition
What is numerical stability? Give an example using the variance
How to deal with missing numbers in a correlation calculation?
What does the significance of a correlation value mean?
Correlation Graphical Interpretation
Given a vector of n numbers
What is the average of this vector?
Step 1. Translation
Step 2. Variance normalization
Step 3. Covariance calculation
r=0.936
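The three steps above can be sketched in Python as follows. This is only a minimal illustration, assuming the population form (division by n) of the variance; the function name `correlate` follows the declaration used later in the slides, everything else is an assumption.

```python
import math

def correlate(x, y):
    """Linear Pearson correlation of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    # Step 1: translation, subtract the average from each vector
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    # Step 2: variance normalization, divide by the standard deviation
    sd_x = math.sqrt(sum(v * v for v in dx) / n)
    sd_y = math.sqrt(sum(v * v for v in dy) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    zx = [v / sd_x for v in dx]
    zy = [v / sd_y for v in dy]
    # Step 3: covariance of the two normalized vectors
    return sum(a * b for a, b in zip(zx, zy)) / n
```

Two perfectly linearly related vectors should give a value close to 1 (or -1 when the relation is inverted).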
1. Write a function that takes two inputs X and Y and returns the linear Pearson correlation between these two vectors.
2. Write a routine which reads a binary file in which we have a consecutive sequence of double precision floats.
3. Modify your program to take 2 filenames as arguments and let it report the correlation value.
4. Write a program that will generate two random vectors of size N and let it report the correlation between those two random vectors.
Let this program repeat this action 100 times and report the average absolute correlation.
Investigate the effect of the group size N on the average reported correlation.
5. We have 10 vectors stored in 10 different files. Write a program to read these files and report a correlation matrix.
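A possible starting point for exercise 4 is sketched below. It is not the course solution; the helper names `pearson` and `average_abs_correlation` and the use of uniform random numbers are assumptions made for illustration.

```python
import math
import random

def pearson(x, y):
    """Linear Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_abs_correlation(n, repeats=100, rng=None):
    """Average |r| between pairs of independent random vectors of size n."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(repeats):
        x = [rng.random() for _ in range(n)]
        y = [rng.random() for _ in range(n)]
        total += abs(pearson(x, y))
    return total / repeats

if __name__ == "__main__":
    rng = random.Random(1)
    for n in (5, 10, 100, 1000):
        print(n, round(average_abs_correlation(n, rng=rng), 3))
```

Running it for increasing N should show the average absolute correlation of unrelated vectors shrinking, which is the effect the exercise asks to investigate.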
What should we do with missing numbers?
What is the significance of a correlation?
Numerical Stability
Variance calculations are notorious for numerical instability
A Simple Algorithm
for (int i = 0; i < n; i++)
    var += (x[i] - avg) * (x[i] - avg);
var /= n;
100'000 100'000 100'000 100'000 100'000
1 1 1 1 1 1 1 1 1 1 1 1 1
10**10 20**10 30*10 40**10 50*10 50**10+1 ?...
B. P. Welford, "Note on a Method for Calculating Corrected Sums of Squares and Products", Technometrics, Vol. 4, No. 3 (Aug. 1962), pp. 419-420
Incremental online algorithm
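The incremental algorithm from the Welford reference can be sketched as below. The function names are illustrative; `naive_variance` is the textbook one-pass formula E[x^2] - E[x]^2, which is not taken from the slides and is included only as a contrast that is known to suffer from cancellation when the values are large relative to their spread.

```python
def welford_variance(values):
    """Return (mean, population variance) in a single pass over the data."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count == 0:
        raise ValueError("no values")
    return mean, m2 / count

def naive_variance(values):
    """One-pass formula E[x^2] - E[x]^2, prone to catastrophic cancellation."""
    n = len(values)
    s = sum(values)
    ss = sum(v * v for v in values)
    return ss / n - (s / n) ** 2
```

Comparing both functions on data such as `[1e9 + v for v in (4, 7, 13, 16)]` is one way to observe the difference in stability.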
2. Multi Dimensional Correlations
What is 2D gel electrophoresis?
What is a function/method declaration?
Exercise on converting a single dimensional routine to multiple dimensions
Correlation is not causation
Correlation is not linear regression
Correlations can be accidental
Both no correlation and high/low correlations can be indicative
2D Gel Electrophoresis: First dimension
Figure labels: acid side (pH 5), neutral (pH 7), base side (pH 9), protein mixture
2D Gel Electrophoresis: Iso Electric Focusing
Figure labels: acid side (pH 5), neutral (pH 7), base side (pH 9), protein mixture
40' at 200 V, 30' at 450 V, 30' at 750 V, 60' at 2000 V
2D Gel Electrophoresis: Transfer onto 2nd gel
Figure labels: pH separated protein mixture, transfer
2D Gel Electrophoresis: Transfer onto 2nd gel
Figure labels: pH separated protein mixture, time based mass separation
2D Gel Electrophoresis: Washing/Drying/Staining
Figure labels: pH/mass separated protein mixture, 'staining' fluid
2D Gel Electrophoresis: Capturing
2D Gels of multiple patients
Courtesy Gry Sjøholt, Nina Ånensen & Bjørn Tore Gjertsen
Patient #1, Liver Size: 57
Patient #2, Liver Size: 46
Given a stack of images: which areas correlate with our patients' tumor growth, life expectancy, etcetera?
Reading The Image
int img_sx
int img_sy
byte[,] read_image(String filename)
Will read the image from the provided filename.
When img_sx is not set, both img_sx and img_sy are set to the size of the image being loaded.
When img_sx is set, the image must have the same size; otherwise null is returned.
If the image does not exist, null is returned as well.
The byte array is ordered as image[x, y]
Exercise
Import your correlation routine from last time. It should have the following declaration:
float correlate(float[] X, float[] Y, int n)
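Converting the one dimensional routine to images can be sketched as follows: every pixel position yields one vector across the patients, which is correlated against one clinical value per patient. The nested-list image layout `images[k][x][y]` and the name `correlation_image` are assumptions for illustration; returning 0.0 for a constant pixel is likewise a choice made here, not something the slides prescribe.

```python
import math

def correlate(x, y, n):
    """Linear Pearson correlation of the first n entries; 0.0 if undefined."""
    mx = sum(x[:n]) / n
    my = sum(y[:n]) / n
    sxy = sum((x[i] - mx) * (y[i] - my) for i in range(n))
    sxx = sum((x[i] - mx) ** 2 for i in range(n))
    syy = sum((y[i] - my) ** 2 for i in range(n))
    if sxx == 0 or syy == 0:
        return 0.0  # constant pixel or constant parameter
    return sxy / math.sqrt(sxx * syy)

def correlation_image(images, parameter):
    """images[k][x][y] is pixel (x, y) of patient k; parameter[k] is e.g. liver size."""
    n = len(images)
    size_x = len(images[0])
    size_y = len(images[0][0])
    result = [[0.0] * size_y for _ in range(size_x)]
    for x in range(size_x):
        for y in range(size_y):
            pixel_values = [images[k][x][y] for k in range(n)]
            result[x][y] = correlate(pixel_values, parameter, n)
    return result
```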
P53 Biosignature vs Liver size
Masking
Significance
Significance Mask
Variance
Variance Mask
Overall Mask
Overall Mask
P53 Biosignature vs Liver size
3. Nucleotides to Amino Acids
Various biological terms briefly explained
  Prokaryotes/Eukaryotes/Chromosomes/Chromatid/Chromatin/Karyotyping/Diploid/Haploid/Gametes
Where is the genetic material stored?
Nucleotides / Amino Acids
Complement, Reverse Sequence
Proteins
Translation
Reading Frames, Open Reading Frames
Cells
Prokaryotic – no nucleus (bacteria)
Eukaryotic – with nucleus (plants/animals)
The nucleus
  Contains the genetic material
  Genetic material can be in two states: heterochromatin (condensed) / euchromatin (a diffuse state which makes the DNA accessible)
  Chromosomes
Chromosomes
Karyotyping
Humans
  23 chromosome pairs: diploid
  22 chromosome types (autosomes), 1 sex chromosome
Other organisms can have different layouts: tetraploid, hexaploid, octoploid
Chromosome
Various chromosome layouts
  Somatic cells – diploid
    One set from mother
    One set from father
  Gametes – haploid
    Mother -or- father
    Gametes do not have the same genetic code
Autosomes in diploid cells are not strictly identical, although 99% is the same
Chromatids
1 – chromatid, 2 – centromere, 3 – p-arm (short), 4 – q-arm (long), 5 – telomeres
Double chromatid state only during interphase
Chromosome ↔ DNA
Chromosome ↔ DNA
DNA
Nucleotides: A, C, T & G
Paired nucleotides: basepairs A-T, C-G (complementary bases)
Standard read from the 5' end to the 3' end
Forward / reverse strands
Genes
Gene identification is problematic
  Position identifiers are not unique
  Sequences are not completely unique
  A biologists' agreement on terminology: 'somewhere around this area' having 'largely' this sequence
  Similar sequences across species
Genes
Specific areas in the genome (loci) have meaning and translate to proteins afterward
Number of bases in the genome?
Number of genes in the genome?
Proteins
3D structures / molecular machines with specific possibilities
Amino Acids
Proteins consist of a sequence of 20++ amino acids
  Alanine (Ala, A), Cysteine (Cys, C), Aspartic Acid (Asp, D), Glutamic Acid (Glu, E), Phenylalanine (Phe, F), Glycine (Gly, G), Histidine (His, H), Isoleucine (Ile, I), Lysine (Lys, K), Leucine (Leu, L), Methionine (Met, M), Asparagine (Asn, N), Proline (Pro, P), Glutamine (Gln, Q), Arginine (Arg, R), Serine (Ser, S), Threonine (Thr, T), Valine (Val, V), Tryptophan (Trp, W), Tyrosine (Tyr, Y)
  Selenocysteine, pyrrolysine (rare)
Essential Amino Acids
Proteins consist of a sequence of 20 amino acids: the same 20 as listed above, without the rare selenocysteine and pyrrolysine
DNA → Protein
Codons (3 nucleotide sequences) translate to amino acids
DNA copies to RNA, which is moved out of the nucleus (T → U)
Polymerases copy the DNA sequence to RNA; multiple polymerases exist, the most common is RNA Polymerase II
The RNA sequence is then translated to proteins
Translation Table
Reading frames
UCU AAA AUG GGU GAC ...
CUA AAA UGG GUG AC ...
UAA AAU GGG UGA C ...
An open reading frame (ORF) is a reading frame that contains
  a start codon
  a subsequent region which usually has a length which is a multiple of 3 nucleotides
  a stop codon at its end
Exercise
Create a routine that calculates the complement of a DNA sequence
Create a routine that calculates the reverse of a DNA sequence
Create a routine that translates a DNA sequence into an amino acid sequence
Let the program try each reading frame and report the sequence with the longest distance to the first stop codon
Now also include the complement, reverse and reverse complement sequences.
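A compact sketch of these routines is given below. It works on DNA letters (T instead of the U used in the reading frame example) and uses the standard genetic code, written in the common compact TCAG ordering with `*` for stop codons. All function names are illustrative and the code assumes upper-case input containing only A, C, G and T.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one letter amino acid, '*' marks a stop codon
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(dna):
    return "".join(COMPLEMENT[base] for base in dna)

def reverse(dna):
    return dna[::-1]

def translate(dna, frame=0):
    """Translate complete codons starting at offset frame (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

def longest_before_stop(dna):
    """Try all variants and frames; return (variant, frame, peptide up to first stop)."""
    candidates = {
        "forward": dna,
        "complement": complement(dna),
        "reverse": reverse(dna),
        "reverse complement": reverse(complement(dna)),
    }
    best = None
    for name, seq in candidates.items():
        for frame in range(3):
            peptide = translate(seq, frame).split("*")[0]
            if best is None or len(peptide) > len(best[2]):
                best = (name, frame, peptide)
    return best
```

Applied to the DNA version of the reading frame example, `TCTAAAATGGGTGAC`, the three forward frames translate to different peptides, two of them without any stop codon.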
4. Exons, Introns, Splice Variants
Translation mechanisms in eukaryotes
Splice variants; Exons / Introns
Ensembl: browsing of this type of information
Designing probes to detect specific splice variants
Transcription / Translation
Prokaryotes
  Transcription: polymerase copies the gene into an RNA strand: mRNA
  Translation: the mRNA is then used to generate peptide chains
  These peptide chains then fold into proteins
Problem for Eukaryotes
  DNA stays in the nucleus
  Proteins are mainly in the cytoplasm
Eukaryotic Translation
Transcribe DNA to pre-mRNA
Process pre-mRNA to mRNA
  Adding caps (5' cap, polyadenylation)
  Splicing (select certain parts of the pre-mRNA)
  Editing (nucleotide modifications)
Transport mRNA to cytoplasm
Translate mRNA to proteins
The Process
Splicing
Converts the freshly copied DNA (pre-mRNA) to a new strand (mRNA)
Removes certain areas (introns) and joins others together (exons)
mRNA
One Gene, One Protein?
Non-coding genes
Alternative splicing: one gene can have multiple splice variants leading to different proteins
Monocistronic mRNA: when the mRNA codes for only one protein
Polycistronic mRNA: codes for multiple genes (operon)
TP53
In human: on which chromosome is it located?
Does this area in the genome also overlap with other genes?
How many splice variants does the TP53 gene have?
How many bases is the gene long?
How many exons does the gene have?
Is there a transcript which includes all exons?
What is the sequence of the shortest transcript? [TP53-205]
Some other Genes
Genes: 5HTT, BRCA2, Wingless
Alternative names?
Chromosome location?
Overlapping genes at this position?
Splice variants?
Do all exons transcribe in the same direction?
Exercise
We want to design a probe that will uniquely detect a specific splice variant (target)
We have a nice large table of all existing splice variants in human together with their gene name and variant number (ENSTxxx)
Given a length L, we now want to find the first subsequence of the target that does not match any of the other existing splice variants
  The subsequence is of length L
  The splice variant table will not contain the target itself
We also want the shortest sufficient probe to detect the target
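One straightforward way to approach this exercise is a brute-force search, sketched below. The function names are hypothetical and the other splice variants are assumed to be available as a plain list of strings; for a table of all human transcripts an index of subsequences would be needed instead of repeated substring searches.

```python
def first_unique_probe(target, others, length):
    """First subsequence of target with the given length that occurs in none of others."""
    for start in range(len(target) - length + 1):
        probe = target[start:start + length]
        if not any(probe in other for other in others):
            return start, probe
    return None

def shortest_probe(target, others):
    """Smallest length for which a unique probe exists, with its position and sequence."""
    for length in range(1, len(target) + 1):
        found = first_unique_probe(target, others, length)
        if found is not None:
            return length, found[0], found[1]
    return None
```

`shortest_probe` simply increases L until `first_unique_probe` succeeds, which answers the "shortest sufficient probe" part of the exercise.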
5. Ensembl SQL Part 1
What is a database, schema, table, row, column, attribute, value
Ensembl stable IDs
Ensembl genes, mapping to stable ids
Transcripts, Translations, Exons
One-Many relationships across tables
Relational Databases
Database server
  Database, aka [database] schema
    Tables
      Columns with specific types
      Rows
      Values can be NULL or real values
Joining tables
SELECT *
FROM TABLE1
JOIN TABLE2 USING(Y)

SELECT *
FROM TABLE1 t1
JOIN TABLE2 t2
WHERE t1.Y=t2.Y
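That the two join styles select the same row combinations can be checked on a toy example. The sketch below uses Python's built-in sqlite3 module rather than the MySQL server from the course, and the tables `table1` and `table2` with their contents are invented for the demonstration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (y INTEGER, a TEXT);
    CREATE TABLE table2 (y INTEGER, b TEXT);
    INSERT INTO table1 VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO table2 VALUES (1, 'b1'), (1, 'b1bis'), (3, 'b3');
""")

# Join on the shared column with USING
using_rows = con.execute(
    "SELECT y, a, b FROM table1 JOIN table2 USING (y) ORDER BY y, b"
).fetchall()

# Same join expressed as a cross product filtered by WHERE
where_rows = con.execute(
    "SELECT t1.y, a, b FROM table1 t1 JOIN table2 t2 "
    "WHERE t1.y = t2.y ORDER BY t1.y, b"
).fetchall()
```

Rows of `table1` without a partner in `table2` (here y = 2) disappear in both variants, and a row with two partners appears twice.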
Ensembl
Provides very structured biological information
Integrates many different data sources
Cares about metadata
  Keeps track of different versions, releases
  Keeps track where the data came from
  Keeps track how a specific analysis was performed
Access using Mysql-query-browser
  Host: ensembldb.ensembl.org
  Port: 3306 (up to #47) or 5306 (#48 onwards)
  Login: anonymous
  Password: <none>
Schemata
Each organism has its own collection of databases
  homo_sapiens_core_47_36i
  homo_sapiens_cdna_<x>_<y>
  homo_sapiens_core_expression_est_<x>_<y>
  homo_sapiens_core_expression_gnf_<x>_<y>
  homo_sapiens_disease_<x>_<y>
  homo_sapiens_est_<x>_<y>
  homo_sapiens_estgene_<x>_<y>
  homo_sapiens_funcgen_<x>_<y>
  homo_sapiens_haplotype_<x>_<y>
  homo_sapiens_lite_<x>_<y>
  homo_sapiens_otherfeatures_<x>_<y>
  homo_sapiens_variation_<x>_<y>
  homo_sapiens_vega_<x>_<y>
Stable Gene Identifiers
Table gene_stable_id
gene_id – the current gene identification, acts as (part of the) primary key in many tables – a number
stable_id – the publicly visible ENSG<xxx> identifier
creation_date – when was this gene introduced ?
version – what is the current version of the gene
modified_date – when was the last change ?
Genes
Table gene
  gene_id
  biotype: protein coding or not
  seq_region_id
  seq_region_start
  seq_region_end
  seq_region_strand
  display_xref_id
  source: where did it come from
  status: KNOWN/NOVEL
  description: human readable
Gene to Stable id mapping
Write a query that will map a gene to its stable id.
The output should contain gene_id, biotype, seq_region_id, seq_region_start, seq_region_end, seq_region_strand, display_xref_id, source, status, description and of course the stable_id
Transcript
Table transcript
  transcript_id – these ids can coincide with gene_ids. Do not mix them!
  gene_id
  seq_region_id / seq_region_start / seq_region_end / seq_region_strand
  display_xref_id
  biotype
  status
  description
Table transcript_stable_id
  stable_id – something like ENST000...
  transcript_id
Translation
Table translation
  translation_id
  transcript_id
  seq_start
  start_exon_id
  seq_end
  end_exon_id
Table translation_stable_id
  stable_id – something like ENSP0000....
  translation_id
Exon
Table exon
  exon_id
  seq_region_id / seq_region_start / seq_region_end / seq_region_strand
  phase
  end_phase
Table exon_transcript maps the transcript to something ?
  exon_id
  transcript_id
  rank – exon number (1 to 10, 13, 170)
One to 0,1,+,* ?
Genes ↔ Transcript ↔ Exon
Obtain a table with 3 columns: gene_id, transcript_id, exon_id
Start out with a table that lists all gene_id, transcript_id pairs
Then extend the table with the exons belonging to that transcript
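The two steps can be tried out on a toy database before running them against Ensembl. The sketch below uses Python's sqlite3 module with heavily reduced stand-ins for the `transcript` and `exon_transcript` tables; the ids are invented and only the columns needed for the join are present.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE transcript (transcript_id INTEGER, gene_id INTEGER);
    CREATE TABLE exon_transcript (exon_id INTEGER, transcript_id INTEGER);
    INSERT INTO transcript VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO exon_transcript VALUES (100, 10), (101, 10), (100, 11), (102, 12);
""")

# Step 1: all gene_id, transcript_id pairs
step1 = con.execute(
    "SELECT gene_id, transcript_id FROM transcript ORDER BY transcript_id"
).fetchall()

# Step 2: extend each pair with the exons of that transcript
step2 = con.execute("""
    SELECT gene_id, transcript_id, exon_id
    FROM transcript
    JOIN exon_transcript USING (transcript_id)
    ORDER BY gene_id, transcript_id, exon_id
""").fetchall()
```

Exon 100 shows up under two transcripts of gene 1, which illustrates why exon/transcript combinations outnumber unique exons.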
Genes ↔ Transcript ↔ Exon
6. Ensembl SQL Part 2
Various mappings: genes to proteins
Various grouping operations: averages, maxima, minima, counts, etcetera
Ensembl regions and chromosome information
Genes ↔ Protein Mapping
Write a query to map genes to potential proteins. The output table should contain
  A stable gene identifier
  A stable protein identifier
SELECT G.stable_id AS gen, T.stable_id AS protein
FROM gene
JOIN transcript USING (gene_id)
JOIN translation USING (transcript_id)
JOIN translation_stable_id T USING (translation_id)
JOIN gene_stable_id G USING (gene_id)
LIMIT 10
Averages
What is the average number of transcripts per gene?
What is the average number of exons per transcript?
Based on #47 of the database
  33761 unique genes
  57365 unique transcripts
  503655 unique exon/transcript combinations
  288309 unique exons
Which gives 1.7 transcripts per gene (57365 / 33761)
And 8.78 exons per transcript (503655 / 57365)
But only 8.5 unique exons per gene (288309 / 33761)
Largest Gene
Which gene has the most transcripts?
SELECT COUNT(DISTINCT t.transcript_id) AS tcount, si.stable_id, g.gene_id
FROM gene g
JOIN transcript t USING (gene_id)
JOIN gene_stable_id si USING (gene_id)
GROUP BY gene_id
ORDER BY tcount DESC
LIMIT 10
ENSG00000154556 with 44 transcripts
Transcript with the most exons
Which transcript has the most exons?
SELECT MAX(rank) ecount, si.stable_id, t.transcript_id
FROM transcript t
JOIN exon_transcript et USING (transcript_id)
JOIN transcript_stable_id si USING (transcript_id)
GROUP BY transcript_id
ORDER BY ecount DESC
LIMIT 10
ENST00000356127 with 313 exons
Regions
The region codes in Ensembl can be a variety of things.
Table seq_region
  seq_region_id
  name – can be a chromosome name
  coord_system_id
  length
Largest area covered
Which gene covers the largest area in the genome? On which chromosome?
SELECT seq_region_end-seq_region_start AS L, stable_id, gene_id, name
FROM gene
JOIN gene_stable_id USING (gene_id)
JOIN seq_region USING (seq_region_id)
ORDER BY L DESC
LIMIT 10
Answer: ENSG00000174469 with 2'304'637 bases
Largest area
Create a table of TSS
Retrieve a list of potential transcription start sites and to which gene they belong
SELECT t.seq_region_start, t.seq_region_strand, stable_id
FROM transcript t
JOIN gene USING (gene_id)
JOIN gene_stable_id USING (gene_id)
7. Ensembl Identifiers
How are external identifiers represented in Ensembl?
  object_xref – links Ensembl objects to external names
  xref – remembers the external object name
  external_db – keeps track of a variety of different databases
How could we add a new nomenclature
Map one nomenclature to Ensembl identifiers
Mapping exercises from one nomenclature to another
Some nomenclatures have names for genes as well as translations
Mapping Uniprot to HGNC
External Databases
Table external_db
  external_db_id – primary key
  db_name – database name
  db_release – version
  status – predicted, known, cross referenced, orthologue mapped etcetera
  type – misc, array etcetera
  db_display_name – how to print this database name
SELECT * FROM external_db
WHERE db_name="HGNC"
  external_db_id: 1100
  db_name: HGNC
  db_display_name: HGNC symbol
  type: primary_db_synonym
External Names
Table xref
  xref_id – the cross reference primary key
  external_db_id – the external database key
  dbprimary_acc – the 'primary key' in the external database
  display_label – how to print this gene identifier
  description – a description according to the external database
External Names to Stable Ids
General purpose table object_xref
  ensembl_id: the Ensembl internal id (gene_id for instance)
  ensembl_object_type: translation, transcript, gene
  xref_id: id from the xref table
  linkage_annotation
Adding new nomenclature
Create an external_db entry
For each gene <A> in the nomenclature, allocate the name in the database table xref
  Gene A → xref_id 764
  Gene B → xref_id 987
  ...
For each external gene A, B, ... in the nomenclature, map it to the database
  A → ENSG78646 → gene_id=76689
  B → ENSG98768 → gene_id=7577
Link A, B to ENSG... through an object_xref of type Gene
  764, 76689, Gene
  987, 7577, Gene
External Names to Stable Ids
SELECT * FROM xref x
JOIN object_xref o
WHERE external_db_id=1100
AND x.xref_id=o.xref_id
LIMIT 0,1000
  xref_id: 1793295
  external_db_id: 1100
  dbprimary_acc: 21076
  display_label: TMEM14A
  description: transmembrane protein 14A
  info_type: Dependent
  info_text: Generated via NP_054770
  ensembl_id: 18971
  ensembl_object_type: Gene
  xref_id: 1793295
  linkage_annotation: NULL
HGNC to Ensembl
SELECT *
FROM xref x
JOIN object_xref o
JOIN gene_stable_id g
WHERE external_db_id=1100
AND x.xref_id=o.xref_id
AND ensembl_id=g.gene_id
[AND display_label="CXYorf1"]
LIMIT 0,1000
  display_label: CXYorf1, ensembl_id: 7888, ensembl_object_type: Transcript
  display_label: CXYorf1, ensembl_id: 4373, ensembl_object_type: Gene
We obtain the wrong results! Be aware that ensembl_id cannot always be mapped to a gene_id
HGNC to Ensembl
SELECT *
FROM xref x
JOIN object_xref o
JOIN gene_stable_id g
WHERE external_db_id=1100
AND x.xref_id=o.xref_id
AND ensembl_id=g.gene_id
AND ensembl_object_type='Gene'
Results in 18524 identifiers, with only 18107 unique Ensembl identifiers
Ensembl to HGNC
Find all Ensembl genes that have no existing HGNC name
First: map all the stable_ids through the object_xref table to the xref identities
SELECT * FROM gene_stable_id g
JOIN object_xref xr
JOIN xref x USING(xref_id)
WHERE xr.ensembl_object_type='Gene'
AND xr.ensembl_id=g.gene_id
LIMIT 100
Second: take only those that belong to the HGNC nomenclature (1100)
SELECT * FROM gene_stable_id g
JOIN object_xref xr
JOIN xref x USING(xref_id)
WHERE xr.ensembl_object_type='Gene'
AND xr.ensembl_id=g.gene_id
AND external_db_id=1100
LIMIT 100
Third: modify the query to only list the existing identifiers
SELECT DISTINCT g.stable_id
FROM gene_stable_id g
JOIN object_xref xr
JOIN xref x USING(xref_id)
WHERE xr.ensembl_object_type='Gene'
AND xr.ensembl_id=g.gene_id
AND external_db_id=1100
Fourth: get rid of all the existing identifiers from the full stable_id list
SELECT stable_id FROM gene_stable_id
WHERE stable_id NOT IN (
  SELECT DISTINCT g.stable_id
  FROM gene_stable_id g
  JOIN object_xref xr
  JOIN xref x USING(xref_id)
  WHERE xr.ensembl_object_type='Gene'
  AND xr.ensembl_id=g.gene_id
  AND external_db_id=1100)
15654 genes have no HGNC identifier
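The NOT IN construction, and the importance of the ensembl_object_type filter, can be checked on a toy database. The sketch below uses Python's sqlite3 module; the three tables are reduced to the columns used in the query and every identifier and label in them is invented. An explicit ON clause is used for the first join, which is a stylistic choice made here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE gene_stable_id (gene_id INTEGER, stable_id TEXT);
    CREATE TABLE xref (xref_id INTEGER, external_db_id INTEGER, display_label TEXT);
    CREATE TABLE object_xref (ensembl_id INTEGER, ensembl_object_type TEXT, xref_id INTEGER);
    INSERT INTO gene_stable_id VALUES (1, 'ENSG_A'), (2, 'ENSG_B'), (3, 'ENSG_C');
    INSERT INTO xref VALUES (50, 1100, 'NAME_A'), (51, 9999, 'OTHER_B'), (52, 1100, 'NAME_T');
    -- the last row is a transcript whose id happens to coincide with gene_id 3
    INSERT INTO object_xref VALUES (1, 'Gene', 50), (2, 'Gene', 51), (3, 'Transcript', 52);
""")

unnamed = [row[0] for row in con.execute("""
    SELECT stable_id FROM gene_stable_id
    WHERE stable_id NOT IN (
        SELECT DISTINCT g.stable_id
        FROM gene_stable_id g
        JOIN object_xref xr ON xr.ensembl_id = g.gene_id
        JOIN xref x USING (xref_id)
        WHERE xr.ensembl_object_type = 'Gene'
          AND x.external_db_id = 1100
    )
    ORDER BY stable_id
""")]
```

Gene 3 stays in the list of unnamed genes because its only HGNC link belongs to a transcript, not to the gene itself.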
HGNC to Uniprot Mapping
Write a query that will map each known HGNC identifier to a Uniprot identifier
Problem
  HGNC deals with genes
  Uniprot deals with proteins ('Translation')
HGNC → Uniprot
Map each HGNC identifier that is a gene to an Ensembl Translation identifier
Map each HGNC identifier that is a transcript to an Ensembl Translation identifier
Map each of those identifiers to a Uniprot identifier
8. Simulating Realtime PCR
What is PCR?
Wrote a small simulation with a limit on the material copied
How to calculate the volume after x cycles when the initial amount was I?
CT/CP values – how to go back from a CT value to the initial amount
Simulated the effect of less than 100% efficiency
PCR
Enzyme + Reagents + DNA → 2 DNA + somewhat less reagents + enzyme
Polymerase Chain Reaction
  Denature DNA → single DNA strands
  Anneal primer – attaches only to complementary DNA
  Synthesize the rest of the strand – polymerase
Consumes dNTPs (deoxynucleoside triphosphates)
RT-PCR / qPCR
Beware
  Reverse Transcription PCR
  Realtime PCR (= qPCR)
qPCR
  Repetitive cycles (20-40 cycles)
  Includes oligonucleotides that emit light when bound
Simulating a RT-PCR reaction
We start off with a specific volume of DNA material: amount
With each cycle we increment amount by the DNA we copied (copied_dna)
The cell volume is not infinite. We must observe how much of the reagents is left for the reaction
The usable reagents are only part of the remaining volume
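The code shown on the original slides is not part of this transcript. The sketch below is therefore only one possible reading of the description above; the parameters `total_volume` and `usable_fraction` and their default values are assumptions, not the values used in the course.

```python
def simulate(amount, total_volume=1000.0, usable_fraction=0.5, max_cycles=40):
    """Return the DNA amount after each cycle, starting with the initial amount."""
    history = [amount]
    for _ in range(max_cycles):
        remaining = total_volume - amount      # volume not yet turned into DNA
        usable = remaining * usable_fraction   # only part of it is usable reagent
        dna_to_copy = min(amount, usable)      # cannot copy more than reagents allow
        amount += dna_to_copy
        history.append(amount)
    return history
```

With a small initial amount the history first doubles every cycle and then levels off once the usable reagents become the limiting factor.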
Effect of initial amount
Each multiplication by 10 leads to a shift of 3.32 cycles
Why?
Exponential growth
Given a target amount T and an initial amount a0, how many cycles will it take to reach T?
Suppose now that the initial amount a0 is multiplied by a factor 10, what effect does this have on the cycle count?
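The reasoning behind the shift of 3.32 cycles can be written out as follows, under the idealized assumption that the amount doubles exactly in every cycle; a_c is a symbol introduced here for the amount after c cycles, and c' for the cycle count with the tenfold initial amount.

```latex
\begin{align*}
a_c &= a_0 \cdot 2^{c} \\
a_c = T \;&\Rightarrow\; c = \log_2\frac{T}{a_0} \\
c' &= \log_2\frac{T}{10\,a_0} = c - \log_2 10 \approx c - 3.32
\end{align*}
```

So a tenfold larger initial amount reaches the threshold about 3.32 cycles earlier, independent of T.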
CT / CP values
Based on the 'cycles (c) to a certain threshold (T)' one can estimate the initial amount.
Problem 1: we only measure after each cycle. There exists no such thing as 3.3 cycles.
  Solution: fit an exponential curve to the points we did measure.
Problem 2: at a certain point the exponential growth tapers off.
  Solution: find the best point
    Still within 'exponential growth'
    Easily recognizable
    Useful
Problem 1 - Points between cycles
Log value of amount is a linear curve
Problem 2 - CT / CP Values
Possibility 1: A required intensity
CT / CP Values
Possibility 2: A required slope = required growth
CT / CP Values
Possibility 3: Maximum slope
Accuracy of the CT value
Cycle Variances
Assume that with each cycle not everything is copied, but only something between 99% and 100% of the available amount. What effect will this have on our CT values?
To understand this
  Our algorithm needs to report its own CT value.
  We must modify the step size: instead of calculating cycle by cycle we will do it for every 1/1000th of a cycle; this brings the error due to CT positioning down to 0.07 % (= 0.00069)
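The 0.07 % figure follows from the growth factor of a single 1/1000th step. The sketch below checks this and shows a fractional-step threshold search; it assumes pure doubling per cycle without any reagent limit, and the function name `cycles_to_threshold` is hypothetical.

```python
import math

STEPS_PER_CYCLE = 1000
GROWTH_PER_STEP = 2 ** (1 / STEPS_PER_CYCLE)   # 1000 such steps give one doubling
RELATIVE_ERROR = GROWTH_PER_STEP - 1           # worst-case overshoot of one step

def cycles_to_threshold(initial, threshold, steps_per_cycle=STEPS_PER_CYCLE):
    """Number of (fractional) cycles until the amount reaches the threshold."""
    growth = 2 ** (1 / steps_per_cycle)
    amount, steps = initial, 0
    while amount < threshold:
        amount *= growth
        steps += 1
    return steps / steps_per_cycle
```

Stepping in 1/1000th cycles means the reported CT can be at most one small step past the true crossing point.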
Multi-step
Beware of linear interpolation
Exercise
1. Modify the simulate routine to return the cycle value before it reaches a volume of 500
   What is your reported CT value?
2. Modify the routine such that it will decrease the efficiency of the copy process at random
   Before adding the dna_to_copy, multiply it with a random number between 0.99 and 1
3. Modify your routine to generate 1000 simulations and calculate the average reported CT value
   What is your result?
Results: 99%-100% efficiency
Initial amount: 0.001
  Without decreasing the efficiency: 18.907
  With decreasing the efficiency: 18.9988
  Difference: 0.0918; effect on initial amount estimation: 6% underestimated
Initial amount: 0.00001
  Without decreased efficiency: 25.49
  With decreased efficiency: 25.588
  Difference: 0.098; effect on initial amount estimation: 7% underestimated
Results: 95%-100% efficiency
Initial amount: 0.001
  Without decreasing the efficiency: 18.907
  With decreasing the efficiency: 19.2524
  Difference: 0.3454; effect on initial amount estimation: 21% underestimated
Initial amount: 0.00001
  Without decrease: 25.49
  With decrease: 26.0581
  Difference: 0.5681; effect on initial amount estimation: 33% underestimated.
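The underestimation percentages are consistent with translating a CT difference back into an amount under the assumption of perfect doubling: a CT that is d cycles too late corresponds to an estimated initial amount that is a factor 2^(-d) of the true one. The function name below is illustrative.

```python
def underestimation(ct_difference):
    """Fraction by which the initial amount is underestimated, assuming doubling per cycle."""
    return 1 - 2 ** (-ct_difference)

if __name__ == "__main__":
    for difference in (0.0918, 0.098, 0.3454, 0.5681):
        print(difference, f"{100 * underestimation(difference):.1f}%")
```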
9. Data Grouping
Understanding questions
Grouping data chunks together
Across or foreach gene/plate etcetera?
Layout of a PCR experiment and examples
Unstructured Questions
Calculate the up or down regulation between cell types
  For all or for each gene?
  Including the different replicas?
Calculate the average expression in each cell line
  Averaged per gene after resolving replicates (each gene will have the same weight afterward) -or- directly across replicas?
Is there an effect between the cell line and the cell type?
Such unstructured questions can be understood and implemented differently and produce highly different results
qPCR
A plate: 96 wells
Different probes/genes: ALFA, BETA, IOTA
A cell type: WT, TG
Different dilutions: 1:2, 1:5, 1:20, 1:50
Technical replicas: R1, R2, R3
A cell line: HeLa, SK-N-DZ
Biological replicas: B1, B2
qPCR: A Common Layout
Why think about groups?
Group information is often implicit. If it is implicit: assume foreach.
Groups can help to resolve missing data points
Groups determine the control flow in an analysis
  Calculate everything on technical replicates, then average things out over the biological replicates -or-
  Pool all technical and biological replicates together before continuing with the analysis
Not all potential groups make sense
  Calculating the average of all dilutions is only possible if we have the same number of elements in each replica → dangerous to do
Groups can be artificial but structure experiments
  E.g.: we have three replicas of each probe on each plate and another technical replica on a second plate → plate distinction can be irrelevant and just introduces an extra technical replica
Why think about groups?
A group of data tends to be smaller than the full dataset (we do not need to load other groups)
  Can make streaming possible
  Requires less RAM
  E.g.: calculate an exon overlap map for each chromosome
Can allow parallel execution
Dependencies between data groups
  Recalculate only necessary groups
Language Issues
- Foreach, Per, (Forall)
  - Denotes a separation between groups.
  - Foreach gene means that each group will only deal with one gene at a time.
- Forall, Across, Ignoring, (Pooled, Grouped), Aggregate ...
  - Denotes an aggregation of data independent of this particular variable.
  - Forall genes means that ALFA, BETA, THETA etc. can all be included in each individual group.
- Pairs, Couples, Combinations, Multiples, Between
  - Denotes subgroups within larger groups.
  - E.g.: for each combination of dilutions → means that in whatever group we are working with we want to create subgroups that are unique wrt their dilution and compare these.
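The difference between "foreach" and "forall" can be sketched as a choice of grouping key. This is a Python illustration, not the course's code; the records and the helper `group_by` are hypothetical:

```python
from collections import defaultdict

# Hypothetical records: (gene, dilution, CP value).
records = [
    ("ALFA", "1:2", 18.1), ("ALFA", "1:5", 19.4),
    ("BETA", "1:2", 21.0), ("BETA", "1:5", 22.3),
]

def group_by(records, key_index):
    """Split records into groups that share the value at key_index."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec)
    return dict(groups)

# "Foreach gene": one group per gene, each holding a single gene.
foreach_gene = group_by(records, 0)

# "Foreach dilution, forall genes": grouping on dilution only, so every
# group aggregates the records of all genes.
forall_genes = group_by(records, 1)

print(sorted(foreach_gene), sorted(forall_genes))
```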
Starting with the marked element, mark all other elements that belong to this group.
We must stay within this particular biological replica
We must stay within this gene: BETA
We must stay within the gene: BETA
We can take all replicas: R1, R2, R3. However, the dilution is not specified → assume we will stay within the same dilution.
We can take all replicas: R1, R2, R3, and compare one subgroup WT against the other subgroup TG1.
This leads to a parent group, across cell types.
In this group we are looking for two subgroups with different cell types.
Dealing with combinations
- If variable X is listed as a 'combination'
  - A celltype combination, or a dilution combination
- First create the parent group that assumes X is a group variable.
  - Celltype is treated as a group
  - Dilution is treated as a group
- From this parent group one can select a subgroup identified by a value of X.
  - A subgroup where Celltype = WT
  - A subgroup where Dilution = 1:2
  - This subgroup is then the first group of the combination.
- One can also select any other group that has a different value for X.
  - A subgroup where Celltype = TG1
  - A subgroup where Dilution = 1:50
  - This subgroup is then another element of the combination.
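The selection of subgroups from a parent group can be sketched with the standard `itertools.combinations`. This is a Python illustration under assumptions: the dictionary `parent_group` and its CP values are made up, and a real parent group would hold records rather than single numbers.

```python
from itertools import combinations

# Hypothetical parent group: averaged CP value per dilution, for one fixed
# plate, gene, cell type and biological replica.
parent_group = {"1:2": 18.1, "1:5": 19.4, "1:20": 21.5, "1:50": 22.8}

# Each subgroup is identified by one value of the combination variable
# (here the dilution); a combination pairs two different subgroups.
pairs = list(combinations(parent_group, 2))
for first, second in pairs:
    print(first, parent_group[first], "vs", second, parent_group[second])
```

Four dilutions yield six unordered pairs; each pair can later feed one efficiency estimate.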
Efficiency Estimation
Each multiplication with 10 leads to a shift of 3.32 cycles. This shift depends on the efficiency.
Efficiency Estimation
How do we want to estimate the PCR efficiency?
- For each plate
- For each probe/gene
- For each celltype (wildtype, modified)
- For each dilution combination
- For each replica
Exercise
- Extend the group provided to you to include all elements of that group
- Color a second group belonging to the 'dilution combination'
- [Write down an object hierarchy to access the data quickly]
- [Write pseudocode to access the data]
For each plate, gene, dilution combination, celltype, biological replica, technical replica
This approach is somewhat flawed. Assume that R2/1:2 failed for IOTA but R1/1:2 did not.
For all plates and technical replicas; for each gene, dilution combination, celltype, biological replica
Solves a missing data problem
10. Accessing Data Groups
- An educational API to explore data groups
- Standard object hierarchies are difficult
- Data grouping enables concurrent access
- Helps with optimization (don't calculate what didn't change)
- Cleaned up the output of a qPCR experiment
Object Hierarchies
- Calculate the median across replicas and across plates, remove bad measurements first
  - Gene → Dilution → Cell Type → Plate* → average replicas
  - Gene → Dilution → Cell Type → average replicas
  - Gene → Replica → Dilution → Cell Type
- Calculate the copyratio per gene based on all dilution pairs
  - Gene → Dilution → [Cell Type] → CP
  - Gene → Cell Type → Dilution → CP
- Calculate the up/down regulation between two genes
  - Gene → Dilution → Cell Type → Concentration → Gene → same Dilution → same Cell Type → Concentration
Data Grouping
- Hard coding a data representation/object hierarchy often interferes with different data views / rotations of the data.
- XML/SQL/OODB
  - SQL can group data for you
  - Unsuitable to perform an analysis
  - Loading group by group from a database can be highly time consuming (latency)
  - Group identification can be problematic.
- Requires a high level API that can be used to deal with data: data slices.
A Table Interface
- A Table represents a table with attributes, records and values
- Retrieve all unique keys given an attribute list
- Retrieve all records associated with a specific key
  - Differently stated: retrieve the group associated with (or identified by) a key
- Retrieve a value from a record
- Retrieve all values for an attribute in a table (a column)
- Iterate over all records in a table
- Iterate over all groups in a table
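To make the operations listed above concrete, here is a minimal Python analogue of such a table. It is a sketch under assumptions, not the course's C# Table class: records are plain dictionaries, `keys`, `group` and `groups` mirror the names used on the slides, and `column` is a hypothetical name for retrieving all values of an attribute.

```python
class Table:
    """Hypothetical, minimal Python analogue of the Table interface."""

    def __init__(self, records=None, key=None):
        self.records = list(records or [])  # each record is a dict
        self.key = key                      # identifies a group, if any

    def add(self, record):
        self.records.append(record)

    def keys(self, *attributes):
        """Unique keys: records restricted to the given attributes."""
        seen, result = set(), []
        for rec in self.records:
            values = tuple(rec[a] for a in attributes)
            if values not in seen:
                seen.add(values)
                result.append(dict(zip(attributes, values)))
        return result

    def group(self, key):
        """Sub-table of all records matching every attribute in key."""
        members = [r for r in self.records
                   if all(r[a] == v for a, v in key.items())]
        return Table(members, key)  # records are shared, not copied

    def groups(self, *attributes):
        return [self.group(k) for k in self.keys(*attributes)]

    def column(self, attribute):
        return [r[attribute] for r in self.records]

    def __iter__(self):
        return iter(self.records)


table = Table([
    {"Gene": "ALFA", "CellType": "WT", "CP": 18.1},
    {"Gene": "ALFA", "CellType": "TG", "CP": 19.0},
    {"Gene": "BETA", "CellType": "WT", "CP": 21.4},
])
for g in table.groups("Gene"):
    print(g.key, g.column("CP"))
```

Because a group is just another table over the same record objects, regrouping on different attributes needs no new data structure.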
Table – loading in a tsv
  Table table = new Table("qpcr.tsv");
    Will load the tab-separated value file qpcr.tsv into memory.
  Console.Out.Write(table);
    Will print out the table content, record by record.
Records
- Each record maps an attribute (String) to an IComparable object (String, Double, …)
    Record r = new Record();
    r["Averaged CP"] = average;
- Records can be shared between tables!
- Records should be treated as read-only after they have been created and filled with data.
- Records can be copied; the content of the record (that is, the values) is not copied:
    r.copy();
Adding a record to a table
    Table table = new Table();
    Record record = new Record();
    record["test"] = 60000;
    table.add(record);
    Console.Out.Write(table);
- Records should be added only after their creation and initialization.
Retrieving a set of keys
    Table table = new Table("qpcr.tsv");
    table.keys("CellType");
- Returns a list of unique records, with only the attribute 'CellType'
    table.keys("CellType", "Gene");
- Returns a list of unique records with two attributes: 'CellType' and 'Gene'
- The return value is a List<Record>
Retrieving values belonging to a key
    table.group(key)
- Key is a record (CellType: WT; Gene: ALFA)
- The returned value is again a new Table.
- Remember that the records are shared between the returned subtable and the parent table. Do not modify records after they have been created and initialized.
- If a record is added to the parent table it will not automatically appear in potential subtables.
Retrieving values from an attribute
    record["CellType"]
- Returns the IComparable content in this record
    table["CellType"]
- Returns an ArrayList of values linked to that attribute.
Iterating over the records in a table
    foreach (Record tr in table)
    {
        String str = (String)tr["CP"];
        …
    }
Iterating over all groups in a table
    List<Table> groups = table.groups("A", "B", ...);
    foreach (Table group in groups)
    {
        // group.key → the current group identification (contains A, B, ...)
        …
    }
Advantages
- No need for an object structure representing the data
- Flexible wrt new data fields
- Flexible with regard to different regrouping (data rotations)
- Foreach group
  - The inner loop can theoretically be executed in parallel
- If necessary an SQL backend can be placed in the Table interface
Exercise: Data Cleanup
- Document which attributes exist in qpcr.tsv
- Write a routine that will run through the dataset qpcr.tsv and clean up the data.
- All technical replicas should be averaged (for each of the other attributes)
- Replicas with useless data should be omitted
  - cycle numbers > 40
  - marked as bad [34.56]
  - without a value
  - [outlier replicas (median)]
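One possible shape for such a cleanup routine, sketched in Python rather than with the course's Table class. The attribute names and rows are assumptions (they are not taken from qpcr.tsv), bad values are assumed to be marked with square brackets as in the example above, and the optional median-based outlier removal is left out.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; attribute names are assumptions, not those of qpcr.tsv.
rows = [
    {"Gene": "ALFA", "Dilution": "1:2", "Replica": "R1", "CP": "18.0"},
    {"Gene": "ALFA", "Dilution": "1:2", "Replica": "R2", "CP": "[34.56]"},
    {"Gene": "ALFA", "Dilution": "1:2", "Replica": "R3", "CP": "18.4"},
    {"Gene": "BETA", "Dilution": "1:2", "Replica": "R1", "CP": "41.2"},
    {"Gene": "BETA", "Dilution": "1:2", "Replica": "R2", "CP": ""},
    {"Gene": "BETA", "Dilution": "1:2", "Replica": "R3", "CP": "22.0"},
]

def usable_cp(text):
    """Return the CP value as a float, or None if it must be omitted."""
    text = text.strip()
    if not text or text.startswith("["):   # missing, or marked as bad
        return None
    value = float(text)
    return None if value > 40 else value   # cycle numbers > 40 are dropped

def average_replicas(rows, replica_attr="Replica", value_attr="CP"):
    """Average usable CP values over replicas, per remaining attributes."""
    groups = defaultdict(list)
    for row in rows:
        cp = usable_cp(row[value_attr])
        if cp is not None:
            key = tuple((a, v) for a, v in row.items()
                        if a not in (replica_attr, value_attr))
            groups[key].append(cp)
    return [{**dict(key), "Averaged CP": mean(vals)}
            for key, vals in groups.items()]

cleaned = average_replicas(rows)
for record in cleaned:
    print(record)
```

Grouping on "all attributes except the replica" is what "for each of the other attributes" amounts to in this sketch.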
11. Normalizing Efficiencies
- How to estimate the efficiency of a qPCR experiment using dilution series
- How to normalize the CT values based on an estimated efficiency
- Created a normalized table
Creating combinations
Write a routine that will, for each plate, cell line, cell type and probe, report all dilution combinations of the averaged CP values of the technical replicas.
1. Reuse the table you created in the previous exercise.
2. Foreach (plate, cellline, celltype, probe) obtain the associated group. Assume we want to have all dilutions included.
3. Print out each combination of different dilutions.
Copyrate Estimation
Each multiplication with 10 leads to a shift of 3.32 cycles. This shift depends on the copyrate.
Normalization through dilution series
- If amplification were 100% efficient then diluting the initial concentration by a factor 10 would shift the measurement 3.32 cycles to the right (log2(10) ≈ 3.32).
- In reality it isn't, and we will see shifts larger/smaller than 3.32 cycles.
- By creating a dilution series, one can estimate the copyrate / efficiency.
Copyrate Calculation
- If we have a dilution of factor 10 (becomes stronger)
- And we have a shift (to the left) of x cycles, then the copy ratio (r) is $r = 10^{1/x}$
Examples
After diluting a sample a factor 10 we observe a shift of
- 3.33 → ratio = 2
- 3.36 → ratio = 1.98
- 3.5 → ratio = 1.93
- 3.6 → ratio = 1.89
- 3.7 → ratio = 1.86
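These values are consistent with a copy ratio r that satisfies r^x = 10 for a shift of x cycles. A short Python check under that assumption; the function name `copy_ratio` is hypothetical:

```python
def copy_ratio(shift, dilution_factor=10.0):
    """Per-cycle copy ratio r such that r ** shift == dilution_factor."""
    return dilution_factor ** (1.0 / shift)

for shift in (3.33, 3.36, 3.5, 3.6, 3.7):
    print(f"{shift} -> {copy_ratio(shift):.3f}")
```

The computed ratios agree with the listed ones to within rounding of the second decimal.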
Routine to Calc. Eff.
Based on the formula given before, write a routine
    Double estimate_r(Double x, Double y, Double ratio_x2y);
to estimate the average multiplication factor for each cycle. To test your routine use the following inputs:
- X=10; Y=13.32; ratio_x2y=0.1 → copyratio 2
- X=10; Y=6.68; ratio_x2y=10 → copyratio 2
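A possible solution sketch in Python (the exercise itself asks for a C# routine). The sign convention is an assumption: the relation r^(x - y) = ratio_x2y is chosen because it reproduces both test inputs above, not because the slides state it.

```python
def estimate_r(x, y, ratio_x2y):
    """Average per-cycle multiplication factor from two cycle values.

    Assumes r ** (x - y) == ratio_x2y, the sign convention that
    reproduces the two test inputs given in the exercise.
    """
    if x == y:
        raise ValueError("x and y must differ to estimate a copy ratio")
    return ratio_x2y ** (1.0 / (x - y))

print(estimate_r(10, 13.32, 0.1))
print(estimate_r(10, 6.68, 10))
```

Both calls return a value close to 2; it is not exactly 2 because 3.32 is a rounded value of log2(10).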
Estimating the copyratio
- Plug your routine into Ex. 2 such that the average copy ratio is calculated for each probe, cell line, celltype and plate. All combinations should be taken into account.
- Use the function Double estimate_r(Double x, Double y, Double ratio_x2y) to return the copy ratio.
Remarks on Efficiency
- One could also look at the material copied with each cycle. It should double. Based on that we have a direct measurement of the efficiency.
  - In low areas our sensitivity is not sufficient to estimate it
  - In high areas, before the last cycle, we have such a position
  - Only one measurement
- Depends somewhat on the software provided with the machines
Remarks on Efficiency
- Often the efficiency is expressed as the relative amount of input material that was used to create new DNA: efficiency = r-1.0. (between 80% and 140%)
    Double estimate_efficiency(Double x, Double y, Double ratio_x2y)
    {
        // efficiency expressed as a percentage
        return 100.0 * (estimate_r(x, y, ratio_x2y) - 1.0);
    }
Normalizing CT Values
- Assume the reached volume was V, after x cycles at a copyrate of r
- What would be the number of cycles if the copyrate were 2?
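One way to answer this question, assuming the reached volume grows as V = V0 · r^x from a fixed starting amount V0: requiring r^x = 2^x' gives x' = x · log2(r). The Python sketch below implements that relation; the function name `normalize_ct` and the growth model are assumptions, since the original equation is not part of this transcript.

```python
import math

def normalize_ct(x, r):
    """Cycle count that a copy rate of 2 would need to match r ** x."""
    if r <= 1.0:
        raise ValueError("copy rate must be larger than 1")
    return x * math.log2(r)

# A measurement of 25 cycles at a copy rate of 1.9 corresponds to fewer
# cycles at a perfect copy rate of 2.
print(normalize_ct(25, 1.9))
```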
Normalizing CT Values
- It would also be possible to go back to the initial concentration instead of relying on normalizing the CT values.
Normalize the CT Value
- Use your estimated copyratio to normalize the CT value
- Implement the normalization equation
- Place the normalized CT values in a new table
12. Dct Values
- Normalizing the results towards a household gene
Normalization?
- CellType might affect the baseline gene expression in the cell.
  - A WT cell might be less active than a TG cell
  - Or vice versa
- To account for this problem one can compare a gene expression against a 'household' gene
  - The household gene is supposed to be unrelated to the measured gene
  - As we know from microarrays, using one gene to normalize various expressions is highly error-prone
Normalizing against a known gene
- If CP_A is the CP value for the gene of interest A
- And CP_H is the CP value for the household gene H
- The concentrations [A] and [H] are given by
Normalizing against a known gene
- The ratio of the concentrations is then given by $[A]/[H] = 2^{CP_H - CP_A}$
Example
- A has a CT value of 15.786
- H has a CT value of 18.875
- DCT = 18.875-15.786 = 3.089
- The concentration ratio is 8.51
- This is an upregulation of a factor 8.51 (against the household gene concentration)
Example
- A has a CT value of 19.6
- H has a CT value of 12.4
- DCT = 12.4-19.6 = -7.2
- The concentration ratio is 0.0068011
- Which is a down regulation of a factor 147.03
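Both examples can be reproduced by raising 2 to the power of the DCT value. A Python sketch, assuming CT values that were already normalized to a copy rate of 2; the function names are hypothetical:

```python
def dct(ct_gene, ct_household):
    """DCT as used in the examples: CT of the household gene minus CT of the gene."""
    return ct_household - ct_gene

def concentration_ratio(dct_value, copy_rate=2.0):
    """Concentration of the gene relative to the household gene."""
    return copy_rate ** dct_value

up = concentration_ratio(dct(15.786, 18.875))
down = concentration_ratio(dct(19.6, 12.4))
print(up, down, 1.0 / down)
```

A ratio below 1 is reported as a down regulation by the inverse factor.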
Calculating DCT
- Using GAPDH as a household gene, calculate for any other gene the DCT value and report it in a new table.
13. Delta-Delta CT
- Calculating DDCT values from one celltype to another.
Calculating up/down regulations
- Up/down regulations are typically calculated between celltypes.
- E.g.: the relative expression of gene A in WT condition against the relative expression of gene A in TG condition.
Example
- WT:
  - A has a CT value of 15.786
  - H has a CT value of 18.875
  - DCT = 18.875-15.786 = 3.089
- TG:
  - A has a CT value of 19.6
  - H has a CT value of 12.4
  - DCT = 12.4-19.6 = -7.2
- DDCT = -7.2-3.089 = -10.289
- TG/WT = 0.000799286, or a down regulation of a factor 1251.12
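The arithmetic of this example can be checked with a few lines of Python. This is a sketch assuming a copy rate of 2 and DDCT defined as the TG value minus the WT value, as in the example; the helper `dct` is hypothetical:

```python
def dct(ct_gene, ct_household):
    """DCT as used in the examples: CT of the household gene minus CT of the gene."""
    return ct_household - ct_gene

dct_wt = dct(15.786, 18.875)   # wild type
dct_tg = dct(19.6, 12.4)       # transgenic
ddct = dct_tg - dct_wt         # change going from WT to TG
ratio = 2.0 ** ddct            # relative expression in TG versus WT
factor = 1.0 / ratio           # size of the down regulation

print(ddct, ratio, factor)
```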
14. Reporting Regulations
- Reporting regulations as
  - Log values
  - Ratios
  - Ratios larger than 1
Reporting up-down regulations
- Can be reported as a ratio
  - x10
  - x0.01
  - x0
- Can be reported as a log value
  - x10 → log_10 value of 1
  - x0.1 → log_10 value of -1
  - x1 → log_10 value of 0
  - x0 → has no log_10 value
Reporting up/down regulations
- Can be reported as a ratio and a direction
  - x10 → 10 times upregulated
  - x0.1 → 10 times downregulated
- Exercise: Based on your DDCT table, create a report for the up/down regulations of each measured gene from the WT to the TG.
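The three reporting styles (log value, ratio, ratio larger than 1 plus a direction) can be derived from a single ratio. A Python sketch; the function name `report` and the label "unchanged" for a ratio of exactly 1 are assumptions:

```python
import math

def report(ratio):
    """Return (factor >= 1, direction, log10 value) for a regulation ratio."""
    if ratio <= 0:
        raise ValueError("a ratio of 0 (or less) has no log value")
    log_value = math.log10(ratio)
    if ratio > 1:
        return ratio, "up", log_value
    if ratio < 1:
        return 1.0 / ratio, "down", log_value
    return 1.0, "unchanged", log_value

for r in (10.0, 0.1, 1.0):
    print(r, report(r))
```

A ratio derived from a DDCT value would be passed in as 2 ** ddct.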
Exercise
- Report Table: Gene, CellLine, Dilution, Ratio, Direction
- Ddct Table: Gene, CellLine, Dilution, Ddct
- Neither table contains CellType (siRNA versus Normal) (nor Plate)
- Write a routine that will generate a new Report that contains the average up/down ratios (averaged across dilutions (and plates))