TRANSCRIPT
COllaborative Management Platform for detection and Analyses of (Re-) emerging and foodborne outbreaks in Europe
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 643476.
Introduction to machine learning ‐ potential of sequence data
Nanna Munck, PhD student
Patrick Murigu Kamau Njage, postdoc
September 28, 2017.
Content
• Machine learning method
• Potential of sequence data
• Example from my PhD project
Matrix ‐ example of machine learning input
Machine learning
• Analyse large and complex datasets by recognizing patterns
• “Algorithm improves with experience” (data) (Libbrecht M. W. and Noble W. S., 2015)
• Identifies the relevant “features” in a complex data set, which makes strong predictions possible
• Example applications: spam filtering, face recognition, smart cars, finance, advertising in internet browsers, etc.
Machine learning
Machine learning Algorithms
Machine learning
Broiler, Pig, Pig, Layers, Broiler, Layers, Pig, Broiler
1. Broiler  2. Pig  3. Layers  4. Broiler  5. Pig  6. Layers  7. Pig  8. Pig
Labels
Figure 1. Conceptual model, modified from Libbrecht M. W. and Noble W. S., 2015
Machine learning ‐ Conceptual model of workflow
Figure 2. Conceptual model of the machine learning workflow
Potential of sequence data
Figure labels: CGE Tools; Resistance genes; Plasmid replicons; Virulence genes; MLST; SNP; Other
Label       Gene1  Gene2  Gene3  Gene4  Gene5  …
Sequence 1  1      1      0      0      0
Sequence 2  0      1      1      1      1
Sequence 3  1      0      1      0      0
Sequence 4  1      0      0      0      1
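The presence/absence matrix on this slide could be built roughly as follows. This is a minimal sketch, not the presenters' pipeline: the `gene_calls` dictionary and the `build_feature_matrix` helper are hypothetical, and the gene names are placeholders chosen so that the output reproduces the four rows shown on the slide.

```python
# Hypothetical per-sequence gene calls (placeholder names, not real tool output).
gene_calls = {
    "Sequence 1": {"Gene1", "Gene2"},
    "Sequence 2": {"Gene2", "Gene3", "Gene4", "Gene5"},
    "Sequence 3": {"Gene1", "Gene3"},
    "Sequence 4": {"Gene1", "Gene5"},
}


def build_feature_matrix(calls):
    """Turn sets of detected genes into a binary presence/absence matrix."""
    # Every gene seen in any sequence becomes one column (sorted for stable order).
    features = sorted({gene for genes in calls.values() for gene in genes})
    # 1 = gene detected in the sequence, 0 = not detected.
    matrix = {
        seq: [1 if feature in genes else 0 for feature in features]
        for seq, genes in calls.items()
    }
    return features, matrix


features, matrix = build_feature_matrix(gene_calls)
print("Label", *features)
for seq, row in matrix.items():
    print(seq, *row)
```

Each row is then one input example for a classifier, and the source label (for example broiler, pig or layers) is kept in a separate column.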
Example from my PhD project ‐ Introduction
• Source attribution models link the number of cases of a specific foodborne illness to specific food and animal reservoirs
• Existing models are based mainly on phenotypic information about a given bacterium and a comparison of its distribution in potential sources and in humans
• With next-generation sequencing technology, new methods for source attribution are available
• We investigate the potential of machine learning to predict the source (animal reservoir) from which a given Salmonella bacterium originates, based on genotypic features
Example from my PhD project ‐ Case: Salmonella Typhimurium
• Danish Salmonella Typhimurium isolates, 2012‐14, extracted from the Vivaldi database
  – 22 broilers
  – 20 pigs
  – 4 layers
• Features, in total 43: resistance genes (n = 20), plasmid replicons (n = 21), MLST type (n = 2)
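A quick tally of the numbers on this slide shows how unevenly the three sources are represented. The sketch below uses only the counts given above. The majority-class baseline (always predicting the most common source) is an added yardstick for reading accuracy figures, not something taken from the slides.

```python
# Counts copied from the slide.
class_counts = {"broilers": 22, "pigs": 20, "layers": 4}
feature_counts = {"resistance genes": 20, "plasmid replicons": 21, "MLST type": 2}

n_isolates = sum(class_counts.values())   # rows of the input matrix
n_features = sum(feature_counts.values()) # columns of the input matrix

# Share of each source among all isolates.
class_shares = {name: count / n_isolates for name, count in class_counts.items()}

# Assumption for illustration: a trivial model that always predicts the
# most common source would reach this accuracy on the full data set.
majority_class = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_class] / n_isolates

print(f"{n_isolates} isolates x {n_features} features")
for name, share in class_shares.items():
    print(f"{name}: {share:.1%}")
print(f"majority-class baseline ({majority_class}): {baseline_accuracy:.2f}")
```

With only 4 layer isolates, a classifier sees very few examples of that source.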
Example from my PhD project ‐ Data exploration, SNP analysis
Legend: Pigs, Broilers, Layers
Example from my PhD project ‐ Results and discussion
Table 1. Accuracy
Model accuracy          Agreement accuracy
rf     svmr   svml      svmr vs rf   svml vs rf
0.67   0.67   0.5       0.86         0.79

Table 2. Confusion matrix
Predicted/Data  Broilers  Layers  Pigs
Broilers        4         2       2
Layers          0         0       0
Pigs            3         0       3

Table 3. Importance measures
             Broilers  Layers  Pigs
IncFIB(S)    6.56      2.01    5.58
IncQ1        5.24      1.00    5.93
IncFII(S)    5.62      2.01    4.94
strA         4.06      1.00    5.15
sul2         3.33      1.74    3.94
ST-19        3.57      1.00    -1.17
ColpVC       3.23      -1.74   2.53
blaTEM-1B    2.87      1.42    -1.32
Col(VCM04)   2.86      -1.42   -2.56
tet(B)       2.59      1.00    1.07
ST-34        2.52      0.00    1.44
Col156       1.98      0.00    2.10
IncI1        -1.50     0.00    2.08
strB         2.01      1.00    1.59
ColRNAI      -2.05     1.00    -0.66
strA.like    0.09      0.00    0.10
tet(A)like   0.09      0.00    -0.30
Bold: resistance genes, italic: plasmid replicons, normal: MLST type
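As a worked reading of the confusion matrix in Table 2, the sketch below computes overall accuracy and the fraction of each true source that was predicted correctly. It assumes, as the "Predicted/Data" header suggests, that rows are predictions and columns are the true sources. The slides do not say which of the three models produced this matrix.

```python
classes = ["Broilers", "Layers", "Pigs"]

# Rows = predicted source, columns = true source (assumed from "Predicted/Data").
confusion = [
    [4, 2, 2],  # predicted Broilers
    [0, 0, 0],  # predicted Layers
    [3, 0, 3],  # predicted Pigs
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(classes)))
accuracy = correct / total

# Per-source recall: correct predictions divided by the number of
# isolates that truly came from that source (the column sum).
recall = {}
for j, name in enumerate(classes):
    true_count = sum(confusion[i][j] for i in range(len(classes)))
    recall[name] = confusion[j][j] / true_count

print(f"test isolates: {total}, correct: {correct}, accuracy: {accuracy:.2f}")
for name, value in recall.items():
    print(f"{name}: recall {value:.2f}")
```

Under that reading, 7 of 14 test isolates are classified correctly (accuracy 0.50), and no isolate is ever predicted as Layers.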
Example from my PhD project ‐ Conclusion and future work
• Host-specific patterns
  – None of the features included so far showed clear predictability.
• Features
  – As this is work in progress, more features, such as virulence, still need to be explored.
• Data
  – More data is to be used, and more knowledge is to be gained about interpreting the model outputs, in order to improve the model.
Thank you
Questions?