[Figure: NeuroPID pipeline. Labels: Input training NP catalogue (NP positive set, negative sets); Genome, Transcript, Full-length ORFs, Translated proteome; NP processing tools; ML quality: cross-validation; NeuroPID prediction; Annotated, Candidate NPs, Top-ranked NPs.]
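The pipeline ends by narrowing full-length ORFs down to candidate and top-ranked NPs. Below is a minimal sketch of such a filter-and-rank step. It assumes a protein counts as "full length" when it starts with Met and has no internal stop ('*') or unknown residue ('X'). The `score` argument is a hypothetical stand-in for a trained NeuroPID classifier, and the function names are illustrative only.

```python
from typing import Callable, Dict, List, Tuple


def is_full_length(protein: str) -> bool:
    """Assumed completeness rule: starts with Met, no internal stop or 'X'."""
    return protein.startswith("M") and "*" not in protein and "X" not in protein


def top_ranked(
    proteome: Dict[str, str],
    score: Callable[[str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Keep full-length proteins, score them, and return the top_n by score."""
    # Drop a trailing stop symbol left over from translation before checking.
    trimmed = {pid: seq.rstrip("*") for pid, seq in proteome.items()}
    candidates = {pid: seq for pid, seq in trimmed.items() if is_full_length(seq)}
    scored = [(pid, score(seq)) for pid, seq in candidates.items()]
    # Highest score first; ties keep input order because list.sort is stable.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    toy = {"p1": "MKRAAKR*", "p2": "MAAA", "p3": "AKRKR", "p4": "MKR*KR"}
    # Toy score: number of KR dibasic sites (illustrative only).
    print(top_ranked(toy, lambda s: float(s.count("KR")), top_n=2))
```

In the toy run, `p3` is dropped for lacking an initial Met and `p4` for its internal stop. A real classifier score would replace the KR count.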
[Figure panel A: per-amino-acid t-test significance (Q, Y, C, N, H, L, D, R, W, M, T, S, G, A, V, P, F, I, E, K); axis: -log(p-value, t-test), 0 to 100.]
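Panel A reports a t-test p-value for each amino acid. The sketch below shows how such a per-residue comparison can be computed between two sets of sequences. The source says only "t-test", so using Welch's unequal-variance form is an assumption. The Python standard library has no Student-t CDF, so p-values here come from a normal approximation that is only reasonable for large samples. `scipy.stats.ttest_ind(a, b, equal_var=False)` gives exact Welch p-values. All function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str, aa: str) -> float:
    """Fraction of residues in seq that are amino acid aa."""
    return seq.count(aa) / len(seq) if seq else 0.0


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic (unequal variances); each sample needs >= 2 values."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # No spread in either group: equal means give t = 0, otherwise +/-inf.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def neg_log10_p(t: float) -> float:
    """-log10 of a two-sided p-value, approximating t by a standard normal."""
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return -math.log10(p) if p > 0.0 else math.inf


def composition_scores(positives: List[str], negatives: List[str]) -> Dict[str, float]:
    """Per-amino-acid -log10(p) for composition differences between two sets."""
    scores = {}
    for aa in AMINO_ACIDS:
        t = welch_t([composition(s, aa) for s in positives],
                    [composition(s, aa) for s in negatives])
        scores[aa] = neg_log10_p(t)
    return scores
```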
[Figure panel B: physicochemical features GRAVY, instability, molecular weight, pI and aromaticity; axis: -log(p-value, t-test), 0 to 30.]
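Two of the panel B features are simple enough to compute directly. GRAVY is the mean Kyte-Doolittle hydropathy per residue, and aromaticity is the fraction of F, W and Y. This is a sketch for illustration, not the authors' code. The instability index, molecular weight and pI need larger lookup tables. Biopython's `Bio.SeqUtils.ProtParam.ProteinAnalysis` offers `gravy()`, `aromaticity()`, `instability_index()`, `molecular_weight()` and `isoelectric_point()` if a library is preferred.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aromaticity(seq: str) -> float:
    """Fraction of aromatic residues (F, W, Y) in the sequence."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


if __name__ == "__main__":
    peptide = "YSFGLGKR"  # a repeat unit that occurs in the precursor sequence
    print(round(gravy(peptide), 4), aromaticity(peptide))
```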
MRSRTSVLTSSLAFLYFFGIVGRSALAMEETPASSMNLQHYNN
MLNPMVFDDTMPEKRAYTYVSEYKRLPVYNFGIGKRWIDTNDN
KRGRDYSFGLGKRRQYSFGLGKRNDNADYPLRLNLDYLPVDNP
AFHSQENTDDFLEEKRGRQPYSFGLGKRAVHYSGGQPLGSKRP
NDMLSQRYHFGLGKRMSEDEEESSQR
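The precursor above carries repeated ...GKR units, which is the pattern the pipeline's "NP processing tools" stage has to interpret. The sketch below is a deliberately simplified processing model. It assumes cleavage at every dibasic pair (KK, KR, RK, RR) and treats a C-terminal Gly as the amidation donor. Real processing also involves signal-peptide removal, monobasic sites and other rules not modelled here, and the function names are hypothetical.

```python
import re
from typing import List

# Assumed cleavage rule: any adjacent pair of basic residues (KK, KR, RK, RR).
DIBASIC_SITE = re.compile(r"[KR][KR]")


def cleave_dibasic(precursor: str) -> List[str]:
    """Split at non-overlapping dibasic pairs, dropping the pair and empty pieces."""
    return [piece for piece in DIBASIC_SITE.split(precursor) if piece]


def mature_form(fragment: str) -> str:
    """Treat a C-terminal Gly as the amide donor: ...XG becomes ...X-NH2."""
    if len(fragment) > 1 and fragment.endswith("G"):
        return fragment[:-1] + "-NH2"
    return fragment


if __name__ == "__main__":
    # Toy precursor built from YSFGLGKR-type repeats like those above.
    toy = "AAKRYSFGLGKRQYSFGLGKRNDN"
    print([mature_form(p) for p in cleave_dibasic(toy)])
```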
[Figure: cross-validation performance (accuracy, precision, recall, area under ROC curve; scale 0 to 1) for Random Forest, RBF, Linear SVC, Gradient Boosting, SVC sigmoid and polynomial SVM classifiers.]
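The classifier comparison uses accuracy, precision, recall and area under the ROC curve. The sketch below computes these from held-out labels, predictions and scores, assuming 1 marks a neuropeptide and 0 a negative. AUC is computed in its pairwise (Mann-Whitney) form: the probability that a random positive outscores a random negative, with ties counted as one half. This form is equivalent to the area under the ROC curve but quadratic in set size, which is fine for a sketch. `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score` and `roc_auc_score` for real use.

```python
from typing import Sequence, Tuple


def accuracy_precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Binary metrics with 1 = neuropeptide (positive) and 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```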
S. frugiperda (Fall armyworm) 5
H. armigera (Cotton bollworm) 6
S. gregaria (Desert locust) 4
A. florea (Little honeybee) 0
M. rotundata (Alfalfa leafcutter bee) 1
C. floridanus (Florida carpenter ant) 2
A. echinatior (Leafcutter ant) 3
                 SW Arthropods                                  UniProt Arthropods
                 Random Forest  Gradient Boosting  Linear SVC   Random Forest  Gradient Boosting  Linear SVC
Mean Accuracy    0.94           0.95               0.94         0.92           0.92               0.86
Mean Precision   0.94           0.95               0.93         0.93           0.94               0.95
Mean Recall      0.92           0.92               0.92         0.95           0.95               0.85
Mean AUC         0.94           0.95               0.94         0.89           0.90               0.87

                 SW Chordata                                    UniProt Chordata
                 Random Forest  Gradient Boosting  Linear SVC   Random Forest  Gradient Boosting  Linear SVC
Mean Accuracy    0.96           0.97               0.95         0.90           0.91               0.85
Mean Precision   0.94           0.94               0.88         0.91           0.92               0.89
Mean Recall      0.91           0.92               0.93         0.91           0.91               0.83
Mean AUC         0.95           0.95               0.94         0.90           0.91               0.85
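The tables report mean scores over cross-validation folds. The source does not state the number of folds or whether they were stratified, so the sketch below assumes stratified k-fold. Each class's indices are dealt round-robin into k folds so that every fold keeps roughly the overall class ratio, and each metric is then averaged. `evaluate` is a hypothetical callback that trains on the given indices and returns a dict of metric values. `sklearn.model_selection.StratifiedKFold` and `cross_validate` are the library equivalents.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int) -> List[List[int]]:
    """Deal each class's indices round-robin into k folds (no shuffling)."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for j, idx in enumerate(members):
            folds[j % k].append(idx)
    return [sorted(fold) for fold in folds]


def cross_validate(
    labels: Sequence[int],
    k: int,
    evaluate: Callable[[List[int], List[int]], Dict[str, float]],
) -> Dict[str, float]:
    """Call evaluate(train_idx, test_idx) once per fold; average each metric."""
    per_fold = []
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        per_fold.append(evaluate(train_idx, test_idx))
    return {name: mean(result[name] for result in per_fold) for name in per_fold[0]}
```

Omitting shuffling keeps the sketch deterministic. A real run would usually shuffle within each class first with a fixed seed.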
Organism          # sequences   # of full length   # of SP   # of NP & SP   # NeuroPID      Functional annotation enrichment
                  (UniProtKB)   (UniProtKB)                                 (all methods)
B. mori           17908         17069              138       6              69              Innate immunity; Insulin-like; Chorion; Hormone (NP)
S. invicta        14356         84                 12        2              4               Innate immunity
D. melanogaster   39961         31091              475       21             120             Innate immunity; Developmental; Channel ligand; Receptor; Hormone (NP)
C. elegans        26005         25534              464       21             89              Hormone (NP); Channel ligand; Receptor; Protease
Organism / taxa; # of UniProt (UniRef90); # of NPP in SW (UniRef90); # of NPP in UniProt (UniRef90); NeuroPID prediction (RBF)^a
Apis mellifera 10394 6 19 7
SP in Apis mellifera^b 2139 5 7
Gallus gallus 20760 5 5 5
SP in Gallus gallus 701 1 1
Bombyx mori 15250 5 17 9
SP in Bombyx mori 112 5 5 9
Octopoda 224 4 4 4
SP in Octopoda 76 3 3
Updates 5 7 2013
[Figure: overlap of predictions from the RF, ExtraTree, SVM-SVC and GBR classifiers, with per-method percentages; panels: Apis mellifera, Thaumeledone gunteri.]
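Overlaps between classifiers, such as those in the figure above, can be tabulated as Venn regions: each candidate is keyed by the exact set of methods that called it. The sketch below shows this along with the all-methods consensus. The method names RF, GBR and SVM-SVC come from the figure, while the candidate IDs in the usage are hypothetical.

```python
from typing import Dict, FrozenSet, Set


def consensus(predictions: Dict[str, Set[str]]) -> Set[str]:
    """Candidates called positive by every method."""
    calls = list(predictions.values())
    return set.intersection(*calls) if calls else set()


def venn_regions(predictions: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """Size of each Venn region, keyed by the exact set of methods calling a candidate."""
    regions: Dict[FrozenSet[str], int] = {}
    for cid in set().union(*predictions.values()):
        methods = frozenset(m for m, called in predictions.items() if cid in called)
        regions[methods] = regions.get(methods, 0) + 1
    return regions


if __name__ == "__main__":
    preds = {"RF": {"a", "b"}, "GBR": {"a", "c"}, "SVM-SVC": {"a", "b", "d"}}
    print(consensus(preds))
    print(venn_regions(preds))
```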
[Figure: % of annotated NPs in taxonomy (axis 0 to 1.5) for Mammalia, Insecta, Caenorhabditis and others.]
66-97   KRL....YDFG.........LG..............KRA..YsyvSEYKRL.............................pvYN..FGLGKR
98-120  SKM....YGFG.........LG..............KR.......DG..RM...............................YS..FGLGKR
121-164 DYD....Y.YGeededdqqaIGdedieesdvgdlmdKR..........DRL...............................YS..FGLGKR
165-191 ARP....YSFG.........LG..............KRA..P...SGAQRL...............................YG..FGLGKR
192-216 GGS...lYSFG.........LG..............KR........GDGRL...............................YA..FGLGKRPVN
222-253 GRSsgsrFNFG.........LG..............KRS..D...DIDFRE...............................LEekFAEDKR
254-316 .YPqehrFSFG.........LG..............KREveP...SELEAVrneekdnssvhdkknntndmhsgerikrslhYP..FGIRKL
347-367 RRP....FNFG.........LG..............KRI..P........M...............................YD..FGIGKR

66-97   KRL....YDFG.........LG..............KRA..YsyvSEYKRL........pvYN..FGLGKR
98-120  SKM....YGFG.........LG..............KR.......DG..RM..........YS..FGLGKR
121-164 DYD....Y.YGeededdqqaIGdedieesdvgdlmdKR..........DRL..........YS..FGLGKR
165-191 ARP....YSFG.........LG..............KRA..P...SGAQRL..........YG..FGLGKR
192-220 GGS...lYSFG.........LG..............KR........GDGRL..........YA..FGLGKRPVNS
221-253 GRSsgsrFNFG.........LG..............KRS..D...DIDFRE..........LEekFAEDKR
254-316 YPqehrFSFG.........LG..............KREveP...SELEAVrne(25)slhYP..FGIRKL
346-367 RRP....FNFG.........LG..............KRI..P........M..........YD..FGIGKR
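The aligned repeats mostly end in an FG(L/I)G core followed by a KR site. The sketch below scans a sequence for that pattern with a regular expression and reports 1-based inclusive coordinates, matching the style of the row labels. The pattern `FG[LI]GKR` is an assumption drawn from the row endings and is deliberately strict. It would miss the rows ending in FAEDKR and FGIRKL, so a real search would need a looser pattern or a profile built from the alignment. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed repeat pattern: FG(L|I)G followed by a KR dibasic site.
REPEAT = re.compile(r"FG[LI]GKR")


def find_repeats(seq: str) -> List[Tuple[int, int, str]]:
    """(start, end, motif) for each non-overlapping match, 1-based and inclusive."""
    return [(m.start() + 1, m.end(), m.group()) for m in REPEAT.finditer(seq)]


if __name__ == "__main__":
    print(find_repeats("AAFGLGKRBBFGIGKR"))
```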