Data Mining on the Farm
Accelerating the search for a better pesticide
John B. Kinney, Senior Research Associate
DuPont Biosolutions Enterprise
Spotfire User’s Conference, May 3-4, 2001
© 2001, DuPont, Inc. All Rights Reserved.
DuPont Biosolutions Enterprise
Crop Protection Products*: control weeds, insects, and plant diseases
Pioneer Hi-Bred: high-performance seeds
Protein Technologies: soy protein isolates used in the food industry
Qualicon: food safety
* Focus of today’s talk
CPP Goals
Control pests:
Efficaciously
Safely
In an environmentally sound way
Cost-effectively
CPP Research & Data Styles
in vitro
in vivo
Field
in vivo CPP Research
“Treating the bed to cure the patient”
Plants in pots
Length of test is a factor
“Extra” data
Herbicide Test Unit
[Figure: test substance vs. control pots for weed species CRL, BYG, FTI, MOG, PWX, VEL, BGC]
Field Tests
Same as in vivo, but with less control!
“Extra” data
Degradation and movement in the environment are major issues
Data Issues
Biological variability
(Highly) multivariate data
EC50 results are uncommon
Historical data is valuable
Successful Applications of Data Visualization
Sourcing: Preformatted data sets for sample acquisition analysis
Hit Followup: R-group visualization and analysis
Lead Optimization: Color-coded reports for rapid, high-dimensional comparisons
Browsing Acquisition Analysis Data
Challenge: Characterize and evaluate offerings from compound brokers and collaborators
Solution: External system to characterize offerings and build tables for browsing in Spotfire
Minimal interface...
User selection from existing “evaluation tables”
Spotfire for browsing
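An "evaluation table" of the kind described above could be built by a small external script along the following lines. This is a minimal sketch, not the system DuPont used. The input field names (compound_id, supplier, mol_weight, structure_key), the 500 molecular-weight cutoff, and the tab-delimited hand-off format are all assumptions made for illustration.

```python
import csv
import io


def build_evaluation_table(offerings, in_house_keys, max_mol_weight=500.0):
    """Characterize each offered compound; return one row (dict) per compound.

    offerings: list of dicts with HYPOTHETICAL keys
        "compound_id", "supplier", "mol_weight", "structure_key".
    in_house_keys: set of structure keys already in the in-house collection.
    max_mol_weight: illustrative cutoff only, not a documented criterion.
    """
    rows = []
    for offer in offerings:
        rows.append({
            "compound_id": offer["compound_id"],
            "supplier": offer["supplier"],
            "mol_weight": offer["mol_weight"],
            # Flag compounds already owned so they can be filtered out while browsing.
            "already_owned": offer["structure_key"] in in_house_keys,
            # One simple property filter; a real evaluation would add many more columns.
            "mw_ok": offer["mol_weight"] <= max_mol_weight,
        })
    return rows


def to_tab_delimited(rows):
    """Serialize rows as tab-delimited text with a header line (empty string if no rows)."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Each offering becomes one row of flat columns, so a user who opens the file can filter on `already_owned` or `mw_ok` directly instead of re-deriving them. The script does the characterization; the browsing tool only displays the results.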
Parallel Synthesis Hit Followup
Visualization and analysis of combinatorial library
Row and column layout is useful, but not chemically relevant!
Merging synthetic schemes with biology data
Hansch-style characterization often helpful for identifying trends and features
Fragment properties and whole molecule data can provide insights
[Figure: library scaffold with substitution points N-R1 and R2]
R1 = methyl, ethyl, propyl, etc.
R2 = -F, -Cl, -Br, -I
Plate layout vs. Fragment Data
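The shift from plate layout to fragment data amounts to re-keying each well by the R-groups it contains and attaching fragment descriptors. Activity can then be plotted against a chemically meaningful axis instead of a plate column. The sketch below assumes plate rows vary R1 and plate columns vary R2. It uses commonly tabulated aromatic Hansch π constants for the halogens and carbon count as a stand-in R1 descriptor. The one-descriptor least-squares line is a simplified stand-in for a Hansch-style analysis, and the activities in the test are synthetic.

```python
from statistics import mean

# Commonly tabulated aromatic Hansch pi (hydrophobicity) constants, used for illustration.
PI_R2 = {"F": 0.14, "Cl": 0.71, "Br": 0.86, "I": 1.12}
# Illustrative R1 descriptor: carbon count of the alkyl chain.
CARBONS_R1 = {"methyl": 1, "ethyl": 2, "propyl": 3}


def plate_to_fragment_table(plate, row_to_r1, col_to_r2):
    """Re-key wells {(row_letter, col_number): activity} by R-group fragment descriptors.

    Assumes (hypothetically) that plate rows vary R1 and plate columns vary R2.
    """
    records = []
    for (row, col), activity in plate.items():
        r1, r2 = row_to_r1[row], col_to_r2[col]
        records.append({
            "well": f"{row}{col}",
            "R1": r1,
            "R2": r2,
            "R1_carbons": CARBONS_R1[r1],
            "R2_pi": PI_R2[r2],
            "activity": activity,
        })
    return records


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (needs at least two distinct xs)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

Sorting or plotting the records by `R2_pi` rather than by well column is what exposes a trend. A full Hansch treatment would regress a log-potency measure on several descriptors (for example hydrophobic, electronic, and steric terms).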
Lead Optimization
Numerous test and characterization values for each compound
History of complex, printed data reports:
PRIMARY PLANT RESPONSE (WEEDS)
INCODE = CPD1   DEPT = 8   DATE = 891127
SUBMITTER =
N.B = 056898   N.B.PAGE = 021
INCODE = CPD1
AMT = .21G   % = 100   FORM =
LEAD AREA =
/MOLNM
/Info = CHEMICAL NAME AVAILABLE UPON REQUEST
YY/MM/DD TYPE RATE UNITS MORN COCKL VELV PIG CRAB GIANT FOXTL B Y CHEAT DOWNY WILD SOR COMMENT
TEST GLORY BUR LEAF WEED GRASS FOXTL MILLT GRASS GRASS BROME OATS GUM
-------- ---- ------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----------
90/01/02 POST 1.0 KG/HA 90H 70H 70H 10C 10C 0 40G 0 30C HERBICIDE
90/01/02 PRE 2.0 KG/HA 0 10H 30H 0 20H 0 0 0 30C
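Pulling values out of legacy reports like the one above is one form of the "data extraction" problem. Below is a minimal sketch of parsing one result row into structured fields. It assumes each rating is a number optionally followed by a single letter code (as in "90H", "10C", "0"); the slide does not define what the letters mean. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass

# A rating token: digits, optionally followed by one uppercase letter code.
RATING_RE = re.compile(r"^(\d+)([A-Z]?)$")


@dataclass
class Rating:
    value: int  # numeric response as printed
    code: str   # trailing letter code, "" when none is printed


@dataclass
class ReportRow:
    date: str
    test_type: str
    rate: float
    units: str
    ratings: list
    comment: str


def parse_rating(token):
    m = RATING_RE.match(token)
    if m is None:
        raise ValueError(f"not a rating token: {token!r}")
    return Rating(int(m.group(1)), m.group(2))


def parse_row(line):
    """Split one whitespace-separated result row into fields, ratings, and comment."""
    date, test_type, rate, units, *rest = line.split()
    ratings, comment_words = [], []
    for tok in rest:
        # Ratings come first; the first non-rating token starts the free-text comment.
        if not comment_words and RATING_RE.match(tok):
            ratings.append(parse_rating(tok))
        else:
            comment_words.append(tok)
    return ReportRow(date, test_type, float(rate), units, ratings, " ".join(comment_words))
```

Splitting on whitespace discards column positions. In the rows above, only 9 ratings appear against 12 species columns, so the values cannot be assigned to species this way. A production extractor would slice each line at the fixed column offsets given by the dashed rule.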
HeLo
Project Overview with Heat Maps
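The color-coded overview reduces each compound-by-test response to a color bin, so that many dimensions can be scanned at once. Below is a minimal sketch of that binning step. It assumes a 0-100 response scale; the bin edges and color names are illustrative choices, not the ones used in the DuPont reports.

```python
# Illustrative lower bin edges on a 0-100 response scale (ascending order).
COLOR_BINS = [(0, "white"), (30, "yellow"), (70, "orange"), (90, "red")]


def color_for(rating):
    """Return the color of the highest bin whose lower edge the rating reaches."""
    color = COLOR_BINS[0][1]
    for lower_edge, name in COLOR_BINS:
        if rating >= lower_edge:
            color = name
    return color


def heat_map(ratings):
    """Map {compound: {test: rating}} to {compound: {test: color}}."""
    return {
        cpd: {test: color_for(value) for test, value in row.items()}
        for cpd, row in ratings.items()
    }
```

A small number of coarse bins keeps the view readable when hundreds of compounds and tests share one screen, which is exactly where screen real estate becomes the limit.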
Future Challenges
Better data extraction/formatting techniques
Expanding data warehouse to include non-traditional data sources
Computer screen real estate!
Acknowledgements
At the risk of missing someone...
Kevin Kranis (retired)
Laurie Christianson
Dan Kleier
The entire Discovery Organization -- They generated the data!