forager poster

Upload: david-leahy

Posted on 27-Jun-2015


DESCRIPTION

Forager: particle swarm optimisation used to identify solutions to multiple objectives in chemical descriptor space

TRANSCRIPT


Forager: A Multi-Objective Reverse QSAR Search Agent

ABSTRACT

TITLE:

Forager: Multi-Objective Reverse QSAR Optimisation

BACKGROUND:

Given structure-property data sets, our new software system, the Discovery Bus, automatically generates multiple QSAR models for each property and updates these as new data or methods become available. This creates a shifting landscape of QSAR models for multiple properties, which can be used to guide the selection of novel chemical structures that satisfy the Research Target Profile (RTP) definition of a new drug. Forager has been developed to search for non-dominated solutions to an RTP within a complex descriptor space, where the search heuristics are provided by multiple QSAR models.

OBJECTIVE:

Rapid and complete identification of non-dominated solutions in chemical descriptor space for multiple properties estimated by QSAR models.

METHODS:

Forager uses a modified Particle Swarm Optimisation (PSO) algorithm to search descriptor space for non-dominated solutions. The descriptor space is the union of the descriptors in the QSAR models used to estimate the properties of interest. The PSO is modified in two ways: particles can “herd” into sub-groups that search together, and each particle's speed varies with its recent success in identifying non-dominated solutions.

RESULTS and CONCLUSIONS:

Forager has rapidly identified non-dominated solutions in descriptor space for two linear QSAR models.

Fully automated operation and updating using the “Discovery Bus”.

These solutions can be used as fitness criteria in the evolution of novel chemical structures by “Colonist” (see companion poster).

The unusual physical space discovered by Forager in this first proof-of-concept study may prove to be unreachable by realistic chemical structures.

Robert J. Leahy¹; David E. Leahy¹; Damjan Krstajic²; Vladimir Sykora¹

¹ Molecular Informatics Group, Newcastle University; ² Research Centre for Cheminformatics, Belgrade, Serbia

OBJECTIVE

Forager takes a Research Target Profile (RTP), which defines the success criteria for an optimisation project and may include statements such as Maximise (e.g. potency, solubility), Minimise (e.g. hERG, other potencies) and Target Value (e.g. log D = 0, PPB = 90%). It requires that QSAR models are already available for each property in the RTP.
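The three kinds of RTP statement can be put on a common footing by turning each into a quantity to minimise: negate a Maximise goal and take the absolute deviation from a Target Value. The sketch below assumes that encoding; the `Goal` class, the goal-kind strings and the toy predictor functions are hypothetical and are not Forager's RTP file format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Descriptors = Dict[str, float]


@dataclass
class Goal:
    """One RTP statement (hypothetical encoding): kind is 'max', 'min' or 'target'."""
    prop: str
    kind: str
    target: Optional[float] = None


def objectives(goals: List[Goal],
               models: Dict[str, Callable[[Descriptors], float]],
               x: Descriptors) -> List[float]:
    """Return one value per goal; every value is to be minimised."""
    out = []
    for g in goals:
        y = models[g.prop](x)  # QSAR estimate of the property at position x
        if g.kind == "max":
            out.append(-y)  # maximising y is the same as minimising -y
        elif g.kind == "min":
            out.append(y)
        elif g.kind == "target":
            out.append(abs(y - g.target))  # distance from the target value
        else:
            raise ValueError(f"unknown goal kind: {g.kind}")
    return out


# Toy linear "QSAR models" over two made-up descriptors, a and b.
models = {"logD": lambda x: 0.5 * x["a"] - 1.0,
          "potency": lambda x: 2.0 * x["b"]}
goals = [Goal("potency", "max"), Goal("logD", "target", 0.0)]
print(objectives(goals, models, {"a": 1.0, "b": 3.0}))  # [-6.0, 0.5]
```

With every goal expressed as a minimisation, a single dominance test can compare any two positions.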

The objective of Forager is to search through the descriptor space (defined as the union of all descriptors used by the QSAR models) in order to find non-dominated solutions for the RTP.
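A minimal sketch of the non-dominance test, assuming every objective has already been expressed as a quantity to minimise. A point is kept if no other point is at least as good on every objective and strictly better on at least one. The simple quadratic filter and the function names are illustrative only, not Forager's implementation.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b (all objectives minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (simple O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


pts = [(1, 5), (2, 2), (3, 1), (4, 4), (2, 3)]
print(non_dominated(pts))  # [(1, 5), (2, 2), (3, 1)]
```

A point never dominates itself, so the filter needs no special case for comparing a point with itself.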

These solutions are then used by Colonist to find the nearest chemical structures in our database and evolve novel structures that match the solution set.
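How Colonist measures "nearest" is not described here. One plausible reading, sketched below purely as an assumption, is Euclidean distance after z-scoring each descriptor against the database. The molecule names and descriptor values are made up.

```python
import math
from typing import Dict, List, Tuple


def nearest(solution: List[float],
            database: Dict[str, List[float]],
            k: int = 3) -> List[Tuple[str, float]]:
    """Return the k database entries closest to a descriptor-space solution.

    Each descriptor is z-scored with the database mean and standard
    deviation so that descriptors on large scales do not dominate.
    """
    rows = list(database.values())
    n_dim = len(solution)
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_dim)]
    # Population standard deviation; fall back to 1.0 for constant descriptors.
    sd = [math.sqrt(sum((r[j] - mean[j]) ** 2 for r in rows) / len(rows)) or 1.0
          for j in range(n_dim)]

    def z(v: List[float]) -> List[float]:
        return [(v[j] - mean[j]) / sd[j] for j in range(n_dim)]

    target = z(solution)
    scored = [(name, math.dist(target, z(vec))) for name, vec in database.items()]
    return sorted(scored, key=lambda t: t[1])[:k]


db = {"mol_A": [1.0, 10.0], "mol_B": [2.0, 20.0], "mol_C": [3.0, 30.0]}
print([name for name, _ in nearest([2.9, 29.0], db, k=2)])  # ['mol_C', 'mol_B']
```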

METHODS

Forager Methodology

Forager is a modified particle swarm algorithm that searches chemical descriptor space to identify non-dominated solutions for the desired attributes.

Particle Swarm Optimisation

PSO is a form of swarm intelligence in which a population (the swarm) of candidate solutions (particles) moves through the search space. When one particle finds a promising region, the rest of the swarm can follow quickly, even particles on the opposite side of the swarm. Particles are influenced by the rest of the swarm but also explore independently.
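For reference, the common inertia-weight form of the PSO update is sketched below on a single-objective toy problem. Each velocity component blends the particle's previous velocity with random pulls towards its own best position and towards a guide position (here the swarm's best). The values of `w`, `c1` and `c2` are typical textbook choices, not Forager's settings, and in Forager the guide would come from the non-dominated solutions rather than a single minimum.

```python
import random
from typing import List, Optional


def pso_step(x: List[float], v: List[float], pbest: List[float],
             guide: List[float], w: float = 0.7, c1: float = 1.5,
             c2: float = 1.5, rng: Optional[random.Random] = None) -> None:
    """Update one particle's velocity and position in place."""
    rng = rng or random.Random()
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        v[d] = (w * v[d]                        # inertia
                + c1 * r1 * (pbest[d] - x[d])   # pull towards own best
                + c2 * r2 * (guide[d] - x[d]))  # pull towards the guide
        x[d] += v[d]


def sphere(p: List[float]) -> float:
    """Single-objective toy function with its minimum at the origin."""
    return sum(c * c for c in p)


rng = random.Random(42)
xs = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(20)]
vs = [[0.0, 0.0] for _ in range(20)]
pbests = [list(p) for p in xs]
gbest = list(min(pbests, key=sphere))
for _ in range(200):
    for x, v, pb in zip(xs, vs, pbests):
        pso_step(x, v, pb, gbest, rng=rng)
        if sphere(x) < sphere(pb):
            pb[:] = x  # new personal best
        if sphere(x) < sphere(gbest):
            gbest[:] = x  # new swarm best, immediately visible to all particles
print(sphere(gbest))
```

Because `gbest` is shared and updated in place, an improvement found by one particle steers every other particle on its next step.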

Reverse QSAR Space

Particles have a position and a velocity in the multi-dimensional descriptor space created from the union of descriptors used by the QSAR models. Each particle's movement is influenced by its memory of its own best position and by its knowledge of the swarm's best.
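The descriptor space can be illustrated with two linear models. The sketch below takes the union of the descriptors the models use as the search dimensions and evaluates each model at one particle position. The descriptor names are common ones, but the coefficients are invented and the `_intercept` key is a hypothetical convention.

```python
from typing import Dict, List

# Made-up linear QSAR models: an intercept plus one coefficient per descriptor.
MODELS: Dict[str, Dict[str, float]] = {
    "solubility": {"_intercept": 1.0, "logP": -0.8, "TPSA": 0.02},
    "hiv_inhibition": {"_intercept": 2.0, "logP": 0.5, "nRotB": -0.1},
}


def descriptor_space(models: Dict[str, Dict[str, float]]) -> List[str]:
    """Union of the descriptors used by all models, in sorted order."""
    return sorted({d for m in models.values() for d in m if d != "_intercept"})


def predict(model: Dict[str, float], position: Dict[str, float]) -> float:
    """Evaluate a linear model at a particle position.

    Descriptors the model does not use are simply ignored.
    """
    return model["_intercept"] + sum(
        c * position[d] for d, c in model.items() if d != "_intercept")


dims = descriptor_space(MODELS)  # ['TPSA', 'logP', 'nRotB']
pos = {"TPSA": 50.0, "logP": 2.0, "nRotB": 4.0}  # one particle position
print({name: predict(m, pos) for name, m in MODELS.items()})
```

A particle therefore moves in three dimensions here, even though each model only looks at two of them.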

Particles communicate good positions to each other and adjust their own position and velocity based on these good positions, which are defined in two ways:

global best: updated when a new non-dominated position is found by any particle in the swarm

neighbourhood best: each particle communicates only with a subset of the swarm about non-dominated solutions
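The poster does not say how a particle picks a guide when several non-dominated solutions are available. As an assumption, the sketch below uses a ring topology for the neighbourhood and picks at random among the non-dominated personal bests of a particle's ring neighbours; `ring_neighbours` and `neighbourhood_guide` are hypothetical names.

```python
import random
from typing import List, Optional, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b (all objectives minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def ring_neighbours(i: int, n: int, radius: int = 1) -> List[int]:
    """Particle i plus `radius` neighbours on each side of a ring of n."""
    return [(i + k) % n for k in range(-radius, radius + 1)]


def neighbourhood_guide(i: int, pbest_scores: List[Sequence[float]],
                        radius: int = 1,
                        rng: Optional[random.Random] = None) -> int:
    """Index of a guide for particle i: a random non-dominated personal best
    among its ring neighbours. The candidate list is never empty, because a
    finite set always has at least one non-dominated member."""
    rng = rng or random.Random()
    hood = ring_neighbours(i, len(pbest_scores), radius)
    front = [j for j in hood
             if not any(dominates(pbest_scores[k], pbest_scores[j]) for k in hood)]
    return rng.choice(front)


scores = [(1, 5), (4, 4), (2, 2), (5, 1), (3, 3)]  # objective values of each pbest
print(ring_neighbours(0, 5))  # [4, 0, 1]
print(neighbourhood_guide(1, scores, rng=random.Random(0)))  # 0 or 2
```

Restricting the comparison to a ring slows the spread of information, which tends to keep sub-groups exploring different parts of the front.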

Herding

Because there is no single best global result when several objectives are optimised at once, other techniques are used to drive collective movement of the swarm, including:

separation: steer to avoid crowding

alignment: steer towards the average heading of local particles

cohesion: steer to move toward the average position of local particles

These rules produce three vectors, which are weighted and added to the vector pointing towards a local best to give the final movement vector. The relative weightings of these vectors are set at the start of the simulation.
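The herding rules follow the familiar flocking (boids) pattern. The sketch below computes separation, alignment and cohesion from neighbours within a fixed radius and adds them, with fixed weights, to the vector towards a local best. The radius, the weights and the exact form of each rule are assumptions for illustration.

```python
import math
from typing import List, Sequence

Vec = List[float]


def _mean(vectors: Sequence[Sequence[float]]) -> Vec:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def herding_move(i: int, positions: List[Vec], velocities: List[Vec],
                 local_best: Vec, w_sep: float = 1.0, w_ali: float = 1.0,
                 w_coh: float = 1.0, w_best: float = 1.0,
                 radius: float = 2.0) -> Vec:
    """Weighted sum of separation, alignment, cohesion and local-best vectors."""
    x, n_dim = positions[i], len(positions[i])
    near = [j for j in range(len(positions))
            if j != i and math.dist(positions[j], x) < radius]
    sep, ali, coh = [0.0] * n_dim, [0.0] * n_dim, [0.0] * n_dim
    if near:
        for j in near:  # separation: push away from each close neighbour
            sep = [s + (a - b) for s, a, b in zip(sep, x, positions[j])]
        heading = _mean([velocities[j] for j in near])
        ali = [h - v for h, v in zip(heading, velocities[i])]  # match local heading
        centre = _mean([positions[j] for j in near])
        coh = [c - a for c, a in zip(centre, x)]  # move towards local centre
    best = [b - a for b, a in zip(local_best, x)]  # move towards the local best
    return [w_sep * s + w_ali * al + w_coh * c + w_best * b
            for s, al, c, b in zip(sep, ali, coh, best)]


positions = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
velocities = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(herding_move(0, positions, velocities, local_best=[0.0, 2.0]))  # [0.0, 3.0]
```

A particle with no neighbours inside the radius is driven by the local-best vector alone.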

Varying Speed

As particles move around the search space, they can vary their speed within lower and upper bounds. All particles are created with random speeds between these bounds. If a particle does not find a new non-dominated solution, its speed increases so that it covers more area; if it does, the particle slows down to investigate the area in more detail. The size of the change depends on whether the particle found a personal best or a global best. Varying the speed lets a particle move quickly across large areas of the search space, while exploring more thoroughly once a promising area has been found.
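A minimal sketch of the speed rule, assuming multiplicative changes clipped to the lower and upper bounds. The poster states only that the change depends on whether a personal or a global best was found; the factors below, including slowing more after a global best, are hypothetical.

```python
import math
import random
from typing import List


def update_speed(speed: float, found_global: bool, found_personal: bool,
                 lo: float = 0.1, hi: float = 2.0, up: float = 1.2,
                 down_personal: float = 0.8, down_global: float = 0.5) -> float:
    """Return the particle's new speed, clipped to [lo, hi]."""
    if found_global:
        speed *= down_global  # new swarm-level non-dominated point: slow the most
    elif found_personal:
        speed *= down_personal  # new personal best: slow a little
    else:
        speed *= up  # nothing new found: speed up to cover more ground
    return max(lo, min(hi, speed))


def with_speed(velocity: List[float], speed: float) -> List[float]:
    """Rescale a velocity vector to the given speed, keeping its direction."""
    norm = math.hypot(*velocity)
    return velocity if norm == 0 else [c * speed / norm for c in velocity]


rng = random.Random(0)
speed = rng.uniform(0.1, 2.0)  # particles start with a random speed within bounds
speed = update_speed(speed, found_global=False, found_personal=False)
print(with_speed([3.0, 4.0], 1.0))  # [0.6, 0.8]
```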

Optimisation of Modified PSO

Because Forager has a large number of arbitrary parameters, we have also written a program to optimise them using a conventional genetic algorithm. Work is continuing in this area.
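The parameter-tuning step can be sketched with a small genetic algorithm. The operators here (tournament selection, uniform crossover, Gaussian mutation and elitism) are one conventional choice, assumed rather than taken from the poster, and the toy fitness function stands in for a real measure of Forager's search quality.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Params = List[float]


def genetic_tune(fitness: Callable[[Params], float],
                 bounds: Sequence[Tuple[float, float]],
                 pop_size: int = 20, generations: int = 40,
                 mutation_sd: float = 0.1,
                 rng: Optional[random.Random] = None) -> Params:
    """Maximise `fitness` over box-bounded parameters with a simple GA."""
    rng = rng or random.Random()

    def clip(p: Params) -> Params:
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(p, bounds)]

    def tournament(pop: List[Params]) -> Params:
        return max(rng.sample(pop, 3), key=fitness)  # best of 3 random members

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)]  # elitism: the best always survives
        while len(nxt) < pop_size:
            a, b = tournament(pop), tournament(pop)
            # Uniform crossover: each gene comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Gaussian mutation scaled to each parameter's range.
            child = [v + rng.gauss(0.0, mutation_sd * (hi - lo))
                     for v, (lo, hi) in zip(child, bounds)]
            nxt.append(clip(child))
        pop = nxt
    return max(pop, key=fitness)


def toy_fitness(p: Params) -> float:
    """Stand-in fitness with its peak at (0.7, 1.5)."""
    w, c = p  # hypothetical PSO parameters, e.g. inertia and attraction weight
    return -((w - 0.7) ** 2 + (c - 1.5) ** 2)


best = genetic_tune(toy_fitness, bounds=[(0.0, 1.0), (0.0, 3.0)],
                    rng=random.Random(1))
print([round(v, 2) for v in best])
```

In practice each fitness evaluation would be a full Forager run, so caching scores rather than re-evaluating them inside `max` and `tournament` would matter.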

CONCLUSIONS

An automated method of searching descriptor space for Pareto solutions to the optimisation of properties predicted by multiple QSAR models has been demonstrated.

These solutions can be used as fitness criteria in the evolution of novel chemical structures by “Colonist” (see companion poster).

The unusual physical space discovered by Forager may prove to be unreachable by realistic chemical structures.

Implementation as a competitive workflow on the Discovery Bus gives fully automated updating and operation.

For additional information please contact:

Professor David E. Leahy

Molecular Informatics Group

Newcastle University

[email protected]

Forager as a Competitive Workflow

Forager is implemented as a workflow on our automation system, the “Discovery Bus”. This has the advantage that new RTP files, and new QSAR models used by those files, automatically trigger a re-run of the Forager search process and yield new sets of solutions. The top-level workflow, shown below, takes as input an RTP file, a set of program variables and a search space defined by descriptor variation within a database of drug-like molecules.
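A search space "defined by descriptor variation within a database" can be read as per-descriptor ranges. The sketch below assumes simple minimum/maximum bounds (percentiles would be a less outlier-sensitive alternative) and clamps a particle position back inside them. The molecule records are made up.

```python
from typing import Dict, List, Tuple

Bounds = Dict[str, Tuple[float, float]]


def descriptor_bounds(db: List[Dict[str, float]], dims: List[str]) -> Bounds:
    """Per-descriptor (min, max) over a database of drug-like molecules."""
    return {d: (min(m[d] for m in db), max(m[d] for m in db)) for d in dims}


def clamp(position: Dict[str, float], bounds: Bounds) -> Dict[str, float]:
    """Pull a particle position back inside the search-space bounds."""
    return {d: min(max(v, bounds[d][0]), bounds[d][1]) for d, v in position.items()}


db = [{"logP": 1.0, "TPSA": 40.0},
      {"logP": 4.5, "TPSA": 120.0},
      {"logP": 2.5, "TPSA": 80.0}]
bounds = descriptor_bounds(db, ["logP", "TPSA"])
print(bounds)  # {'logP': (1.0, 4.5), 'TPSA': (40.0, 120.0)}
print(clamp({"logP": 6.0, "TPSA": 30.0}, bounds))  # {'logP': 4.5, 'TPSA': 40.0}
```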

Data: Structure-Property database

QSAR: Auto-QSAR

Forager: Multi-objective reverse QSAR solutions

Colonist: Evolution of novel structures

BACKGROUND

Forager is a component of an automated process for deriving QSAR models from data and using these models as the basis for reverse-engineering novel chemical structures that meet multiple objectives. The system uses our “Discovery Bus” software to integrate and automate these processes.

[Figures: The Discovery Bus; Forager Top Level Workflow; Forager Lower Level Workflow]

[Figure: Pareto Optima Discovery. Solubility (vertical axis) against HIV Inhibition (horizontal axis) for the Pareto optima found after 5, 20 and 50 steps.]

RESULTS

Pareto Optima for HIV Ki and Solubility Maximisation

Forager rapidly identifies non-dominated solutions in descriptor space using two linear QSAR models, for HIV protease inhibition and for solubility.

Even though the search is constrained to stay within descriptor ranges for drug-like compounds, the optimised property values go beyond normal property ranges.

Fully automated operation and updating using the “Discovery Bus”

Easily extended to optimise more than 2 properties

Easily extended to more complex QSAR models
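A linear model helps explain why property values can exceed normal ranges even inside drug-like descriptor bounds. Over a box of descriptor ranges, a linear QSAR model is maximised at a corner, with each descriptor at whichever bound its coefficient favours, and that combination of extremes need not correspond to any real compound. The coefficients and bounds below are invented for illustration.

```python
from typing import Dict, Tuple


def linear_max_over_box(intercept: float, coefs: Dict[str, float],
                        bounds: Dict[str, Tuple[float, float]]):
    """Maximise y = intercept + sum(c_d * x_d) with lo_d <= x_d <= hi_d.

    Each term is maximised independently: the upper bound for a positive
    coefficient, the lower bound otherwise (either bound works when c_d == 0).
    """
    corner = {d: (bounds[d][1] if c > 0 else bounds[d][0]) for d, c in coefs.items()}
    y = intercept + sum(c * corner[d] for d, c in coefs.items())
    return y, corner


# Made-up solubility model and drug-like descriptor ranges.
y_max, corner = linear_max_over_box(
    1.0, {"logP": -0.8, "TPSA": 0.02},
    {"logP": (1.0, 4.5), "TPSA": (40.0, 120.0)})
print(corner)           # {'logP': 1.0, 'TPSA': 120.0}
print(round(y_max, 6))  # 2.6
```

With more descriptors and more objectives the same effect compounds: every coordinate can sit inside its drug-like range while the joint position lies far from any known molecule.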