forager poster
DESCRIPTION
Forager: particle swarm optimisation used to identify solutions to multiple objectives in chemical descriptor space
TRANSCRIPT
Forager: A Multi-Objective Reverse QSAR Search Agent
ABSTRACT
TITLE:
Forager: Multi-Objective Reverse QSAR Optimisation
BACKGROUND:
Given structure-property data sets, our new software system, the Discovery
Bus, automatically generates multiple QSAR models for each property and
updates these as new data or methods become available. This creates a shifting
landscape of QSAR models for multiple properties which can be used to guide the
selection of novel chemical structures that satisfy the Research Target Profile
(RTP) definition of a new drug. Forager has been developed to search for non-
dominated solutions to an RTP within a complex descriptor space, where the
search heuristics are provided by multiple QSAR models.
OBJECTIVE:
Rapid and complete identification of non-dominated solutions in Chemical
Descriptor Space for multiple properties estimated by QSAR models.
METHODS:
Forager uses a modified Particle Swarm Optimisation (PSO) algorithm to search
descriptor space for non-dominated solutions. The descriptor space is the union
of the descriptors in the QSAR models used to estimate the properties of interest.
The PSO is modified by allowing “herding” of particles into sub-groups that
search together. A second modification is the variation of particle speed depending
on recent success in identifying non-dominated solutions.
RESULTS and CONCLUSIONS:
Forager has rapidly identified non-dominated solutions in descriptor space
for 2 QSAR Linear Models.
Fully automated operation and updating using the “Discovery Bus”.
These solutions can be used as fitness criteria in the evolution of novel
chemical structures by “Colonist” (see companion poster).
The unusual physical space discovered by Forager in this first proof of
concept study may prove to be unreachable by realistic chemical structures.
Robert J. Leahy1; David E. Leahy1; Damjan Krstajic2; Vladimir Sykora1
1 Molecular Informatics Group, Newcastle University 2 Research Centre for Cheminformatics, Belgrade, Serbia
OBJECTIVE
Forager takes a Research Target Profile (RTP), which defines the success criteria
for an optimisation project and may include statements such as Maximise (e.g.
potency, solubility), Minimise (e.g. hERG, other potencies), and Target Value (e.g.
Log D = 0, PPB = 90%). It requires that QSAR models are already available for
each property in the RTP.
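The three kinds of RTP statement can be put on a common footing by converting each predicted property into a cost where lower is better. The sketch below is only an illustration of that idea: the dictionary layout, the property names and the function `rtp_costs` are assumptions made here, not the RTP file format used by Forager.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical RTP layout: property name -> (statement kind, target value).
# The target value is only used by "target" statements.
Objective = Tuple[str, Optional[float]]

EXAMPLE_RTP: Dict[str, Objective] = {
    "potency": ("maximise", None),
    "herg": ("minimise", None),
    "logd": ("target", 0.0),
}


def rtp_costs(predicted: Dict[str, float], rtp: Dict[str, Objective]) -> List[float]:
    """Turn predicted property values into costs, where lower is always better."""
    costs = []
    for name, (kind, target) in rtp.items():
        value = predicted[name]
        if kind == "maximise":
            costs.append(-value)                 # larger value -> lower cost
        elif kind == "minimise":
            costs.append(value)                  # smaller value -> lower cost
        elif kind == "target":
            costs.append(abs(value - target))    # distance from the target value
        else:
            raise ValueError(f"unknown RTP statement: {kind}")
    return costs
```

With every objective expressed as a cost, two candidate points can be compared property by property without special cases for each statement type.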
The objective of Forager is to search through the descriptor space (defined as the
union set of all descriptors used by the QSAR models) in order to find non-dominated
solutions for the RTP.
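As a minimal sketch of these two ideas, the code below builds a descriptor space from the union of descriptors in a set of linear models and filters a list of cost vectors down to the non-dominated ones. The model coefficients, descriptor names and function names are invented for illustration, and costs are assumed to be arranged so that lower is better.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical linear QSAR models: property -> (descriptor coefficients, intercept).
LinearModel = Tuple[Dict[str, float], float]

EXAMPLE_MODELS: Dict[str, LinearModel] = {
    "potency": ({"logP": 0.8, "mw": -0.01}, 2.0),
    "solubility": ({"logP": -1.1, "hbd": 0.3}, 1.5),
}


def descriptor_space(models: Dict[str, LinearModel]) -> List[str]:
    """Union of the descriptor names used by any model, in a stable order."""
    names = set()
    for coefficients, _intercept in models.values():
        names.update(coefficients)
    return sorted(names)


def predict(models: Dict[str, LinearModel], point: Dict[str, float]) -> Dict[str, float]:
    """Evaluate every linear model at one point in descriptor space."""
    return {
        prop: intercept + sum(c * point[d] for d, c in coefficients.items())
        for prop, (coefficients, intercept) in models.items()
    }


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every cost and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(costs: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the cost vectors that no other vector dominates."""
    return [c for c in costs if not any(dominates(other, c) for other in costs)]
```

For example, `non_dominated([[1, 2], [2, 1], [2, 2]])` keeps the first two vectors, since `[2, 2]` is no better than `[1, 2]` on either cost and worse on one.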
These solutions are then used by Colonist to find the nearest chemical structures in
our database and evolve novel structures that match the solution set.
METHODS
Forager Methodology
Forager is a modified particle swarm algorithm that searches chemical descriptor
space to identify non-dominated solutions for the desired attributes.
Particle Swarm Optimisation
A PSO is a form of swarm intelligence. When one particle detects a desirable path
the rest of the swarm will be able to follow quickly even if they are on the
opposite side of the swarm. Particles are influenced by the rest of the swarm but
also explore independently.
Reverse QSAR Space
Particles have a position and velocity in the multi-dimensional descriptor space
created from the union of descriptors used by the QSAR models. Movement is influenced
by memory of their own best position and knowledge of the swarm's best.
Particles communicate good positions to each other and adjust their own position
and velocity based on these good positions, which are defined in two ways:
global best: updated when a new non-dominated position is found by any
particle in the swarm
neighbourhood best: each particle communicates with only a sub-set
of the swarm about non-dominated solutions
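For orientation, the textbook PSO update that this description builds on can be sketched as follows. This is the standard inertia-weight form, not necessarily the exact update used in Forager; the coefficient values, the ring-shaped neighbourhood and all function names are assumptions chosen for the example. The `social_best` argument can be either a global best or a neighbourhood best.

```python
import random
from typing import List, Sequence


def update_velocity(
    position: Sequence[float],
    velocity: Sequence[float],
    personal_best: Sequence[float],
    social_best: Sequence[float],
    rng: random.Random,
    inertia: float = 0.7,       # assumed value
    c_personal: float = 1.5,    # assumed value
    c_social: float = 1.5,      # assumed value
) -> List[float]:
    """Standard PSO velocity update, applied per descriptor dimension."""
    new_velocity = []
    for x, v, p, s in zip(position, velocity, personal_best, social_best):
        r1, r2 = rng.random(), rng.random()
        new_velocity.append(
            inertia * v                     # keep some of the current motion
            + c_personal * r1 * (p - x)     # pull towards the particle's own best
            + c_social * r2 * (s - x)       # pull towards the swarm or neighbourhood best
        )
    return new_velocity


def move(position: Sequence[float], velocity: Sequence[float]) -> List[float]:
    """Advance a particle by one step."""
    return [x + v for x, v in zip(position, velocity)]


def ring_neighbours(index: int, swarm_size: int, radius: int = 1) -> List[int]:
    """Indices a particle communicates with in a ring topology (one possible sub-set)."""
    return [
        (index + offset) % swarm_size
        for offset in range(-radius, radius + 1)
        if offset != 0
    ]
```

In a multi-objective setting there is usually no single best position, so `social_best` would typically be drawn from a set of non-dominated positions rather than being one fixed point.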
Herding
Since there is no single best global result, other techniques are used to allow
global movement, including:
separation: steer to avoid crowding
alignment: steer towards the average heading of local particles
cohesion: steer to move toward the average position of local particles
These rules create 3 vectors, which are then weighted and added to the vector for
moving towards a local best, producing the final movement vector. The relative
weightings of these vectors are determined at the start of the simulation.
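The combination of the three herding vectors with the move towards a local best might look like the sketch below. The crowding radius, the default weights and the function names are assumptions for illustration; the poster does not specify how each vector is computed.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise average of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def herding_move(
    position: Vector,
    local_best: Vector,
    neighbour_positions: Sequence[Vector],   # assumed to contain at least one neighbour
    neighbour_velocities: Sequence[Vector],
    crowd_radius: float = 1.0,               # assumed: neighbours closer than this repel
    w_separation: float = 1.0,
    w_alignment: float = 1.0,
    w_cohesion: float = 1.0,
    w_best: float = 1.0,
) -> List[float]:
    """Weighted sum of separation, alignment, cohesion and the pull to a local best."""
    # separation: move away from neighbours that are too close
    separation = [0.0] * len(position)
    for other in neighbour_positions:
        if math.dist(position, other) < crowd_radius:
            separation = [s + (x - o) for s, x, o in zip(separation, position, other)]
    # alignment: average heading of the local particles
    alignment = mean_vector(neighbour_velocities)
    # cohesion: vector towards the average position of the local particles
    centre = mean_vector(neighbour_positions)
    cohesion = [c - x for c, x in zip(centre, position)]
    # pull towards the local best position
    towards_best = [b - x for b, x in zip(local_best, position)]
    return [
        w_separation * s + w_alignment * a + w_cohesion * c + w_best * t
        for s, a, c, t in zip(separation, alignment, cohesion, towards_best)
    ]
```

Because the weights are plain arguments, they can be fixed once at the start of a run, as the text describes.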
Varying Speed
As particles move around the search space they can vary their
speed within lower and upper bounds. All particles are created with random
speeds between these bounds. If a particle doesn't find a new non-dominated
solution, its speed increases to cover more area. Otherwise, the
particle slows to investigate the area in more detail. The change in speed depends
on whether the particle found a personal best value or a global best. By varying
its speed, a particle can move quickly to cover large areas of search space, while
exploring more thoroughly once the right area has been found.
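One way to express this speed rule is sketched below. The multiplicative factors are invented for the example; the poster states only that the speed rises after an unsuccessful step, falls after a successful one, and stays within bounds.

```python
import math
from typing import List, Sequence


def adjust_speed(
    speed: float,
    found_personal_best: bool,
    found_global_best: bool,
    lower: float,
    upper: float,
    speed_up: float = 1.2,        # assumed factor after an unsuccessful step
    slow_personal: float = 0.9,   # assumed factor after a new personal best
    slow_global: float = 0.7,     # assumed factor after a new global best
) -> float:
    """Speed up after failure, slow down after success, clamped to [lower, upper]."""
    if found_global_best:
        factor = slow_global
    elif found_personal_best:
        factor = slow_personal
    else:
        factor = speed_up
    return min(upper, max(lower, speed * factor))


def rescale(velocity: Sequence[float], speed: float) -> List[float]:
    """Keep the direction of a velocity vector but set its length to `speed`."""
    norm = math.sqrt(sum(v * v for v in velocity))
    if norm == 0.0:
        return list(velocity)     # no direction to preserve
    return [v * speed / norm for v in velocity]
```

Separating the speed from the direction means the movement vector can be computed first and then scaled to whatever speed the particle currently has.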
Optimisation of Modified PSO
Because Forager contains a large number of arbitrary variables, we have also
written a program to optimise those variables using a conventional genetic
algorithm. Work is continuing in this area.
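A conventional genetic algorithm for tuning such variables could be sketched as below. This is a generic GA with tournament selection, uniform crossover, Gaussian mutation and elitism, not the authors' program; the `fitness` callable is a hypothetical stand-in for whatever score a Forager run would return for a given parameter vector.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def evolve(
    fitness: Callable[[List[float]], float],   # hypothetical: higher is better
    bounds: Bounds,                             # (low, high) for each tuned variable
    rng: random.Random,
    pop_size: int = 20,
    generations: int = 10,
    mutation_sd: float = 0.1,                   # as a fraction of each variable's range
) -> List[float]:
    """Return the best parameter vector found by a simple elitist GA."""
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        # elitism: the current best individual survives unchanged
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            # tournament selection: the better of two random individuals
            mother = max(rng.sample(population, 2), key=fitness)
            father = max(rng.sample(population, 2), key=fitness)
            # uniform crossover: each gene comes from either parent
            child = [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]
            # Gaussian mutation, clamped to the allowed range
            child = [
                min(hi, max(lo, gene + rng.gauss(0.0, mutation_sd * (hi - lo))))
                for gene, (lo, hi) in zip(child, bounds)
            ]
            children.append(child)
        population = children
    return max(population, key=fitness)
```

In this setting each gene would be one Forager variable, such as a herding weight or a speed bound, and evaluating `fitness` would mean running a search with those settings.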
CONCLUSIONS
An automated search method for Pareto solutions in descriptor space, optimising
multiple QSAR-derived properties, has been demonstrated.
These solutions can be used as fitness criteria in the evolution of novel
chemical structures by “Colonist” (see companion poster).
The unusual physical space discovered by Forager may prove to be
unreachable by realistic chemical structures.
Implementation as a competitive workflow on the Discovery Bus gives fully
automated updating and operation.
For additional information please contact:
Professor David E. Leahy
Molecular Informatics Group
Newcastle University
Forager as a Competitive Workflow
Forager is implemented as a workflow on our automation system, the
“Discovery Bus”. This has the advantage that new RTP files and new
QSAR models used by those files automatically trigger a re-run of the
Forager search process and give new sets of solutions. The top-level
workflow, shown below, takes as input an RTP file, a set of program
variables and a search space, defined by descriptor variation within a
database of drug-like molecules.
Data: Structure-Property database
QSAR: Auto-QSAR
Forager: Multi-objective reverse QSAR solutions
Colonist: Evolution of novel structures
BACKGROUND
Forager is a component of an automated process for deriving QSAR models from data
and using these models as the basis for reverse-engineering novel chemical structures
that meet multiple objectives. The system uses our “Discovery Bus” software to
integrate and automate these processes.
The Discovery Bus
Forager Top Level Workflow
Forager Lower Level Workflow
Figure: Pareto Optima Discovery. Solubility (vertical axis) plotted against HIV Inhibition (horizontal axis), with series for 5 Steps, 20 Steps and 50 Steps.
RESULTS
Pareto Optima for HIV Ki and Solubility Maximisation
Forager rapidly identifies non-dominated solutions in descriptor space using 2
QSAR Linear Models for HIV Protease Inhibition and Solubility.
Even though constrained to stay within descriptor ranges for drug-like
compounds, optimisation goes beyond normal property ranges.
Fully automated operation and updating using the “Discovery Bus”.
Easily extended to optimise more than 2 properties.
Easily extended to more complex QSAR models.