International Journal of Management, MIT College of Management, Vol. 1, No. 1, July 2013, ISSN 2321-6700 © MIT Publication.
A CHEMOINFORMATICS TOOL FOR LIPINSKI RULE
Saurabh Shelar
Veena Rajan
Shine Devaraj
ABSTRACT
A common basic platform is required where a medicinal
chemist can upload a chemical entity and judge whether
the molecule is likely to be druggable. Even though
basic online Lipinski parameter calculators are available,
there is a need for a standalone tool that calculates the
properties of a number of lead compounds in a systematic way.
We therefore began work on a standalone application that
reads a number of chemical compound file formats and
checks whether the compounds satisfy Lipinski's rule, which
is one of the important aspects of virtual screening.
Keywords: Drug, Lead, Lipinski's Rule, Virtual screening, IT
application, Python
BACKGROUND
The development of a new drug is a very important part of
today's economy. It requires testing the efficacy and safety of
the new drug through clinical trials. Drug discovery is the step
that precedes drug development. Developing a drug is very
time consuming, taking twelve to fifteen years on average.
The cost of developing a new drug is another factor that
limits the number of new drugs that reach the market.
Only a commercially successful drug recovers its cost of
development. The reality of these economics is that new drugs
that may benefit only a few are unlikely to make it to clinical
trials. Drugs that may benefit millions of people in developing
countries too poor to pay for the new drug will also have a low
priority for development. Modern drug discovery involves the
identification of screening hits, medicinal chemistry, and
optimization of those hits to increase affinity, selectivity,
efficacy/potency, metabolic stability and oral bioavailability.
Once a compound that fulfills all of these requirements has
been identified, it begins the process of drug development
prior to clinical trials.
Selecting a potential lead candidate from a large pool of
chemical compounds is a very tough task in the discovery of a
new drug. Selecting a new molecule and confirming its activity
is itself challenging. However, various methods are available
for screening large numbers of compounds in a high-throughput
manner to search for an active lead.
Even though basic online Lipinski parameter calculators are
available, there is a need for a standalone tool that calculates
the properties of a number of lead compounds in a systematic
way. Some applications deal with Lipinski's rule, but they do
not work with all chemical file formats. So before we select a
drug for oral use, which includes not only tablets, capsules,
emulsions and suspensions but also novel systems such as
liposomes and nanoparticles, we need to ensure that our drug
or lead compound satisfies Lipinski's rule before initiating
further virtual screening procedures. The tool gives a
preliminary selection of chemical compounds based on
molecular properties that may affect the drug's
pharmacokinetics in the human body, which will be revealed
and concluded at the time of preclinical and clinical trials.
Scripting languages such as Python are well suited to
common programming tasks in cheminformatics such as data
analysis and parsing information from files. Of the currently
popular scripting languages, Python is one of the standard
languages for scripting in cheminformatics. Several commercial
cheminformatics toolkits have interfaces in Python. We
describe a Python tool named Molecular Test for Lipinski Rule
that helps with the preliminary selection of chemical
compounds based on molecular properties that may affect the
drug's pharmacokinetics in the human body, which will be
revealed and concluded at the time of preclinical and clinical
trials. This application checks whether or not a chemical
satisfies Lipinski's rule.
Lipinski's rule is one of the most important aspects of drug
design. The medicinal chemist Christopher Lipinski and
his colleagues analysed the physico-chemical properties of
around 2,000 drugs in clinical trials and found that a compound
is more likely to be membrane permeable and easily absorbed
by the body if it matches the following rules.
• Not more than 5 hydrogen bond donors (nitrogen or
oxygen atoms with one or more hydrogen atoms).
• Not more than 10 hydrogen bond acceptors (nitrogen or
oxygen atoms).
• A molecular mass less than 500 daltons.
• An octanol-water partition coefficient log P not greater
than 5.
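The four criteria above can be written as a single check. The following is a minimal sketch, not the code of the application itself: the function names and the descriptor values in the usage lines are hypothetical, and the descriptors are assumed to have been computed elsewhere.

```python
def lipinski_violations(h_donors, h_acceptors, mass_da, log_p):
    """Return a list describing which of the four rules are broken.

    All names here are illustrative; the thresholds are the ones
    listed in the text above.
    """
    violations = []
    if h_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if h_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if mass_da >= 500:
        violations.append("molecular mass not less than 500 daltons")
    if log_p > 5:
        violations.append("log P greater than 5")
    return violations


def satisfies_lipinski(h_donors, h_acceptors, mass_da, log_p):
    """True when none of the four rules is broken."""
    return not lipinski_violations(h_donors, h_acceptors, mass_da, log_p)


# Hypothetical descriptor values, for illustration only.
print(satisfies_lipinski(2, 5, 320.4, 2.1))
print(lipinski_violations(6, 11, 512.0, 5.5))
```

Returning the list of violations rather than a bare yes/no makes it possible to report which rule a compound fails.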
Some applications deal with Lipinski's rule, but they do not
work with all chemical file formats. We therefore began work
on a standalone application that reads a number of chemical
compound file formats and checks whether the compounds
satisfy Lipinski's rule, which is one of the important aspects
of virtual screening. The formats adopted for this
application are MOL, MOL2, SDF, XYZ, ALC and CDXML.
A Molfile is a file format created by MDL for holding information
about the atoms, bonds, connectivity and coordinates of a
molecule. The molfile consists of some header information, the
connection table containing atom information, then bond
connections and types, followed by sections for more complex
information. It has the file extension '.mol'. The MOL2 file
format has the advantage of storing all the necessary
information for atom features, position, and connectivity. It is
also a standardized format that other modeling programs can
read. The Protein Data Bank (PDB) file format is a textual file
format describing the three-dimensional structures of molecules
held in the Protein Data Bank. The PDB format accordingly
provides for description and annotation of protein and nucleic
acid structures, including atomic coordinates, secondary
structure assignments, and atomic connectivity. A typical XYZ
file specifies the molecular geometry by giving the number of
atoms on the first line, a comment on the second line, and one
atom with its Cartesian coordinates on each of the following
lines. CDXML is a CDX file formatted so that it conforms to the
XML specification. SDF is one of a family of chemical-data file
formats developed by MDL; it is intended especially for
structural information. "SDF" stands for structure-data file,
and SDF files wrap the molfile (MDL Molfile) format.
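Of the formats described above, XYZ is simple enough to read by hand, which makes the layout concrete. The sketch below is illustrative only: the application itself relies on Pybel for parsing, the function name is hypothetical, and the water coordinates are example values.

```python
def parse_xyz(text):
    """Parse XYZ text into (comment, [(symbol, x, y, z), ...]).

    Raises ValueError when the text does not follow the layout
    described above (atom count, comment line, coordinate lines).
    """
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("XYZ text needs a count line and a comment line")
    n_atoms = int(lines[0].strip())      # first line: number of atoms
    comment = lines[1].strip()           # second line: free-text comment
    atom_lines = lines[2:2 + n_atoms]    # then one line per atom
    if len(atom_lines) != n_atoms:
        raise ValueError("fewer coordinate lines than the atom count")
    atoms = []
    for line in atom_lines:
        fields = line.split()
        if len(fields) < 4:
            raise ValueError("expected: symbol x y z")
        x, y, z = (float(value) for value in fields[1:4])
        atoms.append((fields[0], x, y, z))
    return comment, atoms


# Example input; the coordinates are illustrative values for water.
WATER_XYZ = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""

comment, atoms = parse_xyz(WATER_XYZ)
print(comment, len(atoms))
```

A malformed file raises `ValueError`, which a caller could turn into a "file is not in proper format" message.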
IMPLEMENTATION
This application is implemented in the scripting language
Python 2.7 [4], with the GUI built using PyQt [5] and the
Pybel [6] module, which provides access to the Open Babel
toolkit. The Python program is converted into a standalone
application using PyInstaller [7].
The application is motivated by the aim of supporting as many
chemical file formats as we can incorporate in future, so that
the molecular test for Lipinski's rule can be done under one
roof to find whether a particular molecule satisfies Lipinski's
rule, which is one of the important steps in drug discovery.
APPLICATION DETAILS
The application is developed on the basis of Lipinski's rule,
and care has been taken to make it user friendly, so that even
a person with little computer knowledge can use it with ease.
The user selects the file format for which the Lipinski's rule
status is required. Currently the application supports 7
chemical file formats for testing. In future, we will add more
file formats. The application is in the beta phase, so it may
contain bugs.
Figure 1: Application front end, which shows Lipinski's rules and, below them, the parameters to be selected.
Figure 2: Seven file formats are currently supported by the application, and more will be added in the future. The user can select either a file or a folder. The folder option is useful when the user needs to test a number of files in one go: all files of the specified format are searched for under the selected folder.
Figure 3: The user can select either a file or a folder. The folder option is useful when the user needs to test multiple files. The file browser displays only files of the format specified by the user.
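The folder option described in the captions above amounts to collecting every file of the chosen format under a folder. The following is a sketch under stated assumptions: the function name is hypothetical, matching is done on the file extension, and the search is assumed to include subfolders, which the text does not confirm.

```python
import os


def find_files(folder, file_format):
    """Return sorted paths of files under `folder` with the given format.

    `file_format` is a format name such as "mol" or "sdf"; matching on
    the extension and walking subfolders are assumptions of this sketch.
    """
    extension = "." + file_format.lower()
    matches = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # Compare case-insensitively so "LIGAND.MOL" is also found.
            if name.lower().endswith(extension):
                matches.append(os.path.join(root, name))
    return sorted(matches)


# Usage with a hypothetical folder name:
# for path in find_files("compounds", "sdf"):
#     print(path)
```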
Figure 4: The file browser displays only files of the format selected by the user. The folder browser displays only folders for selection.
Figure 5: The user can process multiple files residing in a particular folder by selecting the folder option. A file may not be in the proper format; in such cases, the application pops up a message saying so.
Figure 6: If a file is not in the proper format, the application shows a message that the file is not in the proper format. A log of parsed files and molecules is created in “My Documents”, with an entry for every file parsed. The log contains the number of molecules parsed and their status with respect to Lipinski's rule. In future, the logging will be extended so that the status of a molecule can be searched from within the application. Molecules that satisfy Lipinski's rule will also be listed separately.
Figure 7: Log file, which contains the molecule details as well as the status for Lipinski's rule.
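The log described above records, for each parsed file, the number of molecules and their Lipinski status. The following sketch shows one way such an entry could be written. The entry layout, function names and file names are hypothetical and are not taken from the application.

```python
def format_log_entry(file_name, results):
    """Build the log text for one parsed file.

    `results` is a list of (molecule_name, passed) pairs; the layout
    of the entry is an assumption made for illustration.
    """
    lines = [
        "File: %s" % file_name,
        "Molecules parsed: %d" % len(results),
    ]
    for molecule_name, passed in results:
        status = "satisfies" if passed else "does not satisfy"
        lines.append("%s: %s Lipinski's rule" % (molecule_name, status))
    return "\n".join(lines) + "\n"


def append_log_entry(log_path, file_name, results):
    """Append one entry to the log file, creating the file if needed."""
    with open(log_path, "a") as handle:
        handle.write(format_log_entry(file_name, results))


# Hypothetical example data.
print(format_log_entry("ligands.sdf", [("mol_1", True), ("mol_2", False)]))
```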
Lipinski's rule is checked by a function written in Python. Properties of every molecule, such as molecular weight, logP, hydrogen bond donors and acceptors, and molar refractivity, are extracted using Pybel and checked against Lipinski's rule.
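A sketch of how such properties might be read with Pybel is given below. It is not the application's code. It assumes an Open Babel installation whose descriptor plugins include "HBD", "HBA1", "logP" and "MR"; the choice of "HBA1" over "HBA2" for acceptors and the function names are assumptions of this sketch.

```python
def read_descriptors(path, file_format):
    """Yield (title, descriptors) for each molecule in a file via Pybel.

    The import is done lazily so that the rule check below can be used
    without Open Babel installed.
    """
    try:
        from openbabel import pybel   # Open Babel 3.x layout
    except ImportError:
        import pybel                  # Open Babel 2.x layout
    for mol in pybel.readfile(file_format, path):
        # Descriptor names are assumed to be available in this install.
        desc = mol.calcdesc(["HBD", "HBA1", "logP", "MR"])
        desc["MW"] = mol.molwt        # molecular weight in daltons
        yield mol.title, desc


def passes_lipinski(desc):
    """Apply the four rules to a descriptor dictionary."""
    return (desc["HBD"] <= 5 and desc["HBA1"] <= 10
            and desc["MW"] < 500 and desc["logP"] <= 5)


# Usage with a hypothetical file name (requires Open Babel):
# for title, desc in read_descriptors("ligands.sdf", "sdf"):
#     print(title, passes_lipinski(desc))
```

Molar refractivity ("MR") is read here because the text lists it among the extracted properties; it is not one of the four rules and does not affect the check.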
FUTURE EXTENSIONS
A number of extensions are expected in this application in future, such as:
• Addition of more file formats.
• Creation of a search algorithm for molecules.
• Creation of a better logging mechanism.
• Addition of graphs.
REFERENCES
1. CDXML: http://www.cambridgesoft.com/services/documentation/sdk/chemdraw/cdx/IntroCDXML.htm
2. John J. Irwin and Brian K. Shoichet: ZINC − A Free Database of Commercially Available Compounds for Virtual Screening. Journal of Chemical Information and Modeling [http://pubs.acs.org/doi/abs/10.1021/ci049714+]
3. Noel M. O'Boyle, Chris Morley and Geoffrey R. Hutchison: Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit [http://journal.chemistrycentral.com/content/2/1/5]
4. Python 2.7: http://www.python.org/download/releases/2.7/
5. PyQt: http://www.riverbankcomputing.com/software/pyqt/intro
6. Pybel: http://openbabel.org/docs/2.3.1/UseTheLibrary/Python_Pybel.html
7. PyInstaller: http://www.pyinstaller.org/
8. Text referred from [http://www.learner.org]
9. The Lipinski rule [http://www.nature.com]
10. Zsolt Zsoldos, Irina Szabo, Zsolt Szabo and A. Peter Johnson: Software tools for structure based rational drug design [http://www.sciencedirect.com/science/article/pii/S0166128003007267]