

International Journal of Management, MIT College of Management, Vol. 1, No. 1, July 2013. ISSN 2321-6700. © MIT Publication.


A CHEMOINFORMATICS TOOL FOR LIPINSKI RULE

Saurabh Shelar

Veena Rajan

Shine Devaraj

ABSTRACT

Medicinal chemists need a common, basic platform where they can upload a chemical entity and judge whether the molecule is likely to be druggable. Although basic online calculators for Lipinski's parameters are available, there is a need for a standalone tool that calculates these properties for many lead compounds in a systematic way. We therefore started work on a standalone application that reads a number of chemical file formats and checks whether the compounds satisfy Lipinski's rule, which is an important filter in virtual screening.

Keywords: Drug, Lead, Lipinski's Rule, Virtual screening, IT application, Python

BACKGROUND

The development of a new drug is an important part of today's economy. It requires testing the efficacy and safety of the new drug through clinical trials. Drug discovery is the step that precedes drug development. Developing a drug is very time consuming, taking twelve to fifteen years on average. The cost of development is another factor that limits the number of new drugs reaching the market, because a drug must succeed commercially to recover that cost. In practice, new drugs that may benefit only a few people are unlikely to reach clinical trials, and drugs that may benefit millions of people in developing countries too poor to pay for them also receive low priority. Modern drug discovery involves identifying screening hits and then using medicinal chemistry to optimize those hits for affinity, selectivity, efficacy/potency, metabolic stability and oral bioavailability. Once a compound that fulfills all of these requirements has been identified, it enters drug development prior to clinical trials.

Selecting a potential lead candidate from a large pool of chemical compounds is one of the hardest tasks in drug discovery, and confirming the activity of a new molecule is itself challenging. However, various methods are available for screening large numbers of compounds in a high-throughput manner to find an active lead.

Although basic online calculators for Lipinski's parameters are available, there is a need for a standalone tool that calculates the properties of many lead compounds in a systematic way. Some applications already deal with Lipinski's rule, but they do not work with all chemical file formats. Before a drug is selected for oral use, which covers not only tablets, capsules, emulsions and suspensions but also novel delivery systems such as liposomes and nanoparticles, we need to ensure that the drug or lead compound satisfies Lipinski's rules before initiating further virtual screening. The tool gives a preliminary selection of chemical compounds based on molecular properties that may affect a drug's pharmacokinetics in the human body; the actual pharmacokinetics are revealed and confirmed only during preclinical and clinical trials.

Scripting languages such as Python are well suited to common programming tasks in cheminformatics, such as data analysis and parsing information from files. Of the currently popular scripting languages, Python is one of the standard choices for cheminformatics, and several commercial cheminformatics toolkits have Python interfaces. We describe a Python tool named Molecular Test for Lipinski Rule that checks whether a chemical compound satisfies Lipinski's rules, and so gives a preliminary selection of compounds based on molecular properties that may affect a drug's pharmacokinetics in the human body.

Lipinski's rule is one of the most important concepts in drug design. The medicinal chemist Christopher Lipinski and his colleagues analysed the physico-chemical properties of around 2,000 drugs in clinical trials and found that a compound is more likely to be membrane permeable and easily absorbed by the body if it matches the following rules:

• Not more than 5 hydrogen bond donors (nitrogen or oxygen atoms with one or more hydrogen atoms).

• Not more than 10 hydrogen bond acceptors (nitrogen or oxygen atoms).

• A molecular mass less than 500 daltons.

• An octanol-water partition coefficient log P not greater than 5.
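The four criteria above can be written as a small Python function. This is a minimal sketch and not the code of the application itself: the function and parameter names are illustrative, the descriptor values are assumed to have been computed elsewhere, and the rule is applied strictly (every criterion must hold), which is how the list above reads.

```python
from typing import List


def lipinski_violations(mol_weight: float, log_p: float,
                        h_donors: int, h_acceptors: int) -> List[str]:
    """Return a description of every criterion the given values break."""
    violations = []
    if h_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if h_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if mol_weight >= 500:
        violations.append("molecular mass not below 500 daltons")
    if log_p > 5:
        violations.append("log P greater than 5")
    return violations


def satisfies_lipinski(mol_weight: float, log_p: float,
                       h_donors: int, h_acceptors: int) -> bool:
    """True only when none of the four criteria is broken."""
    return not lipinski_violations(mol_weight, log_p, h_donors, h_acceptors)


# Approximate, illustrative values for a small molecule such as aspirin.
print(satisfies_lipinski(mol_weight=180.16, log_p=1.2, h_donors=1, h_acceptors=4))
```

Returning the list of broken criteria, rather than only a yes/no answer, makes it straightforward to report why a compound was rejected.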

Some applications deal with Lipinski's rule, but they do not work with all chemical file formats. We therefore started work on a standalone application that reads a number of chemical file formats and checks whether the compounds satisfy Lipinski's rule, which is an important step in virtual screening. The formats adopted for this application are MOL, MOL2, SDF, XYZ, ALC and CDXML.

A Molfile is a file format created by MDL for holding information about the atoms, bonds, connectivity and coordinates of a molecule. It consists of header information, a connection table containing atom information followed by bond connections and types, and then sections for more complex information. Its file extension is '.mol'. The MOL2 format has the advantage of storing all the necessary information on atom features, positions and connectivity, and it is a standardized format that other modeling programs can read. The Protein Data Bank (PDB) format is a textual file format describing the three-dimensional structures of molecules held in the Protein Data Bank; it provides for the description and annotation of protein and nucleic acid structures, including atomic coordinates, secondary structure assignments and atomic connectivity. A typical XYZ file specifies the molecular geometry by giving the number of atoms on the first line, a comment on the second line, and one line of Cartesian coordinates per atom on the following lines. CDXML is a CDX file formatted to conform to the XML specification. SDF is one of a family of chemical-data file formats developed by MDL and is intended especially for structural information; "SDF" stands for structure-data file, and SDF files wrap the Molfile (MDL Molfile) format.
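To make the XYZ layout described above concrete, the following sketch parses a single-molecule XYZ file by hand. It is only an illustration: the application itself reads files through Pybel, and the names `XyzMolecule` and `parse_xyz` are hypothetical. The error message mirrors the "not in proper format" behaviour described for the application.

```python
from typing import List, NamedTuple, Tuple


class XyzMolecule(NamedTuple):
    comment: str
    atoms: List[Tuple[str, float, float, float]]  # (symbol, x, y, z)


def parse_xyz(text: str) -> XyzMolecule:
    """Parse one molecule: atom count, comment line, then one line per atom."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("file is not in proper format")
    try:
        count = int(lines[0].strip())
    except ValueError:
        raise ValueError("file is not in proper format: bad atom count")
    atom_lines = lines[2:2 + count]
    if len(atom_lines) != count:
        raise ValueError("file is not in proper format: missing atom lines")
    atoms = []
    for line in atom_lines:
        fields = line.split()
        if len(fields) < 4:
            raise ValueError("file is not in proper format: short atom line")
        try:
            x, y, z = (float(value) for value in fields[1:4])
        except ValueError:
            raise ValueError("file is not in proper format: bad coordinate")
        atoms.append((fields[0], x, y, z))
    return XyzMolecule(comment=lines[1].strip(), atoms=atoms)


WATER = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.471
H 0.000 -0.757 -0.471
"""
molecule = parse_xyz(WATER)
print(molecule.comment, len(molecule.atoms))
```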

IMPLEMENTATION

The application is implemented in the scripting language Python 2.7 [4], with a GUI built using PyQt [5] and the Pybel module [3, 6], which provides access to the Open Babel toolkit. The Python program is packaged as a standalone application using PyInstaller [7].

The need for this application arises from the fact that it will support as many chemical file formats as we can incorporate in the future, so that the Molecular Test for Lipinski's Rule can be carried out under one roof to determine whether a particular molecule satisfies Lipinski's rule, which is one of the important steps in drug discovery.

APPLICATION DETAILS

The application is built around Lipinski's rule, and care has been taken to make it user friendly so that even a person with little computer knowledge can use it with ease.

The user selects the file format for which the Lipinski's rule status is required. The application currently supports 7 chemical file formats, and we will add more in the future. Because the application is in the beta phase, it may still contain bugs.


Figure 1: Front page of the application, showing Lipinski's rules and, below them, the parameters to be selected.

Figure 2: Seven file formats are currently available in the application, and more will be added in the future. The user can select either a single file or a folder. The folder option is useful for testing many files in one go: all files of the specified format under the selected folder are searched.

Figure 3: Option to select a file or a folder. The folder option is useful when the user needs to test multiple files. The file browser displays only files of the format specified by the user.
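The folder option described in these captions amounts to collecting every file of the chosen format under a directory. A minimal sketch follows, assuming a recursive search and case-insensitive extension matching; neither detail is stated in the paper, and the function name is hypothetical.

```python
from pathlib import Path
from typing import List


def find_structure_files(folder: str, extension: str) -> List[Path]:
    """Return all files under `folder` whose extension matches, e.g. 'mol'."""
    suffix = "." + extension.lower().lstrip(".")
    return sorted(
        path for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() == suffix
    )
```

Each path returned could then be handed to the same routine that processes a single selected file, so the file and folder options share one code path.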


Figure 4: The file browser displays only files of the format selected by the user. The folder browser displays only folders for selection.

Figure 5: The user can select multiple files residing in a particular folder by choosing the folder option. A file may not be in the proper format; in such cases the application pops up a message saying so.

Figure 6: If a file is not in the proper format, the application shows a message saying so. A log of parsed files and molecules is created in “My Documents”, with an entry for every file parsed. The log contains the number of molecules parsed and their Lipinski's rule status. In a future phase, it will be possible to search the status of a molecule from within the application, and molecules that satisfy Lipinski's rule will be listed separately.


Figure 7: Log file created by the application, containing the molecule details as well as the Lipinski's rule status.
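The paper does not give the layout of the log file, only that it records the number of molecules parsed and each molecule's Lipinski's rule status. The sketch below shows one way such a log entry could be produced; the line layout and the name `format_log` are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def format_log(file_name: str, results: Iterable[Tuple[str, bool]]) -> str:
    """Build the log text for one parsed file.

    `results` holds (molecule title, satisfies Lipinski's rule) pairs.
    """
    results = list(results)
    lines = [
        "File: %s" % file_name,
        "Molecules parsed: %d" % len(results),
    ]
    for title, passed in results:
        status = "satisfies" if passed else "does not satisfy"
        lines.append("%s: %s Lipinski's rule" % (title, status))
    return "\n".join(lines) + "\n"


print(format_log("ligands.sdf", [("aspirin", True), ("large_peptide", False)]))
```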

Lipinski's rule is checked by a Python function. The properties of every molecule, such as molecular weight, log P, hydrogen bond donors and acceptors, and refractivity index, are extracted using Pybel and checked against Lipinski's rule.
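A hedged sketch of how this extraction might look with Pybel is given below; it is not the authors' code. It assumes the descriptor names `MW`, `logP`, `HBD`, `HBA1` and `MR` are present in the installed Open Babel build (the available names are listed in `pybel.descs`), takes `MR` to correspond to the refractivity value mentioned in the text, and uses `HBA1` as the acceptor count, which is one of two acceptor definitions Open Babel offers. The import is deferred so that the rule check can be used without Open Babel installed.

```python
from typing import Dict, Iterator, Tuple

# Descriptor names assumed to be available in pybel.descs.
DESCRIPTORS = ["MW", "logP", "HBD", "HBA1", "MR"]


def read_descriptors(path: str, file_format: str) -> Iterator[Tuple[str, Dict[str, float]]]:
    """Yield (title, descriptor values) for every molecule in a file."""
    try:
        from openbabel import pybel  # module layout in Open Babel 3.x
    except ImportError:
        import pybel  # module layout in Open Babel 2.x
    for mol in pybel.readfile(file_format, path):
        yield mol.title, mol.calcdesc(DESCRIPTORS)


def passes_lipinski(desc: Dict[str, float]) -> bool:
    """Apply the four criteria to a descriptor dictionary."""
    return (desc["HBD"] <= 5 and desc["HBA1"] <= 10
            and desc["MW"] < 500 and desc["logP"] <= 5)


# Example use with a hypothetical file name:
# for title, desc in read_descriptors("ligands.sdf", "sdf"):
#     print(title, passes_lipinski(desc))
```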

FUTURE EXTENSIONS

A number of extensions are expected in this application in the future, such as:

• Addition of more file formats.

• Creation of a search algorithm for molecules.

• Creation of a better logging mechanism.

• Addition of graphs.

REFERENCES

1. CDXML: http://www.cambridgesoft.com/services/documentation/sdk/chemdraw/cdx/IntroCDXML.htm

2. John J. Irwin and Brian K. Shoichet: ZINC − A Free Database of Commercially Available Compounds for Virtual Screening. Journal of Chemical Information and Modeling [http://pubs.acs.org/doi/abs/10.1021/ci049714+]

3. Noel M O'Boyle, Chris Morley and Geoffrey R Hutchison: Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit [http://journal.chemistrycentral.com/content/2/1/5]

4. Python 2.7: http://www.python.org/download/releases/2.7/

5. PyQt: http://www.riverbankcomputing.com/software/pyqt/intro

6. Pybel: http://openbabel.org/docs/2.3.1/UseTheLibrary/Python_Pybel.html

7. PyInstaller: http://www.pyinstaller.org/

8. Text referred from [http://www.learner.org]

9. The Lipinski rule [http://www.nature.com]

10. Zsolt Zsoldos, Irina Szabo, Zsolt Szabo and A Peter Johnson: Software tools for structure based rational drug design [http://www.sciencedirect.com/science/article/pii/S0166128003007267]
