reaxys medicinal chemistry. production innovation to generate...

PRODUCTION INNOVATION TO GENERATE THE BEST INFORMATION


Page 1: Reaxys Medicinal Chemistry. Production Innovation to Generate …elar.urfu.ru/bitstream/10995/31052/3/Reaxys Medicinal... · 2019-04-21 · PRODUCTION INNOVATION TO GENERATE THE BEST



PRODUCTION INNOVATION TO GENERATE THE BEST INFORMATION• 2014 2

OUR MISSION

“The goal of Reaxys Medicinal Chemistry is to facilitate the development of ‘smarter leads’ – leads with optimal affinity, selectivity and ADMET properties; leads that will not fail in preclinical and clinical phases for reasons that could have been predicted. This requires access to data that must be laboriously extracted by hand from the overwhelmingly large body of published literature. For that, we have established a methodical and unrivalled production process.”

— Dr. Olivier Barberan, Product Manager for

Reaxys Medicinal Chemistry


Reaxys Medicinal Chemistry covers the intersection between the informational spaces of small molecules and bioactivity. Mediating the relationships in this intersection are the drug candidates and the druggable targets (1,2), biological pathways, tissues, cell lines, organisms and bioassays that were used to test hit and lead compounds (Figure 1). Consequently, all of the compounds included in Reaxys Medicinal Chemistry have reported bioactivity and all of the data are for real, experimentally determined biological effects.

In this informational space, you will find answers to questions that support decisions for drug discovery and lead optimization. If you are looking for compounds to test or modify, you will find information on affinity, potency, specificity and synthesis. If you are looking for guidance on how to optimize or repurpose a compound, you will additionally find information on the pharmacokinetic properties, toxicity, off-target activity, metabolism and transport of the compound or similar molecules.

The mission of Reaxys Medicinal Chemistry is to be the single source for the most detailed and high-quality data on small molecules that are relevant and meaningful to medicinal chemistry. In keeping with this philosophy, Elsevier has designed a dedicated production workflow built on five pillars, each of which contributes to achieving the key tenets of “detailed, high-quality, relevant and meaningful” (Figure 2).

THE INFORMATIONAL COVERAGE OF REAXYS MEDICINAL CHEMISTRY

Figure 1 labels: Small molecules (subset of compounds covered by Reaxys); Bioactivity; Macro-molecules. Informational coverage of RMC: experimentally determined biological effects of small molecules tested on known targets, cell lines, tissues, organisms, etc.

Figure 1. The informational coverage of Reaxys Medicinal Chemistry: where small molecules and bioactivity information intersect.


Pillar 1: Information Source

With information spread throughout an ever expanding and interdisciplinary body of publications, data fed into Reaxys Medicinal Chemistry are captured from the full text of relevant articles and patents, independent of journal or patent category.

Pillar 2: Information Excerption

Expert excerptors transform published data from its diverse formats into a discoverable landscape of bioactivity information that matches the mental model of the drug development workflow.

Pillar 3: Information Integration

Third-party data enhance coverage of the information system because integrated data are curated to fit the detailed content paradigm of Reaxys Medicinal Chemistry.

Pillar 4: Quality Assurance

Multiple checks and processes safeguard the quality of data entering and existing in Reaxys Medicinal Chemistry and test database integrity and function. Note that the excerption quality assurance described in this white paper pertains to bioactivity data; chemistry data undergo a separate quality assurance process.

Pillar 5: Information Presentation

The rich and highly granular content of Reaxys Medicinal Chemistry is normalized and standardized to support multiple analysis workflows.

Figure 2 labels: Information published by researchers and developers; Expert information processing; Data delivered to customers.

1 SOURCE: Cover an interdisciplinary and dynamic publication landscape

2 EXCERPTION: Make published data discoverable and useable

3 INTEGRATION: Augment content with searchable third-party data

4 QUALITY ASSURANCE: Ensure production process matches philosophy

5 PRESENTATION: Deliver normalized and standardized data for any analysis

Figure 2. Five pillars in the production process contribute to the value of data in Reaxys Medicinal Chemistry.


INFORMATION SOURCE

“We have created a system that supplies the production process with a selection of source documents that truly capture this interdisciplinary and dynamic publication landscape.”

— Dr. Nina Kaun, Project Manager for Content Integration and Development

How do you ensure that you have a comprehensive overview of all the information available on a topic? How do you select the most important publications to read in your limited time? These questions drive the information sourcing process for Reaxys Medicinal Chemistry. Bioactivity data on small molecules can be found in countless journal articles and patents, and finding these requires going through each publication. This is a daunting task. An approach commonly used by alternative information management systems is to focus on a small core set of journals, enhanced by a superficial examination of a broader publication set.

Interested in accessioning the most relevant data regardless of where they were published, the Reaxys Medicinal Chemistry team has instead developed a pipeline for document annotation and ranking based on text-mining that scans the full text of articles and patents to select source documents for excerption. Documents are ranked according to a score based on the co-occurrences of entities from various domains within sentences, paragraphs and the whole document. The scoring relies on annotating input documents with named entities, i.e., expressions that refer to concepts of relevant domains based on taxonomies of chemistry, anatomy, species, cell lines, disease, herb drugs, physiology, proteins, effects, magnitudes/units, methods and substances. Concept synonyms that are highly relevant for Reaxys Medicinal Chemistry are given a higher score. Additional factors taken into consideration include tables of results, data points and biological parameters (e.g., IC50, Ki), as well as document sections like title, discussion, methods and references. An aggregated score is calculated for each document that reflects the likelihood of extracting concrete and usable data, even from publications that appear tangential to chemistry. All input documents are then categorized as relevant or non-relevant.
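The co-occurrence idea behind the ranking can be illustrated with a minimal sketch. It assumes that each document has already been annotated with entity mentions, each carrying a domain plus the paragraph and sentence it occurs in. The weights, the threshold and every name used here (`Mention`, `WEIGHTS`, `cooccurrence_score`, `is_relevant`) are hypothetical and are not taken from the Reaxys Medicinal Chemistry pipeline, which also considers factors such as synonym relevance, tables and document sections.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    domain: str     # taxonomy domain, e.g. "chemistry" or "proteins"
    paragraph: int  # index of the paragraph within the document
    sentence: int   # index of the sentence within its paragraph


# Hypothetical weights: the closer two entities co-occur, the more they count.
WEIGHTS = {"sentence": 3.0, "paragraph": 2.0, "document": 1.0}


def cooccurrence_score(mentions):
    """Sum a weight for every pair of mentions that come from different domains."""
    score = 0.0
    for a, b in combinations(mentions, 2):
        if a.domain == b.domain:
            continue  # only cross-domain pairs are scored in this sketch
        if a.paragraph == b.paragraph and a.sentence == b.sentence:
            score += WEIGHTS["sentence"]
        elif a.paragraph == b.paragraph:
            score += WEIGHTS["paragraph"]
        else:
            score += WEIGHTS["document"]
    return score


def is_relevant(mentions, threshold=5.0):
    """Categorize a document as relevant when its score reaches the (assumed) threshold."""
    return cooccurrence_score(mentions) >= threshold


doc = [
    Mention("chemistry", paragraph=0, sentence=0),
    Mention("proteins", paragraph=0, sentence=0),
    Mention("magnitudes/units", paragraph=0, sentence=1),
    Mention("species", paragraph=3, sentence=0),
]
print(cooccurrence_score(doc), is_relevant(doc))
```

A single aggregated number per document makes it straightforward to rank a large input set and cut it at a threshold, which is the categorization into relevant and non-relevant described above.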

Scientific literature sources for Reaxys Medicinal Chemistry come from the complete literature body of Embase®, Elsevier’s biomedical database. While there is clear overlap in the coverage of the source literature of these two databases, data excerpted from those documents for Reaxys Medicinal Chemistry are specifically identified, extracted, and organized to support lead discovery and optimization workflows, whereas Embase is designed for broad use in biomedical research. All patent classes from the US Patent and Trademark Office, the International Patent System and the European Patent Office are also sources for Reaxys Medicinal Chemistry. This translates to roughly 800,000 articles from almost 6,000 journal titles in Embase plus roughly 800,000 to 1 million patents that are evaluated every year for inclusion in Reaxys Medicinal Chemistry. Of these, roughly 2,000 full-text publications are selected every month for the excerption process, based on the aforementioned criteria (Figure 3).

Benefits of the Reaxys Medicinal Chemistry source selection process

• Selection based on presence of excerptable data and not journal title

• Not restricted to domain-specific publications

• Full-text search of over 1.6 million documents per year

• Selection emphasizes pharmacologically relevant keywords and concepts

• Feeds excerption process with condensed set of relevant sources

Figure 3 labels:

Inputs: Journal articles from the complete Embase collection, ~800,000/year; all patent classes from US, EU and world patent systems (USPTO, EPO, PCT), ~800,000 – 1,000,000/year

SOURCE SELECTION: Text mining winnows the literature base to documents with relevant, excerptable data; regular quality control of outcome

Output: Full-text articles and patents for manual excerption of chemistry and connected biology, ~24,000/year

Figure 3. Paring down a mountainous body of publications to the most relevant: the Reaxys Medicinal Chemistry source selection process identifies documents for excerption.

To ensure that the selection process remains aligned with goals of coverage breadth, information depth and content balance, the outcome of the selection is assessed on a regular basis. A “Gold Set” was created for this evaluation, consisting of 100 relevant (with the number of manually extracted data points), 100 non-relevant, and 50 non-chemistry documents. During the development period, using different criteria and fine-tuning parameters, the “Gold Set” for patents yielded a precision score of 94%, a recall score of 94% and a combined quality score of 94% (see Measuring excerption performance). In the final evaluation, the precision score was calculated using 9,500 journal articles.
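For readers who want to reproduce this kind of evaluation, precision and recall can be computed from a labeled set as sketched below. The white paper does not state how the combined quality score is derived; the sketch assumes the harmonic mean of precision and recall (the F1 score), which is consistent with the reported figures because the harmonic mean of two equal values is that same value. The counts are invented for illustration and are not the actual Gold Set results.

```python
def precision(true_pos, false_pos):
    """Share of selected documents that are truly relevant."""
    selected = true_pos + false_pos
    return true_pos / selected if selected else 0.0


def recall(true_pos, false_neg):
    """Share of truly relevant documents that were selected."""
    relevant = true_pos + false_neg
    return true_pos / relevant if relevant else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall (assumed combination rule)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Invented counts: 100 relevant documents, 100 selected, 94 of them correct.
tp, fp, fn = 94, 6, 6
p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.2f} recall={r:.2f} combined={f1_score(p, r):.2f}")
```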

Unlike other information solutions, the objective of the selection and excerption process for Reaxys Medicinal Chemistry is not to generate the largest number of results, but the most relevant results. A smaller set of results, where every hit is applicable to a research question, is more valuable than a long list of irrelevant hits.


INFORMATION EXCERPTION

“The detailed excerption needed to augment Reaxys Medicinal Chemistry goes above and beyond indexing and to date cannot be done purely automated. It requires grasping the full extent of information in a source document and translating that into a new data form that is understandable, usable and discoverable. The power of our excerption lies in the people who know what that should look like.”

— Dr. Ralph Hössel, Advisory Database Content Specialist, Content Enrichment and Taxonomies

Articles and patents selected for excerption are catalogued by an automated system that recognizes the standardized form of citation data, extracts that information, and assigns a publication identifier, if not already available. The unique identifier is important for the later integration of third-party data (see Information Integration). The cataloged citations and their full text files are then fed through a “reading machine” that performs an initial automated excerption of chemical data based on common data formats and keywords. These data include identifiers of chemical entities and basic properties such as chemical structures, melting points, and reaction components and conditions. Then, the excerpted data files and full-text sources are delivered to excerption teams for manual review and augmentation.

The excerption process runs along two sequential work streams: first the excerption of chemistry data, then the excerption of bioactivity data. Dedicated excerptors, who have specialized in each area, perform the excerption, and their work is supported by Elsevier’s proprietary interactive Excerption Interface, or iEI (Figure 4). The chemistry excerptor loads the data files extracted by the reading machine into iEI and reworks the data to ensure correctness. These data are highlighted and color-coded in the source document (Figure 4A). He or she then examines the source publication in detail and completes the process of identifying and excerpting all relevant data from the full document, whether scientific article or patent. Following chemistry excerption, the pharmacology excerptor adds all relevant bioactivity data present in the source to complete the data files destined for accession into Reaxys Medicinal Chemistry.


Figure 4. Everything at a glance. The interactive Excerption Interface for Reaxys Medicinal Chemistry displays in interactive panels the source document undergoing excerption (top left enlargement) with pre-excerpted chemical data, and all possible data entry masks needed to support both the chemistry (middle enlargement) and the pharmacology (bottom left enlargement) excerption work streams.

iEI grants excerptors a single-screen view of the source document and the data entry fields corresponding to the content architecture of Reaxys Medicinal Chemistry. A number of features of iEI facilitate excerption by controlling and standardizing data entry.

Almost every entry field is controlled through one or more of the taxonomies organizing data in Reaxys Medicinal Chemistry (see What’s in a taxonomy?) or by embedded glossaries. For the excerptor this means that he or she must select the appropriate term from a tree representation of the taxonomy or from a glossary of possible entries based on search input (Figure 5). In this way, the excerption and corresponding indexing of data points is guaranteed to be unambiguous (e.g., a valid target) and to fit into the database structure and functionality.

Numerical entry fields are also controlled. Units are selected from a menu specific to the type of data and values are restricted to predetermined limits applicable to the type of data. Thus, for example, the iEI will require an excerptor to confirm values that appear unreasonable for enzyme inhibition assays (a warning) or will not accept values that are orders of magnitude out of range (an error).
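The two-tier behavior described here (a warning that must be confirmed versus an error that is rejected) can be sketched as a pair of nested ranges. This is only an illustration: the limits below are invented for a hypothetical IC50 field in nanomolar, and `Check`, `PLAUSIBLE`, `ACCEPTED` and `check_value` are names made up for this sketch, not part of iEI.

```python
from enum import Enum


class Check(Enum):
    OK = "ok"
    WARNING = "warning"  # excerptor must confirm the value
    ERROR = "error"      # value is not accepted


# Invented limits for an IC50 field, expressed in nM.
PLAUSIBLE = (1e-3, 1e6)  # outside this range the value looks unreasonable
ACCEPTED = (1e-6, 1e9)   # outside this range the value is rejected


def check_value(value, plausible=PLAUSIBLE, accepted=ACCEPTED):
    """Classify a numeric entry against the wider and the narrower range."""
    if not (accepted[0] <= value <= accepted[1]):
        return Check.ERROR
    if not (plausible[0] <= value <= plausible[1]):
        return Check.WARNING
    return Check.OK


for candidate in (50.0, 5e7, 1e12):
    print(candidate, check_value(candidate).value)
```

Keeping the limits as data rather than code would allow a different pair of ranges for each type of measurement, matching the statement that limits are applicable to the type of data.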

Data are entered into different masks corresponding to concepts matching bioassays, such as target, biological materials, and results. Entry fields are interlinked between masks so that the corresponding entries need to be made only once.

Additional quality checks are embedded in iEI, which the excerptor can run prior to submitting the excerpted data. These checks compare data in related entry fields and masks to ensure that they are consistent with one another and display disparities as links that lead the excerptor to the culprit fields.


Even with the assistance of iEI, the excerption process is time and labor intensive. However, it is this meticulous work that accounts for the unmatched granularity of the data in Reaxys Medicinal Chemistry. Each target, for example, is associated with over 25 data entry fields ranging from target name and species to stoichiometry, transfection type, start and end amino acid, and mutation type (see enlargement in Figure 4). This level of detail, coupled to the already unmatched granularity of chemical data for small molecules from Reaxys, is one of many features that differentiate Reaxys Medicinal Chemistry from alternative solutions on the market.

For example, other solutions perform keyword searches in patents to link them to specific targets, so target searches generate long lists of patents. However, they do not examine if the patent actually contains relevant data about the target nor do they capture those data to make them available for use in analyses.

Figure 5 is a screenshot of iEI showing the amount of detailed data connected to a single bioassay in Reaxys Medicinal Chemistry. Five assays were identified in the source document (listed on the left), and for each bioassay a wealth of experimental and conceptual details are excerpted.

Figure 5. The granularity and interconnectedness of the data in Reaxys Medicinal Chemistry distinguishes it from any other information system. Enlargements of the screenshot show five bioassays (left enlargement) described in a source document (top enlargement). Each excerpted bioassay is associated with detailed data entered into the bioassay mask (a section displayed in bottom enlargement) and linked to corresponding concepts: target(s), biological material(s), and measurements. Each of these concepts is in turn elaborated by including relevant details and complementary information.


What’s in a taxonomy?

Reaxys Medicinal Chemistry is built on several taxonomies and glossaries: some unique and others shared with Embase. These data architectures serve to describe the concepts and terms of medicinal chemistry from the perspective of several relevant domains, explicitly outlining the relationships between terms and the overall informational space of the discipline. The domains included in the architecture of Reaxys Medicinal Chemistry range from Targets, Organisms and Cell Lines, to Administration Route, Diseases and in vivo/in vitro Procedures.

Taxonomies and glossaries assign meaning to the content in Reaxys Medicinal Chemistry. For example, the comprehensive “Targets” taxonomy places data in the context of the current state of knowledge about valid pharmaceutical targets. Data retrieval functions use these taxonomies to deliver results for every query and thus, the comprehensive lists of relevant hits that emerge from Reaxys Medicinal Chemistry arise from the expertly designed and extensive data structure.

Naturally, taxonomies and glossaries are expected to evolve along with the rapid advances made in medicinal chemistry research. The scope of the taxonomies underlying Reaxys Medicinal Chemistry is not restricted to terms and concepts used in the excerpted source documents. Instead, an interdisciplinary team of content experts constructed the taxonomies with a view toward the future, including relevant terms and concepts that comprehensively cover all aspects of medicinal chemistry. The terminology used is that recommended in the scientific community, based on authorities like UniProt, Rfam, and Gene Ontology. Furthermore, a system is in place to evaluate and accommodate new concepts coming from updates to Embase and candidate terms proposed by excerptors. Thus, Reaxys Medicinal Chemistry provides the most up-to-date perspective of the information landscape of pharmacologically relevant small molecules.

Want to learn more? Read our white paper on the architecture of Reaxys Medicinal Chemistry.


INFORMATION INTEGRATION

“Reaxys Medicinal Chemistry emerged from consolidating bioactivity data collected in Reaxys and the extensive biological content of Aureus Sciences. Striving to create single-point information access for users, we acquired data from the GOSTAR bioactivity database to produce the uncontested champion of detailed and comparable bioactivity data for small molecules.”

— Dr. Olivier Barberan, Product Manager of Reaxys Medicinal Chemistry.

To capture the full informational space of bioactivity of small molecules, the content of the database is complemented with data from the GOSTAR® bioactivity database. The high level of data detail of GOSTAR makes it suitable for wrapping into the architecture of Reaxys Medicinal Chemistry, but integration of this content requires citation matching and normalization of bioactivity and chemical data.

Monthly updates from GOSTAR are accessioned to the production database after these two steps:

• Articles and patents of the incoming GOSTAR data are matched to existing publications in the production database via patent number, publication identifier, or other key citation indices. Duplicates are filtered out and new source documents are registered into the production database.

• Data assigned to terms and concepts in the data files from GOSTAR are mapped to those in Reaxys Medicinal Chemistry. Mental data models of the two repositories differ in some areas of the bioactivity data. Through this mapping, entries are remodeled, split or merged to align the incoming data with the taxonomies and glossaries of Reaxys Medicinal Chemistry. Chemical structures also require normalization to transform structure data into the format supported by Reaxys Medicinal Chemistry.
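The two integration steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: records are plain dictionaries, the field names (`patent_number`, `publication_id`, `title`, `year`), the sample identifiers and the contents of `TERM_MAP` are all invented, and the real mapping also remodels, splits and merges entries, which is not shown.

```python
def citation_keys(record):
    """Collect every normalized key under which a citation could be matched."""
    keys = set()
    for field in ("patent_number", "publication_id"):
        value = record.get(field)
        if value:
            keys.add((field, value.strip().upper()))
    if record.get("title") and record.get("year"):
        keys.add(("title_year", (record["title"].strip().lower(), record["year"])))
    return keys


def match_citations(existing, incoming):
    """Split incoming records into already-known citations and new ones to register."""
    known = set()
    for record in existing:
        known |= citation_keys(record)
    matched, new = [], []
    for record in incoming:
        keys = citation_keys(record)
        if keys & known:
            matched.append(record)
        else:
            new.append(record)
            known |= keys  # later duplicates in the same batch now count as matched
    return matched, new


# Invented mapping from an incoming label to the label used in the target taxonomy.
TERM_MAP = {"Inhibition constant": "Ki"}


def map_term(term, term_map=TERM_MAP):
    """Return the target term, or the input unchanged when no mapping exists."""
    return term_map.get(term, term)


existing = [{"patent_number": "US0000001"}]
incoming = [
    {"patent_number": " us0000001 "},
    {"publication_id": "pub-42", "title": "Example title", "year": 2014},
]
matched, new = match_citations(existing, incoming)
print(len(matched), len(new), map_term("Inhibition constant"))
```

Matching on any shared key, rather than on a single preferred key, avoids missing a duplicate when the two repositories record different identifiers for the same publication.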

Once the normalized data and corresponding citations are accessioned into the production database, the data that will ultimately remain in the final database are selected. As a general rule, GOSTAR content is retained if for a given publication, GOSTAR has compound and/or bioactivity data that are not already in Reaxys Medicinal Chemistry. Thereafter, the integrated data undergo the pre-publication quality assurance protocol. After this careful integration, Reaxys Medicinal Chemistry has 30% more content than GOSTAR, as well as more granular bioactivity data and experimental details (Figure 6).

Figure 6 chart: Reaxys Medicinal Chemistry vs. other databases. Compares Integrity™, ChEMBL™, GOSTAR and Reaxys Medicinal Chemistry by number of molecules and number of data points (axis in millions, 0 to 30).

Figure 6. Merging data from three exceptional sources of pharmacological data on small molecules, Reaxys Medicinal Chemistry trumps alternative information systems in both number of molecules and number of data points included.

How many data points are checked?

It would be impossible to examine every single excerpted data point. Instead, a statistically representative sample of the excerpted data is taken. Samples taken for quality checks of Reaxys Medicinal Chemistry contain enough data points to guarantee a 95% confidence level.
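As a rough illustration of what a 95% confidence level implies for sample size, the sketch below uses Cochran's formula with a finite population correction. The white paper states only the confidence level; the 5% margin of error, the worst-case proportion of 0.5 and the function name `sample_size` are assumptions made for this example, and the actual sampling method may differ.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Cochran's sample size with finite population correction (assumed method)."""
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * proportion * (1 - proportion) / (margin * margin)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for batch in (100, 100_000):
    print(batch, sample_size(batch))
```

The required sample grows very slowly with batch size, which is why a representative sample remains practical even when the number of excerpted data points is large.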


QUALITY ASSURANCE

“Neither excerption decisions nor excerption guidelines are entirely black and white. Quality assurance is done both automatically and manually, with the ultimate aim to continuously improve the quality of excerption.”

– Dr. Catherine Noban, Scientific Information Specialist

Data excerpted for Reaxys Medicinal Chemistry undergo scrutiny at several points during the production process. Quality assurance is more than ensuring that new data added to Reaxys Medicinal Chemistry are complete and precise. Quality assurance must also safeguard consistency across all data in the repository, adherence to the database architecture, effective production, proper database function, and smooth operation of all features.

Process: Quality assurance

Goal: Guarantee accuracy and consistency of excerpted data

Step 1: Checks and controls embedded in iEI performed during excerption

Step 2: Manual assessment of excerption quality according to Elsevier Content Quality Management principles and tools using a representative sample

Result: Quality measurements and continuous improvement based on assessment results; potential further actions include additional training

Frequency: Continuous

Process: Quality assurance process review

Goal: Ensure reliability of scores and results from Quality Assurance step 2 and adherence to excerption guidelines

Step 3: Manual validation of results from Quality Assurance step 2 based on a selection of reviewed items

Result: Ensure the reliability of the quality results measured by the quality assurance process

Frequency: Regular review

Process: Pre-publication quality assurance

Goal: Verify architecture and function of complete database

Step 4: Roughly 4,000 automatic queries run on test server to test integrity of database

Step 5: Manual queries performed in user interface to check speed and function

Result: Filter out “rogue” data and identify problems in database structure and function

Frequency: Every 2 weeks

Figure 7. A total of five quality assurance checks are completed at three points in the production workflow before data are published to the public server of Reaxys Medicinal Chemistry. The three rounds focus on data accuracy, quality assurance reliability, and database function.


The interactive Excerption Interface (iEI) used by excerptors to capture data based on detailed excerption guidelines for Reaxys Medicinal Chemistry has a number of quality checks that guarantee internal consistency in the data. To begin, almost all entry fields are supported by controlled vocabularies. An excerptor selects relevant terms from taxonomies or glossaries and can input numeric values only within a predetermined range. This prevents nonsensical data entries and maintains naming and indexing consistency. Excerptors can propose candidate terms for entry fields and these are evaluated for addition to database architecture according to strict rules. Furthermore, embedded quality functions crosscheck related entry fields and masks with one another to prevent logic, data type, or other contradictions among interconnected information. In this way, the iEI not only guides excerptors in their work, but also assists them in reviewing it for completeness and accuracy (see Information Excerption).

Subsequently, once newly excerpted data are added to the Reaxys Medicinal Chemistry production database, a representative sample of the excerpted content is sent back to the excerption team for quality assurance. The team performs a manual examination of this sample to assess excerption quality according to methodologies and tools dictated by Elsevier Content Quality Management. The data are crosschecked on two levels:

• Every excerpted data point is examined for precision. Was the data point interpreted correctly and was it excerpted appropriately?

• The full text and supplementary information of every document is read through to identify all data that should have been excerpted. Were all data in a document captured?

To ensure effectiveness of the review itself, two expert controllers perform the data quality assessment, generate detailed reports that list all incorrect and missing information and provide feedback to excerptors.

Quality assurance process review: making sure quality assurance is reliable

Upon completion of the quality assurance process, the Elsevier Content Quality Management team takes a random selection of items reviewed by the excerption team for further scrutiny. Two expert controllers manually examine the assessment based on the selected items to validate the reliability of the quality assurance results and ensure adherence to excerption guidelines for Reaxys Medicinal Chemistry. This, in turn, ensures that the requirements dictated by the overall philosophy of Reaxys Medicinal Chemistry are met.

Measuring excerption performance

A quantitative measure is needed to monitor excerption quality. It must encompass multiple aspects of the quality of data. The key performance indicator used by the Elsevier Content Quality Management team is a statistic that reflects the recall of data – does a data point appear where it should appear? – and the precision of data – is that data point correct? Additionally, each entry field is weighted differently in the calculation, based on its importance to the user. So, for example, an incorrect target is considered a critical error and is thus weighted more heavily. The final statistic is based on four components: recall, precision, occurrence, and importance of a field. The value of this quantitative measure must be above 95%.
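One way such a statistic could combine the four components is sketched below. The actual formula used by Elsevier Content Quality Management is not given in the white paper, so everything here is an assumption: each field gets an F1 score from its own precision and recall, and fields are averaged with a weight equal to importance times occurrence. The field names, counts and importance values are invented.

```python
from dataclasses import dataclass


@dataclass
class FieldAudit:
    name: str
    expected: int      # data points that should have been excerpted (occurrence)
    excerpted: int     # data points that were actually excerpted
    correct: int       # excerpted data points found to be correct
    importance: float  # invented weight; critical fields such as target get more


def field_score(audit):
    """F1 score of one field, combining its precision and recall."""
    precision = audit.correct / audit.excerpted if audit.excerpted else 0.0
    recall = audit.correct / audit.expected if audit.expected else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def quality_score(audits):
    """Average of field scores weighted by importance times occurrence."""
    weights = [a.importance * a.expected for a in audits]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * field_score(a) for w, a in zip(weights, audits)) / total


THRESHOLD = 0.95  # the measure must be above 95% according to the text

audits = [
    FieldAudit("target", expected=100, excerpted=100, correct=99, importance=3.0),
    FieldAudit("unit", expected=200, excerpted=190, correct=188, importance=1.0),
]
score = quality_score(audits)
print(f"{score:.4f}", score > THRESHOLD)
```

With this weighting, a handful of errors in a heavily weighted field lowers the overall score more than the same number of errors in a minor field, which mirrors the treatment of an incorrect target as a critical error.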

“No information system of this magnitude can be 100% error-free, but our systems strive for the highest precision and recall possible.”

– Dr. Michael Maier, Operations Manager Technology, Elsevier Life Science Solutions.

Quality assurance: making sure excerpted data are right


Before data from the production database are published to the public server, they are loaded to a test server where another round of quality checks is performed. At this point, the objective is to ensure that added data do not conflict with the database architecture or impact the established functionality of Reaxys Medicinal Chemistry. With the newly integrated data, roughly 4,000 queries are run automatically on the test server. Each one produces a log file. These log files are manually compared to log files from previous quality checks to detect inconsistencies, such as:

• Fewer data are retrieved by a query than previously. This indicates the presence of an unintentional bottleneck in data retrieval.

• Significantly more data are retrieved by a query than are added to the database. If it is known that an accession has resulted in a 1% increase in data volume in the database, a query returning over 10% more data than previously is suspect.
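The two inconsistency rules above lend themselves to a simple comparison of hit counts between runs. The sketch below assumes that each log can be reduced to a hit count per query; the query identifiers, the counts and the function name `flag_queries` are invented. The default factor of 10 mirrors the example in the text (1% database growth, more than 10% query growth is suspect) and is not a documented setting.

```python
def flag_queries(previous, current, growth, factor=10.0):
    """Return queries whose hit counts dropped or grew far more than the database.

    previous, current: dicts mapping a query id to its hit count.
    growth: fractional increase in database volume (0.01 means 1%).
    """
    flagged = {}
    for query_id, before in previous.items():
        after = current.get(query_id, 0)
        if after < before:
            flagged[query_id] = "fewer results than in the previous run"
        elif before > 0 and (after - before) / before > factor * growth:
            flagged[query_id] = "growth far exceeds database growth"
    return flagged


previous = {"q1": 1000, "q2": 500, "q3": 200}
current = {"q1": 1008, "q2": 480, "q3": 230}
for query_id, reason in flag_queries(previous, current, growth=0.01).items():
    print(query_id, reason)
```

Only the flagged queries would then need the manual examination described in the following paragraph.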

Query results that do not match expectations are manually examined and measures are taken to remove the culprit. In addition, expert controllers perform a lengthy protocol of manual queries directly in the user interface of Reaxys Medicinal Chemistry and test the speed of retrieval, the results of the queries, and correct functionality. Only after both quality checks are completed and any emerging problems are resolved are the data published to the final database server.

INFORMATION PRESENTATION

“The power of Reaxys Medicinal Chemistry is that the data are ready to be discovered, used, digested and analyzed. That laborious work of preparing the data is done. The user can now focus on gaining insights.”

— Dr. Michael Maier, Operations Manager Technology, Elsevier Life Science Solutions.

The true value of Reaxys Medicinal Chemistry lies in its content. Part of this value comes from the precise and detailed excerption of data from an extensive body of literature. Another aspect is the high-quality process and architecture of the database itself, as well as the integrated mechanisms to identify areas of improvement. Finally, a very distinctive and valuable feature of the content of Reaxys Medicinal Chemistry is that the data are normalized and standardized through the production process.

Normalized means that data are reduced to a common form and organized to minimize redundancy. Standardized means that data on a given bioactivity parameter are on comparable scales to equalize parameter range and data variability. This in turn means that the millions of data points in Reaxys Medicinal Chemistry can serve as direct input for any desired analysis. The pX value featured in the database is a standardized measure of affinity. This parameter is calculated by converting the variety of measures used to define the affinity between compound and target to a comparable scale. As a result, the pX value can be compared from one source to another or from one compound-target pairing to another (see "Comparing apples and oranges"). The heatmap in the Reaxys Medicinal Chemistry user interface capitalizes on this comparability to provide an interactive matrix that summarizes affinity for a large number of compound–target pairings and can be used to explore factors that contribute to affinity or find interesting activity hotspots (Figure 8).

Pre-publication quality assurance: making sure the database works

The same normalized and standardized data support applications that require large quantities of data, such as in silico structural or statistical modeling of compounds. The data can be obtained through application programming interfaces (APIs) that run a defined query in Reaxys Medicinal Chemistry and integrate the results into an external platform for analysis. Alternatively, standardized flat files (SFFs) with the organized database content can be queried and analyzed directly in a proprietary platform.

Elsevier has an experienced team dedicated to assisting in the integration of these satellite data formats into internal workflows and analysis platforms.

The flexibility with which users can avail themselves of the data in Reaxys Medicinal Chemistry underscores the philosophy behind this information management system: to provide a single source of the highly granular, expertly excerpted and comprehensive data needed to inform critical decisions along the process of discovering and optimizing a therapeutic compound, so that you can learn from the work of others, avoid dead ends, and move forward.

Figure 8. Heatmaps in Reaxys Medicinal Chemistry display normalized and standardized affinity measures (pX) for various compound (on the x-axis) and target (on the y-axis) pairings. The user can filter the affinity values (see left enlargement) to focus on values relevant to a specific research question or experimental condition.
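To make the idea of a compound-by-target affinity matrix concrete, the following Python sketch builds one from a handful of made-up records. It is not how the Reaxys Medicinal Chemistry interface is implemented: the record layout, the names, the pX filter and the choice to keep the highest pX per pairing are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record layout: (compound, target, pX).
Record = Tuple[str, str, float]


def build_heatmap(
    records: Iterable[Record], min_px: float = 0.0
) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Return (compounds, targets, matrix) with targets as rows, compounds as columns."""
    best: Dict[Tuple[str, str], float] = {}
    for compound, target, px in records:
        if px < min_px:
            continue  # filter out values below the chosen pX cut-off
        key = (target, compound)
        # Assumption: when a pairing has several values, keep the highest pX.
        if key not in best or px > best[key]:
            best[key] = px
    compounds = sorted({c for _, c in best})
    targets = sorted({t for t, _ in best})
    # Pairings without data are left as None.
    matrix = [[best.get((t, c)) for c in compounds] for t in targets]
    return compounds, targets, matrix


# Made-up example data.
example_records = [
    ("cpd-A", "target-1", 7.2),
    ("cpd-A", "target-2", 5.1),
    ("cpd-B", "target-1", 6.4),
    ("cpd-A", "target-1", 7.9),
]
compounds, targets, matrix = build_heatmap(example_records, min_px=6.0)
print(compounds, targets, matrix)
```

Because every value is already on the pX scale, the cells can be compared directly without any further unit handling.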

Page 16: Reaxys Medicinal Chemistry. Production Innovation to Generate …elar.urfu.ru/bitstream/10995/31052/3/Reaxys Medicinal... · 2019-04-21 · PRODUCTION INNOVATION TO GENERATE THE BEST

PRODUCTION INNOVATION TO GENERATE THE BEST INFORMATION• 2014 16

Comparing apples and oranges

The pX value is a systematic conversion of affinity measures to a standardized parameter via compound concentration. All measures of affinity that are concentration dependent can be transformed into pX values, regardless of whether they are expressed on a logarithmic scale (pIC50, pEC50, pED50, pLD50, etc.) or a linear scale (LC50, ED50, Ki, etc.), and regardless of the units used (µM, nM, mM, g/l, etc.). Single data points associated with percent activity (inhibition, activation, viability, etc.) are pared down to a common scale. Assuming that the compound can achieve 100% activity and that a sigmoidal curve with a Hill slope of 1 describes the relationship between compound concentration and activity, pX is calculated as follows:

If percent activity > 25, pX = -log10(AC50). AC50 is the concentration corresponding to 50% activity and is calculated as:

AC50 = (100 × [C] / % activity) - [C]

where [C] is the concentration of the tested compound.

If percent activity < 25, pX = 1.

pX is only calculated if compound concentration is reported as a single value.
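The calculation in this box can be written out as a short Python sketch. It is an illustration under stated assumptions rather than the production code: concentrations are taken to be molar before the logarithm is applied, the unit table and function names are hypothetical, exactly 25% activity (not covered by the rule above) is grouped with the low branch, and activities of 100% or more are rejected because AC50 would no longer be positive.

```python
import math

# Assumed conversion factors to molar; g/l is omitted because it would
# additionally require the molecular weight of the compound.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "µM": 1e-6, "nM": 1e-9}


def px_from_concentration(value: float, unit: str) -> float:
    """pX for a linear-scale measure (e.g. an IC50 or Ki), assuming molar units."""
    return -math.log10(value * UNIT_TO_MOLAR[unit])


def px_from_percent_activity(percent_activity: float, conc_molar: float) -> float:
    """pX for a single percent-activity data point at concentration [C] (molar)."""
    if conc_molar <= 0:
        raise ValueError("concentration must be positive")
    if percent_activity <= 25:
        # Source rule for < 25%; treating exactly 25% the same way is an assumption.
        return 1.0
    if percent_activity >= 100:
        # Assumption: AC50 would be zero or negative, so no pX is computed.
        raise ValueError("percent activity must be below 100")
    # AC50 = (100 x [C] / % activity) - [C], from a Hill slope of 1.
    ac50 = 100.0 * conc_molar / percent_activity - conc_molar
    return -math.log10(ac50)


print(round(px_from_concentration(10.0, "nM"), 3))       # IC50 of 10 nM
print(round(px_from_percent_activity(50.0, 1e-6), 3))    # 50% activity at 1 µM
print(round(px_from_percent_activity(10.0, 1e-6), 3))    # below the 25% cut-off
```

At 50% activity the formula gives AC50 = [C], so a compound tested at 1 µM receives a pX of about 6, the same value an IC50 of 1 µM would produce.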


Quick facts about Reaxys Medicinal Chemistry

• Over 5.4 million substances, 11,000 targets and 26 million biological results

• Current content extracted from almost 6000 journal titles and thousands of patents

• Journal articles and patents selected from over 1.6 million documents each year

• Manual excerption and checks to safeguard the high granularity and quality of data

• Developed by the world’s fourth largest digital media company, based on over 150 years of experience in scientific information management

• Meets ISO 9001 standards

References

1. Sakharkar, M. and Sakharkar, K. (2007) Targetability of human disease genes. Curr Drug Discov Technol 4, 48.

2. Reymond, J.-L. and Awale, M. (2012) Exploring chemical space for drug discovery using the Chemical Universe Database. ACS Chem Neurosci 3, 649.

Compare Reaxys Medicinal Chemistry to the way you currently obtain information — experience the difference.


elsevier.com/reaxys

Copyright © 2014. GOSTAR® is a registered trademark of GVK Biosciences Private Limited. ChEMBL™ is a trademark of the European Molecular Biology Laboratory. Integrity℠ is a trademark of Thomson Reuters. Embase® and the related trademarks are owned and protected by Elsevier BV. Reaxys® and the Reaxys® trademark are owned and protected by Reed Elsevier Properties SA. All rights reserved.

AMERICAS (8.00 a.m. - 7.00 p.m. CST – St. Louis)
Tel: US toll-free: +1 888 615 4500
Tel: Non toll-free: +1 314 447 8069
Email: [email protected]
Brazil: [email protected]

ASIA AND PACIFIC (9.00 a.m. - 6.00 p.m. SST – Singapore)
Tel: +65 6349 0222
Email: [email protected]

JAPAN (9.30 a.m. - 5.30 p.m. JST – Tokyo)
Tel: +81 3 5561 5035
Email: [email protected]/jp

EUROPE AND ALL OTHER REGIONS (9.00 a.m. - 6.00 p.m. CET – Amsterdam)
Tel: +31 20 485 3767
Email: [email protected]

CONTACT INFORMATION
Please visit elsevier.com/reaxys