Life Science Analytics


Upload: andrew-malinow-phd

Post on 15-Feb-2017

TRANSCRIPT

Page 1: Life Science Analytics

Baker Tilly refers to Baker Tilly Virchow Krause, LLP, an independently owned and managed member of Baker Tilly International. © 2012 Baker Tilly Virchow Krause, LLP

Baker Tilly Management Consulting

Realizing Business Value from Unstructured Data

Page 2: Life Science Analytics

THE VALUE OF UNSTRUCTURED DATA ANALYTICS

Page 3: Life Science Analytics

The Value of Unstructured Data Analytics: Unlocking the True Potential of Big Data

There is a tremendous opportunity to gain a competitive advantage by analyzing unstructured data.

Industries continue to struggle with integrating unstructured analytics into their business models. It is time-consuming to identify all of the relevant data sources and technically challenging to load the data into an analytics environment, where additional processing must occur before the data can be analyzed.

Leveraging a Broad Variety of Data: Companies must be able to transform and parse data from multiple sources and in multiple formats: databases, text files, scientific devices, transactions, and even social media postings. End users also need easy, consistent access to all of this data to create a 360-degree view of their customers, their products, or their brand.

Page 4: Life Science Analytics

The Value of Unstructured Data Analytics: Mapping data sources to use-cases

Business use-cases that can benefit from an analysis of unstructured data include:
• Clinical Trial Development-Analysis: Expedite the analysis of patient diary and Patient Reported Outcome data to reduce time-to-market (and potentially uncover unanticipated benefits in early-stage trials)
• Clinical Trial-PRO Development: Analyzing publicly available discussion forum data can accelerate the development of Patient Reported Outcome measures and streamline the FDA’s protocol review process
• Active Market Surveillance (Pharmacovigilance): Are patients using and experiencing your product in a manner that is consistent with your Clinical Trial data?
• Market Intelligence: Understanding how your customers describe their experiences with specific medications can inform market positioning and facilitate targeted messaging
• Labelling Claim Expansion: Are customers articulating unanticipated applications and benefits that can be used to inform programmatic expansion of an existing compound?

Data Sources Include:
• Clinical Research: PubMed, www.clinicaltrials.gov, FDA.gov
• Patient Support Sites: patientslikeme.com, dailystrength.org, askapatient.com
• Social Media Platforms: Reddit, Twitter, Facebook
• Internal: Clinical Trial Data (e.g., Patient Diaries), Call Center Notes, Documents

Page 5: Life Science Analytics

The abundance of data provides tremendous opportunity…

Page 6: Life Science Analytics

And an overwhelming amount of data points:

Page 7: Life Science Analytics

We can help separate meaningful from meaningless

Page 8: Life Science Analytics

MAPPING DATA SOURCES TO USE CASES

Page 9: Life Science Analytics

The Value of Unstructured Data Analytics: Availability of Data

Relevant data is readily available:

Page 10: Life Science Analytics

CASE STUDY: PHARMACOVIGILANCE

Page 11: Life Science Analytics

Case-Study: Pharmacovigilance
Data Source: Reddit

• 234M Unique Users
• 853,824 Subreddits
• 11,464 Active Communities
• 217 Countries
• 8 Billion Page Views Monthly
• 13+ Minutes Spent on Average

Page 12: Life Science Analytics

Sample Use-Case: Pharmacovigilance
Data Source: www.reddit.com

Analyzing all of the post titles can yield value…

But analyzing the conversations people are having, along with associated metadata such as post date and number of comments, can be far more powerful

Page 13: Life Science Analytics

The Challenges with Analyzing Externally Sourced Unstructured Data:
• There are thousands of posts and tens of thousands (or more) of comments
• Without technology and a methodical text mining process, gaining insight would require manual review and data collection

Mining insights from the data manually would take on the order of months, making it difficult to impact business decisions

Page 14: Life Science Analytics

• To source the data we wrote a Python script to crawl the site and scrape the data

• We ran a query on Reddit, using ‘Lipitor’ as the search term and analyzed the results using Python and Oracle Big Data Discovery

• The following are some visualizations and insights we were able to glean from the data.
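The deck does not show the script itself, so the following is only a minimal sketch of how such a collection step might look in Python. It assumes Reddit's public JSON search listing at `https://www.reddit.com/search.json`; the function names, the record fields kept, and the `User-Agent` string are illustrative choices, not details from the original work. Access rules and rate limits on that endpoint may also differ from what the authors encountered.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumption: Reddit's public JSON search listing; not confirmed by the deck.
SEARCH_URL = "https://www.reddit.com/search.json"


def build_search_url(term, limit=100, after=None):
    """Build the search URL for one page of results."""
    params = {"q": term, "limit": limit}
    if after:
        params["after"] = after  # pagination cursor returned by the previous page
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_listing(payload):
    """Flatten a Reddit 'Listing' payload into post records plus the next-page cursor."""
    data = payload.get("data", {})
    posts = []
    for child in data.get("children", []):
        d = child.get("data", {})
        posts.append({
            "id": d.get("id"),
            "title": d.get("title", ""),
            "subreddit": d.get("subreddit", ""),
            "created": datetime.fromtimestamp(d.get("created_utc", 0), tz=timezone.utc),
            "num_comments": d.get("num_comments", 0),
            "text": d.get("selftext", ""),
        })
    return posts, data.get("after")


def fetch_posts(term, max_pages=3, user_agent="unstructured-data-demo/0.1"):
    """Page through search results for `term` (hypothetical helper; makes network calls)."""
    posts, after = [], None
    for _ in range(max_pages):
        request = urllib.request.Request(
            build_search_url(term, after=after),
            headers={"User-Agent": user_agent},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            page, after = parse_listing(json.load(response))
        posts.extend(page)
        if not after:  # no further pages
            break
    return posts
```

Calling `fetch_posts("Lipitor")` would return a list of post records carrying the title, subreddit, post date, and comment count, which are the metadata fields the earlier slides single out as valuable.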

Page 15: Life Science Analytics

We are able to view a quick top-line summary of the data set and KPIs:

And a distribution of where posts have been submitted

Page 16: Life Science Analytics

Symptoms that people discuss, buried in the comments sections of the posts, have been tagged, aggregated, and visualized in a Tag Cloud:

And we can see how the volume of comments about the symptoms has changed over time:
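The tagging and aggregation behind these two views were done in Python and Oracle Big Data Discovery; the deck does not describe the method. As a rough illustration only, a simple lexicon match could produce the counts that feed a tag cloud and a per-month trend. The symptom list, the comment record shape (`body`, `created`), and the function names below are all hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical symptom lexicon; a real project would use a curated medical vocabulary.
SYMPTOM_TERMS = ["muscle pain", "fatigue", "headache", "nausea", "memory loss"]
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in SYMPTOM_TERMS) + r")\b",
    re.IGNORECASE,
)


def tag_symptoms(text):
    """Return the distinct lexicon terms mentioned in one comment."""
    return sorted({match.lower() for match in _PATTERN.findall(text)})


def symptom_counts(comments):
    """Count how many comments mention each symptom (input for a tag cloud)."""
    counts = Counter()
    for comment in comments:
        counts.update(tag_symptoms(comment["body"]))
    return counts


def monthly_counts(comments):
    """Count symptom mentions per calendar month (input for a trend chart)."""
    by_month = defaultdict(Counter)
    for comment in comments:
        month = comment["created"].strftime("%Y-%m")
        by_month[month].update(tag_symptoms(comment["body"]))
    return dict(by_month)
```

Each symptom is counted at most once per comment, so a single long comment that repeats a term does not dominate the tag cloud. Exact-phrase matching like this misses misspellings and paraphrases, which is one reason a fuller text analytics approach is usually needed.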

Page 17: Life Science Analytics

We are able to see the distribution of comments by location…

And limit our analysis to a geographic location of interest. Our summary data updates automatically based on this refinement:

Page 18: Life Science Analytics

We can set up alerts that tell us when Pfizer products are mentioned:

And configure the alerts to show us the terms that were used to flag them:
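The alerts shown on this slide were configured in the analytics tool, and the deck does not explain the matching rules. A hedged sketch of the same idea in Python: scan each post against a watch list and report which terms triggered the match. The watch list contents, the post record shape, and the `find_alerts` name are assumptions made for illustration.

```python
import re

# Hypothetical watch list: product name -> terms that should trigger an alert.
WATCH_LIST = {
    "Lipitor": ["lipitor", "atorvastatin"],
}


def find_alerts(posts, watch_list=WATCH_LIST):
    """Return one alert per (post, product) pair, including the terms that flagged it."""
    alerts = []
    for post in posts:
        text = f"{post.get('title', '')} {post.get('text', '')}"
        for product, terms in watch_list.items():
            matched = [
                term for term in terms
                if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
            ]
            if matched:
                alerts.append({
                    "post_id": post.get("id"),
                    "product": product,
                    "matched_terms": matched,
                })
    return alerts
```

Keeping the matched terms on each alert mirrors the second view above: a reviewer can see why a post was flagged without rereading it in full.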

Page 19: Life Science Analytics

Users have complete visibility into the source data, and finding key words and phrases is facilitated by powerful search technology:

Page 20: Life Science Analytics

The Value of Unstructured Data Analytics: Summary & Proposed Next Steps

Summary
• Robust publicly available unstructured data provides opportunities to inform multiple use-cases, including:
- Pharmacovigilance
- Competitor Analysis
- Market Research
- Expedited Clinical Trial End-Point Development
• For most companies these data points represent a difficult ‘aspirational’ data source for inclusion in Business Processes
• Barriers include:
- Identifying the relevant publicly available data sources
- Technical challenges associated with sourcing the data
- Methodology/Technical approach to generating insights (Text Analytics)
- Integrating insights into Business Processes
• Baker Tilly can help!

Proposed next steps
• Custom Demo Development
- Conduct 1/2 day onsite Discovery working-session
- Define high-value use-case for demo
- Identify 2-3 high-value unstructured sources for inclusion in demo
- Develop 4-5 visualizations to demonstrate value and surface insights

Interested in learning more? Contact Andrew Malinow, PhD