hard to get? knowledge heritage from invention to product ... · 2 introduction scholars studying...

27
Hard to get? Knowledge Heritage from Invention to Product Market Innovation Using Natural Language Processing Sheryl Winston Smith, Ph.D. BI Norwegian Business School [email protected] January 15, 2019 Submission to 2019 Wharton Technology Conference ABSTRACT Patents and patent citations are a window into the innovation process. Yet even established firms have difficulty commercializing inventions. A conundrum for scholars is: does the inventive activity captured in patents and patent citations translate into innovation in the product market? In other words, how reliably can we understand the link between patents and introduction of novel products to market? Thus, unpacking the link between knowledge and innovation is crucial for innovation, entrepreneurship, and strategy scholars. This paper brings to the innovation literature an important methodological approach from computer science: application of natural language processing and the vector space model in information retrieval to the study of innovation. Specifically, this paper details a novel methodology to algorithmically compare the similarity of patent texts to the description of innovations in two distinct contexts: 1) novel products brought to market by corporate venture capital investors in the medical device industry , and 2) IPO prospectus of startups in the communications equipment industry. Keywords: Natural Language Processing; Text Analysis; Innovation; Corporate Venture Capital; IPO; Intellectual Property; Commercialization of Technological Innovation; Technology Entrepreneurship

Upload: others

Post on 27-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

Hard to get? Knowledge Heritage from Invention to Product Market Innovation

Using Natural Language Processing

Sheryl Winston Smith, Ph.D.

BI Norwegian Business School [email protected]

January 15, 2019

Submission to 2019 Wharton Technology Conference

ABSTRACT

Patents and patent citations are a window into the innovation process. Yet even established

firms have difficulty commercializing inventions. A conundrum for scholars is: does the

inventive activity captured in patents and patent citations translate into innovation in the product

market? In other words, how reliably can we understand the link between patents and

introduction of novel products to market? Thus, unpacking the link between knowledge and

innovation is crucial for innovation, entrepreneurship, and strategy scholars. This paper brings to

the innovation literature an important methodological approach from computer science:

application of natural language processing and the vector space model in information retrieval to

the study of innovation. Specifically, this paper details a novel methodology to algorithmically

compare the similarity of patent texts to the description of innovations in two distinct contexts:

1) novel products brought to market by corporate venture capital investors in the medical device

industry , and 2) IPO prospectus of startups in the communications equipment industry.

Keywords: Natural Language Processing; Text Analysis; Innovation; Corporate Venture

Capital; IPO; Intellectual Property; Commercialization of Technological Innovation;

Technology Entrepreneurship

Page 2: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

2

INTRODUCTION

Scholars studying innovation, entrepreneurship, and strategy continually seek to understand

how firms cultivate, develop, and use novel knowledge in pursuit of innovation. Myriad

questions exist: How efficiently does a company use scientific or technical knowledge to obtain

patents and is this novel knowledge incorporated in new products? Similarly, for startups, the

ability to reach significant milestones, such as an initial public offering (IPO), depends on the

innovativeness of its ideas. More broadly, for established companies and startups alike, what

value does the company receive in exchange for investment in innovation?

For scholars and practitioners, answering these crucial questions requires precise ways to

discern how inventiveness, i.e. patenting, is reflected in observable performance outcomes such

as introduction of novel products or successful launch of new ventures. This paper harnesses the

power of an important set of methods from the growing computer science field of natural

language processing (NLP) in a novel manner to more directly link inventive activity and

innovation-related performance outcomes. From a theoretical standpoint, these methods present

two types of value to researchers. In a specific sense, the methodology in this paper enables

scholars to directly examine the knowledge heritage of innovative products and ideas. In a

broader sense, these methods enable researchers to more fully analyze the capabilities and

incentives that bring inventions to fruition, e.g., through commercialization of products or

creation of new ventures.

Patents are key milestones in the innovation process. Patents offer a substantial window on

invention (Griliches, 1990), and patent counts and citations are used widely to infer knowledge

transfer (Hall et al., 2005; Jaffe and Trajtenberg, 2002; Trajtenberg 1990; Ziedonis, 2004).

However, this inference is clouded by strategic citations and from addition of examiner citations

of which the inventor was often unaware (Alcácer and Gittelman, 2006; Alcácer et al., 2009;

Page 3: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

3

Lampe, 2012; Lemley and Sampat, 2012). Overall, while patents are valuable measures, they

also suffer from noted limitations. Identifying knowledge heritage more directly through the

computationally sophisticated textual analysis presented here can ameliorate this.

To illustrate the points above, two distinct settings are highlighted. First, the paper develops

an application rooted in a key strategic question from the knowledge search and CVC literature:

How can we measure the value companies derive from knowledge search, and, more specifically,

what innovation-related outcomes can companies attribute to CVC investments? The empirical

context in this paper—drawn from a novel dataset of CVC investment in the medical device

industry— emphasizes the usefulness of this methodology in tracing knowledge heritage

between patents and innovative products. Several factors make this a clean setting in which to

focus on the novel methodology. Companies in this industry actively seek breakthrough

innovation, patent heavily, and leave a paper trail of patent and pre-market approval documents

that can be analyzed (Ernst and Young, 2011). Moreover, the context of CVC investment allows

isolation and comparison of a distinct source of invention originating outside the company that

leaves a knowledge signature on the corporate investor’s product innovations.

While the focal context is on CVC investors’ innovation outcomes, the methodology has

broader potential applications. For example, this methodology is well suited to probe inside the

black box of the genesis of new ventures and the commercialization of innovative products

(Aggarwal and Hsu, 2009). To illustrate this, the paper further spotlights a key research question

relating to the development of innovation capabilities in new ventures: To what extent do

capabilities surrounding invention contribute to the launch of startups and successful

commercialization? The potential of this methodological approach to link inventive activity,

commercialization of products, and new venture launch through IPO is examined using an

example of a startup initial public offering (IPO) in the communications equipment industry.

Page 4: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

4

This example also points to generalizability of the methodology in a range of settings.

The paper makes three distinct contributions. Foremost, the paper pushes forward strategy

scholarship by signifying the computational power associated with using language as data:

specifically, this paper applies versatile IR methodology from computer science to allow strategy

scholars to study a core array of questions about innovation and technological capabilities.

Second, ideas and knowledge are the feedstock of innovation. By applying a powerful

computational approach to track the knowledge heritage of new products, this methodology

permits insightful answers to key research questions: Does success in invention follow through to

innovation? To what extent do companies leverage technological capabilities for market

success? Third, the paper extends the literature on CVC investment by identifying a new method

for tracking the essential outcome of interest—innovation—that has previously been elusive

(Benson and Ziedonis, 2009; Dushnitsky and Lenox, 2005; Dushnitsky and Shaver, 2009; Katila

et al., 2008; Winston Smith and Shah, 2013). This methodology provides new insight into CVC

as a mechanism for external knowledge search. Finally, this paper contributes to our

understanding of the extent to which IPOs draw upon the inventive activity of new ventures.

TRACING KNOWLEDGE FLOWS BETWEEN PATENTS AND INNOVATION

OUTCOMES

The link between knowledge, i.e., fundamental scientific and engineering concepts, and

innovation-related outcomes, such as introduction of novel products or launch of a new venture,

is subject to substantial social science research. Appropriating benefits from inventive activity is

fraught with problems, including technical uncertainty, market uncertainty, and difficulty

preventing encroachment on ideas (Arrow, 1962; Levin et al., 1985). Gains from invention can

happen through licensing if markets for technological ideas are well-developed (Arora et al.,

2001). Startups may engage in cooperative R&D with larger firms to bring inventions to market

Page 5: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

5

(Aggarwal and Hsu, 2009). However, even established firms face a difficult path to

commercialize invention (Gittelman and Kogut, 2003). Thus, a fundamental question that vexes

strategy scholars is: does the inventive activity captured in patents (and patent citations) translate

into product innovation? In other words, how reliable is the link between invention, often studied

through patents, and innovation, i.e. introduction of novel products to market?

Patents are an enduring measure of inventive activity (Griliches, 1990). The logic of tracing

knowledge flows through patent citations is straightforward. Backward citations to prior patents

represent a legally enforceable boundary between existing prior art and the new invention, which

must be both novel and non-obvious (Trajtenberg 1990). Thus, patent citations are widely used

to measure knowledge transfer. However, the extent of knowledge flow is clouded by addition of

a substantial proportion of citations by patent examiners after the patent is filed, rather than the

inventor (Alcácer et al., 2009). As well, patents and citations do not indicate whether the

invention results in an innovative product being introduced to market. This distinction is far from

trivial, as the stark example of Kodak illustrates. Despite possessing a valuable patent stock

resulting from decades of inventive activity, Kodak was forced to declare bankruptcy, largely

due to the failure to introduce successful product market innovations. For example, Kodak began

receiving patents related to digital photographic technology in the 1970s; however, the company

did not introduce its first digital camera to market much later and then only as a reluctant market

entrant (The Economist, 2012). Finally, reliance on patent protection varies significantly by

industry (Levin et al., 1987).

Recognizing that patent citations and citation patterns alone do not allow a complete picture

of the transfer of knowledge in the innovation process, scholars increasingly seek new

approaches to probe the relationship between knowledge and innovation. Indeed, a surprising

divergence exists between scientific knowledge and inventive success (Gittelman and Kogut,

Page 6: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

6

2003). Roach and Cohen (2013) link survey data with patent citations to pinpoint the

contribution of knowledge flows from public institutions to industrial innovation. They find that

patent citations underrepresent certain types of knowledge flows, such as contracted academic

research. They also find that citation patterns are influenced by strategic concerns.

Alternatively, borrowing from computer science, Kaplan and Kavili (2014) deploy close textual

analysis using topic modeling based on latent Dirichlet analysis to understand the evolution of

new cognitive breakthroughs based on patent abstracts in the context of nanotechnology.

Using Natural Language Processing to Link Invention, Knowledge, and Innovation

Outcomes

Text matching algorithms can be used to quantify overlap between distinct documents

reflecting inventive activity, e.g. text in patent documents, and product innovations, e.g., text in

pre-market approval (PMA) documents.

The basic intuition and application of natural language processing techniques (NLP) and

information retrieval (IR) are covered in detail in the computer science literature (Kwon and Lee,

2003; Manning et al., 2008; Salton and McGill, 1986; Salton et al., 1975). Briefly, IR as a field

can be thought of as ‘… finding material (usually documents) of an unstructured nature (usually

text) that satisfies an information need from within large collections (usually stored on

computers)(Manning et al., 2008, p.1). IR covers wide ground: open-ended web search, domain-

specific search (such as a patent database), and personal information. Technically, IR and NLP

are distinct fields, where the former focuses on probabilistic approaches and the latter on the use

of “natural” (as opposed to “controlled”) queries (Lewis and Jones, 1996). However, although

these fields were largely separate for decades, they are becoming increasingly interchangeable as

NLP adopts probabilistic approaches and IR broadens in scope (Lease, 2007; Lewis and Jones,

1996; Manning and Schuetze, 1999; Singhal, 2001).

Page 7: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

7

This paper focuses on domain-specific search (Manning et al., 2008, p. 2). Search space can

be conceptualized as a query seeking answers relevant to a question at hand. In this case, the

query—what is relevant to innovation?—results in looking at outcomes that reside in relatively

large collections of documents—e.g., patent or PMA databases— and searching for the kernel of

knowledge that is valuable to the relevant outcomes. Mathematically, IR weights the relative

similarity of knowledge from two sources of text to facilitate efficient identification, based on

the criteria of precision, i.e., the proportion of correct documents retrieved, and recall, i.e., the

proportion of relevant documents returned (Lease, 2007; Lewis and Jones, 1996; Singhal, 2001).

The vector-space model underlies this process and enables direct quantification of document

similarity between a given pair of retrieved texts. This paper algorithmically executes a series of

steps to simplify text documents, assign relative weights to words, and map them into n-

dimensional vector space. A meaningful overlap between these vectors is then computed to

provide a metric for document comparison (Salton and McGill, 1986; Salton et al., 1975; Singh

and Dwivedi, 2012). As such, this method is considered to be a ranked retrieval model (Manning

et al., 2008). These algorithms comprise the backbone of search engines (Google, 2013a, b;

Kwon and Lee, 2003; Singhal, 2001)..1

Several papers in strategy research look to variants of NLP methodologies to quantify the

relative importance of words within a given text. Menon et al. (2018) use NLP to analyze

strategy change and differentiation from rivals. Lee and James (2007) use centering resonance

analysis (CRA), which utilizes network positioning of words in text to determine influence, in

this case to relate gender to media slant in the appointment of new chief officers. Ronda-Pupo

and Guerras-Martin (2012) use co-word analysis, in which they measure the likelihood of two-

1 To be precise, latent semantic analysis (also referred to as latent semantic indexing) is also based on word stemming and the

vector space model, in common with the IR methodology applied in this paper. However, latent semantic analysis involves development of a matrix representation of each document, which then needs to be decomposed (using singular value decomposition) in order to compare documents (Hofmann, 1999).

Page 8: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

8

words appearing together in a given document against a probabilistic backdrop, to study the

evolution of research concepts in the field of strategic management. In a similar fashion,

Kabanoff and Brown (2008), apply machine learning techniques based on text classifying

algorithms to examine the evolution of strategic discourse by CEOs. In the context of innovation,

and closely related to the methodology in this paper, Kaplan and Vakili (2014) use topic

modeling to trace the evolution of breakthrough ideas in the emergent field of nanotechnology

based on a form of natural language processing known as latent Dirichlet allocation (LDA)

analysis of patents. This approach is based on probabilistic modeling of co-occurrence of words

to infer meaning about ‘latent’ topics (Blei et al., 2003; Chang et al., 2009).

RESEARCH CONTEXT I: CORPORATE VENTURE CAPITAL AND

COMMERCIALIZATION OF NOVEL PRODUCTS

Innovation propels sustainable competitive advantage over rivals. Breakthrough innovations,

in particular, confer advantages in entering new product areas or creating altogether new product

markets are thus a holy grail. The strategic management literature increasingly identifies a broad

set of search mechanisms that firms use to achieve breakthrough innovations (Maula et al., 2012;

Narayanan et al., 2009). Key among these is the practice of corporate venture capital (CVC)

investment, where large corporate investors make equity investments in startups to gain access to

novel knowledge (Dushnitsky and Lenox, 2005). Research spanning strategy and finance

suggests that CVC results in innovation related returns to investors (Benson and Ziedonis, 2009;

Chemmanur et al., 2014; Winston Smith and Shah, 2013)

While startups afford new ideas for CVC investors, this knowledge is hard to transfer back

to the corporate innovation context. In part, this is a natural outcome of a variety of

organizational features, such as the distinctive innovation climates in startups relative to large

corporations or potential misalignment of incentives (Dushnitsky and Shaver, 2009; Lerner,

Page 9: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

9

2013). As well, the research may be more exploratory (Wadhwa and Kotha, 2006). CVC

investors state strategic goals such as bringing new products to market as the reason for their

investments; however, CVC is a long-term strategy, and resulting products rarely display an

identifiable knowledge path back to the original investment (Lerner, 2013). Moreover, new

products often draw upon a variety of complementary internal and external investments, making

it difficult to attribute the value contributed by any one source. Given the marked potential

attenuation of knowledge, managers and scholars require particularly precise measures to assess

the path from investment to corporate innovation.

This computational approach can be implemented to link underlying scientific and

technological knowledge contained in patent documents—i.e., the knowledge heritage— to

introduction of novel products. Notably, the methodology can be applied in any setting where

texts can be compared to one another, making it broadly applicable in strategic management

scholarship. Here, the methodology is used to address a vexing question in CVC research: How

can we measure the value companies derive from knowledge search, and, more specifically, what

innovation-related outcomes can companies attribute to CVC investments?

The primary empirical context here is the medical device industry, an industry driven by

innovation and hallmarked by extensive CVC activity (Standard and Poor's Corporation, 2007).

The medical device industry is an excellent setting to demonstrate the power of these IR

techniques. The medical device industry provides text to analyze that reflects both invention and

innovation outcomes. Firms in this industry nearly universally rely on patents, which codify the

underlying scientific and technological knowledge, to define intellectual property rights (Levin et

al., 1985). Firms must file for PMA by the U.S. Food and Drug Administration (FDA) to

introduce a new medical device in the marketplace, thus documenting the innovation. Moreover,

clinical trials are costly—over $100 million by some estimates—and PMA approval equates with

Page 10: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

10

introduction to market (Ernst and Young, 2009).

The medical device industry also is characterized by fast innovation cycles. Achieving and

sustaining an edge in innovation is crucial to the introduction of new devices to market. (Ernst

and Young, 2011). To this end, device companies investment routinely in startups for access to

external sources of innovative ideas (Ernst and Young, 2011; PricewaterhouseCoopers, 2006).

From a methodological perspective, CVC investment in startup companies provides a transparent

case to track knowledge heritage: the patents are filed by and belong to separate companies but

the knowledge can be linked to the investors’ PMAs. Finally, from a strategy perspective, the

medical device industry continues to be fertile setting for strategy scholars to examine important

aspects of innovation (Chatterji et al., 2008; Karim and Mitchell, 2000; Winston Smith and Shah,

2013; Wu, 2013). For policy makers, understanding the link between knowledge and innovation

helps save lives and decrease costs (Winston Smith and Sfekas, 2013).

Selection of text documents to analyze

The text matching approach above is applied to a unique dataset consisting of all CVC

investments in the medical device industry made by the 4 largest medical device-CVC investors

in this industry (Medtronic, Boston Scientific, Johnson and Johnson, and Guidant) over the

period 1978-2007. One hundred and twenty-four unique startup companies received at least one

round of CVC investment by at least one of these four incumbents over the sample period. To

capture inventive activity, the full text of all focal patents (i.e., patents filed by the four

incumbents that cited the startup companies) and all of the startup patents filed between 1978-

2010 were downloaded using the US Patent and Trademark Office database. Patents must cite

related prior patents, enabling us to link each incumbent’s patents to the startup companies’

patents. To trace innovation, all PMAs filed by these CVC investors were extracted from the full

U.S. FDA Pre-market Approvals database. Together, these four companies filed 170 PMAs

Page 11: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

11

between 1978 and 2007. The PMAs were then matched to the 1,954 patents filed by the startups.

(An illustrative case is extracted from this sample to demonstrate the steps below.)

To illustrate the approach, patent texts from Image-Guided Neurologics, a startup in which

Medtronic invested, and Medtronic’s own patents are compared to texts describing innovations

that were commercialized and introduced to market by Medtronic: 1) Activa for Deep Brain

Stimulation (DBS); 2) Interstim Sacral Nerve Stimulator; 3) Synchromed Infusion Pump; and, 4)

Intrel Spinal Cord Stimulator.

The methodological goal is to create an analytical comparison between two sets of text

documents, those representing inventions (in this case, patents) and product innovations (in this

case, PMA applications), to allow quantification of how closely related these texts are to one

another. In this manner, applying IR theory and techniques creates an original measure that links

the knowledge heritage of product market innovations by the CVC investor back to the startups

in which it invested. Practically, these steps are accomplished using the open-source

programming language Python and associated modules. Python is fast, well-documented, and

continually augmented through user-generated modules for a variety of tasks undergirding IR

and ‘big data’ analysis (Prechelt, 2000).

Algorithmic calculation of knowledge similarity between pairs of text documents

The vector-space model identifies common elements between document pairs. Once the

texts are processed, the intersection of the two dictionaries (in this case, the ‘patent dictionary’

and the ‘PMA dictionary’) is generated. The number of words, N, in each dictionary defines an

N-dimensional vector. The intersection vector is composed of elements whose magnitude is

given by the tf-idf score. Practically, this is accomplished using Python code for precise vector

computations to calculate the term frequency (tf) and inverse document frequency (idf) measures

described above. At this point, each document (i.e., the patent document and the PMA

Page 12: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

12

document) is now represented as a vector in which individual words are the vector components

and the tf-idf weights give the magnitude of each component. The final computational task

involves quantifying the extent to which the two documents are related by computing the cosine

distance between the two vectors in common space. This information is used to calculate overlap

between each patent document relative to each PMA document.

Illustrative example: Medtronic and Imaged-Guided Neurologics

The tension between the quest for relevant external knowledge and the challenge of

incorporating that novel knowledge in a way that benefits the investor lies at the heart of CVC

investing (Dushnitsky and Shaver, 2009; Wadhwa and Kotha, 2006). In an overview of CVC

practices, Lerner (2013) observes: ‘Knowledge doesn't automatically flow from start-ups to the

large organizations that have invested in them—at least not in a timely manner.” Nonetheless,

he concludes: ‘For companies that have found traditional in-house research unequal to the task

of generating valuable insights into next-generation technologies or the movements of the

market, the creation of a venture fund might well prove to be what executives are always looking

for— the breakthrough idea that changes everything.”

The methodology in this paper opens a wider window on tracing the often-oblique path from

breakthrough ideas to novel innovations. The surrounding context is the explosive potential of

the neuromodulation market for Medtronic and rival device makers (Arndt, 2005b; Medtronic

Inc., 2014a). To elucidate the power of the textual analysis methodology, four devices for which

Medtronic filed PMA applications (Activa for Deep Brain Stimulation (DBS), Intrel Spinal Cord

Stimulator, Interstim Sacral Nerve Stimulator, and Synchromed Infusion Pump) are analyzed in

comparison to two sets of patent documents: 1) cited patents of Image-Guided Neurologics

(IGN), a startup company, and 2) the focal Medtronic patents that cite the IGN patents. These

devices share common scientific and technologic principles with each other (and with cardiac

Page 13: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

13

pacemakers), i.e., implantation of tiny electrical stimulators in the body. At the same time, each

organ system in the body requires specific expertise and capabilities.

In 2003, Medtronic invested in IGN, a startup with cutting-edge development of a

‘frameless’ stereotactic head frame. The investment represented a continuing foray into the

expanding area of neuromodulation, which relied on implantation of precisely targeted

stimulators in the brain to treat multiple issues, including Parkinson’s disease and epileptic

seizures (Arndt, 2005b). Neuromodulation represented a growing area of exploration for

Medtronic. On one hand, the underlying principles were complementary to Medtronic’s core

expertise in cardiac rhythm management—for example, in online Q&A Medtronic notes that

DBS is ‘sometimes called a brain pacemaker” (Medtronic Inc., 2014b). On the other hand, in

response to the question: ‘The pacemaker has been around for more than 40 years. Why has it

taken so long to find other applications for this device?” then CEO Dr. Steven Oesterle

commented on the difficulty of succeeding in this market: ‘ … the barriers to entry to the IPG

[implanted pulse generators] market are huge. These things are extremely complicated and

difficult to make, so you don't see people say, "Gee, I could go do that," raise $10 million, and

get working. The info-tech hurdles are as big as they get.’ (Arndt, 2005a). In the same interview,

however, he presciently observed: ‘This is a huge market. It is underdeveloped and

underpenetrated, and it's where growth will be in the medical-device industry over the next

decade.” (Arndt, 2005a). In other words, this represented an area of technology that was both

difficult to master but crucial to future growth, i.e., the ultimate logic underlying CVC investing.

Table 1 presents cosine similarity matches between the IGN patents cited by focal

Medtronic patents and each of the Medtronic PMAs. These scores show different degrees of

connection to each of the PMAs. Typically, a medical device builds on multiple patents. The

values suggest the highest relative similarity is between IGN’s patent #720480 (“Deep organ

Page 14: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

14

access device and method”) and Medtronic’s Activa for Deep Brain Stimulation PMA.

Interestingly, the degree of similarity (cosine similarity score 0.109) is close to that of

Medtronic’s focal patent #736651 (“Robotic trajectory guide”) that cited this IGN patent to the

same PMA (cosine similarity score 0.115). Moreover, the average cosine similarity of all cited

IGN patents (0.08118) exceeds the average score for the focal Medtronic patents (0.06617),

suggesting relatively stronger knowledge linkage between IGN’s patents. Table 2 provides a

fuller flavor of the analysis by showing underlying representative word stubs and associated tf-

idf weights for PMA 960009 associated with Medtronic’s Activa Neurostimulator and IGN’s

cited patents.

-------------------------------------------------

Insert Table 1 & Table 2 about here

-----------------------------------------------------

Common knowledge heritage amongst the devices is also reflected in the scoring. The next

highest set of scores for IGN patents is with Medtronic’s PMA 970004, the Interstim device for

treatment of incontinence (0.07119); again the IGN patents score higher on average than those of

Medtronic (0.05932). The average score for PMA 860004, the Synchromed Infusion Pump

(0.05775) is slightly lower than that for Medtronic (0.07949) but is similar to IGN’s match with

Activa and Interstim. Interestingly, these three devices share more common physiological

properties than does the Intrel spinal stimulator (Arndt, 2005a; Medtronic Inc., 2014a). Patents

from both IGN and Medtronic are relatively less connected with the spinal stimulator than the

other three devices. Moreover, IGN’s score (0.03891) is relatively lower than Medtronic

(0.04002).

The algorithmic textual analysis used in this example underscores more general points. First,

Page 15: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

15

the relative magnitude of the cosine similarity scores provides basis for comparison between

different sets of documents, e.g. IGN patents matched to a Medtronic PMA, relative to

Medtronic patents matched to the same PMA. In this manner, the scores permit inference about

the relative strength of the linkage. Alternatively, the scores could also be used to set a threshold

level. Second, as given, these examples provide a substantive view inside the scoring and the

deep knowledge it can provide. However, it is worth noting that knowledge-match scores are

computed for all potential CVC investor-startup dyads, resulting in a full matrix that quantifies

how closely every CVC investor's PMA is related in context-dependent textual analysis space to

the patents of each startup. This matrix can then be used as an input for econometric analysis.

Ultimately, Medtronic acquired IGN in 2005 (Medtronic Inc., 2005). Neuromodulation

accounted for $1.7 Billion in sales and about 11% of total revenue in 2012 (Medtronic Inc.,

2013).

RESEARCH CONTEXT II: COMMERCIALIZATION AND NEW VENTURE

CREATION

To illustrate the versatility of this methodology beyond the medical device setting (where

the existence of PMA documentation provides a textual record of innovation), a second example

is provided below. The knowledge heritage of Vocera, a startup in the communications

equipment industry is linked from its patents to textual measures of innovation leading to

commercialization (product reviews) and new venture creation (IPO prospectus) to demonstrate

the broader usefulness of this approach.

Text Matching Vocera patents, IPO prospectus, and product reviews

A growing question in the strategy and CVC literature focuses on the impact of CVC

investment prior to IPO on the startup (Chemmanur et al., 2014; Katila et al., 2008; Park and

Steensma, 2012). In order to elucidate generality of this methodology beyond the medical device

Page 16: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

16

setting, the second emprical context applies this methodology to analyzing the knowledge

heritage in product innovations and the IPO prospectus of Vocera (VCRA), a communications

and software company. Prior to its IPO in 2012, Vocera received CVC investment from

Motorola, Cisco, and Intel (Albenesius, 2005; Lipset, 2003). This context illustrates how this

methodology can be used to more deeply probe the relationship between CVC investment and

startup innovation, including the impact on the commercialization by the startup and launching

of a new venture.

-------------------------------------------------

Insert Table 3 about here

-----------------------------------------------------

Table 3 presents representative results from this analysis applied to Vocera’s patents and to

textual measures of innovation leading to commercialization (product reviews) and new venture

creation (IPO prospectus). As well, this table points to the variety of documents that can be

mined for insights into innovation outcomes. As in the prior example, these results suggest that

Vocera leaned on its technical knowledge heritage in commercialization of new products.

Columns 1-3 analyze the degree of similarity between Vocera’s 16 patents and innovation

outcomes, represented through different product reviews of newly commercialized products. The

results suggest that Vocera introduced product innovations that heavily encompass the

underlying technical knowledge, as evidenced by substantially large cosine similarity scores

between the patent portfolio and each of the three technical reviews. Column 4 also tracks the

knowledge relationship to the products, but uses a professional medical journal article for the

textual record of the product.

Column 5 applies this semantic matching methodology to trace the knowledge heritage

Page 17: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

17

embedded in Vocera’s S-1 form filed with the SEC. Specifically, the IPO prospectus is compared

to Vocera patent portfolio. In this case, the cosine similarity results suggest that Vocera

leveraged its technical knowledge in reaching IPO. The literature suggests that patents help new

ventures launch successfully. These patents may serve as a signaling mechanism for new firms

(Hsu and Ziedonis, 2013). An additional explanation may be that the patents represent the

knowledge that forms the backbone of the new venture. This example thus provides insight into

questions of scholarly significance that are distinct from those presented in the main analysis.

Future research can include broader and comparative analytics to further address these questions.

DISCUSSION AND CONCLUSION

This paper presents a significant advance in integrating computer science IR methods with

questions at the frontiers of strategy research. Strategic management scholars have focused

substantial attention on the relationship between knowledge and innovation for firms’

competitive survival. An important strand within the literature examines external knowledge

sourcing as a key component of the innovation process. Yet scholars have also been limited in

trying to link the knowledge that results from inventive activity to introduction of novel products

that result in advantage over rivals. This methodology provides scholars with new tools to

reliably and reproducibly disentangle knowledge that is inherently embedded in text, and to

deploy this insight to more deeply probe a full range of the innovation process and outcomes.

More broadly, this methodology enables strategy scholars to effectively incorporate

language-based metrics as data. This question-focused paper details an innovative methodology

to objectively link the underlying knowledge components in patent documents with product

innovation. This paper brings an intersection of the insights and techniques from IR—that

language itself can be valuable data through mathematical manipulation— with the insight from

strategic management that innovation is a complex process involving far from transparent flows

Page 18: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

18

of knowledge from a variety of sources.

External validity of the text matching methodology in related disciplines

While application of this methodology is novel in the strategic management literature, the

techniques themselves have been validated in computer science (Manning et al., 2008).

Researchers in domains relevant to strategic management recently have started to draw

inspiration from this approach as well. In the finance literature, Hoberg and Phillips (2010)

develop a similar vector space model to identify product market synergies in mergers and

acquisitions, and Hanley and Hoberg (2010) analyze the relationship between IPO prospectuses

and pricing. A similar approach was used to quantify firm fundamentals and investor sentiment

(Tetlock, 2007; Tetlock et al., 2008). Recently, scholars in the management of information

systems literature have turned to vector space techniques to link the similarity between consumer

preferences and online consumer rankings (Archak et al., 2011) and to study the boundaries of

online communities (Van Alstyne and Brynjolfsson, 2005).

Strengths and limitations

This methodology has both strengths and limitations. The methodology works well to study

innovation in a setting such as the medical device industry, where documents are available to

trace the knowledge heritage from patent to product innovation. More broadly, this methodology

is well suited to industries that utilize patenting and where there are readily available descriptions

of product innovations. Thus, this might easily be deployed in the context of the oft-studied

pharmaceutical and biotechnology industry (Cardinal, 2001; Dunlap-Hinkler et al., 2010;

Higgins and Rodriguez, 2006; Rothaermel, 2001).

On the other hand, in industries that do not rely on patents, or in in complex technology

industries, measures of invention take greater care to identify. However, while this same

criticism might be applied to existing approaches that rely on patent counts and citations, it is

Page 19: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

19

helpful to also note that the flexibility of the methodology here is that it can be applied to any

text document (not just patents). The second example provides insights in this direction. As the

expansion of digitized text documents continues to explode, it is easier to trace the digital trail of

knowledge flows beyond patents, e.g., in technology blogs, online forums, LinkedIn data,

interview transcripts, IPO documents, and so on. Likewise, the methodology can be extended to

scientific, technical, and professional publications to capture measures of invention outside of

patent documents.

Page 20: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

20

REFERENCES

Aggarwal VA, Hsu DH. 2009. Modes of cooperative R&D commercialization by start-ups. Strategic Management Journal 30(8):835-864.

Albenesius C. 2005. Motorola Invests In Vocera. PC Magazine April 25, 2007. http://www.pcmag.com/article2/0,2817,2122072,00.asp [May 31, 2014].

Alcácer J, Gittelman M. 2006. Patent Citations as a Measure of Knowledge Flows: The Influence of Examiner Citations. Review of Economics and Statistics 88(4):774-779.

Alcácer J, Gittelman M, Sampat B. 2009. Applicant and examiner citations in U.S. patents: An overview and analysis. Research Policy 38(2):415-427.

Archak N, Ghose A, Ipeirotis PG. 2011. Information Systems in Management Science

Deriving the Pricing Power of Product Features by Mining Consumer Reviews. Management

Science 14(2):B-114-B-120.

Arndt M. 2005a. Online extra: Dr. Oesterle's stimulating work. In Business Week Online.

Arndt M. 2005b. Rewiring the body. In Business Week; 74-82.

Arora A, Fosfuri A, Gambardella A. 2001. Markets for technology: the economics of innovation

and corporate strategy. MIT Press: Cambridge, MA.

Arrow KJ. 1962. Economic Welfare and the Allocation of Resources to Invention. In The Rate

and Direction of Inventive Activity: Economic and Social Factors, Nelson RR (ed). Princeton University Press.: Princeton; 609-625.

Benson D, Ziedonis RH. 2009. Corporate Venture Capital as a Window on New Technologies: Implications for the Performance of Corporate Investors When Acquiring Startups. Organization Science 20(2):329-351.

Blei DM, Ng AY, Jordan MI, Lafferty J. 2003. Latent Dirichlet Allocation. Journal of Machine

Learning Research 3(4/5):993-1022.

Cardinal LB. 2001. Technological Innovation in the Pharmaceutical Industry: The Use of Organizational Control in Managing Research and Development. Organization Science 12(1):19-36.

Chang J, Boyd-Graber J, Wang C, Gerrish S, Blei DM. 2009. Reading Tea Leaves: How Humans Interpret Topic Models. Neural Information Processing Systems. http://www.cs.princeton.edu/~blei/papers/ChangBoyd-GraberWangGerrishBlei2009a.pdf [2009].

Chatterji AK, Fabrizio K, Mitchell W, Schulman K. 2008. Physician-Industry Cooperation In The Medical Device Industry. Health Affairs 27(6):1532-1543.

Page 21: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

21

Chemmanur TJ, Loutskina E, Tian X. 2014. Corporate Venture Capital, Value Creation, and Innovation. Review of Financial Studies.

Dunlap-Hinkler D, Kotabe M, Mudambi R. 2010. A story of breakthrough versus incremental innovation: corporate entrepreneurship in the global pharmaceutical industry. Strategic

Entrepreneurship Journal 4(2):106-127.

Dushnitsky G, Lenox MJ. 2005. When do firms undertake R&D by investing in new ventures? Strategic Management Journal 26:947-2005.

Dushnitsky G, Shaver JM. 2009. Limitations to interorganizational knowledge acquisition: the paradox of corporate venture capital. Strategic Management Journal 30(10):1045-1064.

Ernst and Young. 2009. Pulse of the industry: Medical technology report 2009.

Ernst and Young. 2011. Pulse of the industry: Medical technology report 2011.

Gittelman M, Kogut B. 2003. Does Good Science Lead to Valuable Knowledge? Biotechnology Firms and the Evolutionary Logic of Citation Patterns. Management Science 49(4):366-382.

Google I. 2013a. Information Retrieval at Google.

Google I. 2013b. Natural language processing at Google.

Griliches Z. 1990. Patent Statistics as Economic Indicators: A Survey. Journal of economic

literature 28(4):1661-1707.

Hall BH, Jaffe A, Trajtenberg M. 2005. Market Value and Patent Citations. The RAND Journal

of Economics 36(1):16-38.

Hanley KW, Hoberg G. 2010. The Information Content of IPO Prospectuses. Review of

Financial Studies 23(7):2821-2864.

Higgins MJ, Rodriguez D. 2006. The outsourcing of R&D through acquisitions in the pharmaceutical industry. Journal of Financial Economics 80(2):351-383.

Hoberg G, Phillips G. 2010. Product Market Synergies and Competition in Mergers and Acquisitions: A Text-Based Analysis. Review of Financial Studies 23(10):3773-3811.

Hofmann T. Probabilistic Latent Semantic Analysis

. In Proceedings of the Proceedings of the Fifteenth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-99). Available at.

Hsu DH, Ziedonis RH. 2013. Resources as dual sources of advantage: Implications for valuing entrepreneurial-firm patents. Strategic Management Journal:n/a-n/a.

Jaffe AB, Trajtenberg M. 2002. Patents, Citations & Innovations: A Window on the Knowledge

Economy. MIT Press: Cambridge, Massachusetts.

Page 22: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

22

Kabanoff B, Brown S. 2008. Knowledge structures of prospectors, analyzers, and defenders: content, structure, stability, and performance. Strategic Management Journal 29(2):149-171.

Kaplan S, Vakili K. 2014. The Double-Edged Sword Of Recombination In Breakthrough Innovation. Strategic Management Journal:n/a-n/a.

Karim S, Mitchell W. 2000. Path-dependent and path-breaking change: reconfiguring business resources following acquisitions in the U.S. medical sector, 1978–1995. Strategic

Management Journal 21(10-11):1061-1081.

Katila R, Rosenberger JD, Eisenhardt KM. 2008. Swimming with Sharks: Technology Ventures, Defense Mechanisms and Corporate Relationships. Administrative Science Quarterly 53(2):295-332.

Kwon O-W, Lee J-H. 2003. Text categorization based on k-nearest neighbor approach for Web site classification. Information Processing & Management 39(1):25-44.

Lampe R. 2012. Strategic Citation. Review of Economics and Statistics 94(1):320-333.

Lease M. 2007. Natural language processing for information retrieval: the time is ripe (again). Paper presented at the Proceedings of the ACM first Ph.D. workshop in CIKM, Lisbon, Portugal.

Lee PM, James EH. 2007. She-e-os: gender effects and investor reactions to the announcements of top executive appointments. Strategic Management Journal 28(3):227-241.

Lemley MA, Sampat B. 2012. Examiner Characteristics and Patent Office Outcomes. Review of

Economics and Statistics 94(3):817-827.

Lerner J. 2013. Corporate Venturing. In Harvard Business Review. Harvard Business Review: Boston; 86-94.

Levin RC, Cohen WM, Mowery DC. 1985. R&D Appropriability, Opportunity, and Market Structure: New Evidence on Some Schumpeterian Hypotheses. American economic

review 75(2):20-24.

Levin RC, Klevorick A, Nelson RR, Winter SG. 1987. Appropriating the Returns from Industrial Research and Development. Brookings Papers on Economic Activity 1987(3):783-820.

Lewis DD, Jones KS. 1996. Natural language processing for information retrieval. Commun.

ACM 39(1):92-101.

Lipset V. 2003. Vocera secures funding from Intel, Cisco. Wi-Fi Planet September 5, 2003. http://www.wi-fiplanet.com/voip/article.php/3073271/Vocera-Secures-Funding-from-Intel-Cisco.htm [May 30, 2014].

Manning C, Raghavan P, Schutze H. 2008. Introduction to Information Retrieval. Cambridge University Press: Cambridge.

Page 23: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

23

Manning C, Schuetze H. 1999. Foundations of Statistical Natural Language Processing. MIT Press: Cambridge.

Maula MVJ, Keil T, Zahra SA. 2012. Top Management's Attention to Discontinuous Technological Change: Corporate Venture Capital as an Alert Mechanism. Organization

Science.

Medtronic Inc. 2005. Medtronic Acquires Image-Guided Neurologics; Broadens Portfolio Of Products For Improved Deep Brain Stimulation Surgical Procedures.

Medtronic Inc. 2013. Annual Report.

Medtronic Inc. 2014a. Business i\overview: Neuromodulation.

Medtronic Inc. 2014b. Questions and answers about DBS for OCD.

Menon A, Choi J, Tabakovic H. 2018. What You Say Your Strategy Is and Why It Matters: Natural Language Processing of Unstructured Text. Academy of Management

Proceedings 2018(1):18319.

Narayanan VK, Yang Y, Zahra SA. 2009. Corporate venturing and value creation: A review and proposed framework. Research Policy 38(1):58.

Park HD, Steensma HK. 2012. When does corporate venture capital add value for new ventures? Strategic Management Journal 33(1):1-22.

Prechelt L. 2000. An empirical comparison of seven programming languages. Computer 33(10):23-29.

PricewaterhouseCoopers. 2006. Corporate Venture Capital Activity On the Rise in 2006. Washington, DC.

Roach M, Cohen WM. 2013. Lens or Prism? Patent Citations as a Measure of Knowledge Flows from Public Research. Management Science 59(2):504-525.

Ronda-Pupo GA, Guerras-Martin LÁ. 2012. Dynamics of the evolution of the strategy concept 1962–2008: a co-word analysis. Strategic Management Journal 33(2):162-188.

Rothaermel FT. 2001. Complementary assets, strategic alliances, and the incumbent's advantage: an empirical study of industry and firm effects in the biopharmaceutical industry. Research Policy 30(8):1235-1251.

Salton G, McGill MJ. 1986. Introduction to Modern Information Retrieval. McGraw-Hill: New York.

Salton G, Wong A, Yang CS. 1975. A vector space model for automatic indexing. Commun.

ACM 18(11):613-620.

Singh JN, Dwivedi SK. Analysis of Vector Space Model in Information Retrieval. In Proceedings of the National Conference on Communication Technologies & its impact

Page 24: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

24

on Next Generation Computing CTNGC 2012. Available at.

Singhal A. 2001. Modern information retrieval: A brief overview. Bulletin IEEE Computer

Society Technical Committee on Data Engineering 24(4):35–43.

Standard and Poor's Corporation. 2007. Healthcare: Products and Supplies. In Standard and

Poor's Industry Surveys.

Tetlock PC. 2007. Giving Content to Investor Sentiment: The Role of Media in the Stock Market. The Journal of Finance 62(3):1139-1168.

Tetlock PC, Saar-Tsechansky M, Macskassy S. 2008. More than Words: Quantifying Language to Measure Firms' Fundamentals. The Journal of Finance 63(3):1437-1467.

The Economist. 2012. The last Kodak moment? The Economist January 14, 2012. http://www.economist.com/node/21542796 [January 14, 2012].

Trajtenberg M. 1990. A penny for your quotes : Patent citations and the value of innovations. RAND Journal of Economics 21(1):172-187.

Van Alstyne M, Brynjolfsson E. 2005. Global Village or Cyber-Balkans? Modeling and Measuring the Integration of Electronic Communities. Management Science 51(6):851-868.

Wadhwa A, Kotha S. 2006. Knowledge creation through external venturing: Evidence from the telecommunications equipment manufacturing industry. Academy of Management

Journal 49(4):1-17.

Winston Smith S, Sfekas A. 2013. How Much do Physician-Entrepreneurs Contribute to New Medical Devices? Medical Care 51(5):461-467.

Winston Smith S, Shah SK. 2013. Do Innovative Users Generate More Useful Insights? An Analysis of Corporate Venture Capital Investments in the Medical Device industry. Strategic Entrepreneurship Journal 7(2):151-167.

Wu B. 2013. Opportunity costs, industry dynamics, and corporate diversification: Evidence from the cardiovascular medical device industry, 1976–2004. Strategic Management

Journal:n/a-n/a.

Ziedonis RH. 2004. Don 't Fence Me In: Fragmented Markets for Technology and the Patent Acquisition Strategies of Firms. Management Science 50(6):804-820.

Page 25: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

25

Table 1. Cosine Similarity Scores for Medtronic PMAs: Medtronic and Image-guided Neurologics Patents

Image-

Guided

Neuro.

Cited

Patent #

PMA

960009

Activa DBS

PMA

840001

Itrel- Spinal Stim.

PMA

970004

Interstim-Incontinen

ce

PMA

860004

Synchromed

Infusion Pump

Medtronic

Focal

Patent #

PMA

960009

Activa DBS

PMA

840001

Itrel- Spinal Stim.

PMA

970004

Interstim-Incontinen

ce

PMA

860004

Synchromed

Infusion Pump

6902569 0.06751 0.03692 0.07123 0.05452 7497863 0.03312 0.02061 0.03025 0.03394 6902569 0.06751 0.03692 0.07123 0.05452 7517337 0.09224 0.06819 0.07859 0.19924 6793664 0.08074 0.04134 0.08441 0.07092 7497857 0.02437 0.02113 0.02625 0.02083 7204840 0.10897 0.04047 0.05788 0.05105 7366561 0.11495 0.05016 0.10219 0.06396 Average 0.08118 0.03891 0.07119 0.05775 0.06617 0.04002 0.05932 0.07949

* Image-guided Neurologics patent titles cited by Medtronic patents

6902569 Trajectory guide with instrument immobilizer 6793664 System and method of minimally-invasive exovascular aneurysm treatment 7204840 Deep organ access device and method

* Medtronic focal patent titles 7497863 Instrument guiding stage apparatus and method for using same 7517337 Catheter anchor system and method 7497857 Endocardial dispersive electrode for use with a monopolar RF ablation pen 7366561 Robotic trajectory guide

Page 26: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

26

Table 2. Word Stem tf-idf Scores for Medtronic PMA 960009 and Activa-Image-Guided Neurologics Pair

PMA 96009 (Activa-

Deep Brain Stimulation) Image-guided Neurologics

Patent #

Word stub wj = tf-idf 6793664

Word stem wj = tf-idf 6902569

Word stem wj = tf-idf 7204840

Word stem wj = tf-idf stim 0.53366 dev 0.30664 op 0.11141 dev 0.13184 system 0.47544 op 0.11122 pass 0.10699 access 0.12024 impl 0.29030 system 0.10168 mat 0.07992 brain 0.08441 deep 0.26538 im 0.07706 loc 0.07629 hol 0.06850 control 0.26363 techn 0.05243 cap 0.06832 bur 0.06085 brain 0.26140 extend 0.05033 remov 0.06781 op 0.06067 elect 0.21676 loc 0.04766 paty 0.06418 cap 0.05961 trem 0.21628 ring 0.03498 method 0.05261 deep 0.05464 lead 0.11395 apply 0.03178 system 0.04844 im 0.04833 neurostim 0.08967 brain 0.03165 im 0.03548 loc 0.04317 extend 0.08887 surfac 0.02247 intern 0.03093 altern 0.03967 chang 0.06958 gen 0.01948 respect 0.02539 extend 0.03696 kit 0.06789 method 0.01926 fram 0.02539 system 0.03267 ex 0.06172 docu 0.01786 reduc 0.02380 respect 0.02936 access 0.05273 diamet 0.01666 magnet 0.02351 assembl 0.02360 process 0.04687 adv 0.01643 part 0.02255 remov 0.02217 ad 0.03703 altern 0.01589 hol 0.02158 pass 0.02111 mod 0.03555 mat 0.01589 surfac 0.01958 paty 0.02100 magnet 0.03555 respect 0.01499 apply 0.01938 platform 0.01863 new 0.03555 reson 0.01479 effect 0.01904 correspond 0.01769

Cosine similarity PMA 96009: DBS 0.08074 0.06751 0.10897

Page 27: Hard to get? Knowledge Heritage from Invention to Product ... · 2 INTRODUCTION Scholars studying innovation, entrepreneurship, and strategy continually seek to understand how firms

27

Table 3. Vocera Cosine Similarity Between Vocera Patents, Product Research Papers, Reviews, and IPO Prospectus

networkworld.com

reviews:

February 3, 2003

Wireless network

blankets

Computerworld Australia

March 7, 2007

Wearable Tech

Industry News

October 08, 2013

Vocera ER study

South Med J.

2013;106(3):189-

195

Vocera

Communications

IPO Prospectus S1

February 2011

Vocera Patent #

6892083 0.46859 0.23934 0.15776 0.19883 0.17822 6901255 0.46376 0.25666 0.20004 0.22942 0.24150 7190802 0.13127 0.06790 0.06440 0.07134 0.08875 7206594 0.25715 0.19701 0.17371 0.18186 0.23226 7248881 0.46395 0.23926 0.16564 0.21143 0.18824 7257415 0.46780 0.24362 0.16834 0.21270 0.19417 7310541 0.46140 0.23772 0.16063 0.20489 0.17919 7457751 0.32279 0.18118 0.13118 0.15673 0.18608 7764972 0.33194 0.22582 0.16818 0.20421 0.21323 7953447 0.45202 0.24303 0.16810 0.21724 0.18957 7978619 0.25356 0.25417 0.17135 0.18720 0.21074 8098806 0.44184 0.23946 0.13303 0.16868 0.17696 8121649 0.44336 0.23905 0.16898 0.22102 0.18754 8175887 0.32470 0.18820 0.13722 0.15894 0.18451 8498865 0.42122 0.26889 0.16400 0.22929 0.21349 8626246 0.45254 0.24387 0.16924 0.21915 0.19203

Average Patent similarity 0.38487 0.22282 0.15636 0.19206 0.19103 Vocera Communications

IPO Prospectus 0.18105 0.27202 0.25765 0.13687 1.00000