
Kandidatuppsats Statistiska institutionen

Bachelor thesis, Department of Statistics

Nr 2015:2

Monitoring patent activity using forecasting techniques

Bevakning av patentaktivitet genom statistiska prognosmetoder

Arian Barakat and Rebin Hosini

Independent project, 15 higher education credits, Statistics III, spring term 2015

Supervisor: Andriy Andreev

Abstract

Monitoring patent activity is one of many ways of keeping track of competitors’ technological advancement or the progression of a technological area as a whole, information that is typically beneficial for firms. However, patent applications usually remain undisclosed for a period of 18 months until the publication date, and a need for forecasting techniques may therefore arise.

Both long-run and short-run forecast models have been proposed in previous studies, and this thesis aims to propose a methodology that combines long-run and short-run trends into one single model through a “2-point estimation” method. A forecast model is derived from the methodology using a case study, the CPC class H05B6/64. The two models derived from the methodology are a Logistic-ARIMA(2,3,2) and a Gompertz-ARIMA(2,3,2) model. The latter turned out to be the relatively better model in terms of forecasting performance for the CPC class H05B6/64.

The conclusion is that the proposed methodology may be a suitable extension to previously suggested forecasting methods, given the limitations of the thesis. Despite the results, further performance validation of the methodology is needed.

Keywords: Forecasting, Patent activity, Patent information, Sigmoid functions, ARIMA, Logistic, Gompertz


Acknowledgment

We would like to express our gratitude for the opportunity to work with IAMIP Sverige AB. We much appreciate all the useful input received from its employees.

We would also like to give special thanks to our supervisor Andriy Andreev for his insightful help and guidance.


Contents

Abstract
Acknowledgment
Contents

1 Introduction
1.1 Problem Description
1.2 Aim and Purpose
1.3 Method
1.4 Limitations
1.5 Outline of the thesis

2 Background
2.1 Intellectual Property
2.2 Usage of patent information in business context

3 Previous Research

4 Thesis Framework
4.1 Data description
4.2 Approach
4.2.1 2-point estimation method
4.2.2 Forecast Horizon
4.2.3 In-sample and Out-of-sample data
4.3 Software
4.4 Critical viewpoint

5 Theoretical Framework
5.1 The forecasting process
5.2 Trend and Stationarity
5.3 Statistical Test and Autocorrelation functions
5.3.1 Augmented Dickey-Fuller
5.3.2 ACF and PACF
5.4 Sigmoid Functions
5.4.1 Logistic
5.4.2 Gompertz
5.4.3 Richards
5.4.4 Comparison
5.5 ARIMA/ARMA
5.6 Model Validation
5.6.1 R-square and adjusted R-square
5.6.2 Akaike information criterion
5.6.3 Forecast performance

6 Result and Analysis
6.1 Deterministic trend
6.1.1 Logistic
6.1.2 Gompertz
6.2 ARIMA
6.2.1 Logistic
6.2.2 Gompertz
6.3 Forecast
6.3.1 Logistic-ARIMA(2,3,2)
6.3.2 Gompertz-ARIMA(2,3,2)
6.4 Model Validation
6.4.1 Logistic-ARIMA(2,3,2)
6.4.2 Gompertz-ARIMA(2,3,2)
6.4.3 Comparison

7 Discussion and Conclusion

References

Appendix A

Appendix B

1 Introduction

1.1 Problem Description

Business competition is a well-known phenomenon worldwide that has become a natural part of the economy, and it often benefits the end consumer. Business rivalry is, however, a costly activity for firms, and insight into competitors’ future strategies can therefore be important for the survival of the firm.

An element of business competition is monitoring the competing firms, and there are numerous important fields to keep track of and ways of doing so. One important field is research and development (R&D), and a way of monitoring this field is to measure the research activity of each firm or the progression of the technology as a whole. At first, such an approach might seem problematic, as information of this type is usually confidential. Suggestions have however been made for dealing with this problem of confidentiality. One of many solutions has been proposed by professor Holger Ernst, who suggests that one can use patent documents as a source of information. [7]

Patent documents contain useful information that can be used for several purposes, one of which, as professor Ernst suggested, is to monitor competitors’ R&D activity. Patent documents are created when individuals or legal entities apply for the exclusive right to exploit an innovation, and they are available through several databases. Approximately 2.35 million patent applications were filed worldwide in 2012, which corresponds to around 45,000 applications each week [15]. Findings have shown that the number of patents filed correlates with the progression of a technology and can therefore be used to monitor the R&D activity among competitors.

There are however limitations to using patent documents as a source of information. The very nature of patents is that the application remains undisclosed until the publication date, which usually takes place 18 months after the first filing. This time delay has led to several attempts to forecast the number of patents filed by each company or within the technology as a whole in order to monitor the activity level. Both short-run and long-run models have been proposed for this problem, but what if one could combine these two types of models into one single model? Similar ideas of combining several models have been applied in other areas, and the question is whether this approach is appropriate for the area of intellectual property (IP).

1.2 Aim and Purpose

The aim of this thesis in statistics is to suggest a potential forecast methodology that merges a long-run and a short-run model in order to forecast the number of filed patents using only previously filed patents as a source of information. The purpose is thus to evaluate whether this suggested methodology is appropriate for the area of IP. If this is the case, the methodology will be an extension to previously suggested methods for managing the absence of information during the period in which patent applications are undisclosed. As mentioned earlier, a forecast can be made on a micro level (firms) as well as on a macro level (technology area). This thesis will focus on the latter, and the “heating using microwaves” technology (CPC class H05B6/64) will be used as a case study. The thesis is the result of a request made by a company named IAMIP Sverige AB, an intellectual asset management consulting firm located in Sundbyberg (Stockholm), Sweden.

1.3 Method

The suggested methodology used in this thesis is a two-phase approach, where a deterministic trend is fitted in the first step and an autoregressive integrated moving average model in the second step. A more detailed description of the framework is given in a dedicated part, section 4. The open source software “R” has been used for the necessary statistical computations.

1.4 Limitations

There are possibly other sources of information one could use in order to measure the research activity of firms or the progression of a technology. However, the source of information in this thesis is limited to the previously filed patents within a certain technology. The choice of long-run models is limited to the popular sigmoid functions, while the short-run model is limited to an ARIMA model as suggested in previous studies. Estimation and validation of the proposed forecast methodology and model is restricted to the case study, which is the number of patents filed for the “heating using microwaves” technology (CPC class H05B6/64). The time period is set between the years 1941 and 2007, which corresponds to 67 observations, as the time series is provided on a yearly basis.

1.5 Outline of the thesis

The reader will first be provided with a short background description of the intellectual property area and the usage of this property in a business context. This orientation is given in section 2. The section is followed by a summary of previous research within the area, where previously suggested forecast models and methodologies are presented. Both long-run and short-run models are introduced, as the purpose of this thesis is to suggest a forecast methodology that merges these two types of models. The proposed methodology is thereafter described in detail in a dedicated part, section 4. A description of the necessary theoretical framework is given in section 5, and the result is presented in section 6, followed by a discussion and conclusion.


2 Background

2.1 Intellectual Property

Intellectual property, or IP, is the legal term that refers to non-material property such as innovations and creations of the mind. The laws of intellectual property give the creator (or the owner) of the property exclusive rights to exploit the invention for a limited time period. In other words, the creator can manufacture a product based on the IP, license the IP or even sell it. There is a variety of intellectual property rights that protect different types of creations. Some common types are copyright, patents and trademarks [2]. This thesis will only focus on patents, as previously stated.

The purpose of patents is to encourage innovation and thus the technical development of a society. Due to the properties of patent laws, a patent also gives advantages to the creator in the form of exclusive rights to exploit the innovation. This implies that there is a trade-off between society and the creator. However, not every innovation is granted a patent. A patent is granted if the innovation fulfills a number of requirements, which are usually examined by a national patent office or a so-called supranational patent office such as the European Patent Office (EPO), depending on where the patent rights are sought. The patentability of an innovation is established on the terms of novelty, inventiveness and industrial applicability [11].

The requirements placed on an invention ensure that the same innovation cannot be patented more than once. The novelty requirement implies that the invention must be new and should thus not be a known invention. The patentability requirement of inventiveness (also termed non-obviousness) guarantees that the technical solution of an invention is not obvious to a person who has ordinary skill in the field to which the invention is directed. The third requirement refers to the applicability to an industry, where the innovation should solve a technical problem and the outcome should be invariable. [2]

In order to obtain a patent, one must apply on a national or an international level, depending on the strategy of the applicant(s). The applicant(s) might only want to protect the innovation in their home country, in which case filing on a national level is the only relevant choice. However, if one is interested in protection in several markets, the applicant(s) must apply for a patent in each country. Thus, there is no such thing as a world patent. If the latter strategy is chosen, one usually uses supranational procedures. An example of a supranational office is the EPO, as mentioned. Regardless of the strategy, applicants often start on a national level before applying in several other countries. The first filing is typically made in the home country of the inventor(s) or where the research and development (R&D) department is located. This first filing is commonly referred to as the “priority document”.

When a patent is sought in several countries or through supranational procedures for the same invention, the invention can be granted a patent in each of the countries. However, since all these patents describe the same innovation, they are linked together into the same so-called patent family. When patent information is used for statistical purposes, one usually uses the patent families rather than the individual patent documents in order to avoid double counting the same invention [footnote 1]. [11]

Footnote 1: The individual patent documents can also be used as, e.g., a measurement of firms’ globalization.
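To make the double-counting issue concrete, the following is a minimal sketch of counting families instead of documents. The record layout and all identifiers are hypothetical, and attributing each family to the year of its earliest filing is our assumption for illustration, not a rule taken from INPADOC.

```python
from collections import defaultdict

# Hypothetical records: (publication number, family id, filing year).
# All identifiers are made up for illustration.
records = [
    ("SE-0001", "FAM-A", 1985),
    ("EP-0002", "FAM-A", 1986),  # same invention, later filing at another office
    ("US-0003", "FAM-A", 1986),  # same invention again
    ("SE-0004", "FAM-B", 1986),
    ("DE-0005", "FAM-C", 1986),
    ("US-0006", "FAM-C", 1987),
]


def families_per_year(records):
    """Count each family once, in the year of its earliest filing (assumption)."""
    first_year = {}
    for _, family, year in records:
        first_year[family] = min(year, first_year.get(family, year))
    counts = defaultdict(int)
    for year in first_year.values():
        counts[year] += 1
    return dict(sorted(counts.items()))


# Naive count: every document is counted, so one invention appears several times.
documents_per_year = defaultdict(int)
for _, _, year in records:
    documents_per_year[year] += 1

print("documents:", dict(sorted(documents_per_year.items())))
print("families: ", families_per_year(records))
```

In this toy data six documents collapse into three families, which is the kind of reduction that motivates using families for statistical purposes.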

Since patents can be applied for in several countries, it is normal that the applications appear in different languages. In order to avoid linguistic difficulties, one can use the patent classification system. Patents are classified in a hierarchical classification system with both alphabetical and numerical symbols. The two systems commonly used today are the International Patent Classification (IPC) and the Cooperative Patent Classification (CPC). The latter system was created by the EPO and the USPTO (United States Patent and Trademark Office) as an attempt to harmonize the classification systems.

A patent application remains undisclosed after the filing date (or priority date) as well as during the pending time until the date of publication, which normally takes place 18 months after the filing date. However, the applicant(s) can withdraw their patent application at any stage, and the innovation will then neither be disclosed nor publicly recorded as a filed patent. [11]

2.2 Usage of patent information in business context

The main function of the patent is to give the creator or the owner of the IP exclusive rights to use the innovation, as mentioned earlier. However, the patent documents also contain information that can be of interest in many areas. The information can be of importance on a macro, intermediate and micro level. Areas where patent information can be used range from assessing technology development and monitoring competitors to analyzing a firm’s technological competence. Thus, parties such as governments, firms and stakeholders are examples of groups that can use patent information.

In a business context, patent information is often used to provide decision-makers with business intelligence (BI), and numerous areas of application have been proposed. These are, e.g., assessing the firm’s own technology, recognizing strategic changes among the firm’s competitors, identifying new technological trends among competitors and identifying key inventors within a technological field. Several indicators have been suggested in order to analyze the different areas. One of the suggested indicators is the measurement of patent activity, which can be represented by the number of filed patents. By analyzing the number of filed patents, one can obtain insight into competitors’ R&D strategies or changes of focus in different technological areas. [7]

The information that patent documents provide has however limitations. As mentioned in the previous section, a patent is undisclosed for a period until the publication date. The applicant can however withdraw the patent application at any stage during this period. Thus, applications that are withdrawn will usually not provide decision-makers with any information. Irrespective of whether the patent application is withdrawn or not, there is an 18-month period where external decision-makers are without any information about possible changes. In order to cope with this time delay, some attempts to forecast the patent activity have been made [8]. Attempts to forecast in the short run as well as in the long run have been made, which will be discussed in section 3.


3 Previous Research

Extensive research on forecasting patent filings has been carried out over time, and numerous models have been proposed in order to forecast patent activity as well as technology development. Both short-run and long-run forecasting models have been applied in different types of studies.

An interesting observation made in several studies is the behavior of filed patents over time. It has been suggested that the cumulative number of patents generally bears resemblance to an S-shaped curve, also known as the sigmoid curve [6]. Sigmoid functions have been applied in several different fields such as medicine, management and economics. One commonly used term related to the sigmoid function is the life cycle concept, which describes the evolution of different objects such as biological life, sales over time and economic development. [4]

One of the areas where the life cycle term has been implemented is the technological area, where it is usually denoted the product life cycle. Different stages of the cycle have been recognized, which are related to the diffusion of the technology. The four stages that have been identified can be observed in Figure 1 and are: emerging, growth, maturity and saturation. In the emerging stage, when a technology is quite young, the number of patent filings is relatively low. In the following stage, one can observe an increase in patent activity, which is followed by the maturity stage. In this stage, the number of patent applications per year is declining, whereas in the last stage the patent activity drops quite significantly [6]. In mathematical terms, the slope of the cumulative curve is at its maximum during the growth stage and approaches zero in the saturation stage. From a statistical perspective, this suggested evolution might imply that the patent activity follows some type of trend and that the variance might be heterogeneous over time.

Figure 1: Cumulative patent applications along the product life cycle [6]

Estimating the long-run trend of the patent activity is a useful approach when parties are interested in, e.g., determining a technology’s natural or physical limit. In order to address these types of interests, one has utilized the observed behavior of the accumulated patent applications mentioned earlier. As suggested, the accumulation of filed patents resembles the sigmoid function, and links to different S-curves have therefore been made. Some of the sigmoid functions hold properties that enable one to estimate or forecast the upper limit and thereby use the models in the mentioned problem areas. One of these properties is symmetry around the inflection point, which allows one to estimate the upper limit (the natural or physical limit of the technology in this case) even though the technology has not reached the later stages. The most popular models that have been proposed are the Logistic, Gompertz, Log-logistic, Richards and Weibull functions. [9]
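The symmetry property can be illustrated numerically. The sketch below assumes a common three-parameter form of the Logistic and Gompertz curves (upper limit K, growth rate r, location t0); this parameterization and the parameter values are our assumptions for illustration and need not match the redefined functions used later in the thesis.

```python
import math


def logistic(t, K, r, t0):
    """Logistic curve with upper limit K, growth rate r and inflection point t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def gompertz(t, K, r, t0):
    """Gompertz curve with upper limit K; its inflection point is also at t0."""
    return K * math.exp(-math.exp(-r * (t - t0)))


# Hypothetical parameter values, chosen only for illustration.
K, r, t0 = 20000.0, 0.15, 1985.0

# For the logistic curve the level at the inflection point is exactly K / 2,
# so doubling that level recovers the upper limit before saturation is observed.
level_at_inflection = logistic(t0, K, r, t0)
implied_limit = 2.0 * level_at_inflection
print("implied upper limit:", implied_limit)

# Point symmetry around the inflection point: S(t0 + d) + S(t0 - d) = K.
for d in (1, 5, 10):
    total = logistic(t0 + d, K, r, t0) + logistic(t0 - d, K, r, t0)
    print("d =", d, "sum =", round(total, 6))

# The Gompertz curve is not symmetric: its inflection lies at K / e, not K / 2.
print("Gompertz share at inflection:", gompertz(t0, K, r, t0) / K)
```

The contrast between the two curves is one reason why several sigmoid functions are usually compared rather than a single one being assumed.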

The development of different technologies may behave differently, which will naturally be reflected in the patent activity. With this in mind, one usually proposes a selection of models in a study in order to find the most suitable long-run trend. The different sigmoid functions have their strengths as well as weaknesses, and there is therefore no exact answer to the question of which model outperforms the others; it depends on the technology that is being studied. However, some criteria have been suggested in order to weed out the worst-performing models. The common criteria are sustainability under different assumptions, goodness of fit and computational simplicity [9]. These criteria are often used in statistical methodology and will also be considered in this thesis, with the exception of the computational simplicity criterion.

The short-run forecast is an interesting approach when firms try to monitor competitors’ R&D activity or when patent offices try to estimate the workload in upcoming periods. The short run can in this case be considered as the first 18 months after the filing or priority date, when patent applications are not yet disclosed. In cases where the data is provided on a yearly basis, one can for practical reasons consider the short run as a horizon of up to two years.

As in the case of estimating the long-run trend, several types of models have been proposed for short-run forecasting. These suggested models are, for example, different cases of autoregressive integrated moving average (ARIMA or ARMA) models originally presented by Box and Jenkins, autoregressive distributed lag (ADL) models and vector autoregressive (VAR) models. The ARIMA or ARMA models are often used for forecasting univariate time series, while the latter two models are normally used in multivariate cases. These models have been applied by patent offices in order to estimate the upcoming workload and will fulfill the purpose as long as the underlying assumptions of the models remain valid [11]. Common assumptions for these models are stationarity as well as constant variance in the time series. However, as suggested earlier, the behavior of patent activity over time suggests that the variance might actually not be time-invariant, thereby violating the assumptions. If this is the case, one could argue that the forecast might lead decision-makers to misleading inference.

Although the ARIMA and ARMA models are powerful tools for forecasting, the models are not able to handle volatility. In order to overcome this weakness of ARIMA models, hybrid models have been suggested. The idea of hybrid models is that one is able to utilize the strengths of multiple models through an integration and thus obtain a better forecast performance. This technique has been applied in different areas such as forecasting asset prices. In one study, it was concluded that a hybrid ARIMA-GARCH model outperformed the regular ARIMA model and that a more accurate forecast of the gold price could therefore be made. [1] In this particular case, the generalized autoregressive conditional heteroskedasticity (GARCH) model was used as an extension to the ARIMA model, as the GARCH is able to estimate and handle volatility. [10]

This example of hybrid models suggested that the underlying structure of the data followed either an ARIMA or an ARMA model. Parallels to this suggestion can be drawn to the concept of deterministic or long-run trends, which will be explained in section 5.2. By identifying and removing a deterministic trend, one is able to treat the “residuals” as an ordinary time series and thus apply an appropriate short-run forecast model to it. The idea is thereby to forecast into the short run using a short-run model while assuming a certain long-run trend in the data.

As mentioned earlier, previous research suggests that the accumulated patent filings over time follow some type of S-curve, and it is therefore natural to assume that the deterministic trend within the accumulated patent filings might be exactly of this type. We will therefore treat the deterministic trend suggested by previous research as a starting point as we propose a forecast methodology for patent filings in this thesis.


4 Thesis Framework

4.1 Data description

The data used in this thesis is the number of patent filings (families) over time within the CPC class H05B6/64 and is provided by the company IAMIP. This subclass describes the technological area of electrical heating devices, more precisely “Heating using microwaves”. The families are defined according to the International Patent Documentation (INPADOC). The time period of the data is set between the years 1941 and 2007, and the number of patents is measured on a yearly basis. This particular technological area has been chosen as it is presumed to have reached the maturity or the saturation stage.

Figure 2: Yearly patent filings for the CPC class H05B6/64

Figure 2 represents the ordinary time series of the microwave technology. The y-axis measures the number of patent filings (families) within the subclass and the x-axis denotes the time period. As can be observed, the plot exhibits two peaks, one around the mid-1980s and a second one after the turn of the millennium. Looking at the second peak, one could suspect that the technology is saturated and that the number of filed patents will continue to decline.

Figure 3: Cumulative patent filings for the CPC class H05B6/64

The cumulative patent filings for the CPC class H05B6/64 are shown in Figure 3, and this figure corresponds to the product life cycle presented in Figure 1 in the previous section. It might be difficult to determine whether the technology has actually reached the saturation stage on the basis of this figure, since the plot is still pointing upwards. One can however draw the conclusion that the technology has reached a level of saturation if Figures 2 and 3 are combined and jointly analyzed.

Table 1: Descriptive statistics for yearly and cumulative patent filings, CPC class H05B6/64

Data          Mean       Min   Max      Obs
Yearly        286.5373   0     1047.0   67
Accumulated   5076.836   0     19200    67
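As a minimal sketch of how the two rows of Table 1 relate to each other, the accumulated series is the running sum of the yearly series, and the columns are simple summaries of each series. The yearly counts below are hypothetical toy numbers, not the H05B6/64 data.

```python
from itertools import accumulate
from statistics import mean

# Hypothetical yearly family counts (toy numbers, not the thesis data).
yearly = [0, 2, 5, 9, 14, 11, 7]

# The accumulated series is the running sum of the yearly filings.
cumulative = list(accumulate(yearly))


def describe(series):
    """Return the columns reported in Table 1 for one series."""
    return {
        "mean": mean(series),
        "min": min(series),
        "max": max(series),
        "obs": len(series),
    }


print("yearly:     ", describe(yearly))
print("accumulated:", describe(cumulative))

# The last accumulated value equals the yearly mean times the number of
# observations, which gives a quick consistency check between the two rows.
print(cumulative[-1], describe(yearly)["mean"] * len(yearly))
```

Applying the same check to Table 1 gives 286.5373 × 67 ≈ 19,198, which is close to the reported maximum of 19200 and suggests that the latter is a rounded figure.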

4.2 Approach

4.2.1 2-point estimation method

Previous research presented in section 3 suggested that the long-run trend in accumulated patent filings has the shape of a sigmoid function. We will therefore use this knowledge as a starting point of the proposed model and treat this finding as the underlying structure, or the deterministic trend. Since the aim of this thesis is to suggest a forecast methodology and conduct a short-run forecast, one will have to take the short-run effects into account as well. The proposed methodology used in this thesis is something we call a “2-point estimation” method, where a deterministic trend is estimated in the first step and thereafter used to detrend the data (the deterministic trend is also used as the first of the two point estimations). The “residuals” are thereafter treated as an ordinary time series, onto which an ARIMA(p,d,q) model is fitted so that a forecast into the short run can be carried out (the second point estimation). The coefficients of the deterministic trend are fixed throughout the study (for reasons of simplicity and comparison), while the coefficients of the short-run forecast model are recursively updated as the time periods unfold. The 2-point estimation model can be described as:

$$Y_t = S_t + a_t \qquad (4.1)$$

and the estimated model (forecast) as:

$$Y_t = S_t + a_t + e_t \qquad (4.2)$$

where $Y_t$ is the accumulated patent filings, $S_t$ the deterministic trend, $a_t$ the “residuals” and $e_t$ the forecast error. It is assumed that the long-run trend ($S_t$) follows a sigmoid function, while the short-run trend ($a_t$) is assumed to follow an ARIMA(p,d,q) process, as previous studies have proposed these models. The forecast error is assumed to be white noise with mean zero and finite variance.

By removing the trend component, one assumes that the number of accumulated patent filings is in equilibrium around the long-run trend and that short-run shocks affect the observed value. This approach allows us to take the long-run trend into consideration as we try to forecast into the short run.
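The two steps can be sketched in a few lines of code. This is a deliberately simplified illustration under stated assumptions: the trend is a logistic curve with fixed, hypothetical parameters, the data are synthetic, and a first-order autoregression fitted by least squares stands in for the ARIMA(p,d,q) model. It is not the estimation procedure used in the thesis.

```python
import math
import random


def logistic(t, K, r, t0):
    """Assumed parameterization of the deterministic trend S_t."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def fit_ar1(a):
    """Least-squares estimate of phi in a_t = phi * a_{t-1} + noise."""
    num = sum(a[i] * a[i - 1] for i in range(1, len(a)))
    den = sum(a[i - 1] ** 2 for i in range(1, len(a)))
    return num / den


def two_point_forecast(years, y, trend_params, h):
    """Forecast Y at years[-1] + h as trend plus extrapolated residual."""
    trend = [logistic(t, *trend_params) for t in years]
    resid = [obs - s for obs, s in zip(y, trend)]      # a_t = Y_t - S_t
    phi = fit_ar1(resid)                               # re-estimated on each call
    s_future = logistic(years[-1] + h, *trend_params)  # first point estimate
    a_future = (phi ** h) * resid[-1]                  # second point estimate
    return s_future + a_future, phi


# Synthetic example: hypothetical trend parameters and AR(1) disturbances.
random.seed(0)
params = (20000.0, 0.15, 1985.0)
years = list(range(1941, 1999))
shocks = [0.0]
for _ in years[1:]:
    shocks.append(0.7 * shocks[-1] + random.gauss(0.0, 30.0))
y = [logistic(t, *params) + a for t, a in zip(years, shocks)]

for h in (1, 2):
    forecast, phi = two_point_forecast(years, y, params, h)
    print("h =", h, "forecast =", round(forecast, 1), "phi =", round(phi, 3))
```

Keeping the trend parameters fixed while re-estimating the short-run coefficient on every call mirrors the division of labor described above, where only the short-run model is updated as new observations arrive.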

Several sigmoid functions have been proposed as the underlying structure, as mentioned earlier. However, only the common ones will be used in this thesis. These functions are the Logistic, the Gompertz and the Richards. Each of these functions will be described in section 5.4. The functions will be fitted to the data through the non-linear least squares method, and the most suitable one(s), in terms of goodness of fit, will be chosen as the deterministic trend.
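The selection step can be sketched as follows. A coarse grid search over candidate parameters is used here as a crude stand-in for an iterative non-linear least squares routine; the curve parameterizations, the grid values and the synthetic data are all assumptions made for illustration.

```python
import math
from itertools import product


def logistic(t, K, r, t0):
    return K / (1.0 + math.exp(-r * (t - t0)))


def gompertz(t, K, r, t0):
    return K * math.exp(-math.exp(-r * (t - t0)))


def sse(curve, params, ts, ys):
    """Sum of squared errors, the criterion minimised by least squares."""
    return sum((y - curve(t, *params)) ** 2 for t, y in zip(ts, ys))


def grid_fit(curve, ts, ys, grid):
    """Return (params, sse) for the grid point with the smallest SSE."""
    best = min(product(*grid), key=lambda p: sse(curve, p, ts, ys))
    return best, sse(curve, best, ts, ys)


# Noise-free synthetic data generated from a logistic curve (hypothetical values).
ts = list(range(1941, 2008))
true_params = (20000.0, 0.15, 1985.0)
ys = [logistic(t, *true_params) for t in ts]

# Candidate values for K, r and t0; a real fit would refine these iteratively.
grid = (
    [15000.0, 20000.0, 25000.0],
    [0.05, 0.10, 0.15, 0.20],
    [1975.0, 1980.0, 1985.0, 1990.0],
)

for name, curve in (("logistic", logistic), ("gompertz", gompertz)):
    params, err = grid_fit(curve, ts, ys, grid)
    print(name, params, round(err, 1))
```

Because the synthetic data were generated from a logistic curve, that candidate attains the smaller error here; on real filings the ranking depends on the technology studied.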

The determination of a suitable ARIMA(p,d,q) model is accomplished through the Box-Jenkins (BJ) methodology. The methodology contains four steps: (1) identification of orders, (2) estimation, (3) diagnostic checking and (4) forecast. All the steps will be used in this thesis, but only step (1) is necessary to determine the orders of the different components of the ARIMA model. The Dickey-Fuller unit root test is used to determine the integration order, the autocorrelation function (ACF) to determine the order of the moving average component and, lastly, the partial autocorrelation function (PACF) to determine the order of the autoregressive component [10]. The coefficient estimation is conducted through the maximum likelihood (ML) method using the statistical software described in section 4.3.
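Two building blocks of the identification step, differencing and the sample autocorrelation function, can be sketched directly. The unit root test and the PACF are not implemented here; in practice they would come from a statistical package. The example series are artificial.

```python
def difference(x, d=1):
    """Apply first differencing d times (the 'I' in ARIMA)."""
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x


def acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [
        sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# A cubic trend needs three differences before it becomes constant,
# which is the situation an integration order of d = 3 describes.
cubic = [t ** 3 for t in range(8)]
print(difference(cubic, 1))
print(difference(cubic, 3))

# An alternating series has strongly negative lag-1 autocorrelation.
alternating = [1.0, -1.0] * 4
print([round(r, 3) for r in acf(alternating, 2)])
```

When reading an ACF plot, values are usually compared with approximate bounds of ±1.96/√n around zero, where n is the length of the series.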

4.2.2 Forecast Horizon

After suitable models have been fitted for both the long-run and the short-run trend, a forecast of accumulated patent filings will be carried out. Because patent applications remain undisclosed for at least 18 months (1.5 years), only 1- and 2-step-ahead forecasts will be of interest. These steps correspond to a 1-year-ahead and a 2-years-ahead forecast, respectively, given the yearly structure of the data used in the case study.

4.2.3 In-sample and Out-of-sample data

In order to evaluate a forecast model, one can either monitor the performance through an ongoing process or divide the data into so-called in-sample and out-of-sample data [13]. The in-sample data is used to carry out the actual estimation, while the out-of-sample data is used to evaluate the forecast model. There is no particular guideline for how large the respective samples should be, but one has to bear in mind that larger in-sample data usually provides relatively better estimates. We have chosen to allocate 85 percent of the original data to the in-sample data and 15 percent to the out-of-sample data, to the nearest whole number (year). This allocation was chosen keeping in mind that the data we are working with only has 67 observations (yearly data, 1941 to 2007). The allocation lets the in-sample data range from the year 1941 to 1998, while the evaluation window ranges from the year 1999 to 2007. In other words, the number of forecast errors will be 9 for the 1-step-ahead forecast and 8 for the 2-step-ahead forecast.
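The counts of forecast errors follow directly from the years stated above, as the following sketch shows. It assumes that the first forecast origin is the last in-sample year (1998) and that the origin then moves forward one year at a time; this rolling scheme is our reading of the setup, chosen because it reproduces the stated counts.

```python
years = list(range(1941, 2008))   # yearly observations, 1941 to 2007
last_in_sample = 1998             # final year of the in-sample data

in_sample = [y for y in years if y <= last_in_sample]
out_of_sample = [y for y in years if y > last_in_sample]


def forecast_targets(h):
    """(origin, target) pairs for h-step-ahead forecasts that can be
    scored against observed values inside the evaluation window."""
    return [(o, o + h) for o in range(last_in_sample, years[-1] - h + 1)]


print(len(years), len(in_sample), len(out_of_sample))
for h in (1, 2):
    pairs = forecast_targets(h)
    print("h =", h, "errors =", len(pairs), "first =", pairs[0], "last =", pairs[-1])
```

The 2-step-ahead forecast loses one error because its first target is the year 2000 rather than 1999.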

4.3 Software

The statistical software used in this thesis is the open source program “R”. The software includes numerous basic packages for statistical purposes; we have however used additional ones in order to reach the aim of this thesis. The additional packages that have been used are: reshape, tseries, grofit, nls2, alr3, pracma, forecast, resample, curvetest and boot [footnote 2]. The use of these packages has made it necessary to redefine the sigmoid functions. How these functions have been redefined is addressed in section 5.4.

4.4 Critical viewpoint

Before applying any forecasting technique, one must accept that predicting the exact outcome of a certain event is quite difficult. Prophecies, or forecasts to use a more modern term, have been used in many cultures throughout the course of time as guidelines for decisions. These prophecies were probably not successful every time, as unexpected events occurred. With this in mind, it can be concluded that unanticipated shocks that affect society or the market are hard to foretell. However, the forecasting techniques that have been applied in many areas provide guidelines for decision-makers. The guidelines allow decision-makers to make more fact-based decisions rather than rely completely on their instincts.

The methodology and model proposed in this thesis are “automatic” in the sense that they disregard an expert’s input and knowledge of the field. An improvement of the methodology would therefore be to let an expert with considerable knowledge of the technology assist in the estimation of the deterministic trend, in order to obtain the most plausible long-run trend. Such studies have been carried out, and those techniques could be incorporated in the 2-point estimation method in further studies.

A further limitation of the thesis lies in the data used. The “validation window”, which is 9 observations for the 1-step-ahead forecast and 8 observations for the 2-step-ahead forecast, can from a statistical point of view be regarded as undersized. This undersized out-of-sample data will naturally affect the model diagnostics and thereby prevent certain conclusions from being drawn. However, as discussed in section 4.2.3, the size of the validation data was chosen while considering the size of the in-sample data as well.

² Further details about the packages can be found at http://cran.r-project.org/


5 Theoretical Framework

5.1 The forecasting process

In order to work in a structured and scientific way, one should follow some type of process in the working routine. This is true regardless of the field of study, so applying a process in a forecasting methodology is a good idea.

The forecasting process follows a number of steps, which are:

1. Problem identification
2. Data collection
3. Data analysis
4. Model selection and fitting
5. Model validation
6. Forecast model deployment
7. Monitoring forecasting model performance

The first five steps are common in statistical methodology. Before applying any kind of statistical technique one must identify the background problem and determine how the inference might solve it. The second step is to collect the information relevant to the problem. In the subsequent steps one analyzes the collected data, fits a model to the data and finally validates the model.

The last two steps concern the deployment of the model in, e.g., a business environment, where the model performance is monitored through an ongoing process. This thesis will focus on the first five steps: we will fit a model to the data, validate the model and finally suggest the most reasonable model as previously described.

5.2 Trend and Stationarity

Several frameworks for time series assume that the observations in the series are stationary. This assumption applies to the ARIMA models, which will be utilized in the proposed methodology. The general definition of a stationary time series is that its properties are time invariant, which implies some type of equilibrium or stability in the series. There are mainly two types of stationary time series: strictly and weakly stationary. The first type implies that the joint probability distribution of the observations $y_t, y_{t+1}, \ldots, y_{t+n}$ is exactly the same as that of the observations $y_{t+k}, y_{t+k+1}, \ldots, y_{t+k+n}$. This assumption is quite strong and it can therefore be difficult to find actual time series that fulfill it. A weakly stationary time series, on the other hand, only assumes that the mean and variance are time invariant (and that the autocovariance depends only on the lag), which might reflect a more realistic scenario.

Time series often exhibit some type of trend or process over time. Such a property implies a relationship between the observations, which violates the stationarity assumption mentioned earlier. The relationship can be of either a stochastic or a deterministic sort [10].

A stochastic process can be interpreted as a collection of random variables ordered in time. This means that the observed value is the sum of all the random shocks that have occurred. Suppose that the value of $Y_t$ is equal to:


$$Y_t = Y_{t-1} + u_t, \quad \text{where } u_t \sim N(0, \sigma^2) \tag{5.1}$$

Since the stochastic process is the sum of all the random variables ordered in time, repeated substitution back to the starting value $Y_0$ gives the general equation:

$$Y_t = Y_0 + u_1 + u_2 + \ldots + u_t = Y_0 + \sum_{i=1}^{t} u_i \tag{5.2}$$

$$E(Y_t) = Y_0$$

$$V(Y_t) = t\sigma^2$$

As can be seen in (5.2), the expected value in a given time period does not depend on time. However, this does not apply to the variance of a stochastic process. As seen in the equations above, the variance is a function of time, which implies a non-constant variance.

The deterministic process is distinguished from the stochastic process by the fact that the process is a function of time. In other words, it is assumed that the time series has some type of underlying structure, which will affect the expected value. A simple form of a deterministic process can be written as in (5.3). Thus, the difference between the deterministic and the stochastic process is the trend component, $\beta_1 t$ in this case. The trend component is not required to be a linear function of time and can therefore take several forms, e.g. a nonlinear one.

$$Y_t = Y_0 + \beta_1 t + u_t, \quad \text{where } u_t \sim N(0, \sigma^2) \tag{5.3}$$

$$E(Y_t) = Y_0 + \beta_1 t$$

$$V(Y_t) = \sigma^2$$

By estimating the trend component and removing it from the deterministic process, one can obtain a so-called trend stationary process (TSP). The trend can either be recognized by earlier empirical findings or through statistical procedures. The “residuals” of $Y_t$ will, after the removal of the trend component, vary around the trend and thus become a so-called TSP.
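The contrast between the two processes can be illustrated with a small simulation. The sketch below is not part of the thesis workflow; it is a minimal Python illustration in which the function names, the slope of 0.5, the unit shock variance and the number of simulated paths are all assumptions chosen for the example. It estimates the variance across many simulated paths at a given time point.

```python
import random
import statistics


def simulate_random_walk(n_steps, y0=0.0, sigma=1.0, rng=None):
    """Stochastic trend: each value adds a new shock to the previous value."""
    rng = rng or random.Random()
    path, y = [], y0
    for _ in range(n_steps):
        y += rng.gauss(0.0, sigma)
        path.append(y)
    return path


def simulate_trend_process(n_steps, y0=0.0, beta1=0.5, sigma=1.0, rng=None):
    """Deterministic trend: a line in t plus an independent shock."""
    rng = rng or random.Random()
    return [y0 + beta1 * t + rng.gauss(0.0, sigma)
            for t in range(1, n_steps + 1)]


def variance_across_paths(simulator, t, n_paths=2000, seed=0):
    """Variance of the value at time t over many independent paths."""
    rng = random.Random(seed)
    values = [simulator(t, rng=rng)[-1] for _ in range(n_paths)]
    return statistics.pvariance(values)


for t in (10, 100):
    print(t,
          round(variance_across_paths(simulate_random_walk, t), 2),
          round(variance_across_paths(simulate_trend_process, t), 2))
```

With unit shock variance, the random walk variance should grow roughly in proportion to t, while the variance around the deterministic trend should stay close to 1 at every time point.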

5.3 Statistical Test and Autocorrelation functions

5.3.1 Augmented Dickey-Fuller

The Dickey-Fuller (DF) test, also known as the unit root test, is a test of stationarity. As mentioned in the previous section, stationarity is one of the key assumptions when modeling a time series. However, the Dickey-Fuller test only tests for weak, not strict, stationarity. In the context of fitting an ARIMA model, the DF test is often used to determine the order of integration, the first step of the Box-Jenkins methodology [10]. The starting point of the DF test is a simple unit root process, given as:

$$Y_t = \rho Y_{t-1} + u_t \tag{5.4}$$

where $u_t$ is assumed to be white noise. The concept is to test whether $\rho = 1$, which implies the presence of a unit root. However, the test statistic becomes biased under the unit root process. To address this problem, the process can be transformed by subtracting $Y_{t-1}$ from both sides:

$$Y_t - Y_{t-1} = \rho Y_{t-1} - Y_{t-1} + u_t = \delta Y_{t-1} + u_t \tag{5.5}$$

where $\delta = (\rho - 1)$. This transformation allows us to test whether $\delta = 0$, which is the same as testing for $\rho = 1$. The test statistic follows the $\tau$ (tau) distribution and will no longer be biased [10]. The hypothesis structure is given as:

$$H_0: \delta = 0 \ \text{(non-stationary)}$$

$$H_1: \delta < 0 \ \text{(stationary)}$$

Under the usual DF test, the error term ($u_t$) is assumed to be uncorrelated. However, when the error term is correlated over time, the estimate of $\rho$ will be biased. This potential correlation problem is addressed through an extension of the DF test, usually known as the augmented Dickey-Fuller (ADF) test [10].

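The augmentation consists of adding lagged differences to the equation, given as:

$$\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + \sum_{i=1}^{m} \alpha_i \Delta Y_{t-i} + u_t \tag{5.6}$$

where

$$\Delta Y_{t-1} = (Y_{t-1} - Y_{t-2}) \quad \text{and} \quad \Delta Y_{t-2} = (Y_{t-2} - Y_{t-3}) \quad \text{etc.}$$

and the Dickey-Fuller test statistic is:

$$DF_\tau = \frac{\hat{\delta}}{\sigma_{\hat{\delta}}} \tag{5.7}$$

where $\sigma_{\hat{\delta}}$ is the standard error of $\hat{\delta}$; since $\delta = \rho - 1$, this is the same as $(\hat{\rho} - 1)/\sigma_{\hat{\rho}}$. The test statistic is then compared to the critical values tabulated by Dickey and Fuller, which follow the $\tau$ distribution [10].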
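To make the mechanics of the test concrete, the simplest case can be sketched in code. The following Python sketch is an illustration under stated assumptions, not the routine used in the thesis: it estimates the transformed regression (5.5) without intercept, trend or lagged differences, and returns the ratio of the estimated $\delta$ to its standard error. The function names and the simulated series are hypothetical. The resulting statistic has to be compared with Dickey-Fuller critical values, not with the usual t-table.

```python
import math
import random


def dickey_fuller_tau(series):
    """t-ratio of delta in dY_t = delta * Y_{t-1} + u_t (no constant)."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    sxx = sum(x * x for x in lagged)
    # OLS slope through the origin.
    delta_hat = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    residuals = [d - delta_hat * x for x, d in zip(lagged, diffs)]
    dof = len(diffs) - 1  # one estimated coefficient
    s2 = sum(e * e for e in residuals) / dof
    return delta_hat / math.sqrt(s2 / sxx)


def simulate_ar1(rho, n, seed):
    """Y_t = rho * Y_{t-1} + u_t with standard normal shocks (assumed)."""
    rng = random.Random(seed)
    y, out = 0.0, []
    for _ in range(n):
        y = rho * y + rng.gauss(0.0, 1.0)
        out.append(y)
    return out


tau_stationary = dickey_fuller_tau(simulate_ar1(0.5, 500, seed=1))
tau_unit_root = dickey_fuller_tau(simulate_ar1(1.0, 500, seed=1))
print(tau_stationary, tau_unit_root)
```

A strongly negative statistic speaks against the unit root hypothesis. For the stationary series the statistic should be far more negative than for the random walk.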


5.3.2 ACF and PACF

The identification of the orders for the autoregressive and the moving average component in the ARIMA(p,d,q) model is achieved through an examination of the autocorrelation function (ACF) and the partial autocorrelation function (PACF). As mentioned in section 4.2.1, the ACF is used to identify the order of the MA process, while the PACF is used for the order of the AR process. The identification of the orders is carried out after the order of integration has been determined.

Autocorrelation Function

The autocorrelation coefficient measures the correlation between a given variable and the k-th lag of the very same variable. The autocorrelation coefficient at lag k is given as:

$$R_k = \frac{E[(Y_t - \bar{Y})(Y_{t+k} - \bar{Y})]}{\sqrt{E[(Y_t - \bar{Y})^2]\,E[(Y_{t+k} - \bar{Y})^2]}} = \frac{Cov(Y_t, Y_{t+k})}{Var(Y_t)} = \frac{\gamma_k}{\gamma_0} \tag{5.8}$$

The set of all the $R_k$ values, that is, $R_k$ for all $k = 0, 1, 2, 3, \ldots, K$, is called the autocorrelation function (ACF). The ACF is commonly used as a visual interpretation of the correlation between the lags, plotted in a so-called correlogram³. Further, the autocorrelation is symmetric around zero ($R_k = R_{-k}$), which implies that it is only necessary to compute one half.
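The sample counterpart of (5.8) is short enough to write out. The Python sketch below is an illustration with hypothetical function names. It uses the common convention of dividing every autocovariance by the full sample size n, and the band function uses the approximate large-sample rule of ±1.96/√n for white noise, which is an assumption about how the bands in a correlogram are typically drawn, not a statement about the software used in the thesis.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations R_0, ..., R_max_lag as gamma_k / gamma_0."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    gamma0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(max_lag + 1):
        gamma_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(gamma_k / gamma0)
    return acf


def white_noise_band(n, z=1.96):
    """Approximate 95 percent band for the ACF of white noise."""
    return z / math.sqrt(n)


print(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 2))
print(white_noise_band(58))
```

Lags whose sample autocorrelation stays inside the band are usually treated as statistically insignificant when reading a correlogram.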

Partial Autocorrelation Function

In contrast to the ACF, the PACF estimates the partial autocorrelation rather than the overall correlation of the lags. Assume three variables X, Y and Z, and consider the linear regressions:

$$X = b_1 + k_1 Z \quad \text{where} \quad k_1 = \frac{Cov(Z, X)}{Var(Z)} \tag{5.9}$$

and

$$Y = b_2 + k_2 Z \quad \text{where} \quad k_2 = \frac{Cov(Z, Y)}{Var(Z)} \tag{5.10}$$

The errors are then given as:

$$X^* = X - \hat{X} = X - (b_1 + k_1 Z)$$

$$Y^* = Y - \hat{Y} = Y - (b_2 + k_2 Z)$$

The partial correlation is simply the correlation between X and Y after adjusting for the variable Z, in other words the correlation between the residuals, $Corr(X^*, Y^*)$. In the time series setting, X and Y are two observations k lags apart and Z represents the observations in between.
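The residual-based definition above translates directly into code. The following Python sketch is a hypothetical illustration (all function names are assumptions) with a single adjusting variable Z. A full PACF at higher lags would adjust for several intermediate lags at once, which this sketch does not do.

```python
import math


def ols_residuals(target, regressor):
    """Residuals of a simple linear regression of target on regressor."""
    n = len(target)
    mean_t = sum(target) / n
    mean_r = sum(regressor) / n
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(regressor, target))
    var = sum((r - mean_r) ** 2 for r in regressor)
    slope = cov / var
    intercept = mean_t - slope * mean_r
    return [t - (intercept + slope * r) for r, t in zip(regressor, target)]


def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    return correlation(ols_residuals(x, z), ols_residuals(y, z))


z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0]
y = [2.0 * v + 1.0 for v in x]   # y is an exact linear function of x
print(partial_correlation(x, y, z))
```

Because y is constructed as an exact linear function of x, the two sets of residuals are proportional and the partial correlation is 1 up to rounding error.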

³ A plot of the sample autocorrelations against time lags.


5.4 Sigmoid Functions

A Sigmoid function is a mathematical function with the visual appearance of the letter S, which is why the function is also commonly called the “S-curve”. There are several types of functions within the “Sigmoid family”; some common types used for estimating the long-run trend of patent activity are the ones introduced in section 3. The Sigmoid functions are generally characterized by a progressive process where the function starts with a small beginning that accelerates, followed by an inflection point and finally ending at a level of saturation. These properties are useful in many fields and have been applied, for example, in growth models. This application has opened up the possibility to either model or forecast certain progressions, as discussed in section 3.

The general characteristic can be expressed mathematically as $\lim_{t \to -\infty} f(t) = 0$ and $\lim_{t \to \infty} f(t) = A$, where 0 is the lower limit and A is the upper limit. The functions are thus bounded by these limits and therefore the range is $(0, A)$.

In this thesis we work with the commonly used sigmoid functions referred to in section 4.2.1: the Logistic, the Gompertz and the Richards’ function.

5.4.1 Logistic

The general definition of the logistic function can be written as in (5.11).

$$f(t) = \frac{A}{1 + \beta e^{-kt}} \tag{5.11}$$

where t is the time value, A the upper growth limit and k the steepness of the curve [5]. The general logistic function can however be modified in order to estimate the parameters, as different software use different methods. The modified logistic function that will be used in this thesis is (5.13). By denoting the parameters in (5.11) as $\beta = e^{\gamma}$, $\gamma = \frac{4\mu}{A}\lambda + 2$ and $k = \frac{4\mu}{A}$, one can show that (5.13) equals (5.11). By replacing the old notations with the new ones and inserting them in (5.11), one will get:

$$f(t) = \frac{A}{1 + e^{\gamma} e^{-\frac{4\mu}{A} t}} = \frac{A}{1 + e^{-\frac{4\mu}{A} t + \gamma}} = \frac{A}{1 + e^{-\frac{4\mu}{A} t + \frac{4\mu}{A}\lambda + 2}} \tag{5.12}$$

$$\longrightarrow f(t) = \frac{A}{1 + e^{\frac{4\mu}{A}(\lambda - t) + 2}} \tag{5.13}$$

where A is the upper limit, $\mu$ the maximum slope of the curve and $\lambda$ is the lag phase. Equation (5.12) shows that the modified logistic function equals (5.11). [12]
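The equivalence of the two parameterizations can also be checked numerically. The Python sketch below is an illustration with hypothetical function names. For concreteness it plugs in the logistic estimates reported later in Table 2 (A = 15619.6, µ = 519.4, λ = 1972.6), assuming that t is measured in calendar years.

```python
import math


def logistic_general(t, A, beta, k):
    """General logistic function, equation (5.11)."""
    return A / (1.0 + beta * math.exp(-k * t))


def logistic_modified(t, A, mu, lam):
    """Modified logistic function, equation (5.13)."""
    return A / (1.0 + math.exp(4.0 * mu / A * (lam - t) + 2.0))


def to_general_parameters(A, mu, lam):
    """Map (A, mu, lambda) to (beta, k) using the substitutions in the text."""
    k = 4.0 * mu / A
    gamma = k * lam + 2.0
    return math.exp(gamma), k


A, MU, LAM = 15619.6, 519.4, 1972.6   # logistic estimates from Table 2
BETA, K = to_general_parameters(A, MU, LAM)

for year in (1941.0, 1972.6, 1998.0, 2007.0):
    print(year,
          logistic_general(year, A, BETA, K),
          logistic_modified(year, A, MU, LAM))
```

The two columns should agree up to floating-point rounding. Note that with calendar years as input the intermediate value of β is very large, which is one practical reason to prefer the modified form.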

5.4.2 Gompertz

The Gompertz function is, unlike the logistic curve, neither symmetric with respect to its inflection point nor does the function show exponential growth in the early stage. The Gompertz function exhibits a more gradual approach toward the upper limit (saturation level) than the logistic function [9]. The modified Gompertz function can be written as:

$$f(t) = \frac{A}{e^{\,e^{\left[\frac{\mu e}{A}(\lambda - t) + 1\right]}}} \tag{5.14}$$

5.4.3 Richards

The Richards’ curve, also known as the generalized logistic curve, is an extension of the logistic curve presented in section 5.4.1. The Richards’ curve introduces a fourth parameter, the shape parameter $\nu$, which allows the sigmoid function to be more flexible than the usual logistic curve. Using the same notation as for the previous sigmoid functions, the modified Richards’ curve is given as:

$$f(t) = \frac{A}{\left[1 + \nu\, e^{\,1 + \nu + \frac{\mu}{A}(1 + \nu)^{1 + \frac{1}{\nu}}(\lambda - t)}\right]^{\frac{1}{\nu}}} \tag{5.15}$$

where A once again is the upper bound, $\mu$ the maximum slope of the curve, $\lambda$ the lag phase and $\nu$ the shape parameter.

5.4.4 Comparison

The proposed sigmoid functions possess various characteristics that can be useful to describe different types of growth models. The logistic function is symmetric about the inflection point, while the latter two functions are asymmetric. In contrast to the logistic function, the Gompertz does not show exponential growth in the early stage, which can be useful to model technologies that indicate initial growth inertia.

The Richards’ function introduces an additional parameter, the shape parameter, which allows the function to be more flexible than the logistic function. A visual example of the functions (with fictive parameters) is given in Figure 4.

Figure 4: Comparison of sigmoid functions with fictive parameters
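The three curves in (5.13) to (5.15) can be written down directly and compared. The Python sketch below uses hypothetical function names and fictive parameter values (A = 100, µ = 5, λ = 10) that are assumptions for illustration only. Setting the exponent to zero in each formula gives the inflection point, which is how the time points in the usage lines were obtained.

```python
import math


def logistic(t, A, mu, lam):
    """Modified logistic curve, equation (5.13)."""
    return A / (1.0 + math.exp(4.0 * mu / A * (lam - t) + 2.0))


def gompertz(t, A, mu, lam):
    """Modified Gompertz curve, equation (5.14)."""
    return A * math.exp(-math.exp(mu * math.e / A * (lam - t) + 1.0))


def richards(t, A, mu, lam, nu):
    """Modified Richards' curve, equation (5.15)."""
    exponent = (1.0 + nu
                + mu / A * (1.0 + nu) ** (1.0 + 1.0 / nu) * (lam - t))
    return A / (1.0 + nu * math.exp(exponent)) ** (1.0 / nu)


A, MU, LAM = 100.0, 5.0, 10.0   # fictive parameters

# Inflection points: the logistic curve passes A/2, the Gompertz curve A/e.
print(logistic(LAM + A / (2.0 * MU), A, MU, LAM))
print(gompertz(LAM + A / (MU * math.e), A, MU, LAM))

# With shape parameter nu = 1 the Richards' curve reduces to the logistic.
print(richards(25.0, A, MU, LAM, 1.0), logistic(25.0, A, MU, LAM))
```

The different levels at the inflection point (half of A for the logistic, A divided by e for the Gompertz) are one way to see the symmetry of the former and the asymmetry of the latter.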


5.5 ARIMA/ARMA

The autoregressive moving average (ARMA) models, and the more general autoregressive integrated moving average (ARIMA) models, have been the workhorse of forecasting in the world of statistics. The models were proposed by Box and Jenkins in 1971 and are often referred to simply as the “Box-Jenkins” models. The idea behind the model is to combine an autoregressive process with a moving average process and thus utilize the strengths of the two processes jointly. The components in the ARIMA models are denoted as the autoregressive process AR(p), the moving average process MA(q) and the integrated process I(d). The latter component is used in the generalized ARIMA model in order to obtain stationarity, discussed in section 5.2. [13]

The concept of an autoregressive process is that the observed value regresses on itself, hence the name. This means that the past observations, the so-called lagged observations, contribute to the value of the current observation. However, this contribution is also assumed to “move along” as time passes, thereby making the oldest lagged observation obsolete for the next observation. The order of an AR process, denoted as p, determines the number of past observations that contribute to the current one [13]. The AR process of order p can be written as:

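$$Y_t = \delta + \phi_1 Y_{t-1} + \ldots + \phi_p Y_{t-p} + \epsilon_t \tag{5.16}$$

which can also be written as:

$$\Phi(B) Y_t = \delta + \epsilon_t \quad \text{where} \quad \Phi(B) = 1 - \phi_1 B - \ldots - \phi_p B^p \tag{5.17}$$

where $\delta$ is the level coefficient, $\phi_i$ the regression coefficients, B the backward shift operator and $\epsilon_t$ is assumed to be white noise. Assuming a stationary process, the expected value is the same at every lag, $E[Y_t] = E[Y_{t-i}]$, so the expected value of $Y_t$ is:

$$E[Y_t] = E[\delta + \phi_1 Y_{t-1} + \ldots + \phi_p Y_{t-p} + \epsilon_t] \longrightarrow E[Y_t] = \frac{\delta}{1 - \phi_1 - \phi_2 - \ldots - \phi_p}$$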

The moving average (MA) models can be seen as a serial extension of white noise [14]. The MA model of order q is thus given as:

$$Y_t = \mu + \epsilon_t - \theta_1 \epsilon_{t-1} - \ldots - \theta_q \epsilon_{t-q} \tag{5.18}$$

where $\epsilon_t$, as in the AR models, is assumed to be white noise with an expected value of zero and finite variance. The MA models can also be written with the backward shift operator B as:

$$Y_t = \mu + \left(1 - \sum_{i=1}^{q} \theta_i B^i\right)\epsilon_t \tag{5.19}$$


Furthermore, since the white noise term has an expected value of zero, the expected value of the MA(q) process simply becomes $\mu$, as shown in the equation below. This property implies that the MA(q) process is weakly stationary regardless of the order of the process as well as the values of $\theta_1, \ldots, \theta_q$.

$$E[Y_t] = E[\mu + \epsilon_t - \theta_1 \epsilon_{t-1} - \ldots - \theta_q \epsilon_{t-q}] = \mu$$

The last component, I(d), is often used when the time series exhibits non-constant behavior over time. In other words, this occurs when the level of the time series is time-dependent, which is often referred to as a non-stationarity problem. As discussed in section 5.2, one can remove a long-run trend and treat the detrended data as an ordinary time series. The integration component in the ARIMA model is a further technique to handle non-stationarity. The technique simply consists of taking the difference between the observed value and the past value. The order of the difference is determined by the Dickey-Fuller test, mentioned in section 5.3.1, and is denoted by the letter d. The first difference of a time series can be written as:

$$w_t = (Y_t - Y_{t-1}) = (1 - B) Y_t \tag{5.20}$$

and a generalization to higher orders as

$$(1 - B)^d Y_t$$
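Differencing, and the step back from a forecast of the differenced series to the original scale, can be sketched as follows. This Python sketch uses hypothetical function names and a toy cubic series. The inversion formula comes from expanding $(1 - B)^3 Y_{t+1} = Y_{t+1} - 3Y_t + 3Y_{t-1} - Y_{t-2}$, which is the case relevant when d = 3.

```python
def difference(series, d=1):
    """Apply the first-difference operator (1 - B) to the series d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out[:-1], out[1:])]
    return out


def undo_third_difference(last_three, w_next):
    """Recover Y_{t+1} from (Y_{t-2}, Y_{t-1}, Y_t) and the third
    difference w_{t+1} = (1 - B)^3 Y_{t+1}."""
    y_tm2, y_tm1, y_t = last_three
    return 3 * y_t - 3 * y_tm1 + y_tm2 + w_next


cubes = [t ** 3 for t in range(6)]        # 0, 1, 8, 27, 64, 125
print(difference(cubes, d=3))             # third differences of a cubic
print(undo_third_difference(cubes[2:5], 6))
```

A cubic trend becomes constant after three differences, and adding the next third difference back to the last three levels reproduces the next value of the original series.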

By combining these components, one obtains the generalized ARIMA(p,d,q) model. The general form of the ARIMA model is given as:

$$\Phi(B)(1 - B)^d Y_t = \delta + \left(1 - \sum_{i=1}^{q} \theta_i B^i\right)\epsilon_t \tag{5.21}$$

and

$$\Phi(B) Y_t = \delta + \left(1 - \sum_{i=1}^{q} \theta_i B^i\right)\epsilon_t \quad \text{when } d = 0 \tag{5.22}$$

As seen in (5.22), the generalized ARIMA model becomes an ARMA model when d = 0.

5.6 Model Validation

5.6.1 R-square and adjusted R-square

The coefficient of determination is an important tool for determining how well the model explains the observations. The equations are:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} \tag{5.23}$$

$$\bar{R}^2 = 1 - \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2/(n - k)}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2/(n - 1)} \tag{5.24}$$

where (5.23) is the coefficient of determination and (5.24) is the adjusted coefficient of determination. k is the number of parameters, n is the number of observations, $Y_i$ is the variable of interest, $\hat{Y}_i$ is the estimated variable of interest and $\bar{Y}$ is the mean. As can be seen, the degrees of freedom are taken into consideration in (5.24), and models with relatively more parameters will therefore be penalized. Hence, $\bar{R}^2$ will never be greater than $R^2$.

5.6.2 Akaike information criterion

An additional goodness of fit measurement is the Akaike information criterion, also known as the AIC. Similar to the adjusted R-square, the AIC penalizes the goodness of fit value when additional parameters or regressors are added [13]. The Akaike information criterion is given as:

$$AIC = e^{\frac{2k}{n}} \, \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n} \tag{5.25}$$

By using all the goodness of fit measurements jointly, it is possible to increase the certainty of choosing a proper model.
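The three goodness of fit measures in (5.23) to (5.25) can be computed from the same residual sum of squares. The Python sketch below uses hypothetical function names and a small made-up data set. It follows the formulas as given in this section, including the form of the AIC in (5.25), which differs from the log-likelihood based AIC reported by many software packages.

```python
import math


def residual_sum_of_squares(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def total_sum_of_squares(y):
    mean = sum(y) / len(y)
    return sum((a - mean) ** 2 for a in y)


def r_squared(y, y_hat):
    """Coefficient of determination, equation (5.23)."""
    return 1.0 - residual_sum_of_squares(y, y_hat) / total_sum_of_squares(y)


def adjusted_r_squared(y, y_hat, k):
    """Adjusted coefficient of determination, equation (5.24)."""
    n = len(y)
    rss = residual_sum_of_squares(y, y_hat) / (n - k)
    tss = total_sum_of_squares(y) / (n - 1)
    return 1.0 - rss / tss


def aic(y, y_hat, k):
    """Akaike information criterion in the form of equation (5.25)."""
    n = len(y)
    return math.exp(2.0 * k / n) * residual_sum_of_squares(y, y_hat) / n


y = [1.0, 2.0, 3.0, 4.0, 5.0]          # made-up observations
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]      # made-up fitted values
print(r_squared(y, y_hat), adjusted_r_squared(y, y_hat, k=2), aic(y, y_hat, k=2))
```

Higher values of the two R-square measures and lower values of the AIC indicate a better fit.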

5.6.3 Forecast performance

As discussed in section 4.4, a forecast is rarely accurate as a point estimate. To determine the performance of a forecast model, one usually uses the forecast error. The forecast error is the difference between the observed value and the forecast made in the previous period(s) [13]. The forecast error is given as:

$$e_t = Y_t - \hat{Y}_{t \mid t-\tau} \tag{5.26}$$

where $\hat{Y}_{t \mid t-\tau}$ is the forecast of $Y_t$ made at time $t - \tau$ and $\tau$ is the lead time, i.e. the number of “step(s)-ahead” of the forecast.

There are several techniques to measure the performance of a forecast. The commonly used ones are: Mean squared prediction error (MSPE), Mean absolute error (MAE) and Root mean squared error (RMSE) [13]. With n denoting the number of forecast errors, these measurements are mathematically described as:

$$MSPE = \frac{\sum_{i=1}^{n} e_i^2}{n} \tag{5.27}$$

$$MAE = \frac{\sum_{i=1}^{n} |e_i|}{n} \tag{5.28}$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n}} \tag{5.29}$$

As several models are proposed, a comparison between the performance of each model can be of interest. A measurement that can be used for this purpose is the skill score (SS), which is defined as:

$$SS = 1 - \frac{MSPE_{forecast}}{MSPE_{reference}} \tag{5.30}$$

where $MSPE_{forecast}$ is the mean squared prediction error for the model to be compared and $MSPE_{reference}$ is the mean squared prediction error for the reference model. Negative values of the SS imply that the reference model performs relatively better than the compared model, while positive values imply the opposite. A value of zero suggests that the models perform equally well.
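The performance measures (5.27) to (5.30) are straightforward to compute from a list of forecast errors. The Python sketch below uses hypothetical function names and made-up error values. It is meant as an illustration of the formulas, not as the evaluation code used for the results.

```python
import math


def mspe(errors):
    """Mean squared prediction error, equation (5.27)."""
    return sum(e * e for e in errors) / len(errors)


def mae(errors):
    """Mean absolute error, equation (5.28)."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error, equation (5.29)."""
    return math.sqrt(mspe(errors))


def skill_score(errors_forecast, errors_reference):
    """Skill score, equation (5.30). Positive values favor the forecast model."""
    return 1.0 - mspe(errors_forecast) / mspe(errors_reference)


errors = [1.0, -2.0, 3.0]                     # made-up forecast errors
print(mspe(errors), mae(errors), rmse(errors))
print(skill_score([1.0, 1.0], [2.0, 2.0]))    # forecast model has smaller errors
```

In the last line the compared model has a quarter of the reference model's MSPE, so the skill score is positive.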


6 Result and Analysis

6.1 Deterministic trend

Table 2 presents the estimated parameter values using the non-linear least squares method. We start by recalling the definition of the parameters: A is the upper limit of the curve, µ the maximum slope of the curve, λ the lag phase (the time at which the tangent through the inflection point crosses the time axis) and finally ν is the shape parameter for the Richards’ curve.

An examination of the parameters allows us to obtain some useful information about the behavior of each function. The overall impression is that the proposed curves behave relatively similarly, yet differ with respect to the parameters. Starting with the lag phase parameter, λ, it is noticeable that the values are quite similar to each other. In other words, the lag phase ends around the same time, with minor deviation (around the year 1972), for all the functions. The slope parameters suggest similar behavior for all the curves. The major difference between the functions lies in the A parameter. Recalling the maximum value of the cumulative patent filings from Table 1, 19200 filings, it is observed that the Richards’ is the worst performing model in terms of following the actual observations. Given this, the Logistic and the Gompertz functions appear to describe the actual observed values better. This impression is also visually confirmed by Figure 5, where the Richards’ is farther away from the observed values than the logistic and Gompertz curves.

Figure 5: Comparison of estimated sigmoid functions and observed values

Table 2: Parameter Estimates

Parameter   Logistic   Gompertz   Richards
A           15619.6    27686.5    12674.2
µ           519.4      498.8      475.5
λ           1972.6     1972.3     1971.3
ν                                 1


The goodness of fit statistics in Table 3 further confirm this suspicion. The R-square and the adjusted R-square values might come across as being quite similar for all the functions, but closer observation reveals that the logistic and Gompertz perform better than the Richards’ in terms of goodness of fit. This is also confirmed by the AIC values presented in the same table.

Table 3: Goodness of fit for the sigmoid functions

Test Statistic   Logistic   Gompertz   Richards
R²               0.999443   0.999225   0.9925739
Adjusted R²      0.999423   0.999197   0.9921614
AIC              9641.467   13411.6    133051.6

The results imply that both the Logistic and the Gompertz curve outperform the Richards’, which should thus be excluded from further use. However, the results also give the impression that the logistic and Gompertz curves are rather equal as deterministic trends. With this conclusion, the choice is to continue with both functions separately, and a comparison of the two final models will be carried out.

6.1.1 Logistic

The detrended data using the logistic function as deterministic trend is given in Figure 6, where the horizontal line represents the reference line and the vertical line marks the border between the in-sample and out-of-sample data. In other words, the plot is a visualization of the term $a_t$ (for the in-sample data), defined in (4.1).

It can be suspected that the term $a_t$ will increase drastically after the cutoff year 1998 as a result of the difference between the maximum value of the accumulated patent filings and the A parameter for the logistic function (given in Table 2). Statistically, this may imply that the time series is non-stationary, which will be considered in the upcoming sections.

Figure 6: Detrended data using the logistic function as deterministic trend (in-sample data)


6.1.2 Gompertz

Similar to the Logistic case, the term $a_t$ (for the in-sample data) is presented in Figure 7. Once again, it can be suspected that the term will increase after the cutoff year. However, the increase will not be as large as in the previous case, as a result of the higher value of the parameter A for the Gompertz function given in Table 2.

Figure 7: Detrended data using the Gompertz function as deterministic trend (in-sample data)

Comparable to the logistic case, the result might suggest a non-stationary process. This will be further investigated in the upcoming sections, as mentioned earlier.

6.2 ARIMA

6.2.1 Logistic

The first step of modeling an ARIMA(p,d,q) is to identify the orders of the different components. Table 4 presents the ADF test statistics for the logistic case. According to the test statistics, the time series first becomes weakly stationary at the third order of integration, where the p-value falls below 0.01 and the test is thus significant at the 5 percent level.

Table 4: ADF test statistics for the detrended data using the logistic function as deterministic trend

Order of Integration   Dickey-Fuller Statistic   P-value
None                   -3.421                    0.06135
First                  -0.8349                   0.9533
Second                 -1.9296                   0.603
Third                  -4.8977                   <0.01

Figure 8 presents the plot of the third difference of the whole time series. An overall impression is that the data exhibit rather stable behavior up until the end. A noticeable point in the plot is the next to last observation, which might strike one as an “outlier”. Outliers tend to affect models as well as estimates and should therefore be attended to when proposing a forecast. However, since this outlier lies within the out-of-sample data, it will be regarded as a random shock that is yet unobserved and thus absorbed in the forecast error.

Figure 8: Third difference of the complete detrended time series using the logistic function as deterministic trend

An examination of Figures 9 and 10 allows us to determine the order of the AR(p) and the MA(q) component respectively. The PACF reveals that the appropriate order of the AR component is two, as it cuts off after lag 2. The lags within the confidence interval are presumed to be statistically insignificant and are therefore disregarded.

Although p = 2 seems a reasonable order for the autoregressive component, a concern might arise when observing lag number 11. As observed in the PACF, the lag is slightly outside the 95 percent confidence interval, which might imply a higher order. If the time series were provided on a monthly basis, one could suspect a seasonal effect with respect to lag 11. However, this observation seems to be a random event, as all the other lags are statistically insignificant and the deviation is not recurring. With this reasoning, an AR(2) is still a suitable order for the autoregressive component.


Figure 9: PACF correlogram for the detrended and integrated time series using the logistic function as deterministic trend

Figure 10: ACF correlogram for the detrended and integrated time series using the logistic function as deterministic trend

The ACF plot allows for a more straightforward conclusion. An inspection of Figure 10 suggests that the order of the MA component should be two, as the ACF cuts off after lag 2 with no deviation among the other lags.

In summary, the findings allow us to draw the conclusion that the appropriate model is an ARIMA(2,3,2).

6.2.2 Gompertz

The procedure of identifying the orders and thus determining the most suitable ARIMA model for the Gompertz case is nearly identical to the previous section.

Table 5 reveals that the detrended data first becomes weakly stationary at the third order of integration, similarly to the logistic case. The plot of the integrated data is given in Figure 11, and characteristics nearly identical to those in Figure 8 can be noticed. Once again, the next to last point gives the impression of being an “outlier”. As in the previous case, this observation will be treated as a random shock that is yet unobserved and thus absorbed in the forecast error.

Table 5: ADF test statistics for the detrended data using the Gompertz function as deterministic trend

Order of Integration   Dickey-Fuller Statistic   P-value
None                   -3.0202                   0.1628
First                  -2.6885                   0.2969
Second                 -2.9101                   0.2076
Third                  -4.8908                   <0.01

Figure 11: Third difference of the complete detrended time series using the Gompertz function as deterministic trend

The identification of the AR(p) and the MA(q) for the Gompertz case is once more accomplished through visual inspection of the PACF and ACF.

Starting with Figure 12, the order of the autoregressive component is, as in the logistic case, identified as two. Once again, lag 11 exhibits a deviation from the rest of the adjoining lags. Similar to the previous case, the deviation will be disregarded as it is considered to be a random occurrence.


Figure 12: PACF correlogram for the detrended and integrated time series using the Gompertz function as deterministic trend

An inspection of the ACF plot in Figure 13 reveals that the ACF cuts off after the second lag. Similar to Figure 10 in the previous section, there are no further lags exceeding the 95 percent confidence interval. This indicates that a suitable order of the moving average component is two, that is, MA(2).

Figure 13: ACF correlogram for the detrended and integrated time series using the Gompertz function as deterministic trend

To recapitulate the findings: an integration of order 3, an autoregressive process of order 2 and finally a moving average process of order 2. These results imply that an ARIMA(2,3,2) is an appropriate model for forecasting the accumulated patent filings with the Gompertz function as the deterministic trend.

It might seem strange that the suitable orders in the ARIMA models are equal in both cases (Logistic and Gompertz); however, the presented results suggest that this is the case. This outcome opens up the possibility to evaluate the importance of choosing a suitable deterministic trend when the forecast performance of the two models is analyzed in the following sections.


6.3 Forecast

The following tables, Table 6 to 9, present the forecast estimates for the two different forecast models. The first two columns after the year represent the long-run estimate and the short-run estimate respectively. The third column ($Y_t$) is the forecast, which is simply the sum of $S_t$ and $a_t$; recall (4.1) and (4.2):

$$\hat{Y}_t = S_t + a_t$$

and

$$Y_t = S_t + a_t + e_t$$

where $S_t$ is the value of the deterministic (long-run) trend and $a_t$ is the value of the short-run trend. Notice that the values of $S_t$ are the same for both the 1-step-ahead forecast and the 2-step-ahead forecast. This is due to the fact that the coefficient estimates for the long-run trend remain fixed over time, as described in section 4.2.1.

6.3.1 Logistic-ARIMA(2,3,2)

An example of how the values in the tables are calculated can be given using the first observation in Table 6. The forecast point estimate for the year 1999 is:

$$\hat{Y}_t = S_t + a_t = 12796.94 + 604.4266 = 13401.37$$

The half-width of the 95 percent confidence interval is simply $1.96 \times SE_{a_t}$. The same calculation procedure is applied for all the years in Table 6 to 9.
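The calculation can be reproduced in a few lines. The Python sketch below uses a hypothetical function name and the values for 1999 from Table 6. It assumes, as the formula above suggests, that the interval only reflects the standard error of the short-run (ARIMA) component and treats the long-run trend value as fixed.

```python
Z_95 = 1.96  # normal quantile used for the 95 percent interval


def combine_forecast(s_t, a_t, se_a):
    """Return the point forecast and the 95 percent interval limits."""
    point = s_t + a_t
    half_width = Z_95 * se_a
    return point, point - half_width, point + half_width


# Values for the year 1999 from Table 6 (Logistic, 1-step-ahead).
point, lower, upper = combine_forecast(12796.94, 604.4266, 37.31)
print(round(point, 2), round(lower, 2), round(upper, 2))
print(round(Z_95 * 37.31, 4))   # half-width reported in the CI column
```

The half-width reproduces the value in the CI (95%) column of Table 6 for 1999.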

Table 6: 1-Step ahead forecast using Logistic function as deterministic trend

Year   S_t         a_t        Y_t        SE(a_t)   CI (95%)
1999   12796.94    604.4266   13401.37   37.31     ±73.1276
2000   13091.59    1017.348   14108.94   37.543    ±73.58428
2001   13360.92    1385.104   14746.02   37.35     ±73.206
2002   13605.96    1889.769   15495.73   37.16     ±72.8336
2003   13827.98    2310.329   16138.31   37.53     ±73.5588
2004   14028.39    2926.62    16955.01   37.68     ±73.8528
2005   14208.667   4092.149   18300.82   50.68     ±99.3328
2006   14370.34    4581.41    18951.75   57.5327   ±112.7641
2007   14514.92    5105.65    19620.57   61.8468   ±121.22


Table 7: 2-Step ahead forecast using Logistic function as deterministic trend

Year   S_t         a_t        Y_t        SE(a_t)    CI (95%)
2000   13091.59    912.3020   14003.89   78.76      ±154.3696
2001   13360.92    1446.125   14807.05   81.59175   ±159.92
2002   13605.96    1835.911   15441.87   80.51971   ±157.82
2003   13827.98    2423.544   16251.52   79.93      ±156.66
2004   14028.39    2839.571   16867.96   79.78687   ±156.38
2005   14208.667   3549.133   17757.8    79.31      ±155.45
2006   14370.34    5085.510   19455.85   113.72     ±222.90
2007   14514.92    5519.158   20034.08   103.67     ±203.1883

6.3.2 Gompertz-ARIMA(2,3,2)

Table 8: 1-step-ahead forecast using Gompertz function as deterministic trend

| Year | St | at | Yt | SEat | CI (95%) |
|------|----|----|----|------|----------|
| 1999 | 13272.73 | 102.2185 | 13374.95 | 34.50 | ±67.63 |
| 2000 | 13747.41 | 322.0670 | 14069.48 | 35.66473 | ±69.90 |
| 2001 | 14215.17 | 487.4152 | 14702.59 | 35.41094 | ±69.41 |
| 2002 | 14675.37 | 803.4175 | 15478.79 | 36.40672 | ±71.357 |
| 2003 | 15127.40 | 987.8127 | 16115.21 | 36.15779 | ±70.869 |
| 2004 | 15570.77 | 1370.322 | 16941.09 | 36.84820 | ±72.22 |
| 2005 | 16005.02 | 2285.927 | 18290.95 | 50.90435 | ±99.77 |
| 2006 | 16429.77 | 2491.961 | 18921.73 | 56.86229 | ±111.45 |
| 2007 | 16844.70 | 2725.108 | 19569.81 | 59.81931 | ±117.2458 |

Table 9: 2-step-ahead forecast using Gompertz function as deterministic trend

| Year | St | at | Yt | SEat | CI (95%) |
|------|----|----|----|------|----------|
| 2000 | 13747.41 | 184.2758 | 13931.69 | 67.10031 | ±131.517 |
| 2001 | 14215.17 | 469.5215 | 14684.69 | 72.97280 | ±143.027 |
| 2002 | 14675.37 | 647.4016 | 15322.77 | 72.77466 | ±142.638 |
| 2003 | 15127.40 | 1079.5901 | 16206.99 | 76.39562 | ±149.735 |
| 2004 | 15570.77 | 1232.3943 | 16803.16 | 74.23783 | ±145.506 |
| 2005 | 16005.02 | 1715.083 | 17720.1 | 75.69794 | ±148.368 |
| 2006 | 16429.77 | 2993.831 | 19423.6 | 113.82248 | ±223.092 |
| 2007 | 16844.70 | 3128.884 | 19973.58 | 102.30873 | ±200.525 |

6.4 Model Validation

In this section, the forecast performance of the two proposed models is presented. The tables in the two following subsections present the actual observed value of accumulated patent filings together with the estimated 1-step-ahead and 2-step-ahead forecasts for each deterministic trend. The forecast errors are presented in the same tables and are used to calculate the different performance measurements.

The forecast errors are also used in the model diagnostics presented in Appendix A, which are given for each model and forecasting horizon. There is, however, a possible concern regarding the small number of forecast errors available when assessing the relative model performance and the model diagnostics. To address this concern, non-parametric bootstrapping methods have been used to simulate the distribution of the mean of the forecast errors. This is also given in Appendix A. The comparison of the two models is presented in section 6.4.3, using the forecast performance measurements presented earlier in section 5.6.3.

6.4.1 Logistic-ARIMA(2,3,2)

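Table 10: Forecast error of Logistic-ARIMA(2,3,2)

| Year | Yt | Yt,1-step (forecast) | et,1-step | Yt,2-step (forecast) | et,2-step |
|------|----|----------------------|-----------|----------------------|-----------|
| 1999 | 13454 | 13401.37* | 52.63 | | |
| 2000 | 14079 | 14108.94* | -29.94 | 14003.89* | 75.11 |
| 2001 | 14773 | 14746.02* | 26.98 | 14807.05* | -34.05 |
| 2002 | 15436 | 15495.73* | -59.73 | 15441.87* | -5.87 |
| 2003 | 16186 | 16138.31* | 47.69 | 16251.52* | -65.52 |
| 2004 | 17233 | 16955.01 | 277.99 | 16867.96 | 365.04 |
| 2005 | 17981 | 18300.82 | -319.82 | 17757.8 | 224 |
| 2006 | 18650 | 18951.75 | -301.75 | 19455.85 | -805.85 |
| 2007 | 19198 | 19620.57 | -422.57 | 20034.08 | -836.08 |

* = Yt within the CI (95%)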
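The asterisks in Table 10 can be checked mechanically. The following Python sketch (the thesis used R; the variable and function names are illustrative assumptions) flags the years in which the observed value lies inside the 95 percent interval of the 1-step-ahead Logistic-ARIMA(2,3,2) forecast, using the standard errors from Table 6.

```python
# 1-step-ahead Logistic-ARIMA(2,3,2) values transcribed from Tables 6 and 10
years = [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007]
observed = [13454, 14079, 14773, 15436, 16186, 17233, 17981, 18650, 19198]
forecast = [13401.37, 14108.94, 14746.02, 15495.73, 16138.31,
            16955.01, 18300.82, 18951.75, 19620.57]
std_error = [37.31, 37.543, 37.35, 37.16, 37.53,
             37.68, 50.68, 57.5327, 61.8468]


def within_interval(y, y_hat, se, z=1.96):
    """True if the observation lies inside y_hat +/- z * se."""
    return abs(y - y_hat) <= z * se


# Years whose observed value falls inside the forecast interval
covered = [year
           for year, y, y_hat, se in zip(years, observed, forecast, std_error)
           if within_interval(y, y_hat, se)]
print(covered)
```

With these inputs only the years 1999 to 2003 are covered, which agrees with the asterisks in the 1-step column of the table.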

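$$\mathrm{MSPE}_{\mathrm{Logistic},\,1\text{-step}} = \frac{\sum_{i=1}^{N} e_{i,\,1\text{-step}}^{2}}{N} = \frac{459418}{9} = 51046.44$$

$$\mathrm{MSPE}_{\mathrm{Logistic},\,2\text{-step}} = \frac{\sum_{i=1}^{N} e_{i,\,2\text{-step}}^{2}}{N} = \frac{1542982}{8} = 192872.8$$

$$\mathrm{MAE}_{\mathrm{Logistic},\,1\text{-step}} = \frac{\sum_{i=1}^{9} |e_{i,\,1\text{-step}}|}{N} = \frac{1539.1}{9} = 171.0111$$

$$\mathrm{MAE}_{\mathrm{Logistic},\,2\text{-step}} = \frac{\sum_{i=1}^{8} |e_{i,\,2\text{-step}}|}{N} = \frac{2411.52}{8} = 301.44$$

$$\mathrm{RMSE}_{\mathrm{Logistic},\,1\text{-step}} = \sqrt{\mathrm{MSPE}_{\mathrm{Logistic},\,1\text{-step}}} = \sqrt{51046.44} = 225.9346$$

$$\mathrm{RMSE}_{\mathrm{Logistic},\,2\text{-step}} = \sqrt{\mathrm{MSPE}_{\mathrm{Logistic},\,2\text{-step}}} = \sqrt{192872.8} = 439.1729$$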


6.4.2 Gompertz-ARIMA(2,3,2)

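Table 11: Forecast error of Gompertz-ARIMA(2,3,2)

| Year | Yt | Yt,1-step (forecast) | ϵt,1-step | Yt,2-step (forecast) | ϵt,2-step |
|------|----|----------------------|-----------|----------------------|-----------|
| 1999 | 13454 | 13374.95 | 79.05 | | |
| 2000 | 14079 | 14069.48* | 9.52 | 13931.69 | 147.31 |
| 2001 | 14773 | 14702.59 | 70.41 | 14684.69* | 88.31 |
| 2002 | 15436 | 15478.79* | -42.79 | 15322.77* | 113.23 |
| 2003 | 16186 | 16115.21* | 70.79 | 16206.99* | -20.99 |
| 2004 | 17233 | 16941.09 | 291.91 | 16803.16 | 429.84 |
| 2005 | 17981 | 18290.95 | -309.95 | 17720.1 | 260.9 |
| 2006 | 18650 | 18921.73 | -271.73 | 19423.6 | -773.6 |
| 2007 | 19198 | 19569.81 | -371.81 | 19973.58 | -775.58 |

* = Yt within the CI (95%)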

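$$\mathrm{MSPE}_{\mathrm{Gompertz},\,1\text{-step}} = \frac{\sum_{i=1}^{N} \epsilon_{i,\,1\text{-step}}^{2}}{N} = \frac{411499.6}{9} = 45722.18$$

$$\mathrm{MSPE}_{\mathrm{Gompertz},\,2\text{-step}} = \frac{\sum_{i=1}^{N} \epsilon_{i,\,2\text{-step}}^{2}}{N} = \frac{1495573}{8} = 186946.6$$

$$\mathrm{MAE}_{\mathrm{Gompertz},\,1\text{-step}} = \frac{\sum_{i=1}^{9} |\epsilon_{i,\,1\text{-step}}|}{N} = \frac{1517.96}{9} = 168.6622$$

$$\mathrm{MAE}_{\mathrm{Gompertz},\,2\text{-step}} = \frac{\sum_{i=1}^{8} |\epsilon_{i,\,2\text{-step}}|}{N} = \frac{2567.78}{8} = 320.9725$$

$$\mathrm{RMSE}_{\mathrm{Gompertz},\,1\text{-step}} = \sqrt{\mathrm{MSPE}_{\mathrm{Gompertz},\,1\text{-step}}} = \sqrt{45722.18} = 213.8275$$

$$\mathrm{RMSE}_{\mathrm{Gompertz},\,2\text{-step}} = \sqrt{\mathrm{MSPE}_{\mathrm{Gompertz},\,2\text{-step}}} = \sqrt{186946.6} = 432.3732$$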
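The performance measurements can be recomputed from the forecast errors in Tables 10 and 11. The Python sketch below is only an illustration (the thesis used R, and the function name `forecast_metrics` is an assumption). Recomputing from the rounded table entries reproduces the reported figures, with one apparent exception: the absolute 2-step-ahead Gompertz errors sum to 2609.76, which gives an MAE of about 326.22 rather than 320.9725. The reported numerator 2567.78 is what results if the 2003 error of -20.99 is added with its sign. Either way, the Logistic model keeps the lower 2-step-ahead MAE.

```python
import math


def forecast_metrics(errors):
    """Return MSPE, MAE and RMSE for a list of forecast errors."""
    n = len(errors)
    mspe = sum(e ** 2 for e in errors) / n   # mean squared prediction error
    mae = sum(abs(e) for e in errors) / n    # mean absolute error
    return {"MSPE": mspe, "MAE": mae, "RMSE": math.sqrt(mspe)}


# Forecast errors transcribed from Tables 10 and 11.
# The 2006 1-step Gompertz error is taken as 18650 - 18921.73 = -271.73.
errors = {
    "Logistic 1-step": [52.63, -29.94, 26.98, -59.73, 47.69,
                        277.99, -319.82, -301.75, -422.57],
    "Logistic 2-step": [75.11, -34.05, -5.87, -65.52,
                        365.04, 224, -805.85, -836.08],
    "Gompertz 1-step": [79.05, 9.52, 70.41, -42.79, 70.79,
                        291.91, -309.95, -271.73, -371.81],
    "Gompertz 2-step": [147.31, 88.31, 113.23, -20.99,
                        429.84, 260.9, -773.6, -775.58],
}

metrics = {name: forecast_metrics(e) for name, e in errors.items()}
for name, m in metrics.items():
    print(name, {k: round(v, 4) for k, v in m.items()})
```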

6.4.3 Comparison

The performance measurements for each forecast horizon and model are given in Table 12. According to the table, the Gompertz-ARIMA(2,3,2) model outperforms the Logistic-ARIMA(2,3,2) model on all performance measurements for the 1-step-ahead forecast. The same holds for the 2-step-ahead forecast, with the exception of the MAE. However, the MSPE and RMSE penalize large errors more heavily than the MAE does. In other words, although the 2-step-ahead MAE is lower for the Logistic-ARIMA(2,3,2) model than for the Gompertz-ARIMA(2,3,2) model, the point estimation error of the Gompertz-ARIMA(2,3,2) model is smaller once large errors are given more weight.

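Table 12: Forecast Performance Measurements

| Measurement | Logistic 1-step | Logistic 2-step | Gompertz 1-step | Gompertz 2-step |
|-------------|-----------------|-----------------|-----------------|-----------------|
| MSPE | 51046.44 | 192872.8 | 45722.18 | 186946.6 |
| MAE | 171.0111 | 301.44 | 168.6622 | 320.9725 |
| RMSE | 225.9346 | 439.1729 | 213.8275 | 432.3732 |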

The following skill score equations are used to place the model performances in relation to each other. One could use any of the performance measurements given in Table 12 in order to determine the relative performance. However, as the interest is to evaluate the overall performance, the MSPE is used so that large errors are penalized.

Regarding the 1-step-ahead skill score, it can be concluded that the Gompertz-ARIMA(2,3,2) model outperforms the Logistic-ARIMA(2,3,2) model by a relative score of 11.6 percent. The relative improvement in the 2-step-ahead case is considerably smaller: a value of -0.032 suggests that the Gompertz-ARIMA(2,3,2) model performs 3.2 percent better than the other model. Because the Gompertz MSPE is in the denominator, a negative score means that the Logistic model has the larger MSPE.

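$$SS_{1\text{-step}} = 1 - \frac{\mathrm{MSPE}_{\mathrm{Logistic},\,1\text{-step}}}{\mathrm{MSPE}_{\mathrm{Gompertz},\,1\text{-step}}} = 1 - \frac{51046.44}{45722.18} \approx -0.116$$

$$SS_{2\text{-step}} = 1 - \frac{\mathrm{MSPE}_{\mathrm{Logistic},\,2\text{-step}}}{\mathrm{MSPE}_{\mathrm{Gompertz},\,2\text{-step}}} = 1 - \frac{192872.8}{186946.6} \approx -0.032$$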
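As a quick check of the two skill scores, the ratio can be evaluated directly from the MSPE values in Table 12. This is a minimal Python sketch (the thesis used R, and the function name `skill_score` is an assumption made for illustration).

```python
def skill_score(mspe_model, mspe_reference):
    """Skill score 1 - MSPE_model / MSPE_reference.

    A negative value means the model has a larger MSPE than the reference.
    """
    return 1.0 - mspe_model / mspe_reference


# Logistic model scored against the Gompertz model (values from Table 12)
ss_1step = skill_score(51046.44, 45722.18)
ss_2step = skill_score(192872.8, 186946.6)
print(round(ss_1step, 3), round(ss_2step, 3))
```

Both scores are negative, so the Logistic MSPE exceeds the Gompertz MSPE at both horizons, by roughly 11.6 percent and 3.2 percent respectively.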

Although the results imply that the Gompertz-ARIMA(2,3,2) model performs better than the Logistic-ARIMA(2,3,2) model, one has to keep in mind that the out-of-sample data set is quite small. Given this, a firm conclusion cannot be drawn.

To overcome the comparison difficulties caused by the small out-of-sample data set, a non-parametric bootstrapping method has been used to simulate the distribution of the mean of the forecast errors for each model (given in Appendix A). One has to bear in mind, however, that the simulated distribution will naturally resemble a normal distribution, because the quantity being bootstrapped is the mean of the re-sampled forecast errors and the central limit theorem applies. Analyzing Figures 15 and 19 (the bootstrapping results for the 1-step-ahead forecast of each model), it can be observed that the values for the Logistic-ARIMA(2,3,2) model are centered around a value below zero (Figure 15). This implies that the Logistic-ARIMA(2,3,2) model tends to overestimate, which indicates the presence of a systematic bias. In contrast, the forecast errors of the Gompertz-ARIMA(2,3,2) model appear to be centered around zero. The interpretation is thus that the Gompertz-ARIMA(2,3,2) model tends to be more accurate in terms of point estimation than the Logistic-ARIMA(2,3,2) model.
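The bootstrapping idea can be illustrated with a short Python sketch. It resamples the 1-step-ahead Logistic-ARIMA(2,3,2) forecast errors from Table 10 with replacement and records the mean of each resample, using the 5000 replicates stated in Appendix A. The thesis used R, and its exact resampling code is not shown, so the function name, the seed and the summary statistics below are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_means(errors, n_replicates=5000, seed=0):
    """Non-parametric bootstrap of the mean forecast error.

    Each replicate draws len(errors) values with replacement
    and stores the mean of that resample.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(errors)
    return [statistics.fmean(rng.choices(errors, k=n))
            for _ in range(n_replicates)]


# 1-step-ahead Logistic-ARIMA(2,3,2) forecast errors from Table 10
logistic_1step = [52.63, -29.94, 26.98, -59.73, 47.69,
                  277.99, -319.82, -301.75, -422.57]

means = bootstrap_means(logistic_1step)
centre = statistics.fmean(means)                           # centre of the simulated distribution
share_negative = sum(m < 0 for m in means) / len(means)    # share of resample means below zero
print(round(centre, 1), round(share_negative, 2))
```

The sample mean of these nine errors is about -80.9, so the simulated distribution is centered below zero, in line with the overestimation described above.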

The overall impression from the model diagnostics in Appendix A is that both models appear suitable for forecasting the accumulated patent filings. The Q-Q plots in the appendix suggest that the residuals might follow a normal distribution with minor deviations. However, the limited number of forecast errors makes it difficult to draw a firm conclusion about the error distribution. Regarding the forecast performance measurements, the Gompertz-ARIMA(2,3,2) model seems to be the relatively better model for the case of “heating using microwave” technology. The bootstrapping results also favor the Gompertz-ARIMA(2,3,2) model, as its simulated distribution is centered around zero, whereas the results indicate a systematic bias for the Logistic-ARIMA(2,3,2) model.


7 Discussion and Conclusion

Before the discussion, the purpose and limitations of forecasting techniques have to be stated once again. A forecasting technique gives the user useful insight into potential outcomes in subsequent time periods, which can help when planning for the future. Forecasting techniques cannot, however, pinpoint the exact outcome with certainty, and the estimates should therefore be treated carefully. That said, a forecast estimate is still a guideline for decision-makers and should be used as an additional source of information that complements an expert’s input.

A review of the results presented in the previous section enables us to evaluate the proposed methodology as well as the models. Section 6.1 provides the information needed to determine the appropriate underlying long-run trend for the “heating using microwave” technology. The three proposed S-curves exhibit quite similar characteristics, with the exception of the A parameter. In all cases, the parameter is either underestimated or overestimated with respect to the actual observed maximum value. It was expected that the estimated S-curves would deviate from the actual values, but not by the magnitude obtained. Despite the deviation, the idea is to use the S-curves as the deterministic trend in the first step of the suggested “2-point estimation” method.

The final proposed models for the case study are a Logistic-ARIMA(2,3,2) and a Gompertz-ARIMA(2,3,2) model. Both models appear to be relatively good for the purpose of forecasting, although they differ in forecast performance. The model diagnostics and the bootstrapping technique, however, favor the latter model with regard to the assumptions and the goodness-of-fit criterion. It has to be stated, though, that the number of forecast errors is small, and it is therefore difficult to draw firm conclusions.

Comparing these results to the previous models mentioned in section 3, it becomes clear that the suggested models perform rather well as an extension to the previous methods. The methodology and models used in this thesis appear to be valid with regard to the underlying assumptions of time series⁴ when a proper long-run trend is chosen. However, the small sample of forecast errors once again makes the conclusions uncertain. The results presented in the previous section also show the importance of choosing a suitable deterministic trend. As described earlier, the product life cycle and the associated patent activity might behave differently for each technology, so insight into the technology at hand is essential for validating the chosen long-run trend. It all comes down to how the technology might progress, knowledge usually possessed by experts within the field. There is therefore no exact answer to the question of which deterministic trend outperforms the others; it depends on the progression of the technology.

Considering the limitations regarding the source of information used and the number of observations provided by the data, the proposed methodology and models still managed to forecast rather well. A preferable framework would use monthly instead of yearly data; this would increase the in-sample and out-of-sample data, which would be useful when estimating and validating the models and methodology. With this reasoning, it can be concluded that the proposed methodology is a suitable approach when forecasting patent activity, using the CPC-classed H05B6/64 technology as a case study. The methodology might therefore be seen as an extension to previous methods. Although the results suggest that the methodology is suitable, further performance evaluations using other technologies as the case study are recommended. It is also important to point out once again the need to choose a proper deterministic (long-run) trend when conducting these performance evaluations.

⁴ With the exception of the overestimation of the Logistic-ARIMA(2,3,2) model.

As discussed in the critical viewpoint, section 4.4, the proposed methodology has limitations. The approach in this thesis can be categorized as an “automatic” procedure, as no consideration is given to an expert’s knowledge. A potential development of the proposed methodology would therefore be to incorporate an expert’s input when estimating the deterministic (long-run) trend. This would open up the possibility of adapting the underlying structure to several different technologies and thereby possibly increasing the forecasting performance.


Appendix A

The figures in this appendix present the model diagnostics for the Logistic-ARIMA(2,3,2) and the Gompertz-ARIMA(2,3,2) models. Figures 15, 17, 19 and 21 show the results of the bootstrapping procedure, which was carried out to simulate the distribution of the mean of the forecast errors. The bootstrapping method used is non-parametric, and the procedure is replicated 5000 times.

Logistic-ARIMA(2,3,2)

1-step-ahead diagnostics

Figure 14: Model diagnostics for 1-step-ahead forecast using Logistic-ARIMA(2,3,2)

Figure 15: Result of bootstrapping procedure for 1-step-ahead forecast using Logistic-ARIMA(2,3,2)

2-step-ahead diagnostics

Figure 16: Model diagnostics for 2-step-ahead forecast using Logistic-ARIMA(2,3,2)

Figure 17: Result of bootstrapping procedure for 2-step-ahead forecast using Logistic-ARIMA(2,3,2)

Gompertz-ARIMA(2,3,2)

1-step-ahead diagnostics

Figure 18: Model diagnostics for 1-step-ahead forecast using Gompertz-ARIMA(2,3,2)

Figure 19: Result of bootstrapping procedure for 1-step-ahead forecast using Gompertz-ARIMA(2,3,2)

2-step-ahead diagnostics

Figure 20: Model diagnostics for 2-step-ahead forecast using Gompertz-ARIMA(2,3,2)

Figure 21: Result of bootstrapping procedure for 2-step-ahead forecast using Gompertz-ARIMA(2,3,2)

Appendix B

The following appendix is an extract from the script file used with the statistical software “R”. It should be noted that the extract is not contiguous, even though the lines are listed without breaks. The packages and software used for the script are described in section 4.3.


Script_Extract.R
