Nano science and technology dynamics database
Lionel Villard* – Michel Revollo* – Aurélie Delemarle** – Bernard Kahane* – Andrei Mogoutov** – Philippe Larédo**
* ESIEE, IFRIS
** Ecole des Ponts, IFRIS
*** Aguidel, Paris
EXAMPLES OF RESULTS OBTAINED
(1) The treatment of patents owned by large firms (cf CIB for a list) highlights a unique pattern: the presence of nano patents in all sectors. This led us to suggest that we face a new type of "General Purpose Technology" (cf Bresnahan & Trajtenberg 1995), not based on one industry but transforming the R&D activities of all existing industries. Larédo P. et al., 2010, Dynamics of nanosciences and technologies: Policy implications, STI Policy Review 1, 43-62.
(2) The analysis of strong collaborations between clusters highlights three features: (i) weak intercontinental linkages, (ii) a very hierarchical structure of clusters in the US around 5 poles, and (iii) a very dense collaboration network in Europe, with more inter-country than intra-country strong linkages. Larédo P. & Villard L., 2013, Presentation, Fondation Jean Monnet pour l'Europe, Lausanne, 4 October, in press.
INFORMATION ON SUBSTANTIVE CONTENT OF THE NANO DATABASE
Data sources:
- Patent data: Patstat IFRIS database, built on the October 2009 version of Patstat provided by the EPO and complemented by the REGPAT database (OECD) and INPI data (France).
- Publication data: Web of Science (SCI Expanded, SSCI, A&HCI, Conference Proceedings (CPCI-S & CPCI-SSH)).
Data processing: The first step is to download data using the query developed. The seed is based on the prefix 'nano' (with classical exceptions like nanoliter), generating 517050 publications. Semantic analyses of keywords are run on this seed to identify the internal specificity of keywords, which are then checked for their external specificity (on the whole WoS). Articles are then extracted on the basis of the vocabulary selected. One vocabulary covers the whole period (1991-2010), giving the 'static' extension; a year-by-year analysis builds the 'dynamic' part of the dataset (see graph above). The same is done for patents.
Three enrichments have been made:
- Affiliations: a categorisation into 5 types has been produced, and a triple harmonisation process has reduced the number of different organisations by 60% (from 97194, for 2.2 million addresses, to 39765).
- Geolocalisation: a two-step process has enabled a 99% geolocalisation rate on filled data (however, a large number of patents do not have any information on inventors' addresses, and this requires further development).
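The seed step described under data processing can be sketched as follows. This is a minimal illustration, not the query IFRIS actually ran. The source names only 'nanoliter' as an exception, so the other excluded terms are assumptions. The `keyword_specificity` ratio is one hypothetical way to compare a keyword's presence in the seed with its presence in the whole corpus; the poster does not give the formula used.

```python
import re
from collections import Counter

# Assumption: only 'nanoliter' is named in the source; the other unit
# words are illustrative guesses at the "classical exceptions".
EXCLUDED_PREFIXES = ("nanoliter", "nanolitre", "nanosecond", "nanomolar", "nanogram")

# Any word starting with 'nano' (case-insensitive).
NANO_TERM = re.compile(r"\bnano[a-z0-9-]*", re.IGNORECASE)


def nano_terms(text):
    """Return the 'nano'-prefixed terms in text, minus the excluded unit words."""
    terms = [m.group(0).lower() for m in NANO_TERM.finditer(text)]
    return [t for t in terms if not t.startswith(EXCLUDED_PREFIXES)]


def in_seed(text):
    """A record joins the seed if at least one non-excluded 'nano' term remains."""
    return bool(nano_terms(text))


def keyword_specificity(seed_docs, all_docs):
    """Hypothetical specificity score: the share of a keyword's documents
    that fall inside the seed. Each document is a list of keywords."""
    seed_counts = Counter(k for doc in seed_docs for k in set(doc))
    all_counts = Counter(k for doc in all_docs for k in set(doc))
    return {k: seed_counts[k] / all_counts[k] for k in seed_counts if all_counts[k]}
```

Under this sketch, a keyword with a score close to 1 appears almost only in seed records and would be a candidate for the extraction vocabulary, while a keyword with a low score is common outside the seed and would be dropped.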
INFORMATION ON THE DATABASE SYSTEM
The current database system is MySQL.
LEGAL ISSUES ENCOUNTERED AND ACCESS CONDITIONS
The dataset is accessible for research purposes only (no commercial use is authorised). Only aggregated data can be published in public reports and/or academic journals. Users need to belong to an institution that subscribes to both the Web of Science and Patstat.
BASIC CHARACTERISTICS
The Nano S&T dynamics database (Nano) developed by IFRIS collects publications and patents about nano S&T between 1991 and 2010. A central characteristic of emerging S&T is that it does not correspond to pre-existing categorisations and requires the elaboration of semantic-based queries. IFRIS has developed a dynamic query gathering 1.18 million publications and 735000 priority patents. Four types of enrichment have been carried out, dealing with: (i) categorisation and harmonisation of institutional affiliations; (ii) geolocalisation of all authors and inventors; (iii) geographical clustering of S&T activities; and (iv) thematic clustering of S&T activities. The database offers 14 'main units of observation' for each publication and 11 for each patent.
- Clusterisation: 2 clusterisations have been developed. The first one is based on central cities (with more than 1000 publications) and uses geographical distance to build clusters: it has identified 203 clusters covering 75% of publications (see figure). The second one addresses the problems encountered, with a new methodology using DBSCAN and Chameleon.
Information on all variables/indicators:
- For each publication, 14 'main units of observation': 10 classical units directly derived from the WoS and 4 units corresponding to the enrichments made (harmonised institution, type of institution, geographical coordinates, cluster).
- For each patent, 11 'main units of observation': 6 classical units derived from Patstat, the Inpadoc family, and the geographical coordinates and cluster of applicants and of inventors.
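The first clusterisation (central cities plus geographical distance) can be illustrated with a short sketch. The 1000-publication threshold comes from the text. The 50 km radius, the nearest-centre rule and the function names are assumptions made for illustration, since the poster does not describe the actual procedure. The sketch does not merge neighbouring central cities, and it does not cover the second approach based on DBSCAN and Chameleon.

```python
from math import asin, cos, radians, sin, sqrt

MIN_PUBLICATIONS = 1000  # threshold for a central city, from the text
MAX_DISTANCE_KM = 50.0   # assumption: the source gives no radius
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def assign_clusters(cities):
    """Map each city to its nearest central city within MAX_DISTANCE_KM.

    cities: dict of name -> (lat, lon, n_publications).
    Returns a dict of name -> central city name, or None if unclustered.
    """
    centres = {n: c for n, c in cities.items() if c[2] > MIN_PUBLICATIONS}
    assignment = {}
    for name, (lat, lon, _) in cities.items():
        best, best_d = None, MAX_DISTANCE_KM
        for cname, (clat, clon, _) in centres.items():
            d = haversine_km((lat, lon), (clat, clon))
            if d <= best_d:
                best, best_d = cname, d
        assignment[name] = best
    return assignment


def coverage(cities, assignment):
    """Share of all publications located in cities assigned to a cluster."""
    total = sum(c[2] for c in cities.values())
    covered = sum(c[2] for n, c in cities.items() if assignment[n] is not None)
    return covered / total
```

The `coverage` function corresponds to the kind of figure quoted in the text (the share of publications that fall inside clusters); the value it returns depends entirely on the chosen radius.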
Firms of the DTI scoreboard

Sector                               Total   Nano    %
Electronic & electrical equipment      103     70   68%
Technology hardware & equipment        226    150   66%
Chemicals                               96     84   88%
Pharmaceuticals & biotechnology        153     73   48%
Health care equipment & services        53     39   74%
Automobiles & transport                 86     59   69%
Aerospace & defence                     35     24   69%
Materials & construction                55     42   76%
Oil, gas & electricity                  53     39   74%
Food producers (inc. beverages)         32     16   50%
General industrials                     38     24   63%
Household & personal goods              40     21   53%
Industrial engineering                  70     35   50%
Telecom & media                         32     14   44%
Software & computer services           110     14   13%
Banks, insurance, retail, leisure       49      6   12%
Total                                 1231    710   58%
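The percentage column can be recomputed from the two count columns. The short check below does this for every sector, assuming the "%" column is the nano count divided by the total count, rounded to the nearest integer with halves rounded up. That rounding rule is an assumption; the source does not state it.

```python
# (sector, total firms, nano firms, printed percentage), copied from the table
ROWS = [
    ("Electronic & electrical equipment", 103, 70, 68),
    ("Technology hardware & equipment", 226, 150, 66),
    ("Chemicals", 96, 84, 88),
    ("Pharmaceuticals & biotechnology", 153, 73, 48),
    ("Health care equipment & services", 53, 39, 74),
    ("Automobiles & transport", 86, 59, 69),
    ("Aerospace & defence", 35, 24, 69),
    ("Materials & construction", 55, 42, 76),
    ("Oil, gas & electricity", 53, 39, 74),
    ("Food producers (inc. beverages)", 32, 16, 50),
    ("General industrials", 38, 24, 63),
    ("Household & personal goods", 40, 21, 53),
    ("Industrial engineering", 70, 35, 50),
    ("Telecom & media", 32, 14, 44),
    ("Software & computer services", 110, 14, 13),
    ("Banks, insurance, retail, leisure", 49, 6, 12),
]


def share_percent(nano, total):
    """Integer percentage with halves rounded up (e.g. 52.5 -> 53).

    Integer arithmetic avoids Python's round(), which rounds halves to even.
    """
    return (200 * nano + total) // (2 * total)


def check_table(rows):
    """Return sectors whose printed percentage differs, plus column sums."""
    mismatches = [name for name, total, nano, pct in rows
                  if share_percent(nano, total) != pct]
    total_firms = sum(row[1] for row in rows)
    nano_firms = sum(row[2] for row in rows)
    return mismatches, total_firms, nano_firms
```

Calling `check_table(ROWS)` returns an empty mismatch list together with the column sums 1231 and 710, which agree with the table's total row.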