Source: risis.eu/wp-content/uploads/2014/08/IFRIS-Nano-St-dynamics-poster.pdf (RISIS)

Nano science and technology dynamics database

Lionel Villard*, Michel Revollo*, Aurélie Delemarle**, Bernard Kahane*, Andrei Mogoutov**, Philippe Larédo**
* ESIEE, IFRIS   ** Ecole des Ponts, IFRIS   *** Aguidel, Paris

ILLUSTRATIONS OF RESULTS OBTAINED
EXAMPLES OF RESULTS OBTAINED

(1) The processing of patents owned by large firms (cf CIB for a list) highlights a unique pattern: the presence of nano patents in all sectors. This led us to suggest that we face a new type of “General Purpose Technology” (cf Bresnahan & Trajtenberg 1995), one that is not based on an industry but transforms the R&D activities of all existing industries.
Larédo P. et al., 2010, Dynamics of nanosciences and technologies: Policy implications, STI Policy Review 1, 43-62.

(2) The analysis of strong collaborations between clusters highlights three features: (a) weak intercontinental linkages, (b) a very hierarchical structure of clusters in the US around 5 poles, and (c) a very dense collaboration network in Europe, with more inter-country than intra-country strong linkages.
Larédo P. & Villard L., 2013, Presentation, Fondation Jean Monnet pour l’Europe, Lausanne, 4 October, in press.

INFORMATION ON SUBSTANTIVE CONTENT OF THE NANO DATABASE

Data sources:
- Patent data: Patstat IFRIS database, built on the October 2009 version of Patstat provided by the EPO and complemented by the REGPAT database (OECD) and INPI (France).
- Publication data: Web of Science (SCI Expanded, SSCI, A&HCI, and the conference proceedings indexes CPCI-S and CPCI-SSH).

Data processing: The first step is to download data using the query developed. The seed is based on the prefix ‘nano’ (with classical exceptions such as nanoliter) and yields 517,050 publications. Semantic analyses of keywords are run on this seed to identify keywords with high internal specificity, which are then checked for external specificity (against the whole WoS). Articles are then extracted on the basis of the selected vocabulary. One vocabulary covers the whole period (1991-2010) and gives the ‘static’ extension; a year-by-year analysis builds the ‘dynamic’ part of the dataset (see graph above). The same is done for patents.

Three enrichments have been made:
- Affiliations: organisations have been categorised into 5 types, and a triple harmonisation process has reduced the number of distinct organisations by 60% (from 97,194, for 2.2 million addresses, to 39,765).
- Geolocalisation: a two-step process has achieved a 99% geolocalisation rate on filled data (however, a large number of patents carry no information on inventors’ addresses, and this requires further development).
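The seed selection and keyword specificity check described under "Data processing" can be sketched as follows. This is a minimal illustration under stated assumptions, not the IFRIS implementation: the exclusion list (beyond "nanoliter", the only exception the poster names), the specificity ratio, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical exclusion list: the poster names only "nanoliter" as an example.
EXCLUDED_PREFIXES = ("nanoliter", "nanolitre", "nanosecond", "nanomolar")

# Any word starting with "nano" (case-insensitive).
NANO_TERM = re.compile(r"\bnano\w*", re.IGNORECASE)


def is_seed_record(text):
    """Return True if the text has a 'nano*' term that is not an excluded unit."""
    for term in NANO_TERM.findall(text):
        if not term.lower().startswith(EXCLUDED_PREFIXES):
            return True
    return False


def keyword_specificity(seed_keywords, corpus_keywords):
    """Share of each keyword's corpus-wide records that belong to the seed.

    Both arguments are lists of keyword lists (one inner list per record);
    the corpus is assumed to include the seed records. A ratio near 1 means
    the keyword is almost exclusive to the seed (assumed scoring rule).
    """
    seed_counts = Counter(k for kws in seed_keywords for k in set(kws))
    corpus_counts = Counter(k for kws in corpus_keywords for k in set(kws))
    return {k: seed_counts[k] / corpus_counts[k]
            for k in seed_counts if corpus_counts[k] > 0}
```

Keywords scoring above a chosen threshold would then form the vocabulary used to extract further articles; the poster does not state the threshold, so it is left out here.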

INFORMATION ON THE DATABASE SYSTEM

The current database system is MySQL.

LEGAL ISSUES ENCOUNTERED AND ACCESS CONDITIONS

The dataset is accessible for research purposes only (no commercial use is authorised). Only aggregated data can be published in public reports and/or academic journals. Users need to belong to an institution that subscribes to both the Web of Science and Patstat.

BASIC CHARACTERISTICS

The Nano S&T dynamics database (Nano) developed by IFRIS collects publications and patents about nano S&T between 1991 and 2010. One central characteristic of emerging S&T fields is that they do not correspond to pre-existing categorisations and therefore require semantic-based queries. IFRIS has developed a dynamic query gathering 1.18 million publications and 735,000 priority patents. Four types of enrichment have been organised, dealing with: (i) categorisation and harmonisation of institutional affiliations; (ii) geolocalisation of all authors and inventors; (iii) geographical clustering of S&T activities; and (iv) thematic clustering of S&T activities. It offers 14 ‘main units of observation’ for each publication and 11 for each patent.

- Clusterisation: 2 clusterisations have been developed. The first is based on central cities (those with more than 1000 publications) and uses geographical distance to build clusters: it has identified 203 clusters covering 75% of publications (see figure). The second addresses the problems encountered, using a new methodology based on DBSCAN and Chameleon.

Information on all variables/indicators:
- For each publication, 14 ‘main units of observation’: 10 classical units directly derived from the WoS and 4 units corresponding to the enrichments made (harmonised institution, type of institution, geographical coordinates, cluster).
- For each patent, 11 ‘main units of observation’: 6 classical units derived from Patstat, the Inpadoc family, and the geographical coordinates and cluster of applicants and of inventors.
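The first, city-based clusterisation can be illustrated with a short sketch. It assumes that an address joins the nearest central city within a fixed radius; the 50 km radius, the data layout, and the function names are hypothetical, since the poster states only that central cities (more than 1000 publications) and geographical distance are used.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def assign_clusters(addresses, city_counts, city_coords,
                    min_publications=1000, max_km=50.0):
    """Attach each address to the nearest central city within max_km.

    addresses:   {address_id: (lat, lon)}
    city_counts: {city: number of publications}
    city_coords: {city: (lat, lon)}
    max_km is an assumed radius, not a value taken from the poster.
    Returns {address_id: city name, or None if no central city is in range}.
    """
    # Central cities are those with more than min_publications publications.
    centres = {c: city_coords[c] for c, n in city_counts.items()
               if n > min_publications}
    result = {}
    for addr_id, point in addresses.items():
        best, best_d = None, max_km
        for city, centre in centres.items():
            d = haversine_km(point, centre)
            if d <= best_d:
                best, best_d = city, d
        result[addr_id] = best
    return result
```

Addresses left unassigned (value `None`) correspond to the publications that fall outside any cluster. A density-based method such as DBSCAN avoids fixing the centres in advance, which may be one reason for the second clusterisation.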

Firms in the DTI scoreboard         Total   Nano      %
Electronic & electrical equipment     103     70    68%
Technology hardware & equipment       226    150    66%
Chemicals                              96     84    88%
Pharmaceuticals & biotechnology       153     73    48%
Health care equipment & services       53     39    74%
Automobiles & transport                86     59    69%
Aerospace & defence                    35     24    69%
Materials & construction               55     42    76%
Oil, gas & electricity                 53     39    74%
Food producers (inc. beverages)        32     16    50%
General industrials                    38     24    63%
Household & personal goods             40     21    53%
Industrial engineering                 70     35    50%
Telecom & media                        32     14    44%
Software & computer services          110     14    13%
Banks, insurance, retail, leisure      49      6    12%
Total                                1231    710    58%
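The percentage column of the table above can be reproduced from the two count columns. The sketch below assumes the share is nano / total rounded half up to a whole percent, which matches every row; the names `ROWS` and `nano_share` are illustrative.

```python
# (sector, total firms, firms counted under "nano"), copied from the table.
ROWS = [
    ("Electronic & electrical equipment", 103, 70),
    ("Technology hardware & equipment", 226, 150),
    ("Chemicals", 96, 84),
    ("Pharmaceuticals & biotechnology", 153, 73),
    ("Health care equipment & services", 53, 39),
    ("Automobiles & transport", 86, 59),
    ("Aerospace & defence", 35, 24),
    ("Materials & construction", 55, 42),
    ("Oil, gas & electricity", 53, 39),
    ("Food producers (inc. beverages)", 32, 16),
    ("General industrials", 38, 24),
    ("Household & personal goods", 40, 21),
    ("Industrial engineering", 70, 35),
    ("Telecom & media", 32, 14),
    ("Software & computer services", 110, 14),
    ("Banks, insurance, retail, leisure", 49, 6),
]


def nano_share(total, nano):
    """Percentage of nano firms, rounded half up using integer arithmetic.

    Python's built-in round() rounds halves to the nearest even number,
    which would turn 52.5 into 52 rather than 53.
    """
    return (200 * nano + total) // (2 * total)


shares = {name: nano_share(total, nano) for name, total, nano in ROWS}
grand_total = sum(total for _, total, _ in ROWS)
grand_nano = sum(nano for _, _, nano in ROWS)
overall = nano_share(grand_total, grand_nano)
```

The column sums give the totals row of the table (1231 firms, 710 of them nano), so the table is internally consistent.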