
Final version

R&D Evaluation Methodology and Funding Principles

Response to comments from the community on the Draft First Interim Report

February 5, 2015

Bea Mahieu and Erik Arnold, Technopolis Group


Table of Contents

1. Introduction
2. Key topics for clarification
   2.1 The attention for applied research and innovation
   2.2 The value of the evaluation for institutional management
   2.3 The research outputs
3. The Evaluation Methodology
   3.1 The evaluation structure
      3.1.1 Definition of an EvU
      3.1.2 Non-participating research organisations
      3.1.3 Minimum threshold for participation – types of outputs
      3.1.4 Disciplinary classification of the research outputs
      3.1.5 Definition of a Research Unit
      3.1.6 What are subject panels and how do they relate to fields?
      3.1.7 Selection of the researchers for an RU
      3.1.8 Food science a field?
      3.1.9 Assignment of an RU to a subject panel
      3.1.10 Interdisciplinary research
      3.1.11 The threshold for cross-referrals and registration as Interdisciplinary Research Unit
      3.1.12 The size of the Research Units
   3.2 The scope of evaluation
      3.2.1 The typology of research organisations
      3.2.2 Inclusion of all research organisation types
   3.3 The evaluation method
      3.3.1 The value of panel evaluation without site visits
      3.3.2 The panel evaluation and the value for institutional management
      3.3.3 The evaluation panels – the calibration exercise and the role of the main panels
   3.4 The assessment criteria in general
      3.4.1 The need for information on the 5 criteria – the view of a scientific research organisation
      3.4.2 The need for information on the 5 criteria – the view of some RTOs
      3.4.3 The balance basic/applied research
      3.4.4 Bias towards science?
      3.4.5 Definition of the research outputs


   3.5 The assessment criterion Research Environment
      3.5.1 Researchers with multiple contracts and the issue of FTE
      3.5.2 The indicator on inbreeding
      3.5.3 Collecting information at the EvU rather than RU level
   3.6 The assessment criterion Scientific Research Excellence
      3.6.1 Number of outputs submitted
      3.6.2 ERC grants as an indication of excellence
   3.7 The assessment criterion Overall Research Performance
      3.7.1 Research productivity and the issue of co-publications
      3.7.2 Research productivity and the counting of books
      3.7.3 The publishing profile and the use and limits of bibliometrics
      3.7.4 The sub-criterion ‘Ability to attract PhD students’
   3.8 The evaluation results
      3.8.1 Evaluation results at the institutional (EvU) level
      3.8.2 Evaluation results at the national level
      3.8.3 Aggregation of evaluation results or synthesis of scores
   3.9 The choice of the comparator countries


1. Introduction

This document is part of the consultation process with the community in the Czech Republic related to the design of the new R&D evaluation methodology and funding principles.

It contains the study team’s response to the many requests for further information and clarification, as well as to the critique expressed by research stakeholders in the country regarding the Evaluation Methodology as explained in the draft version of the First Interim Report.

We received the comments on the report on December 4 and discussed the EM further with the research community during the conference on January 7. We delivered the final version of the First Interim Report to the IPN team on January 19.

First of all, we would like to thank the numerous stakeholders for their comments and contributions to the design and improvement of the proposed EM. Their feedback is of high importance to us. We received comments from

• The research community, funding bodies and other stakeholders – 166 organisations, for a total of 878 (partial) comments

• The IPN project team and its Key Activities members – for a total of 40 general observations and 144 detailed ones, as well as the review by two international experts

• The RD&I Council – for a total of 21 comments

• The AVO – summarising the comments by its members and providing their detailed comments

• The 20 Small Pilot Evaluation (SPE) panel members and 3 panel coordinators/ secretariat members, each responding to 8 feedback questions. We are still in the process of collecting the feedback from the organisations that participated in the SPE; full reporting on the SPE will be done in the Third Interim Report

The high number of organisations that provided feedback and the approximately 1200 specific comments received illustrate the importance that the Czech R&D community attributes to the design of the new evaluation methodology.

The intent of this document is to give the stakeholders a view of how we took their feedback into account in the final version of the First Interim Report – or the reasons why we did not (so far) and our reflections on the matter.

Quite obviously, given the high numbers involved, our response cannot be exhaustive. However, there were some major recurring themes in the comments or requests for specification, which we cover in the sections below.

It should be noted that different interest groups in the research community often made contrasting requests for modifications. We have accepted those that could effectively contribute to the improvement of the evaluation methodology from a methodological perspective. In this context we considered the requirement to cover the full range of research activities, their outputs and impacts – from basic research to applied research and innovation – as well as the need to maintain a balance between the direct and indirect costs of the evaluation exercise, the double function of the EM (providing strategic information and driving the PRFS component of the institutional funding system), and the need to provide the evaluation panels with an adequate level of quality information.


We would like to draw the reader’s attention to the several sections and paragraphs in the Draft First Interim Report that were substantially modified in the final version of the report, and to the new sections that were written with the intent to better explain the EM to the Czech community. These changes were triggered by the comments from the community. The modified sections and paragraphs are:

• Two sections in the Introduction of the report: The Evaluation Methodology – work in progress, and The Evaluation Methodology proposed in this report

• Section 4.3.2 – The use of thresholds

• Section 4.4 – Evaluation method (paragraphs on site visits)

• Section 4.5.2 – The assessment criteria in detail

• Section 4.5.3 - Tools for the data collection

• Section 4.5.4 – Indicators used (sub-section on the calculation of FTE researchers, on Research outputs, thresholds and rulings)

• Section 4.6.2 – Interdisciplinary research

• Section 4.7 – The evaluation results

• Section 4.8.2 – The use of the national RD&I Information System

• Section 5.1.1 – The governance and management structure

• Section 5.1.2 – Roles and tasks of the Evaluation Management Team

• Section 5.2.4 – Working methods (the paragraph on the main panel working methods)

• Section 5.3.1 – Integrity of the panel review process (the paragraph on Conflicts of interest)

In this document, we focused our response on the comments related to the Evaluation Methodology. Wherever relevant, we also refer to comments related to the evaluation implementation.

We structured the document following as much as possible the structure of the chapter on the Evaluation Methodology in the First Interim Report.

Given the length and level of detail of the comments and of this document, we first give an overview of the key topics in the comments we received and indicate the sections of this report in which we provide our specific response.


2. Key topics for clarification

2.1 The attention for applied research and innovation

A major point of contention was the perceived focus on scientific research and, consequently, a lack of attention to the assessment of applied research and innovation in the EM. Comments related to this issue addressed, among other things, the selection of the eligible research outputs, the topics covered in the assessment criteria valid for all types of research organisations, and the use of evaluation panels.

We invite readers who were particularly concerned about this topic to read the following sections:

• Section 3.1.2 - Non-participating research organisations  

• Section 3.1.3 - Minimum threshold for participation – types of outputs

• Section 3.3.3 - The evaluation panels – the calibration exercise and the role of the main panels

• Section 3.4.1 - The need for information on the 5 criteria – the view of a scientific research organisation

• Section 3.4.2 - The need for information on the 5 criteria – the view of some RTOs

• Section 3.4.3 - The balance basic/applied research

• Section 3.4.4 - Bias towards science?

2.2 The value of the evaluation for institutional management

Another point of concern was to what extent the evaluation results would constitute valuable strategic information for the research organisation and its management.

We invite the readers to read in particular the following sections:

• Section 3.1.12 – The size of the Research Units

• Section 3.3.1 - The value of panel evaluation without site visits

• Section 3.3.2 - The panel evaluation and the value for institutional management

• Section 3.5.3 - Collecting information at the EvU rather than RU level

• Section 3.8.1 - Evaluation results at the institutional (EvU) level

• Section 3.8.3 - Aggregation of evaluation results or synthesis of scores

2.3 The research outputs

The debate on research outputs is a long-standing one in the Czech research community. Its intensity was fully understandable and justified in the context of the Metodika, where funding was based only on the volume of research outputs (and lately, to a certain extent, also their quality). It is less so in the context of the new EM, which focuses on the assessment of quality rather than quantity. Nevertheless, a large number of the comments received focused on this topic.

We invite the readers to read in particular the following sections:

• Section 3.1.2 - Non-participating research organisations  

• Section 3.1.3 - Minimum threshold for participation – types of outputs

• Section 3.4.3 - The balance basic/applied research


• Section 3.4.4 - Bias towards science?

• Section 3.6.1 - Number of outputs submitted for review

• Section 3.7.1 - The issue of co-publications  

• Section 3.7.2 - The value of books  

• Section 3.7.3 - The publishing profile and the use and limits of bibliometrics  


3. The Evaluation Methodology

The topics covered in this chapter are:

• The evaluation structure (Section 3.1), i.e. the definition of the Evaluated Unit and the threshold for participation and its implications, the definition and selection of a Research Unit, and the coverage of and attention to interdisciplinary research

• The scope of the evaluation (Section 3.2), i.e. the typology of ROs and the coverage of all research organisation types

• The evaluation method (Section 3.3), i.e. use of panel evaluations without site visits, the value of the findings for institutional management, and the panel working methods

• The assessment criteria in general (Section 3.4), covering general topics such as the need for the 5 performance criteria for all types of RO, the balance basic/applied research for the assessment of the quality of research and the exclusion of applied results from the list of eligible outputs

• The assessment criterion Research Environment (Section 3.5), focusing on the ways in which and the level at which information is collected, and the reasons for this

• The assessment criterion Scientific Research Excellence (Section 3.6), the number of outputs that can be submitted and the use of the ERC grants as measure of excellence

• The assessment criterion Overall Research Performance (Section 3.7), including the issues of how to count co-publications and books, the use of bibliometrics, and the sub-criterion ‘Ability to attract PhD students’

• The evaluation results (Section 3.8), at the institutional (EvU) level, the national level, and the approach for the aggregation of the evaluation results

• The choice of the comparator countries for the analysis of international practice (Section 3.9)

3.1 The evaluation structure

The sections in this chapter cover the comments related to the definition of an EvU and to the Research Units. This section is structured as follows:

• Related to the EvU:

− Section 3.1.1 - Definition of an EvU  

− Section 3.1.2 - Non-participating research organisations  

• Related to the definition of an RU:  

− Section 3.1.3 - Minimum threshold for participation – types of outputs  

− Section 3.1.4 - Disciplinary classification of the research outputs  

− Section 3.1.5 - Definition of a Research Unit  

− Section 3.1.6 - What are subject panels and how do they relate to fields?  

− Section 3.1.7 - Selection of the researchers for an RU  

− Section 3.1.8 - Food science a field?


− Section 3.1.9 - Assignment of an RU to a subject panel  

− Section 3.1.10 - Interdisciplinary research  


− Section 3.1.11 - The threshold for cross-referrals and registration as Interdisciplinary Research Unit

− Section 3.1.12 - The size of the Research Units

3.1.1 Definition of an EvU

Several organisations commented on the fact that in the draft version of the First Interim Report, an EvU was defined as “a research organisation, except for the public HEIs where the Evaluated Unit is a Faculty”. This definition gave the impression that Institutes or Centres would not have the possibility to act as EvU. This was not the intention.

Response:

In the Final First Interim Report, the definition is corrected to “An Evaluated Unit is a research organisation, except for the public HEIs where the EvU is a Faculty or Institute or any other organisational unit at that level, such as Centres” (see Section 4.2).

3.1.2 Non-participating research organisations

The question raised was what the effect of non-participation would be on institutional funding. There was also an impression that this meant that an EvU could select only some of its Research Units to participate in the evaluation, while leaving out others.

Response:

The EM indicates that participation in the evaluation should be voluntary. The reasoning for this clause was that any evaluation exercise of the type we propose inevitably causes a burden on the evaluated organisation; small research organisations (RO) should therefore have the possibility to opt out.

In the Final First Interim Report we specify that non-participation in the evaluation implies that the research organisation will not benefit from the performance-based research funding component of the institutional funding budget (see Section 4.2, last paragraphs). We also specify that a participating EvU should participate as a whole, i.e. it cannot participate with only some of its researchers/departments or select which RUs participate and which do not.

In the Second Interim Report on the Funding Principles, we propose that the performance-based research funding component would constitute 15% of the total institutional funding (see Section 3.2.3). Whether or not such a research organisation will nevertheless undergo some sort of evaluation – outside of the ‘national’ evaluation – is a decision for the funding body.

This specification also applies to the research organisations that do not reach the minimum threshold; we cover this topic in the next section.

3.1.3 Minimum threshold for participation – types of outputs

The discussion around the minimum threshold revolves predominantly around the types of outputs that are eligible for inclusion, i.e. scholarly outputs (Articles - J, Conference proceedings – D, Books - B), non-traditional scholarly outputs in the form of Results used by the funding provider – H, Classified information – V, Certified methodologies, art conservation methodologies and specialised map works - N, and IPR-related outputs (Patents and patent applications – P, and Plant/breeders rights – Zodry & Zplem) (see Exhibit 27 in the Draft First Interim Report).

Similar to the discussions around the outputs that can be submitted for review or counted for the calculation of research productivity (see also Section 3.4, below), there are two contrasting views:

• Reflecting past experience and the discussions in the community that led to the exclusion of many types of outputs from the Metodika 2013-2015 (and in particular the ‘applied research’ ones), some organisations see the danger of gaming. We quote, “The scheme of "thresholds" invites for gaming: at least 50 "research outputs" in 5 years could be easily generated by even small, low-quality groups if "weak" outputs as papers in low-quality journals, conference proceedings, brief home-published books and chapters therein (claimed to be monographs), etc. are counted as "research outputs". (It is surprising to read that even "patent applications", i.e., not necessarily awarded patents, should count.)”

• Other organisations consider that the exclusion of ‘applied research’ outputs (such as software, prototypes, etc.) discriminates against research organisations that conduct this type of research (technical universities, RTOs, etc.). These organisations also criticise the reference made to the Metodika 2013-2015 and see it as an indication of a fundamental bias in the Evaluation Methodology towards scientific (basic) research.

Response:

In the Final First Interim Report we did not change the outputs that are eligible for this threshold. We based our decision on the following considerations:

• The R&D base in the Czech Republic is highly fragmented, resulting in a high number of research organisations. It also includes many organisations where research is not a primary activity, which as such is not an exceptional situation. However, the benefit and value of including in an (expensive) national research assessment research organisations that produce only a very limited number of research outputs is questionable. It can hardly be considered an efficient investment of public money

• The objections of the research community related to the inclusion of applied research outputs whose quality is not verifiable through external sources cannot be ignored. Past experience shows, in fact, that the inclusion of these outputs in numerical indicators has led to their proliferation and consequently to gaming. Gaming is a potential effect of every evaluation methodology linked to funding, and any evaluation methodology tries to avoid such unintended effects to the extent possible. For this reason, eligible research outputs were selected based on the reliability of their verification process, which is linked to their registration in external databases and/or verification by external users. The inclusion of the V-type non-traditional output was needed to cover security research

• A preliminary simulation of the effects of the minimum threshold shows that these are to be considered marginal; it effectively excludes only very ‘small’ research organisations in terms of their volume of research outputs (see Section 4.3.2 in the Final First Interim Report; we will update these data for the Final Study Report).

As a consequence, we considered that also including applied research outputs in the calculation of this minimum threshold would constitute a needless risk of gaming.

Other parts of the EM address interactions with society and societal impacts of RD&I. These provide additional means through which the role of the RUs in innovation can be assessed and ultimately rewarded.


The request for the inclusion of any type of applied results seems to be more a statement of principle in the discussion on basic versus applied research between the scientific research organisations and those focusing on development and innovation than an indication of an unfair approach in the Evaluation Methodology. We respond to the critique on the bias towards science in Section 3.4.4, below.
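For illustration, the participation threshold described above can be thought of as a simple filter-and-count rule. The sketch below is a minimal Python example under assumed data structures (each output represented as a dictionary with a "type" code); it is not the official implementation, and the type codes merely follow the eligible categories listed above.

```python
# Hypothetical sketch of the minimum participation threshold check.
# The record layout (dicts with a "type" key) is an assumption for illustration.

ELIGIBLE_TYPES = {
    "J",      # articles
    "D",      # conference proceedings
    "B",      # books
    "H",      # results used by the funding provider
    "V",      # classified information (covers security research)
    "N",      # certified methodologies, conservation methodologies, map works
    "P",      # patents and patent applications
    "Zodry",  # plant breeders' rights
    "Zplem",  # animal breeders' rights
}

MIN_OUTPUTS = 50  # minimum number of eligible outputs over the evaluated period


def passes_participation_threshold(outputs):
    """Return True if the registered outputs include at least MIN_OUTPUTS eligible ones."""
    eligible = [o for o in outputs if o.get("type") in ELIGIBLE_TYPES]
    return len(eligible) >= MIN_OUTPUTS


# Example with invented data: 45 eligible outputs -> below the threshold.
sample = [{"type": "J"}] * 30 + [{"type": "P"}] * 15 + [{"type": "X"}] * 20
print(passes_participation_threshold(sample))  # False
```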

3.1.4 Disciplinary classification of the research outputs

Some research organisations asked on what basis the disciplinary classification of a publication will be determined. The options are:

• The classification in the RD&I Information System (VaVaI), determined by the research organisation’s allocation of the research output against a specific field upon its registration in the system, and

• The classification of the journals in the international databases such as WoS, Scopus etc

Response:

We specified in the Final First Interim Report that for the definition of the RU, the classification by the research organisations in the RD&I IS will be taken into account. This is needed in order to allow for continuity with the current system and the use of historical data. It is also the system that is most easily consulted by the research community. Most importantly, not all journals are covered in the international databases; the use of a discipline classification based on journals would require scrutiny and decision-making by the Czech research community. This is done in some countries such as Norway and Denmark, but establishing such a procedure is a lengthy process and can only be an option in a longer-term perspective. We will address this in the Third Interim Report.

The bibliometric data report for the evaluation panels will show the field classification of the publications by the researchers in the RU as they indicated them at the moment of registration as well as the field classification of the journals where the papers have been published, whenever available in the international bibliometric databases.
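To make the dual classification concrete, the following sketch pairs, for a single publication, the field registered by the RO in the RD&I IS with the journal-based classification from an international database where the journal is indexed. The record layout and field labels are hypothetical assumptions, not the actual report format.

```python
# Illustrative only: one row of a bibliometric report showing both field
# classifications. Field names and data layout are invented for the example.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PublicationRecord:
    title: str
    rdi_is_field: str                    # field chosen by the RO at registration
    journal_fields: Optional[List[str]]  # journal classification, if indexed


def classification_row(pub):
    """Registered field vs. journal-based field(s) for one publication."""
    return {
        "title": pub.title,
        "registered_field": pub.rdi_is_field,
        "journal_fields": pub.journal_fields
        or ["not covered in international databases"],
    }


pub = PublicationRecord(
    title="Example paper",
    rdi_is_field="1.3 Chemical sciences",
    journal_fields=None,  # e.g. a national journal not indexed in WoS/Scopus
)
print(classification_row(pub))
```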

3.1.5 Definition of a Research Unit

Some research organisations still seem to be confused about what and who constitutes a Research Unit, so it seems useful to repeat the specifications here.

Response:

As is explained in the Final First Interim Report, a Research Unit is composed of the individual researchers in an EvU who have their major focus of research in one scientific field. The scientific fields are those listed in the table below. They are broad – e.g. Chemical sciences, Mathematics, Basic medicine – precisely in order to allow an RU to include researchers who do (interdisciplinary) research covering different sub-fields.

A Research Unit is therefore a sub-set of an EvU, but it does not necessarily represent a coordinated or collaborating research group. In practice it can consist of some or all researchers in one department, in two or more departments, or in 1 or more department(s) plus some research groups.

All researchers in an EvU have to be allocated to an RU, and each researcher can be allocated to only 1 RU in the evaluated unit.

There is no maximum threshold. This implies that an EvU where all of the research is conducted in one field of research can register only one RU.

An EvU can register more than one Research Unit only if each of these Research Units has produced a minimum of 50 research outputs in the respective field during the evaluation period.


Disciplinary Area / Scientific Field

1. Physical sciences
   1.1 Mathematics
   1.2 Physical sciences
   1.3 Chemical sciences
   1.4 Earth and related Environmental sciences
   1.5 Other natural sciences

2. Engineering and technology
   2.1 Civil engineering
   2.2 Electrical engineering, Electronic engineering, Information engineering
   2.3 Computer and information sciences
   2.4 Mechanical engineering
   2.5 Chemical engineering
   2.6 Materials engineering
   2.7 Medical engineering
   2.8 Environmental engineering
   2.9 Environmental biotechnology
   2.10 Industrial biotechnology
   2.11 Nano-technology
   2.12 Other engineering and technologies

3. Medical and Health sciences
   3.1 Basic medicine
   3.2 Clinical medicine
   3.3 Health sciences
   3.4 Medical biotechnology
   3.5 Other medical sciences

4. Biological and Agricultural Sciences
   4.1 Biological sciences (Medical to be 3)
   4.2 Agriculture, Forestry, and Fisheries
   4.3 Animal and Dairy science
   4.4 Veterinary science
   4.5 Agricultural biotechnology
   4.6 Other agricultural sciences

5. Social sciences
   5.1 Psychology
   5.2 Economics and Business
   5.3 Educational sciences
   5.4 Sociology
   5.5 Law
   5.6 Political science
   5.7 Social and economic geography
   5.8 Media and communications
   5.9 Other social sciences


6. Humanities
   6.1 History and Archaeology
   6.2 Languages and Literature
   6.3 Philosophy, Ethics and Religion
   6.4 Arts (arts, history of arts, performing arts, music)
   6.5 Other humanities
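As a recap of the registration rules described above (each researcher belongs to exactly one RU, RUs are formed per scientific field, and a separate RU requires a minimum of 50 outputs in its field), the sketch below groups an EvU's researchers by their main field and flags which fields could be registered as separate RUs. The data structures are illustrative assumptions; how an EvU deals with researchers whose field falls below the minimum is the EvU's own decision.

```python
# Hypothetical sketch of the RU registration rules; not the official procedure.

from collections import defaultdict

MIN_OUTPUTS_PER_FIELD = 50  # minimum outputs per field over the evaluated period


def fields_eligible_as_separate_rus(researchers, outputs_per_field):
    """Group researchers by main field and keep the fields that reach the minimum."""
    by_field = defaultdict(list)
    for r in researchers:
        by_field[r["main_field"]].append(r["name"])  # each researcher in one RU only

    return {
        field: members
        for field, members in by_field.items()
        if outputs_per_field.get(field, 0) >= MIN_OUTPUTS_PER_FIELD
    }


researchers = [
    {"name": "Researcher A", "main_field": "1.1 Mathematics"},
    {"name": "Researcher B", "main_field": "1.1 Mathematics"},
    {"name": "Researcher C", "main_field": "5.2 Economics and Business"},
]
outputs = {"1.1 Mathematics": 120, "5.2 Economics and Business": 12}
print(fields_eligible_as_separate_rus(researchers, outputs))
# Only the Mathematics field qualifies as a separate RU in this invented example.
```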

3.1.6 What are subject panels and how do they relate to fields?

Another point where clarification is needed for some research organisations is the concept of the subject panels and how these relate to the fields.

Response:

The objective is to run the evaluation exercise with approximately 24 subject panels. These subject panels will be structured at the level of fields, taking into account the volume of research activity in the CR.

This means that a distinction needs to be made between:

• Fields with a high volume of research activity in the CR (e.g. all the fields in the ‘Physical sciences’ disciplinary area, clinical medicine, biological sciences, etc.; see the Final First Interim Report, Exhibit 6), and

• Fields with a low volume of research activity in the CR (e.g. mechanical engineering, industrial biotechnology, veterinary science, sociology, political science, etc.)

The subject panels will have different characteristics:

• For the fields with a high volume, the subject panels will cover 1 field only and the panel members are experts in the specific sub-fields within the field

• For the fields with a low volume, the subject panels will cover several fields and the panel members will be experts in the specific fields. These panels may have to be larger than the field-specific ones above in order to cover all the expertise needed (but should not exceed 10 members)

The definition of the subject panels for an evaluation will be the task of the evaluation committee before launching the evaluation exercise, based on an analysis of the publication volume in the fields over the evaluated period and with the support of scientific experts, in order to combine fields into subject panels in the most adequate way.
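The grouping logic can be illustrated with a small sketch: fields with a high publication volume receive a dedicated subject panel, while low-volume fields are combined into shared panels. The volume cutoff and bundle size below are invented for the example; in the EM these decisions rest with the evaluation committee, supported by scientific experts.

```python
# Hypothetical illustration of grouping fields into subject panels by volume.

HIGH_VOLUME_CUTOFF = 5000        # publications in the evaluated period (assumption)
MAX_FIELDS_PER_SHARED_PANEL = 3  # assumption, to keep combined panels manageable


def group_fields_into_panels(volume_by_field):
    panels = []
    low_volume = []
    for field, volume in sorted(volume_by_field.items()):
        if volume >= HIGH_VOLUME_CUTOFF:
            panels.append([field])     # dedicated, field-specific panel
        else:
            low_volume.append(field)   # candidate for a combined panel
    # bundle the remaining low-volume fields into shared panels
    for i in range(0, len(low_volume), MAX_FIELDS_PER_SHARED_PANEL):
        panels.append(low_volume[i:i + MAX_FIELDS_PER_SHARED_PANEL])
    return panels


volumes = {
    "1.2 Physical sciences": 14000,
    "3.2 Clinical medicine": 9000,
    "2.4 Mechanical engineering": 1200,
    "5.4 Sociology": 800,
    "4.4 Veterinary science": 600,
}
print(group_fields_into_panels(volumes))
```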

3.1.7 Selection of the researchers for an RU

Some research organisations saw a danger of gaming. They considered that, because the selection of the researchers that define a Research Unit is the task of the Evaluated Unit, the EvU might choose only its best researchers to be part of an RU.

Response:

The EvU is expected to include all of its researchers and it has every interest in doing so.

In the Final First Interim Report (Section 5.3.2) we specified that upon registration of the RUs, the EvU will sign a statement on the accuracy of the information provided and its willingness to deliver proof upon request.

We also specified that the evaluation management team and panel secretariats will perform random checks regarding, for example, the number of researchers, PhD graduates and strategic partnerships and the volume of grants and contract research, and that the submitted information will be compared with information in available databases about, for example, staff and revenues of research organisations, dissertations, grants and service contracts.

In this specific case, a comparison of the names of the researchers included in all the EvU’s RUs with the information in the RD&I IS will easily identify cases of gaming.

The assessment criteria are accompanied by a clause declaring that if fraud or dishonesty is detected, the panels will assign the RU the lowest quality level against all assessment criteria.
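In a much simplified form, the consistency check mentioned above could look like the sketch below: the researchers registered across an EvU's RUs are compared with the authors recorded for that EvU in the RD&I IS. Matching by name and the data shown are simplifying assumptions; a real check would rely on identifiers rather than names.

```python
# Illustrative sketch of a cross-check against the RD&I IS; not the official procedure.

def find_unregistered_researchers(registered_in_rus, authors_in_rdi_is):
    """Researchers with outputs recorded in the RD&I IS who were left out of every RU."""
    return set(authors_in_rdi_is) - set(registered_in_rus)


rus = {"Researcher A", "Researcher B"}
rdi_is = {"Researcher A", "Researcher B", "Researcher C"}
print(find_unregistered_researchers(rus, rdi_is))  # {'Researcher C'} -> flag for follow-up
```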

3.1.8 Food science a field?

Another research organisation highlighted that Food Science is an important field of research in the Czech Republic and that it should therefore be considered a field for the evaluation, rather than a sub-field as it is in the OECD-based classification.

Response:

We are reluctant to change the OECD classification as it currently stands, due to the clear advantages it brings for international comparisons and the consideration of interdisciplinarity in the field. That said, it is up to the Evaluation Management Committee to identify the subject panels for each evaluation, and this aspect can be taken into consideration at that moment.

3.1.9 Assignment of an RU to a subject panel

Other organisations noted that Research Units are created by selecting researchers who have their primary research focus in the field. However, these researchers also publish in other fields and even in other disciplinary areas. They saw a danger that the RU would be assigned to an evaluation panel that is not appropriate.

Response:

It is the responsibility of the EvU to register the RU and choose the subject panel for evaluation. Whenever the evaluation management team and/or the (main) subject panels identify issues on the matter, they will inform the EvU and agree on another assignment.

3.1.10 Interdisciplinary research

Research organisations also noted that many journals in WoS are assigned to multiple disciplines and asked, “Are all of these going to be taken into account? How is this going to be tackled in order to avoid gaming (of bibliometric evidence) supporting claims for doing interdisciplinary research?”

Other research organisations considered “absolutely unacceptable” the fact that each researcher can be assigned to a single Research Unit only. They state, “This means, that if a researcher contributes to two fields nearly equally, only one half of his/her activity is counted and only one field profits from his activity. The other fields, to which he/she contributed a lot as well, cannot count his/her results as he/she is assigned only to a single field. A significant part of the performance of the EvU can be lost in such a way. This is neither fair, nor natural or logical, nor defendable! Moreover, it demotivates to co-operate between distinct units within an evaluation unit.”

Response:

As we mention in Section 4.6.2 of the Final First Interim Report, the EM includes measures as broad as possible to take interdisciplinary research into account, covering interdisciplinary research both among fields in a disciplinary area and across areas.

First of all, an individual researcher is requested to assign him/herself to a Research Unit in his/her main field of research. This does not imply that his/her research activities in other fields will be ‘lost’. The research outputs considered for the analyses in the bibliometric report, as well as those that can be submitted for review, encompass all research outputs from the researchers included in the RU, no matter the field. The bibliometric report indicates the fields against which these outputs are registered, in the RD&I IS and in the international databases.

The RU also has the possibility to indicate the interdisciplinarity of its researchers’ activities – and is explicitly recommended to do so – in the background information it is asked to provide (see Section 3.3.2 of the Guidelines for the evaluated research organisations, a background report to the Final First Interim Report). The RU is also asked to indicate how interdisciplinarity is linked to and fits within the strategy of the research organisation in the field.

Second, in those cases where interdisciplinarity is a fundamental characteristic of the research conducted, the RU can ask for cross-referrals in the evaluation process (interdisciplinarity within a single disciplinary area) or can apply for registration as an Interdisciplinary Research Unit (interdisciplinarity covering more than one disciplinary area). The main panels will decide whether cross-referrals or registration as an Interdisciplinary Research Unit can be accepted, based on the ‘proof’ provided that a) interdisciplinarity is effectively a significant aspect of the research strategy in the field, b) it influences the profile of the researchers involved, and c) it is illustrated by the characteristics of the research outputs.

For the sake of clarity, ‘cross-referral’ stands for the assessment of the submitted information by two or more subject panels within the disciplinary area. The RU’s ‘core’ subject panel assigns the starred levels against the assessment criteria, building upon advice from the other relevant subject panel(s). The ‘core’ subject panel informs the other panel(s) of its decision. In case of disagreement between the ‘core’ and the other panel(s), the main panel chair intervenes and has the final word.

3.1.11 The threshold for cross-referrals and registration as Interdisciplinary Research Unit

A research organisation asked whether the 30% threshold suggested to identify interdisciplinary research was “evidence-based, i.e. based on analysis indicating that there is in fact significant drop around 30%, or is it rather a kind of rule-of-thumb?” They specified, “Since gaming is likely around this threshold, it would be preferable to consider in more detail the available evidence in order to specify the exact value of this threshold. In other words, it should be supported by computation based on data from the Czech system.”

Response:

We have defined the minimum of 30% based on data related to research institutions and fields where interdisciplinary research typically occurs. The key intent of that analysis was to ensure that the minimum threshold for participation in the evaluation and for the constitution of a RU (50 publications in a field) would not penalise interdisciplinary research.

At the same time, the threshold intends to limit the risk of a proliferation of requests for cross-referrals and requests for Interdisciplinary Research Units, which would have highly negative effects on the costs of the evaluation exercise (more panels and more panel members needed).

In relation to the potential gaming effect, the Final First Interim Report specifies that RUs can apply for cross-referrals between the subject panels or for registration as an Interdisciplinary Research Unit. As mentioned in the previous section, the RU will need to submit information on its publication profile (bibliometric data) as well as ‘prove’ interdisciplinarity by providing information on the profile of the researchers included in the RU (CVs) and their research strategy. The main panel chairs will decide whether to accept the applications.
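For illustration only, the 30% indication discussed above could be computed as the share of an RU's registered outputs that fall outside its core field, as in the sketch below. The record layout and the equal weighting of all outputs are assumptions; the decision on cross-referral or registration as an Interdisciplinary Research Unit itself rests with the main panel chairs.

```python
# Hypothetical sketch of the 30% interdisciplinarity indication; not the official rule.

INTERDISCIPLINARITY_SHARE = 0.30  # threshold discussed above


def outside_core_share(outputs, core_field):
    """Fraction of outputs whose registered field differs from the RU's core field."""
    if not outputs:
        return 0.0
    outside = sum(1 for o in outputs if o["field"] != core_field)
    return outside / len(outputs)


outputs = [{"field": "1.3 Chemical sciences"}] * 60 + [{"field": "3.1 Basic medicine"}] * 40
share = outside_core_share(outputs, core_field="1.3 Chemical sciences")
print(share >= INTERDISCIPLINARITY_SHARE)  # True: the RU could apply for a cross-referral
```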

3.1.12 The size of the Research Units

Some research organisations pointed out that in many Evaluated Units the current definition of a Research Unit will lead to the creation of huge RUs, including hundreds of researchers. They wondered to what extent the performance assessment at such a broad level (i.e. including all departments or institutional units) could provide valuable strategic information. One of them stated, “For such large RU the crude characterization of each of the five assessment criteria by one numerical value (4 to unclassified) will make such assessment useless for formative function and will not provide ‘strategic information to actors at all levels’.”

Response:

First, we need to point out that the starred quality levels for each criterion will not constitute strategic information for any of the RUs, no matter how big or small. Starred levels are to be considered merely a synthesis of the assessment.

The ‘strategic information’ will derive from the justification provided by the evaluation panels of the reasons why they assigned the specific quality level and their conclusions and recommendations. From that perspective, the panel’s feedback will be of direct value to the institution’s management. We cover this topic further in Section 3.8.1, below.

3.2 The scope of evaluation

In this section we cover the comments related to the defined typology of research organisations (Section 3.2.1) and the inclusion of all research organisation types in the EM (Section 3.2.2).

3.2.1 The typology of research organisations

Some research organisations asked for clarification and wondered whether “scientific research institutes (part of scientific ROs) include the public research institutions (VVI) of the Academy of Science or all the public research institutions or other.”

Others strongly contest the validity of the typology of research organisations as defined in the EM. They consider that research organisations in the CR should be classified taking into account their legal forms. They also state, “The only meaningful categorization of genuine RO is as follows

i. HEI, primarily Universities

ii. Public RO of the Academy of Sciences

iii. Public RO and State RO (Statní příspěvkové organizace) established by Ministries

iv. Business sector RO”

Response:

In the Final First Interim Report (see Section 3.3.2), we clarified that in the context of an evaluation, a categorisation of research organisations is needed based on their function in the National Innovation System. A basic concept for evaluation is that one assesses primarily to what extent an organisation has reached the results it was expected to achieve, based on its purpose and stated objectives. This is best described in terms of the profile of the primary (expected) beneficiaries of the research outputs and outcomes/impacts.

We also indicate in Section 4.3.1 that research organisations that are not ASCR research institutes, Higher Education Institutes or University Hospitals apply for registration under a specific category. This application needs to be approved by the Evaluation Management Team, where relevant in consultation with the Evaluation Management Board.

3.2.2 Inclusion of all research organisation types

In the last comment reported in the section above, national resource/infrastructure research organisations are excluded from the list of ‘genuine’ RO types. The research organisation also considers that changes are needed in the range of research organisations covered by the evaluation (and the institutional funding principles). As a consequence, it considers an EM covering more than the scientific research organisations to be a needless effort.

Response:

Our position is that this is a discussion that should be conducted in another platform and decided upon by the Czech policy makers. The EM is designed following the Terms of Reference for the study and reflects the current policy decision on which organisation is entitled to be considered ‘research organisation’. The task of this study is to design an evaluation methodology, not to criticise policy decisions or to substitute for policy makers.

The term Research Organisations in the report therefore refers to the research organisations that are recognised as such by Czech law.

In the Draft Second Interim Report we wrote a section on the institutional funding of different research organisations from an ‘Economics of Knowledge’ perspective. We will expand our reasoning on this topic for the Final Second Interim Report.

3.3 The evaluation method

Comments related to the evaluation method are categorised and responded to in the following sections:

• Section 3.3.1 - The value of panel evaluation without site visits  

• Section 3.3.2 - The panel evaluation and the value for institutional management  

• Section 3.3.3 - The evaluation panels – the calibration exercise and the role of the main panels  

3.3.1 The value of panel evaluation without site visits

Several research organisations wondered to what extent a panel evaluation could reach a correct and fair assessment if the panel evaluation process excludes onsite visits.

Response:

There is no doubt that onsite visits are valuable opportunities for the collection of more in-depth or complementary information. We have recognised this in all versions of the First Interim Report, and have also repeatedly indicated that a major barrier to the use of onsite visits in national evaluations is the cost of the exercise (see the Final First Interim Report, Section 3.4.1 and Section 4.4). We informed the Czech community that onsite visits are extremely rare in national research assessments because of their scale (to the best of our knowledge, New Zealand is the only country in the world that is willing to bear the high costs). As a norm, the method adopted is the one also used in the EM, i.e. remote peer review and assessment complemented with panel meetings.

Where evaluations are organised at field or institutional level, onsite visits have been traditional but are increasingly substituted with other, less costly tools in case the panels require additional information, such as Question & Answer (Q&A) sessions using videoconference facilities and the invitation of the evaluated organisations’ representatives to a central location for interviews. In Section 4.4 we considered that in the Czech context, evaluations including site visits could be envisaged as a follow-up to the national evaluation, organised by the funding bodies and focused on research organisations that showed poor performance in the national evaluation and/or did not pass the minimum threshold.

In the Final First Interim Report, Section 4.5.3, we also indicated that we revised the indicators for which information needs to be provided, increasing the focus on the self-assessment by the Research Units. This revision was triggered by the feedback from the panel members in the Small Pilot Evaluation. They underlined the need for more qualitative information that would enable them to set the data provided by the RU in context; this would compensate for the lack of site visits.

A major remark of the panel members concerned the apparent general lack of attention to an appropriate transfer of qualitative information by the evaluated RUs, which was a surprising and unexpected approach by the Czech research organisations. No doubt the time factor played a role (the evaluated RUs had relatively little time available for filling in their submission forms); however, we suspect that the former Metodika’s exclusive focus on the collection of numeric information has also influenced the RUs’ expectations. We cover this further in the next section.

The new EM will require RUs to pay more attention to the qualitative aspects of reporting than was the case in the past. The topics for self-assessment included in the EM are:

• The adequacy of the research infrastructure and facilities

• The RU research strategy

• The value of the RU activities for the advancement of research

• The competitive positioning of the RU in the national and the international context

• The societal relevance of the RU activities

• The RU SWOT analysis

3.3.2 The panel evaluation and the value for institutional management

Several organisations questioned whether the EM would be able to produce valuable strategic information at the EvU level, i.e. for institutional management. The reasoning is that because the unit of assessment is the Research Unit, which is not directly linked to the structure of the organisation (e.g. a department), the value of the evaluation for institutional management would be minimal.

Response:

First, we expect that the leadership at the EvU level will be fully capable of reading and drawing conclusions from the EM panel reports about the fields in which the EvU is involved. Second, the proposed EM includes a process through which panels will aggregate findings from the RU level to the EvU level.

It can be expected that in practice, in most cases the Research Units (defined at the level of field) will encompass one or more departments, possibly together with some individual researchers and, more probably, research groups.

A change made in the Final First Interim Report in order to improve the information for the evaluation panels on the organisational structure is the request to the RU for background information (see Section 3.3.2 in the Background report to the Final First Interim Report, “Guidelines for the Evaluated ROs”). This background information is to replace and improve the focus of the background information documents that the SPE coordinators/secretariat developed for the SPE evaluation panels. The RUs are asked to describe “the historical (i.e. past 10 years) development of the RU in the context of the EvU” (Q014) and the main fields and foci of research in the RU, with the specification: “In case the research carried out is clearly specialised, describe each field separately. Describe the role of multidisciplinarity or interdisciplinarity and the role of basic and applied research. If the Research Unit spreads over different organisational units, e.g. departments, please list these units and their main fields and foci of research.” (Q015)

The RUs will have various other opportunities to explain possible differences between departments in the self-assessment questions mentioned above, providing valuable information that enables the evaluation panels to set and assess the institutional context for the research conducted in the field.

In the Final First Interim Report (Section 4.5.3) we also highlight: “An important value of evaluations such as the one designed in this study is that it constitutes a major opportunity for a collective assessment of research performance by the researchers and their management structure, leading to an improved understanding of strengths and weaknesses, and ultimately – an improved research strategy. By increasing the importance of these activities in the current EM, we hope to foster this type of approach to research management in all research organisations.” As an example, the institutional management will be able to collect direct strategic information from the SWOT analysis performed at the RU level.

In the Background report to the Final First Interim Report “Guidelines for the Evaluated ROs” (Section 2.1.1) we specify: “The submission of information for assessment is about more than the transfer of data. Self-assessment by the RUs is a crucial part of the evaluation. For this purpose, the EvUs will need to set up a structure (‘committees’) that coordinates and ensures the quality of the data collection and self-assessment process, for the EvU components and the RU components.”

3.3.3 The evaluation panels – the calibration exercise and the role of the main panels

Some research organisations criticised the use of evaluation panels and specifically “their independence and subjective and formalistic approach to work with background documents” and “the absolute discretion of panel members in setting up weights for criteria in the individual fields”.

Response:

It seems that these research organisations misunderstood the scope of the calibration exercise and the working method of the evaluation panels, so a clarification seems needed. We refer also to Section 4.6.1 in the Final First Interim Report.

First, the evaluation panels will not produce a final score for each RU and are therefore not asked to set up scoring weights for the 5 assessment criteria, whether in their field or taking into account the type of RO. As we explained in the Draft Second Interim Report, this will be the task of the policy makers in the context of the institutional funding allocation.

Second, the purpose of the calibration exercise is to identify the importance of sub-criteria for assessment, in the fields and for the different types of RO. The intent is to ensure that the assessment reflects the differences in the field specifics as well as the different policy expectations from the types of RO. This will be expressed in ‘weights’. However, the translation of the scores against the sub-criteria into an overall score for the assessment criterion will not be arithmetic.

The process is as follows:

• In their very first meeting, the panels will perform a calibration exercise identifying the importance of sub-criteria for their fields and for the different types of RO. They will reflect on these topics ‘in the abstract’, based on their own expertise.

• For each criterion, the panel members compare the notes from their individual remote assessments and their scores for the RU (at sub-criterion and criterion level) and reach an agreement on the score, first for the sub-criterion and then for the criterion. It is at this moment that the panel secretariat reminds them of the ‘weightings’ that they decided upon for the different types of RO.

In this process the panels will not act with ‘absolute discretion’. The role of the main panel chair is to oversee this process and guarantee the adequacy of the importance attributed to the sub-criteria and the consistency of its implementation. The main panel members are to receive and review the subject panels’ reports on the calibration exercise (see Section 5.2.4 in the Final First Interim Report). The main panel can intervene and discuss decisions made by the subject panels in case they seem inappropriate.

3.4 The assessment criteria in general

There were a number of important comments on the use and selection of the assessment criteria and their indicators that cut across the various criteria.

We grouped them, and responded to them, in the following sections:

• Section 3.4.1 - The need for information on the 5 criteria – the view of a scientific research organisation  

• Section 3.4.2 - The need for information on the 5 criteria – the view of some RTOs  

• Section 3.4.3 - The balance basic/applied research  

• Section 3.4.4 - Bias towards science?

• Section 3.4.5 - Definition of the research outputs  

3.4.1 The need for information on the 5 criteria – the view of a scientific research organisation

A research organisation questioned the need to collect information on all five evaluation criteria for all the research organisations. They consider, “The need for five evaluation criteria is not discussed. It will be very hard to find an EvU with a poor management and without a broad international contacts and cooperation that exhibits great research performance and research excellence. […] Since the first two criteria represent a great burden for EvU because the requested materials are not regularly collected they would not be applied to EvU (RU) exhibiting excellent results in the third and fourth criteria. The evaluation according to the first two criteria can (maybe) help non-excellent EvUs to improve their performance. The selection can be based on the present Metodika”.

Response:

The reasoning provided by the research organisation is correct in the sense that excellent institutional management and international collaborations constitute the basis for excellent performance. Nobody is perfect, though; it will be very hard for even the best research organisations to score a 5 or 4 on all assessment criteria. A structured self-assessment of strengths and weaknesses against all criteria is a useful exercise for any organisation; well-performing organisations, too, may be in need of advice and recommendations on how to improve further.

Most importantly, the EM is directly linked to the institutional funding for research organisations and its results will drive the performance-based component of the institutional funding for research (see the Second Interim Report). A key function of the EM is to create changes in the behaviour of the research communities by rewarding performance against specific criteria (to “steer” research behaviour – see Section 3.2 and Section 4.5.1 in the Final First Interim Report). As repeatedly stated in the Final First Interim Report, these criteria are of relevance for all research organisations, no matter their mission or level of performance, and an assessment against all these criteria needs to be made for all research organisations, guiding their performance-based funding.

3.4.2 The need for information on the 5 criteria – the view of some RTOs

Some RTOs also question whether “the volume and detail of the information required by EM” is of relevance for all ROs. They specify, “The information of research staff, research training, career development, international research presence and collaboration, PhD students, competitive funding, research management makes no sense and has little relation to the RTOs’ mission and operation.”

Response:

Related to the importance of the information on the institutional management, in the Final First Interim Report we have

• Improved the description of research staff for non-scientific research organisations and will include further improvements in the future

• Specified that the information related to career management and the number of PhDs applies only to those research organisations that teach or train PhDs

Any research organisation, including RTOs, can and should draw benefit from contact with and input from international research organisations – if only to stay up-to-date with the latest developments in the field.

Information related to competitive funding gives the evaluation panel a view of the extent to which the research organisation’s activities are in line with and contribute to national needs and priorities, for example related to research-industry collaboration, participation in cluster projects, or collaboration with scientific research organisations.

3.4.3 The balance basic/applied research

It is interesting to note that the scientific research organisation making the comment covered in Section 3.4.1, above, omitted the societal relevance criterion from its considerations.

The position of that research organisation is in direct contrast to the view of the research organisation that asked for a better balance between basic and applied research and innovation in the EM, i.e. the quality of science and its societal impact. The research organisation notes a strong emphasis on bibliometrics and considers, “Even though we realize that high quality science is measured mainly by quality of publications, such evaluation could be considered imbalanced, especially in the view of poor results of the efforts of the Czech Republic to improve cooperation with the industrial sphere in the area of applied research via strategic documents (e.g. IUS). The report must contain information on cooperation with the industry, transfer of results, …, Contractual research is also mentioned only marginally.”

Response:

The EM is designed so that both the quality of science and its societal impact are assessed, the latter both in terms of the conditions set up for the achievement of these impacts (collaboration with industry or other actors in society, awareness raising among specific interest groups or society in general, involvement in incubators etc.) and the effects reached. Important information collected in this context is also the level of contract research, with industry and with public administration and agencies, and in the Final First Interim Report we added the topic of the creation of spin-off companies.

When reconstructing the RU profile related to the transfer of knowledge and technology, the evaluation panels also have at their disposal, and look into, information on participation in competitive research programmes launched by specific ministries and agencies, in general and more specifically including centres of excellence, competence centres, incubators etc.

In the Final First Interim Report, we have clarified and further stressed the importance of the information that is to be provided in the RU’s self-assessment.

Finally, we refer to the Draft Second Interim Report, Section 3.4.1, where we discuss the importance of all criteria – including the societal relevance one – for all types of research organisations and present some scenarios for the weights to be attributed to them when considering the allocation of the institutional funding.

In other words, from our perspective the EM does take societal impact fully into account. The reason why there are several sections in the report dedicated to bibliometrics is merely a response to the insistence of certain actors in the community to have this aspect covered in-depth. We agree that there seems to be an overemphasis on this topic; we perceive it as a symptom of the difficulty for the research community to go beyond the exclusive focus on research outputs that existed in the Metodika.

Finally, one should bear in mind that this is an evaluation methodology aimed at driving a component of the institutional funding; while it intends to steer behaviour, including an increase in knowledge transfer activities, it is predominantly the task of the funding agencies to launch targeted programmes in this field in order to solve the problem.

3.4.4 Bias towards science?

Several non-scientific research organisations consider the EM to have an overall bias towards science. They stated that the EM has an unbalanced approach and disadvantages these organisations, i.e. it is geared towards scientific research organisations and does not sufficiently cover outputs and outcomes of applied research and development.

They protest in particular against the exclusion of applied research outputs from the list of outputs that can be taken into account for the ‘Research Excellence’ and ‘Research Performance’ criteria. They state, “Information on outputs requested applies mostly to HEI and AS CR Institutes. Not so for business sector ROs as well as for most State Research organizations (museum, galleries, libraries) and organizational parts of the state which constitute about half of all RO.”

Applied research results for which inclusion is requested (for the minimum threshold and the research excellence/research performance criteria) vary and include:

• Non-publication outputs H, V, N, and P, prototypes, software, technologies etc., with newly set definitions of eligible outputs. Current definitions in the Metodika 2013-15 (which is still valid) are very limited and unsuitable.

• Output types F, G, Z technology, R – with request for “further discussion on some output type definitions which providers experienced as problematic - H, N cert.metodologies“

• Semi-industrial scale plant (Zpolop), Verified technology (Ztech), Utility model (Fuzit), Industrial models (Fprum), Prototype (Gprot), Functional sample (Gfunk), Medical treatment (Nlec) and Software (R)

Response:

As an introduction to our response, we should mention that we have improved the definition of the Research Excellence and Overall Research Performance criteria in the Final First Interim Report, as well as the explanation of the data for the assessment (see Section 4.5.2).

We did not change the research outputs that are taken into account for these two assessment criteria. The more technical methodological reasons are set out in Section 3.1.3, above. In the paragraphs below, we explain the conceptual reasons for this decision.

The assessment of the criterion Overall Research Performance is based on two sub-criteria: ‘Research Output’ and ‘Competitiveness in Research’.

• The main indicators for the sub-criterion ‘Research Output’ are a) research productivity, b) the RU publication profile, and c) the value of the research activities for the advancement of research

• The main indicators for the sub-criterion ‘Competitiveness in Research’ are a) the capacity to gain external funding (competitive and contract research), b) the ability to attract PhD students, and c) the RU competitive positioning in research (self-assessment)

A core principle for the design of the EM is that research outputs are considered in terms of their quality. In contrast to the Metodika, the quantity of research outputs is of marginal importance in this EM: it is taken into account only for the assessment of ‘research productivity’, which is 1 of the 3 main indicators for 1 of the 2 sub-criteria (‘Research Output’) that will inform the assessment against 1 of the 5 criteria, i.e. the Overall Research Performance.

We specified that the criterion Research Excellence aims at identifying ‘peak’ quality in Scientific research outputs.

We distinguish between ‘scientific’ and ‘applied’ research outputs – rather than ‘basic’ and ‘applied’ research results - in order to avoid the wrong impression that applied research organisations do not publish scholarly outputs.

• ‘Scientific’ research outputs are the scholarly outputs, i.e. the ‘traditional’ scholarly outputs that are focused on communication to the ‘scholarly’ research community

• Applied research outputs comprise:

− The ‘Non-traditional scholarly outputs’ that most often communicate research results to non-scholarly communities, and

− Research results for innovation, i.e. IPR-related outputs and ‘non-publication’ outputs as listed by the research community in their comments above

Aspects of the quality of research outputs are assessed in three Assessment Criteria: Scientific Research Excellence, Overall Research Performance and Societal Relevance, depending on the type of outputs.

• The quality of scientific research outputs is assessed through

− Direct assessment, i.e. peer review in the Scientific Research Excellence criterion

− Assessment of the (potential) effects on the research community in the Overall Research Performance criterion (sub-criterion Research Output). This assessment is done based on the ‘RU publication profile’, i.e. informed by bibliometric data, and covers types and profiles of publication channels, types of outputs, etc. In this context, bibliometric data encompass more than WoS/Scopus data and the outputs considered include all peer-reviewed journal articles, so not only the JImp ones (see Exhibit 27 in the final First Interim Report)

− The value of the research activities for the advancement of research in the Overall Research Performance criterion (sub-criterion Research Output). This assessment is based on a description provided by the RU on the effects of their work and value for development of the scientific field and/or further R&D. In the Guidelines for the Evaluated Research Organisations (Background Report to the Final 1st Interim report, Q 051) we specify: Topics can include: major scientific breakthroughs, research leading to the development of new or improved concepts, methods, standards, industrial and utility designs, pilot plants, proven technologies, prototypes, software, new or improved processes, products, artistic outputs, research enabling an improved access to information or knowledge etc.

− The value of the research activities for society, in the Societal Relevance criterion. It should be noted that we refer to ‘societal’ (rather than ‘social’) relevance, i.e. the relevance for society. This encompasses industry and economy as well as environment, health, education, social welfare etc.

• The quality of applied research results is assessed – and can only be assessed - in terms of their value for the users, i.e.

− The extent to which the research results have (potential) effects on the research community and/or further R&D and therefore a potential long-term effect on the user. This is assessed under the sub-criterion ‘Research Output’ in terms of the ‘RU publication profile’ (for eligible applied results) and the ‘value of the research activities for the advancement of research’ (for any type of result)

− The extent to which the research results led to take-up by industry or other users – and therefore a short-to-medium-term effect on the user and an impact on innovation (the Societal Relevance criterion).

As mentioned above, the Scientific Research Excellence criterion aims at identifying ‘peak’ quality in scientific research outputs. It responds to the intent – and apparent need – to identify those research organisations that have the capacity to produce such outputs, influencing the country’s global competitiveness for scientific research. As such, this is a factor for competition mainly (but not only!) among scientific research organisations and, in fact, in the funding principles presented in the Draft Second Interim Report we propose to attribute a low weight to this criterion for the non-scientific research organisations.

In terms of competitiveness beyond scientific research, the research organisations have the possibility to describe their competitive positioning when providing information for the second sub-criterion of Overall Research Performance, i.e. the RU competitiveness in research, by means of self-assessment. In addition, for this sub-criterion, the capacity to gain external funding, i.e. competitive funding and contract research, is another main indicator that is of particular relevance for the non-scientific research organisations (but not only!).

The assessment criterion Societal Relevance is entirely dedicated to the assessment of the quality of research results - as well as activities and outcomes – with respect to innovation. For this assessment criterion, quantitative and qualitative information is collected against the full range of activities and potential outcomes of applied research, i.e. the collaboration with industry, membership of advisory boards, the volume of competitive & contract research with/for industry, income from the commercialisation of research outputs, the creation of spin-off companies, IPR-related outputs (patents) and the geographical distribution of the patent offices, participation in incubators or clusters, the profile of the industry partners and/or clients, and the use of research outputs in the industry/business environment.

3.4.5 Definition of the research outputs

A research organisation asked, “The term “Conference proceedings (D)” should be clearly defined. Does it mean conference proceedings or a paper in conference proceedings? Which conferences are eligible? Only those included in WoS or Scopus?”

Response:

The definition of Conference Proceedings – as for any other research output in the EM - is the one indicated in the Metodika 2013-15, Annex 2.

3.5 The assessment criterion Research Environment

Comments specifically related to this criterion are grouped and commented upon in the following sections:

• Section 3.5.1 - Researchers with multiple contracts and the issue of FTE  

• Section 3.5.2 - The indicator on inbreeding  

• Section 3.5.3 – Collecting information at the EvU rather than RU level  

3.5.1 Researchers with multiple contracts and the issue of FTE

A research organisation considered, “A problem may arise: several ROs will register the same researcher. Who will check the correct registration of an individual researcher (to avoid multiple occurrences)?”

Another research organisation considered, “At Czech HEI conversion of headcount into FTE has to be resolved systematically. A proposal: 1 faculty member = 0.5 FTE researcher.”

Response:

In the current legal situation, there is no ‘correct’ registration of an individual researcher. Researchers are allowed to have multiple contracts and are not obliged to inform their employers on the matter.

The difficulty in the Czech R&D system of reaching an accurate view of the FTE researchers opens the door to considerable gaming. ‘Full-Time Equivalent (FTE) researcher’ stands for the time effectively dedicated to research, in combination with the effective working time in the research organisation.

We covered this topic in Section 4.5.4 of the Final First Interim Report (the sub-heading FTE researchers) where we indicated the measures that we have adopted in the EM to try and limit the gaming. We also suggest measures that could be taken in order to solve this issue. We covered the topic also in the Draft Second Interim Report as one of the scenarios envisages the use of FTE researchers for normalisation purposes.

In several countries, faculty in Higher Education Institutes (HEI) are, indeed, considered to spend on average half of their time on teaching and half on research. However, we are reluctant to define this measure top-down, and only for the HEIs. There should be a general agreement on the matter.
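For illustration only, the sketch below (in Python, with invented figures) shows the arithmetic implied by this definition of an FTE researcher, assuming that the contract fraction at the organisation and the share of working time dedicated to research are known for each researcher; the 0.5 value for faculty is simply the convention mentioned above, not an adopted EM rule.

# Illustrative only: FTE researcher as the product of the contract fraction
# at the organisation and the share of working time dedicated to research.
def fte_researcher(contract_fraction, research_share):
    return contract_fraction * research_share

# Hypothetical cases:
print(fte_researcher(1.0, 0.5))  # full-time faculty, half time on research -> 0.5
print(fte_researcher(0.6, 1.0))  # 0.6 contract, fully dedicated to research -> 0.6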

3.5.2 The indicator on inbreeding

Some research organisations considered that the assessment of the level of inbreeding is irrelevant for the assessment of the institutional management. It is accepted that “In many cases, the higher the level of inbreeding, the lower the dynamics in scientific performance”, but this should not be considered “a universal truth”. They suggest that this should be only an additional criterion, which should not be included in the assessment as a “performance indicator”.

Another research organisation wondered, “Which limit is considered negative? How will this issue be evaluated?”

Response:

In the Final First Interim Report (Section 4.5.2), the level of inbreeding is included as one of the indicators that will provide the evaluation panels with information for the assessment of the HR management. This information gives the evaluators a view of the openness of the organisation towards researchers coming from other organisations and scientific cultures. As said, this is only one of the indicators; it complements the picture of the institution’s ‘dynamism’ provided by other indicators (e.g. the study visits).

3.5.3 Collecting information at the EvU rather than RU level

Other organisations considered that it makes little sense to ask for information on the institutional environment at the level of the RU (as was indicated in the draft version of the First Interim Report). They considered that these topics – and specifically HR management, research infrastructure, and research strategy – are defined at the EvU level. They therefore insisted that in order to ensure efficiency in the evaluation process, this information should be collected at the EvU level. The expectation is that this would also ensure the value of the evaluation for the creation of strategic information to the benefit of institutional management.

Response:

We took these comments into account for the information on HR development and indicated in the Background Report that this information is to be provided at the EvU level (Q011 and Q012). The RUs will have the opportunity to indicate any differences in approach in the RU/departments involved (Q033) compared with the EvU level.

We are reluctant also to collect the information on research infrastructure and research strategy at the EvU level. This needs to be set against the importance of self-assessment as described in Section 3.3, above.

The risk is that if this information is asked for all RUs at the level of the EvU, the response will consist of a mere listing and a formal higher-level (and therefore less detailed) description. This would be of very limited interest to the evaluation panels and would reduce the potential depth of the evaluation panels’ insight and therefore the depth and value of their comments, conclusions and recommendations. As a result, while it would possibly create the benefit of a reduced burden for the evaluated RUs, it would considerably limit the value in strategic information that the evaluation panels would be able to provide, and therefore the value of the evaluation exercise as such.

While the list of research infrastructure can be provided at the EvU level, the view of the researchers on the adequacy of the infrastructure for the needs of research in their field (Q037) is of critical value.

The same applies to the description of the research strategy, for which in-depth reasoning is needed. The instructions (see the Background Report – Q038) are to provide a description covering the following topics:

• Research plan: What are the key research objectives and means to achieve these objectives? Have you defined performance indicators to measure progress?

• Development needs: Is there a need for new knowledge or facilities; is the present level of funding sufficient for attaining the objectives laid down?

• Use of resources: What is the intended use of resources (human, financial, equipment) in the light of the strengths and weaknesses in the SWOT analysis and how does the RU intend to combat the weaknesses and exploit the strengths?

• The strategy in context: Do the strategies of the State and the Institution/Unit support each other? How do you take into account the possible ethical questions within research?

We doubt that the EvU can provide this information for all its RUs without involving the research groups and/or departments that are effectively active in the field.

3.6 The assessment criterion Scientific Research Excellence

We have responded to the multiple comments on the eligible research outputs in Section 3.1.3 and Section 3.4.4, above. In the latter section we also explained the concepts underlying the design of this criterion and its positioning in the overall EM. In this section we focus on

• The comments related to the number of outputs that can be submitted for review (Section 3.6.1), and

• The suggestion to use ERC grants as an indicator of excellence (Section 3.6.2)

3.6.1 Number of outputs submitted

Various research organisations pointed out that when applying the minimum threshold of 1%, many small ROs would be able to submit 1 research output only. Other organisations objected to the 2% maximum threshold.

Response:

In the Final First Interim Report (Section 4.5.4), we improved the definition and the thresholds for the submission of the most outstanding scholarly research outputs.

We stated: “Each Research Unit will submit for review a number of research outputs that accounts for minimum 1% and maximum 2% of the total number of scholarly outputs by the researchers in the Research Unit over the evaluation period – however no less than 3 and no more than 20.”
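A minimal sketch (in Python) of one plausible reading of this rule, assuming that the absolute limits of 3 and 20 override the 1%–2% band for very small and very large RUs; the function name and the example figures are ours, for illustration only.

import math

# Illustrative only: lower and upper bound on the number of outputs to submit,
# read as 1%-2% of the RU's scholarly outputs over the evaluation period,
# clamped to no less than 3 and no more than 20.
def submission_band(total_scholarly_outputs):
    def clamp(x):
        return max(3, min(20, x))
    low = clamp(math.ceil(0.01 * total_scholarly_outputs))
    high = clamp(math.floor(0.02 * total_scholarly_outputs))
    return low, high

print(submission_band(150))   # (3, 3)   - small RU, the floor of 3 applies
print(submission_band(800))   # (8, 16)
print(submission_band(3000))  # (20, 20) - large RU, the cap of 20 applies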

In relation to the upper limits, it is important that the research community understands that the intent of the criterion is to identify ‘peak quality’ and not to assess the overall quality of the research outputs, as is the case in Pillar II of the Metodika 2013-15.

As we stated in the report, this indicator is highly competitive and it is paramount that the RU performs a careful selection of the outputs to submit, taking into account the criteria of originality, significance and rigour. These are more important than bibliometric data such as the impact factor of the journal in which the article is published.

We will give examples of how the Small Pilot Evaluation panels defined these criteria in the Third Interim Report. In the full-scale evaluation, these criteria should be defined in the Evaluation Protocol, i.e. the document providing information on the subject panels, the evaluation methodology etc., which will be published when the evaluation exercise is launched.

In the Guidelines for the evaluated research organisations (Background Report to the Final First Interim Report – Section 3.2), we specify that the RUs are expected to select research outputs where

• RU researchers are among the main authors, preferably the main author(s)

• The publication is based on research conducted at least partly in the research organisation

• The authors are trained researchers that are employed in the EvU (so, not PhD students or visiting researchers)

Finally, the RUs should take into account the process for the final score against this criterion: the 2 reviewers will assign scores to each publication and agree on a final score per publication; the evaluation panel will decide on the final score for the assessment criterion based on the average of the scores for all submitted publications. As a result, the process and quality of the selection is of key importance if the RU wants to reach high scores: it is better to submit fewer outstanding publications than the maximum number possible if that also includes merely ‘good’ publications.
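A minimal numerical sketch (in Python, with invented scores) of why this averaging rule rewards a smaller, stronger selection:

# Illustrative only: the criterion score is modelled here as the average of the
# agreed per-publication scores, as described above; all scores are invented.
def criterion_score(publication_scores):
    return sum(publication_scores) / len(publication_scores)

outstanding_only = [5, 5, 4]           # small, highly selective submission
padded_selection = [5, 5, 4, 3, 3, 3]  # the same outputs plus merely 'good' ones

print(criterion_score(outstanding_only))  # ~4.67
print(criterion_score(padded_selection))  # ~3.83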

3.6.2 ERC grants as an indication of excellence

Some research organisations expressed their disappointment that the “ERC evaluation and similar high ranking systems are not mentioned at all”.

Response:

The ERC criteria of excellence are indeed recognised in the international research community as the most demanding criteria for excellence. However, their use for a national evaluation is not appropriate as such, and in particular in view of the performance of the Czech Republic in gaining ERC grants. A minimal spread of performance over the 5 scores is needed to make the assessment useful; an assessment against the ERC criteria would risk that the large majority of the RUs would score 1 or 2, with very few exceptions.

As for the use of ERC grants as indicators of excellence, these are individual grants and including them in a PRFS risks creating a distorting effect on the R&D system, i.e. the creation of an “ERC-grant holder” market (a phenomenon that can be seen for any other indicator based on individual performance). Furthermore, while the employment of an ERC grant holder is important for the prestige of an institution, it would not be appropriate to base the funding of an institution on the characteristics and achievements of a single researcher in the institute.

3.7 The assessment criterion Overall Research Performance

We have responded to the multiple comments on the eligible research outputs in Section 3.1.3 and Section 3.4.4, above. In the latter section we also explained the concepts underlying the design of this criterion and its positioning in the overall EM. In this section we focus on

• Section 3.7.1 - Research productivity and the issue of co-publications

• Section 3.7.2 - Research productivity and the counting of books

• Section 3.7.3 - The publishing profile and the use and limits of bibliometrics  

• Section 3.7.4 - The sub-criterion ‘Ability to attract PhD students’  

3.7.1 Research productivity and the issue of co-publications

Several organisations criticised the rules on how co-publications are counted in the calculation of research productivity. They considered that co-publications among RUs in the same EvU should not be counted as one each, because this would lead to gaming. They also wondered to what extent the condition attached to this rule (“the publications needed to be of a clear interdisciplinary nature”) is controllable and who would be in charge of the checking.

Researchers also pointed out the issue of large collaborations (ATLAS, CMS, ALICE etc) and considered that these publications should be excluded from bibliometric reports or analysed separately.

Response:

We recognise that we overlooked the need to update the related paragraphs in the reports covering this topic.

The way the EM has been implemented in the SPE – and should be implemented in the future – is that co-publications among researchers in a single EvU (no matter their RU) are counted as a single publication.

The ruling that co-publications among EvUs count as one each, however, remains, as we are reluctant to risk discouraging collaborations among different research organisations.

In relation to the large collaborations, these will be separated out and specified in the bibliometrics report.
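For illustration only, a small Python sketch of the counting rule described above, assuming each publication record carries the set of EvUs of its authors; the field names and data are hypothetical.

from collections import Counter

# Illustrative only: a co-publication counts once per EvU, no matter how many
# of that EvU's RUs or researchers co-authored it; each co-authoring EvU
# counts it once.
publications = [
    {"id": "pub-1", "evus": {"EvU-A", "EvU-B"}},  # co-publication between two EvUs
    {"id": "pub-2", "evus": {"EvU-A"}},           # co-authored within a single EvU
]

counts = Counter()
for pub in publications:
    for evu in pub["evus"]:
        counts[evu] += 1  # one count per EvU per publication

print(counts)  # Counter({'EvU-A': 2, 'EvU-B': 1})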

3.7.2 Research productivity and the counting of books

Some research organisations also contest the fact that books were indicated as ‘counting as 4 outputs’.

Response:

In this case too, we have overlooked the need to update the related paragraphs in the reports.

It should be noted that the assessment of research productivity is not based on pure arithmetic. The panels do not take the overall sum of the publications and divide it by the number of FTE researchers. Instead, they have at their disposal the breakdown of production of the different types of research outputs over the years and the sum for the evaluated period per output type. This allows the panels to take fully into account the specificity of the discipline’s publication profile.

It is the panels’ task to consider whether the production of the different outputs is within the norm internationally, taking into consideration the number of researchers active in the RU.
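As a purely illustrative sketch of the kind of breakdown the panels receive (output counts per type and per year, rather than a single sum divided by FTE researchers), assuming a simple list of output records with hypothetical fields:

from collections import defaultdict

# Illustrative only: build a breakdown of output counts per type and per year.
outputs = [
    {"type": "Jimp", "year": 2012},
    {"type": "Jimp", "year": 2013},
    {"type": "B", "year": 2013},  # a book; its weight is left to the panel's judgement
]

breakdown = defaultdict(lambda: defaultdict(int))
for record in outputs:
    breakdown[record["type"]][record["year"]] += 1

for output_type, per_year in sorted(breakdown.items()):
    print(output_type, dict(per_year), "total over the period:", sum(per_year.values()))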

3.7.3 The publishing profile and the use and limits of bibliometrics

Some research organisations wondered, “Given that bibliometry is meant as additional information only and serves well only for some fields, how will panellists learn about the quality of outputs? Will RUs report complete references to all its outputs? Will RUs sort these lists by their perceived importance and quality?”

Others that are active in the field of SSH considered, “The publishing profile of a RU will bring only information on the number of publications and no proxy information on their potential quality. This will lead to more weight to quantity than quality.”

Response:

The quality of scientific research outputs will be assessed by means of the bibliometric data that are typically considered to be quality indicators.

In the Final First Interim Report (Section 4.5.4) we recognised that this is an issue for those fields that are not sufficiently covered in the international databases (WoS/Scopus).

We are looking into possible solutions for this issue. One solution indicated in the report is a stronger reliance on the panel members’ expertise. Rather than proposing extended or additional reviews of selected outputs, which would possibly create a disadvantage for the research organisations in the other fields, we are looking into possible means and measures to provide the evaluation panels with information on the quality of journals, conferences etc. that are not covered in WoS/Scopus. Some of these are already mentioned in the report.

The RD&I Information System will play an important role in this context. We will cover both of these topics in further detail in the Third Interim Report.

3.7.4 The sub-criterion ‘Ability to attract PhD students’

The use of this sub-criterion was criticised by several organisations. Some consider, “This indicator is easy to game because there is no effectively binding ceiling on the number of PhD enrolled (and funded by the government).” Others envisage, “It will likely lead to gaming in terms of trying to accept as many students for PhD studies as possible regardless of their quality and of the quality of the PhD advisors.”

Some of these organisations recognise the relevance of indicators such as the level of investment in PhD training and the effectiveness of the PhD education and its trend. Some consider that what matters most is the number of PhD graduates and their placement; others find that “additionally to the statistical numbers, the panellists should check a few randomly selected theses on their level and quality”.

Response:

The sub-criterion ‘Ability to attract PhD students’ is considered an indication of the esteem that the RU enjoys in the local research environment and in particular among the upcoming generation of researchers.

The research organisations rightly indicate that this is an indicator that can be gamed. However, in the Final First Interim Report (Section 4.5.2) we improved the description of the data that the evaluation panels will look into. As a result, gaming with the data on PhD students would be against the interest of the evaluated organisation.

The data consulted by the panels are:

• For the RUs that in practice train PhD students: the number and trend of PhD students trained and the level of investment in PhD training (PhD students versus FTE researchers)

• For the HEIs: the number and trend of PhD students enrolled, the level of investment in PhD training (PhD students versus FTE researchers, excluding employed PhD students), and the effectiveness of the PhD education and its trend (ratio of PhDs awarded to PhD students enrolled); a purely illustrative calculation follows below
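A purely illustrative calculation (in Python) of the two ratios mentioned above, with invented numbers:

# Illustrative only: the two ratios consulted by the panels, with invented data.
phd_students_enrolled = 40
employed_phd_students = 10
fte_researchers = 25.0
phds_awarded = 12

# Level of investment in PhD training (for HEIs, excluding employed PhD students)
investment_ratio = (phd_students_enrolled - employed_phd_students) / fte_researchers
# Effectiveness of the PhD education
effectiveness_ratio = phds_awarded / phd_students_enrolled

print(round(investment_ratio, 2))     # 1.2
print(round(effectiveness_ratio, 2))  # 0.3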

3.8 The evaluation results

Comments and questions related to the evaluation results are grouped and responded to under the following headings:

• Section 3.8.1 - Evaluation results at the institutional (EvU) level  

• Section 3.8.2 - Evaluation results at the national level  

• Section 3.8.3 – Aggregation of evaluation results or synthesis of scores  

3.8.1 Evaluation results at the institutional (EvU) level

Research organisations wonder how the assessment at RU level will result in an evaluation report at the EvU level and what the form of the final output will be.

Response:

In the Final First Interim Report (Section 4.7) we improved the description of the final evaluation results and specified:

• The evaluation panels define the final RU scores for each assessment criterion; they do not generate an aggregated final score for the RU as a whole

• The core of the evaluation results is the justification – for each criterion – for the starred quality levels

• The panels write out conclusions at the level of RU and provide recommendations for future improvements

In the same section we also explain the reasons why we took these decisions and the value that the evaluation results at the RU level will have as strategic information for the EvU and its management.

The Third Interim Report will contain an example of such an RU panel report, produced for the Small Pilot Evaluation.

In Section 5.2.4 we specify that the main panel chairs will decide on the panel chair that will be charged with writing out the overview report at the EvU level. The responsible subject panel chair will draft an overview of the evaluation results for each RU in the EvU and draw conclusions and recommendations with a specific focus on the research environment criterion.

Just as there will not be an aggregated final score for the RUs, there also will not be an aggregated score for the EvU.

In the Draft Second Interim Report we explain how the scores of the RUs against the specific assessment criteria will be ‘translated’ into institutional funding.

3.8.2 Evaluation results at the national level

Research organisations asked: “How the quantitative information (scores) assigned to RUs and EvUs on research performance and excellence at the level of scientific fields and disciplines will be synthesized to the level of the whole country. Will the synthesis highlight particular RU among other RUs in the same field and how it will compare to the rest of the world?”

Response:

In the Final First Interim Report (Section 5.2.3) we stipulated that the subject panel chairs will be in charge of preparing an analytical report on the state of research in their field of discipline, based on the assessment outcomes for the RUs and supported by bibliometric data at the national field level. The analytical report contains an overview of the outcomes and the distribution of the scores (on each of the five criteria) across the relevant RUs and draws conclusions and recommendations. The main panel chairs will be in charge of drafting similar analytical reports at the level of disciplinary area.

The key focus of these reports will be on an analysis of strengths and weaknesses across the different RUs; it is not the objective to define a ranking of the RUs.

3.8.3 Aggregation of evaluation results or synthesis of scores

Key to our approach is that we do not see value in a unitary ranking. Instead we use the 5 assessment dimensions to create incentives for 5 different but desirable behaviours.

We are reluctant to see an overall score employed – for precisely the reasons that led the European Commission to try to build a multiple-dimension university ranking system. For this reason, we propose a panel-based aggregation of RU-level judgement to higher levels.

Of course, arithmetically it is possible to aggregate the 5 RU-level scores to higher units of analysis. For example, one could use the size normalisation weights of the funding system (probably FTEs) to weight the individual RUs in order to aggregate them.

Inevitably, however, this would increase the importance of the FTE calculation in the evaluation system, which is an aspect that presents high risks for gaming (see Section 3.5.1, above).
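For illustration only, the kind of arithmetic aggregation described above, which we advise against, would look roughly as follows (in Python, with invented FTE weights and scores):

# Illustrative only: an FTE-weighted average of RU scores on a single criterion,
# i.e. the mechanical aggregation the EM deliberately avoids.
rus = [
    {"name": "RU-1", "fte": 30.0, "score": 4},
    {"name": "RU-2", "fte": 10.0, "score": 2},
]

total_fte = sum(ru["fte"] for ru in rus)
weighted_score = sum(ru["fte"] * ru["score"] for ru in rus) / total_fte

print(round(weighted_score, 2))  # 3.5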

3.9 The choice of the comparator countries

Some organisations were perplexed about the choice of the countries for the analysis of international practice, i.e. the 5 ‘comparator’ countries for this study – Austria, the Netherlands, Sweden, Norway and Great Britain – complemented with information on the evaluation practices in Belgium (Flanders), Australia, New Zealand, Finland and Italy. They note, “The first evaluation report refers to the “main” first five countries very scarcely. In the critical sections, the authors have a tendency to prefer Great Britain, New Zealand, the Netherlands, Italy and Australia, i.e. mainly countries from the second group. The evaluation experience in these states is very similar to the one in Great Britain. The reasons for these preferences were not explained.”

Other organisations consider, “It is a pity that the analysis did not include other countries, namely the key players in the field of science - Germany, USA, South Korea, Taiwan, Japan and China.”

Response:

The five comparator countries and the additional countries were selected for the analysis related to all the topics covered in this study, i.e. the R&D system and governance structure, the approach to national evaluations, and the institutional funding system. The criteria for the selection of these countries included the similarity in size of the country and the R&D base, the structure of the R&D governance system, the function of the evaluation exercise in the country, and/or the characteristics of the funding system, and last but not least, the availability of documentation and evidence on the direct (expected and unexpected) effects that the choices in evaluation methodology and/or funding principles had on the R&D system.

All of these countries adopted methodologies or implemented funding principles that could – in one aspect or another - constitute relevant input for the design of the evaluation system and/or funding principles.

The attentive reader will have noticed that there are substantial differences between the national evaluation systems in the countries that (apparently) we have referred to most often, including differences between the UK, New Zealand and Australia.

It is a fact that the evaluation methodology in the UK is a model for any country in the world because it is the oldest national research performance assessment system with implications for funding; it is also the best documented and the best analysed in terms of its positive and negative effects. By no means does this imply that the UK system has merely been copied in any of the other countries. There is always a need for adaptation to the country’s needs and strategies, the specific function of the evaluation, and the costs that the R&D system is willing to bear (see Section 3.2 in the Final First Interim Report). The proposed EM adopts these same principles.
