RESEARCH DATA MANAGEMENT PRACTICES OF RESEARCHERS IN
HIGHER EDUCATION INSTITUTIONS IN MALAWI
A study submitted in partial fulfillment
of the requirements for the degree of
Master of Science in Digital Library Management
at
THE UNIVERSITY OF SHEFFIELD
by
THOMAS MPHATSO BELLO
September 2014
Abstract
Background. Research funders’ policies on open access to data are influencing how research data are
being managed and shared in HEIs.
Aims. This study aimed to assess the current data management practices of researchers in Malawi’s
institutions of higher learning and their perceptions towards sharing.
Methods. A web-based survey instrument was used to collect data on the attributes of data being
produced and the data management practices. It was based on the DAF methodology developed by
DCC. The link to the questionnaire was emailed to contact people in Malawi’s universities who
forwarded it to researchers in their institutions. Reminders were sent two weeks and one week
before the close of the survey period. The survey attracted a total of 34 respondents.
Results. Researchers in Malawi are collecting various types of data in different formats. Some of
the data are more discipline-specific than others. The volumes being produced range from a few megabytes to
between 50 GB and 100 GB. The overwhelming majority of researchers manage their research data on
their own. Storage and backup of the data are mostly done using laptop hard disk drives, external drives
and memory sticks. Backups are made ad hoc, whenever the researchers see fit. In general, the
researchers support the idea of data sharing but do not appear to be enthusiastic about sharing their
own data. Most of them do not have data management plans.
Conclusions. The findings of this study suggest that although research in Malawi’s academia has been
going on for some time and generating data of varying types, formats and volumes, the data are
managed in a risky manner. However, the extent of these risks is not clear, calling for physical data audits
and more in-depth face-to-face interviews. Detailed analyses of particular disciplines would be useful
for establishing discipline-specific approaches and attitudes to research data management, so that
effective support and infrastructure can be provided to researchers.
Word count of Abstract: 322
Table of contents
Abstract
List of figures
List of tables
Acknowledgements
CHAPTER ONE: INTRODUCTION
1.0 Introduction and context
1.1 Research aims and objectives
1.1.1 Aim of study
1.1.2 Specific objectives
1.1.3 Research questions
1.1.4 Significance
CHAPTER TWO: LITERATURE REVIEW
2.0 Introduction
2.1 Funder’s requirements
2.2 Academic librarians and research data support
2.3 Researchers’ data management practices
2.3.1 Storage
2.3.2 Data policies:
2.3.3 Long-term preservation
2.3.4 Training/advice
2.4 State of RDM in Africa
2.5 Conclusion
CHAPTER III: RESEARCH METHODOLOGY
3.0 Introduction
3.1 Research design
3.1.1 The Data Asset Framework
3.1.1.1 Planning the audit
3.1.1.2 Identifying and classifying assets
3.1.1.3 Assessing management of data assets
3.1.1.4 Reporting and recommendations
3.2 Survey Population
3.3 Data analysis
CHAPTER IV: RESULTS
4.0 Introduction
4.1 Demographics of respondents
4.2 Details of research data
4.2.1 Data Categories
4.2.2 Data storage media
4.2.3 Formats
4.2.4 Research data volumes
4.2.5 Use of data management plans
4.3 Responsibility for data management
4.4 Experiences with loss of research data
4.5 Issues with storage
4.6 Research Data Backup
4.6.1 Backup media
4.7 Researchers’ perceptions on sharing research data
4.8 Research data ownership
4.9 Research data sharing
4.10 Experience with research council mandating data sharing
4.11 Free text responses given for question 24 are in Appendix C8.
CHAPTER V: DISCUSSION
5.0 Introduction
5.1 Attributes of the research data
5.1.1 Data categories
5.1.2 Databases
5.1.3 Image data
5.1.4 Audio data
5.1.5 Video data
5.1.6 Data volumes
5.2 Management of research data
5.2.1 Storage media used
5.2.2 Research Data Backup
5.2.3 Research data sharing
5.2.4 Sharing by discipline
5.2.5 Hindrances to sharing
5.2.6 Long-term preservation of research data
5.2.7 Issues with day to day management of data and support needs
5.2.8 Experience with Data Management Plans
5.2.9 Researchers’ specific concerns
CHAPTER VI: CONCLUSION
6.0 Introduction
6.1 Summary of findings of the study
6.2 Contribution
6.3 Limitations of the study
6.4 Recommendations for further research
CHAPTER VII: REFERENCES
Appendices
Appendix A1: Ethics Proposal
Appendix A – Ethics documentation
Research Ethics Review Declaration
Appendix A2: Ethics Information Consent Form
Appendix A3: Ethics Approval
Appendix B: Copy of questionnaire
Research Data Management Practices of Researchers in Malawi
About You
Details of your research data
Research data storage
Research Data Backup
Research data sharing
Conclusion
End of questionnaire
Appendix C – Additional Survey Results
Appendix D – Letter of introduction
Access to Dissertation
CONFIRMATION OF ADDRESS
Alumni Information
First Employment Destination Details for School Records
List of figures
Figure 1: Distribution of respondents by research role
Figure 2: Do you currently hold or have you ever held any research data?
Figure 3: Categories of the electronic data created in respondents’ research fields
Figure 4: A graph showing responses to question 9: estimate of how much electronic research data is
currently held/maintained by respondents.
Figure 5: Summary of responses to Question 10: “Do you currently have a data management plan for
your research data?”
Figure 6: Chart showing responses to question 8b on reasons for not developing DMPs
Figure 7: Responses to question 11 “Who, if anyone, is responsible for managing your electronic
research data?”
Figure 8: Question 12: Have you ever lost research data which was not backed up?
Figure 9: Ways in which data loss occurred
Figure 10: Question 13: Have you ever experienced any problems storing your research data due to the
size of the files?
Figure 11: Summary of answers to question 14 “On average, how frequently is your data backed up?”
Figure 12: Question 4b. Where are they backed up?
Figure 13: Responses to question 15 “If the service was offered, would you want your university's
repository to store any of your research data, either for your exclusive use or for wider access?”
Figure 14: Question 18. Do you share ownership of any of your research data with others?
Figure 15: Question 19. How do you currently share research data with colleagues?
Figure 16: Question 20. What problems have you encountered when sharing data with colleagues?
Figure 17: Question 22. What factors would prevent your research data from being made open access
to the general public?
Figure 18: Question 23. Have you ever applied for funding from a body that required some degree of
open access to be provided for your research data?
Figure C1: Research data categories by discipline
Figure C2: Responses to question 8 “What formats/software do you use for your electronic research
data?”
Figure C3: Responses to Question 8a “If you store data in databases, please select the primary program
you use:”
Figure C4: Primary format of images
Figure C5: Primary format of audio
Figure C6: Primary format of video
Figure C7: 14a. What data tends to be backed up?
List of tables
Table 1: Distribution of respondents by institutional affiliation
Table 2: Responses to Question 7 "What are the principal media on which your research data are
stored?"
Table 3: Use of DMPs by discipline
Table 4: Summary of responses to question 10a on main drivers for developing data management
strategies
Table 5: Question 16: If yes, how long would you want the repository to retain any of your research
data, including data only accessible by you?
Table 6: Question 17. Who owns the research data you hold?
Table 7: Question 21. Apart from yourself, who would you want to be allowed access to your research
data?
Table 8: Re-tabulation of Table 7 data
Table 9: Question 23b. Have you ever experienced difficulties in meeting these requirements?
Table C1: Other file formats/software being used by respondents and their areas of application
Acknowledgements
This work has been possible as a result of sponsorship from Kamuzu College of Nursing of the
University of Malawi.
My supervisor, Dr. Andrew Cox, deserves special recognition for introducing me to the module
Research Data Management, in addition to several others which I enjoyed, and for guiding me
throughout the dissertation process, which is based on that module. The mistakes are my own.
My wife Alice and little angels Mulinde, Becky and Dzalo: Thank you for putting up with dad’s
absenteeism. Your strength kept me going.
CHAPTER ONE: INTRODUCTION
1.0 Introduction and context
Over the past decade, there has been a huge interest in the area of research data management
(RDM). Many universities in developed countries such as Australia, the United Kingdom and the
United States are now increasingly and actively engaged in RDM (Groenewegen & Treloar,
2013). New job posts have been created to support researchers in managing their research
data throughout the research lifecycle (Pryor & Donnelly, 2009). Academic libraries, for
example, have recently repositioned themselves strategically by extending their service
offering to include support for RDM on their campuses (Corrall, Kennan, & Afzal, 2013).
Data management training materials for different subject areas have been developed by
organisations such as DCC and ANDS (ANDS, 2014a; DCC, 2014a) and universities such as
The Australian National University¹ to support researchers. Some of the training materials,
such as RDMRose² and MANTRA³, have focussed on providing continuing professional
development to librarians and research support staff to equip them to effectively support
researchers (Jones, Pryor, & Whyte, 2013). Furthermore, a range of data repositories to store
and preserve research data assets over the long term has also been deployed (Jones, 2014, p. 103).
To signify the importance of RDM and data curation issues, the DCC runs a special bi-annual
electronic journal, the International Journal of Digital Curation⁴, specifically to report on such
and related issues (DCC, 2014b).
1 http://anulib.anu.edu.au/_resources/training-and-resources/guides/DataManagement.pdf
2 http://rdmrose.group.shef.ac.uk/
3 http://datalib.edina.ac.uk/mantra/
4 http://www.ijdc.net/
One of the major forces driving this interest in RDM in institutions of higher learning is the
requirement by funding bodies for research grant applicants to outline data management plans
in their applications for funding (ANDS, 2014b; EPSRC, 2014; NSF, 2010; RCUK, 2011).
Adhering to these funder policy regulations has been a challenge for researchers, who
need support (Pryor, 2012). Higher education institutions such as those in the UK have
responded by putting in place RDM policy frameworks⁵ to comply with
funders’ mandates and so continue to attract research funds (Jubb, 2007).
Among other support service providers, academic librarians are considered important
stakeholders in the area of managing research outputs because of the various skills and
competencies inherent in their profession (Jones, Pryor, & Whyte, 2013; Michener et al.,
2012). The librarians are taking advantage of this emphasis on RDM to demonstrate their
value and are working in collaboration with IT and Research Offices in providing these data
support services (Corrall, Kennan, & Afzal, 2013; Pryor, 2014).
In a bid to provide effective research data support services, several studies have been
conducted in various universities to understand how researchers are managing their research
data and what their perceptions towards sharing of such data are (Jones, Ball & Ekmekcioglu,
2008; Martinez-Uribe, 2009; Alexogiannopoulos, McKenney & Pickton, 2010; Rice &
Haywood, 2011). Other studies have investigated academic libraries to identify the support
services they are offering or planning to offer to researchers in research data stewardship
(Corrall, Kennan, & Afzal, 2013). All these activities have been taking place in the western
world.
5 http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies
In Malawi, various academic institutions have been involved in different types of research for
some time. For example, the University of Malawi and Lilongwe University of Agriculture
and Natural Resources have several research centres which have been conducting different
types of research for many years. Research Director positions have been established in almost
all of Malawi’s HEIs in the last few years to lead the management of institutional research
efforts and help the institutions attract more research funding.
The government body that is mandated to promote and coordinate science and technology
activities through funding local research in Malawi is the National Commission for Science
and Technology (NCST). Regarding data management, NCST’s published guidelines for
social sciences and humanities research (NCST, 2011) stipulate that “researchers shall ...
allow others to have access to their research data” and “institutions shall establish research
data banks and repositories ... to facilitate availability and access by other users.” They also
state that the general format of research proposals should among other things include “data
management ... methods”. Despite this history of, and special interest in, research in academic
institutions, the stipulations by NCST, and the heightened interest
in RDM in the more highly resourced countries, it is not clear how the data arising from these
research efforts are being managed, both during the active stages and beyond the life of
research projects, because the practices are not documented. Where this is not known,
wasteful duplication of research effort, through collecting data that were already collected in past
projects, is inevitable. RDM has been shown to be advantageous because, among other things,
it prevents such wasteful duplication and promotes sharing of research data, especially in this age
when research is increasingly becoming global and taking multi-disciplinary approaches
(Michener et al., 2012).
1.1 Research aims and objectives
1.1.1 Aim of study
The aim of this study is to assess the status of current Research Data Management practices
of researchers in Malawi’s higher education institutions.
1.1.2 Specific objectives
The objectives of this study are:
• To understand the characteristics, types, and volumes of the research data being
generated by researchers in academic institutions in Malawi.
• To assess the methods that the researchers are using to store and back up their research
data.
• To understand how the researchers share their data and what their perceptions towards
data sharing are.
• To understand the support needs they have to enable them to effectively manage their
data throughout the research life-cycle.
• To identify the issues they face in the day-to-day management of these data.
1.1.3 Research questions
This study seeks to answer the following questions:
1. What are the attributes and volumes of research data that researchers in higher
education institutions in Malawi are generating?
2. How are they managing their research data?
3. How do researchers in Malawi share their data and how do they perceive the notion of
research data sharing?
4. What challenges do they experience when managing their research data on a
day-to-day basis?
1.1.4 Significance
This study provides a picture of the current practices of managing research data by
researchers in Malawi’s universities. It also helps to identify the research data support needs
that researchers have. This knowledge is important because it could be used in reshaping
some of the academic support services such as the library and IT to effectively support
researchers, thereby making a significant contribution to the whole institutional research
enterprise in the higher education sector. Knowing the types and volumes of data being
generated during research projects, for example, is useful in determining present and future
research data storage needs. Likewise, issues raised in this dissertation could influence
managers in how they budget and plan for training, recruitment and infrastructure in their
institutions to ensure that researchers are effectively supported and research data is
safeguarded.
In terms of data management and sharing, the study will raise awareness of the gaps that exist
in the local policies that deal with research practices, such as the guidelines for Social Science
and Humanities research by NCST⁶ and the College of Medicine Research and Ethics Committee
(COMREC) research proposal format⁷. Therefore, this study has the potential to influence
policy formulation for good RDM at the institutional and national levels.
6 http://www.ncst.mw/wp-content/uploads/2014/03/NATIONAL-FRAMEWORK-OF-GUIDELINES-IN-SSH.pdf
7 http://www.medcol.mw/comrec/COMREC_format.doc
It is good scientific practice that research is validated (European Science Foundation, 2000),
as this helps prevent fraud by academic researchers who are under pressure to ‘publish or
perish’. Validation is easier when good RDM is practised in the institutions
because the data and all contextual information regarding them, such as metadata, are available for each
research project. It is hoped that this will be the long-term contribution of this
study. Similarly, with funding going to researchers who comply with funders’ data
management and sharing requirements, good RDM is one area that can ensure the flow of
research income into academic institutions. This is another area to which this study could
ultimately contribute.
It is hoped also that the act of responding to the questionnaire itself will raise awareness of
the importance of having well laid down procedures for managing research outputs (Jerrome
& Breeze, 2009; Parsons, 2013). Stakeholders such as principal investigators and research
directors could use this awareness to initiate promotional activities that are aimed at the
effective stewardship of digital research data in their institutions.
The global nature and multidisciplinarity of 21st-century research mean that researchers and
all research stakeholders everywhere, including in Malawi, need to be knowledgeable in good
data management practices. Owing to its novel nature, this study will form a baseline of
current RDM practices of researchers in Malawi’s higher education sector. Therefore, it has
much to contribute to future research and training development for LIS, IT and research staff
in Malawi in addition to informing policy.
CHAPTER TWO: LITERATURE REVIEW
2.0 Introduction
Funders, academic librarians and researchers are some of the major players in the realm of
research data management. This literature review focuses on some of the pertinent
stipulations in funders’ data policies and the debates around them, librarians’ data support
services and the data management practices of researchers. This last theme is the longest
because it is the main focus of this project. It is important to understand the interplay between
the policies, support services and practices of researchers to have a holistic view of how this
area, which has become an essential part of the modern research enterprise, is unfolding. Lastly,
present RDM efforts in Africa are discussed to establish context for this research.
Funders’ policies have been the main driving force behind the uptake of RDM. This has seen
HEIs formulating their own data policies and strategies which have influenced some changes
in how research stakeholders are working. These include the creation of new research
support roles in the library and IT professions, investments in storage infrastructure and
development of RDM training and guidance for researchers. Following the growing interest
in RDM, a number of studies have been conducted to understand researchers’ practices,
perceptions and requirements so that effective data support services could be tailored for
them.
2.1 Funders’ requirements
In countries such as the United Kingdom, Australia and the United States, most funders have
put in place policies mandating research grant applicants to include data management plans in
their applications addressing a number of issues including sharing and preservation of
datasets (Pryor, 2014). Thus, an applicant’s ability to state clearly the type of data he or she
will create, how the data will be maintained and shared, and why the data might
not be shared where sharing is not appropriate or expected determines whether or not he or
she is able to secure the grant.
Research data generated using taxpayers’ money have been described as a “public
good, produced in the public interest” and should therefore “be made openly available with as
few restrictions as possible …” (RCUK, 2011). Access to such research outputs maximises
returns from government investment (OECD, 2007). In the United States, the major funding
agencies that have influenced the data management landscape in academic institutions
include the National Science Foundation (NSF), the National Institutes of Health (NIH) and
the National Endowment for the Humanities (NEH). NSF, for example, declared that for
proposals submitted after January 18, 2011,
“investigators are expected to share with other researchers, at no more than
incremental cost and within a reasonable time, the primary data … created or gathered
in the course of work under National Science Foundation grants” (NSF, 2010a).
Similarly, the Australian National Data Service (ANDS) has outlined requirements for data
management planning for grant applicants in Australia. On data reuse, ANDS aims at
“transforming Australian research data from being a single use research output to a
continually reusable resource” (ANDS, n.d.). In all these policies, data sharing is among the
recurring themes as one of the advantages of managing research data. This seems to address
one of The Royal Society’s propositions against hoarding of research data which states that
there must be “a shift away from a research culture where data is viewed as a private
preserve” (The Royal Society, 2012). The international Human Genome Project has been
hailed as a good demonstration of large-scale international research efforts in which
users around the world successfully use openly accessible data for various purposes (OECD,
2007). Regarding this project, the Human Genome Organisation (HUGO) Ethics Committee
(2002) described the human genomic databases as “global public goods”, a description which
bears strong similarity to RCUK’s reference to publicly funded research data (2011).
All the major research funders in the UK are proponents of sharing data sets, with
domain-specific funders tending to expect varying degrees of sharing to accommodate their
contextual settings and regulatory requirements. MRC, for example, requires that ethical,
legal and institutional considerations should be addressed before sharing of research data
takes place (Medical Research Council, 2011). This is in keeping with RCUK’s advocacy for
a balance between return on public investment and the risk of infringing the confidentiality
rights of research subjects (RCUK, 2012). Likewise, funders such as EPSRC and ESRC
recognise that there may be circumstances where withholding of research data is justified and
they require that where such issues arise, reasons for restricting access should be given and
the associated metadata should also state those reasons in addition to stating the requirements
that should be fulfilled in order to permit access to such data (EPSRC, 2011; ESRC, 2010).
While AHRC recognises that there may be special circumstances for prohibiting access to
data, it further envisages cases where charging for access could be justified (AHRC, 2014).
ESRC also clearly emphasises the data citation responsibilities of those who publish by re-
using such data (ESRC, 2013).
Borgman (2012) cautions, however, that research data sharing is a complex issue because of
different perceptions by different researchers, types of data and a variety of contexts,
referring to it as a “conundrum”. She notes, for example, that:
“Some of those data may be in sharable forms, others not. Some data are of
recognized value to the community, others not. Some researchers wish to share all of
their data all of the time, some wish never to share any of their data, and most are
willing to share some of their data some of the time”.
It seems that funders have anticipated such complexities and attempted to address these
uncertainties by being flexible in their policies, one example being the requirement for
researchers to provide justification where sharing is not possible. NSF (2010b), for example,
concedes that “what constitutes reasonable data management and access will be determined
by the community of interest through the process of peer review ...”, a flexibility which takes
into account disciplinary approaches that may exist or evolve with time. In addition, studies
continue to be carried out with the aim of understanding the discipline-specific approaches to
and perceptions towards data management (Akers & Doty, 2013). This understanding is
useful for planning storage infrastructure requirements and tailored support services for
research data management.
OECD (2007) has identified validating or verifying research as one of the rationales for
sharing data. As career progression in academia is based largely on continued publishing
among other things, it is of utmost importance to guard against scientific malpractices by
academics who may be tempted to fabricate or falsify their data because they are anxious to
publish for purposes of promotion in their job. Cases of data fraud that occurred in The
Netherlands between 2011 and 2012 have been reported, the most prominent of them being
in the field of social psychology (Doorn, Dillo, & van Horik, 2013). While one of the
committees that were instituted to investigate the fraud admitted that there might be
justifiable reasons for the “selective omission of exceptional scores” that was observed, it
lamented that these were not documented (Levelt Committee, Noort Committee, & Drenth
Committee, 2014). This underlines the importance of depositing research data together with all
their related metadata and descriptions to help future researchers make sense of and reuse
the data effectively. In this reported case of data fraud, the available primary data were used
to verify the published results, and re-calculation of statistics such as means and standard
deviations showed that the data had indeed been “massaged”. The investigation also
revealed the research culture that prevailed behind the publications.
As a result of these fraudulent activities, research validation and data management have been
strengthened in the three institutions where the malpractices took place and in the whole
country. Among other reasons, this was done to repair the bad image that has become
associated with the field of psychology research and to avoid jeopardizing employment
opportunities for young psychology graduates in the job market. The cases have also raised awareness
of the importance of research data management in disciplines which are known to lag behind
in this area, such as psychology (Doorn, Dillo, & van Horik, 2013).
2.2 Academic librarians and research data support
Studies on academic libraries’ engagement in RDM have been conducted with the aim of
understanding support service, curriculum and training requirements for information students
and professionals (Halbert, 2013). Corrall, Kennan & Afzal (2013) surveyed bibliometric and
data support activities of 140 libraries in Australia, New Zealand, Ireland, and the United
Kingdom. Tenopir, Birch & Allard (2012) assessed the prevailing state of and future plans for
research data services in a random stratified sample of academic libraries with membership in
the Association of College and Research Libraries (ACRL) in the United States and Canada.
Both studies found that a relatively small number of libraries were offering these services and
reported that more were planning to offer them in the future, attributing this to the relatively
new nature of the service. However, future plans seemed lower in Ireland by comparison,
which the authors attributed to “slower development of data management policies by
national research funding bodies”. Similarly, in the North American study,
libraries in institutions that received NSF grants were found to offer remarkably more data
support services than those that did not, reinforcing the notion that funders’ policies have
significantly contributed to institutional involvement in RDM. The Australia-Europe study
also reported that librarians’ skills in bibliometrics and RDM, along with their understanding
of the research environment, were identified as areas that needed to be addressed in order to
provide effective research data services. The North American study reported that in some
libraries, most staff have been or plan to be reassigned to new data roles while other libraries
are hiring or plan to hire new staff members. This is a clear depiction of how academic
librarians are demonstrating their worth in their institutions.
2.3 Researchers’ data management practices
The science enterprise has become increasingly “data intensive” and more collaborative
(NSF, 2010c) as a result of the unprecedented deluge of data being collected, analysed, re-
used and preserved due to advancements in computational and communications technologies
(Borgman, 2012; Institute of Medicine and National Academy of Sciences, 2009). This
means that data sharing among researchers has now become more important than ever
(Tenopir et al., 2011).
Several studies aimed at understanding researchers’ practices of and perceptions towards
RDM have been conducted in a number of universities in Western countries. The Data
Asset Framework (DAF) methodology, described in detail in the methodology chapter of this
study, is one of the tools being used in HEIs to identify, locate and assess the current
practices of researchers in the management of research data and to understand their
perceptions towards RDM (Jones, Ball & Ekmekcioglu, 2008). The DAF was
designed by DCC to audit data assets and identify prevailing data management practices of
researchers.
The DAF was piloted from May to July 2008 at the Universities of Edinburgh (School of
GeoSciences), Glasgow (Department of Archaeology) and Bath (Innovative Design and
Manufacturing Research Centre (IdMRC), a research group within the Department of
Mechanical Engineering) (Jones, Ball & Ekmekcioglu, 2008) where it largely involved a
series of interviews with researchers. A fourth pilot audit in the series followed at King’s
College London (KCL) during October-December 2008 focusing on researchers from the
Centre for Computing in the Humanities (CCH) (Jones & Ross, 2009). Despite their differing
disciplines and contexts, the data audits at the first three institutions and a similar audit a year
later at the University of Oxford (Martinez-Uribe, 2009) revealed similar issues centring on
storage, data policy and preservation.
2.3.1 Storage
Most of the pilot audits reported that researchers complained of insufficient storage with
several observed cases of researchers storing their data on local hard drives of their laptops or
PCs and on memory sticks. Very few were reported to have a well-established data
backup plan, although most of them knew the consequences of not backing up their data
frequently and some reported having experienced data loss or irretrievability due
to corruption of CDs. Studies have shown that when researchers store data themselves, the data
tend to get lost, especially over the long term (Vines et al., 2014; Wicherts, Borsboom,
Kats, & Molenaar, 2006). This has implications for efficiency in terms of the utilization of
funds, time and data reuse.
2.3.2 Data policies
Data management policies were found to be non-existent in most cases. Where pockets of
good practice for file naming and version control were observed, these were marred by ad
hoc approaches due to a lack of standardisation. The result of this is that collaboration among
researchers is hard to achieve because it is difficult to know the location and correct version
of the data sets.
2.3.3 Long-term preservation
The data audits reported that infrastructure for long-term preservation was not provided in the
piloted institutions and no person was assigned responsibility for data management,
meaning that there was no way of knowing what data assets existed, an issue which is
aggravated when researchers leave these institutions. Researchers at the University of Oxford
were reported to have indicated that one of their top requirements was infrastructure that
would allow publication and long-term preservation of research data.
2.3.4 Training/advice
Another recurring item reported in all these audits was the call by researchers for advice on
practical issues related to managing research data across the research life cycle because they
recognised the importance of managing their research data properly and the risks of not doing
so. This strongly agrees with a study by Tenopir, Birch and Allard (2012) which reported that
researchers faced challenges in managing their data assets properly and responsibly because of
a lack of time and funding, and that they wanted other units to take the lead in championing
research data services. Most researchers have a desire to preserve their data beyond the life of
their projects but fail to do so because of a lack of archiving mechanisms (Alexogiannopoulos, McKenney
& Pickton, 2010). This is an area where the services of IT professionals could be useful and
in which university management needs to invest heavily.
Unlike the other four, the audit at KCL discovered good data management practice in which
its online list of projects was described by the auditors as ‘overwhelmingly of curated digital
assets’. Project directories were found to be well organised by project from the outset,
making it easy for data to be located and its context understood. Researchers in the centre
were said to know what was expected of them in terms of data management and were
committed to ensuring that the results of their data were far reaching. One could be left with
the impression that this was the case because researchers in the centre were from a computing
background and therefore, knowledgeable in managing their digital data assets. The centre
could serve as an exemplar of good practice of research data stewardship for other disciplines
in the institution and beyond.
Other early adopters of DAF include Imperial College London (Jerrome & Breeze, 2009),
Southampton (Gibbs, 2009), University of Northampton (Alexogiannopoulos, McKenney &
Pickton, 2010) and University of Oregon (Westra, 2010) in the USA, with the University of
Nottingham in the UK as one of the recent adopters of DAF (Parsons, 2013).
The DAF methodology of auditing data assets has gained wide acceptance, as evidenced by
the number of studies that have followed the initial 2008 pilots, including those at the
Universities of Bath (Jones, 2011), Glasgow and Cambridge (Ward, Freiman, Jones, Molloy
& Snow, 2011), Edinburgh (Rice & Haywood, 2011) and Oxford (Wilson & Jeffreys, 2013).
2.4 State of RDM in Africa
South Africa, a pacesetter in many areas on the African continent, seems to be preparing to
institutionalise RDM. This can be seen in the increase in RDM activities on its university
campuses. For instance, the University of Cape Town held a 2-day intensive RDM workshop
facilitated by DCC staff in late March 2014 (DCC, 2014). Similarly, in July 2014, the
University of South Africa conducted a 2-day LIS Research Symposium⁸ in which the first
day was dedicated to presentations and discussions on RDM. Themes that were discussed at
this gathering included “libraries as part of the research infrastructure of the academic
institution”, “scientific data curation, citation and scholarly publication”, “developing an
institutional research data management plan” and “research data management and
institutional repositories”. On a deeper level, the Division of Epidemiology and Biostatistics
at the University of the Witwatersrand has introduced a new master’s programme in Research
Data Management⁹ that aims to produce graduates who are able to lead data management
teams and integrate RDM activities at all stages of the research life cycle.
Unlike the developed countries where RDM is becoming fully entrenched, there are currently
no requirements for data management and sharing plans by research funding agencies in
South Africa as part of the grant application process (DCC, 2014; Pienaar, 2010). This is
very likely also the case in many other African countries, including Malawi.
2.5 Conclusion
As research is becoming increasingly data-intensive, cross-disciplinary and global (Michener
et al., 2012), the workshops and MSc programme in South Africa are an indication of how
universities there are preparing for the global phenomenon that RDM is becoming. There is
need, therefore, for academic institutions in Malawi to start preparing institutional RDM
policies and services too. This study is one step in that direction as it attempts to uncover
researchers’ current data management and sharing practices in selected institutions of higher
learning in Malawi. The research, which acts as a baseline study, attempts to identify the
attributes and volumes of the data that researchers are generating and what they do with the
data at the end of their research projects. It also highlights the challenges being faced by
researchers in the day-to-day management of their research data and captures some of their
thoughts on data management.
8 http://www.unisa.ac.za/Default.asp?Cmd=ViewContent&ContentID=96779
9 http://www.wits.ac.za/10568/#MSCinfectiousepi
CHAPTER III: RESEARCH METHODOLOGY
3.0 Introduction
This study aims to identify the ways in which researchers in Malawi’s higher education
institutions are currently managing their research data. To achieve this, the study has
attempted to answer the following questions:
• What are the attributes and volumes of research data that researchers in higher
education institutions in Malawi are generating?
• How are they managing their active data and what do they do with the data at the end
of the research projects?
• How do researchers in Malawi share their data and how do they perceive the notion of
research data sharing?
• What challenges do they experience when managing their research data on a
day-to-day basis?
This chapter outlines the research design that has been used in this study, detailing how the
DAF has been employed and mapped to the phases of the present study. It also discusses the
study population and the data analysis approaches that have been adopted.
3.1 Research design
To answer the research questions, a cross-sectional survey methodology was used. Cross-
sectional research designs are aimed at studying phenomena of interest at a single point in
time (Babbie, 1979; Bryman, 2012; Wagenaar & Babbie, 2004). The survey was based on a
modification of DCC’s Data Asset Framework (DAF) methodology (DCC, 2009), described
in detail in the subsections below.
Studies that are based on the DAF can be termed descriptive because they describe
researchers’ practices of managing their research data. Descriptive studies describe
populations of interest with respect to some phenomenon (Babbie, 1979).
DCC encourages implementers of the DAF to modify the methodology to account for the
specific contexts in which it is being applied (DCC, 2009). This is what previous data
audits in other institutions have done (Alexogiannopoulos, McKenney & Pickton, 2010)
and this was also done in this study.
3.1.1 The Data Asset Framework
According to DCC (2009), the DAF is a collection of methods whose purposes are to:
• discover the data assets being created and held within institutions;
• assess how the data are managed, shared and preserved;
• identify any threats to the data;
• discover researchers’ perceptions towards data creation and sharing; and
• provide suggestions on improvement of prevailing practice.
The framework takes a four-stage approach but encourages flexibility in its application in
order to accommodate the specific needs of the institution being studied. The stages are:
planning; identifying and classifying assets; assessing management of data assets; and
reporting and recommendations.
3.1.1.1 Planning the audit
This stage involves “planning, defining the purpose and scope of the survey and conducting
preliminary research”. The purpose of the survey has been defined in Chapter I. In terms of
scope, this study has been set to focus only on researchers working in academic institutions in
Malawi. Planning for this study started at the proposal stage. It was during this time that key
contact people were sought to help with the dissemination of the email containing
the link to the questionnaire in their institutions. This was done two months before the study
to ensure timely responses, which were crucial to the completion of the study, and to achieve
wider dissemination of the instrument, on the understanding that respondents are more
likely to respond to a questionnaire sent by someone they know than by a stranger (Barnes,
2001).
The research was conducted by collecting data using a web-based survey instrument. The
questionnaire, which was set not to collect respondents’ personal details, was designed using
Google Drive Forms. This method of collecting data is cheaper and quicker to administer
than other methods such as interviews or semi-structured interviews (Bryman, 2012). This
was desirable in this case because the respondents were “geographically widely dispersed”
(Bryman, 2012, p.233), the researcher was far away and time was of the essence. The absence of
an interviewer while the questionnaire is being completed also helps to eliminate any
influence on the responses caused by his or her presence, in addition to offering convenience
to the respondents (Bryman, 2012, p.233) in that they are able to respond at a time and
place of their choice. The limitations of this method, though, are low response rates and
the difficulty of asking many questions without causing respondent fatigue (Bryman, 2012, p.235).
The survey instrument was disseminated via email on 25 June 2014 to the contact persons
who had been identified earlier. These in turn forwarded the email to researchers in their
institutions. After two weeks, the contacts were requested to send follow-up emails. In
addition, some personalised emails were also sent by this researcher to researchers known to
him in order to increase the response rate. The ethics approval letter from the Information
School’s Ethics Committee and a letter of introduction written by the supervisor of this study
were attached to these emails.
The survey comprised a total of 24 questions divided into four sections: demographics,
details of research data held, data storage and backup, and research data sharing (see Appendix B).
The first page of the questionnaire contained ethics and consent information which
participants had to read before participating in the study. By clicking ‘Continue’ to proceed to
the next pages, participants agreed to take part in the survey. As is the case with many
questionnaires, the first question asked for demographic details of the respondent to enable
the researcher to obtain a profile of the participating group.
All the questions except the final one were closed-ended, either multiple choice or tick-box based.
This format helps study participants because it saves the time they would otherwise take to type in
their responses. It also helps the researcher during data analysis because uniform
responses are more easily grouped and analysed. In addition to the answer options that came
with the questions, many of the questions provided an ‘other’ option where the survey
participants could specify their own responses if these were not included on the lists of provided
responses. The final question was open-ended to afford the participating researchers an
opportunity to discuss in more depth any related issues they had in mind. The drawback with
open-ended questions, though, is that respondents sometimes provide irrelevant answers.
There was no pre-test of the survey instrument because of limited time, even though pretesting
questionnaires is important because it helps to keep errors to a minimum (Wagenaar & Babbie,
2004, p. 156). However, the DAF has been used extensively and successfully elsewhere, which
offers some assurance that the survey instrument was sound, even though it was being applied
in a different setting.
In many data asset surveys that use DAF, a sample of the researchers who participate in the
survey is further interviewed face-to-face for an in-depth understanding of the data assets
being held and more qualitative information on practices, needs and perceptions. This has not
been done in this study because time was limited and it was not practical to do so.
3.1.1.2 Identifying and classifying assets
The data collected in the survey were used to identify and classify the data assets being
generated and held by researchers in Malawi. However, this step is better suited to physical
identification than to a survey. The questionnaire focused not only on researchers who held
active data but also on those who had held data in the past. This was done to gather as much data as
possible, given that research projects come and go and researchers will therefore hold data at some
point in their careers.
3.1.1.3 Assessing management of data assets
The data analysis phase of this study has assessed the management and sharing practices of
the research data assets and documented these in the results chapter.
3.1.1.4 Reporting and recommendations
Results of this study have been analysed and reported in the discussion chapter.
Recommendations have been made and also included in the conclusion chapter.
3.2 Survey Population
The population under study comprised researchers from the University of Malawi, Lilongwe
University of Agriculture and Natural Resources (LUANAR) and Mzuzu University. These
institutions were chosen because they have been involved in research for a long time and
represent a diversity of disciplines which served to identify any domain specific practices and
perceptions. These disciplines include agricultural, social and mathematical sciences.
3.3 Data analysis
Analysis of data has taken both quantitative and qualitative approaches. Quantitative analysis
has comprised calculation of percentages, proportions and means. Some data have also been
presented in tables and different types of charts. The ‘other’ and ‘any comments’ fields of the
questionnaire have been used to obtain relevant qualitative data, which have helped to
gauge the perceptions of the responding researchers. The results have been compared
with what has been reported in the literature, especially in studies based on the DAF
methodology, as reported in the literature review.
The free text answers which respondents gave in the final question have been thematically
grouped and reported.
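As an illustration of the percentage calculations described in this section, the following minimal Python sketch reproduces the institutional breakdown reported in Table 1 of the results chapter. The function name `percentages` and the rounding to whole numbers are assumptions made for illustration only; they do not describe the tool actually used for the analysis in this study.

```python
def percentages(counts):
    """Return each category's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Respondent counts by institution, as reported in Table 1.
by_institution = {
    "University of Malawi": 29,
    "LUANAR": 4,
    "Mzuzu University": 1,
}

shares = percentages(by_institution)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The same helper could be applied to any single-response question, where the category counts sum to the number of respondents.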
CHAPTER IV: RESULTS
4.0 Introduction
This chapter presents the findings of the study.
4.1 Demographics of respondents
A total of 34 participants responded to the questionnaire. Of these, 17 were Principal
Investigators or Project Managers, representing 35% of the responses to this question. Independent
researchers and members of research teams each accounted for 21%.
Two of the respondents were research assistants and only one was a research
support/non-academic staff member, while eight were research students working towards their
doctorate degrees, representing 17% of the responses. Figure 1 shows the distribution of
the respondents by research role.
These researchers come from a wide array of academic disciplines such as social sciences,
sciences and engineering.
Figure 1: Distribution of respondents by research role
Principal Investigator 35%
Member of Research Team 21%
Independent Researcher 21%
Research Assistant 4%
Non-academic Staff 2%
Research Student 17%
In terms of institutional affiliation, the majority (29) of the respondents came from the
University of Malawi, representing 85%; 4 were from the Lilongwe University of Agriculture
and Natural Resources (LUANAR) and 1 respondent was from Mzuzu University. Table 1
provides a summary.
Table 1: Distribution of respondents by institutional affiliation
Name of institution     Number of respondents   Percentage
University of Malawi    29                      85%
LUANAR                  4                       12%
Mzuzu University        1                       3%
4.2 Details of research data
In terms of research data holdings, 25 respondents, representing 74% indicated that they held
data while 9 of them (26%) said that they had at one point held data as shown in Figure 2
below.
Figure 2: Do you currently hold or have you ever held any research data?
Yes, I currently hold research data 74%
Yes, I have held research data in the past 26%
4.2.1 Data Categories
Categories of the electronic data created in respondents’ research fields are summarised in the
chart in Figure 3. It shows that a wide array of research data are collected by researchers.
Close to half (46%) fall into the Survey/Interview/Focus Group category. Observational data
make up 19% of the reported categories, with experimental data making up 15%. Simulated
data make up 8%, while the derived and reference data categories make up 5% each, with
‘other’ categories at 2% of the reported data classes.
Figure 3: Categories of the electronic data created in respondents’ research fields
Observational 19%
Survey/Interview/Focus Group 46%
Experimental 15%
Simulated 8%
Derived 5%
Reference 5%
Other 2%
4.2.2 Data storage media
A wide range of data storage media is in use. As Table 2 indicates, hard disk drives of
laptops/netbooks are used more than any other medium to store research data (21% of all
responses), followed by USB/Flash drives (14%), campus computer hard disk drives (13%) and
external hard drives (11%). Interestingly, paper is also a notable data storage
medium (8%), ahead of email clients/servers, CDs/DVDs, hard disk drives of off-campus
computers and web-based services such as Google Docs and Dropbox. The survey has
revealed that shared drives/servers (e.g. University servers) account for only 2% of the research
data storage media.
However, this question was flawed in that it allowed respondents to select multiple
responses, yet the aim was to find the principal medium. It should have been a multiple-choice
question allowing only one response.
Table 2: Responses to Question 7 "What are the principal media on which your research data are stored?"
Principal data storage medium                                                        Responses   Percentage
Hard disk drive of laptop/netbook                                                    32          21%
USB/Flash drive                                                                      21          14%
Hard disk drive of computer on campus                                                19          13%
External hard drive                                                                  17          11%
On paper                                                                             12          8%
Email client/server                                                                  10          7%
CD/DVD                                                                               9           6%
Hard disk drive of computer off campus                                               8           5%
Web-based service (e.g. Google Docs, Flickr, Box.net, Dropbox, Pando etc.)           8           5%
Shared drive/server (e.g. University server)                                         3           2%
Third party (including commercial data storage)                                      2           1%
Cassette Tape (Audio)                                                                2           1%
Photograph                                                                           2           1%
Slides                                                                               2           1%
Other                                                                                2           1%
Hard disk drive of instrument/sensor which generates data                            1           1%
VHS/Video Cassette                                                                   0           0%
Microfiche                                                                           0           0%
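Because Question 7 allowed multiple selections, the percentages in Table 2 are shares of all selections made rather than shares of the 34 respondents. The short Python sketch below is illustrative only; it uses the counts from Table 2 to contrast the two denominators. The variable names are hypothetical and the sketch is not part of the original analysis.

```python
# Number of survey respondents (section 4.1).
RESPONDENTS = 34

# Selections per storage medium, copied from Table 2.
responses = {
    "Hard disk drive of laptop/netbook": 32,
    "USB/Flash drive": 21,
    "Hard disk drive of computer on campus": 19,
    "External hard drive": 17,
    "On paper": 12,
    "Email client/server": 10,
    "CD/DVD": 9,
    "Hard disk drive of computer off campus": 8,
    "Web-based service": 8,
    "Shared drive/server": 3,
    "Third party": 2,
    "Cassette Tape (Audio)": 2,
    "Photograph": 2,
    "Slides": 2,
    "Other": 2,
    "Hard disk drive of instrument/sensor": 1,
    "VHS/Video Cassette": 0,
    "Microfiche": 0,
}

total_responses = sum(responses.values())  # 150 selections in total

laptop = responses["Hard disk drive of laptop/netbook"]
share_of_responses = round(100 * laptop / total_responses)  # 21, as in Table 2
share_of_respondents = round(100 * laptop / RESPONDENTS)    # 94

print(f"Share of all selections: {share_of_responses}%")
print(f"Share of respondents:    {share_of_respondents}%")
```

Read against the number of respondents, laptop/netbook hard drives were selected by 32 of the 34 researchers, which the 21% figure on its own does not convey.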
4.2.3 Formats
The results for the formats/software that researchers are using for their data are in Appendix
C1.
4.2.4 Research data volumes
Respondents were asked to estimate the volume of electronic data they held. As Figure 4 shows,
the 1-50 GB range attracted the most responses, at just below half of the respondents, followed
by the 100-500 GB range, which was reported by about a fifth of the respondents. Identical
proportions of respondents (12% each) estimated their research data at less than a gigabyte and
at 50-100 GB. Those who held 500 GB - 1 TB of data accounted for about a tenth of the
respondents, while one respondent (3%) gave the highest estimate, at 50 – 100 TB, and another
was not sure how much data he or she held.
Figure 4: Responses to question 9: estimated volume of electronic research data currently
held/maintained by respondents.
4.2.5 Use of data management plans
The survey sought to find out whether researchers had data management plans (DMPs) for
their research data, such as a data preservation policy, a record management policy or a data
disposal strategy. As Figure 5 shows, the majority of the participants (just over two thirds)
acknowledged that they did not have DMPs, while just under one third indicated that they had
one. One respondent (3%), a research student, did not know whether or not he or she had a DMP.
Figure 4 chart values: < 1 GB 12%; 1 - 50 GB 44%; 50 - 100 GB 12%; 100 - 500 GB 18%;
500 GB - 1 TB 9%; 1 - 100 TB 3%; Don't know 3%
Figure 5: Summary of responses to Question 10: “Do you currently have a data
management plan for your research data?”
When the responses are grouped by discipline, as shown in Table 3, the highest proportions of
respondents who have DMPs are in Agricultural Sciences and in Medicine & Health, each at 50%,
followed by Social Sciences (40%), Humanities (33.3%) and Sciences (28.6%). None of the
respondents in the remaining disciplines has a DMP.
Table 3: Use of DMPs by discipline
Discipline                     Yes      %  No      %  Don't know      %  Total
Agricultural Sciences            2  50.0%   2  50.0%           -      -      4
Engineering & Architecture       -      -   2  66.7%           1    33%      3
Humanities                       1  33.3%   2  66.7%           -      -      3
IT                               -      -   2 100.0%           -      -      2
Law                              -      -   1 100.0%           -      -      1
Business & Management Science    -      -   3 100.0%           -      -      3
Medicine & Health                3  50.0%   3  50.0%           -      -      6
Sciences                         2  28.6%   5  71.4%           -      -      7
Social Sciences                  2  40.0%   3  60.0%           -      -      5
TOTAL                           10  29.4%  23  67.6%           1     3%     34
Figure 5 chart values: Yes 29%; No 68%; Don't know 3%
Respondents indicated various motivations for developing their data management strategy.
The most prominent is “Research requirement to access/analyse/annotate others'
data”, which made up 50% of the drivers, followed by “Volume of data associated with
project” at 21%. Unlike the trend elsewhere, funder mandates feature lowly in influencing
development of DMPs in Malawi, at only 14%. This is summarised in Table 4.
Respondents gave a range of reasons for not having DMPs, which are presented in Figure 6.
Absence of a university data management policy and the time and effort required were the
major ones, together accounting for nearly half of the reasons given. Lack of training or
expertise within the research group and lack of local support or guidance together made up
approximately a fifth of the reasons. Nearly a tenth of the reasons given were that DMPs are
not a requirement of project funders, followed by the reason that they are not required by or
appropriate to the field of research or research group, at 6%. These were given by respondents
who come from the humanities, social science and science backgrounds.
Table 4: Summary of responses to question 10a on main drivers for developing data
management strategies
Driver for developing DMP                                           Response  Percentage
Research requirement to access/analyse/annotate others' data               7         50%
Requirement of project funder                                              2         14%
Size of project team (i.e. multiple data creators)                         1          7%
Volume of data associated with project                                     3         21%
Complexity of data associated with project (e.g. multiple formats)         1          7%
Absence of university data management policy                               0          0%
Other                                                                      0          0%
Figure 6: Chart showing responses to question 8b on reasons for not developing
DMPs.
4.3 Responsibility for data management
As shown in Figure 7, close to two-thirds of the responses (63%) indicate that researchers
manage the data themselves. Departmental IT Officers and Central ICT together account for only
16% of the responses, and the Research Project Manager, Research Assistant, Research
Technician, other designated person in the research group, local data centre and international
data centre / data archive options combined account for just under 15%.
Figure 6 chart values: Other 3%; Not required / appropriate to field of research or research
group 6%; Not required by project funder 9%; Lack of training / expertise within research
group 11%; Lack of local support / guidance (e.g. Central Library, ICT) 11%; Don't know 14%;
Time and effort required 23%; Absence of university data management policy 23%
Figure 7: Responses to question 11 “Who, if anyone, is responsible for managing your
electronic research data?”
4.4 Experiences with loss of research data
The study found that more than half of the respondents have lost research data that was not
backed up, as shown in Figure 8.
Figure 7 chart values: Research Project Manager 2%; Research Assistant 2%; Research
Technician 2%; Local Data Centre 2%; International data centre / data archive 2%; Other
designated person in Research Group 4%; PhD Student 7%; Central ICT 7%; Departmental IT
Officer 9%; Myself 63%
Figure 8: Question 12: Have you ever lost research data which was not backed up?
The participants who acknowledged having lost data indicated several ways in which this
happened, and several cited multiple ways. As presented in Figure 9, hardware failure accounted
for half of the ways in which data that had not been backed up was lost, followed by software
failure at 36%. Human error or loss and ‘other’ ways of data loss accounted for 7% each.
Figure 8 chart values: No 44%; Yes 56%
Figure 9: Ways in which data loss occurred
4.5 Issues with storage
The vast majority of participants (78%) indicated that they had not experienced problems
storing their research data due to the size of files. This is shown in Figure 10. Asked to give
details, those who had experienced such problems provided the following statements:
“Inadequate hardware storage facilities”, “difficulties in opening big files”, “some file
systems don't allow big files” and “external hard drives being full”.
Figure 9 chart values: Through hardware failure 50%; Through software failure 36%; Through
human error or loss 7%; Other 7%
Figure 10: Question 13: Have you ever experienced any problems storing your research
data due to the size of the files?
4.6 Research Data Backup
The data, as displayed in the chart in Figure 11, shows a worrying trend in research data
backup with 35% of respondents indicating that they back up their data on an ad hoc basis,
while only 12% do it on a daily basis.
Figure 10 chart values: Yes 22%; No 78%
Figure 11: Summary of answers to question 14 “On average, how frequently is your
data backed up?”
4.6.1 Backup media
The media on which the data are backed up are summarised in Figure 12. The three most
popular are hard disk drives of laptops (20%), external hard drives (19%) and hard disk
drives of computers on campus (14%). It is worth noting that there were only 3 responses on
use of shared drives such as those of university servers.
Figure 11 chart values: Daily 12%; Weekly 15%; Monthly 26%; Annually 3%; Ad hoc 35%;
Never 6%; Don't know 3%
Figure 12: Question 4b. Where are they backed up?
4.7 Researchers’ perceptions on sharing research data
As shown in Figure 13, almost 80% of the respondents indicated that they would support the
idea of their university's repository storing any of their research data, either for their
exclusive use or for wider access, if such a service was offered.
Figure 12 chart values: Hard disk drive of instrument/sensor which generates… 1%; Third party
(including commercial data storage) 1%; Slides 1%; On paper 1%; Photograph 2%; Shared
drive/server (e.g. University server) 3%; Don't know 3%; Hard disk drive of computer off
campus 4%; CD/DVD 4%; Email client/server 7%; Web-based service 8%; USB/Flash drive 11%; Hard
disk drive of computer on campus 14%; External hard drive 19%; Hard disk drive of
laptop/netbook 20%
Figure 13: Responses to question 15 “If the service was offered, would you want your
university's repository to store any of your research data, either for your exclusive
use or for wider access?”
However, asked how long they would want the repository to retain any of their data, less than
half would want the repository to retain all of their data in perpetuity. Just over half (54%)
are in favour of all of their data being kept until the end of the project. Table 5 summarises
the responses.
Table 5: Question 16: If yes, how long would you want the repository to retain any of
your research data, including data only accessible by you?
                                          None of my data   Some of my data   Much of my data   All of my data
Not at all 33% 8% 25% 33%
Until the end of the project 8% 38% 54%
For a finite period after end of project 35% 24% 41%
Until I leave the University 13% 20% 13% 53%
In perpetuity 11% 22% 22% 44%
Figure 13 chart values: Yes 79%; No 21%
4.8 Research data ownership
Table 6 summarises respondents’ answers to the question of research data ownership for
their projects. The percentage of those who own all the data is the same as that of those who
own some of the data (44%), while just under 10% said they own none of the data. One
respondent did not know who owned the data.
In most of the projects, respondents share data ownership with other academics or
researchers, followed by funding bodies and then journal publishers. In ten percent of the
projects, ownership is not shared with anyone. This is presented in Figure 14.
Table 6: Question 17. Who owns the research data you hold?
Response                        No. of respondents  Percentage
I own all of the data I hold                    15         44%
I own some of the data I hold                   15         44%
I own none of the data I hold                    3          9%
Don't know                                       1          3%
Figure 14: Question 18. Do you share ownership of any of your research data with
others?
4.9 Research data sharing
As shown in Figure 15, e-mail is the method of sharing research data used most often (35%)
followed by portable storage media (23%) and web-based service (20%). It is interesting to
note that paper is used more often (9%) than shared university drives (5%). Less than 5% say
that they never share research data with anyone.
Figure 14 chart values: No 10%; Yes, with journals/publishers 21%; Yes, with funding
bodies 25%; Yes, with other academics/researchers 44%
Figure 15: Question 19. How do you currently share research data with colleagues?
Different problems are encountered by the researchers when sharing data. Only 29% say they
have not faced problems when sharing data with their colleagues.
Finding suitable shared storage space has been a problem for some. Perhaps this confirms that
there are storage issues, as also indicated by the finding above that shared drives/servers are
used by very few people. Lack of file naming conventions made it difficult to identify files.
Other issues include legal issues surrounding international transfer of data, problems
establishing ownership of data and lack of time to keep all colleagues constantly up to
date. One respondent also cited ‘network problems’ in addition to the given list of answer
options. These are presented in Figure 16.
Figure 15 chart values: Shared computer 2%; I never share data with colleagues 3%; Other 3%;
Shared drive/server (e.g. University server) 5%; On paper 9%; Web-based service (e.g. Google
Docs, Flickr, Box.net, Dropbox, Pando etc.) 20%; Using portable storage (e.g. CDs, DVDs,
external hard drive, memory sticks etc.) 23%; E-mail 35%
Figure 16: Question 20. What problems have you encountered when sharing data
with colleagues?
Question 21 sought to find out whom researchers would want to be allowed access to their
research data. By mistake, an erroneous entry, “My colleagues”, was added to the list of
response options, which resulted in an erroneous column (the last column of Table 7).
Figure 16 chart values: Other 2%; Lack of version control caused confusion 8%; Problems
establishing ownership of data 8%; Legal issues arising from international transfer of
data 10%; Time consuming to keep all colleagues constantly up to date 12%; Lack of file naming
conventions made it difficult to identify files 14%; Finding suitable shared storage
space 16%; I have not encountered problems 29%
Table 7: Question 21. Apart from yourself, who would you want to be allowed access to
your research data?
                                    None of  Some of  Much of   All of          My
                                    my data  my data  my data  my data  colleagues
My colleagues                            7%      33%      22%      26%         11%
My school                                5%      35%      35%      25%          0%
The whole university                     4%      43%      30%      22%          0%
Specified academic communities
  beyond the university                  4%      54%      12%      19%         12%
Anyone (including general public)        8%      54%      17%      17%          4%
The wrong response was removed and the data re-tabulated resulting in Table 8, which
shows that most of the respondents are in favour of sharing some of their data with all the
stakeholders listed.
Table 8: Re-tabulation of Table 7 data
                                    None of  Some of  Much of   All of
                                    my data  my data  my data  my data
My colleagues                            8%      38%      25%      29%
My school                                5%      35%      35%      25%
The whole university                     4%      43%      30%      22%
Specified academic communities
  beyond the university                  4%      61%      13%      22%
Anyone (including general public)        9%      57%      17%      17%
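The re-tabulation amounts to dropping the erroneous “My colleagues” column and recomputing
each row's percentages over the remaining responses. The sketch below illustrates the
arithmetic only. The raw counts are not reported here, so the counts used are hypothetical
values chosen to be consistent with the rounded percentages in Table 7, and the function names
are likewise assumptions.

```python
import math

# Column order of Table 7; the last column is the erroneous entry.
COLUMNS = ["None of my data", "Some of my data", "Much of my data",
           "All of my data", "My colleagues"]
ERRONEOUS = COLUMNS.index("My colleagues")

# HYPOTHETICAL counts, chosen only to match the rounded percentages in Table 7.
counts = {
    "My colleagues": [2, 9, 6, 7, 3],
    "Specified academic communities beyond the university": [1, 14, 3, 5, 3],
}


def round_half_up(value: float) -> int:
    """Round to the nearest whole number, with halves rounded up."""
    return math.floor(value + 0.5)


def row_percentages(row: list) -> list:
    """Express each count as a whole-number percentage of the row total."""
    total = sum(row)
    return [round_half_up(100 * count / total) for count in row]


def retabulate(row: list, drop_index: int) -> list:
    """Drop one column, then recompute percentages over what remains."""
    kept = [count for index, count in enumerate(row) if index != drop_index]
    return row_percentages(kept)


for label, row in counts.items():
    print(label, row_percentages(row), "->", retabulate(row, ERRONEOUS))
```

Recomputing from counts rather than rescaling the already rounded percentages avoids
compounding rounding error, which matters with so few responses per row.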
Figure 17: Question 22. What factors would prevent your research data from being
made open access to the general public?
4.10 Experience with research council mandating data sharing
The majority of the respondents, over three-quarters, have never applied for funding from a
body that required some degree of open access to be provided for research data. Only 18%
have. Figure 18 summarises the responses.
Figure 17 chart values: None 6%; I do not believe the public would have any use for some of
my data 6%; Data have commercial value 6%; Data contain personal information/have not been
anonymised 7%; Funder restrictions 10%; I do not have the ownership rights to share all of my
data 13%; Protect own ideas or intellectual property 16%; Ethics requirements of
university/funder 17%; Data are not ready to be released/concern unpublished work 20%
Figure 18: Question 23. Have you ever applied for funding from a body that required
some degree of open access to be provided for your research data?
The funders mentioned by the respondents who had applied for funding from bodies mandating
sharing are NUFU, Wellcome Trust, IDRC, NIH, Water Research Commission and DFID.
Q23b: Although only 6 participants indicated that they had at some point applied for research
funding from a body that mandated sharing of research data, a total of 17 respondents,
surprisingly, answered the question that followed, which was intended only for those who had
made applications to such a funder. This should have been a multiple choice question demanding
only one response.
Figure 18 chart values: Yes 18%; No 76%; Don't know 6%
Table 9 indicates that the majority of those who answered (71%) did not have problems meeting
the funders’ requirements. Just under a fifth (18%) said that they had experienced
difficulties in meeting the requirements but had always been able to meet them, and a further
12% said that they needed training and guidance.
Table 9: Question 23b. Have you ever experienced difficulties in meeting
these requirements?
Response                                                           Responses  Percentage
No                                                                        12         71%
Yes, but I have always been able to meet the requirements                  3         18%
Yes, as a result I was unable to obtain funding through this body          0          0%
Yes, and I need training and guidance                                      2         12%
4.11 Free text responses given for question 24 are in Appendix C8.
CHAPTER V: DISCUSSION
5.0 Introduction
The data for this study suggest that many researchers in Malawi, ranging from principal
investigators to research students and from various disciplines, either currently hold research
data or have done so in the past, and it can be expected that they will hold such data again in
the future. It is therefore important to understand the characteristics of the data and how
researchers handle and manage them in order to support the researchers effectively.
5.1 Attributes of the research data
5.1.1 Data categories
A wide array of research data is collected by researchers in academic institutions in Malawi,
the bulk of which is made up of survey and interview data. Observational and experimental data
are also collected often. These findings are similar to those of several DAF studies such as
the University of Northampton study, which found that observational data tend to complement
results from surveys or experiments, thereby playing a supporting and complementary role.
All the disciplines represented in the current study collect survey data more than the other
research data types. However, only Science & Technology and Engineering & Architecture
disciplines collect simulated, derived and reference data. This supports the notion that the
type of data collected is determined by the discipline to which the researchers belong.
The data indicate that the Microsoft Office suite of applications, such as Word and
Excel, is widely used. The proliferation of Microsoft products is beneficial for researchers
because the similarity in formats eases the use, analysis and sharing of data among
researchers. If many researchers are using similar software products for their
data, it is easier to support them than when they employ a wide array of applications.
Digital audio and digital video files were each mentioned by 4 respondents. Typically,
interview data would be collected using voice or video recorders, stored as digital audio or
video files, then transcribed and stored as MS Word document files. Excel or SPSS files
are typically used to keep data collected through questionnaires, and Word is used to write
research reports.
Use of audio tapes was mentioned by a few participants, and it could be that more
researchers are using them. It is of concern that some are still using such outdated technology
because there is a risk that the data could be lost through degradation of or damage to the
tapes. In addition, the quality of sound on tapes deteriorates with time. There is a need for
such data to be converted from their present analogue format to digital format for easy
storage and to preserve their sonic quality.
5.1.2 Databases
The overwhelming majority of researchers in Malawi store data in databases, and the data
show that SPSS is used more than other database software. One challenge that researchers
are likely to face, as observed in the Northampton study, is keeping up to date with SPSS
version upgrades, as this program is updated annually. Their old databases may not open
using updated versions of the software. The solution would be institutional software licences
on campus servers so that, as databases are stored on shared drives on the servers,
researchers would also benefit from the annual upgrades through their institutional
subscriptions.
5.1.3 Image data
Use of image data is common among researchers in Malawi’s HEIs and the image format of
choice is ‘.jpg/.jpeg’. Many digital cameras use this as the default format and a number of
applications, including web browsers, are able to read it. This implies that researchers
are able to share their image data easily and that they are not restricted to highly
specialised and expensive programs in order to use them. The other advantage to researchers is
that ‘.jpeg’ files do not take up huge amounts of hard disk space because of their compressed
format.
5.1.4 Audio data
Audio research data is also commonly used in Malawi, the majority of which is stored using
‘.mp3’ as the primary format. Files in this format are small in size, implying that they do not
take up much space on the computer and they are also more portable than the other formats.
The drawback though, is that they have compatibility issues with some CD players.
5.1.5 Video data
By comparison, there are fewer users of video research data than of audio. Video takes
up more space than any of the other file types, although some research is better suited to
video than to other formats. One reason researchers are not using video as much as other
formats could be the unavailability of adequate storage space, both on their campuses and on
personal computers or laptops. Nearly half of the video data is stored using the ‘.mpeg’
format, which is a compressed file format with the advantage of smaller file sizes than other
video formats. This helps mitigate the noticeable inadequacies in storage space or
infrastructure.
5.1.6 Data volumes
The majority of researchers are keeping data volumes that laptop hard disks are capable of
storing. Where institutions do not provide storage infrastructure, as is presently the case on
most campuses, researchers tend to store their research data on their laptops or
external hard disks. This puts the data at risk because laptops and external hard disks can be
stolen or can crash. In addition, a researcher who leaves the institution takes away
the data that were obtained in the name of that institution.
5.2 Management of research data
5.2.1 Storage media used
Researchers in Malawi are using various types of storage media for their research data. The
most common are those managed by the researchers themselves, such as laptop
hard disk drives, USB/Flash drives and external hard drives. Laptops are convenient for
storing data temporarily but should not be used to store master copies of data. Use of
shared drives such as those on institutional servers is almost non-existent. This is a worrying
state of affairs because the data are at great risk. Laptops, external hard disk drives or
USB/Flash disks can easily be lost or stolen, and they often stop working
unexpectedly, leading to loss of data.
Interestingly, some researchers depend on e-mail to store their data. Perhaps they email the
data as attachments to themselves. Email services limit the size of the files that can be
transmitted per email; Gmail, a common email service platform in many HEIs these days, allows
attachments of up to 25 MB. Since the data held by researchers in the current study far exceed
such limits, and since email servers are not designed for that purpose, email is not
recommended for storing research data.
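The mismatch between e-mail size limits and the data volumes reported in Section 4.2.4 can be
made concrete with a little arithmetic. The sketch below is illustrative only: the
per-message limit is passed as a parameter because it varies by provider and over time,
decimal units (1 GB = 1000 MB) are assumed, and the function name is hypothetical.

```python
import math

MB_PER_GB = 1000  # assumption: decimal units


def messages_needed(volume_gb: float, limit_mb: float) -> int:
    """Smallest number of messages needed to send volume_gb of data
    when each message can carry at most limit_mb."""
    return math.ceil(volume_gb * MB_PER_GB / limit_mb)


# Bounds of the most commonly reported volume range (1 - 50 GB),
# tried against two illustrative per-message limits.
for limit_mb in (10, 25):
    for volume_gb in (1, 50):
        count = messages_needed(volume_gb, limit_mb)
        print(f"{volume_gb} GB at {limit_mb} MB per message: {count} messages")
```

Even at the lower bound of the most common volume range, a researcher would have to split the
data across dozens of messages.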
In most cases, the researchers manage their research data on their own using their own
facilities such as PCs, laptops and portable storage media. On the whole, there is no clear
dedicated responsibility for management of research data. It is of utmost importance that
research data management be given the priority that it deserves by recruiting specialised
personnel to handle it and investing in the necessary storage infrastructure. The staffing
need could be met either by re-skilling the available staff in departments such as the library
or IT or by employing new people altogether. This is what universities wanting to strengthen
research support services in developed countries are doing. Managing research data well helps
to safeguard its integrity over the long term.
5.2.2 Research Data Backup
Backing up research data is an extremely important component of research data management
because it ensures that the data are available long after the projects end. It is also a way of
safeguarding the financial and time investment that was made in obtaining them. It is of great
concern that in Malawi’s academic institutions, there is no comprehensive and systematic
approach to research data backup. Most of the data is backed up on an ad hoc basis. Coupled
with the finding that most of the data is stored and backed up using personal devices, this
means that the data can easily be lost.
The custom of research data backup parallels that of storage: data is mostly backed up
using the researchers’ personal devices such as hard disk drives of laptops, external hard
drives and hard disk drives of computers in their offices on campus. This perhaps explains
why the majority of them back up their data themselves. Just as with storage, backing
up data to institutional server hard disk drives is almost non-existent. The reasons could be
that either such servers are not available or, as other DAF studies found, there is no
awareness of the availability of such infrastructure for storage and backup of
research data.
The data backup practices seem to be lacking in rigour and frequency. Coupled with
the finding that data is backed up as and when the researchers see fit, this is a risk to the
research enterprise: it is potentially wasteful of the funds that financed the research and
puts data reuse at risk, among a host of other problems.
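By way of contrast with ad hoc practice, a systematic backup step can be quite small. The
following is a minimal sketch under stated assumptions, not a recommended institutional
solution: it copies a project folder to a second location and verifies each copy by comparing
SHA-256 checksums. The function names are hypothetical, and any paths passed in are the
user's own.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir: Path, backup_dir: Path) -> list:
    """Copy every file under source_dir into backup_dir, keeping the
    folder structure, and return the files whose copy failed verification."""
    failed = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        target = backup_dir / source_file.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target)  # copies contents and timestamps
        if sha256_of(source_file) != sha256_of(target):
            failed.append(source_file)
    return failed
```

Run on a fixed schedule (daily or weekly) against a destination that is not the researcher's
own laptop, a step of this kind would address both the frequency and the verification gaps
noted above.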
5.2.3 Research data sharing
As research in modern times has become increasingly global and characterised by enormous
volumes of data and collaboration among researchers from a multiplicity of disciplines,
sharing has become crucial. The overwhelming majority of researchers in the current study
support the notion of their university's repository storing their research data, either for their
exclusive use or for wider access, if such a service was offered. Most researchers in Malawi
are largely in favour of sharing their research data. However, when asked how long they
would want a hypothetical repository to hold their research data, only about half of them, on
average, gave their responses and these differed per option.
Surprisingly, contrary to the overwhelming support for data sharing, less than half of those
who responded want all of their data to be retained in perpetuity. Just over half want all of
their data to be held in the hypothetical repository until the end of the project or until they
leave their university. Those who would like some or much of their data to be preserved for
any length of time only just exceed, on average, one fifth of those who responded, which
suggests that most researchers are not really in support of sharing their own research data.
This finding seems to agree with what the literature reports, for example Tenopir et al.
(2011), who reported that sharing practices of researchers were minimal.
Most of the participants do not own all of the data that they hold. This could explain the
nonresponse in that they may not feel free to share that which they do not entirely own
themselves. In most cases, ownership is shared between the researchers and other academics
or researchers, journal publishers and funding bodies.
Researchers in Malawi share their data in various ways. A number of them use e-mail,
presumably for datasets that are small enough to be transmitted by email. Portable storage
media such as CDs, DVDs, external hard drives and memory sticks and web-based services such as
Google Docs, Dropbox and Flickr are also being used to share data. A small proportion share
data on paper. In keeping with what has been reported earlier in this document on
storage and backup practices, use of institutional shared drives to share research data is not
common in Malawi’s academic institutions. It is pleasing to note that those who never share
data with colleagues are in a tiny minority.
5.2.4 Sharing by discipline
When the data are disaggregated by discipline, the picture of sharing perceptions is unclear
because the subgroups become too small to support any firm conclusion. This has
been compounded by the lack of responses for some of the options. However, one
observation that raises curiosity is that all participants from the medical sciences selected
“All my data” for all the timeframes given, implying that those who responded have no
problems sharing any of their data. How this compares with the literature on researchers from
this discipline remains an open question. By contrast, no clear pattern emerges from the data
for the social sciences, sciences and the other disciplines regarding their perceptions
towards sharing. Therefore, the extent to which researchers from the various disciplines want
to share their data remains largely unclear.
However, in the absence of respondents from other institutions and a low response rate, this
interpretation should be treated with caution.
5.2.5 Hindrances to sharing
There are various challenges that prevent researchers in Malawi from effectively sharing their
research data. Researchers do not find suitable shared storage space, a now recurring issue, to
enable them to share. The data also suggest that lack of file naming conventions makes it
difficult to identify files. This is a problem that emanates from the absence of a research
data management policy. As found by Jones & Ross (2009), the prevalence of “idiosyncratic
working practices” leads to differences in naming conventions of data files and is one of the
hindrances to data sharing.
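To make concrete what a file naming convention involves, the sketch below builds and checks
names of the form project_YYYYMMDD_description_vNN.ext. This particular pattern, the function
names and the example project are all hypothetical, chosen purely for illustration. They are
not taken from the survey or from any institutional policy.

```python
import re
from datetime import date

# HYPOTHETICAL convention: project_YYYYMMDD_description_vNN.ext,
# using lower-case words joined by hyphens inside each part.
NAME_PATTERN = re.compile(
    r"[a-z0-9]+(?:-[a-z0-9]+)*"   # project
    r"_\d{8}"                     # collection date, YYYYMMDD
    r"_[a-z0-9]+(?:-[a-z0-9]+)*"  # short description
    r"_v\d{2}"                    # two-digit version number
    r"\.[a-z0-9]+"                # file extension
)


def slug(text: str) -> str:
    """Lower-case the text and join its words with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_name(project: str, collected_on: date, description: str,
               version: int, extension: str) -> str:
    """Assemble a file name that follows the convention above."""
    ext = extension.lower().lstrip(".")
    return (f"{slug(project)}_{collected_on:%Y%m%d}_"
            f"{slug(description)}_v{version:02d}.{ext}")


def is_conforming(name: str) -> bool:
    """Check whether an existing file name follows the convention."""
    return NAME_PATTERN.fullmatch(name) is not None


example = build_name("Maize Yield Survey", date(2014, 6, 3),
                     "Household interviews", 2, ".CSV")
print(example, is_conforming(example))
print("final data (2).xls", is_conforming("final data (2).xls"))
```

Whatever pattern an institution adopts, the point is that colleagues can identify a file's
project, date and version from its name alone, without opening it.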
Legal issues arising from international transfer of data, problems establishing ownership
of data and the time needed to keep all colleagues constantly up to date
have also been named as some of the challenges being faced by these researchers. One
respondent also cited ‘network problems’ in addition to those that were on the list of options.
Most of the respondents are in favour of sharing only some of their data with all the
stakeholders listed such as their colleagues, school, whole university and other external
academic communities. They are less willing to allow access to all of their data and they are
not in favour of their data being made open access to the general public for a number of
reasons. Some feel that the data are not ready to be released because the works have not been
published, others are concerned with ethics requirements of their university or funder and
some would like to protect their ideas or intellectual property. As noted earlier, there are
researchers who cannot share data because they do not have the ownership rights to share it.
Small numbers of researchers mention funder restrictions, data not being anonymised,
commercial interests as well as a belief that the public would not have any use for some of
their data as the reasons for not wanting to share.
5.2.6 Long-term preservation of research data
Preservation of research data ensures its availability over the long-term. The absence of
research data policies, high capacity storage infrastructure and dedicated data management
staff in Malawi’s HEIs makes it almost impossible to preserve electronic research data. This
goes along with the practices and culture of storage and backup that are prevailing at the
moment.
5.2.7 Issues with day to day management of data and support needs
Many of the sub-sections above have alluded to a number of issues that researchers in
Malawi often face regarding the management of their research data. These hinge on storage,
backup and sharing of data, and can be traced to the unavailability of proper storage and
preservation infrastructure.
Linked to the same cause, many researchers lose research data which has not been
backed up. Data collection, entry, analysis and reporting are expensive activities, both in
monetary terms and in the time and energy they take. Some data is collected over long
periods, often involving many enumerators and, therefore, a lot of financial resources. Some
data can be collected only once because it captures a particular snapshot in time, as with
climate change data. Losing such data therefore means a waste of time and money. It is also a
blow to reuse and repurposing of the data.
Technical problems such as hardware and software failure together form a huge part of the
reasons for data loss. This is a serious issue because it happens in an environment where the
majority of researchers are managing the data on their own. This means that many
researchers are helpless with the management and safeguarding of their data.
Researchers also lose data through human error or through theft or loss of their devices. The
picture is grim when one considers that these are the primary devices that most of the
researchers use to store and back up their data.
Asked if they have ever experienced any problems storing research data due to the size of
files, the majority indicated that they have not. This appears contradictory, especially when
one considers the lack of storage infrastructure that has been identified. It could be that
researchers feel self-sufficient in terms of storage of their data because their laptops and
external hard discs, which are their primary storage devices, have high storage capacities,
although they are prone to loss, theft or even wear and tear.
The few researchers who concede to having encountered problems with storing huge files of
their research data give various reasons. These are “inadequate hardware storage facilities”
which is a recurring issue, “difficulties in opening big files”, “some file systems don't allow
big files” and “external hard drives being full”.
5.2.8 Experience with Data Management Plans
Research funders’ policies that require grant applicants to include data management plans in
their application for funding have contributed to increased uptake of research data
management activities in many research and academic institutions in developed countries.
The current study wanted to know if researchers in Malawi have had any experiences with
such funding bodies.
The majority of researchers in Malawi have never applied for research funding from any
funding institution that requires some degree of open access to be provided for research data.
Only a small proportion has.
Most of the researchers who have dealt with these funders report that they have never
experienced any problems in meeting the data open access requirements but a few have
struggled. Although no one has failed to obtain the funding applied for despite experiencing
difficulties, some have expressed the need for support in preparing DMPs.
Funders to which researchers in Malawi have applied for funding include NUFU,
Wellcome Trust, IDRC, NIH, Water Research Commission and DFID. Some of these, such as
Wellcome Trust, NIH and DFID, do require DMPs from grant applicants.
5.2.9 Researchers’ specific concerns
Researchers in Malawi express various concerns over the current management of their
research data or services they would like to see offered by their universities to guarantee
future access to the data. Their responses have been categorised into different themes.
Theme 1: Policy and Storage issues
“Need system of data management and secure server in the department”
“At the moment my storage of research data at my UNI is on a personal basis. I don't know if
there's a data management policy, I will have to check but I think it will nice to have one”
Theme 2: Concerns of data theft
“In most cases there is element of data theft, mainly between IT personnel and the data
'hunters'. Other don’t mind other people's effort and energy engaged in data collection
especially in its raw form. Once published then it can be made public”.
Theme 3: Investment / infrastructure / sharing /access / storage
“My university needs to invest more in ICT access to make it possible to start comfortably
sharing data”
“It would be useful if research data mainly Theses were posted online through institutionally
controlled access for easy access by those interested both nationally and internationally”.
“I would love to have a university central server where I ca deposit my data and be able to
retrieve my data when I am within or outside campus including outside the country”.
Theme 4: Connectivity issues
“The most serious problem is that internet services are poor thereby affecting public access to
some data that we would want to share”.
“Yes, there is a serious challenge with internet connectivity at Polytechnic. Secondly our
publications do not appear in full on our website”.
Theme 5: Training / awareness issues
“Lack of knowledge about data management and there seems to be no-one who minds to
offer some enlightenment on the same”.
Theme 6: Perceived administrative issues
“Management taking too long to put things in place”.
CHAPTER VI: CONCLUSION
6.0 Introduction
This study set out to understand the present research data management practices on the
Malawian higher education scene. This chapter summarises the findings of the study, outlines
its contribution to the body of knowledge in this area and finally makes practical
recommendations to improve research data management in Malawi.
6.1 Summary of findings of the study
Many researchers from various academic disciplines in Malawi collect and hold a wide array
of research data from time to time. These range from surveys and interviews to observational
and experimental data. Some of the types of data collected are more discipline-specific than
others, for example simulated data is more prevalent in the Technology and Engineering
disciplines than others.
Digital audio and video data are also generated. These need high-capacity storage because
they take up more storage space than other data types.
Microsoft Office applications are in wide use, which makes it easier for researchers to
analyse and share data because the file formats are common to all of them. It is also easier
to support these applications than a proliferation of different types of software.
As in other studies, SPSS is the most common software for storing and manipulating
databases in Malawi.
Researchers also collect image data, which are stored mostly using the compressed ‘.mp3’
format. Some video data is also collected, with ‘.mpeg’ as the format of choice.
Most of the data generated falls within the less than 1GB to 1TB range where laptop hard
disk drives, USB/Flash drives and external hard drives form the bulk of the media used to
store them. These are the devices that are also mostly used to back up the data. This practice
of data storage and backup poses a great risk to the data because these storage devices are
prone to loss, theft and abrupt irretrievability problems. Worryingly, the most effective
means of storing and backing up research data, namely shared drives on institutional servers,
is almost non-existent.
Stewardship of research data is primarily done by the researchers themselves, and the
critical function of data backup is largely done on an ad hoc basis rather than through a
more rigorous approach. It appears that researchers are too busy to back up their data more
systematically and frequently.
Although most of the researchers say that they have not experienced problems storing huge
files due to size, perhaps because they have high capacity laptops and external hard drives,
those who have experienced such challenges mention that there is insufficient hardware
storage infrastructure.
Researchers in Malawi overwhelmingly support the idea of research data sharing. However,
most of them are noncommittal when asked about how long they would want their data to be
held in an open access repository and very few would like their data to be held in perpetuity
in such a repository. This could be the case because most of them do not own all of the data
they hold.
A majority of researchers are not willing to share all of their data with research stakeholders.
Some of them want the data to be published before it can be released and others are
concerned with ethical requirements and yet others are protecting their intellectual property.
There is also a feeling that the general public would not have any use for the data and
that there is therefore no need to share them.
Researchers share data using a variety of methods. These include email, optical
discs, external hard drives and memory sticks. Web-based services are also being used.
Sharing is hampered by the unavailability of suitable storage space, a lack of file naming
conventions and internet connectivity limitations.
Most researchers have never applied for research funding from bodies that mandate sharing
of research data. Of those who have, most report having no problems meeting the funder’s
requirements although a few have had challenges and none has lost funding as a result. In
agreement with other studies, some researchers in this study are also asking for training and
support with data management plans.
Lack of experience with the funders in question could explain why the overwhelming
majority of researchers in HEIs in Malawi do not have data management plans for their
research data. Examples of these are data preservation policy, record management policy and
data disposal strategy. However, even most of those who have DMPs do not agree that funder
mandates influenced them to do so.
Funders well known for mandating data sharing with whom researchers in Malawi have
dealt include the Wellcome Trust, NIH and DFID.
A majority of researchers have lost research data that was not backed up, mostly through
technical problems and loss or theft of their laptops and storage devices.
Their expressed concerns regarding management of their research data hinge on policy,
storage, data theft, connectivity, training and administrative issues.
6.2 Contribution
This study has much to contribute to the body of knowledge, research support services in
higher education and to formulation of policy on management of research data.
Studies on how researchers manage their research data have been conducted mostly in
Europe, Australia and North America. One known study in Africa was done in South Africa
in 2010. The present study adds to the existing body of knowledge as it provides a picture of
Malawi’s research data management landscape. Future researchers on the topic of RDM
may find this work a useful point of reference.
The study identifies areas that researchers need support and guidance in. University
management and support services could use this information to design programs and recruit
personnel to provide this support.
Policy makers in the public and academic sectors could also benefit from the findings of
this study. In combination with other studies and policies, they could use it to formulate
RDM policies that address some of the issues raised here, since the absence of policy on the
management of research data seems to contribute to the idiosyncratic way in which the data
is being managed.
6.3 Limitations of the study
Many researchers from the College of Medicine did not respond to the questionnaire because
they wanted to see ethical clearance from Malawi in addition to the one granted by the
University of Sheffield. A lot of research involving funders that require open access to
research data has been going on at this college. Their responses could therefore have given
insight into the extent to which these funders’ policies are shaping management of research
data.
The online survey approach did not help to obtain a deep understanding of how researchers
are caring for their data. It was also difficult to gain a clear picture of the data volumes
and types of data these researchers are producing. It was apparent that some respondents were
either tired of responding to certain questions or were indifferent to them.
In hindsight, the questionnaire focussed on too many areas such as types of research data
collected, their different formats, the types of software used in manipulating or storing them,
storage and data loss issues, the data management aspects, experience with DMPs, funders and
sharing.
6.4 Recommendations for further research
The following recommendations are offered for future studies in the area of RDM in Malawi.
To obtain more qualitative data, face to face interviews with researchers and those supporting
them would be useful.
It is recommended that physical audits of computers used for storing research data be carried
out as part of studies to understand how the data is being managed. This would give a better
picture of the types, formats and volumes of data being generated and how these data are
managed.
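As an illustration only, part of such an audit could be automated. The following is a minimal sketch in Python, not a method used in this study: it assumes that the auditor has permission to read a researcher's data folder, and the function names audit_directory and format_report are hypothetical. It tallies the number of files and the total volume for each file format, identified by file extension.

```python
import os
from collections import defaultdict


def audit_directory(root):
    """Tally file count and total size in bytes per file extension under root."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip files that disappear or cannot be read during the walk.
                continue
            # Extensions are lower-cased so that ".SAV" and ".sav" count together.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            summary[ext]["files"] += 1
            summary[ext]["bytes"] += size
    return dict(summary)


def format_report(summary):
    """Return one report line per extension, largest total volume first."""
    rows = sorted(summary.items(), key=lambda kv: kv[1]["bytes"], reverse=True)
    return [
        f"{ext}\t{stats['files']} files\t{stats['bytes'] / 1e6:.2f} MB"
        for ext, stats in rows
    ]

# Example use (the path is a placeholder):
#   for line in format_report(audit_directory("/path/to/research/data")):
#       print(line)
```

A file extension is only a rough proxy for data type, so a count of this kind would complement, rather than replace, the interviews and physical inspection recommended above.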
Good information could be obtained by focussing on fewer themes at a time to ensure that
respondents are engaged with the survey throughout the whole process.
To ensure maximum participation from all potential participants, resolving issues of local
ethical requirements well ahead of time should be considered.
Word count: 13,416
CHAPTER VII: REFERENCES
Akers, K. G., & Doty, J. (2013). Disciplinary differences in faculty research data
management practices and perspectives. International Journal of Digital Curation, 8(2), 5–
26. doi:10.2218/ijdc.v8i2.263
Alexogiannopoulos, E., McKenney, S. and Pickton, M. (2010) Research Data Management
Project: a DAF investigation of Research Data Management practices at The University of
Northampton. Northampton: University of Northampton. Available from:
http://nectar.northampton.ac.uk/2736
ANDS. (n.d.). Data Reuse. Retrieved May 26, 2014, from
http://ands.org.au/discovery/reuse.html
ANDS. (n.d.). ANDS Guides and Other Resources. Retrieved June 25, 2014, from
http://ands.org.au/guides/index.html
Arts and Humanities Research Council. (2014). Research funding guide (Version 2.6).
Swindon: Arts and Humanities Research Council. Retrieved 03 July 2014 from
http://www.ahrc.ac.uk/SiteCollectionDocuments/Research-Funding-Guide.pdf
Babbie, E. R. (1979). The practice of social research (2nd Ed.). Belmont, California:
Wadsworth Publishing Company.
Barnes, S. (2001). Bristol Online Surveys (BOS) knowledgebase » Survey design. Retrieved
May 6, 2014, from http://www.survey.bris.ac.uk/support/survey-design
Borgman, C. (2012). The conundrum of sharing research data. Journal of the American
Society for Information Science and Technology, 63(6), 1059–1078. doi:10.1002/asi.22634
Bryman, A. (2012). Social Research Methods (4th ed.). Oxford University Press.
Corrall, S., Kennan, M. A., & Afzal, W. (2013). Bibliometrics and Research Data
Management services: emerging trends in library support for research. Library Trends, 61(3),
636–674. doi:10.1353/lib.2013.0005
Digital Curation Centre. (2014). RDM in South Africa - UCT Research Data Management
Policy and Strategy Workshop. Retrieved June 4, 2014, from http://www.dcc.ac.uk/news/uct-
strategy-workshop
Digital Curation Centre (2009) Data Asset Framework: Implementation guide. Retrieved 01
May 2014 from: http://www.data-audit.eu/docs/DAF_Implementation_Guide.pdf
DCC. (n.d.). Digital curation training for all. Retrieved June 25, 2014, from
http://www.dcc.ac.uk/training
DCC. (n.d.). International Journal of Digital Curation. Retrieved June 25, 2014, from
http://www.dcc.ac.uk/resources/curation-journals/ijdc
Doorn, P., Dillo, I., & van Horik, R. (2013). Lies, damned lies and research data: can data
sharing prevent data fraud? International Journal of Digital Curation, 8(1), 229–243.
doi:10.2218/ijdc.v8i1.256
Economic and Social Research Council. (2013). ESRC Research Data Policy September
2010 (Revised March 2013). Swindon: Economic and Social Research Council. Retrieved 03
July 2014 from http://www.esrc.ac.uk/_images/Research_Data_Policy_2010_tcm8-4595.pdf
Engineering and Physical Sciences Research Council (EPSRC). (2014). Principles. Retrieved
June 4, 2014, from
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/principles.aspx
Engineering and Physical Sciences Research Council. (2011). Expectations - EPSRC policy
framework on research data. Expectations - Engineering and Physical Sciences Research
Council. Retrieved 03 July 2014 from
http://www.epsrc.ac.uk/about/standards/researchdata/expectations/
European Science Foundation (2000). Good scientific practice in research and scholarship
(No. 10). European Science Foundation. Retrieved 27 June 2014 from
http://www.esf.org/fileadmin/Public_documents/Publications/ESPB10.pdf
Gibbs, H. (2009). Southampton data survey: our experience and lessons learned. Edinburgh
University. Retrieved from http://www.disc-uk.org/docs/SouthamptonDAF.pdf
Groenewegen, D., & Treloar, A. (2013). Adding value by taking a national and institutional
approach to research data: the ANDS experience. International Journal of Digital Curation,
8(2), 89–98. doi:10.2218/ijdc.v8i2.274
Halbert, M. (2013). The problematic future of research data management: challenges,
opportunities and emerging patterns identified by the DataRes Project. International Journal
of Digital Curation, 8(2), 111–122. doi:10.2218/ijdc.v8i2.276
Human Genome Organisation (HUGO), Ethics Committee. (2002). Statement on human
genomic databases, December 2002. Human Genome Organisation (HUGO). Retrieved 09
July 2014 from http://www.hugo-international.org/img/genomic_2002.pdf
Jerrome, N., & Breeze, J. (2009). Imperial College Data Audit Framework Implementation:
Final Report. Programme/Project deposit. Retrieved June 9, 2014, from
http://repository.jisc.ac.uk/307/
Jones, K. (2011). Assessing institutional data storage and management using the Data Asset
Framework (DAF) methodology at the University of Bath. Reports/Papers. Retrieved June 2,
2014, from http://opus.bath.ac.uk/24960/
Jones, S. (2014). The range and components of RDM infrastructure and services. In Pryor, G.,
Jones, S., & Whyte, A. (Eds.), Delivering Research Data Management services: fundamentals
of good practice. (p. 98). London: Facet Publishing.
Jones, S., Ball, A., & Ekmekcioglu, Ç. (2008). The Data Audit Framework: a first step in the
data management challenge. International Journal of Digital Curation, 3(2), 112–120.
doi:10.2218/ijdc.v3i2.62
Jones, S., & Ross, S. (2009). Data Audit Framework Development (DAFD) Project final
report. Glasgow. Retrieved from http://www.data-audit.eu/docs/DAFDfinalreport.pdf
Jones, S., Pryor, G., & Whyte, A. (2013). ‘How to Develop Research Data Management
Services - a guide for HEIs’. DCC How-to Guides. Edinburgh: Digital Curation Centre.
Available online: http://www.dcc.ac.uk/resources/how-guides
Jones, S., Ross, S., & Ruusalepp, R. (2008). The Data Audit Framework: a toolkit to identify
research assets and improve data management in research led institutions (pp. 213–219).
Presented at the 5th International iPRES Conference: Joined Up and Working: Tools and
Methods for Digital Preservation, London, England. Retrieved from
http://www.bl.uk/ipres2008/ipres2008-proceedings.pdf
Jubb, M. (2007). UK Research Funders’ Policies for the Management of Information
Outputs. International Journal of Digital Curation, 2(1), 29–48. doi:10.2218/ijdc.v2i1.12
Levelt Committee, Noort Committee, & Drenth Committee. (2014). Flawed science: the
fraudulent research practices of social psychologist Diederik Stapel (Stapel Investigation).
Tilburg University/University of Groningen/University of Amsterdam. Retrieved 07 July
2014 from https://www.commissielevelt.nl/wp-
content/uploads_per_blog/commissielevelt/2013/01/finalreportLevelt1.pdf
Lyon, L., Rusbridge, C., Neilson, C., & Whyte, A. (2010). Disciplinary Approaches to
Sharing, Curation, Reuse and Preservation: DCC SCARP Final Report to JISC. Edinburgh:
Digital Curation Centre. Retrieved from
http://www.dcc.ac.uk/sites/default/files/documents/scarp/SCARP-FinalReport-Final-
SENT.pdf
Martinez-Uribe, L. (2009). Using the Data Audit Framework: an Oxford case study. Oxford:
University of Oxford. Retrieved from http://www.disc-uk.org/docs/DAF-Oxford.pdf
Medical Research Council. (2011). MRC policy and guidance on sharing of research data
from population and patient studies (No. v01-00). Medical Research Council. Retrieved 03
July 2014 from http://www.mrc.ac.uk/news-events/publications/mrc-policy-and-guidance-on-
sharing-of-research-data-from-population-and-patient-studies/
Michener, W. K., Allard, S., Budden, A., Cook, R. B., Douglass, K., Frame, M., … Vieglais,
D. A. (2012). Participatory design of DataONE—Enabling cyberinfrastructure for the
biological and environmental sciences. Ecological Informatics, 11, 5–15.
doi:10.1016/j.ecoinf.2011.08.007
National Science Foundation. (2010). Data Management & Sharing Frequently Asked
Questions (FAQs). Retrieved July 9, 2014, from
http://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp
National Science Foundation. (2010). Scientists seeking NSF funding will soon be required to
submit data management plans: government-wide emphasis on community access to data
supports substantive push toward more open sharing of research data (Press Release 10-
077). Arlington, Virginia: National Science Foundation. Retrieved 04 July 2014 from
http://www.nsf.gov/news/news_summ.jsp?cntn_id=116928
National Commission for Science and Technology. (2011). The framework of guidelines for
research in the social sciences and humanities in Malawi; Issued with legislative anchorage to
the Science and Technology Act No.16 of 2003. NCST. Retrieved 25 June, 2014 from
http://www.ncst.mw/wp-content/uploads/2014/03/NATIONAL-FRAMEWORK-OF-
GUIDELINES-IN-SSH.pdf
National Science Foundation. (2010). Award and Administration Guide. Retrieved May 26,
2014, from http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID4
Organisation for Economic Co-operation and Development, The. (2007). OECD principles
and guidelines for access to Research Data from Public Funding. Retrieved from
http://www.oecd.org/sti/sci-tech/38500813.pdf
Parsons, T. (2013). Creating a research data management service. International Journal of
Digital Curation, 8(2), 146–156. doi:10.2218/ijdc.v8i2.279
Pienaar, H. (2010). Survey of research data management practices at the University of
Pretoria, South Africa: October 2009 – March 2010. Retrieved from
http://repository.up.ac.za/handle/2263/15154
Pryor, G., Jones, S., & Whyte, A. (Eds.). (2014). Delivering Research Data Management
services: fundamentals of good practice. London: Facet Publishing.
Pryor, G. (Ed.). (2012). Managing research data. London: Facet Publishing.
Pryor, G., & Donnelly, M. (2009). Skilling up to do data: whose role, whose responsibility,
whose career? International Journal of Digital Curation, 4(2), 158–170.
doi:10.2218/ijdc.v4i2.105
RCUK (2011). RCUK Common Principles on Data Policy - Research Councils UK.
Retrieved May 26, 2014, from http://www.rcuk.ac.uk/research/datapolicy/
Rice, R., & Haywood, J. (2011). Research Data Management initiatives at University of
Edinburgh. International Journal of Digital Curation, 6(2), 232–244.
doi:10.2218/ijdc.v6i2.199
Royal Society, The. (2012). Science as an open enterprise. London: The Royal Society.
Retrieved 20 May 2014 from https://royalsociety.org/~/media/policy/projects/sape/2012-06-
20-saoe-summary.pdf
Tenopir, C., Birch, B., & Allard, S. (2012). Academic libraries and research data services:
Current practices and plans for the future; an ACRL white paper. Chicago: Association of
College and Research Libraries, a division of the American Library Association.
Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A., Wu, L., Read, E., … Frame, M. (2011).
Data sharing by scientists: practices and perceptions. PLoS ONE, 6(6).
doi:10.1371/journal.pone.0021101
Vines, T. H., Albert, A. Y. K., Andrew, R. L., Débarre, F., Bock, D. G., Franklin, M. T., …
Rennison, D. J. (2014). The Availability of Research Data Declines Rapidly with Article
Age. Current Biology, 24(1), 94–97. doi:10.1016/j.cub.2013.11.014
Wagenaar, T. C., & Babbie, E. R. (2004). Guided activities for the practice of social
research. (10th ed.). Belmont, CA: Thomson/Wadsworth.
Ward, C., Freiman, L., Jones, S., Molloy, L., & Snow, K. (2011). Making sense: talking data
management with researchers. International Journal of Digital Curation, 6(2), 265–273.
doi:10.2218/ijdc.v6i2.202
Westra, B. (2013). Data Services for the Sciences: A Needs Assessment. Ariadne, (64).
Retrieved from http://www.ariadne.ac.uk/print/issue64/westra
Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of
psychological research data for reanalysis. The American Psychologist, 61(7), 726–728.
doi:10.1037/0003-066X.61.7.726
Wilson, J. A. J., & Jeffreys, P. (2013). Towards a unified university infrastructure: the data
management roll-out at the University of Oxford. International Journal of Digital Curation,
8(2), 235–246. doi:10.2218/ijdc.v8i2.287
Appendices
This proposal submitted by:
Students
Undergraduate
X Postgraduate (Taught) – PGT
Postgraduate (Research) – PGR
Staff
This proposal is for:
Specific research project
Generic research project
This project is funded by:
Project Title: Research Data Management Practices of Researchers in Malawi:
The Case of Selected Academic Institutions
Start Date: June, 2014 End Date: 01 September, 2014
Principal Investigator (PI):
(student for supervised UG/PGT/PGR research)
Thomas Bello
Email: [email protected]
Supervisor:
(if PI is a student)
Dr Andrew Cox
Email: [email protected]
Indicate if the research: (put an X in front of all that apply)
Involves adults with mental incapacity or mental illness, or those unable to make a personal decision
Involves prisoners or others in custodial care (e.g. young offenders)
Involves children or young people aged under 18 years of age
Involves highly sensitive topics such as ‘race’ or ethnicity; political opinion; religious, spiritual or other beliefs; physical or mental health conditions; sexuality; abuse (child, adult); nudity and the body; criminal activities; political asylum; conflict situations; and personal violence.
Please indicate by inserting an “X” in the left hand box that you are conversant with the University’s policy on the
handling of human participants and their data.
X
We confirm that we have read the current version of the University of Sheffield Ethics Policy Governing
Research Involving Human Participants, Personal Data and Human Tissue, as shown on the University’s
research ethics website at: www.sheffield.ac.uk/ris/other/gov-ethics/ethicspolicy
Part B. Summary of the Research
B1. Briefly summarise the project’s aims and objectives: (This must be in language comprehensible to a layperson and should take no more than one-half page. Provide enough information so that the reviewer can understand the intent of the research)
Summary:
Aim of study
The aim of this study is to assess the status of current Research Data Management (RDM) practices of
researchers in Malawi.
Specific objectives
The objectives of this study are:
To understand the characteristics, types, and volumes of the research data being generated by
researchers in Malawi
To assess the methods that the researchers use to store and backup their research data
To understand the researchers’ perceptions on sharing their data assets
To assess the issues they face in the day-to-day management of this data
To understand the support needs they have in order to effectively manage their data
throughout the research life-cycle
To identify the current practices in data preservation beyond the life of the project
B2. Methodology: Provide a broad overview of the methodology in no more than one-half page.
Overview of Methods:
For this study, a web-based self-completion questionnaire will be used to collect data. This will be
based on a modification of Digital Curation Centre’s Data Asset Framework (DAF) methodology10. It is a
questionnaire that asks respondents about the types, formats and maintenance of their research data
throughout the life of the project and the methods for the data’s preservation beyond the life of the
project.
Analysis of the data will be both quantitative and qualitative. Quantitative analysis will involve
calculating percentages, proportions and means. Some information will also be presented in tables and
different types of charts. The ‘other’ and ‘any comments’ fields of the questionnaire will be used to
obtain qualitative data which will help to gauge the perceptions of the respondents
10 http://www.data-audit.eu/docs/DAF_Implementation_Guide.pdf
If more than one method, e.g., survey, interview, etc. is used, please respond to the questions in Section C for each method. That is, if you are using both a survey and interviews, duplicate the page and answer the questions for each method; you need not duplicate the information, and may simply indicate, “see previous section.”
C1. Briefly describe how each method will be applied
Method (e.g., survey, interview, observation, experiment):
Description – how will you apply the method? The questionnaire will be designed using Google Drive Forms
and disseminated via email.
About your Participants
C2. Who will be potential participants?
The study population will comprise researchers from College of Medicine and Centre for Social Research both
under the University of Malawi and Lilongwe University of Agriculture and Natural Resources (LUANAR).
C3. How will the potential participants be identified and recruited?
A link to the online questionnaire will be e-mailed to key contacts in those institutions who have agreed to respond
to and disseminate the questionnaire to rightful respondents (researchers) once it is ready.
C4. What is the potential for physical and/or psychological harm / distress to participants?
There is no perceived harm to the participants of this study.
C5. Will informed consent be obtained from the participants?
X Yes
No
If Yes, please explain how informed consent will be obtained?
The first page of the questionnaire will contain brief details of the study and what data will be collected and
how confidentially the data will be treated once collected. It will assure them of their freedom to stop
responding at any point. They will be asked to give their consent to responding to the questionnaire, which they
will do by clicking a button to proceed to the pages with the questions.
If No, please explain why you need to do this, and how the participants will be de-briefed?
C6. Will financial / in kind payments (other than reasonable expenses and compensation for time) be offered
to participants? (Indicate how much and on what basis this has been decided)
No
About the Data
C7. What data will be collected? (Tick all that apply)
Print Digital
Participant observation
Audio recording
Video recording
Computer logs
Questionnaires/Surveys X
Other:
Other:
C8. What measures will be put in place to ensure confidentiality of personal data, where appropriate?
No personal information will be obtained.
C9. How/Where will the data be stored?
The data will be stored on the iSchool’s Research Data Server where a 10gig share has been allocated.
C10. Will the data be stored for future re-use? If so, please explain
No.
About the Procedure
C11. Does your research raise any issues of personal safety for you or other researchers involved in the project
(especially if taking place outside working hours or off University premises)? If so, please explain how it will
be managed.
No.
The University of Sheffield. Information School Research Ethics Review Declaration
Title of Research Project: [Research Data Management Practices of Researchers in Malawi: The Case of
Selected Academic Institutions]
We confirm our responsibility to deliver the research project in accordance with the University of
Sheffield’s policies and procedures, which include the University’s ‘Financial Regulations’, ‘Good
Research Practice Standards’ and the ‘Ethics Policy Governing Research Involving Human Participants,
Personal Data and Human Tissue’ (Ethics Policy) and, where externally funded, with the terms and
conditions of the research funder.
In submitting this research ethics application form I am also confirming that:
The form is accurate to the best of our knowledge and belief.
The project will abide by the University’s Ethics Policy.
There is no potential material interest that may, or may appear to, impair the independence
and objectivity of researchers conducting this project.
Subject to the research being approved, we undertake to adhere to the project protocol
without unagreed deviation and to comply with any conditions set out in the letter from the
University ethics reviewers notifying me of this.
We undertake to inform the ethics reviewers of significant changes to the protocol (by
contacting our academic department’s Ethics Coordinator in the first instance).
We are aware of our responsibility to be up to date and comply with the requirements of the
law and relevant guidelines relating to security and confidentiality of personal data, including
the need to register when necessary with the appropriate Data Protection Officer (within the
University the Data Protection Officer is based in CiCS).
We understand that the project, including research records and data, may be subject to
inspection for audit purposes, if required in future.
We understand that personal data about us as researchers in this form will be held by those
involved in the ethics review procedure (e.g. the Ethics Administrator and/or ethics
reviewers) and that this will be managed according to Data Protection Act principles.
If this is an application for a ‘generic’ project all the individual projects that fit under the
generic project are compatible with this application.
We understand that this project cannot be submitted for ethics approval in more than one
department, and that if I wish to appeal against the decision made, this must be done through
the original department.
Name of the Student (if applicable):
Thomas Bello
Name of Principal Investigator (or the Supervisor):
Dr. Andrew Cox
Date: 10 June, 2014
Appendix A2: Ethics Information Consent Form
The University of Sheffield. Information School
Research Data Management Practices of Researchers in Malawi: The Case of Selected Academic Institutions
Researchers
Thomas Bello [email protected]
Purpose of the research
Clearly state the objective of the research in two to three sentences
This study seeks to understand the characteristics, types, and volumes of research data being
generated by researchers in Malawi and assess the methods that the researchers use to maintain
their active data and preserve their legacy data. It also aims to understand the researchers’ perceptions
towards data sharing and the issues they face in the day-to-day management of research data.
Who will be participating?
Indicate who will be participating.
For example “We are inviting adults over 18 who have used Facebook in the past two days.”
We are inviting adults over 18 who are researchers in academic institutions in Malawi.
What will you be asked to do?
Indicate what you will ask them to do.
For example, “we will ask you to complete a brief demographics questionnaire so that we have a
profile of our participant group. Then we will conduct a 15 minute interview about when and how you
use Facebook.”
We will ask you to complete a brief demographics section so that we have a profile of our participant
group. The questions that follow after that are about how you manage your research data. The
questionnaire will take no more than 15 minutes to complete.
What are the potential risks of participating?
This will often be “The risks of participating are the same as those experienced in everyday life.” For
some research, you may need to indicate the risk of anonymity being violated, etc.
The risks of participating are the same as those experienced in everyday life.
What data will we collect?
Be very explicit. Indicate if interviews are audio recorded, if visual observation is used, if participants
are being monitored. In short, state very clearly what is being collected. For example, “We are audio
recording the interviews, and recording all of your actions when you use the computer in a computer
file.”
We will collect data on your research data management practices by asking you to complete an
online Google Forms questionnaire. Once you submit the form, your answers will be anonymously
collected. No personal data will be collected.
What will we do with the data?
Be very explicit. Only state that it is in a locked cabinet if that is indeed correct. If you propose to re-
use the data in future, again be very explicit. If the data is to be destroyed then say so.
For example, “We will be analyzing the data for inclusion in my masters dissertation. After that point,
the data will be destroyed.”
We will be analyzing the data for inclusion in my masters dissertation. After that point, the data will
be destroyed.
Will my participation be confidential?
Explain how confidentiality will be handled. In some cases, e.g., focus groups or any form of group
activity, anonymity cannot be guaranteed. For example, “We are anonymising the data and coding
the computer files with a random number. No identifying information will be retained.” Or
“Participation is in a focus group with six other people. Our data will be anonymised, but we cannot
guarantee that members of the group will not discuss their participation, although we have requested
that they not do so.”
Your participation in this study will be confidential. There will be no way of knowing who has
responded to the questionnaire because the Google Forms questionnaire confidentiality features will
be enabled so that it does not collect any names or email addresses of respondents.
What will happen to the results of the research project?
State what the plans are and how the participant can receive results. For example, “The results of this
study will be included in my master’s dissertation which will be publicly available. Please contact the
School in six months.” Or “The results of this research will be reported in journal papers; a summary of
the results will be posted to [name a website] or by contacting the primary investigator.”
The results of this study will be included in my master’s dissertation which will be publicly available.
Please contact the School in six months.
I confirm that I have read and understand the description of the research project, and that I have had
an opportunity to ask questions about the project.
I understand that my participation is voluntary and that I am free to withdraw at any time without
any negative consequences.
I understand that I may decline to answer any particular question or questions, or to do any of the
activities. If I stop participating at any time, all of my data will be purged.
I understand that my responses will be kept strictly confidential, that my name or identity will not be
linked to any research materials, and that I will not be identified or identifiable in any report or
reports that result from the research.
I give permission for the research team members to have access to my anonymised responses.
I give permission for the research team to re-use my data for future research as specified above.
I agree to take part in the research project as described above.
By clicking "Continue", you agree to participate in the study.
Note: If you have any difficulties with, or wish to voice concern about, any aspect of your participation in this study, please contact Dr. Angela Lin, Research Ethics Coordinator, Information School, The University of Sheffield ([email protected]), or the University Registrar and Secretary.
Appendix A3: Ethics Approval
Appendix B: Copy of questionnaire
Research Data Management Practices of Researchers in
Malawi
About You Please tell us a little about yourself:
1. What best describes your main research role?
o Principal Investigator/Project Manager
o Member of Research Team/Group
o Independent Researcher
o Research Assistant
o Research Support/Non-academic Staff
o Research Student (PhD or MPhil)
o Other:
2. What is your research group or research active area?
3. Which institution do you work at?
Details of your research data For the purpose of this section you should consider the term 'electronic research data' to include all data associated with your projects - this may include numerical data produced by computational experiments, output from experimental equipment, images or audio created from experimental data or data gathered as part of the project or even data collected from surveys relating to the project. 'Research data' do NOT include publications, articles, lectures or presentations. Data that you 'hold' describes any research data that you store anywhere. For example: on a computer, on CDs or on paper.
4. Do you currently hold or have you ever held any research data?
o Yes, I currently hold research data
o Yes, I have held research data in the past
o No
5. Which of the following categories best describe the electronic data created in your field of research? (Please choose all that apply)
o Observational (e.g. video or audio recordings of performances or other primary sources;
photographs of artistic works, historical documents etc. (researcher has a passive role))
o Survey/Interview/Focus Group (e.g. quantitative or qualitative responses to survey or
interview questions; oral history accounts (researcher has an active role))
o Experimental (e.g. spectrometry results)
o Simulated (e.g. data from an engineering model)
o Derived (e.g. data from interrelating survey data)
o Reference (e.g. data cataloguing/describing other datasets)
o Other:
6. What types of research data do you hold (e.g. laboratory notes, image collections, transcripts etc.)? (Please select all that apply)
o Data automatically generated from or by computer programs
o Data collected from sensors or instruments (including questionnaires)
o Laboratory notes
o Scans or x-rays
o Slides
o Patient records
o Physical specimens
o Image/photo collections
o Websites
o MS Word files
o Spreadsheets (e.g. Excel)
o SPSS files
o Digital audio files
o Digital video files
o Video tapes
o Audio tapes
o Fieldwork data
o Text corpus
o Documents or reports
o Transcripts
o Other:
7. What are the principal media on which your research data are stored (not including backups)? (Please select all that apply)
o Hard disk drive of computer on campus
o Hard disk drive of computer off campus
o Hard disk drive of laptop/netbook
o Hard disk drive of instrument/sensor which generates data
o External hard drive
o Shared drive/server (e.g. University server)
o Third party (including commercial data storage)
o Web-based service (e.g. Google Docs, Flickr, Box.net, Dropbox, Pando etc.) (please
specify under 'Other')
o CD/DVD
o USB/Flash drive
o Email client/server
o VHS/Video Cassette
o Cassette Tape (Audio)
o Photograph
o Slides
o Microfiche
o On paper
o Other:
8. What formats/software do you use for your electronic research data? (Please select all that apply)
o Documents
o Spreadsheets
o Databases
o Images
o Audio
o Video
o Websites
o Emails (not including other formats attached to emails)
o Unique program/simulation written specifically for project
o Other:
8a. If you store data in databases, please select the primary program you use:
o MS Access .mdb
o OpenOffice .odb
o SPSS
o Oracle
o MySQL
o NVivo
o Other
8b. If you store data as images, please select the primary format you use:
o .jpg/.jpeg
o .gif
o .tiff
o .bmp
o Adobe .pdf
o Adobe .ai
o .svg
o Other
8c. If you store data as audio, please select the primary format you use:
o .mp3
o .wav
o .wma
o Olympus dictaphones .dss
o Other
8d. If you store data as video, please select the primary format you use:
o .avi
o .mpeg
o .wmv
o Flash .swf
o Quicktime .mov
o Other
8e. If you have selected 'Other' for any of the questions 8a-8f, please give details of the software or formats you use:
9. Please estimate how much electronic research data you currently hold/maintain.
o < 1 GB
o 1 - 50 GB
o 50 - 100 GB
o 100 - 500 GB
o 500 GB - 1 TB
o 1 - 50 TB
o 50 - 100 TB
o > 100 TB
o Don't know
Research data storage
10. Do you currently have a data management plan for your research data (for example, data preservation policy, record management policy, data disposal strategy)?
Yes No Don't know
10a. If yes, what was the main driver for developing your strategy?
o Research requirement to access/analyse/annotate others' data
o Requirement of project funder
o Size of project team (i.e. multiple data creators)
o Volume of data associated with project
o Complexity of data associated with project (e.g. multiple formats)
o Absence of university data management policy
o Other:
10b. If no, please confirm why.
o Not required / appropriate to field of research or research group
o Not required by project funder
o Time and effort required
o Lack of training / expertise within research group
o Lack of local support / guidance (e.g. Central Library, ICT)
o Absence of university data management policy
o Don't know
o Other:
11. Who, if anyone, is responsible for managing your electronic research data? (Please select all that apply)
o Myself (select other options only if they are not you)
o Research Project Manager
o Research Assistant
o Research Technician
o PhD Student
o Other designated person in Research Group
o Departmental IT Officer
o Central ICT
o Local Data Centre
o National data centre / data archive
o International data centre / data archive
o Don't know
o No one
o Other:
11a. If you use any external data centre or archive, please give details:
12. Have you ever lost research data which was not backed up? (Please select all that apply)
o No
o Yes, through hardware failure
o Yes, through software failure
o Yes, through human error or loss
o Other:
13. Have you ever experienced any problems storing your research data due to the size of the files?
Yes No
13a. If yes, please give details:
Research Data Backup
14. On average, how frequently is your data backed up?
o Daily
o Weekly
o Monthly
o Annually
o Ad hoc
o Never
o Don't know
14a. What data tends to be backed up?
o Everything
o Data critical to project
o Data required for publication
o Don't know
14b. Where are they backed up? (Please select all that apply)
o Hard disk drive of computer on campus
o Hard disk drive of computer off campus
o Hard disk drive of laptop/netbook
o Hard disk drive of instrument/sensor which generates data
o External hard drive
o Shared drive/server (e.g. University server)
o Third party (including commercial data storage)
o Web-based service (e.g. Google Docs, Flickr, Box.net, Dropbox, Pando etc.) (please
specify under 'Other')
o CD/DVD
o USB/Flash drive
o Email client/server
o Floppy Disk
o VHS/Video Cassette
o Cassette Tape (Audio)
o Photograph
o Slides
o Microfiche
o On paper
o Don't know
o Other:
15. If the service was offered, would you want your university's repository to store any of your research data, either for your exclusive use or for wider access? The hypothetical repository would offer to store whatever research data researchers volunteer (and possess the appropriate rights to volunteer) with a retention period of their choosing. The files would be stored securely with accessibility limited by default to only the researcher in question. The researcher would have the option of widening access anywhere from specific other users to full public open access. The repository would, therefore, provide separate, voluntary facilities for: long-term storage, backups, sharing of data for collaboration purposes with colleagues, and open access. The repository would offer facilities aimed at meeting stricter requirements now made by many funding bodies.
o Yes
o No
16. If yes, how long would you want the repository to retain any of your research data, including data only accessible by you?
(For each retention period, select one of: None of my data / Some of my data / Much of my data / All of my data)
o Not at all
o Until the end of the project
o For a finite period after end of project
o Until I leave the University
o In perpetuity
Research data sharing
17. Who owns the research data you hold?
o I own all of the data I hold
o I own some of the data I hold
o I own none of the data I hold
o Don't know
18. Do you share ownership of any of your research data with others? (Please select all that apply)
o No
o Yes, with other academics/researchers
o Yes, with journals/publishers
o Yes, with funding bodies
o Other:
19. How do you currently share research data with colleagues? (Please select all that apply)
o I never share data with colleagues
o E-mail
o Shared computer
o Shared drive/server (e.g. University server)
o Using portable storage (e.g. CDs, DVDs, external hard drive, memory sticks etc.)
o Web-based service (e.g. Google Docs, Flickr, Box.net, Dropbox, Pando etc. (Please
specify under 'Other'))
o On paper
o Other:
20. What problems have you encountered when sharing data with colleagues? (Please select all that apply)
o Finding suitable shared storage space
o Lack of file naming conventions made it difficult to identify files
o Lack of version control caused confusion
o Legal issues arising from international transfer of data
o Problems establishing ownership of data
o Time consuming to keep all colleagues constantly up to date
o I have not encountered problems
o Other:
21. Apart from yourself, who would you want to be allowed access to your research data?
(For each group, select one of: None of my data / Some of my data / Much of my data / All of my data)
o My colleagues
o My colleagues
o My school
o The whole university
o Specified academic communities beyond the university
o Anyone (including general public)
22. What factors would prevent your research data from being made open access to the general public? (Please select all that apply)
o None
o I do not believe the public would have any use for some of my data
o I do not have the ownership rights to share all of my data
o Data have commercial value
o Funder restrictions
o Data are not ready to be released/concern unpublished work
o Protect own ideas or intellectual property
o Data contain personal information/have not been anonymised
o Ethics requirements of university/funder
o Other:
23. Have you ever applied for funding from a body that required some degree of open access to be provided for your research data?
Yes No Don't know
23a. If yes, please state funder and give details:
23b. Have you ever experienced difficulties in meeting these requirements?
o No
o Yes, but I have always been able to meet the requirements
o Yes, as a result I was unable to obtain funding through this body
o Yes, and I need training and guidance
Conclusion
24. Do you have any specific concerns over the current management of your research data or services you would like to see offered by your university to guarantee access to this data in the future?
End of questionnaire Thank you for taking the time to complete this survey. Your contribution is very much appreciated.
Appendix C – Additional Survey Results
Figure C1: Research data categories by discipline
Appendix C1: Formats/software for research data
In terms of formats/software researchers are using for their data, the results show that
spreadsheets and documents are used equally, together accounting for 50% of all the formats
used followed by databases at 15%. Images, emails, audio, websites and video are being used
in low proportions. Figure C2 provides a summary of these formats/software.
[Figure C1 (bar chart): count axis from 0 to 9; disciplines: Agricultural Sciences, Engineering & Architecture, Humanities, Science & Technology, Law, Business & Management Science, Medicine & Health, Social Sciences; data categories: Reference, Derived, Simulated, Experimental, Survey/Interview/Focus Group, Observational]
Figure C2: Responses to question 8 “What formats/software do you use for your
electronic research data?”
Appendix C2: Common database software
29 out of the 34 respondents indicated that they store data in databases. Figure C3 shows that
more than half of these use SPSS and about one quarter use MS Access. NVivo, MySQL and
OpenOffice are used by less than 10% of the respondents each while 10% indicated that they
use ‘other’ database software such as STATA and Microsoft Excel.
Figure C3: Responses to Question 8a “If you store data in databases, please select
the primary program you use:”
Figure C2 data (formats/software, share of responses):
Spreadsheets: 25%
Documents: 25%
Databases: 15%
Images: 9%
Emails (not including other formats attached to emails): 8%
Audio: 5%
Unique program/simulation written specifically for project: 4%
Websites: 4%
Other: 2%
Video: 2%
Appendix C3: Image formats
Use of image data seems to be popular with 22 participants (65%) indicating that they
store data as images. Figure C4 shows that nearly three quarters of these indicated that the
format they use is ‘.jpg/.jpeg’ followed by 14% who indicated ‘Adobe .pdf’ as the format
they use with ‘.tiff’ being used by only 5% of the respondents and 9% saying that they use
‘other’ formats such as ‘post script (ps); and encapsulated post script (eps)’, ‘*.shp; geotiff;
*.ai depending on image types’.
Figure C3 data (primary database program):
SPSS: 52%
MS Access .mdb: 24%
Other: 10%
NVivo: 7%
OpenOffice .odb: 4%
MySQL: 3%
Figure C4: Primary format of images
.jpg/.jpeg: 73%
.tiff: 4%
Adobe .pdf: 14%
Other: 9%
Appendix C4: Audio formats
Of those who indicated that they store audio data, Figure C5 shows that the majority (67%)
use ‘.mp3’ as the primary format and close to a fifth use ‘.wma’ while less than 10% use
‘.wav’. Close to a tenth of them indicated that they use other audio formats.
Appendix C5: Video formats
Of all the respondents, 13 representing 38% reported that they store video data. Figure C6
shows that ‘.mpeg’ is used by approximately half of them. The ‘.avi’, ‘.wmv’ and ‘Flash
.swf’ formats are used by a similar proportion of 15% of the respondents each while less than
a tenth of the respondents primarily use other video formats such as ‘.MP4’.
Figure C5: Primary format of audio
.mp3: 67%
.wma: 17%
Other: 11%
.wav: 5%
Figure C6: Primary format of video
.mpeg: 46%
.avi: 16%
.wmv: 15%
Flash .swf: 15%
Other: 8%
Appendix C6: Other applications
Some of the software that participants are using in their different fields is summarised in
Table C1 below.
Table C1: Other file formats/software being used by respondents and their areas of application
Format/Software | Area of application
post script (ps); and encapsulated post script (eps) | Mathematical Sciences
SSH | Genomics
Computer aided design software | Architecture
*.shp; geotiff; *.ai | Geosciences
STATA | Population studies
Appendix C7: Data that is backed up
Data critical to research projects tends to be backed up more (43%) than the rest followed by
every type of data (38%) and then data required for publication at 14%. Figure C7
summarises these findings.
Figure C7: 14a. What data tends to be backed up?
Everything: 38%
Data critical to project: 43%
Data required for publication: 14%
Don't know: 5%
Appendix C8: Researchers’ specific concerns
Researchers in Malawi express various concerns over the current management of their
research data or services they would like to see offered by their universities to guarantee
future access to the data. Their responses have been categorised into different themes.
Theme 1: Policy and Storage issues
“Need system of data management and secure server in the department”
“At the moment my storage of research data at my UNI is on a personal basis. I don't know if
there's a data management policy, I will have to check but I think it will nice to have one”
Theme 2: Concerns of data theft
“In most cases there is element of data theft, mainly between IT personnel and the data
'hunters'. Other don’t mind other people's effort and energy engaged in data collection
especially in its raw form. Once published then it can be made public”.
Theme 3: Investment / infrastructure / sharing /access / storage
“My university needs to invest more in ICT access to make it possible to start comfortably
sharing data”
“It would be useful if research data mainly Theses were posted online through institutionally
controlled access for easy access by those interested both nationally and internationally”.
“I would love to have a university central server where I ca deposit my data and be able to
retrieve my data when I am within or outside campus including outside the country”.
Theme 4: Connectivity issues
“The most serious problem is that internet services are poor thereby affecting public access to
some data that we would want to share”.
“Yes, there is a serious challenge with internet connectivity at Polytechnic. Secondly our
publications do not appear in full on our website”.
Theme 5: Training / awareness issues
“Lack of knowledge about data management and there seems to be no-one who minds to
offer some enlightenment on the same”.
Theme 6: Perceived administrative issues
“Management taking too long to put things in place”.
Appendix D – Letter of introduction
Access to Dissertation
A Dissertation submitted to the University may be held by the Department (or School) within which
the Dissertation was undertaken and made available for borrowing or consultation in accordance
with University Regulations.
Requests for the loan of dissertations may be received from libraries in the UK and overseas. The
Department may also receive requests from other organisations, as well as individuals. The
conservation of the original dissertation is better assured if the Department and/or Library can fulfill
such requests by sending a copy. The Department may also make your dissertation available via its
web pages.
In certain cases where confidentiality of information is concerned, if either the author or the
supervisor so requests, the Department will withhold the dissertation from loan or consultation for
the period specified below. Where no such restriction is in force, the Department may also deposit
the Dissertation in the University of Sheffield Library.
To be completed by the Author – Select (a) or (b) by placing a tick in the appropriate box
If you are willing to give permission for the Information School to make your dissertation available in
these ways, please complete the following:
X (a) Subject to the General Regulation on Intellectual Property, I, the author, agree to this dissertation being
made immediately available through the Department and/or University Library for consultation, and for
the Department and/or Library to reproduce this dissertation in whole or part in order to supply single
copies for the purpose of research or private study
(b) Subject to the General Regulation on Intellectual Property, I, the author, request that this dissertation be
withheld from loan, consultation or reproduction for a period of [ ] years from the date of its
submission. Subsequent to this period, I agree to this dissertation being made available through the
Department and/or University Library for consultation, and for the Department and/or Library to
reproduce this dissertation in whole or part in order to supply single copies for the purpose of research
or private study
Name: Thomas Mphatso Bello
Department: Information School
Signed: Thomas Mphatso Bello Date 27 August 2014
To be completed by the Supervisor – Select (a) or (b) by placing a tick in the appropriate box
(a) I, the supervisor, agree to this dissertation being made immediately available through the Department
and/or University Library for loan or consultation, subject to any special restrictions (*) agreed with
external organisations as part of a collaborative project.
*Special
restrictions
(b) I, the supervisor, request that this dissertation be withheld from loan, consultation or reproduction for a
period of [ ] years from the date of its submission. Subsequent to this period, I agree to this
dissertation being made available through the Department and/or University Library for loan or
consultation, subject to any special restrictions (*) agreed with external organisations as part of a
collaborative project
Name
Department
Signed Date
THIS SHEET MUST BE SUBMITTED WITH DISSERTATIONS BY DEPARTMENTAL REQUIREMENTS.