MINARET: A Recommendation Framework for Scientific Reviewers
Mohamed R. Moawad, Mohamed Maher, Ahmed Awad, Sherif Sakr
University of Tartu, Estonia
{mohamed.ragab,mohamed.abdelrahman,ahmed.awad,sherif.sakr}@ut.ee
ABSTRACT
We are witnessing a continuous growth in the size of scientific communities and the number of scientific publications. This phenomenon requires a continuous effort to ensure the quality of publications and a healthy scientific evaluation process. Peer reviewing is the de facto mechanism to assess the quality of scientific work. For journal editors, managing an efficient and effective manuscript peer review process is not a straightforward task. In particular, a main component of the journal editor’s role is, for each submitted manuscript, to select adequate reviewers who need to be: 1) matching in their research interests with the topic of the submission, 2) fair in their evaluation of the submission, i.e., having no conflict of interest with the authors, and 3) qualified in terms of various aspects including scientific impact, previous review/authorship experience for the journal, quality of the reviews, etc. Thus, manually selecting and assessing adequate reviewers is becoming a tedious and time-consuming task.
We demonstrate MINARET, a recommendation framework for selecting scientific reviewers. The framework facilitates the job of journal editors in conducting an efficient and effective scientific review process. It exploits the valuable information available on modern scholarly websites (e.g., Google Scholar, ACM DL, DBLP, Publons) for identifying candidate reviewers relevant to the topic of the manuscript, filtering them (e.g., excluding those with a potential conflict of interest), and ranking them based on several metrics configured by the editor (user). The framework extracts the information required for the recommendation process from the online resources on-the-fly, which ensures that the output recommendations are dynamic and based on up-to-date information.
1 INTRODUCTION
The world is witnessing a continuous growth in the size of scientific communities and the number of scientific publications. At the current rates, it is expected that the global scientific output doubles every nine years1. For example, Figure 1 shows the statistics of the popular DBLP indexing service for computer science publications2. In particular, the DBLP library is currently indexing over 3.8M publications. Out of these publications, the number of journal articles published in 2018 is about 120K articles. In 2017, Elsevier journals received 3919 submissions, out of which 530 were accepted, with an acceptance rate of 14%3. This situation raises a crucial need for continuous efforts to ensure and improve the quality of the scientific publication process.

1 http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html
2 source: https://dblp.uni-trier.de/statistics/newrecordsperyear

© 2019 Copyright held by the owner/author(s). Published in Proceedings of the 22nd International Conference on Extending Database Technology (EDBT), March 26-29, 2019, ISBN 978-3-89318-081-3 on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.
In general, peer reviewing is a widely accepted practice to assess the quality of scientific publications [5, 9]. In this process, the selection of appropriate reviewers for evaluating the submitted manuscript is a significant step. For example, selecting reviewers without adequate knowledge of the topic of the submission, or selecting inexperienced reviewers, would lead to poor reviews that harm the quality of the publishing venue, the scientific community, and the authors of the manuscript [3]. In addition, it is crucial to avoid any reviewers with a potential conflict of interest [10]. In practice, assigning reviewers for a conference submission is less challenging than assigning reviewers for a journal submission. In particular, the universe of reviewers is closed: it is limited to the program committee (PC) members, who are commonly selected based on their experience and reputation in the scientific field of the conference. In addition, accepting membership of the PC of a scientific conference indicates an explicit commitment to the assigned review workload of the conference within the defined review deadline. Moreover, conference management systems provide a bidding process where each PC member selects the submissions that he/she would like to review. Thus, with this setup and these conditions, it is possible to automate the paper-reviewer assignment task [2, 3, 8].
The manuscript review process for a journal submission is different. In particular, the universe of reviewers is open, with various aspects of uncertainty. Thus, it depends heavily on the editor’s experience, effort, and professional network to select adequate reviewers for a submitted manuscript. For example, there is no pre-defined agreement or arrangement between the journal and a set of committed reviewers. In particular, manuscript reviewing is totally voluntary work, with the only incentive being the mutual benefit when the volunteering reviewer is an author of another manuscript submission that needs to be reviewed in the same or another journal. In addition, review deadlines are soft constraints that are not obligatory for the reviewer. Thus, in order to achieve an efficient and effective review process, it is the role of the editor to choose reviewers that at least satisfy the following main criteria: 1) have research interests matching the topic of the submitted manuscript, 2) are fair in their evaluation of the manuscript, with no potential conflict of interest, and 3) rank well according to the editor’s preferences on various criteria [4]. In practice, the first point can be managed by the editor’s experience with the scholars of the journal’s scientific community and by browsing the profiles of candidate
3 source: https://journalinsights.elsevier.com/journals/0142-9612/acceptance_rate
Demonstration
Series ISSN: 2367-2005 · 10.5441/002/edbt.2019.50
[Figure: screenshot of the DBLP statistics page “New Records per year”, showing the number of new entries in the database per publication type (journal articles, conference and workshop papers, books and theses, etc.) and year, 1990–2020, with up to roughly 500k new records per year; retrieved 15/10/2018.]
Figure 1: Statistics of DBLP library content
reviewers. However, the second point requires investigating the track records of both the authors and the candidate reviewers to discover any potential conflict of interest (e.g., co-authorship, current or previous shared affiliations, etc.). Manually exploring and investigating this information would be a tedious and time-consuming task for the editors. The third point involves various aspects that need to be considered. For example, inviting a high-profile reviewer who happens to be quite busy will do nothing but delay the review process, as she might not reply to the invitation in a timely manner, simply reject it, or accept the invitation and send the review very late. Such selections may increase the turnaround time for making the decision on the submitted manuscript. Another aspect that can be considered is the history of review activities of the candidate reviewers and the quality of their reviews. Thus, it is crucial that the reviewer selection process strikes a balance between these criteria and aspects.
Recently, we have been witnessing a continuous increase in the number of websites and services that provide comprehensive scholarly information. For example, DBLP provides the list of publications for a given author, Google Scholar provides important metrics of scientific impact (e.g., citation count, H-index, i10-index), and Publons4 provides information about the reviewing activities conducted by a scholar. In this demonstration, we present MINARET, a recommendation framework for choosing scientific reviewers for a given manuscript. The framework facilitates the job of journal editors in conducting an efficient and effective scientific review process by dynamically exploiting and integrating the available information on scholarly websites on-the-fly. Within the search process, keywords representing the submission are semantically expanded to provide a wider range of related reviewers as candidates. Extracted information about the candidate reviewers is used to automatically exclude those with potential conflicts of interest with the authors of the manuscript. Finally, the list of reviewers is ranked based on various criteria including the experience of the reviewer, the recency of their familiarity with the topic of the submission, their likelihood to accept and return the review on time, H-index, etc. The weights of these criteria can be flexibly configured by the users of the framework (editors).
4https://publons.com/
2 REVIEWER RECOMMENDATION
Figure 2 shows the workflow of our recommendation framework. Using the basic information of the submitted manuscript (e.g., keywords, the author list and their current affiliations), the recommendation workflow goes through three main phases: information extraction, filtration, and ranking of the candidate reviewer list.
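The three phases can be sketched as a simple pipeline. The function names and the toy phase implementations below are illustrative stand-ins, not MINARET’s actual API:

```python
def recommend_reviewers(manuscript, extract, filter_candidates, rank):
    """Hypothetical three-phase driver: extraction -> filtration -> ranking."""
    candidates = extract(manuscript)                        # 1. information extraction
    candidates = filter_candidates(manuscript, candidates)  # 2. filtration
    return rank(manuscript, candidates)                     # 3. ranking

# Toy phase implementations, for illustration only:
manuscript = {"keywords": ["Semantic Web"], "authors": ["A. Author"]}
extract = lambda ms: [{"name": "R1", "score": 0.4}, {"name": "R2", "score": 0.9}]
filter_candidates = lambda ms, cs: [c for c in cs if c["score"] >= 0.5]
rank = lambda ms, cs: sorted(cs, key=lambda c: c["score"], reverse=True)

print(recommend_reviewers(manuscript, extract, filter_candidates, rank))
# [{'name': 'R2', 'score': 0.9}]
```

Each phase is described in detail in the following subsections.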
2.1 Information Extraction
The information extraction phase consists of the following main steps:
∙ Verification of authors’ identities: This step is concerned with the disambiguation of authors’ names [1, 6, 7]. For example, in the Far East, many scholars may share one of the popular names5. The identification of the correct author profile is crucial as it influences the accuracy of the collected information. We use various services (e.g., DBLP, Google Scholar, ACM) to gather information about the author list. In case of multiple matches, the user has to manually identify the correct profiles for the author list among the returned matches.
∙ Extracting the track records of the author list: This step focuses on extracting information about the publication list and affiliation history of the author list using multiple services (e.g., DBLP, Google Scholar, ACM, Publons). Extracting the authors’ track records is particularly important for discovering any potential conflict of interest.
∙ Retrieval of candidate reviewers’ profiles: The main driver for the candidate reviewer search is the list of keywords supplied as part of the manuscript details. Usually, this list contains three to five keywords defined by the authors to describe the research topic of their submission. To widen the search space of candidate reviewers, we employ a semantic keyword expansion. For this purpose, as our demonstration focuses on the computer science community, we rely on an ontology of computer science topics6. Each relevant expanded keyword is assigned a similarity score sc ∈ [0, 1] that defines the relevance between the returned keyword from the ontology and the original keyword. For example, if one of the manuscript’s keywords is “RDF”, the expansion module
5 https://dblp.uni-trier.de/pers/hd/z/Zhou:Lei
6 https://cso.kmi.open.ac.uk/downloads
[Figure: workflow diagram — the editor enters paper details and selection criteria; paper keywords are expanded via a topics ontology; author/reviewer and journal/conference data are crawled to verify authors and collect candidate reviewers; candidates are filtered for COIs and ranked by topic-relevance matching and experience.]
Figure 2: MINARET workflow
would return “Semantic Web”, “Linked Open Data”, and “SPARQL” as semantically related keywords among its results. Using the expanded keyword list, we retrieve the scholars who register these keywords as research interests by querying multiple services (e.g., Google Scholar and Publons).
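The expansion step can be sketched as follows, assuming a toy in-memory ontology with invented similarity scores (the real system queries the CSO ontology); original keywords get a score of 1.0, and each related keyword keeps the best score seen:

```python
# Toy ontology fragment: keyword -> {related keyword: similarity score in [0, 1]}.
# Both the entries and the scores are invented for illustration.
ONTOLOGY = {
    "RDF": {"Semantic Web": 0.9, "Linked Open Data": 0.8, "SPARQL": 0.85},
}

def expand_keywords(keywords):
    """Semantically expand manuscript keywords via the ontology."""
    expanded = {kw: 1.0 for kw in keywords}  # original keywords score 1.0
    for kw in keywords:
        for related, score in ONTOLOGY.get(kw, {}).items():
            # keep the highest similarity score observed for each keyword
            expanded[related] = max(expanded.get(related, 0.0), score)
    return expanded

print(expand_keywords(["RDF"]))
# {'RDF': 1.0, 'Semantic Web': 0.9, 'Linked Open Data': 0.8, 'SPARQL': 0.85}
```

The resulting keyword-to-score map is what the retrieval step would then match against reviewers’ declared research interests.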
MINARET is currently implemented to extract information from six main sources: Google Scholar, DBLP, Publons, ACM DL, ORCID, and ResearcherID. However, the framework is flexibly designed to incorporate further information from any additional scholarly resource.
2.2 Filtering
In this phase, the list of candidate reviewers returned from the expanded keyword-based reviewer search is filtered using the following conditions:
∙ Conflict of interest: A COI is determined by checking the extracted profile information of both the author list and the candidate reviewers, based on the existence of a previous co-authorship between the candidate reviewer and one of the authors, or the existence of any shared affiliation at the level of the university or country, as configured by the editor.
∙ Keyword matching score: The editor can specify a threshold on the similarity score between the expanded keywords and those attached to the reviewers’ profiles. Candidate reviewers with a matching score below the defined threshold are filtered out.
∙ A set of expertise constraints defined by the editor: The user/editor can filter out candidate reviewers based on various user-defined criteria (e.g., the range of the number of citations/H-index, the number of previous review activities).
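The three filters above can be sketched as follows; the profile fields and threshold defaults are hypothetical stand-ins for the data MINARET actually extracts:

```python
def has_conflict(candidate, authors, author_affiliations, check_affiliation=True):
    """COI: previous co-authorship with any author, or (if enabled) a shared affiliation."""
    if candidate["coauthors"] & authors:
        return True
    return check_affiliation and bool(candidate["affiliations"] & author_affiliations)

def filter_candidates(candidates, authors, author_affiliations,
                      min_match=0.5, min_citations=0):
    """Drop candidates with a COI, a low keyword-matching score,
    or who fail the editor-defined expertise constraints."""
    return [c for c in candidates
            if not has_conflict(c, authors, author_affiliations)
            and c["match_score"] >= min_match
            and c["citations"] >= min_citations]

candidates = [
    {"name": "R1", "coauthors": {"A. Author"}, "affiliations": set(),
     "match_score": 0.9, "citations": 500},          # COI: co-authorship
    {"name": "R2", "coauthors": set(), "affiliations": {"University of Tartu"},
     "match_score": 0.8, "citations": 300},          # COI: shared affiliation
    {"name": "R3", "coauthors": set(), "affiliations": set(),
     "match_score": 0.7, "citations": 100},          # passes all filters
]
kept = filter_candidates(candidates, {"A. Author"}, {"University of Tartu"})
print([c["name"] for c in kept])  # ['R3']
```

Each condition is independent, so the editor can effectively disable a filter by relaxing its threshold (or turning off the affiliation check).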
2.3 Ranking
The last phase of our workflow is ranking the candidate reviewers. MINARET ranks the list of reviewers by means of a score computed as a weighted sum that fuses the following components:
∙ Topic coverage: Represents the reviewer’s coverage score for the keywords of the submitted manuscript. For example, suppose the paper keywords are “Semantic Web” and “Big Data” and we have two recommended reviewers with fields of interest “Semantic Web, Ontologies, RDF” and “Semantic Web, Big Data”, respectively. MINARET gives the second reviewer a higher rank than the first, because the second reviewer covers more topics/keywords of the paper and is therefore more related to the paper.
∙ Scientific impact: This component is based on the number of citations/H-index of the reviewer, as configured by the user. Clearly, the higher the number of citations/H-index, the higher the rank.
∙ Recency: Reviewers who have recently authored papers on the topic of the reviewed manuscript are ranked higher than others with less recent publications on the topic [5].
∙ Experience with manuscript reviewing: This component is based on the total number of manuscript reviews previously conducted by the candidate reviewer. This information is obtained from the Publons profile of the candidate reviewer.
∙ Familiarity/Activity with the target outlet: The reviewer’s familiarity score is calculated from two sub-components. The first is the number of previous reviews conducted by the candidate reviewer for the target outlet. The second is how many times this reviewer has published papers in this journal.
MINARET allows the user to configure the weights of thedifferent components for computing the final ranking scorefor the candidate reviewers.
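The weighted-sum fusion can be sketched as below; the component names and the default weights are illustrative assumptions (in MINARET the editor configures the weights), and each component score is assumed normalized to [0, 1]:

```python
# Illustrative default weights; not MINARET's actual defaults.
DEFAULT_WEIGHTS = {
    "topic_coverage": 0.30,
    "scientific_impact": 0.20,
    "recency": 0.20,
    "review_experience": 0.15,
    "familiarity": 0.15,
}

def total_score(components, weights=DEFAULT_WEIGHTS):
    """Weighted sum fusing the per-component scores of one candidate."""
    return sum(w * components.get(name, 0.0) for name, w in weights.items())

def rank_reviewers(candidates, weights=DEFAULT_WEIGHTS):
    """Sort candidates by their fused score, best first."""
    return sorted(candidates,
                  key=lambda c: total_score(c["scores"], weights),
                  reverse=True)

candidates = [
    {"name": "R1", "scores": {"topic_coverage": 0.5, "scientific_impact": 0.9}},
    {"name": "R2", "scores": {"topic_coverage": 1.0, "scientific_impact": 0.6}},
]
print([c["name"] for c in rank_reviewers(candidates)])
# ['R2', 'R1']  (0.30*1.0 + 0.20*0.6 = 0.42  >  0.30*0.5 + 0.20*0.9 = 0.33)
```

Passing a different weights dictionary is how the user-configurable weighting described above would be realized in this sketch.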
3 DEMO SCENARIO
MINARET is available both as a Web application7 and as RESTful APIs8. In this demonstration9, we present to the audience the workflow and the phases of the MINARET framework (Figure 2). We start by introducing the challenges we tackle, the main goal, and the functionalities of our framework. Then, we take the audience through the reviewer recommendation process for sample manuscripts. We start by completing the manuscript details form (Figure 3) with the basic information including authors’ names,
7 https://bigdata.cs.ut.ee/minaret/
8 The source code of the MINARET framework is available at https://github.com/DataSystemsGroupUT/Minaret
9 A demonstration screencast is available at https://www.youtube.com/watch?time_continue=7&v=rNHTqdY6GuI
Figure 3: Screenshot of adding paper details page
Figure 4: Verification of authors identities
authors’ current affiliations, submission topics/keywords, and the target journal, in addition to any user-defined filters for the target reviewers (e.g., citation range, H-index range). Next, we show how MINARET checks and verifies authors’ names (Figure 4). After this, we show how MINARET queries the different scholarly sites to extract the required information (Section 2) for the candidate reviewers. Next, we continue with the stages of reviewer filtration and ranking until the final results are returned (Figure 5), where the computed score of each reviewer is shown. By clicking on the total score, the score details for each ranking component are displayed.
While MINARET is designed to tackle the more challenging case of recommending reviewers for journal submissions, it can also be integrated with conference management systems to automate the paper-reviewer assignment. In that case, the list of programme committee members can be used as a further filter, so that only candidate reviewers who belong to the programme committee are retained.
Figure 5: Example recommended reviewers
ACKNOWLEDGMENT
This work is funded by the European Regional Development Funds via the Mobilitas Plus programme (grant MOBTT75).
REFERENCES
[1] Jinseok Kim. 2018. Evaluating author name disambiguation for digital libraries: a case of DBLP. Scientometrics 116, 3 (2018).
[2] Ngai Meng Kou et al. 2015. A Topic-based Reviewer Assignment System. PVLDB 8, 12 (2015).
[3] C. Long, R. C. Wong, Y. Peng, and L. Ye. 2013. On Good and Fair Paper-Reviewer Assignment. In ICDM.
[4] Jennifer Nguyen et al. 2018. A decision support tool using Order Weighted Averaging for conference review assignment. Pattern Recognition Letters 105 (2018).
[5] Hongwei Peng et al. 2017. Time-Aware and Topic-Based Reviewer Assignment. In DASFAA.
[6] Jie Tang. 2016. AMiner: Mining Deep Knowledge from Big Scholar Data. In WWW ’16 Companion.
[7] Jie Tang et al. 2008. ArnetMiner: Extraction and Mining of Academic Social Networks. In KDD.
[8] Kai-Hsiang Yang et al. 2009. A Reviewer Recommendation System Based on Collaborative Intelligence. In WI-IAT.
[9] Shu Zhao et al. 2018. A novel classification method for paper-reviewer recommendation. Scientometrics (2018).
[10] Indrė Žliobaitė and Mikael Fortelius. 2016. Revise rules on conflicts of interest. Nature 539 (2016), 10.