PredCheck: Detecting predatory behaviour in scholarly world
Source
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN
1552-5996
Date Issued
2020-08-01
Author(s)
Bedmutha, Manas Satish
Modi, Kaushal
Patel, Kevin
Jain, Naman
Singh, Mayank
Abstract
High solicitation for publishing a paper in scientific journals has led to the emergence of a large number of open-access predatory publishers. They fail to provide a rigorous peer-review process, thereby diluting the quality of research work, and charge high article processing fees. Identification of such publishers has remained a challenge due to the vast diversity of the scholarly publishing ecosystem. Earlier works utilise only objective features such as metadata. In this work, we aim to explore the possibility of identifying predatory behaviour through text-based features. We propose PredCheck, a four-step classification pipeline. The first classifier identifies the subject of the paper using TF-IDF vectors. Based on the subject of the paper, the Doc2Vec embeddings of the text are computed. These embeddings are then fed into a Naive Bayes classifier that classifies the text as predatory or non-predatory. Our pipeline gives a macro accuracy of 95% and an F1-score of 0.89.
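The subject-routed structure described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the class names, the nearest-centroid subject classifier, and the toy texts are all assumptions made for illustration, and the second stage uses a multinomial Naive Bayes over raw word counts as a stand-in for Doc2Vec embeddings followed by Naive Bayes. It shows only the routing idea: predict the subject first, then apply a subject-specific predatory/non-predatory classifier.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class TfidfCentroidClassifier:
    """Stage 1 (assumed design): subject = nearest TF-IDF centroid by cosine."""

    def fit(self, texts, labels):
        docs = [Counter(tokenize(t)) for t in texts]
        n = len(docs)
        df = Counter(w for d in docs for w in d)  # document frequency
        self.idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
        self.centroids = defaultdict(Counter)
        for d, y in zip(docs, labels):
            for w, v in self._vec(d).items():
                self.centroids[y][w] += v
        return self

    def _vec(self, counts):
        # L2-normalised TF-IDF vector; unseen words are ignored.
        v = {w: c * self.idf[w] for w, c in counts.items() if w in self.idf}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        return {w: x / norm for w, x in v.items()}

    def predict(self, text):
        v = self._vec(Counter(tokenize(text)))

        def cosine(centroid):
            norm = math.sqrt(sum(x * x for x in centroid.values())) or 1.0
            return sum(x * centroid.get(w, 0.0) for w, x in v.items()) / norm

        return max(self.centroids, key=lambda y: cosine(self.centroids[y]))


class WordCountNaiveBayes:
    """Stage 2 stand-in: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for t, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(t))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())

        def log_posterior(y):
            total = sum(self.word_counts[y].values()) + len(self.vocab)
            lp = math.log(self.class_counts[y] / n_docs)
            for w in tokenize(text):
                if w in self.vocab:
                    lp += math.log((self.word_counts[y][w] + 1) / total)
            return lp

        return max(self.class_counts, key=log_posterior)


class SubjectRoutedPipeline:
    """Predict the subject, then use that subject's own label classifier."""

    def fit(self, texts, subjects, labels):
        self.subject_clf = TfidfCentroidClassifier().fit(texts, subjects)
        grouped = defaultdict(lambda: ([], []))
        for t, s, y in zip(texts, subjects, labels):
            grouped[s][0].append(t)
            grouped[s][1].append(y)
        self.label_clf = {
            s: WordCountNaiveBayes().fit(ts, ys) for s, (ts, ys) in grouped.items()
        }
        return self

    def predict(self, text):
        subject = self.subject_clf.predict(text)
        return subject, self.label_clf[subject].predict(text)


# Toy data, invented purely for illustration.
texts = [
    "rigorous randomized clinical trial of patient outcomes",
    "miracle cure for patient guaranteed fast publication",
    "convergence proof for neural network optimization",
    "amazing neural network guaranteed fast acceptance",
]
subjects = ["medicine", "medicine", "cs", "cs"]
labels = ["non-predatory", "predatory", "non-predatory", "predatory"]

pipeline = SubjectRoutedPipeline().fit(texts, subjects, labels)
print(pipeline.predict("neural network convergence proof"))
```

Training one label classifier per subject keeps subject-specific vocabulary from being mistaken for a predatory signal, which is one plausible motivation for classifying the subject first.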
Subjects
Classification | Open access journals | Predatory journals
