Abstract

Clickbait posts exploit natural human curiosity by offering incomplete or incorrect information, just enough to pique interest, so that users click through to full posts that typically contain sensationalized or misleading content. Among the various attempts to identify and curb clickbait, it is useful to determine which methods are most successful, since a clickbait-free environment would give users a better browsing experience. We draw inspiration from the Clickbait Challenge of 2017, in which multiple approaches were proposed to detect clickbait using machine learning and deep learning techniques. These approaches use a plethora of features in their models. To the best of our knowledge, no systematic study of these features and their correlations with one another has been conducted. We aim to identify a trade-off between the number of features and model performance, enabling faster processing without a substantial loss in performance. With this knowledge, a faster and more accurate clickbait detection model could potentially be deployed to improve the online user experience in real time. It could also open up further research opportunities in the clickbait detection domain.