Clickbait Detection: Using NLP and Machine Learning for Identifying Deceptive Content


This presentation describes a clickbait detector built for the Clickbait Challenge using natural language processing and machine learning. Starting from the post text and the title of the linked article, it adds features for superlatives, numbers, word count, word overlap between post and title, and a part-of-speech (POS) ratio. On the larger, unbalanced data set the best model reaches a classification accuracy of 88.2051%, against a ZeroR baseline of 75.8553%.


Uploaded on Sep 30, 2024 | 0 Views



Presentation Transcript


  1. Clickbait Detection using Natural Language Processing and Machine Learning
     By Varun Shah
     Advisors: Kristina Striegnitz and Nick Webb

  2. What is Clickbait? Source: Google Images

  3. The Clickbait Challenge!
     Organizers: M. Potthast, T. Gollub, B. Stein and M. Hagen
     Competition to build a clickbait detector using provided Twitter data.
     Each post is judged by 5 annotators.
     Source: http://www.clickbait-challenge.org/

  4. Data

     Data set   # Posts   # Clickbait   # No-clickbait
     Small         2459           762             1697
     Big          19538          4761            14777

  5. Attributes: Post Text
     [Figure: mock tweet whose post text reads "Some people are such food snobs. http://link.com/"]

  6. Attributes: Title of Linked Article
     Actual title of the article linked in the post text.
     Source: http://proyectoportal.com/

  7. Attributes: Truth Class
     [Figure: bar chart of the number of posts per truth class, No-clickbait vs. Clickbait]

  8. Preliminary Results
     Baseline classifier: ZeroR
     Attributes in model: Post Text and Article Title

     Classifier     Classification Accuracy
     ZeroR          50.0%
     RandomForest   74.9085%*

     *statistically significant at 95%
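As a point of reference for the baseline above: ZeroR is the majority-class classifier, which ignores all attributes and always predicts the most frequent class. The following is a minimal Python sketch of that idea, not the tool used in the talk. The `balanced` sample is an assumption made for illustration; the slides do not say how the data behind the 50.0% figure was sampled.

```python
from collections import Counter

def zero_r_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels)

# Class counts of the small data set, taken from slide 4.
small = ["clickbait"] * 762 + ["no-clickbait"] * 1697

# Hypothetical class-balanced sample (assumption, not stated in the slides).
balanced = ["clickbait"] * 762 + ["no-clickbait"] * 762

print(f"small data set: {zero_r_accuracy(small):.4f}")
print(f"balanced sample: {zero_r_accuracy(balanced):.4f}")
```

A majority-class baseline scores exactly 50% only when both classes are equally frequent, so any classifier has to beat the class distribution itself to count as an improvement.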

  9. Added Features: Use of Superlatives
     Example post: "Is LeBron James' NBA Finals performance the best ever? http://link.com/"
     Does the given post have a superlative?

     Superlative?      Yes     No
     # Clickbait        60    702
     # No-clickbait     87   1610
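The slides do not say how superlatives were detected. One simple possibility, sketched here in Python, is a lookup against a word list. The list `SUPERLATIVES` is hypothetical and deliberately small; a POS tagger that marks superlative adjectives and adverbs would be a more robust alternative.

```python
import re

# Hypothetical, deliberately small word list (assumption, not from the slides).
SUPERLATIVES = {
    "best", "worst", "most", "least", "greatest", "biggest", "largest",
    "smallest", "highest", "lowest", "latest", "funniest", "cutest",
}

def has_superlative(post_text):
    """Return True if any lowercased word of the post is in SUPERLATIVES."""
    words = re.findall(r"[a-z]+", post_text.lower())
    return any(word in SUPERLATIVES for word in words)

print(has_superlative("Is LeBron James' NBA Finals performance the best ever? http://link.com/"))
```

In the counts on the slide, 60 of 762 clickbait posts (about 7.9%) and 87 of 1697 no-clickbait posts (about 5.1%) contain a superlative.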

  10. Added Features: Use of Numbers
      Example post: "5 incredible Italian dishes you haven't tried before. http://link.com/"
      Does the given post have a number?

      Number?           Yes     No
      # Clickbait       191    571
      # No-clickbait    416   1281
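A minimal Python sketch of such a feature, assuming that only digits count as numbers and that digits inside links should be ignored (the slides state neither):

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def has_number(post_text):
    """Return True if the post contains a digit outside of any URL."""
    text_without_links = URL_PATTERN.sub(" ", post_text)
    return re.search(r"\d", text_without_links) is not None

print(has_number("5 incredible Italian dishes you haven't tried before. http://link.com/"))
```

In the counts on the slide, 191 of 762 clickbait posts (about 25.1%) and 416 of 1697 no-clickbait posts (about 24.5%) contain a number, so this feature on its own appears to separate the two classes only weakly.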

  11. Added Features: Number of Words
      Example post: "Man dies when car plunges from parking garage. http://link.com/" (8 words)
      How many words are in the post text?

      Mean number of words
      Overall        12.662
      Clickbait      11.901
      No-clickbait   13.002
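The count of 8 for the example post is consistent with splitting on whitespace and leaving the link out. A minimal Python sketch under that assumption:

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def word_count(post_text):
    """Count whitespace-separated words, ignoring links (assumed tokenization)."""
    return len(URL_PATTERN.sub(" ", post_text).split())

print(word_count("Man dies when car plunges from parking garage. http://link.com/"))  # 8
```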

  12. Added Features: Similarity between Post Text and Title of Linked Article
      [Figure: example post text next to its article title; in the example, # Overlaps = 4]

      Mean number of overlaps
      Overall        3.997
      Clickbait      3.150
      No-clickbait   4.378
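One way to compute such an overlap count is the number of distinct words shared by the post text and the article title. The Python sketch below assumes lowercasing, link removal, and no stop-word filtering; the slides do not specify these details, and the article title in the example is hypothetical.

```python
import re

def word_set(text):
    """Lowercase the text, drop links, and return the set of distinct words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return set(re.findall(r"[a-z0-9']+", text))

def overlap(post_text, article_title):
    """Number of distinct words that occur in both texts."""
    return len(word_set(post_text) & word_set(article_title))

post = "Man dies when car plunges from parking garage. http://link.com/"
title = "Car plunges from parking garage, killing driver"  # hypothetical title
print(overlap(post, title))  # 5: car, plunges, from, parking, garage
```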

  13. Added Features: POS Ratio
      Example POS sequence: "These global warming skeptics have" = Determiner + Adjective + Singular Noun + Plural Noun + 3rd Person Verb
      POS Ratio = #Sequence in Clickbait / #Sequence in All
      In the example, POS Ratio = 0.8698

      Minimum   Maximum   Mean    Std. Dev.
      0         1         0.412   0.326
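The ratio defined on this slide can be sketched in Python as follows. The sketch assumes that a "sequence" is a fixed-length n-gram of POS tags (five tags, as in the example) and that posts arrive already tagged; the tag names and the toy data are hypothetical, and the slides do not say how sequences were extracted or how the ratio was aggregated per post.

```python
from collections import Counter

def tag_ngrams(tags, n):
    """All contiguous tag sequences of length n in one post."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def pos_ratios(tagged_posts, n=5):
    """Map each tag sequence to (#occurrences in clickbait) / (#occurrences in all).

    tagged_posts is an iterable of (tags, is_clickbait) pairs.
    """
    in_clickbait = Counter()
    in_all = Counter()
    for tags, is_clickbait in tagged_posts:
        for sequence in tag_ngrams(tags, n):
            in_all[sequence] += 1
            if is_clickbait:
                in_clickbait[sequence] += 1
    return {seq: in_clickbait[seq] / total for seq, total in in_all.items()}

# Hypothetical toy data using Penn Treebank style tag names.
example = ("DT", "JJ", "NN", "NNS", "VBZ")
toy_posts = [
    (list(example), True),
    (list(example) + ["RB"], True),
    (list(example), False),
    (["NN", "VBZ", "IN", "NN", "VBZ"], False),
]
ratios = pos_ratios(toy_posts)
print(round(ratios[example], 4))  # 2 of the 3 occurrences are in clickbait posts
```

A sequence seen only in clickbait posts gets a ratio of 1 and one seen only in no-clickbait posts gets 0, which matches the minimum and maximum reported on the slide.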

  14. Results
      Model tested on the (unbalanced) big data set
      Note: ZeroR (baseline) = 75.8553%

      Attributes in Model                                        Accuracy
      Post Text + Article Title + #Words + Overlap               82.2864%
      Post Text + Article Title + POS Ratio                      86.6860%
      Post Text + Article Title + #Words + Overlap + POS Ratio   88.2051%
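To put these accuracies in context, the gain over the ZeroR baseline can be worked out directly from the reported figures. The short Python sketch below does this; the error-reduction measure is an added illustration, not something reported in the slides.

```python
BASELINE = 75.8553  # ZeroR accuracy on the big data set, in percent

RESULTS = {
    "Post Text + Article Title + #Words + Overlap": 82.2864,
    "Post Text + Article Title + POS Ratio": 86.6860,
    "Post Text + Article Title + #Words + Overlap + POS Ratio": 88.2051,
}

def gain_over_baseline(accuracy, baseline=BASELINE):
    """Absolute gain in percentage points."""
    return accuracy - baseline

def error_reduction(accuracy, baseline=BASELINE):
    """Fraction of the baseline's errors that the model removes."""
    return (accuracy - baseline) / (100.0 - baseline)

for name, accuracy in RESULTS.items():
    print(f"{name}: +{gain_over_baseline(accuracy):.4f} points, "
          f"{error_reduction(accuracy):.1%} fewer errors")
```

The full model gains about 12.35 percentage points over the baseline, which removes roughly half of the baseline's errors.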

  15. Conclusion and Future Work
      - Improved model to achieve a classification accuracy of 88.2051%
      - Identified features that help detect clickbait
      For the future:
      - Image analysis
      - # Ads on article webpage

  16. Thank you! Questions?
