Clickbait Detection: Using NLP and Machine Learning for Identifying Deceptive Content


This presentation describes a clickbait detector built with natural language processing and machine learning on the Twitter data provided by the Clickbait Challenge. Starting from a model that uses only the post text and the title of the linked article, it examines added features such as superlatives, numbers, word count, word overlap between post text and article title, and a part-of-speech (POS) ratio. On the unbalanced big data set, the best feature combination reaches a classification accuracy of 88.2051%, against a ZeroR baseline of 75.8553%. Image analysis and the number of ads on the article webpage are named as future work.

  • Clickbait Detection
  • NLP
  • Machine Learning
  • Deceptive Content
  • Online Safety

Uploaded on Sep 30, 2024


Presentation Transcript


  1. Clickbait Detection using Natural Language Processing and Machine Learning. By Varun Shah. Advisors: Kristina Striegnitz and Nick Webb.

  2. What is Clickbait? Source: Google Images

  3. The Clickbait Challenge! Organizers: M. Potthast, T. Gollub, B. Stein and M. Hagen. Competition to build a clickbait detector using provided Twitter data. Each post is judged by 5 annotators. Source: http://www.clickbait-challenge.org/

  4. Data
     Data set   # Posts   # Clickbait   # No-clickbait
     Small      2459      762           1697
     Big        19538     4761          14777

  5. Attributes: Post Text. Example tweet (SlideHunter, 2 hours ago): "Some people are such food snobs. http://link.com/". Figure labels: Post Text, Tweet.

  6. Attributes: Title of Linked Article. Actual title of the article linked in post text. Source: http://proyectoportal.com/

  7. Attributes: Truth Class. [Bar chart of the two truth classes, No-clickbait and Clickbait; axis from 0 to 1800]

  8. Preliminary Results. Baseline classifier: ZeroR. Attributes in model: Post Text and Article Title.
     Classifier     Classification Accuracy
     ZeroR          50.0%
     RandomForest   74.9085%*
     *statistically significant at 95%
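
ZeroR is the majority-class baseline: it ignores every attribute and always predicts the most frequent training label. The following is a minimal sketch of that idea in plain Python, not the tooling used for the slides; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def zero_r_fit(train_labels):
    """Return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predicted_label, test_labels):
    """Share of test labels equal to the single label ZeroR predicts."""
    hits = sum(1 for label in test_labels if label == predicted_label)
    return hits / len(test_labels)


# Hypothetical toy labels, not the challenge data.
train = ["no-clickbait", "no-clickbait", "no-clickbait", "clickbait"]
test = ["no-clickbait", "clickbait", "no-clickbait", "no-clickbait"]

majority = zero_r_fit(train)
baseline_accuracy = accuracy(majority, test)
```

Any real classifier has to beat this number to show that its features carry information about the class.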

  9. Added Features: Use of Superlatives. Does the given post have a superlative? Example: "Is LeBron James' NBA Finals performance the best ever? http://link.com/"
     Superlative?     Yes   No
     # Clickbait      60    702
     # No-clickbait   87    1610
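
The slide does not say how superlatives were detected. One plausible sketch, assuming the post has already been run through a part-of-speech tagger that emits Penn Treebank tags, is to check for the superlative tags `JJS` (adjective) and `RBS` (adverb). The tags below were assigned by hand for illustration, and `has_superlative` is a hypothetical name.

```python
# Penn Treebank tags for superlative adjectives and adverbs.
SUPERLATIVE_TAGS = {"JJS", "RBS"}


def has_superlative(tagged_tokens):
    """True if any (token, tag) pair carries a superlative tag."""
    return any(tag in SUPERLATIVE_TAGS for _, tag in tagged_tokens)


# Hand-tagged version of the slide's example post (tags are an assumption).
tagged_post = [
    ("Is", "VBZ"), ("LeBron", "NNP"), ("James", "NNP"), ("'", "POS"),
    ("NBA", "NNP"), ("Finals", "NNPS"), ("performance", "NN"),
    ("the", "DT"), ("best", "JJS"), ("ever", "RB"), ("?", "."),
]

post_has_superlative = has_superlative(tagged_post)
```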

  10. Added Features: Use of Numbers. Does the given post have a number? Example: "5 incredible Italian dishes you haven't tried before. http://link.com/"
      Number?          Yes   No
      # Clickbait      191   571
      # No-clickbait   416   1281
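
A minimal sketch of the number feature, assuming that "number" means a run of digits and that digits inside the link should not count. Both choices are assumptions; spelled-out numbers such as "five" are not handled.

```python
import re

_URL = re.compile(r"https?://\S+")
_DIGIT_RUN = re.compile(r"\d+")


def has_number(post_text):
    """True if the post contains a digit run outside of any URL."""
    text_without_links = _URL.sub(" ", post_text)
    return _DIGIT_RUN.search(text_without_links) is not None


example = "5 incredible Italian dishes you haven't tried before. http://link.com/"
example_has_number = has_number(example)
```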

  11. Added Features: Number of Words. How many words are in the post text? Example: "Man dies when car plunges from parking garage. http://link.com/" has 8 words.
      Overall Mean   12.662
      Clickbait      11.901
      No-clickbait   13.002
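
The word count on this slide can be reproduced with whitespace tokenization once the link is dropped. This is a sketch under that assumption; the slides do not state the tokenizer that was actually used.

```python
import re

_URL = re.compile(r"https?://\S+")


def word_count(post_text):
    """Number of whitespace-separated tokens, ignoring URLs."""
    return len(_URL.sub(" ", post_text).split())


example = "Man dies when car plunges from parking garage. http://link.com/"
example_words = word_count(example)
```

With the link excluded, the example post yields the 8 words shown on the slide.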

  12. Added Features: Similarity between Post Text and Title of linked article. Compared texts: Post Text, Article Title. In example, # Overlaps = 4.
      Overall Mean   3.997
      Clickbait      3.150
      No-clickbait   4.378
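
The slide's example texts did not survive extraction, so the exact overlap definition is unknown. A simple reading, taken here as an assumption, is the number of distinct lowercase words that appear in both the post text and the article title. The article title in the example is made up for illustration.

```python
import re

_URL = re.compile(r"https?://\S+")
_WORD = re.compile(r"[a-z0-9']+")


def word_set(text):
    """Distinct lowercase word tokens of a text, with URLs removed."""
    return set(_WORD.findall(_URL.sub(" ", text.lower())))


def overlap_count(post_text, article_title):
    """Number of distinct words shared by the post and the title."""
    return len(word_set(post_text) & word_set(article_title))


post = "Some people are such food snobs. http://link.com/"
title = "Why some people are food snobs about pizza"  # hypothetical title
shared = overlap_count(post, title)
```

Counting distinct words means a repeated word contributes once; counting every repeated occurrence would be an equally defensible variant.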

  13. Added Features: POS Ratio. POS sequence for "These global warming skeptics have": Determiner + Adjective + Singular Noun + Plural Noun + 3rd Person Verb. POS Ratio = #Sequence in Clickbait / #Sequence in All. In example, POS Ratio = 0.8698.
      Minimum     0
      Maximum     1
      Mean        0.412
      Std. Dev.   0.326
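
The ratio on this slide can be sketched as two counters over a tagged training corpus: how often a POS sequence occurs in clickbait posts and how often it occurs in all posts. The code below assumes the tag sequences are already available, and it assigns 0.0 to a sequence never seen in training, which is a choice the slides do not specify. The tag names and the toy corpus are hypothetical.

```python
from collections import Counter


def fit_pos_ratio(tagged_posts):
    """Build a POS-ratio function from (tag_sequence, is_clickbait) pairs."""
    clickbait_counts = Counter()
    all_counts = Counter()
    for tag_sequence, is_clickbait in tagged_posts:
        key = tuple(tag_sequence)
        all_counts[key] += 1
        if is_clickbait:
            clickbait_counts[key] += 1

    def pos_ratio(tag_sequence):
        # #Sequence in Clickbait / #Sequence in All; 0.0 if never seen.
        key = tuple(tag_sequence)
        total = all_counts[key]
        return clickbait_counts[key] / total if total else 0.0

    return pos_ratio


# Hypothetical toy corpus with Penn-style tag names.
seq_a = ("DT", "JJ", "NN", "NNS", "VBZ")
seq_b = ("NN", "VBZ", "WRB", "NN", "VBZ")
corpus = [(seq_a, True), (seq_a, True), (seq_a, True), (seq_a, False), (seq_b, False)]

pos_ratio = fit_pos_ratio(corpus)
ratio_a = pos_ratio(seq_a)
ratio_b = pos_ratio(seq_b)
```

By construction the value lies between 0 and 1, which matches the minimum and maximum reported on the slide.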

  14. Results. Model tested on (unbalanced) big data set. Note: ZeroR (baseline) = 75.8553%
      Attributes in Model                                        Accuracy
      Post Text + Article Title + #Words + Overlap               82.2864%
      Post Text + Article Title + POS Ratio                      86.6860%
      Post Text + Article Title + #Words + Overlap + POS Ratio   88.2051%

  15. Conclusion and Future Work. Improved model to achieve a classification accuracy of 88.2051%. Identified features that help detect clickbait.
      For the future:
      - Image analysis
      - # Ads on article webpage

  16. Thank you! Questions?
