A semi-supervised algorithm for detecting extremism propaganda diffusion on social media

M. Francisco; M. Á. Benítez-Castro; E. Hidalgo-Tenorio; Juan L. Castro

doi:10.1075/ps.21009.fra

ISSN: 1878-9714
E-ISSN: 1878-9722


Author(s): M. Francisco¹, M. Á. Benítez-Castro², E. Hidalgo-Tenorio¹, Juan L. Castro¹

Affiliations: ¹ University of Granada; ² University of Zaragoza
Source: Pragmatics and Society, Volume 13, Issue 3, Jul 2022, pp. 532–554
- Received: 26 Jan 2021
- Accepted: 16 Nov 2021
- Version of Record published: 21 Jul 2022

Abstract

Extremist online networks reportedly use Twitter and other Social Networking Sites (SNS) to issue propaganda and recruitment statements. Traditional machine learning models may struggle in this context because of the peculiarities of microblogging sites and the way these networks interact, both among themselves and with other networks. Moreover, state-of-the-art approaches have relied on non-transparent techniques that cannot be audited: although these techniques are top performers, there is no way to check whether the resulting models are actually fair. In this paper, we present a semi-supervised methodology that uses our Discriminatory Expressions algorithm for feature selection (Francisco and Castro 2020) to detect expressions that are biased towards extremist content. With the help of human experts, the relevant expressions are filtered and then used to retrieve further extremist content, so that the process iteratively yields a set of relevant and accurate expressions. Discriminatory expressions have been shown to produce less complex models that are easier to understand, thereby improving model transparency. We also present close to 70 expressions discovered with this method, together with the validation of the algorithm in several different contexts.
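The iterative loop summarised above (score expressions, let experts filter them, retrieve more content, repeat) can be illustrated with a small Python sketch. It is not the authors' implementation: the actual Discriminatory Expressions measure is defined in Francisco and Castro (2020) and is not reproduced here, so the sketch assumes a simple smoothed document-frequency ratio as a stand-in. The function names (`ngrams`, `biased_expressions`, `expand`), the thresholds, the `accept` callback that simulates the human experts, and the toy phrases are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Callable


def ngrams(text: str, n_max: int = 2) -> set[str]:
    """Return all word n-grams of length 1..n_max in a lowercased, whitespace-tokenised text."""
    tokens = text.lower().split()
    grams: set[str] = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def biased_expressions(pos: list[str], neg: list[str],
                       min_count: int = 2, min_ratio: float = 2.0) -> list[str]:
    """Hypothetical stand-in score (NOT the published Discriminatory Expressions
    measure): keep n-grams found in at least `min_count` positive texts whose
    relative document frequency in `pos` is at least `min_ratio` times their
    add-one-smoothed relative document frequency in `neg`. Highest ratio first."""
    pos_df = Counter(g for t in pos for g in ngrams(t))
    neg_df = Counter(g for t in neg for g in ngrams(t))
    scored = []
    for g, c in pos_df.items():
        if c < min_count:
            continue
        ratio = (c / len(pos)) / ((neg_df[g] + 1) / (len(neg) + 1))
        if ratio >= min_ratio:
            scored.append((ratio, g))
    return [g for _, g in sorted(scored, reverse=True)]


def expand(seed_pos: list[str], neg: list[str], pool: list[str],
           accept: Callable[[str], bool], rounds: int = 3) -> tuple[set[str], list[str]]:
    """Iteratively propose biased expressions, let a reviewer filter them,
    and use the accepted ones to pull matching texts from an unlabelled pool."""
    pos = list(seed_pos)
    accepted: set[str] = set()
    reviewed: set[str] = set()  # every candidate already shown to the reviewer
    for _ in range(rounds):
        candidates = [e for e in biased_expressions(pos, neg) if e not in reviewed]
        reviewed.update(candidates)
        newly = {e for e in candidates if accept(e)}  # stands in for expert review
        if not newly:
            break
        accepted |= newly
        # Retrieve unseen pool texts that contain at least one accepted expression.
        retrieved = [t for t in pool if t not in pos and ngrams(t) & accepted]
        if not retrieved:
            break
        pos.extend(retrieved)
    return accepted, pos


if __name__ == "__main__":
    seeds = ["join the cause now", "join the cause today"]
    neutral = ["watch the game now", "the game today"]
    unlabelled = ["please join the cause", "the weather now"]
    terms, positives = expand(seeds, neutral, unlabelled, accept=lambda e: "cause" in e)
    print(sorted(terms), len(positives))  # prints ['cause', 'the cause'] 3
```

Recording every reviewed candidate, accepted or rejected, keeps the simulated experts from seeing the same expression twice; the loop stops as soon as a round yields no newly accepted expressions or retrieves no new texts.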

References

Alharbi, Ahmed S. M., and Elise de Doncker
2019 ‘Twitter Sentiment Analysis with a Deep Neural Network: An Enhanced Approach Using User Behavioral Information’. Cognitive Systems Research 54: 50–61. https://doi.org/10.1016/j.cogsys.2018.10.001
Al-Salemi, Bassam, Shahrul Azman Mohd Noah, and Mohd Juzaiddin Ab Aziz
2016 ‘RFBoost: An Improved Multi-Label Boosting Algorithm and Its Application to Text Categorisation’. Knowledge-Based Systems 103 (July): 104–17. https://doi.org/10.1016/j.knosys.2016.03.029
Alvari, Hamidreza, Soumajyoti Sarkar, and Paulo Shakarian
2019 ‘Detection of Violent Extremists in Social Media’. ArXiv:1902.01577 [Cs], February. arxiv.org/abs/1902.01577. https://doi.org/10.1109/ICDIS.2019.00014
Ashktorab, Zahra, Christopher Brown, Manojit Nandi, and Aron Culotta
2014 ‘Tweedr: Mining Twitter to Inform Disaster Response’. In ISCRAM.
Benigni, Matthew C., Kenneth Joseph, and Kathleen M. Carley
2017 ‘Online Extremism and the Communities That Sustain It: Detecting the ISIS Supporting Community on Twitter’. PLOS ONE 12 (12): e0181405. https://doi.org/10.1371/journal.pone.0181405
Caropreso, Maria Fernanda, Stan Matwin, and Fabrizio Sebastiani
2001 ‘A Learner-Independent Evaluation of the Usefulness of Statistical Phrases for Automated Text Categorization’, 15.
Cowan, Nelson
2001 ‘The Magical Number 4 in Short-Term Memory: A Reconsideration of Mental Storage Capacity’. The Behavioral and Brain Sciences 24 (1): 87–114; discussion 114–185. https://doi.org/10.1017/S0140525X01003922
Deng, Xuelian, Yuqing Li, Jian Weng, and Jilian Zhang
2019 ‘Feature Selection for Text Classification: A Review’. Multimedia Tools and Applications 78 (3): 3797–3816. https://doi.org/10.1007/s11042-018-6083-5
Ding, Jianli, and Liyang Fu
2018 ‘A Hybrid Feature Selection Algorithm Based on Information Gain and Sequential Forward Floating Search’. Journal of Intelligent Computing 9 (3): 93. https://doi.org/10.6025/jic/2018/9/3/93-101
FAT/ML
n.d. ‘Principles for Accountable Algorithms and a Social Impact Statement for Algorithms’. Accessed 8 January 2019. www.fatml.org/resources/principles-for-accountable-algorithms
Forman, George
2003 ‘An Extensive Empirical Study of Feature Selection Metrics for Text Classification’. Journal of Machine Learning Research – JMLR 3 (March).
Francisco, Manuel, and Juan Luis Castro
2020 ‘Discriminatory Expressions to Produce Interpretable Models in Microblogging Context’. ArXiv:2012.02104 [Cs], November. arxiv.org/abs/2012.02104
Galavotti, Luigi, Fabrizio Sebastiani, and Maria Simi
2000 ‘Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization’. In Research and Advanced Technology for Digital Libraries, edited by José Borbinha and Thomas Baker, 59–68. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. https://doi.org/10.1007/3-540-45268-0_6
Go, Alec, Richa Bhayani, and Lei Huang
2009 ‘Twitter Sentiment Classification Using Distant Supervision’. Processing 150 (January).
Harris, Zellig S.
1954 ‘Distributional Structure’. Word 10 (2–3): 146–62. https://doi.org/10.1080/00437956.1954.11659520
Kotzias, Dimitrios, Misha Denil, Nando de Freitas, and Padhraic Smyth
2015 ‘From Group to Individual Labels Using Deep Features’. In KDD ’15. https://doi.org/10.1145/2783258.2783380
Kubat, Miroslav
2017 An Introduction to Machine Learning. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-63913-0
Largeron, Christine, Christophe Moulin, and Mathias Géry
2011 ‘Entropy Based Feature Selection for Text Categorization’. In ACM Symposium on Applied Computing, edited by William C. Chu, W. Eric Wong, Mathew J. Palakal, and Chih-Cheng Hung, 924–28. TaiChung, Taiwan: ACM. https://doi.org/10.1145/1982185.1982389
Miller, George A.
1956 ‘The Magical Number Seven, plus or Minus Two: Some Limits on Our Capacity for Processing Information’. Psychological Review 63 (2): 81–97. https://doi.org/10.1037/h0043158
Misangyi, Vilmos F., Jeffery A. LePine, James Algina, and Francis Goeddeke Jr.
2016 ‘The Adequacy of Repeated-Measures Regression for Multilevel Research: Comparisons With Repeated-Measures ANOVA, Multivariate Repeated-Measures ANOVA, and Multilevel Modeling Across Various Multilevel Research Designs’. Organizational Research Methods, June. https://doi.org/10.1177/1094428105283190
O’Dair, M., and A. Fry
2019 ‘Beyond the Black Box in Music Streaming: The Impact of Recommendation Systems upon Artists’. Popular Communication. https://doi.org/10.1080/15405702.2019.1627548
Periñán-Pascual, Carlos, and Francisco Arcas-Túnez
2019 ‘Detecting Environmentally-Related Problems on Twitter’. Biosystems Engineering, Intelligent Systems for Environmental Applications, 177 (January): 31–48. https://doi.org/10.1016/j.biosystemseng.2018.10.001
Phillips, Avery
2018 ‘The Moral Dilemma of Algorithmic Censorship’. Becoming Human: Artificial Intelligence Magazine. 27 August 2018. https://becominghuman.ai/the-moral-dilemma-of-algorithmic-censorship-6d7b6faefe7
Rudin, Cynthia
2018 ‘Please Stop Explaining Black Box Models for High Stakes Decisions’. ArXiv:1811.10154 [Cs, Stat], November. arxiv.org/abs/1811.10154
Rutkowski, Leszek, Ryszard Tadeusiewicz, Lotfi A. Zadeh, and Jacek M. Zurada
2008 Artificial Intelligence and Soft Computing – ICAISC 2008: 9th International Conference Zakopane, Poland, June 22–26, 2008, Proceedings. Springer Science & Business Media. https://doi.org/10.1007/978-3-540-69731-2
Senthil, Kumar B., and Varma E. Bhavitha
2016 ‘A Different Type of Feature Selection Methods for Text Categorization on Imbalanced Data’. 5 (9): 7.
Sparck-Jones, Karen
1972 ‘A Statistical Interpretation of Term Specificity and Its Application in Retrieval’. Journal of Documentation 28 (1): 11–21. https://doi.org/10.1108/eb026526
Twitter Inc.
2019 ‘Q1 2019 Earning Report’. https://s22.q4cdn.com/826641620/files/doc_financials/2019/q1/Q1-2019-Slide-Presentation.pdf
Twitter Usage Statistics – Internet Live Stats
2013 ‘Twitter Usage Statistics – Internet Live Stats’. www.internetlivestats.com/twitter-statistics/
Villena-Román, Julio, Sara Lana-Serrano, Eugenio Martínez-Cámara, and José Carlos González-Cristóbal
2013 ‘TASS – Workshop on Sentiment Analysis at SEPLN’. Procesamiento del Lenguaje Natural 50 (0): 37–44.
Wang, Hao, Dogan Can, Abe Kazemzadeh, François Bar, and Shrikanth Narayanan
2012 ‘A System for Real-Time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle’. In Proceedings of the ACL 2012 System Demonstrations, 115–20. ACL ’12. Stroudsburg, Penn.: Association for Computational Linguistics. dl.acm.org/citation.cfm?id=2390470.2390490
Wu, Guohua, Liuyang Wang, Nailiang Zhao, and Hairong Lin
2015 ‘Improved Expected Cross Entropy Method for Text Feature Selection’. In 2015 International Conference on Computer Science and Mechanical Automation (CSMA), 49–54. https://doi.org/10.1109/CSMA.2015.17
Xu, Yan, Gareth Jones, Jintao Li, Bin Wang, and Chunming Sun
2007 ‘A Study on Mutual Information-Based Feature Selection for Text Categorization’. Journal of Computational Information Systems 3 (March).
Xue, Bing, Mengjie Zhang, and Will Browne
2013 ‘Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach’. IEEE Transactions on Cybernetics 43 (December): 1656–71. https://doi.org/10.1109/TSMCB.2012.2227469
Zhao, Z., M. Gao, J. Yu, Y. Song, X. Wang, and M. Zhang
2018 ‘Impact of the Important Users on Social Recommendation System’. Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST 252: 425–34. https://doi.org/10.1007/978-3-030-00916-8_40
Zheng, Hai-Tao, Zhe Wang, Wei Wang, Arun Kumar Sangaiah, Xi Xiao, and Congzhi Zhao
2018 ‘Learning-Based Topic Detection Using Multiple Features’. Concurrency and Computation-Practice & Experience 30 (15): e4444. https://doi.org/10.1002/cpe.4444
Zheng, Zhaohui, Xiaoyun Wu, and Rohini Srihari
2004 ‘Feature Selection for Text Categorization on Imbalanced Data’. ACM SIGKDD Explorations Newsletter 6 (1): 80–89. https://doi.org/10.1145/1007730.1007741

Article Type: Research Article

Keyword(s): discriminatory expressions; extremism; feature selection; interpretability; microblogging; propaganda; social media; text mining