General Research Interests: data science, natural language processing, machine learning, sociology
Dissertation Goals
Committee Members: Dr. Arnold Kim, Dr. Suzanne Sindi, Dr. Erica Rutter
Social media platforms have been documented as a hotbed where problematic ideology propagates and intensifies, resulting in hate speech with malicious intent. More work is needed to fully understand the latent processes that lead to its emergence. Beyond classifying hate speech, it is important to model and analyze these processes to understand them more deeply. To bridge this gap, I seek to expand on existing quantitative metrics and machine learning approaches for classifying hate speech, and to evaluate the long-term dynamical behavior of hate speech and general communication by identifying the mechanisms that propagate such language. I also hope to understand the nature of online disagreement, the structural differences between types of speech, and the spread of misinformation online, with the goal of prevention.
Covert Signaling in Hate Speech
Collaborators: Dr. Arnold Kim & Dr. Paul Smaldino
Covert signaling is a type of message dissemination in which the intended message or signal is accurately received by the intended audience (the ingroup) and obscured from those in the outgroup. Such practices can be beneficial to those whose ingroup identity can carry social consequences, such as White Nationalists or Radicals. We propose to study how covert signaling takes place in political conversation and allows dangerous hate speech to burrow itself into our everyday language patterns.
As a first step in addressing such communications, we evaluate whether hashtags can serve as identity signals whose interpretation differs across populations. We analyze Qualtrics survey data to uncover patterns and correlations among demographics, hashtag presence, and the resulting evaluations.
Factors Predicting Willingness to Share COVID-19 Misinformation
Collaborators: Emilio Lobato, Lace Padilla, & Colin Holbrook
The world is currently experiencing a global pandemic caused by SARS-CoV-2. Scientific and medical information about the virus is being discovered and relayed quickly in order to, among other things, inform the general public and policy makers about how best to respond. This creates an opportunity to study the spread of scientific information and misinformation on social media platforms, which is the purpose of this research. The primary objective is to examine how individual difference variables predict information-sharing behaviors.
We extend this project by studying the retweet networks of viral misinformation tweets about COVID-19 that concern conspiracies or potential treatments and cures.
Discourse Analysis of Pairwise Twitter Hashtags
Collaborators: Alex John Quijano, Ayme Tomson, Arnold Kim, & Suzanne Sindi
Social media has recently served as a platform for discussing political topics, and debate naturally arises. Viral hashtags often emerge as a means to call attention to acts of injustice. In response, those who disagree with the sentiment of such viral tweets often create counter hashtags to express disagreement and call attention to their own stances. Insight into the conversations that happen during these disagreements can be informative about how individuals disagree on issues online. We propose to analyze, compare, and contrast opposing hashtags in two ways: (1) with the Jensen-Shannon divergence and Shannon entropy of their word distributions and (2) with distance as measured by BERT word embeddings.
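The first of these comparisons can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual pipeline: the function names, the whitespace tokenization, and the two toy token lists are assumptions made for the example. It estimates a word distribution for each hashtag's tweets over a shared vocabulary, then computes the Shannon entropy of each and the Jensen-Shannon divergence between them (in bits, so the divergence lies between 0 and 1).

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary):
    """Relative frequency of each vocabulary word in a non-empty token list."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary)
    return [counts[w] / total for w in vocabulary]


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = H(m) - (H(p) + H(q)) / 2, where m is the midpoint of p and q."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2


# Toy token lists standing in for tweets under two opposing hashtags
# (made-up placeholder text, not real data).
tokens_a = "justice now justice for all".split()
tokens_b = "all voices matter now".split()
vocabulary = sorted(set(tokens_a) | set(tokens_b))

p = word_distribution(tokens_a, vocabulary)
q = word_distribution(tokens_b, vocabulary)
divergence = jensen_shannon_divergence(p, q)
print(shannon_entropy(p), shannon_entropy(q), divergence)
```

A divergence near 0 would indicate that the two hashtags draw on nearly the same vocabulary in similar proportions, while a value near 1 would indicate almost disjoint language.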
Effects of misinformation from bots on Twitter during California wildfires
Collaborators: Alex John Quijano & Matthew Mondares
Twitter bots are automated users created to post various information at regular intervals. Past studies have shown that the information they spread is often spam and/or misinformation, such as that promoted by anti-vaxxers. We explored the spread and propagation of misinformation disseminated throughout Twitter by bots during a natural disaster (specifically the 2018 California Camp Fire) through user-user and hashtag co-occurrence network analysis.
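As a rough illustration of the hashtag co-occurrence side of this analysis, the sketch below builds a weighted edge list in which two hashtags are linked whenever they appear in the same tweet. The function names and the three example tweets are hypothetical and only show the shape of the computation; the original study's data and tooling are not specified here.

```python
from collections import Counter
from itertools import combinations


def hashtag_cooccurrence(tweets):
    """Count how often each unordered pair of hashtags shares a tweet.

    `tweets` is an iterable of hashtag lists, one list per tweet.
    Hashtags are lowercased so that case variants are merged.
    """
    edges = Counter()
    for hashtags in tweets:
        unique = sorted({h.lower() for h in hashtags})
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights incident to each hashtag."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Made-up example tweets, each reduced to its list of hashtags.
tweets = [
    ["#CampFire", "#wildfire"],
    ["#campfire", "#wildfire", "#evacuation"],
    ["#evacuation"],
]
edges = hashtag_cooccurrence(tweets)
degree = weighted_degree(edges)
print(edges.most_common())
print(degree.most_common())
```

The same pattern would apply to a user-user network by replacing each tweet's hashtag list with the set of accounts involved in an interaction, assuming such interaction records are available.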
Dynamic Mode Decomposition
Collaborators: Alex John Quijano, Arnold Kim, Ayme Tomson, & Suzanne Sindi
Matrix factorization techniques from numerical linear algebra have proven to be powerful tools for uncovering latent behavior in time-series data. We consider the temporal evolution of words in distinct sentiment categories. We analyze the Google unigram English language corpus with two approaches: Principal Component Analysis (PCA) and Dynamic Mode Decomposition (DMD). We first examine the performance of each method on simulated word frequency data, then move to Google unigram data and characterize differences in the reconstruction power of both methods and the predictive power of DMD.
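For readers unfamiliar with DMD, the following is a minimal sketch of the standard exact DMD algorithm using NumPy. It is not the project's code: the function names, the chosen rank, and the synthetic data (a decaying two-dimensional rotation standing in for simulated time series) are assumptions for illustration. The method fits a linear map between consecutive snapshots, and its eigenvalues and modes can then be used both to reconstruct the observed series and to extrapolate it forward.

```python
import numpy as np


def dmd(X, rank):
    """Exact DMD of a snapshot matrix X (rows: variables, columns: time steps)."""
    X1, X2 = X[:, :-1], X[:, 1:]
    # Truncated SVD of the earlier snapshots.
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]
    # X2 V S^{-1}: dividing by s scales column j by 1 / s[j].
    X2_V_Sinv = (X2 @ Vh.conj().T) / s
    # Reduced linear operator and its eigendecomposition.
    A_tilde = U.conj().T @ X2_V_Sinv
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2_V_Sinv @ W
    # Amplitudes that best express the first snapshot in the DMD modes.
    amplitudes = np.linalg.lstsq(modes, X[:, 0].astype(complex), rcond=None)[0]
    return modes, eigvals, amplitudes


def dmd_predict(modes, eigvals, amplitudes, n_steps):
    """Evaluate x_k = modes @ (eigvals**k * amplitudes) for k = 0..n_steps-1."""
    powers = eigvals[:, None] ** np.arange(n_steps)[None, :]
    return (modes @ (amplitudes[:, None] * powers)).real


# Synthetic stand-in data: a decaying rotation x_{k+1} = A_true @ x_k.
theta = 0.3
A_true = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta), np.cos(theta)]])
snapshots = [np.array([1.0, 0.0])]
for _ in range(19):
    snapshots.append(A_true @ snapshots[-1])
X = np.column_stack(snapshots)  # shape (2, 20)

modes, eigvals, amplitudes = dmd(X, rank=2)
X_hat = dmd_predict(modes, eigvals, amplitudes, n_steps=25)  # 5 steps beyond the data
print(np.abs(eigvals))
```

On real word-frequency data the rank would need to be chosen with care, and reconstruction error could be compared against a PCA reconstruction of the same rank; that comparison is left out of this sketch.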
Cultural Diversity
Collaborators: Alexander Dayer, Yumary Vasquez, & Paul Smaldino
Using agent-based modeling, we study the benefits of diversity in the context of group identity for cultural evolution and for problem solving with mutual goals, in partially versus fully connected populations. We propose that more diverse agents (those with multiple group identities) will excel at problem solving (i.e., converge to optimal solutions, potentially at faster rates), thus contributing positively to cultural diversity.