Tracking group identity through natural language within groups

LIWC Research Series:

To what degree can we determine people's connections with groups through the language they use? In recent years, large archives of behavioral data from social media communities have become available to social scientists, opening the possibility of tracking naturally occurring group identity processes. A feature of most digital groups is that they rely exclusively on the written word. Across 3 studies, we developed and validated a language-based metric of group identity strength and demonstrated its potential in tracking identity processes in online communities. In Studies 1a–1c, 873 people wrote about their connections to various groups (country, college, or religion). A total of 2 language markers of group identity strength were found: high affiliation (more words like we, togetherness) and low cognitive processing or questioning (fewer words like think, unsure). Using these markers, a language-based unquestioning affiliation index was developed and applied to in-class stream-of-consciousness essays of 2,161 college students (Study 2). Greater levels of unquestioning affiliation expressed in language predicted not only self-reported university identity but also students’ likelihood of remaining enrolled in college a year later. In Study 3, the index was applied to naturalistic Reddit conversations of 270,784 people in 2 online communities of supporters of the 2016 presidential candidates—Hillary Clinton and Donald Trump. The index predicted how long people would remain in the group (3a) and revealed temporal shifts mirroring members’ joining and leaving of groups (3b). Together, the studies highlight the promise of a language-based approach for tracking and studying group identity processes in online groups.

Read the paper


Ashwini Ashokkumar, James W Pennebaker, Tracking group identity through natural language within groups, PNAS Nexus, Volume 1, Issue 2, May 2022, pgac022,