Tracking group identity through natural language within groups

LIWC Research Series:

Millions of people participate in identity-based social media communities (e.g., Reddit’s political communities). The rich language and behavioral data available from these groups make them a treasure trove for studying naturally occurring group dynamics. But how can people’s group identities be captured from unstructured social media data? The current report presents evidence that people’s group identities leave traces in the language they use. Specifically, across diverse groups, the language of people with strong identities was marked by (a) a greater focus on affiliation and (b) less uncertainty or questioning. Using these two language markers, the current work formulated a language-based metric of group identity strength that can track identity processes in large online communities.
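To make the idea of a two-marker metric concrete, the following is a minimal sketch, not the formula from the paper. It assumes that each marker can be approximated as the share of words in a text that fall into a category, and that the two shares are combined by simple subtraction. The word lists and the names `AFFILIATION_WORDS`, `UNCERTAINTY_WORDS`, `marker_rates`, and `identity_strength` are hypothetical stand-ins chosen for illustration; they are not the LIWC dictionaries or the authors' implementation.

```python
import re

# Hypothetical toy word lists for illustration only; these are NOT the LIWC
# dictionaries, which are far larger and empirically validated.
AFFILIATION_WORDS = {"we", "us", "our", "ours", "together", "community", "ally", "friend", "friends"}
UNCERTAINTY_WORDS = {"maybe", "perhaps", "possibly", "might", "unsure", "guess", "wonder", "seems"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def marker_rates(text):
    """Return (affiliation_rate, uncertainty_rate) as shares of all tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0
    n = len(tokens)
    affiliation = sum(t in AFFILIATION_WORDS for t in tokens) / n
    uncertainty = sum(t in UNCERTAINTY_WORDS for t in tokens) / n
    return affiliation, uncertainty


def identity_strength(text):
    """Assumed scoring rule: affiliation rate minus uncertainty rate.

    This difference is an illustrative assumption, not the published metric.
    """
    affiliation, uncertainty = marker_rates(text)
    return affiliation - uncertainty


strong = identity_strength("We stand together with our community")
weak = identity_strength("Maybe I might leave, I guess")
```

Under these assumptions, a text with more affiliation language than uncertainty language scores above zero, and the reverse scores below zero. Applying anything like this to real community data would call for validated dictionaries and some standardization across users or groups.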


Ashwini Ashokkumar, James W Pennebaker, Tracking group identity through natural language within groups, PNAS Nexus, Volume 1, Issue 2, May 2022, pgac022.