BCCN Lecture Series

Propaganda is Already Influencing Large Language Models: Evidence From Training Data, Audits, and Real-World Usage

Online via Webex
Thu 18 December 2025 14:15 - 15:45 (CET)

There has been a flurry of recent concern about who directly controls large language models. We show through six studies that coordinated propaganda from powerful global political institutions already indirectly influences the output of U.S. large language models (LLMs) via their training data, a pattern that is easiest to see in the case of China. First, we demonstrate that material originating from China's Publicity Department appears in large quantities in open-source pre-training datasets. Second, we connect this to U.S.-based commercial LLMs by showing that they have memorized sequences of this propaganda, suggesting that it appears in their training data as well. Third, we use an open-weight LLM to show that additional pre-training on Chinese state propaganda generates more positive answers to prompts about Chinese political institutions and leaders, evidence that propaganda itself, not mere differences in culture and language, can be a causal factor in the behavioral differences we observe across languages. Fourth, we show that prompting commercial models in Chinese generates more positive responses about China's institutions and leaders than the same queries in English. Fifth, we show that this language difference holds in the prompts of actual Chinese-speaking users. Sixth, we extend our findings with a cross-national study indicating that responses in the languages of countries with lower media freedom carry a stronger pro-regime valence than those in the languages of countries with higher media freedom. Finally, we present results demonstrating that the phenomenon described here is broader than propaganda and state media alone. Our findings join ample recent work demonstrating the persuasive power of LLMs. Together, these results suggest the troubling conclusion that states and powerful institutions will have increased strategic incentives to disseminate propaganda in the hopes of poisoning LLM training data.
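To make the second study's approach concrete, the sketch below shows one standard way to probe for memorization: feed a model the opening tokens of a suspect passage, decode greedily, and measure how much of the true continuation it reproduces verbatim. This is a generic illustration only, not the authors' actual audit pipeline; the model name ("gpt2"), the example passage, and the prefix/probe token counts are illustrative placeholders.

```python
# Minimal sketch of a prefix-completion memorization probe (illustrative;
# not the talk's actual methodology). Assumes an open-weight Hugging Face
# causal LM; commercial models would need an equivalent completion API.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder open-weight model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def verbatim_overlap(passage: str, prefix_tokens: int = 50, probe_tokens: int = 50) -> float:
    """Feed the passage's first `prefix_tokens` tokens to the model, greedily
    decode up to `probe_tokens` more, and return the fraction of the true
    continuation that is reproduced token-for-token. High overlap on a
    distinctive passage suggests it was present in the training data."""
    ids = tokenizer(passage, return_tensors="pt").input_ids[0]
    prefix = ids[:prefix_tokens].unsqueeze(0)
    target = ids[prefix_tokens : prefix_tokens + probe_tokens]
    out = model.generate(
        prefix,
        max_new_tokens=len(target),
        do_sample=False,  # greedy: measure what the model reproduces deterministically
        pad_token_id=tokenizer.eos_token_id,
    )
    continuation = out[0][prefix.shape[1] :]
    n = min(len(continuation), len(target))  # generation may stop early at EOS
    matches = (continuation[:n] == target[:n]).sum().item()
    return matches / len(target)

# Usage: score a suspected training-set passage, e.g. a state-media paragraph.
# score = verbatim_overlap(suspect_passage)
```

Greedy decoding is the usual choice here because the probe should measure what the model reproduces deterministically, not what it can sample by chance; near-perfect overlap on distinctive text is the conventional signal of training-set membership.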

Bio:

Eddie Yang is an Assistant Professor of Political Science and a faculty member of the Institute for Physical Artificial Intelligence at Purdue University. He received his Ph.D. in political science from the University of California San Diego. Yang studies the politics of innovation and technology. His research has been published in the Proceedings of the National Academy of Sciences and Political Analysis, among other outlets.

Please register here: fu-berlin.webex.com/webappng/sites/fu-berlin/webinar/webinarSeries/register/4a9b46cc059949ec85b7360d963cca0a