An R Package for the Structural Topic Model

Authors: Molly Roberts, Brandon Stewart and Dustin Tingley

The Structural Topic Model is a general framework for topic modeling with document-level covariate information. The covariates can improve inference and qualitative interpretability, and they are allowed to affect topical prevalence, topical content, or both. The software package implements the estimation algorithms for the model and also includes tools for every stage of a standard workflow, from reading in and processing raw text through making publication-quality figures.

The package currently includes functionality to:

Methods Papers

Supporting Packages

Published Applications

  1. Light, Ryan and Colin Odden. "Managing the Boundaries of Taste: Culture, Valuation, and Computational Social Science" Social Forces 2017.
  2. Kuhn, Kenneth D. "Topics and Trends in Incident Reports: Using Structural Topic Modeling to Explore Aviation Safety Reporting System Data" Twelfth USA/Europe Air Traffic Management Research and Development Seminar (ATM2017) 2017: 1-10.
  3. Kim, In Song. "Political Cleavages within Industry: Firm-level Lobbying for Trade Liberalization" American Political Science Review 2017.
  4. Tingley, Dustin. "Rising Power on the Mind." International Organization. 2017.
  5. Tvinnereim, Endre, Xiaozi Liu, and Eric M. Jamelske. "Public perceptions of air pollution and climate change: different manifestations, similar causes, and concerns." Climatic Change 2016: 1-14.
  6. Truex, Rory. Making Autocracy Work. Cambridge University Press. 2016.
  7. Kolar, Mladen and Matt Taddy. "Discussion of 'Coauthorship and Citation Networks for Statisticians'" The Annals of Applied Statistics 2016.
  8. Bauer, Paul C., Pablo Barberá, Kathrin Ackermann, Aaron Venetz. "Is the Left-Right Scale a Valid Measure of Ideology? Individual-Level Variation in Associations with 'Left' and 'Right' and Left-Right Self-Placement" Political Behavior 2016.
  9. Sachdeva, Sonya, Sarah McCaffrey and Dexter Locke. "Social media approaches to modeling wildfire smoke dispersion: spatiotemporal and social scientific investigations." Information, Communication & Society. 2016.
  10. Munksgaard, Rasmus and Jakob Demant. "Mixing politics and crime - the prevalence and decline of political discourse on the cryptomarket." International Journal of Drug Policy. 2016.
  11. Huff, Connor and Dominika Kruszewska. "Banners, Barricades, and Bombs: The Tactical Choices of Social Movements and Public Opinion" Comparative Political Studies. 2016.
  12. Bail, Christopher A. "Cultural carrying capacity: Organ donation advocacy, discursive framing, and social media engagement." Social Science & Medicine. 2016.
  13. Law, David S. "Constitutional Archetypes" Texas Law Review. 2016.
  14. Farrell, Justin. "Corporate funding and ideological polarization about climate change" Proceedings of the National Academy of Sciences. 2016.
  15. Wang, Baiyang and Diego Klabjan. "Temporal Topic Analysis with Endogenous and Exogenous Processes." Thirtieth AAAI Conference on Artificial Intelligence. 2016.
  16. Reich, Stewart, Mavon and Tingley "The Civic Mission of MOOCs: Measuring Engagement across Political Differences in Forums." Association for Computing Machinery: Learning at Scale. 2016.
  17. Tvinnereim, Endre and Kjersti Flottum. "Explaining topic prevalence in answers to open-ended survey questions about climate change" Nature Climate Change. 2015.
  18. Mishler, Alan, Erin Smith Crabb, Susannah Paletz, Brook Hefright, Ewa Golonka. "Using Structural Topic Modeling to Detect Events and Cluster Twitter Users in the Ukrainian Crisis." International Conference on Human-Computer Interaction. 2015.
  19. Milner, Helen and Dustin Tingley. Sailing the Water's Edge: The Domestic Politics of American Foreign Policy. Princeton University Press. 2015.
  20. Romney, David, Brandon Stewart and Dustin Tingley. "Plain Text: Transparency in the Acquisition, Analysis, and Access Stages of the Computer-assisted Analysis of Texts." Qualitative and Multi-Method Research. 2015.
  21. Genovese, Federica. "Politics ex cathedra: Religious authority and the Pope in modern international relations" Research & Politics 2015.
  22. Reich, Tingley, Leder-Luis, Roberts and Stewart. "Computer-Assisted Reading and Discovery for Student Generated Text in Massive Open Online Courses" Journal of Learning Analytics. 2015.

Installation Instructions

The package is available on CRAN and can be installed using:
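
A minimal sketch of the CRAN installation, assuming the package is published under the name stm (the same name as its main function, stm()):

```r
# Install the released version from CRAN.
# Assumption: the CRAN package name is "stm".
install.packages("stm")
```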


You can always get the most stable development release from the GitHub repository. Assuming you already have R installed (if not, install R first), the easiest way to install from the GitHub repository is to use the devtools package. First install devtools using the following code. You only have to do this once.

if(!require(devtools)) install.packages("devtools")

Then you can load the package and use the function install_github
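
As a sketch of that step: the repository path below is a hypothetical placeholder, since the actual path is not given here, and dependencies = TRUE is an assumption chosen so that suggested as well as required packages are installed.

```r
library(devtools)

# Placeholder, not the real path: replace OWNER with the GitHub account
# that hosts the package repository.
repo <- "OWNER/stm"

# dependencies = TRUE also pulls in the suggested packages.
install_github(repo, dependencies = TRUE)
```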


Note that this will install all the packages suggested and required to run our package. It may take a few minutes the first time, but this only needs to be done on the first use. In the future you can update to the most recent development version using the same code.

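You can also download the binaries or source files for the latest release. Then use install.packages with repos = NULL so that R installs from the downloaded file rather than from a repository, where filepath is the path to that file: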

install.packages(filepath, repos = NULL)

Getting Started

See the vignette for several example analyses. The main function to estimate the model is stm() but there are a host of other useful functions. If you have your documents already converted to term-document matrices you can ingest them using readCorpus(). If you just have raw texts you will want to start with textProcessor().
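
The raw-text route can be sketched as follows. This is an illustration rather than the vignette's own code. It assumes that the gadarian example data set ships with the package, that prepDocuments() and labelTopics() are available alongside the functions named above, and that the tm and SnowballC packages, which textProcessor() relies on, are installed. The choice of K = 3 and the treatment covariate is arbitrary.

```r
library(stm)

# Example survey data assumed to ship with the package: open-ended
# responses plus a binary treatment indicator.
data(gadarian)

# Raw text -> stemmed, stopword-filtered documents and a vocabulary.
processed <- textProcessor(documents = gadarian$open.ended.response,
                           metadata = gadarian)

# Drop very rare terms and re-index documents, vocabulary, and metadata.
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)

# Fit a 3-topic model in which topical prevalence depends on treatment.
set.seed(2138)
fit <- stm(documents = out$documents, vocab = out$vocab, K = 3,
           prevalence = ~ treatment, data = out$meta, verbose = FALSE)

# Show the highest-probability words for each topic.
labelTopics(fit)
```

If the documents are already in term-document matrix form, the textProcessor() call would be replaced by readCorpus(), and the remaining steps stay the same.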

Have a large text corpus or need a language we don't provide support for? See our sister project txtorg.