Content Analysis

Kostas Gemenis

Max Planck Institute for the Study of Societies – MPIfG

Kostas Gemenis is Senior Researcher in Quantitative Methods at the Max Planck Institute for the Study of Societies.

His research interests include measurement in the social sciences, and content analysis with applications to estimating the policy positions of political actors.

He is currently involved in Preference Matcher, a consortium of researchers who collaborate in developing e-literacy tools designed to enhance voter education.


Course Dates and Times

Monday 8 to Friday 12 August 2016
Generally classes are either 09:00-12:30 or 14:00-17:30
15 hours over 5 days

Prerequisite Knowledge

Participants are expected to be familiar with basic statistical concepts such as measures of central tendency (mean, median), dispersion (standard deviation), tests of association (Pearson’s r) and inference (χ2, t-test). This material is covered in the first few chapters of introductory statistics or data analysis textbooks. A useful example is Pollock P.H. III, The Essentials of Political Analysis, fourth edition (Washington, DC: CQ Press, 2012), Chapters 2, 3, 6, and 7. Some familiarity with SPSS, and with Stata or R statistical software, is also desirable.

Short Outline

The course will introduce participants to the family of methods known as ‘content analysis’ using a variety of examples from political science and other disciplines. The course will cover the basic aspects of content analysis relating to creating coding schemes, document selection, coding, and scaling. Particular attention will be paid to evaluating measurement in the context of content analysis in terms of reliability and validity. The course will cover different approaches to both manual and computer-assisted coding in content analysis, and will be taught in a mix of lectures and seminars. After the lectures, participants will be asked to do some hands-on content analysis exercises (codebooks, text material, and software where needed will be provided via Moodle) and the results will be discussed during the seminars. In addition, participants will be able to present their own project in class and get feedback from the instructor and the other participants.

Long Course Outline

Content analysis is typically defined as a method whose goal is to summarize a body of information, often in the form of text, and to make inferences about the actor behind this body of information. This implies that content analysis can be seen as a data reduction method, since its goal is to reduce the text material into more manageable bits of information. As these manageable bits are often in the form of quantitative data (i.e. numbers), more often than not researchers refer to content analysis as a ‘quantitative’ method. Content analysis can also be seen as a method for descriptive inference. Weber (1990, p. 9), for instance, defines content analysis as ‘a method that uses a set of procedures to make valid inferences from text’. The idea is that, by analysing the textual output of an actor, we can infer something about this actor. This conceptualization of content analysis implies that we can use it as a tool for measurement in the social sciences.
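To make the idea of data reduction concrete, the following is a minimal sketch, in Python, of how a text can be reduced to a handful of numbers with a simple dictionary. The two categories, their word lists, and the function names are hypothetical and chosen only for illustration; they are not taken from any dictionary used in the course.

```python
import re
from collections import Counter

# Hypothetical two-category dictionary; the word lists are illustrative only.
DICTIONARY = {
    "economy": {"tax", "taxes", "jobs", "growth", "budget"},
    "environment": {"climate", "pollution", "emissions", "green"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def category_shares(text, dictionary=DICTIONARY):
    """Return, for each category, the share of all tokens matching its word list."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    return {
        category: (counts[category] / total if total else 0.0)
        for category in dictionary
    }


speech = "We will cut taxes and create jobs, while reducing emissions."
shares = category_shares(speech)
print(shares)
```

Here a ten-word sentence is reduced to two proportions, one per category. Whether such proportions are valid measures of the quantity of interest is a separate question that depends on the quality of the dictionary.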


Since many social science concepts are not directly observable, content analysis can provide a useful method with which we can measure quantities of interest that are otherwise difficult to estimate. For instance, by content analysing the speeches of legislators, we can classify them as charismatic, populist, authoritarian, liberal, and so on. Similarly, by analysing the content of newspaper editorials, we can infer whether the media in question were biased in favour of a particular candidate during an election campaign. This view of content analysis, however, assumes that we are employing the scientific method, and therefore any content analysis application should be concerned with replicability, objectivity, reliability, validity and so on (Neuendorf 2002, pp. 10-15). As such, content analysis should not be confused with other approaches/methods in the ‘qualitative’ research tradition such as discourse analysis, rhetorical analysis, constructivism, ethnography and so on.


Drawing on chapters in Krippendorff (2004) and Neuendorf (2002), the course will introduce participants to the basic concepts and building blocks of content analysis designs. The course will focus on both manual and computer-assisted content analysis and compare the respective approaches extensively. Specifically, for manual content analysis, the course will look at the often overlooked distinction between the analysis of manifest content and judgemental coding, whereas for computer-assisted content analysis it will cover a variety of methods (dictionaries, Wordscores and Wordfish scaling methods, and so on). The course will look at the relationship between reliability and validity and outline methods for estimating inter-coder reliability and validating the results produced by computer-assisted content analysis. In this respect, the course will use many examples to illustrate the promises as well as the pitfalls of content analysis in various applications across the social sciences (e.g. sentiment analysis of the press, frames analysis of social movements, estimating the positions of political actors, agenda-setting in the EU).
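As an illustration of one of the scaling methods mentioned above, the sketch below computes raw Wordscores in the spirit of Laver, Benoit, and Garry (2003): words are scored from reference texts with assumed known positions, and a new text is scored as the frequency-weighted mean of the scores of its words. It is a simplified sketch rather than a full implementation: the rescaling of text scores and the uncertainty estimates are omitted, and the toy texts, the positions (-1 and +1), and the function names are hypothetical.

```python
from collections import Counter


def relative_frequencies(tokens, vocabulary=None):
    """Relative word frequencies, optionally restricted to a given vocabulary."""
    counts = Counter(t for t in tokens if vocabulary is None or t in vocabulary)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()} if total else {}


def word_scores(reference_texts, reference_positions):
    """Score each word as the expected position of a reference text containing it."""
    freqs = {name: relative_frequencies(tokens)
             for name, tokens in reference_texts.items()}
    vocabulary = set().union(*(f.keys() for f in freqs.values()))
    scores = {}
    for word in vocabulary:
        total = sum(f.get(word, 0.0) for f in freqs.values())
        # P(reference text | word) weights the known position of each reference text.
        scores[word] = sum(
            f.get(word, 0.0) / total * reference_positions[name]
            for name, f in freqs.items()
        )
    return scores


def text_score(tokens, scores):
    """Raw score of a new text: mean word score, ignoring words without a score."""
    freqs = relative_frequencies(tokens, vocabulary=scores.keys())
    return sum(freq * scores[word] for word, freq in freqs.items())


# Toy reference texts with assumed positions on a hypothetical left-right scale.
references = {
    "left": ["welfare", "welfare", "tax"],
    "right": ["market", "market", "tax"],
}
positions = {"left": -1.0, "right": 1.0}

scores = word_scores(references, positions)
new_text = ["welfare", "tax", "tax", "jobs"]  # "jobs" is unscored and ignored
print(text_score(new_text, scores))
```

Words used equally often by both reference texts ("tax" in this toy example) receive a score midway between the reference positions, which pulls raw text scores towards the centre. This is one of the assumptions of scaling models that needs to be examined when validating their results.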


To give a more extensive indication of the issues that will be discussed in the course, you can consider the following questions:


  • Coding scheme (What are the theoretical underpinnings of the coding scheme? How are the categories selected and operationalized? What are the coding units? How is coding performed? Is our coding scheme valid?)
  • Selection of documents (What guides the selection of texts? Are texts sufficiently comparable? Are our documents valid and reliable indicators of the quantities of interest? How can we acquire and process text for computer-assisted content analysis?)
  • Aggregation (Are texts coded by different coders? If so, how are their results aggregated? If not, how can we ensure inter-coder reliability? What statistical measures can be used to estimate inter-coder reliability?)
  • Scaling (Are we estimating the quantities of interest directly? If not, how do we scale data in order to estimate the quantities of interest? Is our scaling valid and reliable?)
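
As an example of a statistical measure for the inter-coder reliability question above, the following is a minimal sketch of Krippendorff's alpha for nominal codes, computed from the coincidence matrix and allowing for missing codes. The function name and the data layout (one list of codes per coding unit, with None for a missing code) are assumptions made for this illustration; the sketch does not reproduce the KALPHA macro or any other specific software.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of coding units; each unit is a list with one code per
    coder, using None where a coder did not code the unit.
    """
    coincidences = Counter()
    for codes in units:
        values = [code for code in codes if code is not None]
        m = len(values)
        if m < 2:
            continue  # units coded by fewer than two coders carry no information
        # Every ordered pair of codes from different coders counts 1 / (m - 1).
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("at least one unit must be coded by two or more coders")

    observed = sum(w for (c, k), w in coincidences.items() if c != k) / n
    expected = (n * n - sum(t * t for t in totals.values())) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - observed / expected


# Four hypothetical units coded by two coders; they disagree on the second unit.
codings = [["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]]
alpha = krippendorff_alpha_nominal(codings)
print(round(alpha, 3))
```

An alpha of 1 indicates perfect agreement, while values around 0 indicate agreement no better than expected by chance. Other levels of measurement (ordinal, interval) require a different disagreement function and are not covered by this sketch.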


The format of the course will be a mixture of lectures, seminars, participant assignments and presentations. The lectures will outline the building blocks, challenges and trade-offs in content analysis. Participants will need to complete three assignments that will be discussed and extended during seminars. Finally, participants will have the opportunity to present their own content analysis project and receive feedback from the instructor and other participants.

Day Topic Details
Monday Introduction to content analysis; Manual content analysis I

Lecture (90 mins.)


  • Brief presentation of participants and their research projects
  • Defining content analysis; Key concepts in content analysis
  • Reliability and validity and their relationship to measurement error
  • Manifest versus judgemental coding; Manual versus computer-assisted coding


Lecture (90 mins.)


  • Designing a manual content analysis project
  • Best practices for defining a coding scheme, selecting the appropriate documents, coding the documents, and scaling the coded data

Manual coding assignment

Tuesday Manual content analysis II; Computer-assisted content analysis I

Seminar (90 mins.)


  • Discussion of the manual coding assignment
  • Estimating inter-coder reliability with Krippendorff's alpha


Lecture (90 mins.)


  • The promises of computer-assisted content analysis (and four rules for good practice)
  • Selecting, cleaning and formatting documents
  • Computer-assisted content analysis and dictionary construction

Dictionary content analysis assignment

Wednesday Computer-assisted content analysis II

Seminar (90 mins.)


  • Discussion of the dictionary assignment
  • Effective data visualization and inference in content analysis


Lecture (90 mins.)


  • Scaling models in computer-assisted content analysis and their assumptions
  • Supervised method: Wordscores
  • Unsupervised method: Wordfish

Scaling model content analysis assignment

Thursday Computer-assisted content analysis III

Seminar (90 mins.)


  • Discussion of the scaling model assignment
  • Illustration of Wordscores and Wordfish


Lecture (90 mins.)


  • Classification models in computer-assisted content analysis
  • Supervised methods
  • Unsupervised methods: LDA

Friday Manual content analysis III

Seminar (90 mins.)


  • Participant presentations (if any)
  • Illustration of supervised methods


Lecture (90 mins.)


  • New frontiers for manual coding: latent coding and crowdsourcing
  • Comparisons and trade-offs in content analysis

Day Readings



Neuendorf, Kimberly A. (2002) The content analysis guidebook. Thousand Oaks, CA: Sage, Chapter 1 (defining content analysis).


Krippendorff, Klaus (2004) Content analysis: An introduction to its methodology, second edition. Thousand Oaks, CA: Sage, Chapters 5 (unitizing) and 7 (coding).


Hayes, Andrew F., and Klaus Krippendorff (2007) Answering the call for a standard reliability measure for coding data. Communication Methods and Measures 1: 77–89.





Budge, Ian (2001) Validating party policy placements. British Journal of Political Science 31: 210–223.


Gemenis, Kostas (2013) What to do (and not to do) with the Comparative Manifestos Project data. Political Studies 61(S1): 3–23.


Grimmer, Justin, and Brandon M. Stewart (2013) Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21: 267–297 (sections 1–4 and 5.1 only).



Laver, Michael, and John Garry (2000) Estimating policy positions from political texts. American Journal of Political Science 44: 619–634.



Young, Lori, and Stuart Soroka (2012) Affective news: The automated coding of sentiment in political texts. Political Communication 29: 205–231.





Laver, Michael, Kenneth Benoit, and John Garry (2003) Extracting policy positions from political texts using words as data. American Political Science Review 97: 311–331.



Slapin, Jonathan B., and Sven-Oliver Proksch (2008) A scaling model for estimating time-series party positions from texts. American Journal of Political Science 52: 705–722.





Lowe, Will (2008) Understanding Wordscores. Political Analysis 16: 356–371.



Hopkins, Daniel J., and Gary King (2010) A method of automated nonparametric content analysis for social science. American Journal of Political Science 54: 229–247.




Benoit, Kenneth, Drew Conway, Benjamin E. Lauderdale, Michael Laver, and Slava Mikhaylov (2014) Crowd-sourced coding of political texts. American Political Science Review, forthcoming.





Gemenis, Kostas (2015) An iterative expert survey approach for estimating parties’ policy positions. Quality & Quantity 49: 2291–2306.

Software Requirements

  • Stata version 11 or higher, with the user-written packages wordscores, concord, agrm, and polychoric installed
  • SPSS version 20 or higher, with the user-written KALPHA menu installed
  • R and RStudio
  • Yoshikoder (free software download)
  • Lexicoder (free software download)

Hardware Requirements

None - a computer lab will be used where necessary.



Recommended Courses to Cover Before this One

Research Designs
A Refresher of Inferential Statistics for Political Scientists
Introduction to Statistics for Political and Social Scientists

Additional Information


This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc.). Registered participants will be informed of any changes at the time they are made.

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, please contact us before registering.