Linear Regression with R: Estimation, Interpretation and Presentation

Course Dates and Times

Monday 17 – Friday 21 February 2019, 14:00 – 17:30 (finishing slightly earlier on Friday)
15 hours over five days

Martin Mölder

martin.molder@ut.ee

University of Tartu

Even though the buzzwords of our times are 'big data' and 'computer/data science', the foundation of statistical analysis in the social sciences is still classical linear regression. This basic analytical technique is robust and versatile enough to be applied to a wide range of research problems, and the knowledge and experience gained with OLS regression are transferable to other methods. If learning statistics has to start somewhere, it should start with linear regression.

This course will teach you how to apply, evaluate and interpret the results of linear regression models in R.

We start off very briefly with the prerequisites – a revision of the essential knowledge needed prior to running a regression – and move on from basic specifications of the model to more complex problems and interpretations.

We will go through regression assumptions and the problems caused by their violation, look at how to use dummy variables and interactions in regression models, and see how the framework of linear regression can also accommodate non-linear associations.

The class ends with a focus on the presentation of regression results through tables and plots.

Tasks for ECTS Credits

2 credits (pass/fail grade)
Attend at least 90% of course hours, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.

3 credits (to be graded)
As above, plus complete a take-home assignment that involves fitting and interpreting regression models on the basis of pre-specified data and model specifications.

4 credits (to be graded)
As above, plus complete a final paper structured like an academic journal/conference article, with the exception that the literature review section can be just 2–3 paragraphs in which you present the puzzle. Identify a few hypotheses you are interested in testing, and test them on data of your choosing.


Instructor Bio

Martin Mölder (PhD in comparative politics) is a researcher at the Johan Skytte Institute of Political Studies, University of Tartu, Estonia.

His main research focus is political parties, their ideological and political positions, and the functioning of party systems. He also teaches, among other things, quantitative methods.

Martin has an extensive background in using R for data management and statistical analysis in the social sciences.

He has taught the following courses at the ECPR Summer School in Methods & Techniques:

  • R Basics 2016 & 2017
  • Intermediate R: Capacities for Analysis and Visualisation 2017, 2018 & 2019
  • Advanced Topics in Applied Regression 2019

  @martinmolder

If you want to move on to complex analyses and statistical models, you need to get the simple things right first. This course will familiarise you with the basic statistical concepts that will enable you to fully and correctly use the framework of linear regression. 

By the end of the course you will have the theoretical and practical skills to responsibly run multivariate linear regressions on a variety of data configurations. This includes estimating multiple model specifications in R, presenting results in tables or in graphical form, and interpreting the coefficients for the reader. It also includes assessing the appropriateness of OLS regression for certain kinds of data and learning to make suitable corrections and adjustments when there is a mismatch between model requirements and data characteristics.

An introductory statistics course usually gets to regression only at the very end (if it goes much further than that, it was probably moving too fast). This course is for those who have reached that point but have not moved on much further. We will not focus too much on theory; the emphasis is on correct application and interpretation. You don't need to know the mathematics happening behind the scenes to understand what a regression model does and what it is capable of doing.

The course is also suitable for those who have briefly encountered OLS regression as part of a statistics class, but now wish to better understand how it works, where it breaks down, and how it can be applied thoroughly. Because of its constant focus on the application of linear models, the course is unsuitable for those who want an introductory course in general statistics. During one of the sessions we briefly cover some basic statistical concepts and tests, but only so that we can all approach the topic of linear models on an equal footing; this cannot be considered a substitute for proper coverage of introductory statistics.


Day 1
We start with a condensed review of some fundamental concepts in basic statistics: the z and t distributions, hypothesis testing, confidence intervals and correlation. This overview is intended to provide a solid foundation from which to advance in the following days. We begin to discuss a few basics of regression, such as how it goes beyond correlation, and for what type of questions it is helpful. In the lab session, we will go through a few of the basic data manipulation procedures commonly required before running any regression: data cleaning and recoding, transformations of data, etc. This is a good opportunity for you to get familiar, if need be, with working with syntax files in R and with the RStudio interface.
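
To give a concrete flavour of this lab, here is a minimal sketch of what such preparation might look like in R. The data frame dat and the variables income and education are hypothetical placeholders, not course material.

  # Assuming a data frame 'dat' with hypothetical variables income and education
  dat <- na.omit(dat)                                 # drop rows with missing values
  dat$log_income <- log(dat$income)                   # transform a skewed variable
  dat$educ_high  <- ifelse(dat$education > 12, 1, 0)  # recode education into a dummy

  t.test(dat$log_income)                  # one-sample t-test with a 95% confidence interval
  cor(dat$log_income, dat$education)      # correlation between two variables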

Day 2
We delve fully into the fundamentals of Ordinary Least Squares (OLS) regression: how the estimation is carried out, and how we interpret the coefficients for simple (one predictor) and multiple (two or more predictors) regression. I will present some basic formulas, but the goal will be to gain an intuitive understanding of how the estimation process works and what the results mean. In the lab session we put this newly gained knowledge to the test by running a few examples of linear models in R. We will interpret the output and the model fit.
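
As an illustration of what the lab exercises involve, here is a minimal sketch using base R's lm(), carrying over the hypothetical dat data frame from the Day 1 sketch (age and female are likewise made-up variables):

  # Simple regression: one predictor
  m1 <- lm(log_income ~ education, data = dat)
  summary(m1)                 # coefficients, standard errors, t values, R-squared

  # Multiple regression: two or more predictors
  m2 <- lm(log_income ~ education + age + female, data = dat)
  summary(m2)
  confint(m2)                 # 95% confidence intervals for the coefficients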

Day 3
We advance in our understanding of OLS by focusing on model specifications that almost always appear in empirical research, like models with dummy variables and with interactions between variables. We discuss the interpretation of such models and how you ought to communicate it to your audience. In the lab component we learn how to run these model specifications with R.
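
For instance, a minimal sketch of these specifications with lm(), again using the hypothetical variables from the earlier sketches (region is also made up):

  # Dummy variables: R expands a factor into dummies automatically
  dat$region <- factor(dat$region)
  m3 <- lm(log_income ~ education + region, data = dat)

  # Interaction between two predictors ('*' includes both main effects and the product term)
  m4 <- lm(log_income ~ education * female, data = dat)

  # A non-linear (quadratic) effect of age within the linear regression framework
  m5 <- lm(log_income ~ education + age + I(age^2), data = dat)
  summary(m5)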

Day 4
This day is devoted to preventing the misuse of linear models. As with the vast majority of statistical procedures, a series of assumptions underpins OLS regression; if these are not met, our results may deceive us. In this session we go over these assumptions, how their violation affects the results, and what strategies we have for dealing with such situations. In the lab we approach these issues from a practical perspective: we run a test regression in R, assess whether the assumptions are met, and, where possible, correct for any violations. Step by step, you will see how your estimates and model fit change along the way.
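
A minimal sketch of such checks, applied to the multiple regression from the Day 2 sketch; the lmtest, sandwich and car packages are illustrative choices on my part rather than requirements of the course:

  # Graphical diagnostics: residuals vs fitted, normal Q-Q, scale-location and leverage plots
  par(mfrow = c(2, 2))
  plot(m2)

  # Formal checks and a common remedy (illustrative package choices)
  library(lmtest)
  library(sandwich)
  library(car)
  bptest(m2)                                     # Breusch-Pagan test for heteroskedasticity
  vif(m2)                                        # variance inflation factors (multicollinearity)
  coeftest(m2, vcov = vcovHC(m2, type = "HC3"))  # coefficients with robust standard errors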

Day 5
On the last day we look into the various ways regression results can be presented. A properly formatted and well-thought-through regression table is a must, but sometimes that is not enough. In fact, I would say it is almost never enough for an effective presentation of your models and your conclusions; for that, it is necessary to visualise your results. This is especially true for interactions and non-linear effects. While tables of coefficients are still the dominant way of presenting results in academic journals, graphs and predicted values tend to be preferred in reports and analyses for larger, non-technical audiences. I believe strongly that you should be familiar with both formats and should tailor the delivery of your results to the audience.
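
By way of illustration, here is a sketch of both presentation formats, building on the hypothetical models from the earlier sketches. texreg is just one of several packages that produce regression tables, while visreg comes from the Day 5 reading:

  # A side-by-side regression table (text version; texreg()/htmlreg() give publication output)
  library(texreg)
  screenreg(list(m1, m2))

  # Predicted values with confidence intervals, using base R
  new_dat <- data.frame(education = 8:20, age = 40, female = 1)
  pred <- predict(m2, newdata = new_dat, interval = "confidence")
  plot(new_dat$education, pred[, "fit"], type = "l",
       xlab = "Education (years)", ylab = "Predicted log income")
  lines(new_dat$education, pred[, "lwr"], lty = 2)
  lines(new_dat$education, pred[, "upr"], lty = 2)

  # The visreg package plots effects, including interactions, directly from a fitted model
  library(visreg)
  visreg(m4, "education", by = "female")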

This course presumes a basic knowledge of fundamental statistical concepts such as hypothesis testing and comparison of means (t-tests).

If you have no background in statistics, you should also take Florian Weiler's course Introduction to Statistics for Political and Social Scientists.

The class will be carried out in R. Therefore, you should have a basic knowledge of R as a statistical programming language and of RStudio.

The class assumes that you know how to read in data, have basic data management skills, and can use basic plotting commands.
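
For orientation, that baseline corresponds roughly to being comfortable with commands like the following (the file and variable names are hypothetical):

  dat <- read.csv("my_data.csv")     # read in data
  str(dat)                           # inspect variable types
  summary(dat)                       # summary statistics
  dat$age2 <- dat$age^2              # create a new variable
  hist(dat$income)                   # basic plotting
  plot(dat$education, dat$income)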

If you have no experience with R, you should also take Thorsten Schnapp's short entry-level Introduction to R course.

Day Topic Details
Day 1 From correlation to regression: revisiting the basics

We cover a few foundational concepts in statistics: correlation, standard error, t test, t and z distributions. We also make our first forays into the regression setup.

In the lab part, we get familiar with R and RStudio, and try a few basic data manipulation and transformation tasks of the kind that habitually have to be performed before running a regression.

Day 2 OLS fundamentals, coefficients and model fit.

We go through the estimation of OLS models and the interpretation of coefficients for simple and multiple regression.

In the lab session, we run a few regressions in R, and go through interpreting coefficients and measures of model fit once more.

Day 3 Dummy variables, interactions, non-linear associations.

We discuss slightly more complex model specifications which include dummy variables, interactions between variables and non-linear effects of predictors.

In the lab, we go through such models and their interpretations.

Day 4 Regression assumptions: violations and remedies.

This session covers the assumptions underpinning OLS regression, what the implications of assumption violations are, and how to correct for them, if possible.

The lab session will offer practical strategies to identify assumption violations. We also see how estimates and model fit statistics change when correcting for some of these violations.

Day 5 Recap and presentation of regression models through regression tables and plots of coefficients and predicted values

In this last session, we review a few of the most important ideas covered in the past four days, based on participants’ requests. I show a few of the ways in which regression results can be presented to the audience, and discuss the strengths and weaknesses of each.

In the lab I show code for the presentation of results, and also allow time for a recap of any topics participants feel we should cover again.

Day Readings
Day 1

Revisiting the basics

Field, A., Miles, J., & Field, Z. (2012)
Discovering Statistics Using R, Chapters 2, 3, 6, 9
London: Sage Publications

I assume that many of the topics these chapters cover are familiar to you, so they should serve at least partly as a refresher. Skim through as necessary.

Advanced optional:

Fox, J. (2008)
Applied Regression Analysis and Generalized Linear Models. 2nd edition, Chapter 2
Thousand Oaks, CA: Sage

Day 2

OLS fundamentals

Field, A., Miles, J., & Field, Z. (2012)
Discovering Statistics Using R, Chapter 7
London: Sage Publications

Advanced optional:

Fox, J. (2008)
Applied Regression Analysis and Generalized Linear Models. 2nd edition, Chapters 5 and 6
Thousand Oaks, CA: Sage

Day 3

Dummy variables, interactions, non-linear associations

Hardy, M. A. (1993)
Regression with Dummy Variables. Quantitative Applications in the Social Sciences Series, Chapter 3
London: Sage

Brambor, T., Clark, W. R., & Golder, M. (2006)
Understanding Interaction Models: Improving Empirical Analyses
Political Analysis, 14(1), 63–82

Advanced optional:

Fox, J. (2008)
Applied Regression Analysis and Generalized Linear Models. 2nd edition, Chapter 7
Thousand Oaks, CA: Sage

Day 4

Regression assumptions

Fox, J. (1991)
Regression Diagnostics. Quantitative Applications in the Social Sciences Series
London: Sage

Advanced optional:

Fox, J. (2008)
Applied Regression Analysis and Generalized Linear Models. 2nd edition, Chapters 11, 12, 13
Thousand Oaks, CA: Sage

Day 5

Presentation of regression results

Gelman, A., Pasarica, C., & Dodhia, R. (2002)
Let’s Practice What We Preach: Turning Tables into Graphs
The American Statistician, 56(2), 121–130

Breheny, P., & Burchett, W. (2017)
Visualization of Regression Models Using visreg
The R Journal, 9: 56–71

The primary textbook for the course is: 

Field, A., Miles, J., & Field, Z. (2012)
Discovering Statistics Using R
London: Sage Publications

It is an engaging, introductory-level statistics textbook that also covers the basics of regression and provides examples in R.

For each topic I will also indicate additional readings that will take you further into the material.

Software Requirements

Up-to-date versions of R and RStudio.

Hardware Requirements

Please bring your laptop.

Literature

Literature on regression is ubiquitous. It features in every introductory textbook, and there is a wealth of literature that can take you into more advanced topics. Before getting to some of those, I would like to mention two textbooks that give a good overview of topics in and around regression. The first is a very simple textbook, the second more advanced:

Gravetter, F. J., & Wallnau, L. B. (2016)
Statistics for the Behavioral Sciences
Cengage Learning

Agresti, A., & Finlay, B. (2008)
Statistical Methods for the Social Sciences
Prentice Hall

Below is a by no means exhaustive list of literature that will take your knowledge of regression further.

Belsley, D. A., Kuh, E., & Welsch, R. E. (2004)
Regression Diagnostics: Identifying Influential Data and Sources of Collinearity
New York: Wiley

Berry, W. D. (1993)
Understanding Regression Assumptions. Quantitative Applications in the Social Sciences
Thousand Oaks, CA: Sage Publications

Braumoeller, B. F. (2004)
Hypothesis Testing and Multiplicative Interaction Terms
International Organization, 58(4), 807–820

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003)
Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd ed.
Mahwah, NJ: Lawrence Erlbaum Associates

Jaccard, J., & Turrisi, R. (2003)
Interaction Effects in Multiple Regression (2nd ed.)
London: Sage Publications.

Kaufman, R. L. (2013)
Heteroskedasticity in Regression: Detection and Correction (Vol. 172)
Sage Publications

Lewis-Beck, M. S. (1980)
Applied Regression: An Introduction. Quantitative Applications in the Social Sciences Series
London: Sage

Motulsky, H. J., & Ransnas, L. A. (1987)
Fitting curves to data using nonlinear regression: a practical and nonmathematical review
The FASEB Journal, 1(5), 365–374

Ritz, C., & Streibig, J. C. (2008)
Nonlinear Regression with R
New York: Springer

Ryan, T. P. (2008)
Modern Regression Methods (2nd ed.)
Hoboken, NJ: Wiley

Sheather, S. J. (2009)
A Modern Approach to Regression with R
New York: Springer

Weisberg, S. (2005)
Applied Linear Regression (3rd ed.)
Hoboken, NJ: Wiley-Interscience

Recommended Courses to Cover Before this One

Summer School

Introduction to R
Introduction to Inferential Statistics: What you need to know before you take regression

Winter School

Introduction to R
Introduction to Statistics for Political and Social Scientists

Recommended Courses to Cover After this One

Summer School

Introduction to General Linear Models: Binary, Ordered and Multinomial Logistic, and Count Regression

Winter School

Interpreting Binary Logistic Regression Models