ECPR


Advanced Topics in Applied Regression

Levente Littvay
littvayl@ceu.edu

Central European University

Levente Littvay researches survey and quantitative methodology, twin and family studies and the psychology of radicalism and populism.

He is an award-winning teacher of graduate courses in applied statistics with a topical emphasis in electoral politics, voting behaviour, political psychology and American politics.

He is one of the Academic Convenors of ECPR’s Methods School, and is Associate Editor of Twin Research and Human Genetics and head of the survey team at Team Populism.

 @littvay

Course Dates and Times

Monday 1 to Friday 5 August 2016
Generally classes are either 09:00-12:30 or 14:00-17:30
15 hours over 5 days

Prerequisite Knowledge

A solid understanding of linear and logistic regression, to the level described in the following texts: Michael Lewis-Beck (1980), Applied Regression: An Introduction, Newbury Park, CA: Sage; John Fox (1991), Regression Diagnostics, Newbury Park, CA: Sage; and Fred C. Pampel (2000), Logistic Regression: A Primer, Newbury Park, CA: Sage. (All three books are from the Quantitative Applications in the Social Sciences, aka "little green books", series.)

 

You should also be comfortable conducting basic data management, import and export, and the analyses described in the listed books in at least one statistical package of your choice, and you need to be open to learning other statistical packages. In this course we will use R. In the first lab session we will have a quick review of how to run regressions in R. It would be helpful if you already knew R, or at least took the pre-session class.

 

This course starts where the ECPR Summer School courses Multiple Regression Analysis: Estimation, Diagnostics and Modelling (SD105 - Week 1) and Intro to GLM: Binary, Ordered and Multinomial Logistic, and Count Regression Models (SD106 - Week 2) end. If you do not feel prepared for Advanced Topics, I recommend you take these classes first.


Short Outline

Once a researcher becomes comfortable with regression, a question often arises: what next? Building on the assumptions regression models make (especially independence and the absence of measurement error), this course offers an overview of the many ways these assumptions can be relaxed. In the process, the course trains researchers to think carefully about these assumptions and to become better data analysts and social scientists at the same time. Relaxing regression assumptions allows us to look at the world from a new angle and to ask novel research questions.

 

The course offers an introduction to many statistical techniques that either complement or build on regression analysis. These include fixed and random effects, the ideas behind multilevel modeling, measurement, reliability and validity, missing data, and a deeper understanding of model fit and model selection.

 

On a practical note: most of this class will take place in the classroom. I may demonstrate some techniques using R (and if you are a proficient R user, you may be able to follow along on your laptop if you bring it), but the purpose of the course is not to do practicals; it is to teach you the methods. The practical work you can do at home, and if you get stuck, we have consultation. Scripts to do what we learn in class will be provided.


Long Course Outline

Once a person becomes comfortable with basic statistics and learns to use regression, a new question often arises: what next? While the possible answers to this question are endless, this course offers one such answer. Building on the assumptions regression models make (which are reviewed extensively in the course), it offers an overview of the many ways these assumptions can be relaxed. In the process, the course trains researchers to think carefully about these assumptions and to become better data analysts and social scientists at the same time. Relaxing regression assumptions allows us to look at the world from a new angle, and to ask novel research questions that do not always follow the logic of one dependent and multiple independent variables familiar from regression models. Since many of the assumptions of regression models can be relaxed in a large number of ways, the course offers an introduction to many statistical techniques that either complement or build on regression analysis. Many of these techniques would deserve their own course (one of them, Multilevel Regression Modelling, I teach at the ECPR Winter School in Methods). Despite the number of topics covered, the course not only allows students to master the basics of these techniques; it goes further, arming participants with the foundations needed to comprehend the related literature and to acquire an in-depth understanding of the broader issues on their own. The course aims to tear down the barrier that often stands between applied statistics textbooks and the consumers of these techniques: a barrier that exists when readers lack the appropriate foundation in the relevant areas of statistics, and an understanding of what problems these advanced techniques solve and why they are crucial for producing solid scientific work.

 

This course used to run over two weeks, but this year I have cut the number of topics and made the course more intensive. This year, with the expansion of the Methods School, we are offering entire classes on some of the omitted topics.

 

The class focuses on the following assumptions of regression models: random sampling, independence and the absence of measurement error. After the first day’s overview of the class, the assumptions of regression models are reviewed in depth on the second day. The course will cover what happens when these assumptions are violated, how to test these assumptions and, in the easy cases, how to correct your analysis to avoid violating any assumptions.

 

Tuesday’s class will revolve around the issue of heterogeneity. Regression models assume that observations (more specifically, the residuals of the observations after controls) are independent of each other. This assumption is often hard to meet: if any heterogeneity is present among observations that is not accounted for in the model, the model coefficients and significance tests will be biased. How the independent-observations assumption can be met is the topic of the class. We discuss the explicit modeling of known heterogeneity with control variables, fixed effects, random effects and the explicit development of multilevel models designed to deal with this specific issue. If time permits, I will briefly mention the modeling of unobserved heterogeneity. Mixture models can inductively derive subgroups of the observations and estimate different regression results for each subgroup, all in a way that maximizes model fit. These mixture models are not only useful for eliminating latent heterogeneity in the regression model; they also produce sub-classifications of our population with different characteristics based on our specified model, and which cases belong in which subgroups can become a research question of its own. But these models come with a set of problems that are difficult to overcome, and therefore practical use of the approach is rare and limited.

Additionally, this class will be devoted to overcoming measurement error. Measurement is often an under-appreciated part of quantitative social science, despite the fact that the problem unites the qualitative and quantitative paradigms. Poor measurements bias regression estimates by making them appear weaker and less significant than they really are. In Tuesday’s class we will consider theories of measurement and ways to assess the quality of measurements in practice.
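The attenuation caused by measurement error is easy to see in a toy simulation (sketched here in Python purely for illustration; the course itself uses R, and all numbers below are invented). With a reliability of 0.5, the estimated slope shrinks to roughly half its true value:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
beta = 1.0

x_true = rng.normal(0, 1, n)                 # latent, error-free predictor
y = beta * x_true + rng.normal(0, 1, n)

# Observe x with measurement error of the same variance as the signal,
# so reliability = var(true) / var(observed) = 0.5
x_obs = x_true + rng.normal(0, 1, n)

slope_true = np.polyfit(x_true, y, 1)[0]     # close to the true beta of 1.0
slope_obs = np.polyfit(x_obs, y, 1)[0]       # attenuated toward 0.5

print(f"slope with error-free x: {slope_true:.2f}")
print(f"slope with noisy x:      {slope_obs:.2f}")
```

The attenuated slope is approximately the true slope multiplied by the reliability of the measure, which is why poor measurement makes effects look weaker than they are.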

 

Wednesday we cover bootstrapping, with a focus on estimating confidence intervals with the technique. Bootstrapped confidence intervals are more robust to some assumption violations in regression than methods that derive confidence intervals from the standard errors. This is especially true for smaller samples, in the presence of heteroskedasticity, and when outliers are present. Bootstrapped confidence intervals are also more robust to violations of linearity and correct model specification.
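The core idea, case resampling, fits in a few lines (sketched here in Python purely for illustration, with simulated heavy-tailed data; the course itself uses R): redraw whole (x, y) pairs with replacement, refit the model each time, and read the confidence interval off the percentiles of the refitted slopes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(0, 1, n)
y = 2.0 * x + rng.standard_t(df=3, size=n)   # heavy-tailed errors

def slope(x, y):
    return np.polyfit(x, y, 1)[0]

# Case resampling: redraw (x, y) pairs with replacement, refit each time
boot = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, n)
    boot[b] = slope(x[idx], y[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% percentile CI for the slope: [{lo:.2f}, {hi:.2f}]")
```

This is the simple percentile interval; refinements such as bias-corrected intervals build on the same resampling loop.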

 

Thursday’s class will consider the use of regression weights. Weights can be incorporated into regression models for a multitude of reasons: they can be used to correct for sampling error or for survey (unit) nonresponse. The class will address the debate over whether weights are useful and whether they should be used at all. In addition to demonstrating the use of weights in regressions, the class will show how to avoid common mistakes when using regression weights. As a related topic, this class is also devoted to missing data. What do we do when our regression suffers from item missing data? The class will cover the theories of missing data that should be considered when devising a solution to the problem. In practice, the two most commonly used modern approaches to missing data correction are imputation and direct estimation using full information. The class will also demonstrate commonly used methods that are best left alone, never to be used.
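As a sketch of how design weights enter a regression, here is weighted least squares computed directly from the normal equations (Python shown purely for illustration; the strata, weights and data below are all invented — in practice your software's weighted regression routine does this for you):

```python
import numpy as np

rng = np.random.default_rng(7)

# Two strata: group B was undersampled, so its cases carry larger weights
n_a, n_b = 800, 200
x = np.concatenate([rng.normal(0, 1, n_a), rng.normal(2, 1, n_b)])
y = 1.5 * x + rng.normal(0, 1, n_a + n_b)
w = np.concatenate([np.full(n_a, 1.0), np.full(n_b, 4.0)])  # design weights

# Weighted least squares via the normal equations: (X'WX)^-1 X'Wy
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
print(f"weighted intercept and slope: {beta.round(2)}")
```

Note that weighting changes the point estimates only when the model is misspecified across strata; getting the standard errors right under weighting is a separate issue the class takes up.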

 

Finally, on Friday, we introduce modern methods designed to aid the selection of various alternative model specifications, and also discuss model averaging.
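One widely used ingredient of model averaging is the Akaike weight, which turns a set of AIC values into relative weights for the candidate models. A minimal sketch (Python for illustration; the AIC values are invented):

```python
import numpy as np

# Akaike weights for a set of candidate models (hypothetical AIC values)
aic = np.array([102.3, 100.1, 105.8])    # three candidate models
delta = aic - aic.min()                  # differences from the best model
raw = np.exp(-delta / 2)
weights = raw / raw.sum()                # Akaike weights sum to 1

print(weights.round(3))                  # the middle (best-fitting) model dominates
```

Model-averaged estimates then combine the candidate models' coefficients using these weights instead of betting everything on a single "best" specification.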

Day Topic Details
Monday Review of Regression Modeling Assumptions and Discussion of How to Overcome Possible Problems

All Classroom, No Lab

Tuesday Measurement (reliability and validity) and Heterogeneity Models (fixed effects, random effects and Multilevel Models - including a discussion of interaction models – and latent heterogeneity)

All Classroom, No Lab

Wednesday The logic of bootstrapping

All Classroom, No Lab

Thursday Weighting and Missing Data (Both Item and Unit Nonresponse)

All Classroom, No Lab

Friday Model Selection, Model Averaging and Validation

All Classroom, No Lab

Day Readings
Monday

John Fox. (1991). Regression Diagnostics. Sage Publications;

Tuesday

Robert Adcock and David Collier. (2001). “Measurement Validity: A Shared Standard for Qualitative and Quantitative Research”. The American Political Science Review, 95(3):529-546;

 

Edward G. Carmines and Richard A. Zeller. (1979). Reliability and Validity Assessment. Sage Publications;

 

Melissa A Hardy. (1993). Regression with Dummy Variables. Sage Publications;

 

Marco Steenbergen and Bradford Jones. (2002). “Modeling Multilevel Data Structures”.  American Journal of Political Science 46(1): 218-237;

Wednesday

Chapter 21 - Bootstrapping Regression Models - Fox, John. (2008). Applied Regression Analysis and Generalized Linear Models, Sage;

Thursday

Christopher Winship and Larry Radbill. (1994). “Sampling Weights and Regression Analysis”. Sociological Methods and Research, 23(2):230-257;

 

J. Bethlehem. (2002). “Weighting Nonresponse Adjustments Based on Auxiliary Information”. In Survey Nonresponse. Edited by R.M. Groves, D. Dillman, J.L. Eltinge, and R.J.A. Little. John Wiley and Sons. Pp. 275-288;

 

Joseph L. Schafer and John W. Graham. (2002). “Missing Data: Our View of the State of the Art”. Psychological Methods, 7(2):147–177;

 

John W. Graham. (2003). “Adding Missing-Data-Relevant Variables to FIML-Based Structural Equation Models”. Structural Equation Modeling, 10(1):80-100;

Friday

Chapter 22 - Model Selection, Averaging, and Validation - Fox, John. (2008). Applied Regression Analysis and Generalized Linear Models, Sage;

Software Requirements

R -- http://www.r-project.org  (Newest version - FREE)

Hardware Requirements

If you bring your laptop (not a must): 2GB RAM (4GB preferred). Even an Intel Atom CPU is OK; anything produced after 2008 will do.

Literature

Allison, Paul D. (2001). Missing Data. Newbury Park, Sage;

 

Enders, Craig K. (2010). Applied Missing Data Analysis. The Guilford Press;

 

Fox, John. (2008). Applied Regression Analysis and Generalized Linear Models, Sage;

 

Luke, Douglas. (2004). Multilevel Modeling. Sage;

 

Raudenbush, Stephen W. and Anthony S. Bryk. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods, 2nd ed. Sage;

Recommended Courses to Cover Before this One

Interpreting Binary Logistic Regression Models
Multiple Regression Analysis: Estimation, Diagnostics and Modelling
Intro to GLM: Binary, Ordered and Multinomial Logistic, and Count Regression Models
Data analysis course (introductory)

Recommended Courses to Cover After this One

Multilevel Regression Modelling
Structural Equation Modeling
Handling Missing Data
Causal Analysis
Panel Data Analysis
Introduction to Bayesian Inference


Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc). Registered participants will be informed at the time of change.

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, please contact us before registering.