IMC: Andrew Gelman, “The Statistical Crisis in Science” this Friday, 10/14 at 12:00 PM Eastern

This Friday, October 14th at noon Eastern time, the International Methods Colloquium will inaugurate its Fall 2016 series of talks with a presentation by Andrew Gelman of Columbia University. Professor Gelman’s presentation is titled “The Statistical Crisis in Science.” The presentation will draw on these two papers:

“Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors”

“Disagreements about the Strength of Evidence”

To tune in to the presentation and participate in the discussion after the talk, visit our website and click “Watch Now!” on the day of the talk. To register for the talk in advance, click here:

The IMC uses Zoom, which is free to use for listeners and works on PCs, Macs, and iOS and Android tablets and phones. You can be a part of the talk from anywhere around the world with access to the Internet. The presentation and Q&A will last for a total of one hour.

Posted in Uncategorized | 3 Comments

2016-2017 Schedule of the International Methods Colloquium

I’m pleased to announce the schedule of speakers in the International Methods Colloquium series for 2016-2017!

Andrew Gelman, Columbia University October 14th
Sarah Bouchat, University of Wisconsin October 21st
Jacob Montgomery, WUSTL October 28th
Marc Ratkovic, Princeton University November 4th
Pete Mohanty, Stanford University November 11th
Women Also Know Stuff Roundtable (Emily Beaulieu, Amber Boydstun, Yanna Krupnikov, Melissa Michelson, Christina Wolbrecht) November 18th
Gina Yannitell Reinhardt, University of Essex February 3rd
Phil Schrodt, Parus Analytics February 10th
Jane Sumner, University of Minnesota February 17th
Ines Levin, UC Irvine February 24th
Lauren Prather, UCSD March 3rd
Christopher Lucas, Harvard University March 10th
Yuki Shiraito, Princeton University March 17th

Additional information for each talk (including a title and link to relevant paper) will be released closer to its date.

The International Methods Colloquium (IMC) is a weekly seminar series of methodology-related talks and roundtable discussions focusing on political methodology; the series is supported by a grant from the National Science Foundation. The IMC is free to attend from anywhere around the world using a PC or Mac, a broadband internet connection, and our free software. You can find out more about the IMC at our website:

where you can register for any of these talks and/or join a talk in progress using the “Watch Now!” link. You can also watch archived talks from previous IMC seasons at this site.


Spring 2016 Print Edition Released!

The Spring 2016 edition of The Political Methodologist is now available!

I am also relieved to announce that my term as editor is ending, and the search for a new editorial team for The Political Methodologist has now begun! The search is being spearheaded by the current Society President, Jeff Lewis. Those who are interested in taking over editorship of TPM should contact Jeff via e-mail at The current Rice editorial team is also happy to answer any questions about editing TPM; please direct inquiries to


Shiny App: Course Workload Estimator

I recently had the opportunity to dip my toe in developing web applications using Shiny through R. I was first introduced to the idea by one of my former graduate students, Jane Sumner (now starting as an Assistant Professor in political methodology at the University of Minnesota). Jane developed a tool to analyze the gender balance of assigned readings using Shiny that later went viral (at least among the nerds I run with).

Elizabeth Barre (of Rice University’s Center for Teaching Excellence) had talked to me about creating an application that allows both teachers and students to figure out exactly how much out-of-class work is being expected of them. A web application would enable visitors to input information from a syllabus and obtain a workload estimate based on pedagogical research about reading and writing speeds. She did the research and much of the graphic design of the app; I wrote the back-end code, set up a hosting server, and chipped in a little on the user interface design.
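The arithmetic behind such an estimate is straightforward: divide the assigned reading by an assumed reading rate and add time for writing. A minimal sketch in R (the function name and all rates below are hypothetical placeholders, not the researched values the actual app uses):

```r
# Hypothetical sketch of a workload estimate; rates are made-up placeholders,
# not the researched reading/writing speeds used in the real app
estimate_weekly_hours <- function(pages_per_week,
                                  pages_per_hour = 25,
                                  writing_pages = 0,
                                  hours_per_writing_page = 1.5,
                                  weeks = 15) {
  reading_hours <- pages_per_week / pages_per_hour
  writing_hours <- (writing_pages * hours_per_writing_page) / weeks
  reading_hours + writing_hours
}

estimate_weekly_hours(250, writing_pages = 30)  # 10 reading + 3 writing = 13 hours/week
```

Even with conservative placeholder rates, the point below about how quickly a syllabus adds up becomes easy to verify.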

The result of our efforts is available at; clicking the link will open a new window containing the app and extensive details about how we calculate our workload estimates. Unfortunately, won’t let me directly embed the app on this page (although generally Shiny apps can be embedded via iframes). However, you can see what it looks like here:


The Course Workload Estimator Tool, available at

The tool is interesting on its own because it illustrates just how easy it is to expect too much of our students, even our graduate students, in a three or four credit hour course (see an extended discussion of these issues on the CTE’s blog). I found that a normal graduate student syllabus can easily contain 20+ hours of out-of-class work, a substantial burden for a student taking three courses and working on his/her own research. But it was also interesting as an experiment in using R skills for web development.

The Shiny website provides a great tutorial that can help you get started, and going through that tutorial will probably be necessary even for an experienced R programmer, just because Shiny’s architecture is a little different. But I was still surprised how low the start-up costs were for me. It’s especially easy if you’re using RStudio as an IDE and hosting your application through Shiny’s hosting service. In just a few hours, I was able to create and publish a rough-draft but working web application for testing purposes.
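To give a sense of how little code a working prototype requires, here is a minimal Shiny app: a generic sketch with a made-up reading rate, not the actual estimator.

```r
library(shiny)  # assumes the shiny package is installed

# UI: one numeric input and one text output
ui <- fluidPage(
  numericInput("pages", "Pages of reading per week:", value = 100, min = 0),
  textOutput("hours")
)

# Server: recompute the estimate whenever the input changes
server <- function(input, output) {
  output$hours <- renderText({
    # Hypothetical reading rate of 25 pages per hour
    sprintf("Estimated reading time: %.1f hours/week", input$pages / 25)
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app) starts the app locally in your browser
```

That is the entire application: a UI definition, a server function, and one call to tie them together.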

Things got a little rougher when I decided that I wanted to save on bandwidth costs by building my own Shiny server in the cloud and hosting my apps there. Two tutorials posted by veteran users (available here and here) helped me get started, but experience using Linux (and particularly Ubuntu) was helpful for me in understanding what was going on. Past experience became particularly relevant when I realized that the tutorials’ recipes for setting up /etc/nginx/sites-available/default were causing my server to break and I had to go through line by line to figure out what was wrong. (Eventually, I was able to sequentially modify the default file until I had the features I wanted). Still, within 3 or 4 hours, I had my own server set up and hosted on DigitalOcean, with a domain at pointing to the server and the course workload app running smoothly.

In summary, I highly recommend checking out Shiny as a tool for teaching statistics and publicizing your research. It’s incredibly easy to do if you’re already experienced with programming in R and are hosting your apps through

Source code for the Course Workload Estimator tool is available here.


Embrace Your Fallibility: Thoughts on Code Integrity

Two years ago, I wrote a piece about my experiences over two years testing the code for papers being published in the Quarterly Journal of Political Science, which found problems in the code of many papers. The piece was first published in The Political Methodologist, and later in PS: Political Science & Politics. This piece is an extension of that article based on conversations that article sparked and my own experiences over the past two years.

It’s natural to think that the reason we find problems in the code behind published papers is carelessness or inattention on the part of authors, and that the key to minimizing problems in our code is to be more careful. The truth, I have come to believe, is more subtle: humans are effectively incapable of writing error-free code, and if we wish to improve the quality of the code we write, we must start learning and teaching coding skills that help maximize the probability our mistakes will be found and corrected.

I myself once firmly believed the fallacy that the key to preventing errors was “to be more careful.” Indeed, I fear this belief may have colored the tone of my past work on this subject in unproductive ways. Over the last few years, however, my research has brought me into close contact with computer scientists, and I discovered that computer scientists’ mentality about programming is fundamentally different from the mental model I had been carrying around. Computer scientists assume programmers will make mistakes, and instead of chiding people to “just be careful,” they have developed a battery of practices to address the problem. These practices — often referred to as “defensive programming” — are designed to (a) minimize the probability mistakes occur and (b) maximize the probability that mistakes that do occur are caught.

If we as social scientists wish to continue adopting more and more computational techniques in our research, I feel this is a mentality we must also adopt. This will not always be easy. Defensive programming is a skill, and if it is to become a part of the discipline, it will require effort on the part of researchers to learn, implement, and, most importantly, teach these skills to the next generation. But I think this is necessary to ensure the integrity of our work.

With that in mind, I would like to advocate for two changes to our approach to the computational component of social science.

First, I think we must adopt a number of practices from defensive programming in our own code. This piece lays out a few simple practices that I think are most applicable and practical for social scientists, both for individuals and co-authors working collaboratively. They aren’t meant as complete tutorials, but rather as illustrations of the type of practices I think should be promoted.

Second, I think we need to begin teaching these practices to students. Too often, students are either expected to figure out how to program on their own during their econometrics classes, or they are offered short, graduate-student-led workshops to introduce basic skills. Coding is now too central to our discipline to be given this second-tier status in our curriculum. If we are going to expect our students to engage in computational research, it is our obligation to equip them with the tools they need to stay out of danger.

Together, I think these two changes will improve the integrity of our research as coding becomes ever more central to our discipline. Will they preclude errors completely? Unlikely — even when perfectly employed, “defensive programming” is not fool-proof, and there will always be problems that these tools will not catch. But at least with these tools we can start to minimize the likelihood of errors, especially large ones.

This piece is organized into five sections. Section 1 presents an overview of specific defensive programming practices we can all implement in our own code. Section 2 then lays out some examples of how “defensive programming” principles can guide workflow in collaborative projects. Finally, after introducing these concrete skills, I offer a few reflections on the implications of the “defensive programming” paradigm for third-party review of code by academic journals in Section 3, and for how the discipline responds to errors in Section 4. Section 5 concludes with a short list of other resources, to which additional suggestions are welcome!

1. Defensive Programming Practices

Defensive Programming Practice 1: Adding Tests

If we could only adopt one practice to improve the quality of our code, my vote would be for the addition of tests.

Tests are simple true-false statements users place in their code. A test checks for a certain condition (like whether the sample size in a regression is what you expect), and if the condition is not met, stops your code and alerts you to the problem.

Right now, many users may say “Yeah, I always check that kind of stuff by hand when I’m writing my code. Why do I need to add tests?”

The answer is four-fold:

  1.  Tests are executed every time your code is run. Most of us check things the first time we write a piece of code. But days, weeks, or months later, we may come back, modify code that occurs earlier in our code stream, and then just re-run the code. Without tests, if those changes lead to problems in later files, we don’t know about them. With tests in place, those early changes will trigger an error in the later files, and you can track down the problem.
  2. It gets you in the habit of always checking. Most of us only stop to check aspects of our data when we suspect problems. But if you become accustomed to writing a handful of tests at the bottom of every file — or after every execution of a certain operation (I try to always include them after a merge) — you get into the habit of always stopping to think about what your data should look like.
  3. Catch your problems faster. This is less about code integrity than sanity, but a great upside to tests is that they ensure that if a mistake slips into your code, you become aware of it quickly, making it easier to identify and fix the changes that caused the problem.
  4. Tests catch more than anticipated problems. When problems emerge in code, they often manifest in lots of different ways. Duplicate observations, for example, will not only lead to inaccurate observation counts, but may also give rise to bizarre summary statistics, bad subsequent merges, etc. Thus adding tests not only guards against errors we’ve thought of, but may also guard against errors we don’t anticipate during the test-writing process.

Writing Tests

Tests are easy to write in any language. In Stata, for example, tests can be performed using the assert statement. For example, to test whether your data set has 100 observations or that a variable meant to hold percentages has reasonable values, you could write:

* Test if data has 100 observations
* (count must be run first so that r(N) is populated)
count
assert `r(N)' == 100

* Test variable percent_employed has reasonable values
assert percent_employed > 0 & percent_employed < 100

Similarly in R, one could do the same tests on a data.frame df using:

# Test if data has 100 observations
stopifnot(nrow(df) == 100)

# Test variable has reasonable values
stopifnot(df$percent_employed > 0 & df$percent_employed < 100)
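In the same spirit, a test placed right after a merge (as suggested above) can check that no duplicate observations were introduced. A small self-contained sketch, where `id` is a hypothetical unique identifier and the data are toy values:

```r
# Toy data: merge two data frames on a (hypothetical) unique identifier
df <- data.frame(id = 1:3, x = c(10, 20, 30))
codes <- data.frame(id = 1:3, region = c("A", "A", "B"))
merged <- merge(df, codes, by = "id")

# Tests: ids remain unique, and no rows were gained or lost in the merge
stopifnot(!any(duplicated(merged$id)))
stopifnot(nrow(merged) == nrow(df))
```

If the `codes` table ever picked up a duplicated `id`, these two lines would halt the script at the merge rather than letting the problem propagate into later analyses.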


Defensive Programming Practice 2: Never Transcribe

We’ve already covered tricks to maximize the probability we catch our mistakes, but how do we minimize the probability they will occur?

If there is anything we learned at the QJPS, it is that authors should never transcribe numbers from their statistical software into their papers by hand. This was easily the largest source of replication issues we encountered, as doing so introduced two types of errors:

  • Mis-transcriptions: Humans just aren’t built to transcribe dozens of numbers by hand reliably. If the error is in the last decimal place, it doesn’t mean much, but when a decimal point drifts or a negative sign is dropped, the results are often quite substantively important.
  • Failures to Update: We are constantly updating our code, and authors who hand transcribe their results often update their code and forget to update all of their results, leaving old results in their paper.

How do you avoid this problem? For LaTeX users, I strongly suggest tools that export .tex files that can be pulled directly into LaTeX documents. I also suggest users do this not only for tables — which is increasingly common — but also for statistics that appear in the text. In your code, generate the number you want to cite, convert it to a string, and save it as a .tex file (e.g. exported_statistic.tex). Then in your paper, simply add an \input{exported_statistic.tex} call, and LaTeX will insert the contents of that .tex file verbatim into your paper.
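A sketch of what this workflow might look like in R (the model and statistic here are illustrative; exported_statistic.tex is the file name from the example above):

```r
# Fit an illustrative model and export one coefficient for LaTeX's \input{}
fit <- lm(mpg ~ wt, data = mtcars)

# Format the statistic as a string with the precision used in the paper
coef_wt <- sprintf("%.2f", coef(fit)["wt"])

# Write it to a one-line .tex file; \input{exported_statistic.tex} pulls it in
writeLines(coef_wt, "exported_statistic.tex")
```

Re-running the script after any change to the data or model automatically refreshes the number the paper cites, so the text can never drift out of sync with the analysis.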

Directly integrating output is somewhat harder to do if you work in Word, but it is still feasible. For example, most packages that generate .tex files for LaTeX also have options to export .txt or .rtf files that you can easily use in Word. write.table() in R or esttab in Stata, for example, will both create output of this type. The resulting tables can either be (a) copied whole-cloth into Word by hand (minimizing the risk of mis-transcriptions that may occur when typing individual values), or (b) connected to your Word document using Word’s Link to Existing File feature, which ensures the Word doc loads the most recent version of the table every time it is opened. Some great tips for combining R with Word can be found here.
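As a sketch of the write.table() route in R (the numbers and file name are purely illustrative):

```r
# Illustrative results table (values are made up for the example)
results <- data.frame(
  term      = c("(Intercept)", "wt"),
  estimate  = c(37.29, -5.34),
  std_error = c(1.88, 0.56)
)

# Export as tab-delimited text that Word can import or link to
write.table(results, "results_table.txt", sep = "\t",
            row.names = FALSE, quote = FALSE)
```

The resulting results_table.txt can then be pasted whole into Word or attached via the Link to Existing File feature described above.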

Defensive Programming Practice 3: Style Matters

Formatting isn’t just about aesthetics; it also makes it easier to read your code and thus recognize potential problems. Here are a few tips:

  • Use informative variable names. Don’t call something var212 if you can call it unemployment_percentage. Informative names require more typing, but they make your code so much easier to read. Moreover, including units in your variables names (percentage, km, etc.) can also help avoid confusion.
  • Comment! Comments help in two ways. First, and most obviously, they make it easy to figure out what’s going on when you come back to code days, weeks, or months after it was originally written. And second, it forces you to think about what you’re doing in substantive terms (“This section calculates the share of people within each occupation who have college degrees”) rather than just in programming logic, which can help you catch substantive problems with code that may run without problems but will not actually generate the quantity of interest.
  • Use indentation. Indentation is a way of visually representing the logical structure of code — use it to your advantage!
  • Let your code breathe. In general, you should put a space between every operator in your code, and feel free to use empty lines. Space makes your code more readable, as illustrated in the following examples:
    # Good
    average <- mean(feet / 12 + inches, na.rm = TRUE)

    # Bad
    average<-mean(feet/12+inches,na.rm=TRUE)
A full style guide for R can be found here, and a Stata style guide can be found here.

Defensive Programming Practice 4: Don’t Duplicate Information

Tricks to minimize the probability of errors often require a little more sophisticated programming, so they won’t be for everyone (tests, I feel, are more accessible to everyone). Nevertheless, here’s another valuable practice: Never replicate information.

Information should only be expressed once in a file. For example, say you want to drop observations if the value of any of a set of variables falls below a common cutoff (just assume this is something you want to do — the specific operation is not important). In Stata, for example, you could do this by:

drop if var1 < 110
drop if var2 < 110
drop if var3 < 110

And indeed, this would work. But suppose you decided to change that cutoff from 110 to 100. The way this is written, you’ve opened yourself up to the possibility that in trying to change these cutoffs, you may change two of these but forget the third (something especially likely if the uses of the cutoff aren’t all in exactly the same place in your code).

A better way of expressing this that avoids this possibility is:

local cutoff = 110
drop if var1 < `cutoff'
drop if var2 < `cutoff'
drop if var3 < `cutoff'

Written like this, if you ever decide to go back and change the common cutoff, you only have to make one change, and there’s no way to make the change in some cases but forget others.
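The same principle carries over to R: define the cutoff once and reference it everywhere (toy data and variable names for illustration):

```r
# Define the cutoff once; every use below refers back to this single line
cutoff <- 110

df <- data.frame(var1 = c(100, 120, 130),
                 var2 = c(115, 125, 140),
                 var3 = c(150, 160, 190))

# Drop observations where any variable falls below the cutoff;
# changing the threshold now requires editing exactly one line
df <- subset(df, var1 >= cutoff & var2 >= cutoff & var3 >= cutoff)
```

As in the Stata version, updating the threshold later means changing the single `cutoff` assignment, with no risk of missing a stray hard-coded value.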

2. Collaboration

Until now, the focus of this piece has been on individual coding practices that minimize the risk of errors. But as social science becomes increasingly collaborative, we also need to think about how to avoid errors in collaborative projects.

In my experience, the way most social scientists collaborate on code (myself included, historically) is to place their code in a shared folder (like Dropbox or Box) and have co-authors work on the same files. There are a number of problems with this strategy, however:

  • Participants can never be certain about the changes other authors have made. Changes may be obvious when an author adds a new file or a large block of code, but if one participant makes a small change in an existing file, the other authors are unlikely to notice. If the other authors then write their code assuming the prior code is still in place, problems can easily emerge.
  • There is no clear mechanism for review built into the workflow. Edits occur silently, and immediately become part of the files used in a project.

I am aware of three strategies for avoiding these types of problems.

The first and most conservative solution to this is full replication, where each author conducts the full analysis independently and authors then compare results. If results match, authors can feel confident there are no problems in their code. But this strategy requires a massive duplication of effort — offsetting many of the benefits of co-authorship — and requires both authors be able to conduct the entire analysis, which is not always the case.

The second strategy is compartmentalization, in which each author is assigned responsibility for coding specific parts of the analysis. Author A, for example, may be responsible for importing, cleaning, and formatting data from an outside source while Author B is responsible for subsequent analysis. In this system, if Author B finds she needs an additional variable for the analysis, she asks Author A to modify Author A’s code rather than making the modification herself. This ensures responsibility for each block of code is clearly delimited, and changes are unlikely to sneak into an author’s code without their knowledge. In addition, authors can also then review one another’s code prior to project finalization.[1][2]

The final strategy is to use version control, which is by far the most robust solution and the one most used by computer scientists, but also the one that requires the most upfront investment in learning a new skill.

“Version control” is the name for a class of software specifically designed to manage collaboration on code (several tools exist, but git is by far the most well known and the only one I would recommend). Version control does several things. First, as the name implies, it keeps track of every version of your code that has ever existed and makes it easy to go back to old versions. Services like Dropbox nominally provide this too, but it is much easier to review old versions and identify differences between old and new versions in git than through a service like Dropbox, whose version-history interface is sufficiently cumbersome that most of us never use it unless we accidentally delete an important file.

What really makes version control exceptional is that it makes it easy to (a) keep track of what differs between any two versions, and (b) “propose” changes to code in a way that other authors can easily review before those changes are fully integrated. If Author A wants to modify code in version control, she first creates a “branch” — a kind of working copy of the project. She then makes her changes on that branch and proposes that the branch be re-integrated into the main code. Version control is then able to present this proposed change in a very clear way, highlighting every change the new branch would make to the code base to ensure no changes — no matter how small — go unnoticed. The author who made the proposed changes can then ask a co-author to review them before they are integrated into the code base. To illustrate, Figure 1 shows an example of what a simple proposed change to code looks like on GitHub, a popular site for managing git projects online.

Figure 1: git Pull Request on GitHub

This figure shows an example of a small proposed change to the code for a project on GitHub. Several aspects of the interface are worth noting. First, the interface displays all changes and the lines just above and below the changes across all documents in the project. This ensures no changes are overlooked. (Authors can click to “unfold” the code around a change if they need more context.) Second, the interface shows the prior contents of the project (on the left) and new content (on the right). In the upper pane, content has been changed, so old content is shown in red and new content in green. In the lower pane, new content has just been added, so simple grey space is shown on the left. Third, authors can easily comment (and discuss) individual lines of code, as shown here.

Version control is an incredible tool, but it must be noted that it is not very user friendly. For those interested in making the jump, the tool to learn is git, and you can find a terrific set of tutorials from Atlassian here, a nice (free, online) book on git here, and a very nice, longer discussion of git for political scientists on The Political Methodologist here.

In addition, there are also two projects that attempt to smooth out the rough edges of the git user-interface. Github Desktop, for example, offers a Graphical User Interface and streamlines how git works. Similarly, git-legit mimics the changes Github Desktop has made to how git works, but in the form of a command-line interface. These services are fully compatible with normal git, but learning one of these versions has the downside of not learning the industry-standard git interface. For researchers who don’t plan to engage in contributing to open-source software or get a job in industry, however, that’s probably not a huge loss.

3. Third-Party Code Review by Journals

As the discipline increasingly embraces in-house review of code prior to publication, one might wonder whether defensive programming is still necessary. I am a strong advocate of Third-Party Review, but I think it is important to understand its limitations.

First, journals that conduct systematic review of replication code like QJPS and more recently the AJPS can only conduct the most basic of reviews. At the QJPS, in-house review only consists of ensuring the code is well-documented, that it runs without errors, and that the output it generates matches the results in the paper being published. Journals simply do not have the resources to check code line-by-line for correctness.

Second, even if Third-Party Review protects the integrity of the discipline, it does nothing to protect individual researchers. Appropriately or not, we expect researchers’ code to be error-free, and when errors are found, the career implications can be tremendous. Indeed, it is for this reason that I think we have an obligation to teach defensive programming skills to our students.

Finally, even detailed Third-Party Review is not fool-proof. Indeed, the reason writing tests has become popular in computer science is a recognition of the fact that people aren’t built to stare at code and think about all possible issues that might arise. Even in computer science, Third-Party Review of code focuses on whether code passes comprehensive suites of tests.

4. Responding to Errors

For all the reasons detailed here, I think it makes sense for the discipline to think more carefully about how we respond to errors discovered in the code underlying published papers.

The status quo in the discipline is, I think most people would agree, to assume that most code both is and should be error-free. When errors are discovered, therefore, the result is often severe professional sanction.

But is this appropriate? The people who work most with code (computer scientists) have long ago moved away from the expectation that code can be error-free. At the same time, however, we cannot simply say “errors are ok.” The middle road, I believe, lies in recognizing that not all errors are the same, and that we must tailor our responses to the nature of the coding error. Errors caused by gross negligence are obviously unacceptable, but I feel we should be more understanding of authors who write careful code but nevertheless make mistakes.

To be more specific, I think that as a discipline we should try to coordinate on a set of coding practices we deem appropriate. If an error is then uncovered in the work of someone who has followed these practices — adding tests, commenting their code, using good naming conventions, not duplicating information, etc. — we should recognize that they took their programming seriously and that to err is human, even in programming. Errors should not be ignored, but in these settings I feel it is more appropriate to respond to them in the same way we respond to an error in logical argumentation, rather than as an indicator of sloppiness or carelessness.

To be clear, this is not to say we should penalize authors who do not follow these practices. Someday — when these practices are being consistently taught to students — I think it will be reasonable to respond to errors differently depending on whether the author was employing error-minimizing precautions. But the onus is on us — the instructors and advisors of rising scholars — to ensure students are appropriately armed with these tools if we wish to later hold them responsible for programming mistakes. To do otherwise is simply unfair.

But what we can do today is agree to be especially understanding of scholars who work hard to ensure the integrity of their code who nevertheless make mistakes, now and in the future. This, I feel, is not only normatively appropriate, but also creates a positive incentive for the adoption of good programming practices.

5. Other Resources

This document includes just a handful of practices that I think can be of use to social scientists. I’m sure there are many more I am unaware of, and I encourage readers who are aware of useful practices to send them my way.[3] Here are some starting references:


Thank you to Adriane Fresh, Simon Ejdemyr, Darin Christensen, Dorothy Kronick, Julia Payson, David Hausman, and Justin Esarey for their comments and contributions to this piece!

1. Note that the separation of responsibility does not need to be as crude as “cleaning” and “analysis” — this strategy simply requires that a single person has clear and sole responsibility for every line of code in the project.

2. Another intermediate strategy — which can be combined with compartmentalization — is to maintain a change log where authors record the date, files, and line-numbers of any changes they make. This eliminates the problem of edits going unnoticed. However, it is worth noting that this strategy only works if both authors are sufficiently diligent. If either (a) the author making changes fails to log all changes or does not describe them well, or (b) the reviewing author fails to go back into the code to check all the changes reported in the change log, the system may still fail.

3. Users who google “defensive programming” will find many resources, but be aware many may not seem immediately applicable. Most defensive programming resources are written for computer scientists who are interested in writing applications to be distributed to users. Thus much of what is written is about how coders should “never trust the user to do what you expect.” There’s a clear analogy to “never assume your data looks like what you expect,” but nevertheless mapping the lessons in those documents to data analysis applications can be tricky.


Book Review: Political Analysis Using R

James E. Monogan, III. 2015. Political Analysis Using R. Springer.

There are a lot of books about R. A partial list on The R Project’s website includes 157 titles as of May 2016, and even that list has some glaring omissions, such as Thomas Lumley’s Complex Surveys and Hadley Wickham’s (in press) R for Data Science. Jamie Monogan gives us a new addition to this long list in the form of Political Analysis Using R. Even in a crowded field, Monogan’s text – hereafter PAUR – is a welcome addition and one that will fit nicely into a political science course on quantitative methods.


PAUR offers 11 chapters beginning with a novice, illustrated introduction to R and ending with a relatively high level discussion of R programming. Each chapter contains clearly highlighted example code, reproductions of R console output, numerous tables and figures, and a set of practice problems to test knowledge of content covered in the chapter.

Chapter 1 offers the very basics: how to download and install the software, and how to install add-on packages and use the R graphical user interfaces across Windows, Mac OS, and Linux. Chapter 2 covers data import (and export), along with basic manipulations, merging, and recoding.

Chapter 3 introduces R’s base graphics functionality, covering histograms, bar charts, boxplots, scatterplots, and line graphs. It then offers a quick overview of lattice graphics to implement these and other visualizations. A notable absence from the chapter (and one noted by the author) is the increasingly popular ggplot2 package. The choice to rely exclusively on base graphics walks a difficult line, favoring the underlying strengths and limitations of the core library over those of add-on packages. Instructors using PAUR but wishing to teach ggplot2 will have to look elsewhere for relevant coverage of this material, perhaps to Wickham’s ggplot2 book, now in its second edition. Chapter 4 covers familiar territory of descriptive statistics, including central tendency and dispersion. I appreciated the way PAUR covered these topics, presenting formulae, code, and graphical depictions of distributions close to one another.

As is a consistent theme throughout the text, PAUR presents practical R implementations of statistical problems as part of larger substantive discussion of real political science examples. Indeed, one of PAUR‘s key strengths for a political science audience is its reliance on a familiar set of datasets from real political science applications. Leveraging and role modelling good open science practices, PAUR provides a Dataverse with complete data and code for all examples, which are in turn drawn from publicly available data and code used in published research articles. This should make it extremely easy for instructors to use PAUR in a quantitative methods sequence, by closely linking formal coverage of techniques, the substantive application of those techniques in political science articles, and implementation of those techniques in R. PAUR means there is little excuse to continue to use iris to teach scatterplots or mtcars to teach linear regression.

Chapter 5 offers basic statistical hypothesis testing, as well as other techniques of bivariate association (e.g., cross-tabulation). This chapter uses the gmodels package to provide cross-tabulations, which is a somewhat unfortunate reminder of R’s weaknesses in basic cross-tabulation, but a good decision from the perspective of teaching tabulation to those new to statistics or coming to the language from Stata or SPSS. This chapter probably could have taken a different route and used R to teach the logic of statistical significance (e.g., through simulations), but instead focuses mainly on how to implement specific procedures (t-test, correlation coefficient, etc.).

Chapter 6 marks a rapid acceleration in the breadth and density of content offered by PAUR. While the first 5 chapters provide a first course in statistical analysis in R, the second half of the book quickly addresses a large number of approaches that may or may not fit in that setting. Chapter 6 covers OLS and Chapter 7 covers logit and probit models, ordinal outcome models, and count models. (By comparison, John Verzani’s (2005) Using R for Introductory Statistics ends with half of a chapter on logistic regression; John Fox and Sanford Weisberg’s (2011) An R Companion to Applied Regression covers GLMs over two chapters as a final advanced topic.)

This transition from an elementary textbook on statistics to a sophisticated introduction to the most commonly used methods in political science is both a strength and a challenge for PAUR. On the one hand, it greatly expands the usefulness of the text beyond an undergraduate companion text to something that could reasonably fit in a masters-level or even PhD methods sequence. On the other, it means that some instructors may find it difficult to cover all of the topics in the text during a 15-week semester (and certainly not in a 10-week quarter). That said, the text covers many of the topics that were addressed in the “grab bag” first-year methods course I took in graduate school and would have been an immensely helpful companion as I first trudged through linear algebra, maximum likelihood estimation, and time series in R.

To highlight some of the content covered here, Chapter 6 addresses linear regression and does a good job of leveraging add-on packages to introduce model output (with xtable), model diagnostics (with lmtest and car), and heteroskedasticity (with sandwich). Chapter 7 turns to generalized linear models using examples from the Comparative Study of Electoral Systems.

Chapter 8 is a real gem. Here Monogan made the right choice to enlist an army of excellent packages to teach advanced topics not commonly covered in competing textbooks: lme4 to teach mixed-effects or multilevel models, along with some of political scientists’ contributions to R in the form of MCMCpack to teach Bayesian regression, cem to showcase matching methods, and wnominate to teach roll call analysis. These are topics and packages that would be unusual to see in other introductions to R or other statistical texts, which clearly shows Monogan’s intention in PAUR to provide a textbook for up-to-date political analysis.

Chapter 9 covers time series methods. I am always a bit ambivalent about teaching these in a general course, but the chapter presents the methods clearly, so the key aspects are there for those who want to include them. Chapters 10 and 11 serve as a high-level capstone with coverage of matrix operations, basic programming (functions, loops, conditional expressions, etc.), optimization, and simulation. Again, as with everything in the latter third of the book, these elements make PAUR stand out among competitors as a text that is particularly appropriate for teaching methods of quantitative political science as it is currently practiced.

Overall Evaluation

PAUR is neither a reference manual nor a book about R as a programming language. It is, as its title clearly states, a guidebook to practicing quantitative political science. It is the kind of text that will make it easier to teach postgraduate students how to use R, and it will provide a relevant companion text to an intermediate or advanced course in quantitative methods taught at other levels.

I suspect political scientists coming to R from Stata would also find the text particularly attractive, given its coverage of nearly all statistical techniques in wide use in the discipline today and its reliance on familiar disciplinary examples. It rightly does not attempt to rival, say, Cameron and Trivedi’s (2010) Microeconometrics Using Stata in scope, but instead focuses on more cutting-edge techniques at the expense of minutiae about older methods.

I applaud Monogan for Political Analysis Using R, for the ambition to provide a broadly relevant and useful new text on R, and for showcasing the value added of data sharing and reproducible research as a model of learning and teaching quantitative research. And I only dock him a few points for leaving out ggplot2. Well done.

Posted in Uncategorized | Leave a comment

Visualize Dynamic Simulations of Autoregressive Relationships in R

by: Christopher Gandrud, Laron K. Williams, and Guy D. Whitten

Two recent trends in the social sciences have drastically improved the interpretation of statistical models. The first trend is researchers providing substantively meaningful quantities of interest when interpreting models, rather than putting the burden on the reader to interpret tables of coefficients (King et al., 2000). The second trend is a movement to more completely interpret and present the inferences available from one’s model. This is seen most obviously in the case of time-series models with an autoregressive series, where the effects of an explanatory variable have both short- and long-term components. A more complete interpretation of these models therefore requires additional work, ranging from the presentation of long-term multipliers (De Boef and Keele, 2008) to dynamic simulations (Williams and Whitten, 2012).

These two trends can be combined to allow scholars to easily depict the long-term implications of estimates of dynamic processes through simulations. Dynamic simulations can be employed to depict long-run simulations of dynamic processes for a variety of substantively interesting scenarios, with and without the presence of exogenous shocks. We introduce dynsim (Gandrud et al., 2016), which makes it easy for researchers to implement this approach in R.

Dynamic simulations

Assume that we estimate the following partial adjustment model: Yt = α0 + α1Yt−1 + β0Xt + εt, where Yt is a continuous variable, Xt is an explanatory variable, and εt is a random error term. The short-term effect of Xt on Yt is simple: β0. This is the inference that social science scholars most often make, and unfortunately, the only one that they usually make (Williams and Whitten, 2012). However, since the model incorporates a lagged dependent variable (Yt−1), a one-unit change in Xt also has a long-term effect on Yt by influencing the value of Yt−1 in future periods. The appropriate way of calculating the long-term effect is with the long-term multiplier, κ1 = β0 / (1 − α1). We can then use the long-term multiplier to calculate the total effect that Xt has on Yt distributed over future time periods. Of course, the long-term multiplier will be larger as β0 or α1 grows larger.
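As a numeric illustration — the coefficient values here are made up, not estimates from any model — the long-term multiplier and the period-by-period effects can be computed directly:

```r
# Suppose beta0 = 2 (short-term effect) and alpha1 = 0.5 (LDV coefficient)
beta0  <- 2
alpha1 <- 0.5

# Long-term multiplier: kappa1 = beta0 / (1 - alpha1)
kappa1 <- beta0 / (1 - alpha1)
kappa1  # 4: the total effect of a one-unit change in X, spread over time

# The effect of the change at each future period decays geometrically
periods <- 0:10
effects <- beta0 * alpha1^periods
sum(effects)  # approaches kappa1 as more periods are included
```

The cumulative sum of the per-period effects converges to the long-term multiplier, which is why κ1 summarizes the total distributed effect.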

We can use graphical depictions to most effectively communicate the dynamic properties of autoregressive time series across multiple time periods. The intuition is simple. For a given scenario of values of the explanatory variables, calculate the predicted value at time t. At each subsequent iteration of the simulation, the predicted value from the previous iteration replaces the value of Yt−1 to calculate the new predicted value. Inferences such as long-term multipliers and dynamic simulations are based on estimated coefficients that are themselves uncertain. It is therefore very important to also present these inferences with the necessary measures of uncertainty (such as confidence intervals).
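The iteration logic can be sketched in a few lines of base R. This shows point predictions only — dynsim itself also propagates coefficient uncertainty — and the coefficients and starting value are invented for illustration:

```r
# Hypothetical coefficients for Yt = a0 + a1*Y(t-1) + b0*X
a0 <- 1; a1 <- 0.5; b0 <- 2
x  <- 3            # scenario value of X, held fixed across iterations
y  <- numeric(20)
y_prev <- 10       # starting value of the lagged dependent variable
for (t in 1:20) {
  y[t]   <- a0 + a1 * y_prev + b0 * x   # predicted value at iteration t
  y_prev <- y[t]                        # feeds into the next iteration
}
# y converges toward the long-run equilibrium (a0 + b0*x) / (1 - a1)
```

With these values the series converges to (1 + 2·3) / (1 − 0.5) = 14, illustrating how the lagged dependent variable distributes the effect of X over future periods.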

Dynamic simulations offer a number of inferences that one cannot make by simply examining the coefficients. First, one can determine whether or not the confidence interval for one scenario overlaps across time, suggesting whether or not there are significant changes over time. Second, one can determine whether the confidence intervals of different scenarios overlap at any given time period, indicating whether the scenarios produce statistically different predicted values. Finally, if one includes exogenous shocks, then one can determine the size of the effect of the exogenous shock as well as how quickly the series then returns to its pre-shock state. These are all invaluable inferences for testing one’s theoretical expectations.

dynsim process and syntax

Use the following four step process to simulate and graph autoregressive relationships with dynsim:

  1. Estimate the linear model using the core R function lm.
  2. Set up starting values for simulation scenarios and (optionally) shock values at particular iterations (e.g. points in simulated time).
  3. Simulate these scenarios based on the estimated model using the dynsim function.
  4. Plot the simulation results with the dynsimGG function.

Before looking at examples of this process in action, let’s look at the dynsim and dynsimGG syntax.

The dynsim function has seven arguments. The first–obj–is used to specify the model object. The lagged dependent variable is identified with the ldv argument. The object containing the starting values for the simulation scenarios is identified with scen; it sets the values of the variables in the model at ‘time’ n = 0. n allows you to specify the number of simulation iterations, which are equivalent to simulated ‘time periods’. To specify the level of statistical significance for the confidence intervals, use the sig argument. By default it is set at 0.95 for 95 percent confidence intervals. Note that dynsim currently depicts uncertainty in the systematic component of the model, rather than forecast uncertainty. The practical implication of this is that the confidence intervals will not grow over the simulation iterations. The number of simulations drawn for each point in time–i.e. each value of n–is adjusted with the num argument. By default 1,000 simulations are drawn. Adjusting the number of simulations allows you to change the processing time: there is a trade-off between the amount of time it takes to draw the simulations and the resulting information you have about the simulations’ probability distributions (King et al., 2000, 349). Finally, the object containing the shock values is identified with the shocks argument.

Objects for use with scen can be either a list of data frames–each data frame containing starting values for a different scenario–or a single data frame where each row contains the starting values for a different scenario. In both cases, the data frames have as many columns as there are independent variables in the estimated model, and each column should be given a name that matches the name of a variable in the estimation model. If you have entered an interaction using * then you only need to specify starting values for the base variables, not the interaction term. The simulated values for the interaction will be found automatically.

shocks objects are data frames whose first column, called times, contains the iteration number (as in n) at which a shock should occur. Note that each shock must be at a unique time that cannot exceed n. The following columns are named after the shock variable(s), as they are labeled in the model, and their values are the shock variables’ values at each shock time. You can include as many shock variables as there are variables in the estimation model. Again, only values for the base variables, not the interaction terms, need to be specified.

Once the simulations have been run you will have a dynsim class object. Because dynsim objects are also data frames you can plot them with any available method in R. They contain at least seven columns:

  • scenNumber: The scenario number.
  • time: The time points.
  • shock. ...: Optional columns containing the values of the shock variables at each point in time.
  • ldvMean: Mean of the simulation distribution.
  • ldvLower: Lower bound of the simulation distribution’s central interval set with sig.
  • ldvUpper: Upper bound of the simulation distribution’s central interval set with sig.
  • ldvLower50: Lower bound of the simulation distribution’s central 50 percent interval.
  • ldvUpper50: Upper bound of the simulation distribution’s central 50 percent interval.
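Because dynsim objects are plain data frames, these columns can be handled with any R graphics tool. A minimal base-graphics sketch, using a mock data frame with the same column layout (the values here are invented, not real simulation output):

```r
# Mock data frame mimicking the dynsim column layout (illustrative values)
Sim_mock <- data.frame(
  scenNumber = 1,
  time       = 1:5,
  ldvMean    = c(10.0, 11.0, 11.5, 11.8, 11.9),
  ldvLower   = c(9.0, 10.0, 10.4, 10.7, 10.8),
  ldvUpper   = c(11.0, 12.0, 12.6, 12.9, 13.0)
)

# Plot the simulation mean with dashed lines for the central interval bounds
plot(Sim_mock$time, Sim_mock$ldvMean, type = "l",
     ylim = range(Sim_mock$ldvLower, Sim_mock$ldvUpper),
     xlab = "Simulation iteration", ylab = "Predicted value")
lines(Sim_mock$time, Sim_mock$ldvLower, lty = 2)
lines(Sim_mock$time, Sim_mock$ldvUpper, lty = 2)
```

In practice, however, the dynsimGG function described next handles all of this automatically.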

The dynsimGG function is the most convenient plotting approach. This function draws on ggplot2 (Wickham and Chang, 2015) to plot the simulation distributions across time. The distribution means are represented with a line. The range of the central 50 percent interval is represented with a dark ribbon. The range of the interval defined by the sig argument in dynsim, e.g. 95%, is represented with a lighter ribbon.

The primary dynsimGG argument is obj. Use this to specify the output object from dynsim that you would like to plot. The remaining arguments control the plot’s aesthetics. For instance, the size of the central line can be set with the lsize argument and the level of opacity for the lightest ribbon with the alpha argument. Please see the ggplot2 documentation for more details on these arguments. You can change the color of the ribbons and central line with the color argument. If only one scenario is plotted then you can manually set the color using a variety of formats, including hexadecimal color codes. If more than one scenario is plotted, then select a color palette from those available in the RColorBrewer package (Neuwirth, 2014). The plot’s title, y-axis and x-axis labels can be set with the title, ylab, and xlab arguments, respectively.

There are three arguments that allow you to adjust the look of the scenario legend. leg.name allows you to choose the legend’s name and leg.labels lets you change the scenario labels. The latter must be a character vector with new labels in the order of the scenarios in the scen object. legend allows you to hide the legend entirely: simply set legend = FALSE.

Finally, if you included shocks in your simulations you can use the shockplot.var argument to specify one shock variable’s fitted values to include in a plot underneath the main plot. Use the shockplot.ylab argument to change the y-axis label.

The output from dynsimGG is generally a ggplot2 gg class object. Because of this you can further change the aesthetic qualities of the plot using any relevant function from ggplot2 using the + operator. You can also convert them to interactive graphics with ggplotly from the plotly (Sievert et al., 2016) package.


The following examples demonstrate how dynsim works. They use the Grunfeld (1958) data set. It is included with dynsim. To load the data use:

data(grunfeld, package = "dynsim")

The linear regression model we will estimate is:

Iit = α + β1Iit−1 + β2Fit + β3Cit + εit,

where Iit is real gross investment for firm i in year t. Iit−1 is the firm’s investment in the previous year. Fit is the real value of the firm and Cit is the real value of the capital stock.


In the grunfeld data set, real gross investment is denoted invest, the firm’s market value is mvalue, and the capital stock is kstock. There are 10 large US manufacturers from 1935-1954 in the data set (Baltagi, 2001). The variable identifying the individual companies is called company. We can easily create the investment one-year lag within each company group using the slide function from the DataCombine package (Gandrud, 2016). Here is the code:


grunfeld <- slide(grunfeld, Var = "invest", GroupVar = "company",
                  TimeVar = "year", NewVar = "InvestLag")

The new lagged variable is called InvestLag. The reason we use slide rather than R’s core lag function is that the latter is unable to lag a grouped variable. You could of course use any other appropriate function to create the lags.
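For example, one base-R alternative uses ave, which applies a function within groups. The sketch below works on a toy panel rather than the grunfeld data, but the same idea applies there (assuming the data are sorted by year within each company):

```r
# Toy panel: two companies observed over three years
panel <- data.frame(company = rep(1:2, each = 3),
                    year    = rep(1935:1937, 2),
                    invest  = c(5, 6, 7, 10, 12, 14))

# Sort by year within company, then lag invest within each company:
# each group's values are shifted down one period, with NA in the first year
panel <- panel[order(panel$company, panel$year), ]
panel$InvestLag <- ave(panel$invest, panel$company,
                       FUN = function(x) c(NA, head(x, -1)))
panel$InvestLag  # NA 5 6 NA 10 12
```

Note that the lag restarts within each company, so the first year of each firm is correctly NA rather than borrowing the previous firm's last value.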


Dynamic simulation without shocks

Now that we have created our lagged dependent variable, we can begin to create dynamic simulations with dynsim by estimating the underlying linear regression model using lm, i.e.:

M1 <- lm(invest ~ InvestLag + mvalue + kstock, data = grunfeld)

The resulting model object–M1–is used in the dynsim function to run the dynamic simulations. We first create a list object containing data frames with starting values for each simulation scenario. Imagine we want to run three contrasting scenarios with the following fitted values:

  • Scenario 1: mean lagged investment, market value and capital stock held at their 95th percentiles,
  • Scenario 2: all variables held at their means,
  • Scenario 3: mean lagged investment, market value and capital stock held at their 5th percentiles.

We can create a list object for the scen argument containing each of these scenarios with the following code:

Scen1 <- data.frame(InvestLag = mean(grunfeld$InvestLag, na.rm = TRUE),
                    mvalue = quantile(grunfeld$mvalue, 0.95),
                    kstock = quantile(grunfeld$kstock, 0.95))
Scen2 <- data.frame(InvestLag = mean(grunfeld$InvestLag, na.rm = TRUE),
                    mvalue = mean(grunfeld$mvalue),
                    kstock = mean(grunfeld$kstock))
Scen3 <- data.frame(InvestLag = mean(grunfeld$InvestLag, na.rm = TRUE),
                    mvalue = quantile(grunfeld$mvalue, 0.05),
                    kstock = quantile(grunfeld$kstock, 0.05))

ScenComb <- list(Scen1, Scen2, Scen3)

To run the simulations without shocks use:


Sim1 <- dynsim(obj = M1, ldv = "InvestLag", scen = ScenComb, n = 20)

Dynamic simulation with shocks

Now we include fitted shock values. In particular, we will examine how a company with capital stock in the 5th percentile is predicted to change its gross investment when its market value experiences shocks compared to a company with capital stock in the 95th percentile. We will use market values for the first company in the grunfeld data set over the first 15 years as the shock values. To create the shock data use the following code:

# Keep only the mvalue for the first company for the first 15 years
grunfeldsub <- subset(grunfeld, company == 1)
grunfeldshock <- grunfeldsub[1:15, "mvalue"]
# Create data frame for the shock argument
grunfeldshock <- data.frame(times = 1:15, mvalue = grunfeldshock)

Now add grunfeldshock to the dynsim shocks argument.

Sim2 <- dynsim(obj = M1, ldv = "InvestLag", scen = ScenComb, n = 15,
                shocks = grunfeldshock)

Interactions between the shock variable and another exogenous variable can also be simulated. To include, for example, an interaction between the firm’s market value (the shock variable) and the capital stock (another exogenous variable), we need to re-estimate the model like so:

M2 <- lm(invest ~ InvestLag + mvalue*kstock, data = grunfeld)

We then use dynsim as before. The only change is that we use the fitted model object M2 that includes the interaction.

Sim3 <- dynsim(obj = M2, ldv = "InvestLag", scen = ScenComb, n = 15,
                shocks = grunfeldshock)

Plotting simulations

The easiest and most effective way to communicate dynsim simulation results is with the package’s built-in plotting capabilities, e.g.:

dynsimGG(Sim1)

We can make a number of aesthetic changes. The following code adds custom legend labels, uses the ‘orange-red’ color palette–denoted by OrRd–and relabels the y-axis to create the following figure.

Labels <- c("95th Percentile", "Mean", "5th Percentile")

dynsimGG(Sim1, leg.name = "Scenarios", leg.labels = Labels,
          color = "OrRd",
          ylab = "Predicted Real Gross Investment\n")


When plotting simulations with shock values another plot can be included underneath the main plot showing one shock variable’s fitted values. To do this use the shockplot.var argument to specify which variable to plot. Use the shockplot.ylab argument to change the y-axis label. For example, the following code creates the following figure:

dynsimGG(Sim2, leg.name = "Scenarios", leg.labels = Labels,
               color = "OrRd",
               ylab = "Predicted Real Gross Investment\n", 
               shockplot.var = "mvalue",
               shockplot.ylab = "Firm Value")



We have demonstrated how the R package dynsim makes it easy to implement Williams and Whitten’s (2012) approach to more completely interpreting results from autoregressive time-series models where the effects of explanatory variables have both short- and long-term components. Hopefully, this will lead to more meaningful investigations and more useful presentations of results estimated from these relationships.


Baltagi, B. (2001). Econometric Analysis of Panel Data. Wiley and Sons, Chichester, UK.

De Boef, S. and Keele, L. (2008). Taking time seriously. American Journal of Political Science, 52(1):184–200.

Gandrud, C. (2016). DataCombine: Tools for Easily Combining and Cleaning Data Sets. R package version 0.2.21.

Gandrud, C., Williams, L. K., and Whitten, G. D. (2016). dynsim: Dynamic Simulations of Autoregressive Relationships. R package version 1.2.2.

Grunfeld, Y. (1958). The Determinants of Corporate Investment. PhD thesis, University of Chicago.

King, G., Tomz, M., and Wittenberg, J. (2000). Making the Most of Statistical Analyses: Improving Interpretation and Presentation. American Journal of Political Science, 44(2):347–361.

Neuwirth, E. (2014). RColorBrewer: ColorBrewer palettes. R package version 1.1-2.

Sievert, C., Parmer, C., Hocking, T., Chamberlain, S., Ram, K., Corvellec, M., and Despouy, P. (2016). plotly: Create Interactive Web Graphics via ‘plotly.js’. R package version 3.4.13.

StataCorp. (2015). Stata statistical software: Release 14.

Wickham, H. and Chang, W. (2015). ggplot2: An implementation of the Grammar of Graphics. R package version 1.0.1.

Williams, L. K. and Whitten, G. D. (2011). Dynamic Simulations of Autoregressive Relationships. The Stata Journal, 11(4):577–588.

Williams, L. K. and Whitten, G. D. (2012). But Wait, There’s More! Maximizing Substantive Inferences from TSCS Models. Journal of Politics, 74(03):685–693.

Posted in Software, Statistics, Uncategorized | 1 Comment