NAS Workshop: Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results

On February 26-27, the National Research Council of the National Academies is hosting a workshop of the Committee on Applied and Theoretical Statistics titled “Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results.” The workshop will include presentations and roundtable discussions by statisticians and quantitative researchers from a variety of disciplines. I’ll also be making a short presentation on behalf of The Political Methodologist in light of our recent series on research replicability. You can view the full agenda here.

Perhaps most notably, you can attend the workshop from the comfort of your own computer!

Register by Tuesday, February 24 to attend in person or online.

Posted in Call for Papers / Conference, Statistics, The Discipline | Leave a comment

Introducing the Annual TPM Most Viewed Post Award (and our 2014 winner)

2014 was the first full year of The Political Methodologist’s on-line blog presence. As we take stock of the year that was, I thought it would be informative for readers and potential submitters (not to mention, ahem, editors) to take a look at how the blog site is doing.

I also thought it important for TPM to formally recognize its most popular contributors each year with an award, as a way of expressing the Editorial Staff’s gratitude toward and respect for authors who help to support our Society through practical, high-quality scholarship–even in the newsletter. I am therefore inaugurating the annual TPM Most Viewed Post award for the article published in the previous calendar year that generated the most page views. Because TPM’s blog began at the end of 2013, I’m including all 2013 articles in this year’s award. (I’ll also include all articles published in December for the next year’s award on an ongoing basis, as it’s a bit of a raw deal to be published on December 31st and have just one day to generate page views to be counted toward an award.) This award entitles the recipient to a line on his or her curriculum vitae, and one (1) high-five from the incumbent TPM editor (to be collected at the next annual meeting of the Society for Political Methodology).[1]

First, let’s take a look at the site overall. Here are TPM’s viewing statistics for 2014:


Pretty good! The site had 43,924 page views and 29,562 unique viewers in 2014. I don’t think we’re in any danger of putting Science out of business, but I think that’s an excellent readership base for the newsletter of the Society for Political Methodology. I look forward to bettering these numbers in the year to come, with the help of new and interesting contributed posts!

Now, let’s look at the top ten most-viewed posts in 2014. These statistics count page views in calendar year 2014 only, but all-time viewing statistics leave the same two posts in the same order at the top of the rankings.


Thomas Leeper’s post on making high-resolution graphics is, by a factor of nearly three, the most viewed article in TPM for 2014. You’ll note that it’s viewed even more often than the home page; this is likely a reflection of the fact that visitors reach the page by Googling for help making high-resolution graphics and then finding Thomas’s article.

So, by the power vested in me by WordPress, I award Dr. Thomas J. Leeper The Political Methodologist Most Viewed Post Award for 2014. Congratulations!

The runner-up is Chris Achen’s post on diversity in the Society for Political Methodology. I’m pleased to see this topic attracting so much positive attention from our readers, and I hope that this attention foreshadows improved diversity in our ranks in the future.

Last, an editorial message: Royce Carroll is departing as Associate Editor, leaving Rick Wilson and Randy Stevenson as the Associate Editors and me (Justin Esarey) as Editor. Thanks to Royce for his assistance as The Political Methodologist launched its blog!


[1] A low-five may be substituted upon request.

Posted in Editorial Message, The Discipline | Tagged | Leave a comment

One Norm, Two Standards: Realizing Transparency in Qualitative Political Science

Editor’s Note: this post is contributed by Andrew Moravcsik, Professor of Politics and Public Policy and Director of the European Union Program at Princeton University. This posting was updated at 11:00 PM CDT 1/3/2015 at the request of the author.

Quantitative and qualitative political scientists are currently working closely together, and with their counterparts in other disciplines, to render social science research more transparent. The Data Access and Research Transparency (DA-RT) initiative launched by the American Political Science Association (APSA) has resulted in new professional responsibility norms mandating transparency. To realize this goal, APSA co-hosted a workshop several months ago assembling thirteen editors of top political science journals of all types, including American Political Science Review, International Security, PS, American Journal of Political Science, and Comparative Political Studies. The editors issued a joint public statement committing their journals to implement new qualitative and quantitative transparency standards by January 2016. It is already gaining new adherents from journals outside the original group.[i] A number of other institutions, including the Qualitative Data Repository at the Institute for Qualitative and Multi-Method Research (IQMR), the National Science Foundation (NSF), the Social Science Research Council (SSRC), and the Berkeley Initiative for Transparency in the Social Sciences (BITSS), are mounting efforts in this area as well.[ii] Social science transparency is an idea whose time has come.

This paper addresses one perceived constraint on this trend: the lack of broad understanding (outside of those closely involved) about exactly what transparency implies for qualitative political science. Quantitative analysts have more self-awareness in this domain. They generally understand enhanced transparency to mean “more of the same,” that is, wider adoption of rules (already in place at some journals as a precondition for review or publication) requiring that data be placed in a database at a trusted third-party repository with a digitally automated analytical protocol.

Yet what is the equivalent default transparency scheme for qualitative researchers, most of whom conduct process-tracing analyses of case studies? Finding an effective means to promote qualitative transparency is likely to have a deeper impact on political science than improving quantitative transparency—or, indeed, any other immediate methodological reform. This is true for three reasons. First, the majority of researchers in political science—over 90% in a sub-discipline like International Relations—employ qualitative methods.[iii] Second, little has been done so far to promote qualitative transparency, so any change is likely to have a greater marginal impact. Third, for any changes at the disciplinary level to succeed, the active involvement of non-quantitative scholars will be required.

The answer may seem straightforward. Qualitative transparency ought to be enhanced by familiar means: either by generalizing conventional citation currently employed by qualitative analysts, or by introducing the same centralized data archiving employed by quantitative analysts. Yet scholars have recently come to understand—for reasons this essay will discuss in detail—that for most qualitative political science, neither of these solutions offers a practical means to enhance transparency. More innovative use of digital technology is required. A consensus is forming that the most workable and effective default transparency standard for qualitative political science is Active Citation: a system of digitally-enabled citations linked to annotated excerpts from original sources.

To explain how scholars reached these conclusions, this essay proceeds in four sections. The first section presents a consensual definition of research transparency and explains why almost all political scientists accept it. The second sets forth essential criteria for judging how this transparency norm should be implemented, realized and applied within specific research communities. The third explains why the best-known approaches—conventional citation, hyperlinks and digital archiving—fail to meet these criteria for most qualitative political science today. The fourth section concludes by describing Active Citation in detail and explaining why it best fulfills the essential criteria of an applied standard.

  1. What is Research Transparency?

“Research transparency” designates a disciplinary norm whereby scholars publicize the process by which they generate empirical data and analysis.[iv] It obliges scholars to present the evidence, analysis and methodological choices they use to reach research conclusions about the social world—in plain English, to “show their work.” Recent scholarship has refined this definition by isolating three dimensions of research transparency.[v] The first, data transparency, obliges social scientists to publicize the evidence on which their research rests. The second dimension, analytic transparency, obliges social scientists to publicize how they measure, code, interpret, and analyze that data. The third dimension, process transparency, obliges social scientists to publicize the broader set of research design choices that gave rise to the particular combination of data, theories, and methods they employ.

Unlike almost any other methodological ideal, transparency unifies rather than divides social scientists across the full range of disciplines, epistemologies, methods, theories and substantive interests. Transparency enjoys this consensual status because it constitutes social science as a legitimate collective activity. To publicize data, theory and methods is to fulfill a basic ethical responsibility to act toward other scholars with openness and honesty. Underlying this responsibility is the realization that scholars are humans whose research choices are inevitably subject, sometimes unconsciously, to arbitrary interests, commitments and biases. To be part of a common scholarly conversation requires, therefore, that one admit fallibility and pay respect to readers and potential interlocutors by opening the choices one makes to meaningful public discussion and debate.[vi] The Ethics Committee of the American Political Science Association (APSA)—which represents a very broad range of methods and interests—recently recognized this by expanding the professional responsibilities political scientists share to include data access and research transparency.[vii]

Social scientific transparency is essential for a more pragmatic reason as well, namely that the legitimacy and credibility of academic findings (inside and outside of academia) rests in large part on the belief that they result from well-informed scholarly discussion and debate.[viii] The recent discovery of a large number of non-replicable positive results in the natural and social sciences has called this legitimacy and credibility into question. Transparency offers one check on this tendency by inviting scholars to investigate, replicate, critique, extend and reuse the data, analysis and methods that their colleagues have published. When new work appears, other scholars are inspired to accept, reject or improve its findings. Citizens, private organizations, sponsoring bodies, and public decision makers can evaluate and apply the results, feeding back their experiences to researchers as new data and questions. Scholars are trained to contribute to the advancement of this collective enterprise and are recognized and rewarded for doing so well. They challenge, extend, or borrow from prior data, analysis and methods to move in innovative directions, thereby renewing the flow of research and starting anew. Transparency not only permits this cycle of research to take place, but displays it publicly, enhancing its credibility and legitimacy—thereby ultimately justifying society’s investment in it.

  2. Principled and Pragmatic Criteria for Implementing Qualitative Transparency

Almost all social scientists recognize transparency as an abstract good. Yet political science is divided into diverse research communities with different methods, theories and substantive interests. It would be inappropriate for all to seek to realize transparency by applying the same concrete rules. Different styles of research require different procedures.[ix] To understand how contemporary qualitative political scientists can best achieve transparency, we must understand two aspects of qualitative political science. One is the basic “process-tracing” epistemology most qualitative political scientists employ. The other is the set of everyday practical constraints they face.

The “Process-Tracing” Epistemology of Most Qualitative Political Science

Most qualitative political science researchers employ some form of “process-tracing” to investigate a few cases intensively. Comparing across cases (let alone analyzing a large number of qualitative observations in a database) plays a secondary role. Process-tracing analysts focus primarily within single cases, citing individual pieces of textual and factual evidence to infer whether initial conditions and subsequent events actually took place, and to establish the existence of (and relative importance of) descriptive or causal processes that may link cause and effect. Evidence appears as a set of heterogeneous insights or pieces of data about a causal mechanism and its context (“causal process observations”) linked to specific successive points in a narrative or causal sequence. This inferential structure differs from that found in most quantitative political science, where data is typically analyzed as “part of a larger, systematized array of observations” (“dataset observations”) describing cross-case variation.[x]

There are many reasons—ranging from scholarly, normative or policy interest in a particular case to unit heterogeneity or uniqueness—why a scholar might legitimately wish to focus on internal validity in this way rather than average outcomes across a larger population.[xi] One methodological advantage of the process-tracing mode of qualitative analysis is that researchers can exercise a unique degree of interpretive subtlety in assigning an appropriate role, weight, and meaning to each piece of causal process evidence, depending on its position in the underlying temporal narrative or causal sequence and on its basic reliability in context. For example, they may view a particular fact as a necessary condition (a “hoop test”), a sufficient condition (a “smoking gun” test), INUS (an insufficient but necessary part of a condition which is itself unnecessary but sufficient for the result), SUIN (a sufficient but unnecessary part of a factor that is insufficient but necessary for an outcome), a probabilistic link (a “straw in the wind”), or another type of condition.[xii] Within each category, a test may be viewed as more or less probative, based on the nature of the test, the particulars of the text, the reliability of the evidence, the broader political, social and cultural context, and the Bayesian priors in the literature. This nuanced but rigorous, discursively contextualized, source-by-source interpretation, which is highly prized in fields such as history and law, contrasts with the norm of imposing uniform rules that weight all data in a general database equally or randomly, which is more generally found in quantitative political science.

Pragmatic Constraints on Process-Tracing Research

Perfect research transparency seems attractive in theory. Yet five practical considerations limit the feasibility and desirability of any effort to maximize the evidentiary, analytic and procedural information that any scholar, qualitative or quantitative, reveals.

  • Intellectual property law imposes limits on the secondary use of published material: in most countries a modest quotation can be employed for non-profit or scholarly purposes, but entire published documents cannot be reproduced at will. Even sources formally in the public domain (such as archival documents) are often subject to informal restrictions, and any scholar who disseminated them wholesale might find it difficult to research a related topic again in the future.
  • Confidentiality for human subject protection comes into play when scholars agree (either informally or as part of a formal research design or Institutional Review Board-approved arrangement) to keep information confidential. Some such information cannot be cited at all; some has to be sanitized.
  • An unreasonable logistical burden may arise if scholars are obliged to deposit, sanitize or reproduce massive quantities of data in qualitative form, particularly if they did not expect to do so when collecting the data.
  • Scholars should expect to enjoy a reasonable first-use right to exploit newly collected data and analysis before other scholars can access it. This helps incentivize and fairly reward scholars, particularly younger ones, who collect and analyze data in new ways.
  • Publication procedures and formats are often costly to change, especially in the short-term.[xiii]

What Should We Look for in an Applied Transparency Standard?

For qualitative process-tracing to be intelligible, credible and legitimate, data and analysis must be transparent and the major research choices must be justified. The unique interpretive flexibility qualitative scholars enjoy only serves to increase transparency requirements. Any appropriate and workable standard of qualitative transparency must be suited to this type of research. First, it must preserve the basic narrative “process-tracing” structure of presentation: scholars should provide readers with the data and analytical interpretation of each piece of evidence in context, and a methodological justification for their selection. Second, readers must be able to move efficiently, in real time, from a point in the main narrative directly to the source and its analysis, and back again, a function traditionally carried out by footnotes and endnotes. Third, analytic transparency provisions must permit scholars to explain the interpretive choices they have made with regard to each piece of evidence. All this must take place within the real-world constraints set by intellectual property law, human subject protection, logistics, first-use rights and existing publication formats.

  3. Can Existing Transparency Instruments Enhance Qualitative Research Transparency?

This section examines three relatively familiar options for enhancing social scientific transparency: conventional citations, hyperlinks to external web sources and data archiving. Though all are useful in specific circumstances, none offers a workable default standard for qualitative political scientists given the constraints discussed above.

Conventional Citation

The transparency standard employed today in political science, conventional citation, is manifestly unable to provide even minimal qualitative research transparency.[xiv] To be sure, the potential exists. Legal academics and historians employ “best practices,” such as ubiquitous discursive footnotes containing long, annotated quotations and a disciplinary discourse in which other scholars often challenge textual claims. Law reviews permit readers to scan at a glance the main argument, quotations from sources and the basis of the interpretation. Historical journals note the exact location, nature and interpretation of a document, often backed by quotations.

Yet in political science qualitative transparency has been rendered all but impossible by recent trends in the formatting of journals and books: notably, ever tighter word limits and so-called scientific citations designed to accommodate references only to secondary literature rather than data. Political science has regressed: decades ago, discursive long-form citations permitted political scientists to achieve greater research transparency than they can today. This trend is unlikely to reverse. The results have been profound, extending far beyond format. Political science has seen a decay in the humanistic culture of appreciation, debate, reuse, extension and increasing depth of text-based empirical understanding, and an atrophy of the skills needed to engage in such debates: linguistic, functional, regional area, policy and historical knowledge, skills that remain prized and commonplace in legal academia, history and other disciplines. Only in small pockets of regional area, historical or policy analysis linked by common linguistic, functional or historical knowledge do qualitative transparency and depth of debate persist. This is a major reason for the crisis in qualitative political science today.

Two Digital Alternatives: External Hyperlinks and Data Archiving

If fundamental reform of conventional citation practice is unlikely, then effective enhancement of research transparency will require intensive application of digital technology. In recent years, political scientists have considered the digital alternatives. The two possibilities mentioned most often—external hyperlinks and data archiving—prove on close inspection to be too narrowly and selectively applicable to serve as broad default instruments for enhancing qualitative transparency.

The first digital option is to hyperlink to external web-based sources. Hyperlinks are now ubiquitous in journalism, policy analysis, social media, government reports and law, as well as scholarly disciplines (such as medicine) in which researchers mainly reference other scholars, not data. A few areas of political science are analogous. Yet most sources political scientists cite—including the bulk of archival materials, other primary documents, pamphlets and informally published material, books published in the past 70 years, raw interview transcripts, ethnographic field materials, photographs, diagrams and drawings—are unavailable electronically. Even when sources exist on-line, external hyperlinks offer only data access, not analytic or process transparency. We learn what a scholar cited but not why or how. In any case, as we shall see below, other transparency instruments subsume the advantages of hyperlinks.

The second digital option is to archive data in a single unified database. Data archiving has obvious advantages: it is consistent with common practice in quantitative analysis; builds a procedural bulwark against selection bias (cherry-picking) of favorable quotations and documents; and, with some further technological innovation, might even permit specific textual passages to be linked conveniently to individual citations in a paper. Certainly data archiving is essential to preserve and publicize complete collections of new and unique field data, such as original interviews, ethnographic notes, primary document collections, field research material, and web scrapes—an important responsibility for social scientists. Data archiving is also the best means to present qualitative data that is analyzed as a single set of “dataset observations,” as in some content, ethnographic or survey analyses. Databases such as Atlas, Access, Filemaker, and Endnote now offer promising and innovative ways to store, array and manipulate textual data, particularly useful in research designs that emphasize macro-comparative inquiry, systematic coding, content analysis, or the weighing of a large number of sources to estimate relatively few, carefully predefined variables—often in “mixed-method” research designs.[xv] These important considerations have recently led political scientists to launch data repositories for textual material, notably the Qualitative Data Repository established with NSF funding at Syracuse University.[xvi]

Yet, however useful it may be for these specific purposes, data archiving is not an efficient format for enhancing general qualitative research transparency. This is because it copes poorly with the modal qualitative researcher, who employs process tracing to analyze causal process observations. The reasons are numerous. First, intellectual property law imposes narrow de jure and de facto limits on reproducing textual sources, including almost all secondary material published in the past seventy years and a surprising portion of archival government documents, private primary material, web text, commercial material and artistic and visual products. Second, many sources (even if legally in the public domain) cannot be publicized due to confidentiality agreements and human subject protection enforced by Institutional Review Boards (IRBs). These include many interview transcripts, ethnographic observations, and other unpublished material. Even when partial publication is possible, the cost and risk of sanitizing the material appropriately can be prohibitively high, if it is possible at all.[xvii] Third, the logistical burden of data archiving imposes additional limits. In theory, addressing “cherry picking” by archiving all qualitative data a scholar examines seems attractive, yet it is almost always impractical. Serious primary source analysts sift through literally orders of magnitude more source material than they peruse intensively, and peruse orders of magnitude more source material than is important enough to cite. Almost all the discarded material is uninteresting. Sanitizing, depositing and reading tens or hundreds of times more material than is relevant would create prohibitive logistical burdens for scholar and reader alike. Finally, even if all or some relevant data is archived efficiently, it does little to enhance analytic transparency. As with hyperlinking, we learn what a scholar cited but not why.
Overall, these limitations mean that data archiving may be a first-best transparency approach for qualitative evidence that is arrayed and analyzed in the form of dataset observations, but it is sub-optimal for the modal process-tracing form of analysis.

  4. A Practical yet Effective Standard: Active Citation

Conventional citation, hyperlinks and data archiving, we have seen, have inherent weaknesses as general transparency standards. A new and innovative approach is required. Scholars are converging on the view that the most promising and practical default standard for enhancing qualitative transparency is Active Citation (AC): a system of digitally-enabled citations linked to annotated excerpts from original sources.

In the AC format, any citation to a contestable empirical claim is hyperlinked to an entry in an appendix attached to the scholarly work (the “Transparency Appendix,” or TRAX).[xviii] Each TRAX entry contains four elements, the first three required and the last optional:

  1. a short excerpt from the source (presumptively 50-100 words long);
  2. an annotation explaining how the source supports the underlying claim in the main text (of a length at the author’s discretion);
  3. the full citation;
  4. optionally, a scan of or link to the full source.

The first entry of the TRAX is reserved exceptionally for information on general issues of process transparency. Other TRAX entries can easily be adapted to presenting sources in visual, audio, cinematic, graphic, and other media. AC can be employed in almost any form of scholarly work: unpublished papers, manuscripts submitted for publication, online journal articles, and e-books, or as separate online appendices to printed journals and books. Examples of AC can be found on various demonstration websites and in journal articles.

AC is an innovative yet pragmatic standard. It significantly enhances research transparency while respecting and managing legal, administrative, logistical and commercial constraints. The relatively short length of the source excerpt assures basic data transparency, including some interpretive context, while avoiding most legal, human subject, logistical, and first-use constraints.[xix] (If such constraints remain, they trump the requirement to publicize data, but scholars may employ an intermediate solution, such as providing a brief summary of the data.) The optional scan or link offers the possibility of referencing a complete source, when feasible—thereby subsuming the primary advantage of hyperlinking and data archiving. The annotation delivers basic analytic transparency by offering an opportunity for researchers to explain how the source supports the main text—but the length remains at the discretion of the author. The exceptional first entry enhances process transparency by providing an open-ended way of addressing research design, selection bias, and other general methodological concerns that remain insufficiently explained in the main text, footnotes, empirical entries, or other active citations. The full citation assures that each TRAX entry is entirely self-contained, which facilitates convenient downloading into word-processing, bibliographical and database software. This contributes over time to the creation of a type of networked database of annotated quotations, again subsuming an advantage of data archiving at lower logistical cost. Finally, AC achieves these goals while only minimally disrupting current publishing practices. Except for the hyperlinks, existing formats and word limits remain unchanged (the TRAX, like all appendices, lies outside word limits). Journals would surely elect to include active footnotes only in electronic formats. Editors need not enforce ex post archiving requirements.

AC is a carefully crafted compromise reflecting years of refinement based on numerous methodological articles, workshops, discussions with editors, interdisciplinary meetings, consultations with potential funders, instructional sessions with graduate students, and, perhaps most importantly, consultation sessions with researchers of all methodological types. Underneath the digital technology and unfamiliar format lies an essentially conservative and cautious proposal. Interpretivists and traditional historians will note that it recreates discursive citation practices they have long employed in a digital form. Quantitative researchers will note that it adheres to the shared norm of placing data and analysis in a single third-party “database,” though it interprets that term in a very different manner than conventional statistical researchers do.

For these reasons, AC has emerged as the most credible candidate to be the default transparency standard for qualitative papers, articles and, eventually, books in political science. It is rapidly sinking disciplinary roots. AC has been elaborated and taught in published articles, at disciplinary and interdisciplinary conferences, at training institutes and at graduate seminars. The NSF-funded Qualitative Data Repository has commissioned ten “active citation compilations” by leading scholars of international relations and comparative politics, who are retrofitting classic and forthcoming articles or chapters to the active citation format. Software developers are creating tools to assist in preparing TRAXs, in particular via software add-ons to automate the formatting of transparency appendices and individual entries in popular word processing programs. The new APSA ethics and professional responsibility rules recommend AC, as do, most importantly, the APSA-sponsored common statement in which 15 journals (so far) jointly committed to move to enhanced transparency in January 2016.[xx]

  5. Conclusion: A Transparent Future

At first glance, transparency may appear to be an unwelcome burden on qualitative researchers. Yet in many respects it should be seen instead as a rare and valuable opportunity. The cost of implementing active citation is relatively low, particularly when one knows in advance that it is expected.[xxi] After all, legal scholars and many historians—not to mention political scientists in generations past—have long done something similar as a matter of course, and without the assistance of word processors, hand-held cameras and the web, which ease archival and textual preservation and recognition. More importantly, greater transparency offers political scientists large individual and collective benefits. It provides an opportunity to demonstrate clearly, and to be rewarded for, scholarly excellence. This in turn is likely to incentivize scholars to invest in relevant skills, which include interpretive subtlety in reading texts, process-tracing and case selection techniques, and deep linguistic, area studies, historical, functional and policy knowledge. These skills will be in greater demand not just to conduct research, but to referee and debate it. Perhaps most important, the networked pool of transparent data, analysis and methods in active citations will constitute an ever-expanding public good for researchers, who can appreciate, critique, extend and reuse that data and analysis, just as quantitative scholars make low-cost use of existing datasets and analytical techniques.

The trend toward more open research is not just desirable; it is inevitable. In almost every walk of life—from science and government to shopping and dating—the wave of digital transparency has proven irresistible. The question facing qualitative political scientists is no longer whether they will be swept along in this trend. The question is when and how. Our responsibility is to fashion norms that acknowledge and respect the epistemological priorities and practices qualitative political scientists share.




[iii] Daniel Maliniak, Susan Peterson, and Michael J. Tierney, TRIP around the World: Teaching, Research and Policy Views of International Relations Faculty in 20 Countries (Williamsburg, Virginia: Institute for the Theory and Practice of International Relations, College of William and Mary, 2012), Charts 28–30, 42, 57.

[iv] Gary King, “Replication, Replication,” PS: Political Science and Politics (September 1995), p. 444.

[v] This tripartite distinction revises that found in Arthur Lupia and Colin Elman, “Openness in Political Science: Data Access and Research Transparency,” PS: Political Science & Politics (January 2014): 19–42, appendices A and B.

[vi] Brian A. Nosek, Jeffrey R. Spies, and Matt Motyl, “Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth over Publishability,” Perspectives on Psychological Science (2012) 7(6): 615–631.

[vii] APSA, Guidelines for Qualitative Transparency (2013), see fn. 6.

[viii] For a more complete discussion of this topic and others in this essay, see Andrew Moravcsik, “Trust, yet Verify: The Transparency Revolution and Qualitative International Relations,” Security Studies 23:4 (December 2014) (2014b), pp. 663-688; “Transparency: The Revolution in Qualitative Political Science,” PS: Political Science & Politics 47, no. 1 (January 2014) (2014a): 48–53; “Active Citation and Qualitative Political Science,” Qualitative & Multi-Method Research 10, no. 1 (Spring 2012); Andrew Moravcsik, “Active Citation: A Precondition for Replicable Qualitative Research,” PS: Political Science & Politics 43, no. 1 (January 2010): 29–35.

[ix] Lupia and Elman, fn 6.

[x] This distinction was originally introduced by Henry Brady, David Collier and Justin Seawright, subsequently adopted by James Mahoney, Gary Goertz and many others, and has now become canonical. The causal process mode of inferring causality, in Brady and Collier’s words, “contributes a different kind of leverage in causal inference…A causal-process observation may [give] insight into causal mechanisms, insight that is essential to causal assessment and is an indispensable alternative and/or supplement to correlation-based causal inference.” Henry Brady and David Collier, Rethinking Social Inquiry: Diverse Tools, Shared Standards (Lanham, MD: Rowman & Littlefield, 2010), pp. 252–53.

[xi] See Moravcsik 2014b for a discussion.

[xii] Stephen Van Evera, Guide to Methods for Students of Political Science (Ithaca, NY: Cornell University Press, 1997), 32; James Mahoney, “The Logic of Process-Tracing Tests in the Social Sciences,” Sociological Methods & Research 41, no. 4 (2012): 570–597; David Collier, “Understanding Process Tracing,” PS: Political Science & Politics 44, no. 4 (2011): 823–30; Peter A. Hall, “Systematic Process Analysis: When and How to Use It,” European Management Review 3 (2006): 24–31.

[xiii] For previous discussions of these factors, see Moravcsik 2010, 2012, 2014a, 2014b.

[xiv] See Moravcsik 2010 for a lengthier discussion of this point.

[xv] Evan S. Lieberman, “Bridging the Qualitative-Quantitative Divide: Best Practices in the Development of Historically Oriented Replication Databases,” Annual Review of Political Science 13 (2010): 37–59.

[xvi] For link, see fn. 3.

[xvii] It is striking, for example, that even quantitative political science journals that pride themselves on a rigorous standard of “replicability” often apply that standard only to quantitative data, and resolutely refuse to publicize the qualitative data that is coded to create it—precisely for these reasons.

[xviii] For a longer discussion of AC, with examples, see Moravcsik 2014b.

[xix] In the US, fifty to one hundred words for non-profit scholarly use lies within the customary “fair use” exception, except for artistic products. Most other countries have similar legal practices. This length is also relatively easy to scan, logistically feasible to sanitize for human subject protection, and poses less of a threat to “first use” exploitation rights of new data.

[xx] For a link, see fn 3.

[xxi] For a detailed argument and evidence that the costs are low, see Moravcsik 2012 and 2014b. Even those who note that retrofitting articles to the standard is time-consuming acknowledge that, with advance knowledge, the workload diminishes considerably. See Elizabeth Saunders, “Transparency without Tears: A Pragmatic Approach to Transparent Security Studies Research,” Security Studies 23(4) (December 2014), pp. 689-698.

Posted in Uncategorized | 1 Comment

Improving Research Transparency in Political Science: Replication and Political Analysis

Editor’s note: this post is contributed by R. Michael Alvarez, Professor of Political Science at the California Institute of Technology and Co-Editor of Political Analysis.

Recently, the American Political Science Association (APSA) launched an initiative to improve research ethics. One important outcome of this initiative is a joint statement that a number of our discipline’s major journals have signed: the Data Access and Research Transparency statement. These journals include Political Analysis, the American Political Science Review, the American Journal of Political Science, and at present a handful of other journals.

The joint statement outlines four important goals for these journals:

  1. Require that authors provide their datasets at the time of publication in a trusted digital repository.
  2. Require that authors provide analytic procedures so that the results in their publication can be replicated.
  3. Develop and implement a data citation policy.
  4. Make sure that journal guidelines and other materials delineate these requirements.

We are happy to report that Political Analysis has complied with these requirements, and that in the case of our policies and procedures on research replication, our approach has provided an important example of how replication can be implemented by a major journal. Our compliance with these requirements is visible in the new instructions for authors and reviewers that we recently issued.

Political Analysis has long had a policy that all papers published in our journal need to provide replication data. Recently we have taken steps to strengthen our replication policy and to position our journal as a leader on this issue. The first step in this process was to require that, before we send an accepted manuscript to production, the authors provide the materials necessary to replicate the results reported in the manuscript. The second step was to develop a relatively simple mechanism for archiving this replication material: the Political Analysis Dataverse. We now have over 200 replication studies in our Dataverse, and these materials have been downloaded over 15,000 times.

Exactly how does this work? Typically, a manuscript successfully exits the review process, and the editors conditionally accept the manuscript for publication. One of the conditions, of course, is that the authors upload their replication materials to the journal’s Dataverse, and that they insert a citation to those materials in the final version of their manuscript. Once the materials are in the journal’s Dataverse, and the final version of the manuscript has been returned to our editorial office, both receive final review. As far as the replication materials go, that review usually involves:

  1. An examination of the documentation provided with the replication materials.
  2. Basic review of the provided code and other analytic materials.
  3. A basic audit of the data provided with the replication materials.

The good news is that in most cases, replication materials pass this review quickly: authors know our replication requirement, and most seem to have worked replication into their research workflow.

Despite what many may think (especially given the concerns that are frequently expressed by other journal editors when they hear about our replication policy), authors do not complain about our replication policy. We’ve not had a single instance where an author has refused or balked at complying with our policy. Instead, the vast majority of our authors upload their replication materials quickly after we request them, which indicates that they have them ready to go and that they build the expectation of replication into their research workflow.

The problems that we encounter generally revolve around adequate documentation for the replication materials, clarity and usability of code, and issues with the replication data itself. We are working to develop better guidelines and policies on all of these issues, but here are some initial thoughts.

First, on documentation. Authors who are developing replication materials should strive to make their materials as usable as possible for other researchers. As many authors already know, providing well-documented replication materials increases the likelihood that another scholar will download those materials and use them in their own research, which will likely generate a citation for the replication materials and the original article they come from. Or a colleague at another university will use well-documented replication materials in their methods class, which will get the materials and the original article in front of many students. Perhaps a graduate student will download the materials and use them as the foundation for their dissertation work, again generating citations for the materials and the original article. The message is clear: well-documented replication materials are more likely to be used, and the more they are used the more attention the original research will receive.

Second, clarity and usability of code. For quantitative research in social science, code (be it R, Stata, SPSS, Python, Perl or something else) is the engine that drives the analysis. Writing code that is easy to read and use is critical for the research process. Writing good code is something that we need to focus more attention on in our research methodology curriculum, and as a profession we need more guidelines regarding good coding practices. This is an issue that we at Political Analysis will be working on in the near future, trying to develop guidelines and standards for good coding practices so that replication code is more usable.

Finally, data. There are two primary problems that we see with replication data. The first is that authors provide data without sufficiently clearing the data of “personally identifying information” (PII). Rarely is PII necessary in replication data; again, the purpose of the replication material is to reproduce the results reported in the manuscript. Clearly there may be subsequent uses of the replication data, in which another scholar might wish to link the replication materials to other datasets. In those cases we urge the producer of the replication materials to provide some indication in their documentation about how they can be contacted to assist in that linkage process.

The second problem we see regards the ability of authors to provide replication data that can be made freely available. There are occasions where important research uses proprietary data; in those situations we encourage authors to let the editors know that they are using proprietary or restricted data upon submission so that we have time to figure out how to recommend that the author comply with our replication requirement. Usually the solution entails having the author provide clear details about how one would reproduce the results in the paper were one to have access to the proprietary or restricted data. In many cases, those who wish to replicate a published paper may be able to obtain the restricted data from the original source, and in such a situation we want them to be able to know exactly each step that goes from raw data to final analysis.

Recently we updated our replication policies, and also developed other policies that help to increase the transparency and accessibility of the research that is published in Political Analysis. However, policies and best practices in these areas are still evolving rapidly, and we will likely update the journal’s policies frequently in the coming years, as Political Analysis is at the forefront of advancing journal policy in these areas. We are quite proud of all that we have accomplished regarding our replication and research policies at Political Analysis, and happy that other journals look to us for guidance and advice.

Posted in Uncategorized | Leave a comment

The Use of Replication in Graduate Education and Training

Editor’s note: this post is contributed by Wendy Martinek, Associate Professor of Political Science at Binghamton University.

Writing almost 20 years ago as part of a symposium on the subject,[1] King (1995) articulated a strong argument in favor of the development of a replication standard in political science. As King wrote then, “Good science requires that we be able to reproduce existing numerical results, and that other scholars be able to show how substantive findings change as we apply the same methods in new contexts” (451). Key among the conditions necessary for this to occur is that the authors of published work prepare replication data sets that contain everything needed to reproduce reported empirical results. And, since a replication data set is not useful if it is not accessible, also important are authors’ efforts to make replication data sets easily available. Though his argument and the elements of his proposed replication standard were not tied to graduate education per se, King did make the following observation: “Reproducing and then extending high-quality existing research is also an extremely useful pedagogical tool, albeit one that political science students have been able to exploit only infrequently given the discipline’s limited adherence to the replication standard” (445). Given the trend towards greater data access and research transparency, King (2006) developed a guide for the production of a publishable manuscript based on the replication of a published article.

With this guide in hand, and informed by their own experiences in teaching graduate students, many faculty members have integrated replication assignments into their syllabi. As Herrnson has observed, “Replication repeats an empirical study in its entirety, including independent data collection” (1995: 452). As a technical matter, then, the standard replication assignment is more of a verification assignment than a true replication assignment. Regardless, such assignments have made their way onto graduate syllabi in increasing numbers. One prominent reason—and King’s (2006) motivation—is the facilitation of publication by graduate students. The academic job market is seemingly tighter than ever (Jaschik 2009) and publications are an important element of an applicant’s dossier, particularly when applying for a position at a national university (Fuerstman and Lavertu 2005). Accordingly, incorporating an assignment that helps students produce a publishable manuscript whenever appropriate makes good sense. Well-designed replication assignments, however, can also serve other goals. In particular, they can promote the development of practical skills, both with regard to the technical aspects of data access/manipulation and with regard to best practices for data coding/maintenance. Further, they can help students to internalize norms of data accessibility and research transparency. In other words, replication assignments are useful vehicles for advancing graduate education and training.

A replication assignment that requires students to obtain the data set and computer code to reproduce the results reported in a published article (and then actually reproduce those results) directs student attention to three very specific practical tasks. They are tasks that require skills often taken for granted once mastered, but which most political science graduate students do not possess when starting graduate school (something more advanced graduate students and faculty often forget). Most basically, it requires students to work out how to obtain the data and associated documentation (e.g., codebook). Sometimes this task turns out to be ridiculously easy, as when the data is publicly archived, either on an author’s personal webpage or through a professional data archive (e.g., Dataverse Network, ICPSR). But that is certainly not always the case, much to students’ chagrin and annoyance (King 2006: 120; Carsey 2014: 74-75). To be sure, the trend towards greater data accessibility is reflected in, for example, the editorial policies of many political science journals[2] and the data management and distribution requirements imposed by funding agencies like the National Science Foundation.[3] Despite this trend, students undertaking replication assignments not infrequently find that they have to contact the authors directly to obtain the data and/or the associated documentation. The skills needed for the simple (or sometimes not-so-simple) task of locating data may seem so basic as to be trivial for experienced researchers. However, those basic skills are not something new graduate students typically possess. A replication assignment by definition requires inexperienced students to plunge in and acquire those skills.

The second specific task that such a replication assignment requires is actually figuring out how to open the data file. A data file can be in any number of formats (e.g., .dta, .txt, .xls, .rda). For the lucky student, the data file may already be available in a format that matches the software package she intends to use. Or, if not, the student has access to something like Stat/Transfer or DBMS-Copy to convert the data file to a format compatible with her software package. This, too, may seem like a trivial skill to an experienced researcher. That is because it is a trivial skill to an experienced researcher. But it is not trivial for novice graduate students. Moreover, even more advanced graduate students (and faculty) can find accessing and opening data files from key repositories such as ICPSR daunting. For example, students adept at working with Stata, SAS, and SPSS files might still find it less than intuitive to open ASCII-format data with setup files. The broader point is that the mere act of opening a data file once it has been located is not necessarily all that obvious and, as with locating a data file, a replication assignment can aid in the development of that very necessary skill.
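To make the file-opening step concrete, here is a minimal sketch in R using the foreign package (which ships with R) and the built-in mtcars data. The write.dta() call merely simulates receiving a Stata-format file from another author, and foreign is one common route among several (the haven package is another):

```r
library(foreign)  # read.dta() / write.dta() for Stata-format files

tmp <- tempfile(fileext = ".dta")
write.dta(mtcars, tmp)   # stand-in for a .dta file obtained from an author
dat <- read.dta(tmp)     # open the Stata-format file in R
dim(dat)                 # 32 rows, 11 variables
```
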

The third specific task that such a replication assignment requires is learning how to make sense of the content of someone else’s data file. In an ideal world (one political scientists rarely if ever occupy), the identity of each variable and its coding are crystal clear from a data set’s codebook alone. Nagler outlines best practices in this regard, including the use of substantively meaningful variable names that indicate the subject and (when possible) the direction of the coding (1995: 490). Those conventions are adhered to unevenly at best, however, and the problem is exacerbated when relying on large datasets that use either uninformative codebook numbers or mnemonics that make sense but only to experienced users. For example, the General Social Survey (GSS) includes the SPWRKSTA variable. Once the description of the variable is known (“spouse labor force status”) then the logic of the mnemonic makes some sense: SP = spouse, WRK = labor force, STA = status. But it makes no sense to the uninitiated and even an experienced user of the GSS might have difficulty recalling what that variable represents without reference to the codebook. There is also a good deal of variation in how missing data is coded across data sets. Not uncommonly, numeric values like 99 and -9 are used to denote a missing value for a variable. That is obviously problematic if those codes are used as nonmissing numeric values for the purposes of numeric calculations. Understanding what exactly “mystery name” variables reference and how such things as missing data have been recorded in the coding process are crucial for a successful replication. The fact that these things are so essential for a successful replication forces students to delve into the minutia of the coded data and become familiar with it in a way that is easier to avoid (though still unadvisable) when simply using existing data to estimate an entirely new model.
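The missing-data pitfall is easy to see in a toy example (the values below are invented for illustration):

```r
x <- c(3, 99, 5, -9, 4)      # 99 and -9 are sentinel "missing" codes
mean(x)                      # 20.4 -- distorted by the sentinel codes
x[x %in% c(99, -9)] <- NA    # recode the sentinels to R's missing value
mean(x, na.rm = TRUE)        # 4 -- computed on the real observations only
```
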

Parenthetically, the more challenges students encounter early on when learning these skills, the better off they are in the long run for one very good reason. Students receive lots of advice and instruction regarding good data management and documentation practices (e.g., Nagler 1995). But there is nothing like encountering difficulty when using someone else’s data to bring home the importance of relying on best practices in coding one’s own data. The same is true with regard to documenting the computer code (e.g., Stata do-files, R scripts). In either case, the confusions and ambiguities with which students must contend when replicating the work of others provide lessons that are much more visceral and, hence, much more effective in fostering the development of good habits and practices than anything students could read or be told by their instructor.

These three specific tasks (acquiring a data set and its associated documentation, then opening and using that data set) require skills graduate students should master very early on in their graduate careers. This makes a replication assignment especially appealing for a first- or second-semester methods course. But replication assignments are also valuable in more advanced methods courses and substantive classes. An important objective in graduate education is the training and development of scholars who are careful and meticulous in the selection and use of methodological tools. But, with rare exception, the goal is not methodological proficiency for its own sake but, rather, methodological proficiency for the sake of advancing theoretical understanding of the phenomena under investigation. A replication assignment is ideal for grounding the development of methodological skills in a substantively meaningful context, thereby helping to fix the notion in students’ minds of methodological tools as in the service of advancing theoretical understanding.

Consider, for example, extreme bounds analysis (EBA), a useful tool for assessing the robustness of the relationship between a dependent variable and a variety of possible determinants (Leamer 1983). The basic logic of EBA is that, the smaller the range of variation in a coefficient of interest given the presence or absence of other explanatory variables, the more robust that coefficient of interest is. It is easy to imagine students focusing on the trivial aspects of determining the focus and doubt variables (i.e., the variables included in virtually all analyses and the variables that may or may not be included depending upon the analysis) in a contrived class assignment. A replication assignment by its nature, however, requires a meaningful engagement with the extant literature to understand the theoretical consensus among scholars as to which variable(s) matter (and, hence, which should be considered focus rather than doubt variables). Matching methods constitute another example. Randomized experiments, in which the treatment and control groups differ from one another only randomly vis-à-vis both observed and unobserved covariates, are the gold standard for causal inference. However, notwithstanding innovative resources such as Time-Sharing Experiments for the Social Sciences (TESS) and Amazon’s Mechanical Turk and the greater prevalence of experimental methods, much of the data available to political scientists to answer their questions of interest are observational. Matching methods are intended to provide leverage for making causal claims based on observational data through the balancing of the distribution of covariates in treatment and control groups regardless of the estimation technique employed post-matching (Ho et al. 2007). 
Considering matching in the context of a published piece of observational research of interest to a student necessitates that the student think in substantive terms about what constitutes the treatment and what the distribution of covariates looks like. As with EBA, a replication assignment in which students are obligated to apply matching methods to evaluate the robustness of a published observational study would ensure that the method was tied directly to the assessment (and, hopefully, advancement) of theoretical claims rather than treated as an end in itself.
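The extreme bounds logic can be sketched in a few lines of R. The eba_bounds() function and the choice of focus and doubt variables below are purely illustrative, not a published implementation:

```r
# Sketch of extreme bounds analysis: re-estimate a regression with every
# subset of the "doubt" variables and record the range of the focus
# variable's coefficient across specifications.
eba_bounds <- function(df, y, focus, doubt) {
  # every subset of the doubt variables, including the empty set
  subsets <- c(list(character(0)),
               unlist(lapply(seq_along(doubt),
                             function(k) combn(doubt, k, simplify = FALSE)),
                      recursive = FALSE))
  # focus-variable coefficient from each specification
  coefs <- sapply(subsets, function(vars) {
    f <- reformulate(c(focus, vars), response = y)
    coef(lm(f, data = df))[focus]
  })
  range(coefs)  # the extreme bounds: a smaller range = more robust
}

# e.g., how stable is the coefficient on wt in the built-in mtcars data?
eba_bounds(mtcars, y = "mpg", focus = "wt", doubt = c("hp", "qsec", "am"))
```

Here the mtcars regression of mpg on wt stands in for a published model; the narrower the interval between the two bounds, the more robust the focus coefficient is to the inclusion or exclusion of the doubt variables.
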

Though there remain points of contention and issues with regard to implementation that will no doubt persist, there is currently a shared commitment to openness in the political science community, incorporating both data access and research transparency (DA-RT). This is reflected, for example, in the data access guidelines promulgated by the American Political Science Association (Lupia and Elman 2014). The training and mentoring provided to graduate students in their graduate programs are key components of the socialization process by which they learn to become members of the academic community in general and their discipline in particular (Austin 2002). Replication assignments in graduate classes serve to socialize students into the norms of DA-RT. As Carsey notes, “Researchers who learn to think about these issues at the start of their careers, and who see value in doing so at the start of each research project, will be better able to produce research consistent with these principles” (2014: 75). Replication assignments serve to inculcate students with these principles. And, while they have obvious value in the context of methods courses, to fully realize the potential of replication assignments in fostering the development of these professional values in graduate students they should be part of substantive classes as well. The more engaged students are with the substantive questions at hand, the easier it should be to engage their interest in understanding the basis of the inferences scholars have drawn to answer those questions and where the basis for those inferences can be improved to the betterment of theoretical understanding. In sum, the role of replication in graduate education and training is both to develop methodological skills and enhance theory-building abilities.

Works Cited

Austin, Ann E. 2002. “Preparing the Next Generation of Faculty: Graduate School as Socialization to the Academic Career.” Journal of Higher Education 73(1): 94-122.

Carsey, Thomas M. 2014. “Making DA-RT a Reality.” PS: Political Science & Politics 47(1): 72-77.

Fuerstman, Daniel and Stephen Lavertu. 2005. “The Academic Hiring Process: A Survey of Department Chairs.” PS: Political Science and Politics 38(4): 731-736.

Herrnson, Paul S. 1995. “Replication, Verification, Secondary Analysis, and Data Collection in Political Science.” PS: Political Science & Politics 28(3): 452-455.

Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart. 2007. “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference.” Political Analysis 15(3): 199-236.

Jaschik, Scott. 2009. “Job Market Realities.” Inside Higher Ed, September 8. (November 29, 2014).

King, Gary. 1995. “Replication, Replication.” PS: Political Science & Politics 28(3): 444-452.

King, Gary. 2006. “Publication, Publication.” PS: Political Science & Politics 39(1): 119-125.

Leamer, Edward. 1983. “Let’s Take the ‘Con’ Out of Econometrics.” American Economic Review 73(1): 31-43.

Lupia, Arthur and Colin Elman. 2014. “Openness in Political Science: Data Access and Research Transparency.” PS: Political Science & Politics 47(1): 19-42.

Nagler, Jonathan. 1995. “Coding Style and Good Computing Practices.” PS: Political Science & Politics 28(3): 488-492.

[1] The symposium appeared in the September 1995 issue of PS: Political Science and Politics.

[2] See, for example, (November 15, 2014).

[3] See (November 15, 2014).

Posted in Uncategorized | Leave a comment

How tough should reviewers be?

At lunch with two colleagues the other day, an interesting question came up: how often should we as reviewers aim to give favorable reviews (conditional acceptances and strong revise-and-resubmit recommendations) to articles at selective, high-prestige journals?

It’s a complicated question, and I’m not aiming to cover every possible angle. Rather, I’m assuming, as a part of this question, that reviewers and journal editors are aiming to publish a certain proportion of submitted articles that represent the best research being produced at that time. For example, the American Political Science Review might aim to publish 8-10% of the articles that it receives (presumably the best 8-10%!). To start off, I’m also assuming that unanimous support from three reviewers is necessary and sufficient to receive an invitation to revise and resubmit; I’ll relax this assumption later. For ease of interpretation, I assume that all articles invited to R&R are published.

What I want to know is: if reviewers agree with the journal editor’s target, how often should they grant strong reviews to articles?

The answer is surprising to me: presuming that reviewer opinions are less-than-perfectly correlated and that unanimous reviewer approval is required for acceptance, reviewers should probably be giving positive reviews 25% of the time or more in order to achieve an overall acceptance rate of about 10%. 

How did I arrive at this answer? Using R, I simulated a review process wherein three reviews are generated. Each reviewer grants a favorable review with probability pr.accept; these reviews are correlated with coefficient rho between 0 and 0.98. I generated review outcomes for 2,000 papers using this process, then calculated the proportion of accepted papers under the system. The code looks like this (the entire replication code base is here):


library(copula)  # provides rCopula() and normalCopula()

rho <- seq(from=0, to=0.98, by=0.02)
pr.accept <- 0.5
accept <- rep(NA, length(rho))

for(k in 1:length(rho)){
  # three correlated reviews per paper; each is favorable with probability pr.accept
  reviews <- rCopula(2000, normalCopula(param=c(rho[k], rho[k], rho[k]),
                     dim=3, dispstr="un")) < pr.accept
  # unanimity rule: a paper is accepted only if all three reviews are favorable
  decisions <- apply(X=reviews, MARGIN=1, FUN=min)

  # acceptance rate
  accept[k] <- sum(decisions)/length(decisions)
}

plot(accept ~ rho, ylim=c(0, 0.45), col=gray(0.5),
     ylab = "proportion of accepted manuscripts",
     xlab = "correlation of reviewer opinions",
     main=c("How Tough Should Reviewers Be?",
            "3 Reviewers, Unanimity Approval Needed"))
accept.smooth <- predict(loess(accept ~ rho))  # smoothed trend line
lines(accept.smooth ~ rho, lty=1)

I plot the outcome below for three individual reviewer pr.accept values: 50%, 25%, and 10%.


What’s most interesting is that the less correlated reviewer opinions are, the more frequently individual reviewers should grant a positive review in order to achieve the overall publication target. If reviewer opinions are not at all correlated, then only a little more than 10% of articles will actually receive an invitation to revise and resubmit if reviewers recommend R&R 50% of the time. If reviewer opinions are correlated at 0.6, then an individual reviewer approval rate of 25% corresponds to an overall publication rate of a little under 10%.
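The uncorrelated case can be verified by hand: with three independent reviewers, the probability of unanimous approval is simply pr.accept cubed.

```r
pr.accept <- c(0.50, 0.25, 0.10)
round(pr.accept^3, 4)
# [1] 0.1250 0.0156 0.0010
```

So a 50% individual approval rate yields a 12.5% acceptance rate when opinions are independent, consistent with the left edge of the plot.
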

What if the editor is more actively involved in the process, and unanimity is not required? I added a fourth reviewer (the editor) to the simulation, and required that 3 out of the 4 reviews be positive in order for an article to be invited to R&R. This means that the editor and two reviewers, or all three reviewers, have to like an article in order for it to be published.
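A sketch of this 3-of-4 rule for a single correlation value; the exchangeable copula and the particular rho and pr.accept values here are illustrative assumptions rather than the full simulation:

```r
library(copula)  # rCopula(), normalCopula()

rho.k <- 0.6
pr.accept <- 0.25
cop <- normalCopula(param = rho.k, dim = 4, dispstr = "ex")  # 4 correlated reviews
reviews4 <- rCopula(2000, cop) < pr.accept   # TRUE = favorable review
decisions <- rowSums(reviews4) >= 3          # accept with at least 3 of 4 in favor
mean(decisions)                              # overall acceptance rate
```
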

The results are below. As you can see, acceptance rates go up: if reviewer opinions are correlated at 0.6, slightly over 10% of papers are eventually published.
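Again, the independent-reviewer benchmark for this 3-of-4 rule can be computed in closed form as a binomial upper tail; the sketch below is an illustrative check (names are mine), not the copula simulation:

```python
from math import comb

# P(at least k of n independent reviews are favorable): the binomial
# upper tail. With k=3, n=4 this is the "editor plus two reviewers,
# or all three reviewers" rule described in the text.
def at_least_k_of_n(pr_accept, k=3, n=4):
    return sum(comb(n, j) * pr_accept**j * (1 - pr_accept)**(n - j)
               for j in range(k, n + 1))

print(at_least_k_of_n(0.5))  # 0.3125, versus 0.125 under 3-reviewer unanimity
```

Relaxing unanimity to 3-of-4 more than doubles the acceptance rate at pr.accept = 0.5, which matches the direction of the simulated results.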

I think the conclusion to draw from this analysis is that individual reviewers need not be extremely demanding in order for the system as a whole to be quite stringent. If a reviewer aims to accept 10% of papers on the theory that the journal wishes to accept 10% of papers, probably all s/he accomplishes is ensuring that his/her area is underrepresented in the overall distribution of publications.

Make no mistake: I don’t think my analysis indicates that reviewers should be less critical in the substantive evaluation of a manuscript, or that review standards should be lowered in some sense. Rather, I think that reviewers should recognize that achieving even majority support for a paper is quite challenging, and they should be individually more willing to give papers with scholarly merit a chance to be published even if they don’t believe the paper is in their personal top 10% of publications. It might be better if reviewers instead aimed to accept papers in their personal top 25%, recognizing that the process as a whole will still filter out a great many of these papers.


Posted in Uncategorized | Leave a comment

On the Replication of Experiments in Teaching and Training

Editor’s note: this piece is contributed by Jon Rogers, Visiting Assistant Research Professor and member of the Social Science Experimental Laboratory (SSEL) at NYU Abu Dhabi.


Students in the quantitative social sciences are exposed to high levels of rational choice theory.  Going back to Marwell and Ames (1981), we know that economists free ride, but almost no one else does (in the strict sense anyway).  In part, this is because many social science students are essentially taught to free ride.  They see these models of human behavior and incorrectly take the lesson that human beings should be rational and free ride.  To not free ride would be irrational.  Some have difficulty grasping that these are models meant to predict, not prescribe and judge human behavior.

Behaviorally though, it is well established that most humans are not perfectly selfish.  Consider the dictator game, where one player decides how much of her endowment to give to a second player.  A simple Google Scholar search for dictator game experiments returns nearly 40,000 results.  It is no stretch to posit that almost none of these report that every first player kept the whole endowment for herself (Engel, 2011).  When a new and surprising result is presented in the literature, it is important for scholars to replicate the study to examine its robustness.  Some results, however, are so well known and robust that they graduate to the level of empirical regularity.

While replication of surprising results is good for the discipline, replication of classic experiments is beneficial for students.  In teaching, experiments can be used to demonstrate the disconnect between Nash equilibrium and actual behavior and to improve student understanding of the concept of modeling.  Discussions of free-riding, the folk theorem, warm glow, and the like can all benefit from classroom demonstration.  For graduate students, replication of experiments is also useful training, since it builds programming, analysis, and experimenter skills in an environment where the results are low risk to the grad student’s career.  For students of any type, replication is a useful endeavor and one that should be encouraged as part of the curriculum.

Replication in Teaching

Budding political scientists and economists are virtually guaranteed to be introduced, at some level, to rational choice.  Rational choice is characterized by methodological individualism and the maximization of self interest.  That is, actors (even if the actor of interest is a state or corporation) are assumed to be individuals who make choices based on what they like best.  When two actors are placed in opposition to one another, they are modeled as acting strategically to maximize their own payoffs and only their own payoffs.

Consider the classic ultimatum game.  Player A is granted an endowment of 10 tokens and is tasked with choosing how much to give to player B.  Player B can then decide to either accept or reject the offer.  If she accepts, then the offer is enforced and subjects receive their payments.  If she rejects the offer, then both players receive nothing.  In their game theory course work, students are taught to identify the Nash equilibrium through backward induction.  In the second stage, player B chooses between receiving 0 and receiving the offer x, with certainty. Since she is modeled as being purely self interested, she accepts the offer, no matter how small.  In the first stage, player A knows that player B will accept any offer, so she gives the smallest ε > 0 possible.  This yields equilibrium payoffs of (10-ε , ε).
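The backward induction argument can be traced in a few lines of code. This is a toy sketch with discrete integer token offers (so the smallest positive offer is one token rather than ε); the function names are illustrative:

```python
# Subgame-perfect play in the ultimatum game with purely
# self-interested players and discrete token offers.
def responder_accepts(offer):
    # Stage 2: B compares the offer to the 0 payoff from rejecting,
    # so she accepts any strictly positive offer.
    return offer > 0

def proposer_offer(endowment=10):
    # Stage 1: A anticipates B's acceptance rule and offers the
    # smallest amount that will be accepted (one token, the
    # discrete analogue of epsilon).
    return min(x for x in range(endowment + 1) if responder_accepts(x))

offer = proposer_offer()
print((10 - offer, offer))  # (9, 1): the discrete version of (10 - ε, ε)
```

The point of the classroom replication is precisely that real responders do not follow `responder_accepts` and real proposers know it.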

Students are taught to identify this equilibrium and are naturally rewarded by having test answers marked correct.  Through repeated drilling of this technique, students become adept at identifying equilibria in simple games, but make the unfortunate leap of seeing those who play the rational strategy as being smarter or better.  A vast literature reports that players rarely make minimal offers and that such offers are frequently rejected (Oosterbeek, Sloof, and van de Kuilen, 2004).  Sitting with their textbooks however, students are tempted to misuse the terminology of rational choice and deem irrational any rejection or non-trivial offer.  Students need to be shown that Nash equilibria are sets of strategy profiles derived from models and not inherently predictions in and of themselves.  Any model is an abstraction from reality and may omit critical features of the scenario it attempts to describe.  A researcher may predict that subjects will employ equilibrium strategies, but she may just as easily predict that considerations such as trust, reciprocity, or altruism might induce non-equilibrium behavior.  The Nash Equilibrium is a candidate hypothesis, but it is not necessarily unique.

This argument can be applied to games with voluntary contribution mechanisms.  In the public goods game for example, each player begins with an endowment and chooses how much to contribute to a group account. All contributions are added together, multiplied by an efficiency factor, and shared evenly among all group members, regardless of any individual’s level of contribution.  In principle, the group as a whole would be better off if everyone gave the maximum contribution. Under strict rationality however, the strong free rider hypothesis predicts 0 contribution from every player.  Modeling certain situations as public goods games then leads to the prediction that public goods will be under-provided.  Again however, students are tempted to misinterpret the lesson and consider the act of contribution to be inherently irrational.  Aspects of other-regarding behavior can be rational, if they are included in the utility function (Andreoni, 1989).
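The payoff structure behind the strong free-rider prediction can be made concrete with a small sketch. The parameter values here (four players, endowment 10, efficiency factor 1.6) are illustrative choices of mine, not drawn from any particular study:

```python
# Public goods game payoffs: each player keeps her uncontributed
# endowment plus an equal share of the multiplied group account.
def payoffs(contributions, endowment=10, multiplier=1.6):
    share = sum(contributions) * multiplier / len(contributions)
    return [endowment - c + share for c in contributions]

print(payoffs([10, 10, 10, 10]))  # full contribution: everyone earns 16
print(payoffs([10, 10, 10, 0]))   # a lone free rider earns 22, the rest 12
```

Because each contributed token returns only 1.6/4 = 0.4 tokens to the contributor, zero contribution is the dominant strategy even though full contribution maximizes group earnings; that is exactly the tension the classroom demonstration exposes.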

In each of the above circumstances, students could benefit from stepping back from their textbooks and remembering the purpose of modeling.  Insofar as models are neither true nor false, but useful or not (Clarke and Primo, 2012), they are meant to help researchers predict behavior, not prescribe what a player should do, when playing the game.  Simple classroom experiments, ideally before lecturing on the game, combined with post experiment discussion of results, help students to remember that while a game may have a pure strategy Nash equilibrium, it’s not necessarily a good prediction of behavior.  Experiments can stimulate students to consider why behavior may differ from the equilibrium and how they might revise models to be more useful.

Returning to voluntary contribution mechanisms, it is an empirical regularity in repeated play that in early rounds contributions are relatively high, but over time tend to converge to zero.  Another regularity is that even if contributions have hit zero, if play is stopped and then restarted, then contributions will leap upward, before again trending toward zero.  Much of game theory in teaching is focused on identifying equilibria without consideration of how these equilibria (particularly Nash equilibria) are reached.  Replication of classic experiments allows for discussion of equilibrium selection, coordination mechanisms, and institutions that support pro-social behavior.

One useful way to engage students in a discussion of modelling behavior is to place them in a scenario with solution concepts other than just pure strategy Nash equilibrium.  For instance, consider k-level reasoning.  The beauty contest game takes a set of N players and gives them three options: A, B, and C.  The player’s task is to guess which of the three options will be most often selected by the group.  Thus, players are asked not about their own preferences over the three options, but their beliefs about the preferences of the other players.  In a variant of this game, Rosemarie Nagel (1995) takes a set of N players and has them pick numbers between one and one hundred.  The player’s task is to pick the number closest to what she believes will be the average guess, times a parameter p.  If p = 0.5, then subjects are attempting to guess the number between one and one hundred that will be half of the average guess.  The subject whose guess is closest to that target wins.

In this case, some players will notice that no number x ∈ (50,100] can be the correct answer, since these numbers can never be half of the average.  A subject who answers 50 would be labeled level-0, as she has avoided strictly dominated strategies.  Some subjects however, will believe that all subjects have thought through the game at least this far and will realize that the interval of viable answers is really (0,50].  These level-1 players then respond that one half of the average will be x = 25.  The process iterates to its logical (Nash Equilibrium) conclusion.  If all players are strictly rational, then they will all answer 0.  Behaviorally though, guesses of 0 virtually never win.
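The iteration of reasoning levels described above is easy to trace in code. This toy sketch follows the text’s labeling, in which the player who merely avoids dominated strategies guesses 50 and each higher level best-responds by multiplying by p:

```python
# Level-k guesses in the p-beauty contest: each level best-responds
# to the previous one by multiplying its guess by p.
def level_k_guess(k, p=0.5, level0=50):
    guess = float(level0)
    for _ in range(k):
        guess *= p
    return guess

print([level_k_guess(k) for k in range(4)])  # [50.0, 25.0, 12.5, 6.25]
```

As k grows the guess shrinks toward 0, the Nash equilibrium answer; behaviorally, winning guesses typically sit at low but strictly positive reasoning levels.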

In a classroom setting, this game is easy to implement and quite illustrative.  Students become particularly attentive, if the professor offers even modest monetary stakes, say between $0.00 and $10.00, with the winning student receiving her guess as a prize.  A class of robots will all guess 0 and the professor will suffer no monetary loss.  But all it takes is a small percentage of the class to enter guesses above 0 to pull the winning guess away from the Nash Equilibrium.  Thus the hyper-rational students who guessed 0 see that the equilibrium answer and the winning answer are not necessarily the same thing (note: the 11-20 money request game by Arad and Rubinstein (2012) is an interesting variant of this without a pure strategy Nash equilibrium at all.).

In each of the above settings, it is well established that many subjects do not employ the equilibrium strategy.  This is surprising to no one beyond those students who worship too readily at the altar of rational choice.  By replicating classic experiments to demonstrate to students that models are not perfect in their ability to predict human behavior, we demote game theory from life plan to its proper level of mathematical tool.  We typically think of replication as a check on faulty research or a means by which to verify the robustness of social scientific results.  Here, we are using replication of robust results to inspire critical thinking about social science itself.  For other students however, replication has the added benefit of enabling training in skills needed to carry out more advanced experiments.

Replication in Training

To some extent, the internet era has been a boon to the graduate student of social sciences, providing ready access to a wide variety of data sources.  Responsible researchers make their data available on request at the very least, if not completely available online.  Fellow researchers can then attempt to replicate findings to test their robustness.  Students, in turn, can use replication files to practice the methods they’ve learned in their classes.

The same is true of experimental data sets.  However, the data analysis of experiments is rarely a complex task.  Indeed, the technical simplicity of analysis is one of the key advantages of true experiments.  For the budding experimentalist, replication of data analysis is a useful exercise, but not nearly as useful as the replication of experimental procedures.  Most data generating processes are, to some extent, sensitive to choices made by researchers.  Most students, however, are not collecting their own nationally representative survey data.  Particularly at early levels of development, students may complete course work entirely from existing data.  The vast majority of their effort is spent on the analysis.  Mistakes can be identified and often corrected with what may be little more than a few extra lines of code.

For experimentalists in training though, the majority of the work comes on the front end, as does the majority of the risk.  From writing the experimental program in a language such as zTree (Fischbacher, 2007), which is generally new to the student, to physically running the experimental sessions, a student’s first experiment is an ordeal. The stress of this endeavor is compounded, when its success or failure directly relates to the student’s career trajectory and job market potential.  It is critical for the student to have solid guidance from a well trained advisor.

This is, of course, true of all research methods.  The better a student’s training, the greater her likelihood of successful outcomes.  Data analysis training in political science graduate programs has become considerably more sophisticated in recent years, with students often required to complete three, four, or even more methods courses.  Training for experimentalists, however, exhibits considerably more variance, and formal training may be unavailable.  Some fortunate students are trained on the job, assisting more senior researchers with their experiments.  But while students benefit from an apprenticeship with an experimentalist, they suffer, ironically enough, from a lack of experimentation.

Any student can practice working with large data.  Many data sets can be accessed for free or via an institutional license.  A student can engage in atheoretical data mining and practice her analysis and interpretation of results.  She can do all of this at home with a glass of beer and the television on.  When she makes a mistake, as a young researcher is wont to do, little is lost and the student has gained a valuable lesson. Students of experiments, however, rarely get the chance to make such mistakes.  A single line of economic experiments can cost thousands of dollars and a student is unlikely to have surplus research funds with which to gain experience.  If she is lucky enough to receive research funding, it will likely be limited to subject payments for her dissertation’s experiment(s).  A single failed session could drain a meaningful portion of her budget, as subjects must be paid, even if the data is unusable.  The rule at many labs is that subjects in failed sessions must still receive their show up fees and then additional compensation for any time they have spent up to the point of the crash.  Even with modest subject payments, this could be hundreds of dollars.

How then is the experimentalist to develop her craft, while under a tight budget constraint?  The answer lies in the empirical regularities discussed earlier.  The size of financial incentives in an experiment does matter, at least in terms of salience (Morton and Williams, 2010), but some effects are so robust as to be present in experiments with even trivial or non-financial incentives.  In my own classroom demonstrations, I have replicated prisoner’s dilemma, ultimatum game, public good game, and many other experiments, using only fractions of extra credit points as incentives and the results are remarkably consistent with those in the literature.[1]   At zero financial cost, I gained experience in the programming and running of experiments and simultaneously ran a lesson on end game effects, the restart effect, and the repeated public goods game.

Not all graduate students teach courses of their own, but all graduate students have advisors or committee members who do.  It is generally less of an imposition for an advisee to ask a faculty member to grant their students a few bonus points than it is to ask for research funds, especially funds that would not be directly spent on the dissertation.  These experiments can be run in every way identical to how one would be run with monetary incentives, but without the cost or risk to the student’s career.  This practice is all the more important at institutions without established laboratories, where the student is responsible for building an ad hoc network.

Even for students with experience assisting senior researchers, independently planning and running an experiment from start to finish, without direct supervision, is invaluable practice.  The student is confronted with the dilemma of how she will run the experiment, not how her advisor would do so.  She then writes her own program and instructions, designs her own physical procedures, and plans every detail on her own.  She can and should seek advice, but she is free to learn and develop her own routine.  The experiment may succeed or fail, but the end product is similar to atheoretical playing with data.  It won’t likely result in a publication, but it will prove to be a valuable learning experience (note: A well-run experiment is the result of not only a properly written program, but also of strict adherence to a set of physical procedures such as (among many others) how to seat subjects, how to convey instructions, and how to monitor laboratory conditions.  A program can be vetted in a vacuum, but the experimenter’s procedures are subject to failure in each and every session, thus practice is crucial.)


Many of the other articles in this special issue deal with the replication of studies as a matter of good science, in line with practices in the physical sciences.  But in the physical sciences, replication also plays a key role in training.  Students often begin replicating classic experiments before they can even spell the word science.  They follow structured procedures to obtain predictable results, not to advance the leading edge of science, but to build core skills and methodological discipline.

Here though, physical scientists have a distinct advantage.  Their models are frequently based on deterministic causation and are more readily understood, operationalized, tested, and (possibly) disproved.  To the extent that students have encountered scientific models in their early academic careers, these models are likely to have been deterministic.  Most models in social science however, are probabilistic in nature.  It is somewhat understandable that a student in the social sciences, who reads her textbook and sees the mathematical beauty of rational choice, would be enamored with its clarity.  A student, particularly one who has self selected into majoring in economics or politics, can be forgiven for seeing the direct benefits of playing purely rational strategies.  It is not uncommon for an undergraduate to go her entire academic career without empirically testing a model.  By replicating classic experiments, particularly where rational choice fails, we can reinforce the idea that these are models meant to predict behavior, not instructions for how to best an opponent.

In contrast, graduate students explicitly train in designing and testing models.  A key component of training is the ability to make and learn from mistakes.  Medical students learn by practicing on cadavers who cannot suffer.  Chemists learn by following procedures and comparing results to established parameters.  Large-n researchers learn by working through replication files and testing for robustness of results.  In the same spirit, experimentalists can learn by running low risk experiments based on established designs, with predictable results.  In doing so, even if they fail, they build competence in the skills they will need to work independently in the future.  At any rate, while the tools employed in the social sciences differ from those in the physical sciences, the goal is the same: to improve our understanding of the world around us.  Replicating economic experiments aids some in their study of human behavior and others on their path to learn how to study human behavior.  Both are laudable goals.


[1] Throughout the course, students earn “Experimental Credit Units.” The top performing student at the end of the semester receives five extra credit points.  All other students receive extra credit indexed to that of the top performer.  I would love to report the results of the experiments here, but at the time I had no intention of using the data for anything other than educational purposes and thus did not apply for IRB approval.


1. Andreoni, James. 1989. “Giving with Impure Altruism: Applications to Charity and
Ricardian Equivalence.” The Journal of Political Economy 97(6):1447-1458.

2. Arad, Ayala & Ariel Rubinstein. 2012. “The 11-20 Money Request Game: A Level-k Reasoning Study.” The American Economic Review 102(7):3561-3573.

3. Clarke, Kevin A. & David M. Primo. 2012. A Model Discipline: Political Science and
the Logic of Representations. New York, NY: Oxford University Press.

4. Engel, Christoph. 2011. “Dictator Games: A Meta Study.” Experimental Economics 14(4):583-610.

5. Fischbacher, Urs. 2007. “z-Tree: Zurich Toolbox for Ready-made Economic Experiments.” Experimental Economics 10(2):171-178.

6. Marwell, Gerald & Ruth E. Ames. 1981. “Economists Free Ride, Does Anyone Else?
Experiments on the Provision of Public Goods, IV.” Journal of Public Economics
15 (3):295-310.

7. Morton, Rebecca B. & Kenneth C. Williams. 2010. Experimental Political Science and
the Study of Causality. New York, NY: Cambridge University Press.

8. Nagel, Rosemarie. 1995. “Unraveling in Guessing Games: An Experimental Study.” The American Economic Review 85(5):1313-1326.

9. Oosterbeek, Hessel, Randolph Sloof & Gijs van de Kuilen. 2004. “Cultural Differences in Ultimatum Game Experiments: Evidence from a Meta-Analysis.” Experimental Economics 7(2):171-188.

Posted in Uncategorized | Leave a comment