How tough should reviewers be?

At lunch with two colleagues the other day, an interesting question came up: how often should we as reviewers aim to give favorable reviews (conditional acceptances and strong revise-and-resubmit recommendations) to articles at selective, high-prestige journals?

It’s a complicated question, and I’m not aiming to cover every possible angle. Rather, I’m assuming, as a part of this question, that reviewers and journal editors are aiming to publish a certain proportion of submitted articles that represent the best research being produced at that time. For example, the American Political Science Review might aim to publish 8-10% of the articles that it receives (presumably the best 8-10%!). To start off, I’m also assuming that unanimous support from three reviewers is necessary and sufficient to receive an invitation to revise and resubmit; I’ll relax this assumption later. For ease of interpretation, I assume that all articles invited to R&R are published.

What I want to know is: if reviewers agree with the journal editor’s target, how often should they grant strong reviews to articles?

The answer is surprising to me: presuming that reviewer opinions are less-than-perfectly correlated and that unanimous reviewer approval is required for acceptance, reviewers should probably be giving positive reviews 25% of the time or more in order to achieve an overall acceptance rate of about 10%. 
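A quick back-of-the-envelope check (my own calculation, not part of the replication code): if the three reviewers were fully independent and each approved with probability p, all three would approve with probability p^3, so reaching a 10% acceptance rate would require an individual approval rate of roughly 46%.

p <- c(0.10, 0.25, 0.50)
round(p^3, 3)        # 0.001, 0.016, 0.125: unanimous-approval rates under independence
round(0.10^(1/3), 3) # 0.464: the individual approval rate that yields 10% overall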

How did I arrive at this answer? Using R, I simulated a review process wherein three reviews are generated. Each reviewer grants a favorable review with probability pr.accept; these reviews are correlated with coefficient rho between 0 and 0.98. I generated review outcomes for 2,000 papers using this process, then calculated the proportion of accepted papers under the system. The code looks like this (the entire replication code base is here):

library(copula)

# grid of reviewer-opinion correlations and the individual probability of a favorable review
rho <- seq(from=0, to=0.98, by=0.02)
pr.accept <- 0.5

accept <- c()
for(k in 1:length(rho)){
  # three correlated uniform draws per paper; a draw below pr.accept is a favorable review
  reviews <- rCopula(2000, normalCopula(param=c(rho[k], rho[k], rho[k]), dim=3, dispstr="un")) < pr.accept
  # unanimity rule: a paper gets an R&R only if all three reviews are favorable
  decisions <- apply(X=reviews, MARGIN=1, FUN=min)

  # acceptance rate
  accept[k] <- sum(decisions)/length(decisions)
}

plot(accept ~ rho, ylim=c(0, 0.45), col=gray(0.5), ylab = "proportion of accepted manuscripts", xlab = "correlation of reviewer opinions", main=c("How Tough Should Reviewers Be?", "3 Reviewers, Unanimity Approval Needed"))
y.fit <- predict(loess(accept ~ rho))
lines(y.fit ~ rho, lty=1)

I plot the outcome below for three individual-reviewer pr.accept values: 50%, 25%, and 10%.

[Figure: unanimity-review. Proportion of accepted manuscripts against the correlation of reviewer opinions; three reviewers, unanimity approval needed; one curve per pr.accept value.]

What’s most interesting is that the less correlated reviewer opinions are, the more frequently individual reviewers should be inclined to grant a positive review in order to achieve the overall publication target. If reviewer opinions are not at all correlated, then only a little more than 10% of articles will receive an invitation to revise and resubmit even when reviewers recommend R&R 50% of the time. If reviewer opinions are correlated at 0.6, then an individual reviewer approval rate of 25% corresponds to an overall publication rate of a little under 10%.

What if the editor is more actively involved in the process, and unanimity is not required? I added a fourth reviewer (the editor) to the simulation, and required that 3 out of the 4 reviews be positive in order for an article to be invited to R&R. This means that the editor and two reviewers, or all three reviewers, have to like an article in order for it to be published.
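A minimal sketch of how this variant can be simulated, reusing rho and pr.accept from the code above (the exchangeable dispersion structure and the 3-of-4 decision rule shown here are my reconstruction, so the original replication code may differ in its details):

accept.editor <- c()
for(k in 1:length(rho)){
  # four correlated opinions per paper: three reviewers plus the editor
  reviews <- rCopula(2000, normalCopula(param=rho[k], dim=4, dispstr="ex")) < pr.accept
  # an article is invited to R&R if at least 3 of the 4 readers approve
  decisions <- apply(X=reviews, MARGIN=1, FUN=sum) >= 3
  accept.editor[k] <- mean(decisions)
}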

The results are below. As you can see, the result is that acceptances do go up. Now, if reviewer opinions are correlated at 0.6, slightly over 10% of papers are eventually published.

[Figure: editor-review. Proportion of accepted manuscripts against the correlation of reviewer opinions; four readers (three reviewers plus the editor), with 3 of 4 approvals needed.]

I think the conclusion to draw from this analysis is that individual reviewers need not be extremely demanding in order for the system as a whole to be quite stringent. If a reviewer aims to accept 10% of papers on the theory that the journal wishes to accept 10% of papers, probably all s/he accomplishes is ensuring that his/her area is underrepresented in the overall distribution of publications.

Make no mistake: I don’t think my analysis indicates that reviewers should be less critical in the substantive evaluation of a manuscript, or that review standards should be lowered in some sense. Rather, I think that reviewers should recognize that achieving even majority support for a paper is quite challenging, and they should be individually more willing to give papers with scholarly merit a chance to be published even if they don’t believe the paper is in their personal top 10% of publications. It might be better if reviewers instead aimed to accept papers in their personal top 25%, recognizing that the process as a whole will still filter out a great many of these papers.

 


On the Replication of Experiments in Teaching and Training

Editor’s note: this piece is contributed by Jon Rogers, Visiting Assistant Research Professor and member of the Social Science Experimental Laboratory (SSEL) at NYU Abu Dhabi.

Introduction

Students in the quantitative social sciences are exposed to high levels of rational choice theory.  Going back to Marwell and Ames (1981), we know that economists free ride, but almost no one else does (in the strict sense anyway).  In part, this is because many social science students are essentially taught to free ride.  They see these models of human behavior and incorrectly take the lesson that human beings should be rational and free ride.  To not free ride would be irrational.  Some have difficulty grasping that these are models meant to predict human behavior, not to prescribe or judge it.

Behaviorally, though, it is well established that most humans are not perfectly selfish.  Consider the dictator game, where one player decides how much of her endowment to give to a second player.  A simple Google Scholar search for dictator game experiments returns nearly 40,000 results.  It is no stretch to posit that almost none of these report that every first player kept the whole endowment for herself (Engel, 2011).  When a new and surprising result is presented in the literature, it is important for scholars to replicate the study to examine its robustness.  Some results, however, are so well known and robust that they graduate to the level of empirical regularity.

While replication of surprising results is good for the discipline, replication of classic experiments is beneficial for students.  In teaching, experiments can be used to demonstrate the disconnect between Nash equilibrium and actual behavior and to improve student understanding of the concept of modeling.  Discussions of free-riding, the folk theorem, warm glow, and the like can all benefit from classroom demonstration.  For graduate students, replication of experiments is also useful training, since it builds programming, analysis, and experimenter skills in an environment where the results are low risk to the grad student’s career.  For students of any type, replication is a useful endeavor and one that should be encouraged as part of the curriculum.

Replication in Teaching

Budding political scientists and economists are virtually guaranteed to be introduced, at some level, to rational choice.  Rational choice is characterized by methodological individualism and the maximization of self-interest.  That is, actors (even if the actor of interest is a state or corporation) are assumed to be individuals who make choices based on what they like best.  When two actors are placed in opposition to one another, they are modeled as acting strategically to maximize their own payoffs and only their own payoffs.

Consider the classic ultimatum game.  Player A is granted an endowment of 10 tokens and is tasked with choosing how much to give to player B.  Player B can then decide to either accept or reject the offer.  If she accepts, then the offer is enforced and subjects receive their payments.  If she rejects the offer, then both players receive nothing.  In their game theory course work, students are taught to identify the Nash equilibrium through backward induction.  In the second stage, player B chooses between receiving 0 and receiving the offer x, with certainty.  Since she is modeled as being purely self-interested, she accepts the offer, no matter how small.  In the first stage, player A knows that player B will accept any offer, so she gives the smallest ε > 0 possible.  This yields equilibrium payoffs of (10 − ε, ε).
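The same backward induction can be sketched numerically in R for whole-token offers (an illustration of mine, which assumes B accepts when indifferent, so the minimal offer is 0 rather than ε):

offers <- 0:10
b.accepts <- offers >= 0                       # stage 2: a purely self-interested B prefers any offer to 0
a.payoff <- ifelse(b.accepts, 10 - offers, 0)  # stage 1: A anticipates B's acceptance rule
offers[which.max(a.payoff)]                    # 0: A keeps essentially the whole endowment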

Students are taught to identify this equilibrium and are naturally rewarded by having test answers marked correct.  Through repeated drilling of this technique, students become adept at identifying equilibria in simple games, but make the unfortunate leap of seeing those who play the rational strategy as being smarter or better.  A vast literature reports that players rarely make minimal offers and that such offers are frequently rejected (Oosterbeek, Sloof, and van de Kuilen, 2004).  Sitting with their textbooks however, students are tempted to misuse the terminology of rational choice and deem irrational any rejection or non-trivial offer.  Students need to be shown that Nash equilibria are sets of strategy profiles derived from models and not inherently predictions in and of themselves.  Any model is an abstraction from reality and may omit critical features of the scenario it attempts to describe.  A researcher may predict that subjects will employ equilibrium strategies, but she may just as easily predict that considerations such as trust, reciprocity, or altruism might induce non-equilibrium behavior.  The Nash Equilibrium is a candidate hypothesis, but it is not necessarily unique.

This argument can be applied to games with voluntary contribution mechanisms.  In the public goods game, for example, each player begins with an endowment and chooses how much to contribute to a group account.  All contributions are added together, multiplied by an efficiency factor, and shared evenly among all group members, regardless of any individual’s level of contribution.  In principle, the group as a whole would be better off if everyone gave the maximum contribution.  Under strict rationality, however, the strong free-rider hypothesis predicts zero contribution from every player.  Modeling certain situations as public goods games then leads to the prediction that public goods will be under-provided.  Again, however, students are tempted to misinterpret the lesson and consider the act of contribution to be inherently irrational.  Aspects of other-regarding behavior can be rational, if they are included in the utility function (Andreoni, 1989).
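A short payoff calculation makes the free-rider incentive concrete (a sketch of mine; the endowment, group size, and multiplier below are illustrative, not taken from a particular experiment):

endowment <- 10; n <- 4; multiplier <- 1.6
payoff <- function(own, others) endowment - own + multiplier * (own + sum(others)) / n
payoff(own = 10, others = c(10, 10, 10))  # 16: everyone contributes fully
payoff(own = 0,  others = c(10, 10, 10))  # 22: free riding on full contributors pays more
payoff(own = 0,  others = c(0, 0, 0))     # 10: the zero-contribution equilibrium

Because each token contributed returns only multiplier/n = 0.4 tokens to the contributor, keeping every token is the dominant strategy even though full contribution maximizes the group total.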

In each of the above circumstances, students could benefit from stepping back from their textbooks and remembering the purpose of modeling.  Insofar as models are neither true nor false, but useful or not (Clarke and Primo, 2012), they are meant to help researchers predict behavior, not prescribe what a player should do when playing the game.  Simple classroom experiments, ideally run before lecturing on the game and combined with post-experiment discussion of results, help students to remember that while a game may have a pure strategy Nash equilibrium, it is not necessarily a good prediction of behavior.  Experiments can stimulate students to consider why behavior may differ from the equilibrium and how they might revise models to be more useful.

Returning to voluntary contribution mechanisms, it is an empirical regularity of repeated play that contributions are relatively high in early rounds but tend to converge to zero over time.  Another regularity is that, even if contributions have hit zero, stopping and then restarting play causes contributions to leap upward before again trending toward zero.  Much of game theory in teaching is focused on identifying equilibria without consideration of how these equilibria (particularly Nash equilibria) are reached.  Replication of classic experiments allows for discussion of equilibrium selection, coordination mechanisms, and institutions that support pro-social behavior.

One useful way to engage students in a discussion of modelling behavior is to place them in a scenario with solution concepts other than just pure strategy Nash equilibrium.  For instance, consider k-level reasoning.  The beauty contest game takes a set of N players and gives them three options: A, B, and C.  The player’s task is to guess which of the three options will be most often selected by the group.  Thus, players are asked not about their own preferences over the three options, but their beliefs about the preferences of the other players.  In a variant of this game, Rosemarie Nagel (1995) takes a set of N players and has them pick numbers between one and one hundred.  The player’s task is to pick the number closest to what they believe will be the average guess, times a parameter p.  If p = 0.5, then subjects are attempting to guess the number between one and one hundred that will be half of the average guess.  The subject whose guess is closest to that amount wins.

In this case, some players will notice that no number x ∈ (50,100] can be the correct answer, since these numbers can never be half of the average.  A subject who answers 50 would be labeled level-0, as she has avoided strictly dominated strategies.  Some subjects however, will believe that all subjects have thought through the game at least this far and will realize that the interval of viable answers is really (0,50].  These level-1 players then respond that one half of the average will be x = 25.  The process iterates to its logical (Nash Equilibrium) conclusion.  If all players are strictly rational, then they will all answer 0.  Behaviorally though, guesses of 0 virtually never win.
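The iteration described above is easy to trace in a few lines (a sketch that follows the text in treating 50 as the level-0 answer):

p <- 0.5
k <- 0:6
round(100 * p^(k + 1), 2)  # 50, 25, 12.5, ... -- each level of reasoning halves the guess, shrinking toward the Nash answer of 0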

In a classroom setting, this game is easy to implement and quite illustrative.  Students become particularly attentive if the professor offers even modest monetary stakes, say between $0.00 and $10.00, with the winning student receiving her guess as a prize.  A class of robots would all guess 0 and the professor would suffer no monetary loss.  But all it takes is a small percentage of the class entering guesses above 0 to pull the winning guess away from the Nash equilibrium.  Thus the hyper-rational students who guessed 0 see that the equilibrium answer and the winning answer are not necessarily the same thing.  (Note: the 11-20 money request game by Arad and Rubinstein (2012) is an interesting variant of this without a pure strategy Nash equilibrium at all.)

In each of the above settings, it is well established that many subjects do not employ the equilibrium strategy.  This is surprising to no one beyond those students who worship too readily at the altar of rational choice.  By replicating classic experiments to demonstrate to students that models are not perfect in their ability to predict human behavior, we demote game theory from life plan to its proper level of mathematical tool.  We typically think of replication as a check on faulty research or a means by which to verify the robustness of social scientific results.  Here, we are using replication of robust results to inspire critical thinking about social science itself.  For graduate students, however, replication has the added benefit of enabling training in the skills needed to carry out more advanced experiments.

Replication in Training

To some extent, the internet era has been a boon to the graduate student of the social sciences, providing ready access to a wide variety of data sources.  Responsible researchers make their data available on request at the very least, if not completely available online.  Fellow researchers can then attempt to replicate findings to test their robustness.  Students, in turn, can use replication files to practice the methods they’ve learned in their classes.

The same is true of experimental data sets.  However, the data analysis of experiments is rarely a complex task.  Indeed, the technical simplicity of analysis is one of the key advantages of true experiments.  For the budding experimentalist, replication of data analysis is a useful exercise, but one not nearly as useful as the replication of experimental procedures.  Most data generating processes are, to some extent, sensitive to choices made by researchers.  Most students, however, are not collecting their own nationally representative survey data.  Particularly at early levels of development, students may complete course work entirely from existing data.  The vast majority of their effort is spent on the analysis.  Mistakes can be identified and often corrected with what may be little more than a few extra lines of code.

For experimentalists in training though, the majority of the work comes on the front end, as does the majority of the risk.  From writing the experimental program in a language such as zTree (Fischbacher, 2007), which is generally new to the student, to physically running the experimental sessions, a student’s first experiment is an ordeal.  The stress of this endeavor is compounded when its success or failure directly relates to the student’s career trajectory and job market potential.  It is critical for the student to have solid guidance from a well-trained advisor.

This is, of course, true of all research methods.  The better a student’s training, the greater their likelihood of successful outcomes.  Data analysis training in political science graduate programs has become considerably more sophisticated in recent years, with students often required to complete three, four, or even more methods courses.  Training for experimentalists, however, exhibits considerably more variance, and formal training may be unavailable.  Some fortunate students are trained on the job, assisting more senior researchers with their experiments.  But while students benefit from an apprenticeship with an experimentalist, they suffer, ironically enough, from a lack of experimentation.

Any student can practice working with large data.  Many data sets can be accessed for free or via an institutional license.  A student can engage in atheoretical data mining and practice her analysis and interpretation of results.  She can do all of this at home with a glass of beer and the television on.  When she makes a mistake, as a young researcher is wont to do, little is lost and the student has gained a valuable lesson. Students of experiments, however, rarely get the chance to make such mistakes.  A single line of economic experiments can cost thousands of dollars and a student is unlikely to have surplus research funds with which to gain experience.  If she is lucky enough to receive research funding, it will likely be limited to subject payments for her dissertation’s experiment(s).  A single failed session could drain a meaningful portion of her budget, as subjects must be paid, even if the data is unusable.  The rule at many labs is that subjects in failed sessions must still receive their show up fees and then additional compensation for any time they have spent up to the point of the crash.  Even with modest subject payments, this could be hundreds of dollars.

How then is the experimentalist to develop her craft while under a tight budget constraint?  The answer lies in the empirical regularities discussed earlier.  The size of financial incentives in an experiment does matter, at least in terms of salience (Morton and Williams, 2010), but some effects are so robust as to be present in experiments with even trivial or non-financial incentives.  In my own classroom demonstrations, I have replicated the prisoner’s dilemma, the ultimatum game, the public goods game, and many other experiments, using only fractions of extra credit points as incentives, and the results are remarkably consistent with those in the literature.[1]  At zero financial cost, I gained experience in the programming and running of experiments and simultaneously ran a lesson on end game effects, the restart effect, and the repeated public goods game.

Not all graduate students teach courses of their own, but all graduate students have advisors or committee members who do.  It is generally less of an imposition for an advisee to ask a faculty member to grant their students a few bonus points than it is to ask for research funds, especially funds that would not be directly spent on the dissertation.  These experiments can be run in a way identical to how they would be run with monetary incentives, but without the cost or risk to the student’s career.  This practice is all the more important at institutions without established laboratories, where the student is responsible for building an ad hoc network.

Even for students with experience assisting senior researchers, independently planning and running an experiment from start to finish, without direct supervision, is invaluable practice.  The student is confronted with the dilemma of how she will run the experiment, not how her advisor would do so.  She then writes her own program and instructions, designs her own physical procedures, and plans every detail on her own.  She can and should seek advice, but she is free to learn and develop her own routine.  The experiment may succeed or fail, but the end product is similar to atheoretical playing with data.  It won’t likely result in a publication, but it will prove to be a valuable learning experience.  (Note: a well-run experiment is the result not only of a properly written program, but also of strict adherence to a set of physical procedures such as (among many others) how to seat subjects, how to convey instructions, and how to monitor laboratory conditions.  A program can be vetted in a vacuum, but the experimenter’s procedures are subject to failure in each and every session, so practice is crucial.)

Discussion

Many of the other articles in this special issue deal with the replication of studies, as a matter of good science, in line with practices in the physical sciences.  But in the physical sciences, replication also plays a key role in training.  Students often begin replicating classic experiments before they can even spell the word science.  They follow structured procedures to obtain predictable results, not to advance the leading edge of science, but to build core skills and methodological discipline.

Here though, physical scientists have a distinct advantage.  Their models are frequently based on deterministic causation and are more readily understood, operationalized, tested, and (possibly) disproved.  To the extent that students have encountered scientific models in their early academic careers, these models are likely to have been deterministic.  Most models in social science however, are probabilistic in nature.  It is somewhat understandable that a student in the social sciences, who reads her textbook and sees the mathematical beauty of rational choice, would be enamored with its clarity.  A student, particularly one who has self selected into majoring in economics or politics, can be forgiven for seeing the direct benefits of playing purely rational strategies.  It is not uncommon for an undergraduate to go her entire academic career without empirically testing a model.  By replicating classic experiments, particularly where rational choice fails, we can reinforce the idea that these are models meant to predict behavior, not instructions for how to best an opponent.

In contrast, graduate students explicitly train in designing and testing models.  A key component of training is the ability to make and learn from mistakes.  Medical students learn by practicing on cadavers who cannot suffer.  Chemists learn by following procedures and comparing results to established parameters.  Large-n researchers learn by working through replication files and testing for robustness of results.  In the same spirit, experimentalists can learn by running low-risk experiments based on established designs, with predictable results.  In doing so, even if they fail, they build competence in the skills they will need to work independently in the future.  At any rate, while the tools employed in the social sciences differ from those in the physical sciences, the goal is the same: to improve our understanding of the world around us.  Replicating economic experiments aids some in their study of human behavior and others on their path to learning how to study human behavior.  Both are laudable goals.

Notes

[1] Throughout the course, students earn “Experimental Credit Units.” The top performing student at the end of the semester receives five extra credit points.  All other students receive extra credit indexed to that of the top performer.  I would love to report the results of the experiments here, but at the time I had no intention of using the data for anything other than educational purposes and thus did not apply for IRB approval.

References

1. Andreoni, James. 1989. “Giving with Impure Altruism: Applications to Charity and Ricardian Equivalence.” The Journal of Political Economy 97(6):1447-1458.

2. Arad, Ayala & Ariel Rubinstein. 2012. “The 11-20 Money Request Game: A Level-k Reasoning Study.” The American Economic Review 102(7):3561-3573.

3. Clarke, Kevin A. & David M. Primo. 2012. A Model Discipline: Political Science and the Logic of Representations. New York, NY: Oxford University Press.

4. Engel, Christoph. 2011. “Dictator Games: A Meta Study.” Experimental Economics 14:583-610.

5. Fischbacher, Urs. 2007. “z-Tree: Zurich Toolbox for Ready-made Economic Experiments.” Experimental Economics 10(2):171-178.

6. Marwell, Gerald & Ruth E. Ames. 1981. “Economists Free Ride, Does Anyone Else? Experiments on the Provision of Public Goods, IV.” Journal of Public Economics 15(3):295-310.

7. Morton, Rebecca B. & Kenneth C. Williams. 2010. Experimental Political Science and the Study of Causality. New York, NY: Cambridge University Press.

8. Nagel, Rosemarie. 1995. “Unraveling in Guessing Games: An Experimental Study.” The American Economic Review 85(5):1313-1326.

9. Oosterbeek, Hessel, Randolph Sloof & Gijs van de Kuilen. 2004. “Cultural Differences in Ultimatum Game Experiments: Evidence from a Meta-Analysis.” Experimental Economics 7(2):171-188.


Request for Proposals for Editor, APSR

Editor’s note: this notice is posted at the request of Barbara Walthall, Managing Editor of PS, on behalf of the APSR Editorial search committee.

In June 2016, the University of North Texas editorial team at the American Political Science Review will complete its current term. APSA President Rodney Hero has named an APSR search committee to help identify a successor or successors to be presented for Council approval in August 2015.

The members of the search committee are Melanie Manion, University of Wisconsin, Madison, chair; Frank Baumgartner, University of North Carolina, Chapel Hill; Paulina Ochoa Espejo, Haverford College; Amy Mazur, Washington State University; Reuel Rogers, Northwestern University; Peter Trubowitz, London School of Economics; and Christopher Zorn, Pennsylvania State University.

The Review is the centerpiece of the association’s publications. Its contents represent the best work in political science to political scientists in the United States and abroad, to other social scientists, and to interested parties in foundations, government, and the private sector.

The search committee invites proposals for an editor or group of editors to lead the Review. The new editor or editors will be charged with maintaining the centrality of the Review to the profession and upholding the standards of excellence cultivated by Review editors since 1906. The search committee seeks an editor or editorial team with a commitment to publishing articles that represent the substantive and methodological diversity of the discipline, including qualitative and multi-method research, and support of data access and research transparency in journal publication. Any proposal should include ideas for maintaining the high standards of the journal and increasing the diversity of articles published.

Typically, the cost of running the journal is shared by the editorial home institution and APSA. Potential editors should consider this cost-sharing norm in submitting proposals to the committee. The association is especially open to proposals that include innovative individual and institutional collaborations and welcomes preliminary discussions with candidates about how such proposals might be structured and funded.

Proposals, accompanied by resumes, should be sent to apsaeditorsearch@apsanet.org. Questions about the search and the editor position can be directed to Steven Rathgeb Smith, executive director of APSA, at smithsr@apsanet.org. The committee will begin to review proposals on April 15, 2015. The committee may review proposals submitted after the deadline. Proposals submitted on or before the deadline will receive full consideration.


A Decade of Replications: Lessons from the Quarterly Journal of Political Science

Editor’s note: this piece is contributed by Nicholas Eubank, a PhD Candidate in Political Economy at the Stanford University Graduate School of Business.

The success of science depends critically on the ability of peers to interrogate published research in an effort not only to confirm its validity but also to extend its scope and probe its limitations. Yet as social science has become increasingly dependent on computational analyses, traditional means of ensuring the accessibility of research — like peer review of written academic publications — are no longer sufficient. To truly ensure the integrity of academic research moving forward, it is necessary that published papers be accompanied by the code used to generate results. This will allow other researchers to investigate not just whether a paper’s methods are theoretically sound, but also whether they have been properly implemented and are robust to alternative specifications.

Since its inception in 2005, the Quarterly Journal of Political Science (QJPS) has sought to encourage this type of transparency by requiring all submissions to be accompanied by a replication package, consisting of data and code for generating paper results. These packages are then made available with the paper on the QJPS website. In addition, all replication packages are subject to internal review by the QJPS prior to publication. This internal review includes ensuring the code executes smoothly, results from the paper can be easily located, and results generated by the replication package match those in the paper.

This policy is motivated by the belief that publication of replication materials serves at least three important academic purposes. First, it helps directly ensure the integrity of results published in the QJPS. Although the in-house screening process constitutes a minimum bar for replication, it has nevertheless identified a remarkable number of problems in papers. In the last two years, for example, 13 of the 24 empirical papers subject to in-house review were found to have discrepancies between the results generated by authors’ own code and the results in their written manuscripts.

Second, by emphasizing the need for transparent and easy-to-interpret code, the QJPS hopes to lower the costs associated with other scholars interrogating the results of existing papers. This increases the probability other scholars will examine the code for published papers, potentially identifying errors or issues of robustness if they exist. In addition, while not all code is likely to be examined in detail, it is the hope of the QJPS that this transparency will motivate submitting authors to be especially cautious in their coding and robustness checks, preventing errors before they occur.

Third and finally, publication of transparent replication packages helps facilitate research that builds on past work. Many papers published in the QJPS represent methodological innovations, and by making the code underlying those innovations publicly accessible, we hope to lower the cost to future researchers of building on existing work.

(1) In-House Replication

The experience of the QJPS in its first decade underscores the importance of its policy of in-house review. Prior to publication, all replication packages are tested to ensure code runs cleanly, is interpretable, and generates the results in the paper.

This level of review represents a sensible compromise between the two extremes of review. On the one hand, most people would agree that an ideal replication would consist of a talented researcher re-creating a paper from scratch based solely on the paper’s written methodology section. However, undertaking such replications for every submitted paper would be cost-prohibitive in time and labor, as would having someone check an author’s code for errors line-by-line. On the other hand, direct publication of replication packages without review is also potentially problematic. Experience has shown that many authors submit replication packages that are extremely difficult to interpret or may not even run, defeating the purpose of a replication policy.

Given that the QJPS review is relatively basic, however, one might ask whether it is even worth the considerable time the QJPS invests. Experience has shown the answer is an unambiguous “yes.” Of the 24 empirical papers subject to in-house replication review since September 2012, [1] only 4 packages required no modifications. Of the remaining 20 papers, 13 had code that would not execute without errors, 8 failed to include code for results that appeared in the paper, [2] and 7 failed to include installation directions for software dependencies. Most troubling, however, 13 (54 percent) had results in the paper that differed from those generated by the author’s own code. Some of these issues were relatively small — likely arising from rounding errors during transcription — but in other cases they involved incorrectly signed or mis-labeled regression coefficients, large errors in observation counts, and incorrect summary statistics. Frequently, these discrepancies required changes to full columns or tables of results. Moreover, Zachary Peskowitz, who served as the QJPS replication assistant from 2010 to 2012, reports similar levels of replication errors during his tenure as well. The extent of the issues — which occurred despite authors having been informed their packages would be subject to review — points to the necessity of this type of in-house interrogation of code prior to paper publication.

(2) Additional Considerations for a Replication Policy

This section presents an overview of some of the most pressing and concrete considerations the QJPS has come to view as central to a successful replication policy. These considerations — and the specific policies adopted to address them — are the result of hard-learned lessons from a decade of replication experience.

2.1 Ease of Replication
The primary goal of QJPS policies is ensuring replication materials can be used and interpreted with the greatest of ease. To the QJPS, ease of replication means anyone who is interested in replicating a published article (hereafter, a “replicator”) should be able to do so as follows:

  1. Open a README.txt file in the root replication folder, and find a summary of all replication materials in that folder, including subfolders if any.
  2. After installing any required software (see Section 2.4 on Software Dependencies) and setting a working directory according to directions provided in the README.txt file, the replicator should be able simply to open and run the relevant files to generate every result and figure in the publication. This includes all results in print and/or online appendices.
  3. Once the code has finished running, the replicator should be able easily to locate the output and to see where that output is reported in the paper’s text, footnotes, figures, tables, or appendices.

2.2 README.txt File

To facilitate ease of replication, all replication packages should include a README.txt file that includes, at a minimum:

  1. Table of Contents: a brief description of every file in the replication folder.
  2. Notes for Each Table and Figure: a short list of where replicators will find the code needed to replicate all parts of the publication.
  3. Base Software Dependencies: a list of all software required for replication, including the version of software used by the author (e.g. Stata 11.1, R 2.15.3, 32bit Windows 7, OSX 10.9.4).
  4. Additional Dependencies: a list of all libraries or added functions required for replication, as well as the versions of the libraries and functions that were used and the location from which those libraries and functions were obtained.
    1. R: the current R version can be found by typing R.Version(), and information on loaded libraries can be found by typing sessionInfo().
    2. Stata: Stata does not specifically “load” extra functions in each session, but a list of all add-ons installed on a system can be found by typing ado dir.
  5. Seed locations: Authors are required to set seeds in their code for any analyses that employ randomness (e.g., simulations or bootstrapped standard errors; for further discussion, see Section 2.5). The README.txt file should include a list of locations where seeds are set in the analyses so that replicators can find and change the seeds to check the robustness of the results.

2.3 Depth of Replication

The QJPS requires that every replication package include the code that computes the primary results of the paper. In other words, it is not sufficient to provide a file of pre-computed results along with the code that formats the results for LaTeX. Rather, the replication package must include everything that is needed to execute the statistical analyses or simulations that constitute the primary contribution of the paper. For example, if a paper’s primary contribution is a set of regressions, then the data and code needed to produce those regressions must be included. If a paper’s primary contribution is a simulation, then code for that simulation must be provided—not just a dataset of the simulation results. If a paper’s primary contribution is a novel estimator, then code for the estimator must be provided. And, if a paper’s primary contribution is theoretical and numeric simulation or approximation methods were used to provide the equilibrium characterization, then that code must be included.

Although the QJPS does not necessarily require the submitted code to access the data if the data are publicly available (e.g., data from the National Election Studies, or some other data repository), it does require that the dataset containing all of the original variables used in the analysis be included in the replication package. For the sake of transparency, the variables should be in their original, untransformed and unrecoded form, with code included that performs the transformations and recodings in the reported analyses. This allows replicators to assess the impact of transformations and recodings on the results.

2.3.1 Proprietary and Non-Public Data
If an analysis relies on proprietary or non-public data, authors are required to contact the QJPS Editors before or during initial submission. Even when data cannot be released publicly, authors are often required to provide QJPS staff access to data for replication prior to publication. Although this sometimes requires additional arrangements — in the past, it has been necessary for QJPS staff to be written into IRB authorizations — in-house review is especially important in these contexts, as papers based on non-public data are difficult if not impossible for other scholars to interrogate post-publication.

2.4 Software Dependencies
Online software repositories — like CRAN or SSC — provide authors with easy access to the latest versions of powerful add-ons to standard programs like R and Stata. Yet the strength of these repositories — their ability to ensure authors are always working with the latest version of add-ons — is also a liability for replication.

Because online repositories always provide the most recent version of add-ons to users, the software provided in response to a given query actually changes over time. Experience has shown this can cause problems when authors use calls to these repositories to install add-ons (through commands like install.packages(“PACKAGE”) in R or ssc install PACKAGE in Stata). As scholars may attempt to replicate papers months or years after a paper has been published, changes in the software provided in response to these queries may lead to replication failures. Indeed, the QJPS has experienced replication failures due to changes in the software hosted on the CRAN server that occurred between when a paper was submitted to the QJPS and when it was reviewed.

With that in mind, the QJPS now requires authors to include copies of all software (including both base software and add-on functions and libraries) used in the replication in their replication package, as well as code that installs these packages on a replicator’s computer. The only exceptions are extremely common tools, like R, Stata, Matlab, Java, Python, or ArcMap (although Java- and Python-based applications must be included). [3]
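In R, for example, a bundled add-on can be installed from the replication folder rather than from CRAN (a sketch; the package file name here is hypothetical):

install.packages("lme4_1.1-7.tar.gz", repos = NULL, type = "source")

Setting repos to NULL forces R to install from the local source archive shipped with the replication package, so the replicator gets exactly the version the author used.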

2.5 Randomizations and Simulations

A large number of modern algorithms employ randomness in generating their results (e.g., the bootstrap). In these cases, replication requires both (a) ensuring that the exact results in the paper can be re-created, and (b) ensuring that the results in the paper are typical rather than cherry-picked outliers. To facilitate this type of analysis, authors should:

  1. Set a random number generator seed in their code so it consistently generates the exact results in the paper;
  2. Provide a note in the README.txt file indicating the location of all such commands, so replicators can remove them and test the representativeness of the results.
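A minimal sketch of this convention in R (the seed value and the bootstrap calculation are illustrative only):

set.seed(20150301)                        # location of this call is noted in README.txt
x <- rnorm(100)                           # stand-in data for illustration
boot.means <- replicate(1000, mean(sample(x, replace = TRUE)))
quantile(boot.means, c(0.025, 0.975))     # the bootstrapped interval is now exactly reproducible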

In spite of these precautions, painstaking experience has shown that setting a seed is not always sufficient to ensure exact replication. For example, some libraries generate slightly different results on different operating systems (e.g. Windows versus OSX) and on different hardware architectures (e.g. 32-bit Windows 7 versus 64-bit Windows 7). To protect authors from such surprises, we encourage authors to test their code on multiple platforms, and document any resulting exceptions or complications in their README.txt file.

2.6 ArcGIS
Although we encourage authors to write replication code for their ArcGIS-based analyses using the ArcPy scripting utility, we recognize that most authors have yet to adopt this tool. For the time being, the QJPS accepts detailed, step-by-step instructions for replicating results via the ArcGIS Graphical User Interface (GUI). However, as with the inclusion and installation of add-on functions, the QJPS has made a tutorial on using ArcPy available to authors, which we hope will accelerate the transition toward use of this tool. [4]

(3) Advice to Authors
In addition to the preceding requirements, the QJPS also provides authors with some simple guidelines to help prevent common errors. These suggestions are not mandatory, but they are highly recommended.

  1. Test files on a different computer, preferably with a different operating system: Once replication code has been prepared, the QJPS suggests authors email it to a different computer, unzip it, and run it. Code often contains small dependencies—things like unstated software requirements or specific file locations—that go unnoticed until replication. Running code on a different computer often exposes these issues in a way that running the code on one’s own does not.
  2. Check every code-generated result against your final manuscript PDF: The vast majority of replication problems emerge because authors either modified their code but failed to update their manuscript, or made an error while transcribing their results into their paper. With that in mind, authors are strongly encouraged to print out a copy of their manuscript and check each result before submitting the final version of the manuscript and replication package.

(4) Conclusion

As the nature of academic research changes, becoming ever more computationally intense, so too must the peer review process. This paper provides an overview of many of the lessons learned by the QJPS‘s attempt to address this need. Most importantly, however, it documents not only the importance of requiring the transparent publication of replication materials but also the strong need for in-house review of these materials prior to publication.

 

 

[1] September 2012 is when the author took over responsibility for all in-house interrogations of replication packages at the QJPS.

[2] This does not include code which failed to execute, which might also be thought of as failing to replicate results from the paper.

[3] To aid researchers in meeting this requirement, detailed instructions on how to include CRAN or SSC packages in replication packages are provided through the QJPS.

[4] ArcPy is a Python-based tool for scripting in ArcGIS.


Reproducibility and Transparency

The Political Methodologist is joining with 12 other political science journals in signing the Data Access and Research Transparency (DA-RT) joint statement.

The social sciences receive little respect from politicians and segments of the mass public. There are many reasons for this.

A partial solution to building trust is to increase the transparency of our claims, and this is why The Political Methodologist is signing on to DA-RT.

As researchers we need to ensure that the claims we make are supported by systematic argument (either formal or normative theory) or by marshaling empirical evidence (either qualitative or quantitative). I am going to focus on empirical quantitative claims here (in large part because many of the issues I point to are more easily solved for quantitative research). The idea of DA-RT is simple and has three elements. First, an author should ensure that data are available to the community. This means putting them in a trusted digital repository. Second, an author should ensure that the analytic procedures on which the claims are based are public record. Third, data and analytic procedures should be properly cited with a title, version, and persistent identifier. Interest in DA-RT extends beyond political science. On November 3-4, 2014, the Center for Open Science co-sponsored a workshop designed to produce standards for data accessibility, transparency and reproducibility. At the table were journal editors from the social sciences and Science. The latter issued a rare joint editorial with Nature detailing standards for the biological sciences to ensure reproducibility and transparency. Science magazine aims to do the same for the social sciences.

Ensuring that our claims are grounded in evidence may seem non-controversial. Foremost, the evidence used to generate claims needs to be publicly accessible and interpretable. For those using archived data (e.g., COW or ANES) this is relatively easy. For those collecting original data it may be more difficult. Original data require careful cataloging in a trusted digital repository (more on this in a bit). It means that the data you have carefully collected will persist and will be available to other scholars. Problematic are proprietary data. Some data may be sensitive, some may be protected under Human Subjects provisions and some may be privately owned. In lieu of providing such data, authors have a special responsibility to carefully detail the steps that could, in principle, be taken to access the data. Absent the data supporting claims, readers should be skeptical of any conclusions drawn by an author.

Surprisingly, there are objections to sharing data. Many make the claim that original data are proprietary. After all, the researcher worked hard to generate them and doesn’t need to share. This is not a principled defense. If the researcher chooses not to share data, I see no point in allowing the researcher to share findings. Both can remain private. A second claim to data privacy is that the data have not yet been fully exploited. Editors have the ability to embargo the release of data, although this should happen only under rare circumstances. It seems odd that a researcher would request an embargo, given that the data of concern are those that support the researcher’s claims. Unless the author intends to use exactly the same data for another manuscript, there is no reason to grant an embargo. If the researcher is intending to use exactly the same data, editors should be concerned about self-plagiarism. Requirements for reproducibility should focus on the data used to make a claim.

The second feature of reproducibility and transparency involves making the analytic procedures publicly available. This gets to the key element of transparency. The massaged data that are publicly posted have been generated through numerous decisions by the researcher. A record of those decisions is critical for understanding the basis of empirical claims. For most researchers, this means providing a complete listing of data transformation steps. All statistical programs allow for some form of a log file that documents what a researcher did. More problematic may be detailing the instruments that generated some of the data. Code used for scraping data from websites, videos used as stimuli for an experiment, or physical recording devices all pose problems for digital storage. However, if critical for reaching conclusions, a detailed record of the steps taken by a researcher must be produced. The good news is that most young scholars are trained to do this routinely.
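A minimal sketch of such a self-documenting transformation script in R (the variable names and the recode are illustrative, not drawn from any particular study):

sink("transformations.log", split = TRUE)    # keep a plain-text record of every step
dat <- data.frame(income = c(12000, 55000, 87000))
dat$income.logged <- log(dat$income)         # the recode lives in code, not in a spreadsheet
write.csv(dat, "analysis_data.csv", row.names = FALSE)
sink()                                       # close the log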

There are objections to providing this kind of information. Typically they have to do with it being too difficult to recreate what was done to get to the final data set. If true, then it is likely that the data are problematic. If the researcher is unable to recreate the data, then how can they be judged?

The final element of transparency deals with the citation of data and code. This has to be encouraged. Assembling and interpreting data is an important intellectual endeavor. It should be rewarded by proper citation – not just by the researcher, but by others. This means that the record of the researcher must have a persistent and permanent location. Here is where trusted digital repositories come into play. These may be partners in the Data Preservation Alliance for the Social Sciences (Data-PASS, http://www.data-pass.org) or institutional repositories. They are not an author’s personal website. If you’re like me, your website is outdated and you should not be trusted to maintain it. The task of a trusted data repository is to ensure that the data are curated and appropriately archived. Repositories do not have the responsibility for documenting the data and code – this is the responsibility of the researcher. All too often stored data have obscure variable names that are only meaningful to the researcher, and there is little way to match the data to what the researcher did in a published article.

The aim of transparency, of course, is to ensure that claims can be subjected to replication. Replication has a troubled history in that it often looks like “gotcha” journalism. There is bias in publication in that replications overturning a finding are much more likely to be published. This obscures the denominator and raises the question of how often findings are confirmed, rather than rejected. We have very few means for encouraging the registration of replications. That is a shame, since we have as much to learn from instances where a finding appears to be confirmed as from instances where it does not. If journals had unlimited resources, no finding would be published unless independently replicated. This isn’t going to happen. However, good science should ensure that findings are not taken at face value, but subjected to further test. In this age of electronic publication it is possible to link to studies that independently replicate a finding. Journals and associations are going to have to be more creative about how claims published in their pages are supported. Replication is going to have to be embraced.

It may be that authors will resist data sharing or making their analytic decisions public. However, resistance may be futile. The journals, including The Political Methodologist, are taking the high road and eventually will require openness in science.


Introduction to the Special Issue on Replication

I’m pleased to begin publishing contributions to TPM’s special issue on replication on our blog today. We begin with a piece by one of our associate editors, Rick Wilson, on the Data Access and Research Transparency initiative and the value of reproducibility and transparency to political scientists. In the coming weeks, we will feature contributions by other political scientists on similar topics. All contributions will be included in the Fall 2014 edition of TPM.

I hope you enjoy this series of articles on a very important and active area of research methodology.


Encountering your IRB: What political scientists need to know

Editor’s note: This is a condensed version of an essay appearing in Qualitative & Multi-Method Research [Newsletter of the APSA Organized Section for Qualitative and Multi-Method Research] Vol. 12, No. 2 (Fall 2014). The shorter version is also due to appear in the newsletters of APSA’s Immigration & Citizenship and Law & Courts sections. The original, which is more than twice the length and develops many of these ideas more fully, is available from the authors (Dvora.Yanow@wur.nl, psshea@poli-sci.utah.edu).

This post is contributed by Dvora Yanow (Wageningen University) and Peregrine Schwartz-Shea (University of Utah).

Pre-script.  After we finished preparing this essay, a field experiment concerning voting for judges in California, Montana, and New Hampshire made it even more relevant. Three political scientists—one at Dartmouth, two from Stanford—mailed potential voters about 300,000 flyers marked with the states’ seals, containing information about the judges’ ideologies. Aside from questions of research design, whether the research passed IRB review is not entirely clear (reports say it did not at Stanford but was at least submitted to the Dartmouth IRB; for those who missed the coverage, see this link and political scientist Melissa Michelson’s blog, both accessed November 3, 2014). Two bits of information offer plausible explanations for what have been key points in the public discussion:

  1. Stanford may have had a reliance agreement with Dartmouth, meaning that it would accept Dartmouth’s IRB’s review in lieu of its own separate review;
  2. Stanford and Dartmouth may have “unchecked the box” (see below), relevant here because the experiments were not federally funded, meaning that IRB review is not mandated and that universities may devise their own review criteria.

Still, neither explains what appear to be lapses in ethical judgment in designing the research (among others, using the state seals without permission and thereby creating the appearance of an official document). We find this a stellar example of a point we raise in the essay: the discipline’s lack of attention to research ethics, possibly due to reliance on IRBs and the compliance ethics that IRB practices have inculcated.

 * * *

Our continuing research on US Institutional Review Board (IRB) policies and practices (Schwartz-Shea and Yanow 2014, Yanow and Schwartz-Shea 2008) shows that many political scientists lack crucial information about these matters. To facilitate political scientists’ more effective interactions with IRB staff and Boards, we would like to share some insights gained from this research.

University IRBs implement federal policy, monitored by the Department of Health and Human Services’ Office of Human Research Protections (OHRP). The Boards themselves are composed of faculty colleagues (sometimes social scientists) plus a community member. IRB administrators are often not scientists (of any sort), and their training is oriented toward the language and evaluative criteria of the federal code. Indeed, administering an IRB has become a professional occupation with its own training and certification. IRBs review proposals to conduct research involving “human subjects” and examine whether potential risks to them have been minimized, assessing those risks against the research’s expected benefits to participants and to society. They also assess researchers’ plans to provide informed consent, protect participants’ privacy, and keep the collected data confidential.

The federal policy was created to rest on local Board decision-making and implementation, leading to significant variation in its interpretation across campuses. Differences in practices often hinge on whether a university has a single IRB evaluating all forms of research or different ones for, e.g., medical and social science research. Researchers therefore need to know their own institutions’ IRBs. In addition, familiarity with key IRB policy provisions and terminologies will help. We explain some of this “IRB-speak” and then turn to some procedural matters, including those relevant to field researchers conducting interviews, participant-observation/ethnography, surveys, and/or field experiments, whether domestically or overseas.

IRB-speak: A primer

Part of what makes the IRB review process potentially challenging is its specialized language. Regulatory and discipline-based understandings of various terms do not always match. Key vocabulary includes the following.

  • “Research.” IRB regulations tie this term’s meaning to the philosophically-contested idea of “generalizable knowledge” (CFR §46.102(d)). This excludes information-gathering for other purposes and, on some campuses, other scholarly endeavors (e.g., oral history) and course-related exercises.
  • “Human subject.” This is a living individual with whom the researcher interacts to obtain data. “Interaction” is defined as “communication or interpersonal contact between investigator and subject” (CFR §46.102(f)). But “identifiable private information” obtained without interaction, such as through the use of existing records, also counts.
  • “Minimal risk.” Research poses no more than minimal risk when “the probability and magnitude of harm or discomfort anticipated in the research are not greater in and of themselves than those ordinarily encountered in daily life or during the performance of routine physical or psychological examinations or tests” (CFR §46.102(i)). But everyday risks vary across subgroups in American society, not to mention worldwide, and IRB reviewers have been criticized for their lack of expertise in risk assessment, leading them to misconstrue the risks associated with, e.g., comparative research (Schrag 2010, Stark 2012).
  • “Vulnerable populations.” Six categories of research participants “vulnerable to coercion or undue influence” are subject to additional safeguards: “children, prisoners, pregnant women, mentally disabled persons, or economically or educationally disadvantaged persons” (CFR 46.111(b)). Federal policy enables universities also to designate other populations as “vulnerable,” e.g., Native Americans.
  • Levels of review. Usually, IRB staff decide a proposed project’s level of required review: “exempt,” “expedited,” or “convened” full Board review. “Exempt” does not mean that research proposals are not reviewed. Rather, it means exemption from full Board review, a status that can be determined only via some IRB assessment. Only research entailing no greater than minimal risk is eligible for “exempt” or “expedited” review. The latter means assessment by either the IRB chairperson or his/her designee from among Board members. This reviewer may not disapprove the proposal, but may require changes to its design. Projects that entail greater than minimal risk require “convened” (i.e., full) Board review.
  • Exempt category: Methods. Survey and interview research and observation of public behavior are exempt from full review if the data so obtained do not identify individuals and would not place them at risk of “criminal or civil liability or be damaging to the subjects’ financial standing, employability, or reputation” if their responses were to be revealed “outside of the research” (CFR 46.101(b)(2)(ii)). Observing public behaviors as political events take place (think: “Occupy”) is central to political science research. Because normal IRB review may delay the start of such research, some IRBs have an “Agreement for Public Ethnographic Studies” that allows observation to begin almost immediately, possibly subject to certain stipulations.
  • Exempt category: Public officials. IRB policy explicitly exempts surveys, interviews, and public observation involving “elected or appointed public officials or candidates for public office” (45 CFR §46.101(b)(3))—although who, precisely, is an “appointed public official” is not clear. This exemption means that researchers studying public officials using any of these three methods might—in complete compliance with the federal code—put them at risk for “criminal or civil liability” or damage their “financial standing, employability, or reputation” (CFR §46.101(b)(2)). The policy is consistent with legal understandings that public figures bear different burdens than private citizens.
  • Exempt category: Existing data. Federal policy exempts from full review “[r]esearch involving the collection or study of existing data, documents, [or] records, … if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects” (§46.101(b)(4)). However, university IRBs vary considerably in how they treat existing quantitative datasets, such as the Inter-University Consortium for Political and Social Research collection (see icpsr.umich.edu/icpsrweb/ICPSR/irb/). Some universities require researchers to obtain IRB approval to use any datasets not on a preapproved list even if those datasets incorporate a responsible use statement.
  • “Unchecking the box.” The “box” in question appears in the Federal-wide Assurance form that universities file with OHRP, registering their intention to apply IRB regulations to all human subjects research conducted by employees and students, regardless of funding source. “Unchecking” the box indicates that the IRB will omit from review any research funded by sources other than HHS (thereby limiting OHRP jurisdiction over such studies). IRB administrators may still, however, require proposals for unfunded research to be reviewed.

Procedural matters: Non-experimental field research

Because the experimental research design model informed the creation of IRB policy and remains the design most familiar to policy-makers, Board members, and staff, field researchers face particular challenges in IRB review.

Because the forms and online application sites developed for campus IRB use reflect this policy history, some of their language is irrelevant to non-experimental field research designs (e.g., the number of participants to be “enrolled” in a study, or “inclusion” and “exclusion” criteria, features of laboratory experiments or medical randomized controlled clinical trials). Those templates can be frustrating for researchers trying to fit them to field designs. Although conforming to language that does not fit the methodology of the proposed research might seem expeditious, doing so can lead field researchers to distort the character of their research.

IRB policy generally requires researchers to inform potential participants—to “consent” them—about the scope of both the research and its potential harms, whether physical, mental, financial, or reputational. Potential subjects also need to be informed about possible identity revelations that could render them subject to criminal or civil prosecution (e.g., the unintentional public revelation of undocumented workers’ identities). Central to the consent process is the concern that potential participants not be coerced into participating and that they understand that they may stop their involvement at any time. Not always well known is that the federal code allows more flexibility than some local Boards recognize. For minimal risk research, it allows: (a) removal of some of the standard consent elements; (b) oral consent without signed forms; and (c) waiver of the consent process altogether if the “research could not practicably be carried out without the waiver or alteration” (CFR §46.116(c)(2)).

Procedural matters: General

Backlogs in the IRB review process can significantly delay the start of a research project. Adding to the potential delay is many universities’ requirement that researchers complete some form of training before they submit their study for review. Such delay has implications for field researchers negotiating site “access” to begin research and for all empirical researchers receiving grants, which are usually not released until IRB approval is granted. Researchers should find out their campus IRB’s turnaround time as soon as they begin to prepare their proposals.

Collaborating with colleagues at other universities can also delay the start of a research project. Federal code explicitly allows a university to “rely upon the review of another qualified IRB…[to avoid] duplication of effort” (CFR §46.114), and some IRBs are content to have only the lead researcher proceed through her own campus review. Other Boards insist that all participating investigators clear their own campus IRBs. With respect to overseas research, solo or with foreign collaborators, although federal policy recognizes and makes allowances for international variability in ethics regulation (CFR §46.101(h)), some US IRBs require review by a foreign government or research setting or by the foreign colleague’s university’s IRB, not considering that not all universities or states, worldwide, have IRBs. Multiple review processes can make coordinated review for a jointly written proposal difficult. Add to that different Boards’ interpretations of what the code requires, and one has a classic instance of organizational coordination gone awry.

In sum

On many campuses, political (and other social) scientists doing field research are faced with educating IRB members and administrative staff about the ways in which their methods differ from the experimental studies performed in hospitals and laboratories. Understanding the federal regulations can put researchers on more solid footing in pointing to permitted research practices that their local Boards may not recognize. And knowing IRB-speak can enable clearer communication between researchers and Board members and staff. Though challenging, educating staff as well as Board members potentially benefits all field researchers, graduate students in particular, some of whom have given up on field research due to IRB delays, which are often greater for research that does not fit the experimental model (van den Hoonaard 2011).

IRB review is no guarantee that the ethical issues relevant to a particular research project will be raised. Indeed, one of our concerns is the extent to which IRB administrative processes are replacing research ethics conversations that might otherwise (and, in our view, should) be part of departmental curricula, research colloquia, and discussions with supervisors and colleagues. Moreover, significant ethical matters of particular concern to political science research are simply beyond the bounds of US IRB policy, including recognition of the ways in which current policy makes “studying up” (i.e., studying societal elites and other power holders) more difficult.

Change may still be possible. In July 2011, OHRP issued an Advance Notice of Proposed Rulemaking, calling for comments on its proposed regulatory revisions. As of this writing, the Office has not yet announced an actual policy change (which would require its own comment period). OHRP has proposed revising several of the requirements discussed in this essay, including allowing researchers themselves to determine whether their research is “excused” (their suggested replacement for “exempt”). Because of IRB policies’ impact, we call on political scientists to monitor this matter. Much attention has rightly been focused on Congressional efforts to curtail National Science Foundation funding, but IRB policy affects all research engaging human participants and deserves as much disciplinary attention.

References

Schrag, Zachary M. 2010. Ethical imperialism: Institutional review boards and the social sciences, 1965–2009. Baltimore, MD: Johns Hopkins University Press.

Schwartz-Shea, Peregrine and Yanow, Dvora. 2014. Field research and US institutional review board policy. Betty Glad Memorial Symposium, University of Utah (March 20-21). http://poli-sci.utah.edu/2014-research-symposium.php

Stark, Laura. 2012. Behind closed doors: IRBs and the making of ethical research. Chicago: University of Chicago Press.

US Code of Federal Regulations. 2009. Title 45, Public Welfare, Department of Health and Human Services, Part 46, Protection of human subjects. www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.html.

van den Hoonaard, Will C. 2011. The seduction of ethics. Toronto: University of Toronto Press.

Yanow, Dvora and Schwartz-Shea, Peregrine. 2008. Reforming institutional review board policy. PS: Political Science & Politics 41/3, 484-94.

 
