PNAS is Not a Good Journal
(and Other Hard Truths about Journal Prestige)
A few years ago in our diversity science reading group we read a paper published in the prestigious Proceedings of the National Academy of Sciences (PNAS, or PPNAS, if you prefer). It was an impressive-looking article that generated a spirited discussion. The authors stated that the study was preregistered, and at some point in the discussion, I mentioned that they had deviated so markedly from their preregistration plan—without disclosing this in the paper, mind you—that I thought everyone should interpret the findings with a healthy dose of skepticism. Following this reveal, one of the students asked, “How did this get published in PNAS?” My response: “Well, PNAS is not a good journal.”
Despite the clickbait title and the opening anecdote, this post is not about PNAS, per se. Sure, I stand by my statement that PNAS is not a good journal—a point I will elaborate on later—but neither are other journals at the top of the prestige market. This includes other vanity journals such as Science and Nature, but also the “top” journals in my field, psychology, such as Psychological Science, Journal of Personality and Social Psychology, Child Development, and so on. This post is about how we determine what makes for a “good journal,” and how all available data indicate that we are wrong. At the end, I will also briefly comment on how our erroneous beliefs about journal prestige shape empirical work on replications and metascience.
What is a “Good” Journal?
There are two primary sources of information we rely on to determine the quality of a journal. First is received wisdom: Certain journals are good, prestigious, and desirable to publish in because people say they are. This knowledge of the journal hierarchy is socialized to early career researchers, who then internalize it and socialize it to others. Although this process certainly still operates, over the last 20+ years it has largely given way1 to the second dominant indicator of journal quality: the journal impact factor.
Real quick—because I am always shocked that people don’t know this—the impact factor is calculated via a simple formula: the number of citations in a given year to articles a journal published over the prior two-year period, divided by the number of citable items the journal published during that same period. In other words, it corresponds to the average number of citations to an article published in a journal during a specific period2. Although not intended for this purpose when developed, the journal impact factor is used as a relative quality ranking to create a hierarchy of prestige.
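To make the arithmetic concrete, here is the calculation as a tiny Python sketch. The journal and its numbers are entirely made up for illustration:

```python
def impact_factor(citations, citable_items):
    """E.g., the 2024 impact factor = citations received in 2024 to items
    the journal published in 2022-2023, divided by the number of citable
    items it published in 2022-2023."""
    return citations / citable_items

# Hypothetical journal: 2,400 citations in 2024 to its 600 citable
# items from 2022-2023.
print(impact_factor(2400, 600))  # 4.0
```

That is the whole metric. Note that what counts as a “citable item” in the denominator is where much of the mischief happens, as discussed in the footnotes.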
That all sounds reasonable, I suppose, until you discover that the empirical research on the impact factor indicates no association, or in some cases a negative association, with nearly any quality indicator you can think of. If you don’t believe me, then you may want to take a look here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, or here. Defenders of the impact factor will sometimes point out that it is associated with more citations to papers in the journal, to which I respond: please read the previous paragraph on how the impact factor is calculated, as well as papers like this that show how retracted articles continue to get cited at alarming rates, or this one that shows that non-replicated findings are cited more than replicated ones, or this one that shows that citations are status driven, or even this one that shows general sloppiness in citation practices among metascientists, wouldn’t you know.
One of my favorite examples of how journals with higher impact factors publish lower quality research comes from observations on the decline effect. Initial “discoveries” tend to have large effects and are thus published in journals with higher impact factors. Over time, subsequent studies yield smaller effect sizes than the original, but tend to be published in journals with lower impact factors. Figure 1, taken from Brembs, Button, and Munafò (2013), is a striking visual representation of this pattern using data on candidate gene studies (DRD2) and alcoholism.
Why is our primary indicator of journal quality so bad? There are quite a lot of reasons for this, explored in depth by the sources I linked previously, but they all generally trace back to the root problem of making publication decisions based on the findings of a study rather than based on the quality of the conceptualization and design (obligatory plug for Registered Reports here). This decision criterion leads to researchers cutting all kinds of corners to achieve findings that are publishable in “high impact” journals, and cutting all kinds of corners is generally not a good practice for ensuring quality.
What’s So Bad about PNAS Anyway?
Back to my original point: PNAS is not a good journal. This is true in two ways. First, following from the above, there is really no such thing as a “good journal.” Journals are simply not diagnostic of the articles published therein, and thus there is no way any particular journal could be construed as “good.” To be clear, here I am talking about what most people would consider legitimate journals. Predatory journals are clearly very bad. Things get murky quickly, though, once we set aside the obviously predatory journals. Publishers that I consider quasi-predatory, such as Frontiers, MDPI, and Hindawi, will often publish high quality and important articles, despite the fact that their general brand is rather poor and bad for science. But once you make that argument, you have to also recognize that the operating principles of the big publishers such as Sage, Wiley, and APA are also rather poor and bad for science.
Second, putting this general argument about the concept of the “good journal” aside, PNAS is remarkable in being particularly bad. It has published a variety of questionable papers, from the absurd, to the ridiculous, to the embarrassing3. But this is true of all journals, so PNAS is not special in this regard. Rather, my assessment is based on a single feature of the journal: the contributed track. Yes, PNAS has two tracks for submission, the direct submissions that are the typical procedure for journal publishing amongst us plebes, and the contributed submissions, in which members of the National Academy of Sciences can submit their papers (up to two per year!) and get to select their own reviewers. Surprisingly, data from 2013 indicate a 98% acceptance rate for contributed submissions vs. only 18% for direct submissions4. I am no great defender of the value of peer review, but even I see some major structural problems with the practice of “publish whatever you want” in a journal that is perceived to be one of the best in all of the sciences. That this is a feature of an ostensibly leading journal in the field is a serious indictment of our entire system. I do not take PNAS seriously, and neither should you.
Oh, but those pesky perceptions! Publishers are experts at exploiting the academic need for prestige. Springer Nature is the master of this. Whereas there was once only one Nature, which sat at the very top of the steaming pile of journal prestige, there are now 69 “Nature” branded journals. Just looking at my field of psychology illustrates how absurd things have become: we have Nature, then Nature Human Behaviour, then Nature Mental Health, and now—new for 2022!!—Nature Reviews Psychology. In November 2020 we got some internet yuks about the imminent emergence of Nature Total Landscaping, and whereas that title does not yet exist in name, it certainly exists in spirit. Things have gotten so out of hand that some years ago the Big Ten Academic Alliance of Libraries told Springer Nature that they would no longer play their game, refusing to subscribe to any new Nature journals. Things have gotten so out of hand that a while back when I saw someone reference PNAS Nexus I thought it was a joke, that they were just trolling Nature. Sadly, I was wrong. PNAS is not good, and PNAS Nexus is certainly not good. None of the 69 Nature Total Landscaping journals are good. None of this is good.
Empirical Metascience Relies Too Heavily on Prestigious Journals
I have been meaning to write this post for a long time, but I finally moved it up my imaginary to-do list after coming across a new paper in…PNAS. The paper, A discipline-wide investigation of the replicability of psychology papers over the past two decades, by Youyou, Yang, and Uzzi, relies on a text-based analysis of published papers to estimate replication rates across six subfields of psychology. I have not had an opportunity to go deep into the paper, and thus I don’t have anything to say about whether or not their claims are appropriately calibrated to their analytic approach, but I have read enough to make shallow comments about whether or not their claims are appropriately calibrated to their source data. The key figure—the one that was widely shared online and led to either hand-wringing or self-righteousness, depending on your subfield alignment—plotted the replicability estimates against the prevalence of experimental methods (Figure 2).
Depending on how you interpret these data, they make personality and organizational psychology look pretty good and developmental and social psychology look rather bad. There is a lot to say about this figure and what these data mean, most of which I will not say right now. Rather, the problem with this figure is that the data do not represent “personality psychology” or “developmental psychology.” The authors scraped data from six “top-tier” journals, and used their estimates from those journals to draw conclusions about the entire subfields. For example, the data on developmental psychology come exclusively from articles published in Child Development and the paltry number of relevant articles published in Psychological Science. Not only is Child Development not representative of the lifespan nature of developmental psychology, but it is a prestige outlet that is broadly viewed as the best journal in the subfield. There are 92 journals in the developmental psychology category! Labeling the data as representing an entire subfield, when the source is mostly just one journal, seems like a bit of an overreach. Can data from this one journal reasonably generalize to the entire subfield? The answer is apparently yes, in PNAS.
This strategy is not uncommon, and I am picking on this paper really only because in this post I am picking on PNAS. Empirical metascience tends to focus on the perceived top-tier journals in the field, those with the highest impact factors. This was the strategy in the Open Science Collaboration’s (2015) paper on estimating the replicability of psychological science, and this was the strategy of Roberts et al.’s (2020) analysis of the inclusion of race in psychological science. It is a reasonable strategy given how our system operates and the value we place on journal prestige. But it is not a good strategy when we take seriously the fact that our beliefs about journal prestige are wildly mistaken. For me, a foundational aspect of metascience is to bring a critical lens to our scientific systems and rigorous data to expose the limitations of our colloquial beliefs about optimal scientific practice. Yet, in many cases, we see metascience reproducing and reinforcing the same faulty status hierarchies.
A few years ago, Ruben Arslan observed that articles in higher impact journals likely receive less pre-publication scrutiny. When researchers have results they are excited about, they tend to “aim high” and submit their papers to a journal with a high impact factor. If it is rejected, they work their way down the prestige hierarchy until it is accepted. Assuming that this behavior is widespread, that means that articles published in journals with mid-tier impact factors have been reviewed by many more people than those published in the highest-tier journals, and thus are subject to more opportunities to catch mistakes, inappropriate analyses, or inferential errors. Arslan ran a simulation (based on a number of assumptions) and found support for his intuition (see Figure 3).
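Arslan’s actual simulation had more moving parts than I can reproduce here, but the core intuition can be sketched in a few lines of Python. Every parameter below—the acceptance rates at each tier, the number of reviewers per round, and the chance a reviewer spots a planted error—is invented for illustration, not taken from his model:

```python
import random
from collections import defaultdict

random.seed(42)

# Invented parameters for illustration (not Arslan's actual values):
ACCEPT_PROB = [0.05, 0.15, 0.30, 0.50, 0.80]  # prestige ladder, top tier first
REVIEWERS_PER_ROUND = 3                        # reviewers at each journal
P_CATCH = 0.25                                 # chance one reviewer spots a planted error

def submit_top_down():
    """Submit down the hierarchy until accepted; track whether any reviewer
    along the way ever flagged the paper's (planted) error."""
    caught = False
    for tier, p_accept in enumerate(ACCEPT_PROB):
        for _ in range(REVIEWERS_PER_ROUND):
            if random.random() < P_CATCH:
                caught = True
        if random.random() < p_accept:
            return tier, caught
    return len(ACCEPT_PROB) - 1, caught  # bottom tier takes everything

outcomes = defaultdict(list)
for _ in range(200_000):
    tier, caught = submit_top_down()
    outcomes[tier].append(not caught)

for tier in sorted(outcomes):
    rate = sum(outcomes[tier]) / len(outcomes[tier])
    print(f"tier {tier}: error never flagged in {rate:.1%} of accepted papers")
```

A paper accepted at the top tier has been seen by only three reviewers, while one that worked its way to the bottom has faced fifteen, so any given mistake is far more likely to have been flagged somewhere along the way for the papers that land mid-hierarchy or lower.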
I don’t know of any actual data that supports this, but I can say it is absolutely the case for my own research lab, and the idea is sensible enough that it should be followed up through careful study. It also suggests that future metascientific work should look more broadly than the articles published in journals with the highest impact factors. Would the results look much different? Would we see higher rates of replicability, for example, among journals sitting in the middle of the impact factor scrum? The pessimist in me thinks probably not, but at least we would have a broader view of the field beyond what is published in a few journals, and we would not be reinforcing a flawed prestige ranking system5. PNAS is not a good journal, but neither are any others. If we want to know whether a particular article is good—however you want to define that—I think you might have to actually read it.
Conflict of interest statement: I made a bet ($5.00) with Ira Hyman in 2019 that the journal impact factor would no longer be used within five years, and thus I have financial motivation to write negatively about impact factors.
1. These are obviously not independent sources. In fact, I would say that journal impact factors are largely based on received wisdom about prestige. Prior beliefs about where journals rank in the hierarchy have driven submission and citation patterns that created a quantitative metric that reflects those prior beliefs.
2. Although the formula is simple, determining what goes in the numerator and—especially—the denominator is complex and opaque, with some evidence indicating that journals negotiate to make that number smaller.
3. I am not “naming names” here because doing so is beside the point and therefore would simply be mean. I did, however, have specific papers in mind when I wrote that sentence. You can choose your own and fill in the blanks! It is a deep pool.
4. Why am I reporting on data from 10 years ago? Those are the only data that are available. These rates were formerly posted on the PNAS website, but it seems like they forgot to include them when they revamped the site.