A few years ago in our diversity science reading group we read a paper published in the prestigious Proceedings of the National Academy of Sciences (PNAS, or PPNAS, if you prefer). It was an impressive-looking article that generated a spirited discussion. The authors stated that the study was preregistered, and at some point in the discussion, I mentioned that they had deviated so markedly from their preregistration plan—without disclosing this in the paper, mind you—that I thought everyone should interpret the findings with a healthy dose of skepticism. Following this reveal, one of the students asked, “How did this get published in PNAS?” My response: “Well, PNAS is not a good journal.”
Despite the clickbait title and the opening anecdote, this post is not about PNAS, per se. Sure, I stand by my statement that PNAS is not a good journal—a point I will elaborate on later—but neither are other journals at the top of the prestige market. This includes other vanity journals such as Science and Nature, but also the “top” journals in my field, psychology, such as Psychological Science, Journal of Personality and Social Psychology, Child Development, and so on. This post is about how we determine what makes for a “good journal,” and how all available data indicate that we are wrong. At the end, I will also briefly comment on how our erroneous beliefs about journal prestige shape empirical work on replications and metascience.
What is a “Good” Journal?
There are two primary sources of information we rely on to determine the quality of a journal. The first is received wisdom: certain journals are good, prestigious, and desirable to publish in because people say they are. This knowledge of the journal hierarchy is socialized to early career researchers, who then internalize it and socialize it to others. Although this process certainly still operates, over the last 20+ years it has largely given way1 to the second dominant indicator of journal quality: the journal impact factor.
Real quick—because I am always shocked that people don’t know this—the impact factor is calculated with a simple formula: the number of citations in a given year to the articles a journal published over the preceding two years, divided by the number of citable items the journal published during those same two years. In other words, it corresponds to the average number of citations to an article published in a journal during a specific period2. Although not intended for this purpose when it was developed, the journal impact factor is used as a relative quality ranking to create a hierarchy of prestige.
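To make the arithmetic concrete, here is a minimal sketch with invented numbers—everything below is purely illustrative, and in practice the counts come from the citation index, with all the ambiguity about what counts as a “citable item” noted in footnote 2.

```python
# Minimal sketch of the two-year impact factor arithmetic.
# All numbers are invented for illustration; real counts come from the citation index.

citations_in_2023_to_2021_2022_items = 4150  # citations received in 2023 to items published in 2021-2022
citable_items_2021_2022 = 1038               # articles and reviews the journal published in 2021-2022

impact_factor_2023 = citations_in_2023_to_2021_2022_items / citable_items_2021_2022
print(f"2023 impact factor: {impact_factor_2023:.1f}")  # roughly 4.0 citations per article, on average
```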
That all sounds reasonable, I suppose, until you discover that the empirical research on the impact factor indicates no association, or in some cases a negative association, with nearly any quality indicator you can think of. If you don’t believe me, then you may want to take a look here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, or here. Defenders of the impact factor will sometimes point out that it is associated with more citations to papers in the journal, to which I respond: please reread the previous paragraph on how the impact factor is calculated, as well as papers like this that show how retracted articles continue to get cited at alarming rates, or this one that shows that non-replicated findings are cited more than replicated ones, or this one that shows that citations are status driven, or even this one that shows general sloppiness in citation practices among metascientists, wouldn’t you know.
One of my favorite examples of how journals with higher impact factors publish lower quality research comes from observations on the decline effect. Initial “discoveries” tend to have large effects and are thus published in journals with higher impact factors. Over time, subsequent studies yield smaller effect sizes than the original, but tend to be published in journals with lower impact factors. Figure 1, taken from Brembs, Button, and Munafò (2013), is a striking visual representation of this pattern using data on candidate gene studies (DRD2) and alcoholism.
Why is our primary indicator of journal quality so bad? There are quite a lot of reasons for this, explored in depth by the sources I linked previously, but they all generally trace back to the root problem of making publication decisions based on the findings of a study rather than based on the quality of the conceptualization and design (obligatory plug for Registered Reports here). This decision criterion leads to researchers cutting all kinds of corners to achieve findings that are publishable in “high impact” journals, and cutting all kinds of corners is generally not a good practice for ensuring quality.
What’s So Bad about PNAS Anyway?
Back to my original point: PNAS is not a good journal. This is true in two ways. First, following from the above, there is really no such thing as a “good journal.” Journals are simply not diagnostic of the quality of the articles published therein, and thus there is no way any particular journal could be construed as “good.” To be clear, here I am talking about what most people would consider legitimate journals. Predatory journals are clearly very bad. Things get murky quickly, though, once we set aside the obviously predatory journals. Publishers that I consider quasi-predatory, such as Frontiers, MDPI, and Hindawi, will often publish high-quality and important articles, despite the fact that their general brand is rather poor and bad for science. But once you make that argument, you have to also recognize that the operating principles of the big publishers such as Sage, Wiley, and APA are also rather poor and bad for science.
Second, putting this general argument about the concept of the “good journal” aside, PNAS is remarkable for being particularly bad. It has published a variety of questionable papers, from the absurd, to the ridiculous, to the embarrassing3. But this is true of all journals, so PNAS is not special in this regard. Rather, my assessment is based on a single feature of the journal: the contributed track. Yes, PNAS has two tracks for submission: direct submissions, the typical procedure for journal publishing amongst us plebes, and contributed submissions, in which members of the National Academy of Sciences can submit their papers (up to two per year!) and get to select their own reviewers. Unsurprisingly, data from 2013 indicate a 98% acceptance rate for contributed submissions vs. only 18% for direct submissions4. I am no great defender of the value of peer review, but even I see some major structural problems with the practice of “publish whatever you want” in a journal that is perceived to be one of the best in all of the sciences. That this is a feature of an ostensibly leading journal is a serious indictment of our entire system. I do not take PNAS seriously, and neither should you.
Oh, but those pesky perceptions! Publishers are experts at exploiting the academic need for prestige, and Springer Nature is the master of this. Whereas there was once only one Nature, which sat at the very top of the steaming pile of journal prestige, there are now 69 “Nature”-branded journals. Just looking at my field of psychology illustrates how absurd things have become: we have Nature, then Nature Human Behaviour, then Nature Mental Health, and now—new for 2022!!—Nature Reviews Psychology. In November 2020 we got some internet yuks about the imminent emergence of Nature Total Landscaping, and while that title does not yet exist in name, it certainly exists in spirit. Things have gotten so out of hand that some years ago the Big Ten Academic Alliance of Libraries told Springer Nature that they would no longer play their game, refusing to subscribe to any new Nature journals. Things have gotten so out of hand that a while back, when I saw someone reference PNAS Nexus, I thought it was a joke, that they were just trolling Nature. Sadly, I was wrong. PNAS is not good, PNAS Nexus is certainly not good. None of the 69 Nature Total Landscaping journals are good. None of this is good.
Empirical Metascience Relies Too Heavily on Prestigious Journals
I have been meaning to write this post for a long time, but I finally moved it up my imaginary to-do list after coming across a new paper in…PNAS. The paper, A discipline-wide investigation of the replicability of psychology papers over the past two decades, by Youyou, Yang, and Uzzi, relies on a text-based analysis of published papers to estimate replication rates across six subfields of psychology. I have not had an opportunity to go deep into the paper, and thus I don’t have anything to say about whether or not their claims are appropriately calibrated to their analytic approach, but I have read enough to make shallow comments about whether or not their claims are appropriately calibrated to their source data. The key figure—the one that was widely shared online and led to either hand-wringing or self-righteousness, depending on your subfield alignment—plotted the replicability estimates against the prevalence of experimental methods (Figure 2).
Depending on how you interpret these data, they make personality and organizational psychology look pretty good and developmental and social psychology look rather bad. There is a lot to say about this figure and what these data mean, most of which I will not say right now. Rather, the problem with this figure is that the data do not represent “personality psychology” or “developmental psychology.” The authors scraped data from six “top-tier” journals, and used their estimates from those journals to draw conclusions about the entire subfields. For example, the data on developmental psychology come exclusively from articles published in Child Development and the paltry number of relevant articles published in Psychological Science. Not only is Child Development not representative of the lifespan nature of developmental psychology, but it is also a prestige outlet that is broadly viewed as the best journal in the subfield. There are 92 journals in the developmental psychology category! Labeling the data as representing an entire subfield, when the source is mostly just one journal, seems like a bit of an overreach. Can data from this one journal reasonably generalize to the entire subfield? The answer is apparently yes, in PNAS.
This strategy is not uncommon, and I am picking on this paper really only because in this post I am picking on PNAS. Empirical metascience tends to focus on the perceived top-tier journals in the field, those with the highest impact factors. This was the strategy in the Open Science Collaboration’s (2015) paper estimating the replicability of psychological science, and this was the strategy of Roberts et al.’s (2020) analysis of the inclusion of race in psychological science. It is a reasonable strategy given how our system operates and the value we place on journal prestige. But it is not a good strategy when we take seriously the fact that our beliefs about journal prestige are wildly mistaken. For me, a foundational aspect of metascience is to bring a critical lens to our scientific systems and rigorous data to expose the limitations of our colloquial beliefs about optimal scientific practice. Yet, in many cases, we see metascience reproducing and reinforcing the same faulty status hierarchies.
A few years ago, Ruben Arslan observed that articles in higher impact journals likely receive less pre-publication scrutiny. When researchers have results they are excited about, they tend to “aim high” and submit their papers to a journal with a high impact factor. If the paper is rejected, they work their way down the prestige hierarchy until it is accepted. Assuming this behavior is widespread, articles published in journals with mid-tier impact factors have been reviewed by many more people than those published in the highest-tier journals, and are thus subject to more opportunities to catch mistakes, inappropriate analyses, or inferential errors. Arslan ran a simulation (based on a number of assumptions) and found support for his intuition (see Figure 3).
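Arslan’s simulation involved more moving parts than I can reproduce here, but a toy cascade model is enough to show the mechanism; the acceptance rates and reviewer counts below are invented for illustration, not taken from his post.

```python
# Toy cascade model of "aim high, then work down the prestige hierarchy."
# Acceptance rates and reviewer counts are invented; this sketches the intuition,
# not Arslan's actual simulation.
import random
from collections import defaultdict

random.seed(1)

# Journals ordered from most to least prestigious: (name, acceptance rate, reviewers per submission)
journals = [("top-tier", 0.05, 3), ("mid-tier", 0.25, 2), ("lower-tier", 0.60, 2)]

reviews_before_acceptance = defaultdict(list)

for _ in range(10_000):                        # each iteration is one manuscript
    reviews_so_far = 0
    for name, acceptance_rate, n_reviewers in journals:
        reviews_so_far += n_reviewers          # every submission gets refereed before a decision
        if random.random() < acceptance_rate:  # accepted here, so the cascade stops
            reviews_before_acceptance[name].append(reviews_so_far)
            break

for name, _, _ in journals:
    counts = reviews_before_acceptance[name]
    print(f"{name:10s}: {sum(counts) / len(counts):.1f} reviewer reports per published article")
```

In this toy version, an article that finally lands in the lower-tier journal has accumulated seven referee reports along the way, versus three for one accepted at the top on the first try—which is the flavor of the pattern shown in Figure 3.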
I don’t know of any actual data that supports this, but I can say it is absolutely the case for my own research lab, and the idea is sensible enough that it should be followed up through careful study. It also suggests that future metascientific work should look more broadly than the articles published in journals with the highest impact factors. Would the results look much different? Would we see higher rates of replicability, for example, among journals sitting in the middle of the impact factor scrum? The pessimist in me thinks probably not, but at least we would have a broader view of the field beyond what is published in a few journals, and we would not be reinforcing a flawed prestige ranking system5. PNAS is not a good journal, but neither are any others. If we want to know whether a particular article is good—however you want to define that—I think you might have to actually read it.
Conflict of interest statement: I made a bet ($5.00) with Ira Hyman in 2019 that the journal impact factor would no longer be used within five years, and thus I have financial motivation to write negatively about impact factors.
These are obviously not independent sources. In fact, I would say that journal impact factors are largely based on received wisdom about prestige. Prior beliefs about where journals rank in the hierarchy have driven submission and citation patterns that created a quantitative metric that reflects those prior beliefs.
Although the formula is simple, determining what goes in the numerator and—especially—the denominator is complex and opaque, with some evidence indicating that journals negotiate to make that number smaller.
I am not “naming names” here because doing so is beside the point and therefore would simply be mean. I did, however, have specific papers in mind when I wrote that sentence. You can choose your own and fill in the blanks! It is a deep pool.
Why am I reporting on data from 10 years ago? Those are the only data that are available. These rates were formerly posted on the PNAS website, but it seems like they forgot to include them when they revamped the site.
A very intelligent person might point out that I am doing the same thing, writing a post about PNAS; I direct such a person here.
I had lunch with Benoit Mandelbrot a few years before his death. It was quite fun. I had an observation about fractals applied to the way mRNA is translated and gene replication contributing to evolution, as in the famous example of feathers. That was great. But Mandelbrot was still bitter that his most seminal papers were rejected by the top journals and had to be published in minor journals.
I agree with you about the "gray journals". I had an unfortunate experience with that. I submitted an invited paper to a new open-access journal being started by a professor in Montana, Nicholas Burgis. The review itself was a good experience. https://www.omicsonline.org/security-in-a-goldfish-bowl-the-nsabbs-exacerbation-of-the-bioterrorism-threat-2157-2526.S3-013.php?aid=11953
This paper could never have gotten published in a "proper journal" because every academic associated with it is offended at what I have to say about their singular contribution to biosecurity, which is to shout in the ears of our enemies precisely what we do not want them to know. (Their only saving grace is that they are usually quite wrong about what is the most dangerous information.) Sadly, now those who want to ignore what I have said can grandly dismiss it as an article in a pay-to-publish "predatory journal".
I'll also offer a criticism, and observe that this article, like much I read on the problem, makes an implicit presumption that is, in my experience, false. That assumption is that reviewers are competent and sensible as a general rule. In my experience this is sometimes true, and in my career it has been more so when submitting from a university. But I could show you reviews that are beyond appalling. I have one fairly recent one from a subfield of biology that is outraged to have fundamental assumptions questioned. That review rejected the concept of using mathematics in biology, citing Mario Livio's book "The Golden Ratio" as proof and calling it "numerology" that nobody could take seriously. This same review mocked a data source because it was bilingual in Chinese and English, ignoring that each referenced paper had been located, read, and its figure(s) verified or not.
The reality is that many with academic positions are, in the words of a director I discussed some issues with, "Just stupid." The smart ones spend little time on reviews. And of course, the problem of assigning grad students is well known, though such reviews are unlikely to be abusive. I even have one where I am quite certain the reviewer was using first-year undergrads to "review" the material. This problem of over-promoted academics is a conundrum, but I think it is getting worse, and with ChatGPT the problem may become overwhelming. In corruption, the bad drives out the good, and the corrupt are bound to help each other far more than the honest. I saw this operate up close in two labs when I was in grad school.
I like this article very much, having become horrified at what junk gets published even in top tier journals.
Unfortunately this junk often makes it to the coalface of actual psychological practice, polluting the therapeutic environment and making a mockery of the claim of “evidence basis”.