What happens to a paper submitted to a top journal?

Among a set of manuscripts sent out for review by Cell in 2018:

- 33% were published in Cell
- 26% were published in another Cell-family journal
- 7% are still under review at Cell
- The median time to publication was 391 days

To back up: in 2018, Cell started the “Sneak Peek” program, in which authors had the option of posting a preprint of their manuscript if it was sent out for review by a Cell-family journal. https://www.cell.com/sneakpeek 
Using this site, I found 46 papers that were sent out for review at Cell and posted on “Sneak Peek” between June 1st and Dec 31st, 2018. Each paper’s current status was also noted: “Published”, “Under review”, or “Review Complete” (a nice euphemism for “rejected”).
For papers that were rejected by Cell, I searched Google Scholar and PubMed to see if and when they had been published by another journal.
Some important limitations: I don’t know if Sneak Peek is a representative sample of all manuscripts sent out for review at Cell, and I also don’t know the relationship between when a paper was submitted and the date a preprint was posted.
Still - publishing is an incredibly opaque process, and the major journals have an outsized role shaping research trends, funding, and hiring. I’m not aware of any actual data on what happens to a paper submitted to Cell, so I thought analyzing this could be interesting.
Findings: 33% of papers sent out for review were published in Cell. This is consistent with the common belief that desk rejections are the major way papers get rejected at high-profile journals, and a paper that gets sent out for review has a decent chance of being published.
If a paper was rejected after being reviewed at Cell, the most common journals where it would end up were Cell Reports and eLife.
No papers within this dataset were rejected by Cell and then published by Nature or Science. One paper was rejected by Cell and subsequently published in Nature Genetics.
The timing data was very interesting: if a paper wound up in Cell, the median time from posting to publication was 273 days. If a paper was published in another Cell-family journal, it took 333 days. But if it was bounced out of the Cell family, it took 513 days till publication!
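For anyone who wants to run the same kind of timing comparison on their own list of papers, here is a minimal sketch of the grouping step. The records below are made-up placeholders, not the Sneak Peek dataset, and the group labels and the `median_days_by_group` helper are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Placeholder records for illustration only; these are NOT the thread's data.
# Each tuple: (outcome group, preprint posting date, publication date).
records = [
    ("Cell", date(2018, 6, 1), date(2019, 3, 1)),
    ("Cell", date(2018, 7, 1), date(2019, 5, 1)),
    ("Cell-family", date(2018, 8, 1), date(2019, 8, 1)),
    ("Outside Cell family", date(2018, 9, 1), date(2020, 1, 1)),
    ("Outside Cell family", date(2018, 10, 1), date(2020, 4, 1)),
]


def median_days_by_group(rows):
    """Return {group: median days from posting to publication}."""
    days = defaultdict(list)
    for group, posted, published in rows:
        # Subtracting two dates gives a timedelta; .days is the whole-day gap.
        days[group].append((published - posted).days)
    return {group: median(values) for group, values in days.items()}


medians = median_days_by_group(records)
for group, value in medians.items():
    print(f"{group}: {value} days")
```

Papers that are still under review or unpublished have no publication date, so they would need to be left out of (or handled separately from) a calculation like this.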
Here are frequency-difference word clouds: words that are more common in the abstracts of papers accepted by Cell vs. those published in other journals. “antisense”, “phage”, and “CTCF” are in, “actin”, “signaling”, and “modules” are out.
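One plausible way to compute a frequency difference like this is sketched below: take each word's share of all words in one group of abstracts and subtract its share in the other group. This is an assumption about the general approach, not the exact method used for the word clouds, and the two short text lists are invented placeholders rather than real abstracts.

```python
import re
from collections import Counter


def relative_frequencies(texts):
    """Map each lowercase word to its share of all words in `texts`."""
    words = [w for text in texts for w in re.findall(r"[a-z0-9]+", text.lower())]
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def frequency_difference(group_a, group_b):
    """Return {word: share in group_a minus share in group_b}."""
    freq_a = relative_frequencies(group_a)
    freq_b = relative_frequencies(group_b)
    vocab = set(freq_a) | set(freq_b)
    return {w: freq_a.get(w, 0.0) - freq_b.get(w, 0.0) for w in vocab}


# Invented placeholder strings, standing in for two groups of abstracts.
accepted = ["phage defense systems", "antisense transcription and phage"]
elsewhere = ["actin signaling modules", "actin dynamics"]

diff = frequency_difference(accepted, elsewhere)
ranked = sorted(diff, key=diff.get, reverse=True)
print("More common in first group:", ranked[:3])
print("More common in second group:", ranked[-3:])
```

In a word cloud, the size of each word would then scale with the absolute value of its difference. With only a few dozen abstracts, a real analysis would probably also want to drop common stop words and very rare terms.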
Finally, 15% of papers were either still under review at Cell or had been rejected and not published elsewhere - almost two years after being submitted! Publishing is **slow**, and many people are (understandably) motivated to chase acceptances in high-profile journals.
I can imagine a number of other interesting questions that could be asked using this dataset:

- PI’s institution vs. acceptance rate?
- Gender vs. acceptance rate?
- NAS vs. acceptance rate?

I may look at some of those in the future, but 46 is not a huge sample size.

Thoughts?
You can follow @JSheltzer.