Preprint servers: challenges and consequences

In our previous post, we explored the recent rise of preprint servers, especially in the life sciences, chemistry and humanities. In this post, we explore why it has taken so long for these fields to embrace preprint servers, and delve into what the rise of preprint servers might mean for scholarly publishing.

The persistence of resistance

In an article marking the 25th anniversary of arXiv, founder Paul Ginsparg answers a series of FAQs (from biologists in particular) about the dynamics of preprint servers. Why researchers outside the physical sciences have been reluctant to use preprint servers is still debated, but it is clear that concerns come from many quarters and cover a range of issues.

One of the biggest worries for researchers is being scooped – if their work is on an unvetted, little-known website, how will they maintain primacy? As mentioned in our previous post, some journals are now explicitly addressing this possibility by modifying their policies on originality and previous publication. The increasing professionalism of preprint hosting, and the entry of large companies into this area, mean greater acceptance by researchers and integration with services such as search engine indexing. This adds surety for authors: the days of research competitors claiming “I didn’t see it” are numbered.

Another concern is information overload: will authors’ work be lost in the mass of unfiltered and uncurated content that is regularly uploaded to preprint servers? Probably not, at least in the longer term. Researchers have always found ways to discover the latest work in their fields, which they know well and which can be quite narrow. Artificial intelligence and machine learning approaches may hold great promise for helping researchers navigate, find and screen the literature.
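To give a concrete (and deliberately simplified) sense of what such screening tools might do, the sketch below ranks a few invented preprint abstracts against a researcher’s stated interests using TF-IDF similarity in Python with scikit-learn. It is a minimal illustration only – real discovery services would use far richer models and metadata.

```python
# A minimal, illustrative sketch (not any real service's method): rank new
# preprint abstracts by textual similarity to a researcher's interests.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical inputs invented for this example.
interests = "CRISPR gene editing in zebrafish embryonic development"
abstracts = [
    "Off-target effects of CRISPR-Cas9 editing in zebrafish embryos.",
    "Constraints on dark matter from weak gravitational lensing surveys.",
    "Single-cell RNA sequencing reveals lineage decisions in early development.",
]

# Build a shared TF-IDF vocabulary over the interests plus the abstracts.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([interests] + abstracts)

# Score each abstract against the interests and print the best matches first.
scores = cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()
for score, abstract in sorted(zip(scores, abstracts), reverse=True):
    print(f"{score:.2f}  {abstract}")
```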

Copyright and licensing issues may have also dissuaded authors from using preprint servers. Authors may be reluctant to post preprints if they are unsure about the legal consequences and their subsequent rights; a combination of conservatism and (perhaps studied) ignorance is well known among many authors. Recent data show that authors uploading their work to bioRxiv tend to choose the most restrictive license on offer – retaining full copyright. The reasons for this choice are not clear. This option is first on the list, so a tick-the-first-box response is one possible explanation. We suspect that authors choose a restrictive license because they want to ensure their work remains theirs to control, even though more liberal licenses still allow them to retain ownership, some control and the right to attribution.

Another downside to preprint servers – for both authors and readers – is that they offer no evaluation or certification of an author’s work. Other publication outlets, such as journals, have traditionally provided these services through editorial assessment and peer review. As we touch on below, some preprint servers allow comments, which is a form of evaluation or post-publication peer review. However, there is no certification, and there is a danger that the wider, non-research community presumes that articles on preprint servers have been certified. Preprint servers such as bioRxiv are working to ameliorate this risk by labelling articles as not peer reviewed. (Many proponents of preprint servers suggest that peer review provides little assurance of quality anyway.)

The uncertainty around the sustainability of preprint servers may have also discouraged researchers from using them. Authors are wary of services that come and go, and are reluctant to spend time and effort – not to mention stake their most precious assets, their articles and reputations – on unknown entities. Readers are similarly unlikely to rely on a resource that might not be around for long. Even known entities, such as Nature Precedings, can close at short notice. arXiv has had very public financial difficulties, while Cold Spring Harbor Laboratory has explicitly addressed this issue by stating that it provides bioRxiv as part of its remit. However, any service or product that has no associated revenues will always be at some financial risk, threatening its long-term sustainability.

Potential consequences

What then are the consequences for scholarly publishing if preprint servers become an integral part of it? It is early days, but we are beginning to see some interesting – and possibly unintended – outcomes.

In any nascent marketplace, it takes some time before product names become well known – and this lag in brand recognition can be exploited. The American Chemical Society will soon launch chemRxiv, but in the meantime, Open Academic Publishers (which was listed as a “potential, possible or probable” predatory publisher on the former Beall’s List) has unveiled chemArxiv. Unscrupulous operators may pose no financial risk in a free-to-publish, free-to-read environment, but a deluge of low-quality sites would waste researchers’ time, could threaten the integrity and persistence of their data, and would damage the reputation of all preprint servers. The rapid rise in the number of new preprint servers – and their use of all or part of the “arXiv” name – can make it difficult to track which are legitimate enterprises backed by trusted organisations (see the table below).

Could preprint servers boost the use of post-publication peer review? Some preprint servers allow users to comment on posted papers, much like some journals allow online feedback on published papers. However, not all preprint servers see a benefit in offering such a feature – particularly as it raises the question of whether moderating comments is necessary, and if so, how to pay for it. As Ginsparg explains, arXiv has explicitly chosen not to allow comments; a user survey confirmed that its “drama-free minimalist dissemination” of content is one of its biggest virtues.

We are already seeing the rise of new services that aim to mitigate some of the challenges posed by preprint servers. Overlay journals collect articles from preprint servers and add assessment or peer review to help curate content for readers. PrePubMed lets users search across a range of preprint servers with a single query.
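As an illustration of the plumbing behind such single-search tools – not a description of how PrePubMed itself is built – the sketch below queries arXiv’s public Atom API from Python; an aggregator would add one such adapter per preprint server behind a common search box.

```python
# Illustrative sketch of a cross-server search adapter, using arXiv's public
# Atom API (http://export.arxiv.org/api/query). Other preprint servers would
# need their own adapters; this is not how PrePubMed is actually implemented.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def search_arxiv(query, max_results=5):
    """Return (title, link) pairs for arXiv preprints matching a phrase query."""
    url = "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(
        {"search_query": f'all:"{query}"', "start": 0, "max_results": max_results}
    )
    with urllib.request.urlopen(url) as response:
        feed = ET.fromstring(response.read())
    return [
        (entry.find(ATOM + "title").text.strip(), entry.find(ATOM + "id").text)
        for entry in feed.findall(ATOM + "entry")
    ]

if __name__ == "__main__":
    for title, link in search_arxiv("peer review"):
        print(f"{title}\n  {link}")
```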

Journal publishers and owners are watching recent developments in scholarly publishing, including those around preprint servers, with great interest – and possibly some trepidation. Authors can now publish their work in various forms: standalone figures (with DOIs), lab workbooks, data, so-called nano-publications, as well as the traditional forms of articles and reviews. If authors (and their funding bodies) begin to trust post-publication peer review for certification and feedback, and can receive citations to their preprints, are journals in danger of becoming obsolete?

The scholarly community’s broad adoption of preprint servers is still fluid and progressing via trial and error; the variation among the new preprint servers reflects the imagination and entrepreneurship of those developing them. Ultimately, preprint servers and journals serve different functions: preprint servers quickly distribute work to a core research audience, while journals provide a quality assurance mechanism that helps to certify a researcher’s discoveries.

Name | Fields | Start date | 2016 submissions (approx.)

Preprint servers
arXiv | Physics, mathematics, computing, quantitative biology, quantitative finance, statistics | 1991 | 113,308
bioRxiv | Life sciences | 2013 | 4,712
PeerJ Preprints | General | 2013 | ~1,000
Preprints (MDPI) | General | 2016 | ~1,000
SocArXiv | Social sciences | 2016 | 633
PsyArXiv | Psychology | 2016 | 191
engrXiv | Engineering | 2016 | 35
ChemRxiv | Chemistry | 2017 | Not launched
AgriXiv | Agriculture | Upcoming | Not launched

Services with preprint functions
Social Science Research Network (SSRN) | Social sciences | 1994 | 66,310
Figshare | General | 2012 | Unknown
Zenodo | General | 2013 | 318
F1000Research | General | 2013 | 215
Authorea / Winnower | General | 2015 / 2014 | Unknown

Other services
HAL | General | 2001 | 22,006
mp-arc | Mathematics | 1991 | 97

Authors: Dugald McGlashan and Caroline Hadley