How Big Data Is Shaping the Publishing Industry Landscape

The publishing industry is going through rough times, struggling to retain its identity and its old ways. Ironically, publishing was once the technology that disrupted society in many ways, carrying revolutionary ideas forward and spreading the appetite for the new. Now it is on the receiving end of disruption by other technologies, trying to find its way in a brand new world.

A couple of years back, a client of mine, a CEO who owned large volumes of educational content carefully gathered and curated over the years, asked me how to monetize his data. My suggestion, "make your data openly accessible", did not go down particularly well with him. In those days the then-emerging social monetization techniques were too 'unorthodox' for his taste, and securing data behind payments looked more attractive to him.

However, the scene is changing rapidly, and more and more publishers are embracing the open-access philosophy across the publishing landscape, especially in the scholarly content domain. With research spending increasing over the years, tying publication costs to research budgets is a more attractive proposition than tying them to readers' budgets.

In the traditional model, the author publishes the content for free, and the reader pays to get access to it. The underlying idea is that good-quality content is worth paying for. A side effect: the vast majority of published knowledge lay behind paywalls. There is also a challenge for the reader in this model: how do you know what is worth paying for?

This is where the journal value comes into the picture. Usually referred to as the Journal Impact Factor, it roughly represents:

  1. The reach and influence a journal has over the reader landscape and scientific output, and
  2. A rigorous set of review processes that admit and publish only content that is up to the mark for the said impact factor

Both of these make high-impact-factor journals more and more desirable for publishing:

  1. Readers know that they are paying for high-value content, and
  2. Authors know that their work is assured wide publicity and reach

This model worked well. Till now, that is.

With the emergence of an alternative publication mechanism, popularized as social media, that is much cheaper (free) and far more wide-reaching than ever imagined, the value of the 'journal impact factor' became questionable (more so for publishers than for readers), because authors found this alternative venue easier to publish in and wider in reach.

This is where technology has forced publishers to rethink their business strategy and adopt a new trend: paid-publication open access.

In this model, the published content is made openly accessible to all and the publication charges are covered by the authors. While 'publication charges paid by the author' looks like a questionable strategy, it is not that bad, given that for most authors the charges are covered by their research grants. Besides, the publisher has to cover the costs of maintaining quality while competing on shorter turnaround times. The former, maintaining quality, is now the major differentiating factor between free social publication and paid journal publication, since in all other areas both promise the same possibilities.

In this emerging model, content in both social and journal publications is openly accessible to all, the revenues of journal publishers are tied to a much wider and stronger economy (research grants), authors get the same level of reach and publicity for their work, and, as an additional bonus, knowledge spreads more widely than before.

All is good. Everyone wins - right?


Social publishing, for all its fanfare, has one major drawback that journal publishing did not have: data deluge, or information explosion.

(cont.) Aww, I hate this site. I was looking forward to reading this study but anyone can publish to PLOS ONE

— Cole Pram (@colepram) April 27, 2016

Articles are free, all right, but they are useless if no one reads them.

In other words, how do you ensure that "circulation" / "subscription" is not dropping?

Given that all publishers now have "all readers" as their potential reach and audience base, the "journal impact factor" does not really make sense anymore. In that case, how do you distinguish one journal from another?

By adopting the open-access, paid-publishing model, authors become "customers" of publishers, and research grant providers can (and would love to) dictate which journals are worth publishing in. With journal impact factors, it was earlier easy to set a high bar for grant allowance (by demanding publication only in high-impact-factor journals). Now, with potentially all journals being equally widespread, distinguishing high-quality work from the ordinary becomes a major problem.

This is where new kinds of metrics are required: ones that can track the impact of an article at the reader level on a daily basis, to provide authors, publishers and readers (and research grant providers) with a sense of direction and a measure of what is working and what is not. That is the role of Big Data in all this.
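As a rough illustration of what tracking an article "at the reader level on a daily basis" could look like, the minimal sketch below counts reader events per article per day. The event feed, the article identifiers, and the event types (`view`, `download`, `share`) are all hypothetical and chosen only for illustration; a real deployment would read from a live activity stream rather than an in-memory list.

```python
from collections import defaultdict
from datetime import date

# Hypothetical reader-activity feed: (article id, day, event type).
# Identifiers and event types are assumptions made for this sketch.
EVENTS = [
    ("article-A", date(2016, 7, 1), "view"),
    ("article-A", date(2016, 7, 1), "view"),
    ("article-A", date(2016, 7, 1), "download"),
    ("article-A", date(2016, 7, 2), "view"),
    ("article-B", date(2016, 7, 1), "share"),
]


def daily_article_metrics(events):
    """Count events per (article, day), broken down by event type."""
    metrics = defaultdict(lambda: defaultdict(int))
    for article_id, day, kind in events:
        metrics[(article_id, day)][kind] += 1
    # Return plain dicts so the result is easy to inspect or serialize.
    return {key: dict(counts) for key, counts in metrics.items()}


if __name__ == "__main__":
    for (article_id, day), counts in sorted(daily_article_metrics(EVENTS).items()):
        print(article_id, day.isoformat(), counts)
```

The point of keying on the article and the day, rather than on the journal, is that the resulting numbers describe individual pieces of work and can be refreshed as often as the activity feed is.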

The journal impact factor is private to any given journal and easy to calculate (by the publisher of that journal), taking into account the circulation of the journal, its subscription base, cross-journal citations etc. on a monthly basis. But in an ever-changing landscape of audience interests, measuring the impact of an article on (scientific) society and (scholarly) work is a non-trivial task for traditional computation methods. Open access brings more content to readers, and if that content is not supplied in an orderly fashion, important work may fail to reach the right audience. Readers would like to have their content filtered based on their interests, which are mostly influenced by:

The concept of 'article-level metrics' aims at capturing these metrics for the articles and channeling readers toward more content that is specifically curated to match their personal interests. Big Data architectures designed around capturing these micro-level metrics from real-time user activity feeds are helping the publishers with:

This helps:

However, the real influence of Big Data for authors in scientific publication lies in cross-referencing previous work (citations) and creating reproducible research. In the traditional publication setting, readers may not have had access to the work cited in a publication, which is no longer true in the open-access setting. The digital object identifier (DOI) system brings uniform and universal access to all the cited content (including the data), which makes cross-referencing and reproducing scholarly content much easier than before.
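To make the DOI point concrete, here is a minimal sketch that turns a DOI string into a link through the public resolver at `https://doi.org/`. The helper name `doi_to_url` and the list of accepted prefixes are assumptions made for this example; the sample DOI `10.1000/182` is the one assigned to the DOI Handbook.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Prefixes commonly seen in front of a DOI in citations (assumed list).
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/", "http://dx.doi.org/", "doi:")


def doi_to_url(doi):
    """Build a resolver URL for a DOI (hypothetical helper for illustration)."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Every DOI name starts with the directory indicator "10.".
    if not doi.startswith("10."):
        raise ValueError("not a DOI: %r" % doi)
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1000/182"))
```

Because every cited item resolves through the same scheme, a reference list can be turned into working links without knowing which publisher hosts each item.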

At Cenacle we are constantly working on bringing these reproducible-research tools to a wider audience at lower cost. Combining their availability with open scholarly content and open data is a way of bringing down the barriers to high-quality research output.

GK Palem, Strategy Consultant
Published: 20-Jul-2016
