
The matter with metadata: why are taxonomies still so taxing?

We live in a world where the word “publishing” – at least to the layperson – conjures up images of books, bindings and heaving (but very physical) slush piles. And while we in the industry have a very different perspective on publishing’s digital reality, the legacy of the physical book is still one which even the most technologically savvy of us can find hard to shake.

The truth is, the whole way in which publishers do business is still set up and organized around the idea of the book as end-product; and this continues to impact (and limit) the way in which content is authored, assembled, edited and disseminated.

As Carl Robinson notes in his whitepaper, Thinking Outside the Books: “By envisaging the product in this way, key decisions are already determined: the delivery format (the book) determines the shape of the content even before proverbial pen is put to paper. By removing [this] format-based thinking, the publisher is free to innovate and explore new revenue opportunities.”

This is echoed in Brian O’Leary’s essay, Context Not Container, where he rightly points out that approaching content in a way that is format-first “define[s] content in two dimensions, necessarily ignoring context, defined here as tagged content, research, footnoted links, sources, and audio and video background, as well as title-level metadata.”

It is metadata that we will be concentrating on in this article, since, as Laura Dawson puts it in her essay What We Talk About When We Talk About Metadata, “Metadata assumes a critical importance once the content is out of the container.”

By releasing content from its container – be that a book, magazine, or any other linear narrative – we can move away from the idea of the book as a static, ‘canonical’ object and open up new ways of thinking about content. Content suddenly ceases to be something static and linear, pinned within the pages of a book, and becomes something dynamic and flexible.

And metadata is the primary enabler and driver of this dynamism and flexibility. Tagged and semantically enriched content not only becomes more discoverable (serving both internal users and external consumers), but also develops powerful associability. This means it can be resurfaced where it’s most relevant (for example, in a personalized data feed, or as recommended content), or linked to other pre-existing data-sets to create brand new content services. By annotating content with metadata, you can enrich its meaning, enhance its value and support product innovation.

Metadata is also critical for the development of more complex services and functionality, such as:

– Dynamic auto-assembly of content

– Highly personalised data feeds

– ‘Constantly curated content’ that automatically updates on an ongoing basis

– Opening up networks of linked data (such as scientific research) and making them directly available and discoverable to the consumer

– Analysis and tracking of content and content processes

In short: metadata is essential if you want to do anything more complicated or interesting with your content than simply shove it into a book!

Most modern publishers are, of course, already well versed in the importance of metadata. What they may not have taken into account – or may be fully aware of but struggling to manage – is the quality of that metadata. How useful your metadata is depends on its quality.

Aside from affecting how useful or usable the metadata is, low-quality metadata can have a negative effect in and of itself – as Renee Swank, VP of Discovery Initiatives at Ixxus, points out:

“In our online world, metadata can often be the user’s first experience with the publisher’s assets. If there’s low quality metadata, the user’s misperception may be that content assets themselves are of low quality too. In other words, metadata can reflect a user’s perception of the overall quality of a document.”

Often when advising publishing clients, we like to emphasize the importance of the ‘3 Cs of Metadata’: Completeness, Correctness and Consistency. We recommend that publishers define a Quality Assurance Methodology for metadata, setting standards and governance over what is meant by ‘accurate metadata’, as well as the process for measuring the quality of the metadata.
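To make the ‘3 Cs’ a little more concrete, here is a minimal sketch of how each one might be measured in Python. Everything in it is an assumption made for illustration: the required fields, the controlled lists and the function names are hypothetical and do not describe any particular publisher’s schema or tool.

```python
# Hypothetical schema: which fields are required, and which values are allowed.
REQUIRED_FIELDS = ("title", "author", "subject", "language")
ALLOWED_VALUES = {
    "subject": {"biology", "chemistry", "physics"},
    "language": {"en", "fr", "de"},
}


def completeness(record):
    """Completeness: fraction of required fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if str(record.get(field, "")).strip())
    return filled / len(REQUIRED_FIELDS)


def correctness_errors(record):
    """Correctness: fields whose value is not in the relevant controlled list."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value and value not in allowed:
            errors.append(field)
    return errors


def consistency_issues(records, field):
    """Consistency: values that differ only by case or surrounding whitespace."""
    variants = {}
    for record in records:
        value = record.get(field)
        if value:
            variants.setdefault(value.strip().lower(), set()).add(value)
    return {key: sorted(found) for key, found in variants.items() if len(found) > 1}
```

Run across a whole catalogue, checks like these could feed a simple quality report: an average completeness score, a list of records with out-of-vocabulary values, and a list of author or subject spellings that need to be merged.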

For organizations with a large volume of content assets, maintaining metadata and ensuring its quality and consistency across your content environment can represent a massive manual overhead. This is time-consuming, costly to the business and heavily prone to human bias and error – meaning content may end up mistagged, or metadata may be inconsistently applied.

So what can publishers do to lighten the load?

Bulk metadata editing tools can alleviate some of the burden by allowing you to edit common fields across large batches of content. Automating metadata allocation where possible can also add significant value, both in terms of time-saving and in supporting the ‘3 Cs’. For example, in our work at Ixxus, we’ve developed a light-weight auto-tagging application (called Taxonixx), which analyses documents, matches their contents against selected controlled vocabularies and then automatically and consistently applies tags. Automating processes such as these can be a huge time-saver, giving publishers all the benefits of metadata-rich content without the time-suck of classifying every piece of content by hand.
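The general idea of matching text against a controlled vocabulary can be sketched in a few lines of Python. This is a deliberately naive illustration and not a description of how Taxonixx works: the vocabulary, the tag names and the auto_tag function are all hypothetical, and a real tool would typically add stemming, disambiguation and relevance scoring.

```python
import re

# Hypothetical controlled vocabulary: preferred tag -> lowercase terms that trigger it.
VOCABULARY = {
    "Metadata": ["metadata", "taxonomy", "taxonomies"],
    "Digital publishing": ["ebook", "ebooks", "digital publishing"],
}


def auto_tag(text, vocabulary=VOCABULARY):
    """Return the sorted preferred tags whose terms occur in the text as whole words."""
    # Normalize to lowercase words separated by single spaces, dropping punctuation.
    words = re.findall(r"[a-z0-9]+", text.lower())
    padded = " " + " ".join(words) + " "
    tags = set()
    for tag, terms in vocabulary.items():
        # Padding with spaces keeps "ebook" from matching inside "ebookstore".
        if any(f" {term} " in padded for term in terms):
            tags.add(tag)
    return sorted(tags)


print(auto_tag("Why are taxonomies still so taxing?"))
```

Because every document is matched against the same vocabulary with the same rules, the same term always produces the same tag, which is where the consistency benefit of automation comes from.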

Historically, being able to truly enrich your content in a way which would deliver services of the complexity mentioned earlier has been the preserve of the major publishers, who are able to invest in large-scale projects to develop ‘cathedrals of metadata’.

This is changing, though: with light-weight, out-of-the-box tools such as Taxonixx, we are looking at how this can be democratised to the advantage of smaller and mid-level publishers, as well as industry giants.


Natalie Guest is head of marketing at Ixxus, a content technology company working with some of the world’s largest publishers. She has been published in The Independent, The Sunday Times, The New Statesman and The Bookseller, and curated Tower Hamlets WriteIdea Festival Literary Fringe in 2013.
