A Metadata Primer (and why you should care)


The fundamental ‘selling point’ of DITA (Darwin Information Typing Model) XML is the concept of content reuse. The idea of having a small ‘chunk’ of content that is stored in one place and used in many instances. Content reuse can dramatically decrease the actual amount of content that the technical publications group has to manage and significantly increases the quality of the content (because the same idea is stated the same way always).

Soren Weimann has a video explaining the idea of content reuse in DITA using Lego™ blocks (which represent individual topics), paper outlines of Lego™ blocks (maps), a bowl (to represent a Content Management System) and a camera (to illustrate output). The video is at https://vimeo.com/82182488

One of the oft-cited challenges of implementing content reuse is that it’s so hard to actually do. Many of the organizations that have implemented DITA have very little (or none) content reuse. It’s typically viewed as a future enhancement that hasn’t been implemented yet.

We often hear “It’s hard to find similar topics” from those that are having difficulty implementing reuse.

A comprehensive metadata strategy would go a long ways to help implement reuse. So what is metadata and why would a metadata strategy (and taxonomy strategy) help?


What is metadata?

Metadata is information about your content. Think of it as the label on a can of content.

Without having to open the can you can know a lot about what’s inside. Ingredients, directions for use, nutritional information, related recipes, etc. – it’s all there on the label. Think of the label as a collection of metadata. It’s not the soup, but it tells you a lot about the soup.

Now imagine if that data was organized in a manner that would allow you to search on it – classified in ways that were useful to readily find the information. For a can of soup the label strategy includes information about the soup (ingredients and nutritional information), how to use it (directions for use and recipes) and who made it.




Why is metadata important for content reuse?

Metadata in a DITA setting allows you to know a lot about the topic without having to open the topic. It is a key component of enabling content reuse because – if implemented correctly – it makes it easier to find similar topics.
Imagine a scenario where your company is developing the SuperWorks 300. It’s based on the SuperWorks 200 but has improvements in the electronics that allow it to communicate with a remote controller, including the capability to update the integrated software remotely. Much of the technical information for the SuperWorks 300 is identical to the relevant parts of the SuperWorks 200, but you need to pay close attention to the electronics aspects of the technical information to make sure that you’re providing the right information for the SuperWorks 300. Finding the relevant topics that you may be able to reuse segregates your content into two categories: information that you already have (from the SuperWorks 200 or otherwise) and areas you’ll need to work with Engineering to create new content.


What is a metadata strategy?

In a similar manner to a soup label, a metadata strategy for technical content topics may include information about the topic, what product it’s applicable for and who created or revised it. A good metadata structure is a mix of the obvious and intuitive. Not only is a good idea of what you have important, there also needs to be a solid understanding of the direction the organization is headed to try and best address future needs. A successful strategy should be based on discussions with writers to understand the current process and the perceived shortcomings.

Metadata strategy is a constantly evolving process as product lines evolve. Criteria that was unknown five years ago (e.g., IoT sensor capability) may be critical now. Criteria that was paramount twenty years ago (e.g., uses Freon-12 ) may be obsolete.

Options for a (useful) metadata include:

  • Author information
  • Audience information
  • Content information/usage
  • Applicable product information
  • Review status

Our experience is that full text search is not a useful metadata strategy because there are too many false positives.


A (potential) boon for end-users

Extending metadata into and outside of the enterprise not just for content reuse– it also supports end users searching your content to find that needle in the (content) haystack. Examples include supporting a faceted search on a website, retrieving a repair topic based on diagnostic codes or serving up content based on product feature.

Amazon uses metadata to help you zero-in your search to help you find (and buy) exactly what you are looking for. A search for “solar panel” allows me to narrow my search based on:

  • Rated Power
  • Feature keyword
  • Brand
  • Avg. Customer Review


What about taxonomy?

Related to metadata is Taxonomy - the practice and science of classification. The basic idea behind taxonomy is to provide a controlled vocabulary for metadata attributes, and to specify relationships between terms in the controlled vocabulary. Taxonomies allow a search for one thing and have results that are related to that thing - automatically. If you use Amazon then you’re probably familiar with the concept of taxonomy. Selecting a hammer will also show results for items that are related to a hammer.

Screwdrivers and tool bags have a similar classification to hammers and an interest in one may lead to an interest in another.


Putting it together

A thoughtful metadata and taxonomy strategy can be a critical aspect to implementing a viable content reuse strategy. Investing the time to standardize the vocabulary and identify the “Five Ws” (Who, What, Why, When, Where) go a long way to helping find similar topics.