Metadata is such an essential part of the work we do, and yet it can at times be so complex that it feels a bit esoteric if you’re not working with it every day. Razorfish’s Rachel Lovinger did a great job of breaking through the mysteries of metadata in this two-part CSA 2012 workshop.
Metadata is often described as the content about the content. When a piece of content is published, metadata gives it context and meaning, and also enables people to more easily interact with it: by search, filtering, aggregation, contextual linking, syndication, personalization, and more. There are three types of metadata, Rachel explained at the outset:
- structural: models the content types and attributes
- administrative: provides information about how the content was created, when and by whom it was created, and access and ownership rights
- descriptive: the subject matter – what is it about?
Structural metadata involves identifying the different content types and how they relate to each other; Rachel refers to this as content modeling. Unless you are working with XML (which many content people avoid, Rachel admits), this often becomes a function of the content management system (CMS): identifying the different attributes of a content type and asking content authors to select those attributes when creating a piece of content. A few notes on structural metadata:
- An example of relationships defined as part of content modeling would be a book review site, where the book and the author each have their own page; they are cross-linked to one another and may have pieces of content embedded in each other’s pages.
- Content strategists must think about how minutely information needs to be broken down when defining content attributes. There is no end to the number of details that could be defined, but content authors may actually use only two or three of them.
- Content modeling is typically done collaboratively with the technical team, developing the content model and wireframes simultaneously.
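The book review example above can be sketched as a tiny content model. This is only an illustration, not anything shown in the workshop: the `Author` and `Book` types, their attributes, and the two helper functions are hypothetical names chosen for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# eq=False keeps identity-based comparison, which avoids recursing through
# the two-way links between books and authors.
@dataclass(eq=False)
class Author:
    """Hypothetical content type for an author page."""
    name: str
    bio: str = ""
    books: list[Book] = field(default_factory=list)  # cross-links to books


@dataclass(eq=False)
class Book:
    """Hypothetical content type for a book page."""
    title: str
    author: Author
    summary: str = ""

    def __post_init__(self) -> None:
        # Keep the relationship two-way so the author page can list this book.
        self.author.books.append(self)


def book_page_byline(book: Book) -> str:
    """Author content embedded on the book page."""
    return f"by {book.author.name}"


def author_page_embeds(author: Author) -> list[str]:
    """Book titles embedded on the author page."""
    return [b.title for b in author.books]


author = Author(name="A. Writer")
book = Book(title="An Example Novel", author=author)
print(book_page_byline(book))
print(author_page_embeds(author))
```

Each content type carries only a few attributes here, which reflects the point about granularity: attributes that authors will not fill in are probably not worth modeling.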
Administrative metadata provides details about how the content is used, published and delivered. It raises questions such as:
- Are these things important for what we are trying to create?
- Where did the content come from?
- Are there restrictions on how it can be used?
- Is the content time-sensitive or evergreen?
- Who can access it?
- When it’s archived or indexed, how will it be ordered?
- Does the content have to adhere to legal regulations?
A few notes about administrative metadata:
- It is extremely important to identify consistent formats for administrative metadata (such as date formats, Boolean values, selection lists, etc.), especially if the metadata will be used to enable sorting or filtering.
- Versioning and tracking are important, especially if you are importing third-party feeds, to make sure that corrected or updated items in a feed (such as revised transcripts) do not create duplicates of content.
- Some administrative metadata may be generated automatically by the device or system that created the content (cameras, CMSs).
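The notes on consistent formats and on versioning can be made concrete with a short sketch. Everything here is assumed for illustration: the field names (`source`, `source_id`, `version`, `published`, `evergreen`), the list of accepted date formats, and the rule that a higher version number replaces a lower one.

```python
from datetime import date, datetime

# Hypothetical date formats seen in incoming content. Note that "%m/%d/%Y"
# assumes US-style month-first dates; adjust to match your real sources.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")

TRUE_VALUES = {"true", "yes", "y", "1"}
FALSE_VALUES = {"false", "no", "n", "0"}


def normalize_date(raw: str) -> date:
    """Parse a date string in any known format into a single date type."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")


def normalize_bool(raw: str) -> bool:
    """Map the many spellings of yes/no onto a real Boolean."""
    text = raw.strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"Unrecognized Boolean value: {raw!r}")


def upsert(store: dict, item: dict) -> None:
    """Add a feed item, or replace the stored copy if this version is newer."""
    key = (item["source"], item["source_id"])
    existing = store.get(key)
    if existing is None or item["version"] > existing["version"]:
        store[key] = item


raw_items = [
    {"source": "feed-a", "source_id": "42", "version": 1,
     "published": "03/05/2012", "evergreen": "No"},
    # A corrected version of the same item: it should replace, not duplicate.
    {"source": "feed-a", "source_id": "42", "version": 2,
     "published": "2012-03-05", "evergreen": "no"},
    {"source": "feed-b", "source_id": "7", "version": 1,
     "published": "01.02.2012", "evergreen": "YES"},
]

store: dict = {}
for raw in raw_items:
    item = dict(raw,
                published=normalize_date(raw["published"]),
                evergreen=normalize_bool(raw["evergreen"]))
    upsert(store, item)

# Sorting by date only works reliably once every value has the same type.
ordered = sorted(store.values(), key=lambda i: i["published"])
print([(i["source"], i["published"].isoformat()) for i in ordered])
```

Keying on the originating source and its own identifier, rather than on the title or body, is one way to recognize an updated item as the same piece of content.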
Descriptive metadata lets us classify content according to the subject’s characteristics. As with structural metadata, we face a number of choices in deciding what to classify about a subject and how. What depth do we actually need in order to support the functionality we want?
To illustrate this, workshop attendees were tasked with listing all the high-level dimensions we could think of to describe superheroes. The group came up with a number of different ways to classify our superfriends, including innate powers, vulnerabilities, place of birth, and even whether the superhero wore pants on the inside or outside.
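As a rough sketch of how dimensions like these turn into functionality, the snippet below treats each dimension as a facet that can be filtered and counted. The hero records, facet names, and helper functions are all made up for illustration.

```python
from collections import Counter

# Hypothetical records; each key other than "name" is a descriptive facet.
heroes = [
    {"name": "Hero A", "powers": {"flight", "strength"}, "origin": "Earth"},
    {"name": "Hero B", "powers": {"strength"}, "origin": "Elsewhere"},
    {"name": "Hero C", "powers": {"flight"}, "origin": "Earth"},
]


def filter_by(records, facet, value):
    """Return records whose facet equals, or contains, the given value."""
    matches = []
    for record in records:
        tagged = record.get(facet)
        if tagged == value:
            matches.append(record)
        elif isinstance(tagged, (set, list)) and value in tagged:
            matches.append(record)
    return matches


def facet_counts(records, facet):
    """Count how many records carry each value of a facet."""
    counts = Counter()
    for record in records:
        tagged = record.get(facet)
        if tagged is None:
            continue  # this record was never classified on this facet
        counts.update(tagged if isinstance(tagged, (set, list)) else [tagged])
    return counts


print([h["name"] for h in filter_by(heroes, "powers", "flight")])
print(facet_counts(heroes, "origin"))
```

A facet earns its place only if something like `filter_by` or `facet_counts` will actually be exposed to users, which is one way to decide how deep the classification needs to go.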
Rachel also showed the example of a website that sells flooring, demonstrating how deep it is possible to go with a taxonomy: types of wood, grains, colors and countless other characteristics present themselves as possibilities. This segued into a discussion of taxonomy development, including the benefits of using standardized taxonomies as a starting point for structural metadata in order to make content more findable by users. She suggested a number of resources:
- Dublin Core (a set of core attributes that can be used for any type of content)
- Schema.org (a collection of schemas for a range of content types, developed through a collaboration between Google, Bing and Yahoo)
These are open taxonomies, which means they serve as a starting point and allow you to add other fields and attributes. New standards are emerging all the time, Rachel said; choosing among them requires looking at who published the standard and whether it is likely to be adopted (for example, taxonomies from the Motion Picture Association or the ISO have a better chance of becoming credible industry standards). There are also a number of online databases with commercial taxonomies, many of which are linked to hundreds of other databases. If you add to these taxonomies, you may consider pushing yours back out to online databases for sharing with others.
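One way to picture starting from an open standard and adding your own fields is sketched below. The set of standard names is the 15-element Dublin Core Metadata Element Set; the local extension fields (borrowed from the flooring example), the sample record, and the `split_record` helper are assumptions made for this illustration.

```python
from __future__ import annotations

# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# Hypothetical site-specific extensions for a flooring catalog.
LOCAL_EXTENSIONS = frozenset({"wood_species", "grain", "color"})


def split_record(record: dict) -> tuple[dict, dict, list]:
    """Split a record into standard fields, local fields, and unknown keys."""
    allowed = DUBLIN_CORE_ELEMENTS | LOCAL_EXTENSIONS
    core = {k: v for k, v in record.items() if k in DUBLIN_CORE_ELEMENTS}
    local = {k: v for k, v in record.items() if k in LOCAL_EXTENSIONS}
    unknown = sorted(k for k in record if k not in allowed)
    return core, local, unknown


record = {
    "title": "Oak plank flooring",
    "subject": "hardwood flooring",
    "language": "en",
    "wood_species": "oak",
    "color": "natural",
    "colour": "natural",  # a stray variant spelling, flagged as unknown
}

core, local, unknown = split_record(record)
print(sorted(core))
print(sorted(local))
print(unknown)
```

Keeping the standard fields separate from the local ones makes it easier to share the standard part with other systems while still flagging fields that belong to neither list.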
You can also create your own taxonomies from your own content, and Rachel suggests looking at offline content as much as online content when you do.