Wednesday, November 2, 2011

Choosing a taxonomy for the INSTAAR web site

Why a taxonomy?

On our new web site, we want to use a set of research themes--a taxonomy--to tag most of the content: publications, projects, labs, news items, the works. The goal of the taxonomy is to group similar content together, rather than to distinguish between particular content elements.  The whole point is to help people find related content.

The right taxonomy will not only allow people to find content that interests them, but will work with our identity and showcase the way we form connections and cross disciplines.  The taxonomy terms are metadata; we can use them to do several things:
  • Feature stories on the home page and various landing pages that represent the full spectrum of INSTAAR research—we want to make sure that we’re not covering some people and their research but skipping others entirely.
  • Browse web site content.  Keyword searching is useful if you know what you’re looking for, but there are times when browsing works best.  Of our primary audiences, we expect prospective students and potential donors to browse a lot.  The taxonomy gives us a contextual, engaging, effective browsing aid that helps convey the interdisciplinary nature of our research.
  • Link people, labs, and previously written content to new content easily.
  • Help solve our identity problem, in which the INSTAAR name doesn’t match the full gamut of what we do.  By showcasing research in both cold regions and not-cold regions, we can show people what we’re about.
  • As time goes on, we can use the taxonomy to create some really cool, interactive experiences on the site.
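The linking idea in the list above can be sketched roughly in Python, assuming each piece of content carries a set of taxonomy terms. Everything here is hypothetical: the `ContentItem` class, the `related_items` helper, the item titles, and the term names are placeholders for illustration, not part of any existing INSTAAR system.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """Hypothetical record for one piece of site content."""
    title: str
    kind: str  # e.g. "publication", "lab", "news", "project"
    terms: set = field(default_factory=set)


def related_items(item, catalog):
    """Return other items that share at least one term with `item`.

    Items sharing more terms come first; ties are broken by title.
    """
    scored = []
    for other in catalog:
        if other is item:
            continue
        shared = item.terms & other.terms
        if shared:
            scored.append((len(shared), other.title, other))
    scored.sort(key=lambda entry: (-entry[0], entry[1]))
    return [other for _, _, other in scored]


# Placeholder content and placeholder terms, for illustration only.
catalog = [
    ContentItem("Example glacier paper", "publication",
                {"glaciers", "climate change"}),
    ContentItem("Example ice core lab", "lab",
                {"glaciers", "climate change", "atmosphere"}),
    ContentItem("Example snowpack story", "news",
                {"glaciers", "water supply"}),
    ContentItem("Example soils project", "project",
                {"tundra", "forests"}),
]

for match in related_items(catalog[0], catalog):
    print(match.kind, "-", match.title)
```

The smaller the vocabulary, the more often two items share a term, which is the grouping behavior we want from the taxonomy.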

Requirements

What factors make a taxonomy appropriate for us?
  • Not too big – too many words create confusion, not clarity.
  • Not too small – the taxonomy should be able to adequately describe major facets of INSTAAR.
  • It should make sense to us – INSTAAR people need to be able to tag their own work without a lot of hassle or training.  If it doesn’t make sense to us, it won’t do a good job of making sense of our work to other people.
  • It should make sense to the intelligent non-scientists who make up some of our primary audiences (such as potential donors).
  • Consistent treatment of topics – cover topics evenhandedly, at the same level of detail.
  • Consistent treatment of terms – terms should use the same case and format across the entire vocabulary.
  • Room to grow – should allow us to add expertise and follow new research directions without re-doing the whole taxonomy.

Candidates

I evaluated several sets of terms using the above requirements.

Some failed immediately:

The three INSTAAR groups: Ecosystems, Geophysics, and Past Global Change.
Issues:    Too small, no room to grow (we’ve already got people who don’t fit in any category), doesn’t make sense to external groups.  At the retreat we decided to do away with the three groups as a way to show INSTAAR to the outside world.

SPIN codes: the keywords used in FRPAs to describe research interests.
Issues:    It’s like someone threw 19 unrelated glossaries in a blender and hit frappe.  Too big (thousands of terms, even when you cut out the ones not relevant to INSTAARs), not evenhanded, no equal treatment of terms, doesn’t make sense to anyone.

Others were good contenders that met some, though not all, criteria:

AGU Session Topics: the categories used to organize presentations at the AGU Fall Meeting.  Two Directorate members at the retreat suggested looking at these.
Results:    The terms do quite well in their context at AGU, helping scientists parse which sessions they might want to attend.  They seem to cover most INSTAAR topics.  The terms did not test well, however, with non-scientists.  There was frequent confusion over the meaning of terms (“What on earth is Nonlinear Geophysics?”).  Many people had difficulty matching a term with the topic they had in mind (“Where is climate change?”).
    The taxonomy also does not have consistent treatment of terms.  For example, some terms are the names of scientific disciplines (Atmospheric Sciences, Seismology) but others are environments (Cryosphere) or other categories (Natural Hazards, Public Affairs).  The list of terms also seems to change from year to year and even between paper submissions and the final program, so I’d have a hard time selecting a final list.
    The geographic terms, being fairly general and evenhanded, could work for us.

CIESIN subject headings (Center for International Earth Science Information Network): terms used to categorize data, applications, and other information cataloged by CIESIN.  Find it at http://www.ciesin.org/sub_guide.html
Results:    The taxonomy seems to be about the right size (14 terms).  The terms are consistent in form and coverage. 
But the terms don’t seem to be a good match for INSTAAR research.  There are a lot of policy, health, and data-related terms that don’t really apply (Poverty, Economic Activity, Environmental Treaties) or that should be encapsulated in a broader term for us.  And there are no geology-related terms for people involved in land surface processes, glaciology, volcanism, etc.

A cut-down version of the American Geological Institute (AGI) Thesaurus: major headings only.
Results:    I love the AGI Thesaurus, because it is a professionally developed, consistent, rigorously parsed, and above all thorough taxonomy.  It is used to tag all the references in the GeoRef database.  It has beautiful clarity and consistency.
    It is also, however, enormous.  Even if we only use the highest level of terms, we’re talking hundreds, maybe thousands, of words.
    Furthermore, while the AGI Thesaurus has extensive coverage of all geological topics, it does not have enough terms to describe ecosystems or archeological research—they are out of scope for GeoRef.  So we’d be stuck with uneven coverage for those groups at INSTAAR.

Custom taxonomy: developed in-house from our INSTAAR publications.
Results:    This is feasible.  We can come up with terms that describe, at a general level:
  • field of study (paleoclimatology, alpine ecology)
  • environment (coastal environment, forests, tundra)
  • topic investigated (glaciers, water supply, climate change)
  • substance analyzed (lake sediments, foraminifera)
  • technique used (δ18O, C-14)
I took a stab at this and got about 200 terms.  On the plus side, I tried to follow AGI’s guidelines and I think the terms are fairly consistent.  They are applicable to INSTAAR research and we can extend the taxonomy, using the guidelines, in the future.
But I worry about that many terms.  I’m concerned that it’s overkill, and that no one but me will add all the finicky little tags to their work.  Too complicated!
   
In the end, one taxonomy was left standing.

NASA Global Change Master Directory: like the CIESIN subject headings, these terms were developed as metadata for a data clearinghouse.  Find them at http://gcmd.nasa.gov/.
Results:    The right size: 14 terms total; 12 apply to INSTAAR.  Enough to cover our research topics, but not so many as to create confusion.
    They seem to make sense to the INSTAAR people and outside people I’ve shown the list to so far.  (Of course we need to test that idea!)
    Evenhanded treatment of topics, and consistent treatment of terms.
    The terms were developed by scientists and information professionals and have been widely adopted.  They are used by hundreds of organizations.  The terms have been meticulously tested and updated ever since 1995, when they entered the public sphere.  (There is also a fully developed, formal thesaurus that underlies these categories, which I swoon over, but which is overkill for us.)

I went through the INSTAAR publications lists from 2009-2010 and tagged them all with NASA terms.  It worked pretty well.  Every paper was connected with at least one category.  Most connected with two; a few with three categories.  That seems like about the right level for cross-web site browsing.

The terms:
  • Agriculture
  • Atmosphere
  • Biosphere
  • Biological Classification
  • Climate Indicators
  • Cryosphere
  • Human Dimensions
  • Land Surface
  • Oceans
  • Paleoclimate
  • Solid Earth
  • Terrestrial Hydrosphere
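The tagging check described above (every paper gets at least one term, most get two) can be sketched roughly in Python. This is a minimal sketch under stated assumptions: the `check_tagging` helper is hypothetical, and the paper titles and tag assignments are invented placeholders, not the actual 2009-2010 results. Only the twelve term names come from the list above.

```python
from collections import Counter

# The twelve terms listed in this post.
GCMD_TERMS = {
    "Agriculture", "Atmosphere", "Biosphere", "Biological Classification",
    "Climate Indicators", "Cryosphere", "Human Dimensions", "Land Surface",
    "Oceans", "Paleoclimate", "Solid Earth", "Terrestrial Hydrosphere",
}


def check_tagging(tagged_papers):
    """Summarize a tagging pass (hypothetical helper).

    Returns three things:
      - titles with no tags at all,
      - a mapping of title -> tags that are not in the vocabulary,
      - a mapping of "number of terms" -> "number of papers".
    """
    untagged = []
    unknown = {}
    terms_per_paper = Counter()
    for title, tags in tagged_papers.items():
        unique_tags = set(tags)
        bad = unique_tags - GCMD_TERMS
        if bad:
            unknown[title] = sorted(bad)
        if not unique_tags:
            untagged.append(title)
        terms_per_paper[len(unique_tags)] += 1
    return untagged, unknown, dict(terms_per_paper)


# Placeholder data, invented purely for illustration.
papers = {
    "Placeholder paper 1": ["Cryosphere", "Paleoclimate"],
    "Placeholder paper 2": ["Oceans"],
    "Placeholder paper 3": ["Biosphere", "Land Surface",
                            "Terrestrial Hydrosphere"],
    "Placeholder paper 4": ["Atmosphere", "Climate Indicators"],
}

untagged, unknown, distribution = check_tagging(papers)
print("Untagged papers:", untagged)
print("Tags outside the vocabulary:", unknown)
print("Papers by number of terms:", distribution)
```

A check like this would also flag misspelled or off-list tags early, which matters if INSTAAR people end up tagging their own work.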

Recommendation

At this point, the NASA GCMD terms are the only set that meets our requirements for a taxonomy.  It also has the advantage of existing now, so we can just pick it up.

I would like to take the next step of testing it with both INSTAAR and outside audiences.  I want to make sure that the terms make sense to both groups, and that INSTAARs will feel comfortable having their research described using the terms.  I especially want feedback from researchers who work with carbon and nitrogen cycles.

If the taxonomy works for most people, I propose we adopt it for the INSTAAR web site.

If it seems to work only partially, I suggest that we use the feedback we get from the tests to modify the NASA GCMD terms to better suit our purposes.

4 comments:

Anonymous said...

Hey! The NASA taxonomy works well for my research on Antarctic diatom ecology, but to me, the terms seem a bit vague to describe folks doing, say, hydrologic research. Also, it seems wrong to be at INSTAAR and not include the search term "alpine".

Anonymous said...

I want to echo the feedback just given - although this list seems pretty good, hydrology and hydrologic sciences don't seem to have a clearly defined home.

Anonymous said...

Can our list be an expanded version of the NASA GCMD, adding the few words we think are missing?

Shelly said...

Thanks, folks, for your feedback. We added the term "terrestrial hydrosphere" for hydrology and hydrologic sciences - that was a definite gap. Working on the others (alpine etc.). Please do comment with any other specific terms you think of!