Updates, taxonomy, and post-publication review

Updates: Last week I tweaked the ethogram (taxonomy) view so that entering the name of a higher-level taxon retrieves behaviors for all included (subsumed) taxa. This is implemented in the simple, inelegant way: crawl the tree and retrieve the annotations, using SPARQL for navigation while the control logic is all implemented in Java. Traversing the tree does have one advantage over a reasoner query that retrieves all included taxa: the results are guaranteed to come back in some sort of tree traversal order. It works (try ‘Tetragnatha’), but it is a bit slow. I’ve also configured a more capable server but haven’t deployed it yet, so be patient with these queries (some seem to require 2-3 minutes to complete; I’ll let you figure out which).
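
To make the traversal-order point concrete, here is a minimal sketch of that kind of crawl. It is not the site's code: it is written in Python rather than Java, two in-memory dictionaries stand in for the SPARQL lookups, and the names CHILDREN, ANNOTATIONS and behaviors_for, along with the placeholder species and behaviors, are all made up for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# Toy stand-ins for what the SPARQL navigation queries would return.
# The species names and behaviors are placeholders, not real annotations.
CHILDREN: Dict[str, List[str]] = {
    "Tetragnathidae": ["Tetragnatha"],
    "Tetragnatha": ["Tetragnatha sp. A", "Tetragnatha sp. B"],
}
ANNOTATIONS: Dict[str, List[str]] = {
    "Tetragnatha": ["placeholder behavior 1"],
    "Tetragnatha sp. B": ["placeholder behavior 2", "placeholder behavior 3"],
}


def behaviors_for(taxon: str) -> Iterator[Tuple[str, str]]:
    """Yield (taxon, behavior) pairs for a taxon and every taxon it subsumes.

    The walk is depth-first and pre-order, so a parent's annotations always
    come out before those of its children.
    """
    stack = [taxon]
    seen = set()  # guards against revisiting a taxon if the data has a cycle
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for behavior in ANNOTATIONS.get(current, []):
            yield current, behavior
        # Push children reversed so they are popped in their listed order.
        stack.extend(reversed(CHILDREN.get(current, [])))


rows = list(behaviors_for("Tetragnathidae"))
for taxon_name, behavior in rows:
    print(f"{taxon_name}\t{behavior}")
```

In a real deployment each dictionary lookup would be a round trip to the triple store, which is one plausible reason a crawl like this gets slow for large clades.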

Taxonomy: There’s not a lot new to report here, as OpenTree has been keeping me busy these past few weeks. I have been doing some more curation tool work to support taxa outside of NCBI, and thanks to Chris Mungall and James Overton there will soon be a new OWL rendering of the NCBI taxonomy, which should make its way into the backend database before long. I’m still tracking the addition of arachnid taxa to NCBI. The majority of updates seem to be sample records, which won’t help with behavior, but new species of ticks and spiders are trickling in as well.

Also, yesterday was Taxonomist Appreciation Day. Although I have dabbled in taxonomy informatics (TDWG, VTO, a bit in OpenTree, as well as the taxonomy work here), I would never consider myself a taxonomist. I do, as should any biologist, appreciate and thank the generations of taxonomists in the 250+ years since Linnaeus who have brought order and names to the millions of species we share this planet with.

Curation and Post Publication Review: A couple of items I found on Twitter over the past few days have sparked an interesting thought. The first was a discussion of a paper on how curators of the UniProtKB database responded to a changing understanding of the activity of the SiRT-5 protein. Initially this protein was understood to exhibit deacetylase activity, based primarily on documented activity of other members of the family and some in vitro assays that demonstrated the deacetylase activity. More recent papers have documented that the in vivo activity of this protein is more likely to be desuccinylation. The paper describes how annotations in UniProtKB were modified to incorporate both classes of activity in the appropriate contexts, providing a review process for the earlier reports in light of later results. Thus the curation process provides a post-publication, albeit specialized, peer review.

This is relevant in light of this post I saw this morning on the likely limits of post-publication peer review. Now, the particular papers discussed in the UniProt example were published in high-profile journals such as Cell and Science, so that case does not speak against the 1% notion mentioned in the Dynamic Ecology post. But not all curation is focused on the sort of topics that make it into the elite 1% of published papers. My publication database does have a few papers from Science, Nature and one or two other high-profile publications, but the majority come from places such as the Journal of Arachnology, Animal Behavior, or lesser-known journals from Japan or Latin America. This leads me to a somewhat more optimistic conclusion about the future of post-publication peer review than Jeremy Fox’s.

It’s Darwin Day – I’ll celebrate by updating the site

I’ve spent a couple of hours cleaning up the arachnolingua site. In particular, the link to the old ontology file is gone from the top menu bar. That file hasn’t been valid in nearly a year, and even the current file doesn’t deserve that level of billing. If you want to see the OWL file, there is a link on the project page (available from the ‘about’ menu item). The new OWL file is, I’m happy to announce, a reasoned-over version, with inferred subclass, equivalent class, and class assertion axioms added to the ontology. ELK didn’t complain (unlike when it reasons over the mashup of supporting ontologies), and a quick inspection in Protégé showed no apparent changes when the resulting ontology was re-reasoned with FaCT++ (unlike when the unreasoned ontology was loaded, where several top-level classes disappeared after reasoning). The lack of improvement from re-reasoning in Protégé doesn’t say much about FaCT++ vs. ELK, simply that the KB is expressively simple enough to be covered by the portion of OWL 2 EL that ELK reasons over. It’s also still quite small, so the speed advantage of ELK isn’t very obvious in the second pass. Hopefully arachb will keep growing to the point where this is no longer the case.

The project page has also been updated to link to the source ontologies directly rather than the MIREOT’ed OWL files that went with the old KB file. Perhaps these and other questionable links have something to do with all the hits on the project page (and only the project page) from Russian sources that have been showing up in my logs for the past week or so.

The other big change is the attention I’ve given to the taxonomy status page. First, a word of explanation: the taxonomy status page is a list of taxa I’ve encountered during curation that do not appear in the NCBI taxonomy (so the publications listed on this page have undergone some review and behavior annotation and will likely be the next ones to appear in the KB). The list here is over a year old, but none of the names on it have appeared in NCBI in the intervening months. In some cases (e.g., synonymy) this is to be expected. In any case, I’m using the World Spider Catalog as my authority for cases where names don’t appear in NCBI. Because the WSC text pages don’t contain full species names and lists of genera are split by family, I’ve used Wikipedia and EoL to get family names for these taxa. Looks like this may trigger some EoL contributions on my part.

First behavior assertions available

Ok, the first annotations are up (crudely). The choice of what to start with was fairly arbitrary, though these had the advantage of being very simple. The paper cited is just a survey of Tetragnatha species that turned up first in the alphabetized list of publications. I have about 150 more annotations sitting in a spreadsheet waiting for the infrastructure to settle down. Of course there is also a lot more to do with both the OWL generation (publications currently have nothing but their DOIs or internal ids) and the display (e.g., each of the labels corresponds to a URI that could be linked, though Ontobee pages aren’t the most friendly).

Getting this table to work is the end result of a lengthy SPARQL session on Saturday, mostly spent building up a query that captured the taxon, anatomy, behavior and publication as it crawled up a deeply nested RDF hierarchy. Once I had the query working, it was back to the JavaScript to update the little utility that transforms the JSON-ized SPARQL query results into a table (which now has more than one column). After that came fixing the remaining cases: taxa where only an NCBI id is available (e.g., Habronattus), which just return the name and the identifier; taxa that are not supported (e.g., Homo sapiens), which should fail properly; and non-taxa (e.g., SPARQL injection attempts), which should fail properly as well.
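
For readers curious what that transformation involves, here is a rough sketch. It is not the site's utility: it is in Python rather than JavaScript, and the variable names (taxon, anatomy, behavior, publication), the example.org URIs and the function names are assumptions for illustration. The only thing it relies on is the standard SPARQL JSON results layout (head.vars plus results.bindings), in which an unbound variable is simply missing from a binding. The name check at the end is one deliberately crude way to reject input that is obviously not a taxon name before it reaches a query.

```python
import json
import re
from typing import List

# Placeholder result in the standard SPARQL JSON results layout.
# The variable names and example.org URIs are illustrative only.
SAMPLE = json.dumps({
    "head": {"vars": ["taxon", "anatomy", "behavior", "publication"]},
    "results": {"bindings": [
        {
            "taxon": {"type": "literal", "value": "Tetragnatha"},
            "anatomy": {"type": "literal", "value": "whole organism"},
            "behavior": {"type": "literal", "value": "placeholder behavior"},
            "publication": {"type": "uri", "value": "http://example.org/pub/1"},
        },
        {
            # 'anatomy' and 'behavior' are unbound in this row.
            "taxon": {"type": "literal", "value": "Habronattus"},
            "publication": {"type": "uri", "value": "http://example.org/pub/2"},
        },
    ]},
})


def bindings_to_rows(result_json: str) -> List[List[str]]:
    """Return a header row followed by one row of strings per binding."""
    doc = json.loads(result_json)
    columns = doc["head"]["vars"]
    rows = [list(columns)]
    for binding in doc["results"]["bindings"]:
        # Unbound variables are absent from the binding; show them as "".
        rows.append([binding.get(var, {}).get("value", "") for var in columns])
    return rows


def looks_like_taxon_name(name: str) -> bool:
    """Crude shape check: one capitalized word, optionally plus an epithet."""
    return re.fullmatch(r"[A-Z][a-z]+(?: [a-z-]+)?", name) is not None


table = bindings_to_rows(SAMPLE)
for row in table:
    print(" | ".join(row))
```

A shape check like this only filters out strings that could not be a name. Whether a well-formed name such as Homo sapiens is actually supported still has to be decided against the taxonomy itself.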

The server is a bit slow, meaning I’m close to hitting the limits of the AWS micro-instance I’ve been serving from for the past 18 months.  Probably time to start planning the upgrade (which also may involve switching from Ubuntu to Debian).

For now though, I’ll enjoy what’s there and add more goodies within the limits.

A front page facelift

I’ve done some rework on the front page, mostly updating the layout to take advantage of Twitter’s Bootstrap. I also finally have a suitable photo for arachnolingua branding. You can see a minimized image at the upper left of the front page, and here is a larger image showing one of my resident spiders on the open pages of the glossary from the 1917 edition of Bright’s Anglo-Saxon Reader. The spider is standing over the Anglo-Saxon for both ‘weave’ and ‘grow’, which are a pretty good compromise for not having ‘spider’ in the glossary.