It’s Darwin Day – I’ll celebrate by updating the site

I’ve spent a couple of hours cleaning up the arachnolingua site. In particular, the link to the old ontology file is gone from the top menu bar – that file hasn’t been valid in nearly a year, and even the current file doesn’t deserve that level of billing. If you want to see the OWL file, there is a link on the project page (available from the ‘about’ menu item). The new OWL file is, I’m happy to announce, a reasoned-over version: inferred subclass, equivalent class, and class assertion axioms have been added to the ontology.

ELK didn’t complain (unlike when it reasons over the mashup of supporting ontologies), and a quick inspection in Protege showed no apparent changes when the resulting ontology was re-reasoned with FaCT++ (unlike when the unreasoned ontology was loaded – several top level classes disappeared after reasoning). The lack of improvement from re-reasoning in Protege doesn’t say much about FaCT++ vs. ELK, simply that the KB is expressively simple enough that it is covered by the portion of OWL-EL that ELK reasons over. It’s also still quite small, so the speed advantage of ELK isn’t very obvious in the second pass. Hopefully arachb will keep growing to the point where this is no longer the case.

The project page has also been updated to link to the source ontologies directly rather than to the MIREOT’ed OWL files that went with the old KB file. Perhaps these and other questionable links have something to do with all the hits on the project page (and only the project page) from Russian sources that have been showing up in my logs for the past week or so.

The other big change is the attention I’ve given to the taxonomy status page. First, a word of explanation: the taxonomy status page is a list of taxa I’ve encountered during curation that do not appear in the NCBI taxonomy (so the publications listed on this page have undergone some review and behavior annotation and will likely be the next ones to appear in the KB). The list is over a year old, but none of the names on it have appeared in NCBI in the intervening months. In some cases (e.g., synonymy) this is to be expected. In any case, I’m using the World Spider Catalog as my authority for cases where names don’t appear in NCBI. Because the WSC text pages don’t contain full species names and lists of genera are split by family, I’ve used Wikipedia and EoL to get family names for these taxa. It looks like this may trigger some EoL contributions on my part.

In any case, enjoy, and feel free to comment on what you see, suggest papers to review (or to bump up the queue), or make any other suggestions.





Playing with ELK (the reasoner)

It’s been a long slog the past few weeks. I’ve been pulling IRIs for the taxa, anatomy, and substrates (entities that aren’t part_of an actor) that are associated with participants in a behavior, as well as IRIs for behaviors and publications for assertions. Yesterday, I finally had all (or at least most) of the pieces together – I was now ready to start matching term IRIs against the support ontologies and copying OWL ‘entities’ from the support ontologies into the target ontology that will become the OWL file that’s loaded onto the server.

The first step was to merge all the support ontologies and run a classifier over them, primarily to determine the class hierarchy. The merging went smoothly and didn’t take too long, but the 7,436,381 assertions that resulted from merging the 8 support ontologies seemed to be too much for the OWLAPI’s structural reasoner. After 45 minutes on a 4-core i7, I decided it was time to try something else.

ELK has been attracting some interest in the biological ontologies community in the past couple of years as a very fast way to do reasoning for ontologies that can stay within the limits of the OWL-EL language profile. As it turns out, the current version of ELK implements only a subset of that profile, but that subset is more than sufficient for my very limited immediate needs.

The first task for the reasoner was simply to extract the superclass closure of Arachnida (all the classes above it) in the NCBI taxonomy hierarchy, followed by all subclasses of the same taxon. Those, along with Arachnida itself (which is a taxonomic class as well as an OWL class), are copied into the target ontology, along with the axioms specifying their super/subclass relations and their labels (= Linnean names).
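For anyone who wants to see the shape of that extraction, here is a minimal sketch in Python. It is not the owlbuilder code: owlbuilder asks the reasoner (ELK, via the OWLAPI) for superclasses and subclasses, whereas this toy version just walks a set of asserted (subclass, superclass) pairs. The `ex:` identifiers, the function names, and the heavily simplified lineage are all assumptions made up for illustration.

```python
from collections import defaultdict

def build_index(subclass_axioms):
    """Index (sub, sup) pairs so we can walk up or down the hierarchy."""
    parents, children = defaultdict(set), defaultdict(set)
    for sub, sup in subclass_axioms:
        parents[sub].add(sup)
        children[sup].add(sub)
    return parents, children

def closure(start, edges):
    """All nodes reachable from start by following edges (start excluded)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def extract_module(focus, subclass_axioms, labels):
    """Keep the focus class, everything above it, and everything below it."""
    parents, children = build_index(subclass_axioms)
    kept = closure(focus, parents) | closure(focus, children) | {focus}
    # Copy only the subclass axioms whose two ends were both kept.
    kept_axioms = {(sub, sup) for sub, sup in subclass_axioms
                   if sub in kept and sup in kept}
    kept_labels = {c: labels[c] for c in kept if c in labels}
    return kept, kept_axioms, kept_labels

# Toy hierarchy: hypothetical ids, intermediate ranks omitted.
axioms = {
    ("ex:Arthropoda", "ex:Metazoa"),
    ("ex:Chelicerata", "ex:Arthropoda"),
    ("ex:Insecta", "ex:Arthropoda"),        # sibling branch, should be dropped
    ("ex:Arachnida", "ex:Chelicerata"),
    ("ex:Araneae", "ex:Arachnida"),
    ("ex:Scorpiones", "ex:Arachnida"),
    ("ex:Tetragnathidae", "ex:Araneae"),
    ("ex:Tetragnatha", "ex:Tetragnathidae"),
}
labels = {cls: cls.split(":")[1] for pair in axioms for cls in pair}

kept, kept_axioms, kept_labels = extract_module("ex:Arachnida", axioms, labels)
```

In this toy run the classes above Arachnida and everything beneath it are kept, while Insecta, which hangs off an ancestor but is neither above nor below Arachnida, is left behind along with its axiom.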

It all works – owlbuilder is generating an OWL file that loads in Protege (after making sure the DOI cleanup was getting called in the right place) and contains a couple of other classes pertaining to an as yet incomplete representation of a posture in Tetragnatha straminea. There is nothing special about this behavior or species (surprised the first species wasn’t a jumping spider?); it’s just that the first publication that came up in the literature search all those years ago was about a couple of Tetragnatha species. You’ll be hearing more about this behavior and a couple of other behaviors in this species and some congeners as I fill in the pieces and start pushing real data to the server.