Note to self and others

Arachnolingua focuses its OWL expressions on the claim (= assertion, statement) of an individual behavior event or class expression, and properties should start there and work outward.  Thus: courtship_event –has_participant–> palp –part_of–> male –has_type–> Habronattus sp.  There may be consequences of this decision (especially for class-level statements), but it is better to be consistent and to document the design decision here for now.
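To make the direction of the chain concrete, here is a minimal sketch in plain Python (no OWL library).  The function name and the string identifiers are only illustrative, not Arachnolingua's actual vocabulary or code; the point is just that every statement is reached by starting at the event and following properties outward.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

def chain_from_event(event: str, steps: List[Tuple[str, str]]) -> List[Triple]:
    """Build triples that start at the behavior event and work outward.

    Each step is (property, target); the target of one step becomes the
    subject of the next step.  All names here are hypothetical.
    """
    triples = []
    subject = event
    for prop, target in steps:
        triples.append((subject, prop, target))
        subject = target
    return triples

triples = chain_from_event(
    "courtship_event",
    [
        ("has_participant", "palp"),
        ("part_of", "male"),
        ("has_type", "Habronattus sp."),
    ],
)
for s, p, o in triples:
    print(f"{s} --{p}--> {o}")
```

Because each triple's subject is the previous triple's object, any participant can be traced back to the event that the claim is about.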

This should eventually make it into the ‘primer’ documents for the curation tool and the database schema as well.  I wonder if there are any tools in Protege for looking at graphs of individuals – maybe lego?

Playing with ELK (the reasoner)

It’s been a long slog the past few weeks.  I’ve been pulling IRIs for taxa, anatomy, and substrates (entities that aren’t part_of an actor) that are associated with participants in a behavior, as well as IRIs for behavior and publication for assertions.  Yesterday, I finally had all (or at least most) of the pieces together – I was now ready to start matching term IRIs against the support ontologies and copying OWL ‘entities’ from the support ontologies into the target ontology that will become the OWL file that’s loaded onto the server.

The first step was to merge all the support ontologies and run a classifier over them – primarily to determine the class hierarchy.  The merging went smoothly and didn’t take too long, but running the OWLAPI’s structural reasoner on the 7436381 assertions that resulted from merging the 8 support ontologies seemed a bit too much for it.  After 45 minutes on a 4-core i7, I decided it was time to try something else.

ELK has been attracting some interest in the biological ontologies community in the past couple of years as a very fast way to do reasoning for ontologies that can stay within the limits of the OWL-EL language profile.  As it turns out, the current version of ELK implements only a subset of that profile, but that subset is more than sufficient for my very limited immediate needs.

The first task for the reasoner was simply to extract the superclass closure (all the classes above) of Arachnida in the NCBI taxonomy hierarchy, followed by all subclasses of the same taxon.  Those, along with Arachnida itself (which is a taxonomic class as well as an OWL class), are copied into the target ontology, along with the axioms specifying their super/subclass relations and their labels (= Linnean names).
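The extraction step can be illustrated without a reasoner.  The following is only a sketch over a toy child-to-parent table in plain Python; it is not how owlbuilder or ELK does it, the function names are made up, and the little hierarchy is abbreviated for illustration.  A real run would ask the reasoner for the inferred superclasses and subclasses instead.

```python
from collections import defaultdict, deque

# Toy, abbreviated fragment of a taxonomy: child -> direct parent.
PARENT = {
    "Arachnida": "Chelicerata",
    "Chelicerata": "Arthropoda",
    "Arthropoda": "Metazoa",
    "Araneae": "Arachnida",
    "Scorpiones": "Arachnida",
    "Tetragnathidae": "Araneae",
    "Tetragnatha": "Tetragnathidae",
    "Insecta": "Arthropoda",
}

def superclass_closure(term, parent=PARENT):
    """Return every class above `term`, nearest first."""
    result = []
    while term in parent:
        term = parent[term]
        result.append(term)
    return result

def subclass_closure(term, parent=PARENT):
    """Return every class below `term`, found breadth-first."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    found, queue = [], deque([term])
    while queue:
        for child in children[queue.popleft()]:
            found.append(child)
            queue.append(child)
    return found

def extract(term):
    """Classes to copy into the target: ancestors, the term, descendants."""
    return set(superclass_closure(term)) | {term} | set(subclass_closure(term))

print(sorted(extract("Arachnida")))
```

Siblings of the ancestors (Insecta in this toy table) are left out, which is what keeps the target ontology small compared with the full taxonomy.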

It all works – owlbuilder is generating an owl file that loads in Protege (after making sure the DOI cleanup was getting called in the right place), and contains a couple of other classes pertaining to an as yet incomplete representation of a posture in Tetragnatha straminea.  Nothing special about this behavior or species (surprised the first species wasn’t a jumping spider?), just the first publication that came up in the literature search all those years ago was about a couple of Tetragnatha species.  You’ll be hearing more about this behavior and a couple of other behaviors in this species and some congeners as I fill in the pieces and start pushing real data to the server.

It’s Spider Monday; no pretty pictures, but lots of database work

I’ve been focusing on the frontend Arachadmin webapp for several months, and now that I can, albeit slowly, add assertions that annotate behavior descriptions, it’s time to pay some more attention to generating OWL files.  Although there are still several housekeeping tasks (e.g., updating the OWLAPI and other libraries), I focused on how the database has changed recently.  The most important changes were adding code for loading Term beans (objects) from the term table and starting support for loading ontology files into the OWL tool.  Loading the support ontologies into the backend will allow full access to a reasoner to support filtering terms and axioms for inclusion in the final knowledgebase.  A number of support tables, those that define the domains and authorities needed for meaningfully loading terms, were loaded as well.  One of the drivers for adding support for these secondary tables so quickly was getting the testing framework set up for terms and support ontologies, both of which refer to the set of domains (e.g., taxonomy, anatomy, behavior) defined in the database and assigned to each term and support ontology.  This also means that the test database needed to be updated to more closely reflect the schema structure in the working arachadmin database.  The updated test database has been uploaded to GitHub.  As usual, I will upload updates of the full working database to Dropbox and figshare.

Note: I’ve just published the arachadmin export on figshare.  If I understand how this works, this data now has a permanent (across future revisions) identifier, namely this.

 

Needless to say, I did enjoy the many #SpiderMonday photos people tweeted today.  My arthropod photography skills are gradually improving, maybe another season and I’ll have something to share.

Tackling OBO imports

Descriptions need vocabulary, and since I am using vocabulary from OBO ontologies, I need to make the terms available for constructing assertions.  Making them available means reading them from somewhere, finding the terms, definitions, etc., and either displaying them directly or reading them from a cache that needs to be updated from time to time.  The NCBI taxonomy is updated daily, and although most of the daily updates will not include arachnids, it still makes sense to be able to force updates on the taxonomy as well as other ontologies, either automatically at each startup (as done by Jim Balhoff’s excellent, but not quite right for present purposes, tool Phenex) or on command, trusting the curator to do the updates from time to time.

For parsing the RDF/XML that remains the default format for the OBO Foundry’s rendering of OWL files, I’m using the SAX-like iterparse facility of the lxml Python library.  The choice to use lxml comes again from my day job, though most nexml files are, unlike NCBITaxon.owl, small enough to be processed with a DOM parser without blowing out memory.  I’ve tested the parsing and, at least for now, the building of a list of classes with both a relatively small ontology (Evidence codes) and the OWL serialization of the NCBI Taxonomy.  So far, good enough.
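For readers who have not used iterparse, here is a rough sketch of the streaming approach.  To keep it runnable without extra installs it uses the standard library's xml.etree.ElementTree.iterparse rather than lxml (for this simple case the two are called the same way); it is not the code in ontology_tools.  The function name, the tiny sample document and its example.org IRIs are all made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Tiny stand-in for an OBO ontology in RDF/XML (IRIs are invented).
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/onto#A">
    <rdfs:label>Arachnida</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/onto#B">
    <rdfs:label>Araneae</rdfs:label>
  </owl:Class>
</rdf:RDF>
"""

def iter_classes(source):
    """Yield (iri, label) for each named owl:Class, one element at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != f"{{{OWL}}}Class":
            continue
        iri = elem.get(f"{{{RDF}}}about")
        if iri is not None:  # skip anonymous class expressions
            yield iri, elem.findtext(f"{{{RDFS}}}label")
        # Drop the finished element's children so memory stays flat.
        elem.clear()

classes = list(iter_classes(io.BytesIO(SAMPLE)))
print(classes)
```

Clearing each element once it has been handled is what lets a file the size of NCBITaxon.owl be read without building the whole tree.  A real ontology also nests class references inside other classes (restrictions, intersections), which this sketch does not try to sort out.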

I also made the ontology status page available via a menu item.  The parser (in a module called ontology_tools) is not yet hooked up to the status page; I want to turn the parsing result into something more useful first (probably a db table).  I also added some hooks for linking publication citations on the publication status page back to the editing page for that publication.
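As a thought experiment for the "probably a db table" step, here is a minimal sketch using the standard library's sqlite3.  The table name, columns and function are hypothetical, and the real arachadmin schema and database layer may look quite different.

```python
import sqlite3

def store_terms(conn, ontology, classes):
    """Write parsed (iri, label) pairs into a hypothetical cache table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ontology_term (
               iri TEXT PRIMARY KEY,
               label TEXT,
               ontology TEXT NOT NULL
           )"""
    )
    # INSERT OR REPLACE lets a later re-parse refresh existing rows.
    conn.executemany(
        "INSERT OR REPLACE INTO ontology_term (iri, label, ontology) "
        "VALUES (?, ?, ?)",
        [(iri, label, ontology) for iri, label in classes],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
store_terms(conn, "example", [("http://example.org/onto#A", "Arachnida")])
# Simulate an update: one repeated row and one new row.
store_terms(
    conn,
    "example",
    [
        ("http://example.org/onto#A", "Arachnida"),
        ("http://example.org/onto#B", "Araneae"),
    ],
)
rows = conn.execute(
    "SELECT iri, label FROM ontology_term ORDER BY iri"
).fetchall()
print(rows)
```

Keying on the IRI means that re-running the parser after an ontology update overwrites stale labels rather than piling up duplicates, which would suit a status page that only needs counts and lookups.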