A bit of curation

Over the weekend I spent some time messing with publications and merging author strings.  I made some progress, but decided it wasn’t as much of a priority as shaking down the existing system by adding some more annotations.  So last night I added 7 more annotations from Aiken and Coyle’s 2000 Tetragnatha survey.  This added variety in behavior, taxonomy and anatomy, and forced me to confront the lack of vocabulary for substrates.  In particular, there are annotations for prey handling and wrapping.  The paper doesn’t explicitly identify the prey involved, and assuming anything beyond arthropod (which isn’t even 100% certain) would be a guess, so it may be best to identify a term for prey from an ecological ontology (maybe something to pursue at the PCO workshop in Tucson I’ll be attending in two weeks).

I ran into a couple of minor problems, one of which appeared as a crash caused by a publication that had an empty string rather than a NULL in its database DOI field.  The more interesting fix was to add the code for pulling in parents and annotations for anatomy terms (I missed that in the first pass and just got lucky with the two anatomy terms I used).  The gap revealed itself when I loaded the OWL file into Protege and saw that ‘whole organism’ was no longer labeled (it appeared by its OBO identifier).  In retrospect this is a little strange, since ‘whole organism’ was one of the two anatomy terms used with the first two annotations.  In any case, adding the parent and annotation (e.g., rdfs:label) extraction seems to have resolved the problem.
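The empty-string DOI crash is the kind of thing a small normalization step can guard against.  What follows is a minimal Python sketch of the idea, not the actual owlbuilder code: the function names normalize_doi and doi_iri are hypothetical, and building the IRI by prefixing https://doi.org/ is an assumption made for illustration.

```python
from typing import Optional

# Assumed resolver prefix, for illustration only.
DOI_PREFIX = "https://doi.org/"


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Treat NULL, empty, and whitespace-only DOI fields as 'no DOI'."""
    if raw is None:
        return None
    cleaned = raw.strip()
    # An empty string is falsy, so it collapses to None here.
    return cleaned or None


def doi_iri(raw: Optional[str]) -> Optional[str]:
    """Return an IRI for the publication's DOI, or None if there is none."""
    doi = normalize_doi(raw)
    if doi is None:
        return None
    return DOI_PREFIX + doi
```

With this in place, a publication row with an empty DOI field is handled exactly like one with a NULL, so downstream code only has to check for None.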

Yesterday I got a query on this blog about my use of the ELK reasoner.  I paid a bit more attention to what it was reporting last night.  It is currently run over the merge of all the support ontologies (which constitutes the import closure of the ontologies that actually get used in annotation) and allows querying of the subsuming (parent) classes and, in the case of taxonomy, the subsumed (child) classes.  These queries determine what will be pulled into the target, which is represented as a separate OWL ontology.  This fairly large collection of 11 ontologies covers a range of expressiveness, from AL (what Protege calls the base attributive language) through to a number of ontologies that Protege reports as SIQ.  Apparently the latter are complex enough that ELK complains about, and then ignores, a number of axioms.  For the present purpose the reasoning is sufficient even if it may be incomplete.

What I haven’t done yet, and probably should, is run ELK across the generated target KB and see whether there are problems.  When I reviewed the output of OWLBuilder, I noticed that running FaCT++ within Protege did reduce the number of root concepts, so there is something to be gained.

In any case, there are now short ethogram listings for the genera Tetragnatha and Deinopis, as well as some new annotations for T. straminea.  I’m not sure yet whether to tackle improved display for the ethogram table, or try a behavior catalog (behavior hierarchy with taxon counts) or an anatomical catalog.

Also, the response time is starting to be noticeable (generally 1.5-2 sec for ethogram queries).  It may be that speed, rather than memory as I had expected, will bump me up to the next AWS tier, but I’m not ready to make that move yet.  Stay tuned.

 

 


Playing with ELK (the reasoner)

It’s been a long slog the past few weeks.  I’ve been pulling IRIs for taxa, anatomy, and substrates (entities that aren’t part_of an actor) that are associated with participants in a behavior, as well as IRIs for behavior and publication for assertions.  Yesterday I finally had all (or at least most) of the pieces together.  I was ready to start matching term IRIs against the support ontologies and copying OWL ‘entities’ from the support ontologies into the target ontology that will become the OWL file that’s loaded onto the server.

The first step was to merge all the support ontologies and run a classifier over them, primarily to determine the class hierarchy.  The merging went smoothly and didn’t take too long, but the 7436381 assertions that resulted from merging the 8 support ontologies seemed a bit too much for the OWLAPI’s structural reasoner.  After 45 minutes on a 4-core i7, I decided it was time to try something else.

ELK has been attracting some interest in the biological ontologies community in the past couple of years as a very fast way to do reasoning for ontologies that can stay within the limits of the OWL-EL language profile.  As it turns out, the current version of ELK implements only a subset of that profile, but that subset is more than sufficient for my very limited immediate needs.

The first task for the reasoner was simply to extract the superclass closure (all the classes above) of Arachnida in the NCBI taxonomy hierarchy, followed by all subclasses of the same taxon.  Those, along with Arachnida itself (which is a taxonomic class as well as an OWL class), are copied into the target ontology, along with the axioms specifying their super/sub class relations and their labels (= Linnean names).

It all works: owlbuilder is generating an OWL file that loads in Protege (after making sure the DOI cleanup was getting called in the right place) and contains a couple of other classes pertaining to an as yet incomplete representation of a posture in Tetragnatha straminea.  There is nothing special about this behavior or species (surprised the first species wasn’t a jumping spider?); the first publication that came up in the literature search all those years ago just happened to be about a couple of Tetragnatha species.  You’ll be hearing more about this behavior and a couple of other behaviors in this species and some congeners as I fill in the pieces and start pushing real data to the server.