A bit of curation

Over the weekend I spent some time messing with publications and merging author strings and made some progress but decided it wasn’t as much of a priority as shaking down the existing system by adding some more annotations.  So last night I added 7 more annotations from Aiken and Coyle’s 2000 Tetragnatha survey.  This added variety in behavior, taxonomy and anatomy as well as forcing me to confront the lack of vocabulary for substrates.  In particular there are annotations for prey handling and wrapping.  Since the paper doesn’t explicitly identify the prey involved, assuming anything beyond arthropod (which isn’t even 100% certain), it may be best to identify a term for prey from an ecological ontology (maybe something to pursue at the PCO workshop in Tucson I’ll be attending in two weeks).

I ran into a couple of minor problems, one of which appeared as a crash caused by a publication that had an empty string rather than a NULL in its database DOI field.  The more interesting fix was to add the code from pulling in parents and annotations from anatomy terms (I missed that in the first pass and just got lucky with the two anatomy terms I used).  This revealed itself when I loaded the OWL file into Protege and to see that ‘whole organism’ was no longer labeled (it appeared by its OBO identifier).  In retrospect this is a little strange, since ‘whole organism’ was one of two anatomy terms that were used with the first two annotations, however, adding the parent and annotation (e.g., rdfs:label) extraction seems to have resolved the problem.

Yesterday I got a query on this blog about my use of the ELK reasoner.  I paid a bit more attention to what it was reporting last night.  It is currently run over the merge of all the support ontologies (which constitutes the import closure of the ontologies that actually get used in annotation) and allows querying of the subsuming (parent) classes and, in the case of taxonomy, the subsumed (child) classes.  These queries determine what will be pulled into the target, which is represented as a separate OWL ontology.  Now this fairly large collection of 11 ontologies cover a range of expressiveness from AL (what Protege calls the base attributive language) through to a number of ontologies that Protege reports as SIQ.  Apparently these are complex enough to cause ELK to complain, but ignore a number of axioms.  For the present purpose the reasoning is sufficient even if it may be incomplete.

What I haven’t done yet, and probably should, is run ELK across the generated target KB and see whether there are problems.  I noticed that running FACT++ within Protege did reduce the number of root concepts when I reviewed the output of OWLBuilder, so there is something to be gained.

In any case, there are now short ethogram listings for the genera Tetragnatha and Deinopis, as well as some new annotations for T. straminea.  I’m not sure whether to tackle improved display for the ethogram table or try a behavior catalog (behavior hierarchy with taxon counts) or a anatomical catalog yet.

Also, the response time is starting to be noticable (generally 1.5-2 sec for ethogram queries).  It may be that speed will bump me up to the next AWS tier, while I had expected memory to be the constraint, but I’m not ready to make that move yet.  Stay tuned.



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s