Exploring Alternatives

One of the nice parts about the switch to Pyramid is that it supports a range of alternatives.  For example, at the moment I am trying out forms in WTForms after converting the templates to Jinja2.  In some ways the arachcurator tool has become a hybrid of technologies from Pylons and from Flask.  I guess this means that if Pylons disappears tomorrow, I won’t be starting from scratch.  It was a gut-level instinct call that led me to prefer Pyramid over Flask in the first place.

Meanwhile, all the browsing code is built, tested, and now back in a state of flux as I shift forms technology.  There are d3.js graphs for both individuals and their parts, as well as expanded display trees on the claim page.  I have added users (mostly for change tracking) and three levels of authorization, with the hope of putting an arachcurator server up on AWS at some point (sharing more than code, and demonstrating replicability, I hope).  I even went so far as to add password hashing with passlib.
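Passlib handles the details for you; purely as a sketch of the underlying idea, here is roughly what salted password hashing looks like using only the standard library (PBKDF2 via hashlib — the function names and iteration count are my own, not arachcurator's):

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100_000):
    """Derive a salted PBKDF2 hash; store both the salt and the digest."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest

def verify_password(password, salt, digest, iterations=100_000):
    """Recompute the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

A library like passlib adds the important production details on top of this (scheme identifiers, cost upgrades, multiple hash formats), which is why it is worth using rather than rolling your own.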

Meanwhile, as discussed here, I’ve started building a T-box vocabulary for spider behavior.  This will extend what can be said in annotations, and the changes here should only simplify that work.

Adding some status tools

Most of my work over the past couple of days has been on a publication status tool.  This page lists the publications and, for each, the issues identified for curation.


The buttons on the left trigger a couple of update tools.  The first auto-populates new, derived fields that can be filled from existing ones.  For example, the original spreadsheet contained a disposition field with free text like ‘Not found’ or ‘Downloaded’.  These values are not sufficient for the kind of curation and provenance I have in mind, so I’ve added a vocabulary of publication curation status types (the ‘vocabulary’ is currently just a table of ids and strings).  The update tool tries to set the curation status from the contents of the disposition field.
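A minimal sketch of that mapping, with hypothetical status ids and disposition strings (the real vocabulary table will differ):

```python
# Hypothetical status ids; the real vocabulary table differs.
STATUS_VOCAB = {
    "downloaded": 1,   # publication in hand, awaiting curation
    "not_found": 2,    # publication could not be located
    "unknown": 0,      # disposition text not recognized
}

# Free-text disposition values seen in the spreadsheet, normalized.
DISPOSITION_MAP = {
    "downloaded": "downloaded",
    "not found": "not_found",
}

def curation_status_from_disposition(disposition):
    """Infer a curation status id from the free-text disposition field."""
    key = DISPOSITION_MAP.get(disposition.strip().lower(), "unknown")
    return STATUS_VOCAB[key]
```

Anything the map doesn't recognize falls through to an explicit ‘unknown’ status, so unmapped dispositions surface as curation issues rather than being silently dropped.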

The second button links to an as-yet-unimplemented tool for checking and updating DOI strings.  The plan is to validate existing DOIs and, if possible, to query CrossRef to fill in DOIs I haven’t already found (either for publications I missed or for cases where older publications were assigned DOIs post hoc).  I’ve run into a couple of places where CrossRef listed multiple DOIs for what seems to be the same publication, a situation I was able to resolve manually by checking the publisher’s site, but I’m not planning to deal with the problem of different DOIs across different registrars discussed by Rod Page.
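Syntactic validation is the easy half.  A rough sketch, with an approximate pattern (a real check would also try to resolve the DOI or query CrossRef, which isn't shown):

```python
import re

# Approximate DOI shape: "10.", a registrant code, "/", then a suffix.
# This is only a syntactic sanity check, not proof the DOI resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(candidate):
    """Strip common URL/prefix forms, then test the basic DOI shape."""
    lowered = candidate.lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if lowered.startswith(prefix):
            candidate = candidate[len(prefix):]
            break
    return bool(DOI_PATTERN.match(candidate))
```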

The current status display is simply a list of problems, which for now can only include problems with processing the disposition field as described above.  The colored empty lists are a temporary issue I’ll get back to shortly.

Meanwhile, there is also the start of support for pulling vocabulary (e.g., anatomy, taxonomy) from external ontologies.  This will require some OWL parsing, as well as parsing the NCBI taxonomy download files (which should be easier than parsing OWL).  There is the start of database support, as well as a page for ontology sources (e.g., name, URL) and a vocabulary of ontology processing types (file formats).  I expect this will follow the lead of the publication status page.
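On the taxonomy side, the NCBI download files (e.g., names.dmp) use a simple delimited format: fields separated by "\t|\t", with each line ending in "\t|".  A sketch of a parser for that format (the helper names are my own):

```python
# Sketch of parsing NCBI's names.dmp; fields are separated by "\t|\t"
# and each line ends with "\t|".
def parse_names_line(line):
    """Split one names.dmp record into (tax_id, name, unique name, name class)."""
    fields = line.rstrip("\n").rstrip("\t|").split("\t|\t")
    tax_id, name_txt, unique_name, name_class = fields
    return int(tax_id), name_txt, unique_name, name_class

def scientific_names(lines):
    """Yield (tax_id, name) pairs for records flagged as scientific names."""
    for line in lines:
        tax_id, name_txt, _, name_class = parse_names_line(line)
        if name_class == "scientific name":
            yield tax_id, name_txt
```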

Generating OWL

After I posted yesterday I wound up implementing some new unit tests and allowing assertions to be identified with generated ids.  Tonight I extended this by adding the first non-trivial OWL expressions I’ve generated so far.  Now each assertion (statement about behavior) is asserted in OWL to be an instance of a text expression that denotes some behavioral process.  Technically, the intersection of textual entity (IAO:0000300) with things that denote (IAO:0000219) some behavior process (NBO:0000313).  There are some kluges in what I’ve implemented, and these will need to be cleaned up before I can talk about textual entities that describe particular behaviors with particular taxa as participants.  But this is where the OWL really begins.
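The real generation happens in Java with the OWLAPI; purely to illustrate the class expression, here is a toy sketch that emits it in OWL functional syntax as plain strings (the individual IRI is a made-up example):

```python
# Toy sketch only: the real generator is Java + OWLAPI.  These are the
# OBO PURLs for the three terms named in the text.
IAO_TEXTUAL_ENTITY = "<http://purl.obolibrary.org/obo/IAO_0000300>"
IAO_DENOTES = "<http://purl.obolibrary.org/obo/IAO_0000219>"
NBO_BEHAVIOR_PROCESS = "<http://purl.obolibrary.org/obo/NBO_0000313>"

def assertion_type_expression():
    """Intersection of 'textual entity' and 'denotes some behavior process'."""
    return ("ObjectIntersectionOf(%s ObjectSomeValuesFrom(%s %s))"
            % (IAO_TEXTUAL_ENTITY, IAO_DENOTES, NBO_BEHAVIOR_PROCESS))

def class_assertion(individual_iri):
    """Type an assertion individual (identified by a generated IRI)."""
    return "ClassAssertion(%s <%s>)" % (assertion_type_expression(), individual_iri)
```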

A couple of new papers

I haven’t been as active in adding papers recently – I collected a fair number during my time at NESCent, and I’ve been more focused on getting something to work and starting to chip away at the curation backlog.  Nevertheless, I added a couple of papers today on spontaneous male death in different groups of spiders (Schwartz et al. 2013 and Foellmer and Fairbairn 2003).  At least in the fishing spiders studied by Schwartz, the situation is less immediate death as a result of copulation than self-incapacitation that facilitates post-mating cannibalism.  I went ahead and pushed these to the Mendeley group – apparently the first update to the group in over a year.  I need to do something about that.

On the software side, I updated the database query methods so the existing unit tests will work (mostly related to the schema change from term_usage to assertion).

Unit testing for Owlbuilder

Since I’m using Maven to build owlbuilder, unit testing comes pretty easily.  Of course, since there’s a database involved, it would be nice to have a separate database so testing could write junk without messing up the real thing.  I should look into doing this for arachadmin too.  This will require finding out how a web2py project can use multiple databases, or have testing use something else entirely.
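One common pattern is to pick the database URI from an environment variable, so the tests hit a scratch database.  A hypothetical sketch (the variable name and URIs are assumptions; web2py's DAL would consume whichever URI is returned):

```python
import os

# Hypothetical URIs; web2py-style connection strings.  A throwaway
# sqlite file keeps test junk out of the real MySQL database.
DEFAULT_DB = "mysql://arachadmin@localhost/arachadmin"
TEST_DB = "sqlite://test_arachadmin.db"

def database_uri(env=None):
    """Return the test URI when ARACHADMIN_TESTING is set, else the real one."""
    if env is None:
        env = os.environ
    if env.get("ARACHADMIN_TESTING"):
        return TEST_DB
    return DEFAULT_DB
```

Passing the environment mapping as a parameter keeps the selection logic itself testable without touching the process environment.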

A Beginning

This really isn’t the beginning.  Arachnolingua came out of a discussion at the February 2011 Phenotype RCN, which led to my proposing to collect a set of terms describing spider behavior that would extend the existing behavior ontologies (at that time only the ABO core; NBO was not initiated until later that spring).  Since then, I constructed a (non-functional) front page and built an OWL file with Protege that used OntoFox to MIREOT terms out of several ontologies (especially NCBITaxon, spider anatomy, IAO, and NBO as it developed).  I presented a lightning talk about this first stage at the 2012 iEvoBio meeting in Ottawa.

The slides for that talk made their way onto SlideShare, and were subsequently cited in an overview chapter by Melissa Haendel and Elissa Chesler.  If you go to the site front page today (June 2013), little has changed: just a few spelling corrections and a new page for status queries on the backend triple store.

Of course, there was no backend triple store last year.  All the semantic content on the site consisted of a collection of OWL files, the most important of which was a first pass at representing behavior descriptions in OWL using vocabulary primarily from the OBO Information Artifact Ontology.  There were also six import files that I generated using the MIREOT capabilities of the OntoFox tool.  Unfortunately, due to DNS issues, you couldn’t even load the complete ontology into Protege from the site (subsequently fixed with the addition of the arachb.org alternate domain).

Meanwhile, I’ve also been collecting target papers mentioning spider behavior (currently about 500).  Some of these are complete ethograms; others are on different topics and mention a spider behavior pattern only in passing.  Although the former are more immediately interesting, the latter are potentially more valuable precisely because of their obscurity.

The list (originally a spreadsheet) has now mostly moved to a MySQL database, which isn’t visible from the site but is used to generate the ‘new’ version of the semantics.  Some of the papers in the collection have made their way onto the Mendeley Spider Behavior group (after some vetting).  I just noticed that I am rather behind on this push process.

Meanwhile the real curation work consists of extracting assertions about spider behavior (more about those later) and putting those in the database.  These are coming more slowly and the design of the ‘assertion’ table is, unsurprisingly, driven by what the generated OWL will look like.

After an abortive attempt to build a Java tool to maintain the database (I’m rather tired of Swing and of building new Java desktop applications at this point), I was introduced to web2py as part of my new assignment with phylografter within the Open Tree of Life project.  I’ve built a tool called arachadmin for entering and editing the database.  I am using Java for the generation side, primarily because the OWLAPI and associated tools still seem to be the best option for this sort of thing.

That’s enough for now.  Next post I’ll describe what is happening behind the scenes in the site.