doi’s solved (at least for now)

Yesterday’s post discussed some difficulties with processing doi’s and reviewing the generated OWL file in Protege.  It looks like I’ve resolved the problem for now.  The trick is first to declare the doi prefix (mapping to “http://dx.doi.org/”) on the ontology’s OWL format object:

// Register "doi" as a prefix on the ontology's (prefix-aware) format
OWLOntologyFormat format = manager.getOntologyFormat(ontology);
format.asPrefixOWLOntologyFormat().setPrefix("doi", "http://dx.doi.org/");

followed by generating the IRIs for the publications using the two-argument version of IRI.create(), specifying the prefix as the “http” string (not “doi”).

Note that Protege displays URIs/IRIs without prefixes, which means the doi’s will display (unless overridden) as, for example: 10.1636%2F0161-8202%282000%29028%5B0097%3AHDLHAB%5D2.0.CO%3B2
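To make the escaping concrete, here is a minimal sketch of how a raw doi could be turned into the escaped form shown above. It uses only the Java standard library; the class and method names (DoiIriSketch, escapeDoi, publicationIri) are hypothetical and not taken from owlbuilder, and using URLEncoder is an assumption about one way to get this escaping, not necessarily what owlbuilder does. With the OWLAPI, the namespace and the escaped doi would be the two arguments passed to IRI.create().

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of owlbuilder): shows how a raw doi maps to
// the escaped local name that Protege ends up displaying.
final class DoiIriSketch {

    // The "http" string used as the prefix argument, as described in the post.
    static final String DOI_NAMESPACE = "http://dx.doi.org/";

    // Percent-encode the characters in a doi that are awkward inside an IRI.
    // Assumption: URLEncoder's form-style encoding is acceptable here; note it
    // would turn a space into '+', which matters only if a doi contains spaces.
    static String escapeDoi(String rawDoi) {
        try {
            return URLEncoder.encode(rawDoi, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Full IRI string for a publication: namespace followed by the escaped doi.
    static String publicationIri(String rawDoi) {
        return DOI_NAMESPACE + escapeDoi(rawDoi);
    }

    public static void main(String[] args) {
        String rawDoi = "10.1636/0161-8202(2000)028[0097:HDLHAB]2.0.CO;2";
        System.out.println(escapeDoi(rawDoi));
        System.out.println(publicationIri(rawDoi));
    }
}
```

Keeping the namespace and the escaped doi as separate strings mirrors the two-argument call, so the same pieces can be reused if the label is later generated elsewhere.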

Not the prettiest, and I’ll probably override these with an rdfs:label once I decide on something (and probably generate it on the arachadmin side to be read from the database).

 


Trouble with doi’s

My first attempt with Arachnolingua was building the knowledge base directly in OWL using Protege.  Protege is a very good ontology editor, and because the OWL support in Protege 4+ is based on the OWLAPI, which I have used and continue to use (one of my main reasons for continuing to use Java), it should play well with owlbuilder, which uses the OWLAPI to store and generate the ontology files.  Unfortunately Protege isn’t very good with doi’s.  It seemed to do better with http: style doi’s (e.g. http://dx.doi.org/10… vs. doi:10…), at least for naming the individuals for each publication (and declaring them to be publications).  However, when I attempted to refer to them in an assertion that a statement about behavior is part_of a containing publication, Protege complained on loading that the URI had no recognizable prefix.  Defining doi: as a prefix didn’t seem to resolve the complaint, so I went back and generated doi: prefixed URIs.  Protege seemed to accept this, but its display of the individuals as <doi:10…> suggests it isn’t entirely happy with them.  I’m not sure if I have everything correctly (un)escaped in these new doi’s either.  Not the highest priority in the big picture, but I’ll try to resolve it before adding more logic (taxonomy will be next).

Generating OWL

After I posted yesterday I wound up implementing some new unit tests and allowing assertions to be identified with generated ids.  Tonight I extended this by adding the first non-trivial OWL expressions I’ve generated so far.  Now each assertion (statement about behavior) is asserted in OWL to be an instance of a text expression that denotes some behavioral process.  Technically, this is the intersection of textual entity (IAO:0000300) with things that denote (IAO:0000219) some behavior process (NBO:0000313).  There are some kluges in what I’ve implemented, and these will need to be cleaned up to be able to talk about textual entities that talk about particular behaviors which have particular taxa as participants.  But this is where the OWL really begins.

A couple of new papers

I haven’t been as active in adding papers recently; I collected a fair number during my time at NESCent, and I’ve been more focused on getting something to work and starting to work through the curation backlog.  Nevertheless, I added a couple of papers on spontaneous male death in different groups of spiders today (Schwartz et al. 2013 and Foellmer and Fairbairn 2003).  At least in the fishing spiders studied by Schwartz, the situation is more a case of self-incapacitation, which facilitates post-mating cannibalism, than of immediate death as a result of copulation.  I went ahead and pushed these to the Mendeley group, apparently the first update to the group in over a year.  I need to do something about that.

On the software side, I updated the database query methods so the existing unit tests will work (mostly related to the schema change from term_usage to assertion).

Unit testing for Owlbuilder

Since I’m using maven to build owlbuilder, unit testing comes pretty easily.  Of course, since there’s a database involved, it would be nice to have a separate database so testing could write junk without messing up the real thing.  I should look into doing this for arachadmin too.  This will require finding out how a web2py project can use multiple databases or have testing use something else.

A Beginning

This really isn’t the beginning.  Arachnolingua came out of a discussion at the February 2011 Phenotype RCN which led to me proposing to collect a set of terms describing spider behavior so that existing behavior ontologies (at that time only the ABO core; NBO was not initiated until later that spring).  Since then, I constructed a (non-functional) front page and built an OWL file with Protege that used Ontofox to mireot terms out of several ontologies (especially NCBItaxon, Spider anatomy, IAO, as well as NBO as it developed).  I presented a lightning talk about this first stage at the 2012 iEvoBio meeting in Ottawa.

The slides for that talk made their way on to slideshare, and were subsequently cited in an overview chapter by Melissa Haendel and Elissa Chesler.  If you go to the site front page today (June 2013), little has changed: just a few spelling corrections and a new page for status queries on the backend triple store.

Of course there was no backend triple store last year.  All the semantic content on the site consisted of a collection of owl files, the most important of which was a first pass at representing behavior descriptions in OWL using vocabulary primarily from the OBO Information Artifact Ontology.  There were also 6 import files that I generated using the Mireot capabilities of the OntoFox tool.  Unfortunately, due to DNS issues, you couldn’t even load the complete ontology into Protege from the site (that has been subsequently fixed with the addition of the arachb.org alternate domain).

Meanwhile, I’ve also been collecting target papers mentioning spider behavior (currently about 500).  Some of these are complete ethograms; some are on other topics and contain brief mentions of a spider behavior pattern in passing.  Although the former are more immediately interesting, including the others is potentially more valuable just because of their obscurity.

The list (originally a spreadsheet) is now mostly moved to a MySQL database, which isn’t visible from the site, but is used to generate the ‘new’ version of the semantics.  Some of the papers in the collection have made their way on to the Mendeley Spider Behavior group (after some vetting).  I just noticed that I am rather behind on this push process.

Meanwhile the real curation work consists of extracting assertions about spider behavior (more about those later) and putting those in the database.  These are coming more slowly and the design of the ‘assertion’ table is, unsurprisingly, driven by what the generated OWL will look like.

After an abortive attempt to build a Java tool to maintain the database (I’m rather tired of Swing and building new Java desktop applications at this point), I was introduced to web2py as part of my new assignment with phylografter within the OpenTree of Life project.  I’ve built a tool called arachadmin for entering and editing the database.  I am using Java for the generation part, primarily because the OWLAPI and associated tools still seem to be the best option for this sort of thing.

That’s enough for now.  Next post I’ll describe what is happening behind the scenes in the site.