taxonomy and curation

Over the weekend I spent some time on the taxa that are mentioned in curated papers but haven’t yet made their way into NCBI.  You can see a list on the taxonomy status page.  Although there was already a taxon table in the admin database, it was never populated after I decided to put all the terms (including taxonomy) in one common table, tagged with domain and authority codes.  So I added fields and infrastructure so that these names, their identifiers, and their authorities can be captured, and terms eventually generated from them.  They need to be maintained in a separate table so that the terms can be regenerated whenever the term table is reloaded from the set of support ontologies.  The merge remains to be implemented.  Happily, I have been able to resolve all the names so far using the World Spider Catalog, which seems to be authoritative.
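
As a rough illustration of the merge that remains to be implemented, here is a minimal sketch in Python. It is not the actual Arachnolingua code: the `Term` fields, the `rebuild_term_table` function, the domain and authority codes, and the rule of matching taxa by label are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One row of a common term table (field names are hypothetical)."""
    identifier: str  # e.g. an ontology IRI or a catalog URN
    label: str
    domain: str      # assumed domain code, e.g. "taxonomy" or "behavior"
    authority: str   # assumed authority code, e.g. "NCBI" or "WSC"


def rebuild_term_table(ontology_terms, supplemental_taxa):
    """Reload the term table from the ontologies, then merge in the
    separately maintained taxa that the ontologies do not cover."""
    table = {term.identifier: term for term in ontology_terms}
    known_taxon_labels = {
        term.label for term in table.values() if term.domain == "taxonomy"
    }
    for taxon in supplemental_taxa:
        # Assumption: a supplemental name is dropped once the primary
        # source supplies a taxon with the same label.
        if taxon.label in known_taxon_labels:
            continue
        table.setdefault(taxon.identifier, taxon)
    return table
```

Because the supplemental taxa live outside the term table, a function like this can be rerun after every reload without losing the hand-resolved names.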

The taxonomy status page started last year as a simple list of the taxa I couldn’t find in NCBI.  A little over a month ago, I cleaned it up and found URN identifiers for each of the missing names in the spider catalog.  In some cases these were synonyms.  On Sunday I discovered that I had missed a synonym (the name contained an alternative, unsupported spelling of the genus), which has now been cleaned up as well.

I also cleaned up a couple of issues that prevented some people from having full access to the site.

Meanwhile, I am waiting for answers to a few modeling questions that will hopefully allow me to display some text showing the original context of the behavior terms.  This should make the modeling and granularity issues I’m facing a little clearer.  Arachnolingua is fundamentally a database of usages, not simply terms, with the primary intent of supporting comparative analysis of narrative description and other ‘pre-character’ data relevant to behavior.  Part of this is allowing conflicting assertions to be highlighted, if not resolved.

Curation on a project like this involves growth in both breadth and depth.  Most of the work at the start will be with depth so that something can be said about anything.  Arachnolingua is still at that stage, but it is possible to start building the set of annotations, at least at the granularity of a first pass.  The total number of annotations in the KB is up to 15, and there are things, even in the existing annotations, that the system is not yet capable of displaying.

