Tackling OBO imports

Descriptions need vocabulary. Since I am using vocabulary from OBO ontologies, I need to make their terms available for constructing assertions.  Making them available means reading them from somewhere, finding the terms, definitions, etc., and either displaying them directly or reading them from a cache that needs to be updated from time to time.  The NCBI Taxonomy is updated daily, and although most of its daily updates will not include arachnids, it still makes sense to be able to force updates of the taxonomy as well as other ontologies, either automatically at each startup (as done by Jim Balhoff’s Phenex, an excellent tool that is not quite right for present purposes) or on command, trusting the curator to run the updates from time to time.

For parsing the RDF/XML that remains the default format for the OBO Foundry’s rendering of OWL files, I’m using the SAX-like iterparse facility of the lxml Python library.  The choice of lxml comes again from my day job, though most NeXML files are, unlike NCBITaxon.owl, small enough to be processed with a DOM parser without blowing out memory.  I’ve tested the parsing and, at least for now, the building of a list of classes with both a relatively small ontology (Evidence codes) and the OWL serialization of the NCBI Taxonomy.  So far, good enough.
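
To make the iterparse approach concrete, here is a minimal sketch of streaming named classes out of an RDF/XML file. It is not the code in ontology_tools: the function name iter_classes, the example.org IRIs, and the sample document are made up for illustration, and the fallback to the standard library's ElementTree is my assumption so the sketch still runs without lxml installed.

```python
import io

try:
    from lxml import etree  # the library named in the post
except ImportError:
    # Assumption for this sketch only: the stdlib iterparse is close enough.
    import xml.etree.ElementTree as etree

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def iter_classes(source):
    """Yield (iri, label) for each named owl:Class in an RDF/XML stream."""
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag != f"{{{OWL}}}Class":
            continue
        iri = elem.get(f"{{{RDF}}}about")
        if iri is not None:  # anonymous class expressions have no rdf:about
            yield iri, elem.findtext(f"{{{RDFS}}}label")
        elem.clear()  # drop the children we no longer need


# Hypothetical two-class ontology, just large enough to exercise the loop.
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/EX_0000001">
    <rdfs:label>first term</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/EX_0000002">
    <rdfs:label>second term</rdfs:label>
  </owl:Class>
</rdf:RDF>"""

classes = list(iter_classes(io.BytesIO(SAMPLE)))
```

Clearing each class element after it has been read is what keeps memory use down on a file the size of NCBITaxon.owl. The emptied elements still hang off the root, so a very large file may need more aggressive pruning than this sketch does.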

I also made the ontology status page available via a menu item.  The parser (in a module called ontology_tools) is not yet hooked up to the status page; I want to turn the parsing result into something more useful first (probably a db table).  I also added some hooks for linking publication citations on the publication status page back to the editing page for that publication.
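
As a rough idea of what the db table might look like, here is a sketch using SQLite from the Python standard library. Everything in it is an assumption on my part: the table name ontology_term, its columns, the store_terms helper, and the choice of SQLite itself are all hypothetical, and the terms are made-up examples.

```python
import sqlite3


def store_terms(conn, ontology, terms):
    """Insert or refresh (iri, label) pairs for one ontology (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ontology_term ("
        " iri TEXT PRIMARY KEY,"
        " ontology TEXT NOT NULL,"
        " label TEXT)"
    )
    # INSERT OR REPLACE lets a forced update overwrite stale labels.
    conn.executemany(
        "INSERT OR REPLACE INTO ontology_term (iri, ontology, label)"
        " VALUES (?, ?, ?)",
        [(iri, ontology, label) for iri, label in terms],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_terms(conn, "example", [("http://example.org/EX_0000001", "first term")])
# A later update with a changed label replaces the cached row.
store_terms(conn, "example", [("http://example.org/EX_0000001", "renamed term")])
rows = conn.execute("SELECT iri, label FROM ontology_term").fetchall()
```

Keying the table on the term IRI means a re-parse of an updated ontology overwrites the cached rows instead of duplicating them, which fits the forced-update scenario described above.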



