Ontologies managed (for now); Java takes me back to test-driven development

Although I was quiet last weekend, I was quite busy, switching over to local loading and copying of ontologies in arachadmin.  I discovered that, although NCBITaxon was the largest, GO and ChEBI were sizable ontologies as well.  I should be able to trim both of them down, especially on the arachadmin side, since most of their terms won’t be relevant to spider behavior, though that hasn’t happened yet.  After adding the four additional ontologies to the ontology_source table, I found that a full download and update takes about 15 minutes, which is slow, but since it should happen rarely, that’s manageable.

Then I switched over to owlbuilder and implemented ontology loading that uses the ontology_source table to locate files.  After committing this, I was surprised to receive a build-failure email from Travis CI.  I had forgotten that I had created an account linked to the owlbuilder GitHub repository; I did this out of curiosity when people discussed using Travis on a project call for my day job.  Since I hadn’t made any owlbuilder changes for a couple of months, both the mail and the failure were unexpected.

Setting up Travis had actually been very easy – I just added a simple description file to the root of the project repository, linked my GitHub account, and Travis could pull the files, discover that the build was Maven-based, and attempt to build and test.  Since the tests generally passed when I ran them locally (whenever possible, I leave/commit things only when the tests pass), I wondered for a moment why they weren’t working, then remembered that Travis didn’t have access to the MySQL database where everything, including the test database, resided.
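For a Maven project, the description file Travis looks for is a `.travis.yml` at the repository root. Something as minimal as this would do the job – a sketch of what such a file typically contains, not necessarily the exact file I used (the JDK line in particular is a guess):

```yaml
# Minimal .travis.yml for a Maven-built Java project;
# Travis detects the pom.xml and runs the Maven test phase by default.
language: java
jdk:
  - openjdk7
```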

So, then the question was whether there was a way to make the tests pass in a meaningful way in the absence of the test database.  I pretty quickly came up with a solution involving mock objects and abstractions for the database connection, as well as for the ResultSet objects that JDBC (the library connecting Java to SQL databases) uses to return the results of queries.  Briefly, since I had a DBConnection class that wrapped the JDBC connection object, I built an abstract interface supporting everything I had defined for my connection class, and then defined a mock object that answered to the same interface.  The static factory method on DBConnection returns a DBConnection when the attempt to connect succeeds and a MockConnection when it fails.  The MockConnection simply returns MockResults objects, which implement the same methods as DBResults, the very thin wrapper around JDBC’s ResultSet.  Not really that complicated, but (as is typical for Java) it meant a lot of updating of method signatures, as well as defining the methods in MockConnection that return MockResults.  Fortunately, MockResults are only used to fill ‘bean’ objects that represent each of the kinds of things stored as rows in the most important database tables.

I got a full set of test cases running against the mock objects by Tuesday, and I’m finishing up the updates to the test database so that it passes exactly the same test methods as the mock data (which I actually implemented more completely than the test database, which had been lying fallow for a few months now).  In some ways, the mock-data test methods have driven the database test methods, which will in turn affect the ‘real’ code used to build things.
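The shape of the pattern can be sketched in a few classes.  The names (DBConnection, MockConnection, DBResults, MockResults) come from the post, but every signature below is a guess for illustration – the real owlbuilder code surely differs:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// What DBResults and MockResults have in common.
interface AbstractResults {
    boolean next();
    String getString(String column);
}

// What DBConnection and MockConnection have in common.
interface AbstractConnection {
    AbstractResults query(String sql);
}

/** Very thin wrapper around a real JDBC ResultSet. */
class DBResults implements AbstractResults {
    private final ResultSet rs;
    DBResults(ResultSet rs) { this.rs = rs; }
    public boolean next() {
        try { return rs.next(); } catch (SQLException e) { throw new RuntimeException(e); }
    }
    public String getString(String column) {
        try { return rs.getString(column); } catch (SQLException e) { throw new RuntimeException(e); }
    }
}

/** Real connection: wraps the JDBC Connection object. */
class DBConnection implements AbstractConnection {
    private final Connection jdbc;
    DBConnection(Connection jdbc) { this.jdbc = jdbc; }
    public AbstractResults query(String sql) {
        try { return new DBResults(jdbc.createStatement().executeQuery(sql)); }
        catch (SQLException e) { throw new RuntimeException(e); }
    }
}

/** Canned rows, handed back when no database is available. */
class MockResults implements AbstractResults {
    private final List<String> columns;
    private final Iterator<String[]> rows;
    private String[] current;
    MockResults(List<String> columns, List<String[]> rows) {
        this.columns = columns;
        this.rows = rows.iterator();
    }
    public boolean next() {
        if (!rows.hasNext()) return false;
        current = rows.next();
        return true;
    }
    public String getString(String column) { return current[columns.indexOf(column)]; }
}

/** Mock connection: answers every query with the same canned MockResults. */
class MockConnection implements AbstractConnection {
    public AbstractResults query(String sql) {
        return new MockResults(Collections.singletonList("name"),
                Collections.singletonList(new String[]{"Araneus diadematus"}));
    }
}

public class ConnectionDemo {
    /** Static factory: real connection if the database answers, mock otherwise. */
    static AbstractConnection connect(String url, String user, String password) {
        try { return new DBConnection(DriverManager.getConnection(url, user, password)); }
        catch (SQLException e) { return new MockConnection(); }
    }

    public static void main(String[] args) {
        // No MySQL driver on the classpath here, so this falls back to the mock.
        AbstractConnection c = connect("jdbc:mysql://localhost/no_such_db", "u", "p");
        AbstractResults r = c.query("SELECT name FROM taxon");
        while (r.next()) {
            System.out.println(r.getString("name"));
        }
    }
}
```

The point of the shared interface is that the test methods – and the bean-filling code – never need to know which implementation they were handed.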

Although there is still a taxonomy table in the database, I’ve determined that there is no reason for owlbuilder to actually read it – its present purpose is simply to hold names that don’t currently exist in NCBI, which can then be merged in as terms with arachnolingua identifiers.  Besides the identifiers, the rows for these terms will also specify a different authority (I assume the World Spider Catalog).  Using authority fields like this will let me generate a taxonomy status page without resorting to treating term identifiers in a non-opaque manner (e.g., looking for the domain specified in a URI).  So I’ve taken the code for reading taxa out of the backend.  The code for loading terms by id is implemented, as are assertions and their associated participants.  There are separate methods for reading the primary participant (the spider, most of the time) and secondary participants (e.g., prey or substrate).
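The opaque-identifier idea is simple to illustrate: group terms by the stored authority column rather than parsing anything out of the identifier string.  The Term bean, its field names, and the example identifiers below are all invented for this sketch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical bean for a term row; not owlbuilder's actual class.
class Term {
    final String id;         // opaque identifier -- never inspected or parsed
    final String authority;  // e.g. "NCBI" or "World Spider Catalog"
    Term(String id, String authority) { this.id = id; this.authority = authority; }
}

public class TaxonomyStatus {
    /** Count terms per authority using the explicit column, not the URI's domain. */
    static Map<String, Long> countsByAuthority(List<Term> terms) {
        return terms.stream()
                .collect(Collectors.groupingBy(t -> t.authority, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Term> terms = Arrays.asList(
                new Term("urn:example:term:0001", "NCBI"),
                new Term("urn:example:term:0002", "NCBI"),
                new Term("urn:example:term:0003", "World Spider Catalog"));
        System.out.println(countsByAuthority(terms));
    }
}
```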

If it is eventually determined that BFO-compliant ontologies will not support the notion of process qualities, I assume that qualities of some sort will be attached to the primary participant (e.g., fast movement of legs vs. fast legs moving).  This is part of the reason I’m not rushing to add qualities here.  Another reason is that my focus has been more on the structure (topography) of behavior (e.g., the relations among its component pieces) than on qualities of the whole behavior.  This also gives PATO (the phenotypic quality ontology) some more time to mature.