Most of my work over the past couple of days has been on a publication status tool. Its page lists the publications and, for each one, the issues identified for curation.
The buttons on the left trigger a couple of update tools. The first auto-updates new derived fields that can be filled from existing fields. For example, the original spreadsheet contained a disposition field with free text like ‘Not found’ or ‘Downloaded’. These values are not sufficient for the type of curation and provenance I have in mind, so I’ve added a vocabulary of publication curation status types (the ‘vocabulary’ is currently just a table of ids and strings). The update tool tries to set the curation status from the contents of the disposition field.
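As a rough illustration of that first update step, here is a minimal Python sketch. The status ids and labels, the field names (`disposition`, `curation_status`) and both function names are hypothetical stand-ins, not the tool's actual schema; only the two disposition strings come from the spreadsheet.

```python
from typing import Optional

# Hypothetical vocabulary table: ids and labels are illustrative only.
CURATION_STATUS = {
    1: "not found",
    2: "downloaded",
}

# Hypothetical mapping from normalized disposition text to a status id.
DISPOSITION_TO_STATUS = {
    "not found": 1,
    "downloaded": 2,
}


def status_from_disposition(disposition: Optional[str]) -> Optional[int]:
    """Return a status id for a free-text disposition, or None if unrecognized."""
    if disposition is None:
        return None
    # Collapse whitespace and ignore case so 'Not  Found ' matches 'not found'.
    key = " ".join(disposition.split()).lower()
    return DISPOSITION_TO_STATUS.get(key)


def update_publications(pubs):
    """Fill in missing curation statuses in place; return a list of problems."""
    problems = []
    for pub in pubs:
        if pub.get("curation_status") is not None:
            continue  # already curated, leave untouched
        status = status_from_disposition(pub.get("disposition"))
        if status is None:
            problems.append(
                (pub["id"], f"unrecognized disposition: {pub.get('disposition')!r}")
            )
        else:
            pub["curation_status"] = status
    return problems
```

Anything the mapping cannot handle is returned as a problem rather than guessed at, which is the kind of entry the status page would then list for manual curation.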
The second button links to an as yet unimplemented tool for checking and updating DOI strings. The plan is to validate existing DOIs and, if possible, to query CrossRef to fill in DOIs I haven’t already found (either for pubs I missed or for cases where older publications were assigned DOIs post hoc). I’ve run into a couple of places where CrossRef listed multiple DOIs for what seems to be the same publication, a situation that I was able to resolve manually by checking the publisher’s site, but I’m not planning to deal with the problem of different DOIs across different registrars discussed by Rod Page.
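Since this tool isn't written yet, the following is only a sketch of what the checking half might look like in Python. The function names are hypothetical, and the regular expression is a loose syntactic check (a `10.` prefix, a numeric registrant code, a slash and a suffix), not proof that a DOI resolves. The lookup URL assumes CrossRef's public REST endpoint `https://api.crossref.org/works` with its `query.bibliographic` and `rows` parameters; the sketch only builds the URL and makes no network call.

```python
import re
from urllib.parse import urlencode

# Loose syntactic pattern: "10.", 4-9 digit registrant code, "/", non-space suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI gets wrapped when pasted into a spreadsheet.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip surrounding whitespace and a leading resolver URL or 'doi:' label."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi


def is_plausible_doi(raw: str) -> bool:
    """True if the string looks like a DOI; says nothing about whether it resolves."""
    return bool(DOI_PATTERN.match(normalize_doi(raw)))


def crossref_query_url(citation: str, rows: int = 5) -> str:
    """Build a CrossRef works query for a free-text citation (no request is made)."""
    params = urlencode({"query.bibliographic": citation, "rows": rows})
    return "https://api.crossref.org/works?" + params
```

Asking for several rows rather than one leaves room to notice the multiple-DOI cases mentioned above and send them to manual review instead of picking the first hit.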
The current status display is simply a list of problems, which for now can only include problems from processing the disposition field as described above. The colored empty lists are a temporary issue I’ll get back to shortly.
Meanwhile, there is also the start of support for pulling vocabulary (e.g., anatomy, taxonomy) from external ontologies. This will require some OWL parsing as well as parsing the NCBI taxonomy download files (which should be easier than parsing OWL). There is the start of database support, as well as a page for ontology sources (e.g., name, URL) and a vocabulary of ontology processing types (file formats). I expect this will follow the lead of the publication status page.
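To give a sense of why the NCBI files should be the easier half, here is a minimal Python sketch for reading `names.dmp` from the taxonomy dump. It relies on the dump's documented layout, where fields are separated by tab-pipe-tab, each line ends with tab-pipe, and the first four columns are taxon id, name, unique name and name class. The function names are hypothetical, and keeping only scientific names is just one possible choice.

```python
def parse_dmp_line(line: str) -> list:
    """Split one taxonomy dump line into its fields."""
    line = line.rstrip("\n")
    # Each record is terminated by a trailing tab-pipe; drop it before splitting.
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def scientific_names(lines) -> dict:
    """Map taxon id -> scientific name from an iterable of names.dmp lines."""
    names = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        tax_id, name_txt, _unique_name, name_class = parse_dmp_line(line)[:4]
        if name_class == "scientific name":
            names[int(tax_id)] = name_txt
    return names
```

Because `scientific_names` takes any iterable of lines, it can be fed an open file handle directly, for example `with open("names.dmp", encoding="utf-8") as fh: names = scientific_names(fh)`.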