EZProxy Logs
Our EZProxy server logs all of our offsite users’ accesses of licensed resources. The logs are in standard web-server format, so we can use standard tools like Analog to parse them and generate reports. There’s wonderful information in there that could support our collection development activities – but it’s locked up in cryptic URLs. We need a way to map those URLs back to the products we license. This is no easy task: the URLs represent the structures and functions of the various vendor sites, and we’re reduced to gross generalizations like assuming that an article download will have “.pdf” somewhere in the URL.
Who can help us with this? How about the makers of link resolvers and ERM systems? They’re the ones who are tracking the contents and characteristics of licensed resources. If they could generate and maintain signatures for URLs representing various kinds of access for various resources, they could sell us an EZProxy log analyzer that would give us much better information about our community’s usage of the resources we license.
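To make the idea of URL signatures concrete, here is a minimal sketch of what such a table and matcher might look like. Everything in it is an assumption for illustration: the product names, the example.com and example.org hosts, and the patterns are hypothetical, and real signatures would have to come from whoever tracks each vendor's site structure.

```python
import re
from typing import NamedTuple, Optional


class Signature(NamedTuple):
    product: str        # the licensed product the URL belongs to
    access_kind: str    # what the user did: download, abstract view, search...
    pattern: "re.Pattern[str]"


# Hypothetical signatures for made-up vendor hosts. A real table would be
# supplied and maintained by a link resolver or ERM vendor.
SIGNATURES = [
    Signature("Example Journals", "article_pdf",
              re.compile(r"^https?://journals\.example\.com/.*\.pdf(\?|$)")),
    Signature("Example Journals", "abstract",
              re.compile(r"^https?://journals\.example\.com/abstract/")),
    Signature("Example Index", "search",
              re.compile(r"^https?://index\.example\.org/search\?")),
]


def classify(url: str) -> Optional[Signature]:
    """Return the first signature whose pattern matches the URL, else None."""
    for sig in SIGNATURES:
        if sig.pattern.search(url):
            return sig
    return None
```

Calling `classify("http://journals.example.com/vol3/paper12.pdf")` returns the "article_pdf" signature for "Example Journals", while a URL that matches nothing returns `None` and can be counted separately as unclassified traffic.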
I did a lot of this kind of EZProxy log analysis at my previous job, where we built software to do it. One of the more interesting things you can garner from the files (via the referrer) is the path the user took to reach the resource: did they come via your catalog, your website (and which page on the website), your OpenURL resolver, or your federated search tool? You can check out the article I wrote on this: Coombs, K. A., "Lessons Learned From Analyzing Library Database Usage Data," Library Hi Tech 23(4), 598-609. Feel free to contact me if you have questions.
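The referrer analysis described in this comment could be sketched roughly as follows. This is not the software mentioned above. It assumes the log has been configured to record the referrer in the NCSA combined layout, and the hostnames in `ENTRY_POINTS` are hypothetical stand-ins for a library's own catalog, website, resolver, and federated search tool.

```python
import re
from urllib.parse import urlsplit

# NCSA combined-style line: host ident user [time] "request" status size "referrer" ...
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)"'
)

# Hypothetical hostnames; replace with the library's real entry points.
ENTRY_POINTS = {
    "catalog.example.edu": "catalog",
    "www.example.edu": "website",
    "resolver.example.edu": "openurl_resolver",
    "search.example.edu": "federated_search",
}


def entry_point(log_line: str) -> str:
    """Label the path a user took to a resource, based on the referrer host."""
    match = COMBINED.match(log_line)
    if match is None:
        return "unparsed"
    referrer = match.group("referrer")
    if referrer in ("", "-"):
        return "direct_or_unknown"
    host = urlsplit(referrer).hostname or ""
    return ENTRY_POINTS.get(host, "other")
```

Tallying the labels with `collections.Counter` over a whole log file would give a first picture of which entry points drive usage. To see which page of the website sent the user, the referrer's path could be kept alongside the label.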
On a very practical level: have a look at the work and software by Chris Keene in the UK (http://www.sussex.ac.uk/Users/cjk20/ezproxy/index.html). It very cleverly first analyses the so-called ezproxy.cfg config file, which has the basic URLs of e-journal publishers, to get a starting point for analyzing the EZProxy log. Otherwise: I like the idea of predictable "access signatures" for publishers to come up with. If they can come up with predictable URLs for article access to facilitate OpenURL resolving, why not something similar for usage analysis? It might take a reformulation of the standard weblog format, though: what happened to the work at Virginia Tech on a new XML standard for weblogs / usage logs in a digital library context? Repke de Vries, National Library of the Netherlands
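As a rough illustration of using ezproxy.cfg as a starting point, the sketch below builds a host-to-title table from the config and looks up a logged hostname in it. This is not Chris Keene's code. It assumes each resource stanza begins with a Title line followed by URL, Host, and Domain lines (or their short forms), it ignores any options on those lines, and the function names `load_stanzas` and `product_for` are made up for this example.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

HOST_DIRECTIVES = {"url", "u", "host", "h", "hostjavascript", "hj",
                   "domain", "d", "domainjavascript", "dj"}


def load_stanzas(cfg_text: str) -> Dict[str, str]:
    """Map every host or domain named in the config to its stanza title."""
    host_to_title: Dict[str, str] = {}
    title = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        directive, value = parts[0].lower(), parts[1].strip()
        if directive in ("title", "t"):
            title = value
        elif title is not None and directive in HOST_DIRECTIVES:
            if "://" in value:
                host = urlsplit(value).hostname
            else:
                host = value.split("/")[0].split(":")[0]
            if host:
                host_to_title[host.lower()] = title
    return host_to_title


def product_for(host: str, table: Dict[str, str]) -> Optional[str]:
    """Try the full hostname first, then each parent domain in turn."""
    labels = host.lower().split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in table:
            return table[candidate]
    return None
```

With the table in hand, each log entry's hostname can be attributed to a licensed product before any finer-grained signature matching is attempted.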