SIICS

Now that several Solr-based OPAC projects have been showcased at Code4Lib (at both the conference and the preconference), it’s a good time to think about the way forward for Solr. We’ve seen that Solr’s search functionality lets us build the kinds of services we want, and that it scales to the kinds of loads we’re likely to throw at it. Questions remain about how to handle facets with large numbers of values, and about the best strategy for handling dynamic metadata (like circulation status or user tags) alongside the relatively static bibliographic metadata. The Solr development community is working on these, though, and we can expect solutions to be found.

We face a class of problems that I propose to call Scalable Indexing of ILS Content for Solr, or SIICS (pronounced like “six”, as a tribute to Nines, the pioneering Solr-based digital library application built with Erik Hatcher’s Collex).

* Scalable: we need to build and maintain indexes of millions of records
* Indexing: this is just the indexing side of the process; the search interface side is well in hand and doesn’t present major technical challenges at this point
* ILS Content: not just MARC records but locations, holdings and circulation status, all residing in systems that are more or less open and more or less standards-based, depending on the vendor
* for Solr: cuz that’s what we’re talking about

To exploit Solr in the OPAC space, we need to be able to access our ILS data in a way that is timely, that doesn’t disrupt the workings of the ILS, and that doesn’t lose data we need for indexing along the way. What counts as timely will vary from library to library, but we need to be able to get big lumps of data into the index with a minimum of delay. With dynamic data like circ status or tagging, the ideal is instantaneous updates: if you sign a book out or assign a tag, you should see the change immediately in the interface.

To make this happen, libraries are going to need a wide range of solutions, depending on their vendors, licensing arrangements, non-disclosure agreements, systems architecture, and so on. This limits how far we can collaborate on these problems. Where we can collaborate, however, we should; these are going to be hard problems.

If Solr fails in the OPAC space, it will be because it was deep-SIICSed.