Search « Quædam cuiusdam
Google and the Digital Border
Friday 1 February 2008 @ 3:39 pm

Here’s something I had been waiting for a chance to quantify. Dian Schaffhauser writes:

Type “sonoma” and “mission” into books.google.com and choose “Full view” to eliminate those books that haven’t granted permission to be fully displayed or that are still in copyright because they were published post-1923. About 550 titles show up…

Or try it from the frozen wastes north of the 49th parallel, and 166 titles show up. Google is so careful about copyright, it hurts.




Posting XML from Ant
Thursday 1 February 2007 @ 9:01 pm
Update 2011-09-16: a better solution now is the Missing Link http task, which provides a one-stop full-featured http implementation for Ant.

Erik Hatcher, Steve Loughran and Thorsten Scherler on the solr-user mailing list helped me to a method for posting batches of files into Solr from Ant. To start with, you need the ant-contrib package and the Jakarta commons-codec package; put them on Ant’s classpath, most conveniently in Ant’s lib directory.

Then, in your build.xml, declare the ant-contrib tasks:

<taskdef resource="net/sf/antcontrib/antlib.xml"/>
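If you would rather not copy jars into Ant’s lib directory, the taskdef can carry its own classpath instead. A minimal sketch, assuming the jars sit in a lib directory next to build.xml (that directory is my own invention for illustration):

```xml
<!-- Assumption: lib/ is a hypothetical directory holding the ant-contrib
     and commons-codec jars, plus anything else the tasks need -->
<taskdef resource="net/sf/antcontrib/antlib.xml">
  <classpath>
    <fileset dir="lib" includes="*.jar"/>
  </classpath>
</taskdef>
```

Either form should make the same tasks available to the rest of the build file.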

Now you’re ready to post some files. We use the foreach task to iterate through a bunch of files, and the postMethod task to send them up.

<target name="update" description="update all solr files">
  <foreach target="dopost" param="filename">
    <path>
      <fileset dir="path/to/records">
        <include name="*.xml"/>
      </fileset>
    </path>
  </foreach>
</target>
<target name="dopost" description="do post">
  <postMethod url="http://localhost:8983/solr/update">
    <!-- note: filename contains absolute path -->
    <file path="${filename}" contentType="text/xml;charset=utf-8"/>
  </postMethod>
</target>

Run ant update to send all the files up to Solr (or to another REST-oriented service). The update target calls the dopost target once per file, passing the file’s absolute path in the filename parameter.
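ant-contrib also has a for task that runs its body inline, so the loop can live in a single target. A rough sketch of the same loop, assuming the taskdef above and the same path/to/records directory; the target name update-inline is hypothetical, and I have not checked this against every ant-contrib release:

```xml
<target name="update-inline" description="update all solr files in one target">
  <!-- update-inline is a hypothetical name; @{filename} is the loop
       parameter, which holds the absolute path of each matched file -->
  <for param="filename">
    <path>
      <fileset dir="path/to/records">
        <include name="*.xml"/>
      </fileset>
    </path>
    <sequential>
      <postMethod url="http://localhost:8983/solr/update">
        <file path="@{filename}" contentType="text/xml;charset=utf-8"/>
      </postMethod>
    </sequential>
  </for>
</target>
```

Note the @{filename} syntax inside the loop body, where the foreach version uses ${filename}.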

Why not set up targets to commit and optimize as well:

<target name="commit" description="commit">
  <postMethod url="http://localhost:8983/solr/update">
    <!-- angle brackets must be escaped inside an attribute value -->
    <text value="&lt;commit/&gt;"/>
  </postMethod>
</target>
<target name="optimize" description="optimize">
  <postMethod url="http://localhost:8983/solr/update">
    <text value="&lt;optimize/&gt;"/>
  </postMethod>
</target>
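With those in place, one more target can chain the whole cycle. A minimal sketch, assuming the update, commit and optimize targets above; the name reindex is my own choice for illustration:

```xml
<!-- "reindex" is a hypothetical target name; Ant runs the targets listed
     in depends from left to right before this (empty) target -->
<target name="reindex" depends="update,commit,optimize"
        description="post all files, then commit and optimize"/>
```

Then ant reindex posts every file, commits, and optimizes in one go.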

This will come in very handy in my Ant-based approach to large XSLT projects.

Posted in Search, XML



Lucene/Solr-based OPAC replacements
Tuesday 2 January 2007 @ 9:44 pm

You can’t throw an exception these days without hitting a developer working on a Lucene-based OPAC. Casey Durfee’s doing it, and will be presenting it at the Code4Lib conference under the provocative title “Open-Source Endeca in 250 Lines or Less”. Art Rhyno and Ross Singer have been doing it for a long time. Lucene guru Erik Hatcher is developing Solr Flare, a Ruby/Solr DSL (domain-specific language) in his spare time over the next couple months, partly “to demonstrate a faceted browsing front-end on our library’s holdings (~3.7M records)”, and he too will present at Code4Lib. SirsiDynix is using Lucene in Horizon 8.0 (click the Horizon 8.0 tab). And yes, Bess called it last May, in the posting that introduced me to Solr. But would she have twigged if she hadn’t already been working on Lucene, at my suggestion?

This is the year of Lucene in library-technology-land. By the time another Ghost of Christmas Present comes around, these pioneers will have taught us all a lot about such things as pouring metadata from our ILS’s into Lucene indexes, keeping them in sync, managing faceted browsing based on dynamic data such as circ status, and most important, whether Lucene and Solr scale to the levels we need. 2007 is going to be a very educational year. The Lucene/Solr Preconference attached to Code4Lib will be historic.

Posted in Search


