Libraries « Quædam cuiusdam
A Walk with Love and Data
Wednesday 26 October 2011 @ 9:22 pm

Last week I attended the annual Access conference, this time in Vancouver with the theme “The Library is Open”. It’s always an overstimulating conference, but this year I was made drunk with the confluence of fresh thoughts about things I care about more and more, the intersections of the presentations with things I’ve been thinking about in other contexts, and the engaging personalities. And there was beer. For a politically naive introvert, it was too much too quick. This posting is a preliminary and personal attempt to sort it all out.

Tweet: @calvinmah On my way to #access2011 hotel to escort people to #hackfest

It started with Hackfest, where the geeks get together to spend a day coding up some interesting project from the list of submitted ideas, or from their own fertile and over-caffeinated brains. I had put in an idea for a personal digital archive, called “Spalatum” after the Roman fortress/palace that formed the matrix for the medieval city of Split in Croatia. When the Roman architects moved out and stopped maintaining the place, the locals moved in and kept the town going in their own way, subdividing the Roman buildings or putting up small houses against the old walls. Diocletian’s mausoleum became the cathedral. In the same way, I want a digital archive that can continue to be used and preserved by non-techie heirs after the geek who built it is gone. My idea wasn’t taken up by any of the groups (but I mean to get back to it). I worked on a Linked Open Data project, playing with the Silk Workbench application to enhance RDF data by matching fields with an external SPARQL-enabled source like dbpedia. It was fun and we learned a lot about Silk but didn’t get very far with our data.

That evening a bunch of the hackers went out to dinner at a Japanese/Korean place, then headed off to find some good beer. On the way we picked up Bess Sadler, who had (typically) found a medieval reenactment operation nearby and spent the last hour playing with swords.

Bess (photo: BigD)

Bess is a talented developer, and surrenders to laughter more completely than anyone I know: always a welcome addition. We ended up at the Steamworks Brewery and I talked with Bess and the dangerous-looking Nick Ruest, black locks spreading in all directions, gorgeously tattooed up one arm. We talked about working conditions at our various institutions. The passion that keeps us interested in the stuff we do isn’t always nourished in the workplace, unfortunately.

Ended up in the hospitality suite to taste a selection of craft beers, as prescribed by the magisterial man-mountain of San Diego.

@bohyunkim #access2011 keynote is excellent in describing where libraries are standing now - on the border b/w commons & increasing privatization

The conference proper opened the next morning with the keynote by Jon Beasley-Murray of UBC. He started with Borges’ vision of the infinite library and the fertility and sufficiency of the library as a public good. He developed this into a call to the barricades to defend the commons against corporate encroachment, holding that the domination of formerly open online space by closed systems like Apple’s and Facebook’s is precisely parallel to the early modern enclosures like the Highland clearances. (The displaced Highlanders came to Canada: what new lands will the refugees from Facebook colonize? Diaspora may be aptly named indeed.) He put it in Marxist terms: what the corporations are doing is “primitive accumulation”, which he preferred to call “accumulation through dispossession” (to avoid the implication that it only happened in the distant past). Their wealth derives directly from the resources which they have seized from the public domain for their own use. A roomful of librarians was on his side the whole way: information wants to be free, after all. He encouraged us to engage in “massive projects of common productivity” such as Wikipedia. I wondered wimpishly whether the leftist framing of the question would attach it to polarizing issues in right-leaning Alberta (where talk about protecting the commons from corporate exploitation is likely to be seen by the establishment as a coded attack on the oil industry), and asked a question about whether the idea could be framed in a non-partisan way. Beasley-Murray helpfully put it in terms of the university’s role in fostering social critique, regardless of party. “There are no saints here”: everyone is subject to critical investigation.

I was therefore thinking about democracy when the next presentation started. This was Jer Thorp’s dazzling tour through the data visualizations he does for the New York Times and some outside projects such as the 9/11 memorial. This talk was emotionally fraught for me, since it was the David Binkley Lecture, in honour of my brother Dave, who died in 2005. The idea was Gary Gibson’s, and each year the lecture is generously supported by Gibson Library Connections; my only contribution was to suggest that it be used to bring in a speaker from outside the library world. Dave was a regular at Access in the early days, and after I came into the library biz he and I overlapped at a few Accesses. (I treasure the time he came to a symposium in Edmonton on the future of the ILS, and he was being introduced as my brother for a change.)

Tweet: @cIRcle_UBC Cascade view of "But Will It Make You Happy?" - seeing the story (conversation) actually unfolding from different views #Access2011

The visualizations were dazzling and joyful. One was an aggregation of tweets that include the phrase “good morning”, represented as bouncing blocks on a rotating globe. You see the wave of pogoing cubes advance around the planet, following the sun, as people wake up and tweet. Another, more elaborate one was Cascade: a real-time tracker of social-media responses to NYT articles — not just tweets but also url-compressions and decompressions — represented in three dimensions. The most moving visualizations, though, were the ones that brought things down to the personal level. The first was a visualization of your movements based on the GPS data that iPhones were found to have been storing; the other was a timeline of NYT articles against which you could map the phases of your life (childhood, college, marriage, etc.), creating links to stories that were important to you. Thorp described the transition from irritation at another big-brother intrusion into his privacy with the iPhone scandal, to fascination with reliving a year of his life via the GPS traces. He called it “embodied history”. He and others set up a site where you can upload your trace (openpaths.cc) and make it available as a dataset to researchers if you choose. From the default position of “that’s too private to share” he moved to “under some circumstances, I’ll share that, even though it’s private”.

From Thorp’s talk and others over the next two days I learned of lots of exciting new tools that I want to master, or master more fully:

They all have to do with data normalization and visualization. They make me long to become a data wizard, able to make datasets dance the way Thorp does. The point, though, is to make data tell true stories. It took me back to palaeography classes with Father Leonard Boyle (later Prefect of the Vatican Library). The duty of the philologist, Father Boyle said, was to make an ancient document speak as fully and as truthfully as possible about the context in which it was created. There was an ethic involved: the relationship between me and a medieval scribe whose every pen-stroke I could trace but about whom I knew nothing else was real, and I owed the same respect for his or her personhood that I owed anyone I met on the street. The new technologies are just as capable of telling compelling lies as compelling truths, and they are therefore covered by the same scholarly and personal (and librarianly) ethic.

I felt a similar sense of responsibility in the “Real Face of White Australia” project, which I read about a couple of weeks ago on Tim Sherratt’s blog (note to future Access organizers: get this guy!) It starts from scans of immigration documents for mostly Chinese or Indian workers who came to Australia in the late 19th or early 20th century (the age of the “White Australia” policy) and were subject to restrictions on their travel. The documents include photographs; Sherratt’s inspiration was to use open-source facial recognition software to crop the faces out of the scanned documents and present them as a waterfall, with more faces appearing no matter how far you scroll down, each linked to the source document so you can find out about the individual. It zooms you from the macro level of political criticism of the racist policy down to the micro level of individual stories, and back again through the sheer accumulation of cases.

How seductive the tools are! Thorp showed us visualizations that cost him half an hour, which I would be obnoxiously proud to present after a month’s solid work. Inspired by Thorp and by a Hackfest project, I managed to make a little progress using Google Refine and Google Fusion to come up with a GIS visualization based on the 1926 volume of the Henderson city directory for Edmonton. Building on work Tricia Williams (now Jenkins) did for us some years ago, I had a dataset of names and street addresses roughly parsed from the raw OCR text of the directories. Out of 30,000 entries, 11,000 were parsable from the OCR. They were grouped by city block and projected onto the current Google map of Edmonton. At first a jumble of red dots shows the blob of built-up areas; as you zoom in they resolve into individual dots, which in turn represent groups of identifiable individuals living on the same street: people with names and occupations, more or less truthfully represented in the Directory, and equally or less truthfully represented in my dataset, with all its OCR errors and parsing errors. The Murphy family, Arthur and Emily, are there, a year before the launching of the Famous Five appeal that established that women are “persons” under Canadian law.
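For readers who want to see what "parsed and grouped by city block" can mean in practice, here is a minimal Python sketch of that kind of step. It is an illustration only: the work described above was done in Google Refine and Google Fusion, and the entry format, the regular expression, the sample people and the names `parse_entry` and `group_by_block` are all hypothetical, invented for this example. Real directory OCR is far messier.

```python
import re
from collections import defaultdict

# Hypothetical entry shape (an assumption, not the real directory format):
#   "Surname Given occupation h 10123 97 St"
# where "h" marks a householder, followed by house number and street.
ENTRY = re.compile(
    r"^(?P<name>[A-Z][a-z]+ [A-Z][a-z]+) (?P<occupation>.*?) "
    r"h (?P<house>\d+) (?P<street>\d+ (?:Av|St))$"
)

def parse_entry(line):
    """Return a dict for a parsable line, or None if the pattern fails."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["house"] = int(entry["house"])
    return entry

def group_by_block(lines):
    """Group parsed names by (street, hundred-block); count unparsable lines."""
    blocks = defaultdict(list)
    skipped = 0
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            skipped += 1
            continue
        # 10123 and 10145 both fall in the 10100 block of the same street.
        block = (entry["street"], entry["house"] // 100 * 100)
        blocks[block].append(entry["name"])
    return dict(blocks), skipped

# Invented sample lines; the third imitates OCR damage and will not parse.
sample = [
    "Doe Jane teacher h 10123 97 St",
    "Roe Richard clerk h 10145 97 St",
    "Sm1th J0hn c1erk h ll0 Jasper",
]
blocks, skipped = group_by_block(sample)
```

Counting the skipped lines alongside the parsed ones keeps the loss visible, which matters when only about a third of the entries survive the OCR.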

That evening in my room I wanted to follow up on an email exchange with my younger brother with a link to one of our grandfather’s articles, from the book that was recently digitized by Internet Archive through the generosity of Rick Prelinger. I found that I couldn’t get access: the Selected Papers had been placed in the Lending Library collection and was flagged as unavailable for borrowing. Rick spotted my mournful tweet and worked his contacts at IA to have it fixed within a couple of hours.

Tweet: @pabinkley Drat: Internet Archive has put RCB's Selected Papers in lending library collection. Glad I downloaded text and images. #alwaysbackupthecloud

The piece I wanted was “History for a Democracy”, given as a closing keynote at the Minnesota Historical Society’s conference in 1937. Binkley criticizes the doctrinaire historical projects of the fascists and the communists:

Now it is the weakness of this kind of history — whether it be written for the church, the nation, the communist society, the fascist state, or even the federal democracy itself — that it stands at the mercy of objective criticism. The faithful following of the technique of historical investigation may at any time overturn elements of the story that stand as essentials in the use that is being made of it. Objective investigation may prove that the world was not created in 4004 B.C.; that the most important developments on the European scene were not the special experience of any one nation, but were shared in common by many peoples; and that the continuity alleged to be found in the life of a nation from the remote past to the present day is illusory or incidental. The communist interpretation of social evolution and political events may not be sustainable in the light of an objective criticism of the evidence, and the fascist or nazi interpretations may also go to pieces under criticism. Nor is the historical interpretation which has nourished the spirit of democracy immune. The bold conceptions of Freeman and Stubbs on early German democracy have already been relegated to the junk heap of discarded historical syntheses.

If we undertake deliberately to nourish our own institutions on a history of this kind, made to order for this purpose, we may find ourselves confronted with the tragic dilemma that the mission of our history cannot be served without abandoning the scientific historical method itself. And this would be particularly fatal to democracy, because democracy more than any other kind of government needs to sustain free investigation and criticism of everything. A myth that will not stand criticism must ultimately be protected by force. And an interpretation of history that one is not permitted to doubt and criticize becomes ipso facto an interpretation that one cannot sustain and prove.

In defending the role of history and free inquiry in a democracy, he zooms down to the personal level, both for the content and for the practice of history:

It took us several generations to build up the corpus of published material, to make the critical studies, to collect the bibliographies, to organize the knowledge from which our present historical writing is documented. Our Ph.D.’s move sure-footed through this material. If I want to work on the Clayton-Bulwer treaty, I know where to look for the material, and I can begin where the last scholar left off. But if I want to write the history of my family, or of the school district in which my son is going to school, I find nothing prepared for me. It will take us several generations to adapt and complete the documentary equipment for the writing of family and local history. It took us several generations also to train the army of scholars in the tradition of the craft. It may well take us several generations to train every man to be his own historian.

More and more, I think “democracy” is the word I’m looking for as the foundation of the values I want to buttress with my work.

Tweet: @shlew tweeted, “I love that #access2011 ended with inspiring talks by amazing women: @researchremix, @andreareimer, #eosadler.”

The final day was the most powerful. First was Heather Piwowar, who is working to establish the evidence base for the benefits of the open-data model in scholarly communication. She grounded the need for peer review of scientific papers on the same principles as collaboration in open-source software development: 50% of published papers contain errors in their use of data, and 5-10% of those errors affect the research outcomes. Linus’ law applies: only if the data is available for review can those errors be corrected. The current model of on-request sharing doesn’t work: many researchers report that they’ve had requests denied, and the young and the productive are disproportionately represented in that group. On the other hand, the science done on the basis of reused datasets increases the return on the investment in the original research. Her current project is to track 1000 randomly-selected datasets (100 each from 10 repositories) through the published literature. The roles she proposes for libraries include hosting repositories and also advocating and educating among faculty, publishers and the public. It was great to see the solid evidence she’s gathering to back up our intuitions about the value of open access.
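The sampling design she described (100 datasets from each of 10 repositories) is a stratified random sample. A minimal Python sketch of the idea follows; it is not Piwowar's actual procedure, and the repository names, dataset identifiers, seed and the function name `stratified_sample` are assumptions made up for illustration.

```python
import random

def stratified_sample(datasets_by_repo, per_repo=100, seed=2011):
    """Draw the same number of datasets, without replacement, from each repository.

    Raises ValueError if a repository holds fewer than per_repo datasets.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sample = {}
    for repo in sorted(datasets_by_repo):
        sample[repo] = rng.sample(datasets_by_repo[repo], per_repo)
    return sample

# Invented holdings: 10 hypothetical repositories with 500 datasets each.
holdings = {
    f"repo-{i}": [f"repo-{i}/dataset-{n}" for n in range(500)]
    for i in range(10)
}
sample = stratified_sample(holdings)
total = sum(len(ids) for ids in sample.values())
```

Sampling per repository, rather than drawing 1000 from the pooled list, keeps a large repository from crowding out the small ones.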

During the introduction of the next panel I was diverted by Brian Owen’s trivia question about the “Dukes of URL”, Dave’s band which played at the 1996 Access conference.

Nick (photo: BigD)

There was a correction from the floor: Owen had referred to 1996 as the first Access, but it was only the first in Vancouver, third of that ilk. I’ve wanted for a long time to assemble a history of the conference, and a couple of years ago I started gathering links and asking for materials; Art Rhyno sent me some early programs, which I still have. I let the project languish but now seemed like a good time to restart it. A wiki! No, a Google Doc! I started one and shared it out to a new Google Group “Access Conference History” (join us!), and lent only half an ear to the following session on Evergreen as I and Nick Ruest and others filled in the chronology of the conference since 1994, identified surviving conference websites and excavated others from the Wayback Machine, linked to Flickr sets and blog postings, and sketched in a few memories of specific conferences. Shared editing of a Google Doc is incredibly exciting for the immediate visibility of the collaboration: new text from other collaborators crawls across the screen in the paragraph above the one you’re writing, new coloured cursors appear as people join in. By the end of the session we had a solid framework.

The next session dealt with proprietary and open-source software. Bess Sadler described the Hydra community and what makes it work. The strength of the collaboration depends on building trust by, paradoxically, limiting the areas where trust is needed. If we have software tests, we know without trusting whether a given commit broke a given feature (and Hydra’s rule is “no code without tests”). Uncertainty is reduced, and the team can get on with building without wasting time and emotional energy assigning blame. I live with too much uncertainty (of my own devising, given my lousy documentation habits); I need to make Bess’s approach work in our environment.

Tweet: @eosadler Collection, analysis and policy response to data is gvmt's job, so recognize data as a vital public asset. #Access2011 #VancoverIsAmazing

The closing keynote was Andrea Reimer, a city councilor in Vancouver and sponsor of the city’s open data initiative. She’s in the middle of an election campaign and was a little late because a debate ran overtime; fortunately the conference program was running a little late as well.

Reimer’s talk took us back to the theme of democracy. Her principles: all people are equal; all people have the ability to reason. She allowed a 3% rule: in every hundred people, three are assholes who need to be worked around, but the rest, however much you disagree with them, are at least open to reason. From this it follows that people can make good decisions, and therefore democracy can work; but the public good depends on giving people good information on which to base those decisions. (This took me back to a photo Eric Hellman took at the Rally to Restore Sanity in Washington last year: a smiling woman carrying a sign that read “Librarians for Informed Opinions”. It’s the best brief statement I’ve seen of the role of libraries in a democracy. If you know who she was, please let me know!) Reimer described the process she and a few others initiated to start Vancouver’s open data initiative. It was the flagship for this movement among Canadian cities. Edmonton is now right up there among the leaders, I’m proud to say. I recently attended an organizational meeting for the Edmonton Pipelines project, which is bringing together GIS-based research in the Edmonton area, including the city’s open-data folks. They’re data pipelines: it’s about “maps and narratives for dense urban spaces”. The library is providing scans of historical maps, which we hope to have georeferenced and mounted in the Hypercities platform.

I was stirred to get up and ask another political question (what’s got into me?), or rather the same question: how can we carry this commitment across partisan boundaries in these polarized times — when the preponderance of dangerously uninformed opinions are on the right, across the divide from us? Reimer’s response was wonderful: I’m not religious, but if I had a religion, it would be the Age of Reason. Except for that 3%, we’re all open to rational argument and we just have to keep making the case on the best foundations we can.

We had heard that David Suzuki was to address the Occupy Vancouver camp a couple of blocks away at 1:00, so after the conference closed several of us hurried over to hear him. A general assembly was starting when we arrived, and it was necessary to pass a motion to let Suzuki speak. We were therefore introduced to the remarkable rules of order of the Occupy movement. When we arrived various people were introducing themselves from the stage, using the amplified public address system. They spoke in short spurts, and the crowd repeated their words. This turned out to be the “human microphone”, and it was used whenever anyone needed to be heard, either from the stage or from the crowd. People responded to the proceedings with a set of gestures, whether registering votes or just responding to what was being said. Jazz-hands means agreement, thumbs-down disagreement, hands above the head with fingertips touching a point of order, and arms folded means a block: the blocker is prepared to leave the movement if the motion is passed. This triggers further discussion and even dividing the meeting into break-out groups, until consensus (set at 90%) is reached.
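As a rough model of those rules of order, here is a small Python sketch. It rests on my own assumptions, not on any official Occupy procedure: that a single block halts a motion regardless of the count, that the 90% threshold is measured against people who gestured agreement or disagreement, and that points of order do not count as votes. The gesture labels and the function name `tally` are hypothetical.

```python
from collections import Counter

CONSENSUS = 0.90  # threshold reported for the Vancouver assembly

def tally(gestures):
    """Classify a motion from a list of gesture labels.

    Labels (assumed): "agree" (jazz-hands), "disagree" (thumbs-down),
    "point_of_order" (fingertips touching overhead), "block" (arms folded).
    """
    counts = Counter(gestures)
    if counts["block"] > 0:
        # Assumption: any block sends the motion back for discussion.
        return "blocked: more discussion"
    votes = counts["agree"] + counts["disagree"]
    if votes == 0:
        return "no votes cast"
    if counts["agree"] / votes >= CONSENSUS:
        return "passed"
    return "no consensus: more discussion"

clear_support = ["agree"] * 95 + ["disagree"] * 5
one_blocker = ["agree"] * 99 + ["block"]
split_crowd = ["agree"] * 80 + ["disagree"] * 20
results = [tally(clear_support), tally(one_blocker), tally(split_crowd)]
```

The second case is the interesting one: 99 people in favour and the motion still does not pass until the blocker is heard.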

We were enthralled; “This is great!” I said to Geoff Harder, “We should use it in Library Council meetings!” The motion to allow Suzuki to speak was blocked by someone off to the left. Through the human mic we learned the objection was that this was an egalitarian movement, and no one should be allowed priority of speech for mere celebrity. The motion was reworded: Suzuki wasn’t being allowed to speak first at the meeting, the meeting was being suspended to allow him to speak according to his invitation. Another vote, another block. “Yeah,” said Geoff, “this is just like Library Council.” “Are you prepared to leave the movement over this?”, asked the moderator, and after more discussion the block was withdrawn. The motion passed, and Suzuki was allowed to speak to us.

He gave a great twenty-minute speech. I recorded it on my phone (and as my version of the human mic I’ve posted it on Facebook). Suzuki covered all the themes you’d expect on such an occasion: pro-environmental, anti-corporate, anti-globalization, anti-consumerist, pro-democracy, pro-biodiversity, taking the long-term view. And he touched on information as a social good:

For five years the prime minister of Canada has never acknowledged the reality of human-induced climate change, or that Canada is the industrialized nation most vulnerable to its impact; and now he’s cutting back on scientists in Environment Canada and research on climate, so that we don’t have to listen to the facts. (6:15)

A myth that will not stand criticism must ultimately be protected by force, but often subterfuge will do. There were jazz-hands all around, mine included. It was the social vision Reimer had given us, of involvement, discussion, consensus, rationality, but stripped of the technology: just people.

After the speech we grabbed some lunch, and I split off to go to MacLeod’s books, the insanely jumbled second-hand bookstore at W. Pender and Richards. Just the place to settle your nerves after an overstimulating day. I was there for a couple of hours.


Then, still with time to kill, I went to the Vancouver Public Library, which looks like the Coliseum reoccupied by information freaks (I saw serried ranks of sticky notes stuck to a second-floor office window, planning — what? An event, a collection, a website?). I found a desk with an electrical outlet and tried to sort out the relics, digital and physical, of the week: tweets, photos by and of me and others, a few emails and Facebook updates, the video I took of Suzuki, a panorama I’d made with my phone and posted to Occipital (who owns them? How long will they take care of it?), the flyer from the Occupy meeting, the hotel bill, the books, the presents for my family in Edmonton.

All these zooms to visualize:

  • From the personal level to the community level and back again: as RCB proposed anchoring the project of history in expanding concentric circles rippling out from the individual, I want to see my digital work rooted in personal archiving practices that mesh with institutional and social movements
  • From the past to the present to the future: recovering old stories, preserving present experience so the stories it can tell will still be heard when I’m no longer there to hear them
  • From information into action and back: guiding public policy with good information, in this age when we see how far wrong public policy can go when information is ignored or lost or hidden

I think those are the dimensions we work with.




We need a programmer
Tuesday 4 November 2008 @ 4:12 pm

MPOW has just advertised a new position for a programmer-analyst to work on the new Digital Initiatives Team. The person will be developing and extending our new Fedora deployments, and working on the digital preservation infrastructure that we’re putting together, based on the STK-5800 (Honeycomb). Lots of interesting development work (Java), coupled with some operational responsibilities for the repository. Closing date: Nov. 18 2008 (just two weeks away).

If you’re a Java programmer interested in digital libraries, please have a look; or if you know one, please pass it on.




Access 2008 day 1
Thursday 2 October 2008 @ 8:33 pm

Brief thinklets from the first day at Access:

Karen Schneider’s keynote and Dale Askey’s talk on why we don’t share more code spoke to the ongoing professionalization of coding practices in OS development in the library world. We have a better sense that this is what we do, and we see the way forward toward learning to do it better. There was a good comment from an audience member about how the use of Subversion made him more careful about preparing his code (since it’s hard to delete once it’s in the repository), which automatically makes it easier to share.

Prominence of Evergreen: it was the subject of KGS’s keynote, and two of the three hackfest projects we heard about; and there’s more coming.

Prominence of Solr: it’s part of everyone’s toolkit.

Mark Leggott is so organized that he has postcards at the registration desk advertising next year’s Access at UPEI.

Best joke (Dale Askey, Kansas State): speaking at a Canadian library conference officially gives him enough foreign-policy experience to run for vice president.

Running observation: librarians are more worried about users’ privacy than users are.



