Archive for July, 2008

The Guardian takes two on piracy

Thursday, July 31st, 2008

The Guardian has a couple of articles relevant to the notion of creative openness. Cory Doctorow extends the copyleft argument to the recent agreement between ISPs and the BPI, whilst Keith Stuart explores how the games industry has dealt with piracy.

Cory Doctorow’s article uses the recent agreement between the ISPs and the music industry to point out that the real criminals will now find other outlets and go deeper underground, presumably further developing their own darknet for filesharing. All this agreement will do is annoy/hack-off the very people who may share some tracks but then go and buy a download later or do something creative with them. If the music industry (and quite possibly the film industry) were bothered about creativity and about building a viable industry for the future, then they would be developing platforms and getting involved at the grass roots level.

Indeed this is what the Flash games industry is doing for itself, according to Keith’s post on the Gamesblog. Rather than trying to sue a set of shadows, they have explored ways of making some money (not much unless you’re really popular) from the associated revenue streams, such as advertising or in-game items/levels. If a game is pirated, the creator can still get some revenue through these mechanisms.

It boggles the mind how one industry can so clearly get it and work with it, whilst another stumbles aimlessly around trying to justify its current existence.

However, if you go sideways, there are some intriguing parallels to this. Freeing data and knowledge sets allows an individual to come up with and explore new ideas. It also means that some revenue will potentially be lost if there are charges involved. Well, it’s going to happen anyway, so one might as well accept this and work on ways of making the original source more appealing and useful.

Storing chat and SMS – is it possible?

Wednesday, July 23rd, 2008

A thought. Given the number of IM and chat clients, how do we store any of the knowledge that is being transferred across them? Is it lost, or can you “dump” the logs for later use?
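Where a client does keep plain-text logs, dumping them into something reusable need not be hard. Here is a minimal sketch in Python, assuming a hypothetical log format of `[timestamp] sender: message` on each line. Real clients each use their own formats, so the pattern and the `parse_chat_log` name are purely illustrative.

```python
import json
import re

# Assumed (hypothetical) line format: "[2008-07-23 10:15] alice: message text"
LINE_PATTERN = re.compile(
    r"^\[(?P<when>[^\]]+)\]\s+(?P<sender>[^:]+):\s*(?P<text>.*)$"
)

def parse_chat_log(raw_log):
    """Turn raw log text into a list of {when, sender, text} records."""
    records = []
    for line in raw_log.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:  # lines that do not fit the assumed format are skipped
            records.append(match.groupdict())
    return records

sample_log = """[2008-07-23 10:15] alice: Have you seen the repository paper?
[2008-07-23 10:16] bob: Not yet, send me the link
-- connection lost --
"""

records = parse_chat_log(sample_log)
archive = json.dumps(records, indent=2)  # a portable dump for later use
print(archive)
```

Once the messages are plain records like this, they could be loaded into a database or a personal search index; the harder question remains whether the client or provider lets you at the logs in the first place.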

A similar thing must be happening with SMS. I would have thought that the providers store these, but can we get hold of them? Are there interfaces to dump the information for personal use, or is it only in companies’ data stores?

Storing data from blogs and wikis

Wednesday, July 23rd, 2008

Institutional repositories already exist to store abstracts and documents. I was wondering if any of these have a way of storing blog posts or wiki pages and identifying their states; i.e. if a user was looking at a wiki page, they could see the archived edits to find its history.

Whilst wikis do this through their history and talk pages, would you then need to store the edit data differently as drafts inside the repository so that future users can immediately identify the changes and either inspect or ignore them? Would repositories need to develop their own blog search or leverage Google’s BlogSearch and Technorati?
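To make the idea of storing page states concrete, here is a rough sketch in Python of how a repository might keep each edit as a separate revision and show the changes on demand. The `Revision` and `ArchivedPage` names, the fields and the sample text are all my own assumptions for illustration, not how any existing repository software works.

```python
import difflib
from dataclasses import dataclass, field

@dataclass
class Revision:
    editor: str
    timestamp: str  # ISO 8601 string, kept simple for the sketch
    text: str

@dataclass
class ArchivedPage:
    title: str
    revisions: list = field(default_factory=list)

    def add_revision(self, editor, timestamp, text):
        """Store a new state of the page without discarding older ones."""
        self.revisions.append(Revision(editor, timestamp, text))

    def current(self):
        """The latest state, for readers who want to ignore the history."""
        return self.revisions[-1].text

    def changes(self, older, newer):
        """Line-by-line diff between two stored revisions (by index)."""
        old_lines = self.revisions[older].text.splitlines()
        new_lines = self.revisions[newer].text.splitlines()
        return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))

page = ArchivedPage("Open access")
page.add_revision("alice", "2008-07-20T09:00:00", "Open access means free to read.")
page.add_revision("bob", "2008-07-22T14:30:00", "Open access means free to read and reuse.")

for line in page.changes(0, 1):
    print(line)
```

Keeping whole revisions and computing the diff only when asked means a future user can either inspect the changes or simply read the current state.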

Open Web Foundation to be announced

Wednesday, July 23rd, 2008

Chris Saad has announced on his blog that the Open Web Foundation is being set up to aid in the governance of data portability technologies.

The Data Portability group has done a sterling job in evangelising and ensuring that their ideas are on the roadmap. The data silos are gradually being brought together (though I wonder if services like JISCmail and institutional repositories should also be joining in).

I truly hope that the new Foundation is receptive to and welcomes the academic “market”, and that it continues and extends efforts to leverage the web as an open platform for sharing.

Thursday, July 10th, 2008

Bobbie Johnson has interviewed Tim Berners-Lee for the Guardian about the new subject of web science – the study of how the Web works. Both MIT and the University of Southampton are championing the Web Science Research Initiative.

As the article says, the Web needs to remain free and open if it is to achieve its potential and to avoid being broken up or controlled by repressive regimes.

UK government asks “Showusabetterway.co.uk”

Thursday, July 10th, 2008

The Guardian reports that its Free Our Data campaign took another step closer to its goal today. Tom Watson, currently the Cabinet Office minister, is one of the forces behind a competition with the first prize of £20,000 for the best use of non-personal public data available through Showusabetterway.

Open Service Definition

Sunday, July 6th, 2008

The Open Knowledge Foundation are bringing the Open Service Definition to version 1.0 which is a helpful step. I wholeheartedly agree with it. As services and APIs develop, we need to create a legal framework within which data, knowledge and dissemination services can be used to allow greater access to open knowledge now rather than when silos have been built.

However, I believe that it needs an addition to the first clause: freedom of data access.

Any methodology by which data has been transformed or is generated should be clearly explained so that, if necessary, results can be replicated. The transparency of this would allow commercial and educational users a greater confidence in the data presented.

Perhaps it is more along the lines of the Open Knowledge Definition, but I think it is an important point to make clear rather than leaving it implicit.

Getting vertigo retrieving information

Sunday, July 6th, 2008

Last week I went along to the ISKO UK seminar/event on Information Retrieval (IR) held at University College London.

Brian Vickery gave a talk about the first fifty years or so of IR.

Like any good event, I came away with loads to ponder. I’m still pondering some of my notes (I wish my handwriting was neater…)

Stephen Robertson of the Microsoft Research lab talked about where search was beginning to go and what was being explored by companies such as Microsoft and Google.

During the Q&A session, there seemed to be a theme of users using Google, Yahoo et al for quick references, such as three-word searches. To some extent this is probably for the ease of the interface but…

What I really got out of this was an intriguing thought.

Google et al are good at general searches. They can find vast amounts of data quickly and easily and present them via the algorithm to the user. Their search is horizontal.

Yet repositories can contain vast amounts of better search data than the search engines can create, thanks to controlled vocabularies, RDF, RDFa and so on. They have people writing the classifications for them who know the subject well (we hope) and can make more rational judgments than a search engine. So repositories and data stores could come together and leverage their inherently more detailed search via XML and RDF interfaces, linking to each other and allowing the search engines rapid access to the relevant data. Their search is vertical.

The experts in the field will probably already know where the relevant stores are but casual and non-academic users will not. Nor are they likely to take the time to delve through advanced searches. We are time pressured. The vertical search engines may well not have the same resources as our large search friends, but a few adaptations should allow better access and also lever the knowledge into a more public sphere.
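As a toy illustration of why a controlled vocabulary can beat plain keyword matching, here is a small Python sketch. The vocabulary, the records and the function names are all invented for the example; a real repository would draw on a maintained thesaurus and far richer metadata.

```python
# Hypothetical controlled vocabulary: preferred term -> alternative labels
VOCABULARY = {
    "neoplasms": {"cancer", "tumour", "tumor"},
    "myocardial infarction": {"heart attack"},
}

# Hypothetical repository records, each classified by a subject specialist
RECORDS = [
    {"title": "Trends in tumour imaging", "subject": "neoplasms"},
    {"title": "Recovery after cardiac events", "subject": "myocardial infarction"},
]

def preferred_term(query):
    """Map a casual search phrase onto the vocabulary's preferred term."""
    q = query.strip().lower()
    for term, alternatives in VOCABULARY.items():
        if q == term or q in alternatives:
            return term
    return None  # phrase is not covered by the vocabulary

def vertical_search(query):
    """Return titles of records classified under the matching subject."""
    term = preferred_term(query)
    if term is None:
        return []
    return [record["title"] for record in RECORDS if record["subject"] == term]

print(vertical_search("heart attack"))
```

The search for "heart attack" finds a record whose title never uses those words, because a person classified it under the right subject. That human judgment is what the repositories could expose to the general search engines.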

The Future of Knowledge?

Sunday, July 6th, 2008

I went to the Future of the Internet talk at the Oxford Internet Institute (webcast here) where Larry Sanger (Citizendium and Wikipedia) and Andrew Keen debated where the Internet might go and how knowledge would develop.

Neither, I think, really got into the argument but rather skirted the issues. Sanger’s argument for a more editorially driven Wikipedia, which is what he has created at Citizendium, was interesting. He is very much for the self-selecting model but feels that it needs a guiding hand every so often from an elite.

Keen, on the other hand, attacked the ideals of collaboration, community and conversation. We do overuse these terms as developers and at some point they will be subject to redefinition and rethinking. I can see Keen’s point about the (ab)use of anonymity by a small minority of people on boards, fora and wikis but, despite his attacks, he offered no coherent view or redefinition of the 3 c’s noted above. Anonymity is useful but, like everything else, is open to abuse.

He’s correct in the sense that they are used too loosely to describe a service or website. Here’s the rub. I think that we are still at the beginning of the process. Services such as Wikipedia are demonstrating how the Internet can be used to collaborate in learning and education but they still have problems in their own communities, such as the various cries about editorial policy. Take the recent outcry over at BoingBoing regarding the unpublishing of Violet Blue’s posts. Whilst it is a blog and therefore subject to its own editorial policy, the outcry begins to go to the heart of the idea of community, how it is used and leveraged.

An earlier post on this blog from the Dilemmas conference suggests that this has not yet been fully argued and leveraged in the academic community. Repositories have to be linked not only to each other but also to outside, non-academic networks to become useful and to keep universities and research councils as arbiters of knowledge and research.

Building data stores

Sunday, July 6th, 2008

Mats Dahlstrom’s talk at the Dilemmas of Digitization conference mentioned the paper Deep Sharing: A Case for the Federated Digital Library by David Seaman.

It would be great if there were a system for rapidly building small data stores from scratch to hold texts, with editing software components and text encoding output (RDF and TEI), so that data can be shared easily electronically rather than expecting users to re-enter key fields such as bibliographic data.

Last weekend, I quickly hacked up a sample from Milton’s cry for free printing, the Areopagitica, and began to mark up some of the text in RDF. I think I’ve overegged the pudding, as it were, by adding SKOS (I was curious to see if you can adapt it to text documents but Dublin Core is a better fit). As I am using just a few lines of text, I didn’t use the RDF API for PHP but hacked up a quick template with a database behind it. I’ll be looking to re-write this at some point soon (as I will with the beginnings of an alternate spelling database to show that you could use SKOS to highlight any alternative spellings or misspellings in a text).
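To give a flavour of the Dublin Core approach, here is a minimal sketch in Python rather than the PHP template described above. It builds an RDF/XML description of a text using only the standard library. The `describe_text` function and the `example.org` URI are assumptions made for illustration; only the RDF and Dublin Core namespace URIs are the real, published ones.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when the XML is serialised
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

def describe_text(uri, fields):
    """Return an RDF/XML string describing one text with Dublin Core fields."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": uri}
    )
    for name, value in fields.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

record = describe_text(
    "http://example.org/texts/areopagitica",  # hypothetical identifier
    {"title": "Areopagitica", "creator": "John Milton", "date": "1644"},
)
print(record)
```

A record like this is the sort of thing one data store could hand to another, so that the bibliographic fields never need to be typed in twice.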