On, cried the leaders – the charge of the self-published

August 30th, 2010

Paul Carr has an excellent post on Techcrunch regarding self-publishing and being damned. I agree with his analysis that this is going to be certain career suicide for the less famous author. Seth Godin has a following that means he has a market, and I suspect that a fair number of the followers who try self-publishing do not. Not to say they cannot get it, just that they may not have it already.

A friend of mine who is in publishing is thinking of quitting, citing how horrible it is at the moment. I’m trying to get back into it as an author (having worked in sales and marketing for a small publisher and as a bookseller) and have just submitted a manuscript to an agent. Whilst I see the model for publishing changing (though into quite what I don’t yet see), I see publishers as important, but I do wonder whether the time of large publishers might be temporarily up. They might need to either swallow up smaller publishers (like Harper Collins or Rebellion), become huge (Hachette with Random House, Orion and Headline), or perhaps get smaller or disappear (nobody I can think of yet).

In her Guardian article, Ursula MacKenzie argues that publishers do play a role. Whilst she might be overstating the case, there is room for the traditional publisher but as I’ve argued before, that role will change and the type of book published will probably change.

Self-publishing might work for some people but not for others.

Weeknotes: Ubuntu, messaging and Open Correspondence

August 29th, 2010

It has been a while since the last weeknotes. I’ve finally made the move to Linux, or at least dual booting, by installing Ubuntu, so I’m currently learning a little about the OS and getting a development environment set up for it.

I’ve nearly finished the ongoing accounts project at work. The framework is up and it went through testing over the last couple of weeks. There are a few rough edges and some bugs which still need fixing but it largely seems to be there now.

I’ve also installed the first part of a messaging server written in PHP (taking ideas and concepts from JMS and Python’s Routes for service urls) which takes a message from the core CMS system and routes it to the correct service using SOA. If there’s an issue with the service then it logs it and queues the message using Redis (although an array might be quicker, I wanted the queue decoupled from the server in case it failed or had to be restarted and the memory was wiped). I need to finish up the worker to dequeue at certain points in time but I expect to get it finished in about four days once I’m back at work.
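A rough sketch of the route-or-queue idea described above, in Python rather than the PHP of the actual server. Everything named here (route_message, drain, InMemoryQueue, the services mapping and the failed_messages key) is hypothetical. InMemoryQueue is only a stand-in that mirrors the rpush / lpop method names of a Redis list client so that the example runs without a Redis server; it is not how the real system stores anything.

```python
import json
import logging
from collections import defaultdict, deque

log = logging.getLogger("router")


class InMemoryQueue:
    """Stand-in for a Redis list client (assumption: only rpush/lpop are needed)."""

    def __init__(self):
        self._lists = defaultdict(deque)

    def rpush(self, key, value):
        self._lists[key].append(value)
        return len(self._lists[key])

    def lpop(self, key):
        items = self._lists[key]
        return items.popleft() if items else None


def route_message(message, services, queue, key="failed_messages"):
    """Send a message to its service; log and queue it if the service fails."""
    handler = services.get(message["service"])
    try:
        if handler is None:
            raise LookupError("no such service: %s" % message["service"])
        handler(message["body"])
        return True
    except Exception as exc:
        log.warning("queueing message for %s: %s", message["service"], exc)
        queue.rpush(key, json.dumps(message))
        return False


def drain(queue, services, key="failed_messages"):
    """Worker step: retry each queued message once, in arrival order."""
    pending = []
    while True:
        raw = queue.lpop(key)
        if raw is None:
            break
        pending.append(json.loads(raw))
    # Messages that fail again are put back on the queue by route_message.
    return sum(route_message(m, services, queue, key) for m in pending)
```

Because the queue lives outside the routing process, a restart of the router loses nothing that was already queued, which is the reason given above for preferring Redis over a plain array.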

I’ve done one or two things on the Open Correspondence site as well. I’ve tidied up the source XML and the sources XML as well to expose them so I need to update the site itself. The next thing I think we need to do is to start writing stuff to expose the underlying data and to show what you can do with the data. One of the things that I want to do is to write a function which I can put behind either Protovis or Javascript Infovis Toolkit to convert a SPARQL query into the relevant JSON and I’m thinking of using Lee Feigenbaum’s sparql.js script. Quite possibly I need to write some sort of API to the dataset to allow other queries to be run.
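As a rough idea of what that conversion function might look like, here is a minimal Python sketch. It assumes the endpoint returns the standard SPARQL Query Results JSON Format (a "head" with "vars" and a "results" object with "bindings") and flattens it into plain rows that a charting library could consume. The function name is made up for illustration and it does not use sparql.js.

```python
import json


def sparql_results_to_rows(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Unbound variables are simply absent from a binding, so default to None.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows
```

The rows can then be passed through json.dumps and handed to whichever visualisation toolkit is chosen; any reshaping specific to that toolkit would sit on top of this.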

My friend, Simon Biles who owns Thinking Security, and I have been talking about a Knowledge Management project which is slightly aligned with some stuff I’ve been thinking about on storing research pages from RSS and web pages. He’s thinking in terms of MS Office documents, which means a little investigation into the various types of structured storage in Office and the ways that Office has changed, in order to mine different types of documents. It does appear at first glance though that newer versions of Office and Open Office are similar in terms of finding the metadata, both being collections of XML documents in an archive.
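A minimal Python sketch of that first-glance observation. It assumes, as far as I understand the formats, that the newer Office files keep their core properties in docProps/core.xml and that Open Office files keep theirs in meta.xml at the root of the zip archive. The function name read_metadata is hypothetical, and older binary Office files are not handled at all.

```python
import zipfile
import xml.etree.ElementTree as ET

# Metadata part inside each kind of archive (assumed locations, see above).
METADATA_PARTS = ("docProps/core.xml", "meta.xml")


def read_metadata(path_or_file):
    """Return {tag: text} for the first metadata part found in the archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        names = set(archive.namelist())
        for part in METADATA_PARTS:
            if part in names:
                root = ET.fromstring(archive.read(part))
                found = {}
                for element in root.iter():
                    text = (element.text or "").strip()
                    if text:
                        # Drop the "{namespace}" prefix ElementTree adds to tags.
                        found[element.tag.split("}")[-1]] = text
                return found
    return {}
```

Stripping the namespaces keeps the sketch short but would merge tags that share a local name, so a real tool would probably want to keep them.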

Never ending death of the book

August 15th, 2010

Devin Coldewey has an intriguing post over on Crunchgear regarding the Google Books project. Google have digitised some books. Just one or two. Like many other people, I find the project useful for finding information and books I’d never come across or lost somewhere. Sometimes I’ll buy the book, sometimes I just need a bit of information and sometimes the preview is enough to persuade me not to part with cash.

On the other hand, Nicholas Negroponte has determined that the book will be dead. Using the Amazon data that e-book sales for the Kindle surpass physical book sales, he reckons that within 5 years, the physical format will no longer be the dominant format. He uses the data from music to justify this and, to a certain extent, he is correct. I do see niche publishing, like high end Science, Technical and Medical publishing, going online and perhaps mass market publishing will grow faster online. But to go back to the music analogy, vinyl was going to be replaced by CDs.

Not entirely. For sure vinyl was not the dominant form anymore; it became its own niche but with a loyal fanbase.

I suspect that books will be the same. Publishing is going to change dramatically in the next few years whilst houses try to find various different models. Not all will work for all; each will have to choose and determine their own path. I still think that there will be a vibrant publishing industry but it will be smaller and more specialised. According to the Bookseller some time ago, the average earnings for an author from books were about £4,000 a year. This implies that authors either starved, lived extremely frugally, had partners in supporting jobs or have / had second jobs. It is a reminder of how authors lived before the massive growth in literacy and I see it happening again. There will, of course, be some authors who can support themselves through writing. Some won’t and will work in other jobs during the day.

So perhaps we come back to Cory Doctorow’s observation that the obscurity is the thing to avoid:

That’s because my biggest threat as an author isn’t piracy, it’s obscurity. The majority of ideal readers who fail to buy my book will do so because they never heard of it, not because someone gave them a free electronic copy. (‘Why Publishing Should Send Fruit Baskets to Google’, BoingBoing, 14 Feb 2006)

The less obscure an author, the less chance of the book / oeuvre disappearing.

I have to agree with Coldewey that as e-books and e-readers become cheaper and more prevalent, physical books will become more luxurious items. For sure. The mass market will change and it is up to readers and writers to make choices and go with it. So back to a more nineteenth century book culture again.

However the book will not die. It’ll change but treeware will survive in some form.

Creating bibliographic resources from web pages

August 15th, 2010

Given the increasingly digital nature of research, including not only websites but blogs, forums and wikis, the (in my view) beloved moleskin is becoming increasingly outdated.
I’ve just finished writing my first book and had the joy of using moleskin notebooks to note down urls and make notes. I like moleskins a lot but pen and paper do have their limitations when searching. I also bookmarked pages but changing computers has lost a few of these.

I’m just starting the research on a new book and looking around for any open source / free software to capture a url, mark it with the time accessed (for later bibliographical purposes), capture the raw HTML, and possibly allow me to tag it for folksonomical reference if I want. What would be sort of cool is to have an interface to share the results later or just post an XML / RDF file to be posted later.
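To make the wish list concrete, here is a minimal Python sketch of the capture step: fetch a url, stamp it with the access time, keep the raw HTML and attach some tags. The names capture, fetch_html and save are invented for illustration, the record is written as JSON rather than the XML / RDF mentioned above purely to keep the example short, and the UTF-8 decoding is an assumption that will not suit every page.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_html(url):
    """Download the raw page body as text (assumes UTF-8, replacing bad bytes)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def capture(url, tags=(), fetch=fetch_html, now=None):
    """Build one research record: url, access time, raw HTML and tags."""
    accessed = now or datetime.now(timezone.utc)
    return {
        "url": url,
        "accessed": accessed.isoformat(),
        "tags": sorted(set(tags)),
        "html": fetch(url),
    }


def save(record, path):
    """Write the record as JSON so it can be searched or shared later."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)
```

The fetch and now parameters are there only so the function can be tried out without touching the network; in normal use calling capture(url, tags=[...]) and then save(record, path) would be enough.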

I suppose what I essentially want to find is something along the lines of a moleskin for electronic notes. I can see various subscription services listed but I really want something on the desktop to create a relevant project archive to share later. Potentially this does add to the issue of lots of mini-silos by creating more but if, in Bibliographica style, they could be linked or linkable, I think it could be an interesting way of sharing research links or allowing bodies to create a meta-frame calling from the shared resources.

I think that this falls into the realm of archiving, which poses issues in the UK, especially when it concerns commercial sites as my reading of the consultation has it. Wired UK has an article on the issues of archiving web sites in Britain and the legal difficulties therein. The British Library has been working on an archive (including some from shops no longer extant) but can only archive the site if the copyright holder has given permission. Even the consultation paper (itself archived now) is vague on this.

Ultimately this will hobble research if ways of noting and sharing the relevant data and metadata cannot be found to allow sharing and relevant notation. It would also mean that I’m left to the vagaries of my browser or remembering to make a note of the link in a new moleskin.

Building something along the lines of what I want might create a tool which other people might find useful.

Weeknotes: Talks, Open Correspondence, XMPP

July 25th, 2010

I gave a talk at the Oxford Geek Nights about Open Correspondence and letters. At some point I really ought to learn how to give talks. Anyhow, Russell Davies was the main speaker and he showed how you could make physical objects from data derived from social networks. (He has a marvellously sane post about the Raoul Moat Facebook page.) Anyhow, the talk has gathered some people who are interested in contributing. Now I’ve finished the book, I’ve got more time to make changes to the codebase, which urgently needs it. Finishing off stuff really. Then making the real changes.

Accounts has been slightly on hold since the wages needed to be run and I didn’t think that accounts or operations would be happy with figures potentially changing.

The main project has been setting up a notification service to set up the service layer correctly. I’ve finally got the server working so I’m just building a framework. I thought of porting parts of the djabberd project into PHP but for now I’m just looking at parts of it. XMPP is certainly a useful tool for getting machines to speak to each other and for developing event driven services.

Finding a space for NoSQL

July 20th, 2010

ReadWriteWeb have a post on NoSQL (again?) by Audrey Watters which is a brief overview of the area. The original post points to the Heroku blog, where Adam Wiggins outlines the uses of NoSQL. I’m not an expert by any means but use Redis on a daily basis with the Rediska PHP library. I remember having an argument with the IT director when I originally proposed using Redis but I’m glad that the gamble has paid off. The caching system that uses it is now far more productive than the earlier version.

Our base database is MySQL, which I like a fair amount for what we do with it, but all I needed to do was to cache some data. The scripts write a fair amount of data to the cache and then there is one read process to read the entire list before updating the main database. At least I know that the data has some sort of security. It is not a panacea or similar cure all but it does have a place in development for certain jobs.

Best tool and all that?

I can understand why Twitter are not using Cassandra in the main service but are still using it for other projects. For now. Systems and priorities change and perhaps it will happen in some way.

Despite its meteoric rise, NoSQL is not the answer to everything. It does have a useful place though.

BBC’s use of Semantic Web technology in World Cup

July 13th, 2010

Just caught this story on ReadWriteWeb about the BBC website’s use of semantic web technology during the World Cup. Jem Rayfield explains more on the BBC Internet blog about the use of technology.

I’ve still got a fair amount of reading to do but this is the sort of project that makes me rethink the Open Letters project and how it could be used by other sites. It has also given me food for thought for work as well.

Weeknotes: documentation, prototyping and cats

July 11th, 2010

I’ve spent most of the week trying to persuade colleagues that rewrites are needed to existing services. I’ve also finally managed to get the initial promise of working from home so hopefully I’ll be able to get the rewrite started on the “quiet” days away from the office. (Although the cat can drive me nuts before she goes to sleep at 10am).

Still working on the accounts project which keeps unravelling a series of underlying problems. Most of them we know about but they appear in all sorts of odd places.

Assuming the world doesn’t fall on my head next time I’m in the office, I’m going to try and spend the day at home on a “Fedex” day. I’m taking the notion from an issue of Wired where they were talking about different ways of working and Atlassian mentioned “Fedex” days where you spend a day building a prototype. What I’d really like to get prototyped is the service bus / queuing system. So fingers crossed.

The impetus came from updating the disaster recovery documentation and writing the first department of the service status documentation (which I wrote after getting the last bit of debugging finished). I know that documentation is not everybody’s favourite thing but I find it useful in rethinking the system and making sure it fits together.

I’ve made time to rewrite the load function for Open Letters. I’ve got the document building the letters in XML and written a rough upload script. Next task is to rewrite the main.py script, test the XML loading and then finish tidying up the initial document.

I’m also looking forward to Textcamp so it’ll be great to get the load finished (as it normalises the function) and get on with doing a presentation for the camp.

I’m also coming to the end of writing my book on children’s fantasy. Whilst it is not technical in an IT sense, I’m thinking of the next project on the New Weird and how to use IT to visualise influences and timelines. The thing that worries me is archiving the web pages needed for the research, which I need to look into as I’m not sure whether it is technically illegal.

Weeknotes: maintenance, and Dickens

July 4th, 2010

It seems to be maintenance season again.

Still carrying on with the accounts systems and doing some work on those systems for most of the week. It is a slow job but I would rather spend time getting it right than rush something out and spend the next year patching it because we rushed it rather than because a need changed.

The rest of the week is spent either developing some new functionality for the admin department or thinking about revamping the existing services. Most of them are fine but a lick of paint and some further optimisation to take care of unanticipated needs wouldn’t go amiss. I suspect that maintenance isn’t high on most developers’ agendas but in a moving and growing company, some systems begin to be left behind when their use either changes or the company outgrows the service. The challenge is trying to minimise user frustration whilst getting the new version out and finished. Mm, time to work on the ‘soft skills’ of people methinks.

In the meanwhile, I’ve created an XML file of letters of Dickens and now am just changing the loading script for Open Correspondence so that there is a normalised way of loading the data into the database. The fact that everything was predicated on one file was annoying me so I made the time to change it. The next thing is to dive back into TEI lite and rework the file so that it fits into an already defined schema. (I don’t see there being any point in this case in trying to create something new as it should be unnecessary.)

Weeknotes: All quiet on the accounting front

June 27th, 2010

It’s been a week of relative frustration with priorities suddenly being shifted and the infrastructure road map looking more and more unclear.

The soap server is largely debugged and ready for more extensive testing on the server and the back end has now been rewritten to capture more data. I cannot help feeling that it will change once more services go online to scale more efficiently but right now I don’t have the expertise to do it. I’ll get there.

On a different tack, I’m back on the accounting project that I was on several months ago and making some headway in that. It’s grown since I was last involved in it but nothing that a decent set of specs and roadmaps cannot solve in terms of making it manageable.

I’ve been thinking about my next book project which is on the New Weird and genre over the last 15 years and wondering how to use dbpedia’s influencedBy and influence terms in terms of showing how writers influence each other over a century. I’m tempted to put the data into a large rdf sheet and then use javascript or PHP to transform it into JSON to see if you can use the Simile timeline software usefully or if I need to find / write something more appropriate. It does have to wait for me to finish the current book.
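One possible shape for that transformation, sketched in Python rather than the javascript or PHP mentioned above. It assumes the influencedBy statements have already been pulled out of the RDF as simple (writer, influenced by) pairs, and it produces a generic nodes / links structure rather than anything specific to the Simile timeline. The function name and the output layout are both made up for illustration.

```python
def influences_to_graph(pairs):
    """Turn (writer, influenced_by) pairs into a nodes/links structure."""
    names = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(names)}
    return {
        "nodes": [{"id": index[name], "name": name} for name in names],
        # Each link points from the influencing writer to the influenced one.
        "links": [
            {"source": index[influencer], "target": index[writer]}
            for writer, influencer in pairs
        ],
    }
```

Passing the result through json.dumps gives the JSON; a timeline view would additionally need dates for each writer, which this sketch does not attempt to supply.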

I forgot to link to the Open Correspondence blog post on the Open Knowledge Foundation’s blog which was posted a few days ago.