Posts Tagged ‘open_correspondence’

Weeknotes: Open correspondence

Sunday, September 5th, 2010

A quiet week as I’ve been having a few days off but I’ve been working on some of the tickets for Open Correspondence.

The URLs have changed to /letters/view/<author>/<correspondent>/<letter id> in an attempt to make them more user friendly and also to allow the user to define smaller or larger collections by altering the URL. There is also some basic content negotiation to expose the data in JSON, XML and RDF as well as HTML.
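As a rough illustration of that URL scheme and the content negotiation, here is a minimal Python sketch. It is not the site's actual Pylons code: the names `parse_letter_url` and `choose_format` are hypothetical, the media types are assumed, and q-values in the Accept header are ignored for brevity.

```python
from typing import NamedTuple, Optional

# Media types mapped to the formats mentioned in the post. The exact
# types the site serves are an assumption made for this sketch.
FORMATS = {
    "text/html": "html",
    "application/json": "json",
    "application/xml": "xml",
    "application/rdf+xml": "rdf",
}


class LetterRef(NamedTuple):
    author: str
    correspondent: Optional[str]
    letter_id: Optional[str]


def parse_letter_url(path: str) -> LetterRef:
    """Split /letters/view/<author>[/<correspondent>[/<letter id>]]."""
    parts = [p for p in path.split("/") if p]
    if parts[:2] != ["letters", "view"] or not 3 <= len(parts) <= 5:
        raise ValueError(f"not a letters URL: {path!r}")
    author, *rest = parts[2:]
    correspondent = rest[0] if len(rest) > 0 else None
    letter_id = rest[1] if len(rest) > 1 else None
    return LetterRef(author, correspondent, letter_id)


def choose_format(accept_header: str, default: str = "html") -> str:
    """Return the first supported format listed in an Accept header."""
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in FORMATS:
            return FORMATS[media_type]
    return default
```

Dropping trailing segments from the path widens the collection: an author alone selects everything by that author, while all three segments select a single letter.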

I’ve been trying to use the linked data API specs to make sure that the site operates as a service as well. Multi-tasking really.

One of the next things to do is to develop graphical features for the correspondent pages and to sort out the RDF with rdflib. Think I’ll try to do that next week though.

Weeknotes: Ubuntu, messaging and Open Correspondence

Sunday, August 29th, 2010

It has been a while since the last weeknotes. I’ve finally made the move to Linux, or at least dual booting, by installing Ubuntu so I’m currently learning a little about the OS and getting a development environment set up for it.

I’ve nearly finished the ongoing accounts project at work. The framework is up and it went through testing over the last couple of weeks. There are a few rough edges and some bugs which still need fixing but it largely seems to be there now.

I’ve also installed the first part of a messaging server written in PHP (taking ideas and concepts from JMS and Python’s Routes for service urls) which takes a message from the core CMS system and routes it to the correct service using SOA. If there’s an issue with the service then it logs it and queues the message using Redis (although an array might be quicker, I wanted the queue decoupled from the server in case it failed or had to be restarted and the memory was wiped). I need to finish up the worker to dequeue at certain points in time but I expect to get it finished in about four days once I’m back at work.
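The route, log and queue pattern described here can be sketched roughly as follows. This is Python rather than the PHP of the real server, the class and method names are hypothetical, and an in-process deque stands in for the Redis list purely so the example runs without a Redis instance.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Message = Dict[str, str]
Handler = Callable[[Message], None]


class MessageRouter:
    """Route messages to services; queue the ones that fail."""

    def __init__(self) -> None:
        self.routes: Dict[str, Handler] = {}
        # Stand-in for the Redis list. A deque is wiped on restart,
        # which is the very problem an external queue avoids.
        self.retry_queue: Deque[Message] = deque()
        self.errors: List[str] = []

    def register(self, service: str, handler: Handler) -> None:
        self.routes[service] = handler

    def dispatch(self, message: Message) -> bool:
        """Deliver a message; on failure, log it and queue it for retry."""
        try:
            self.routes[message["service"]](message)
            return True
        except Exception as exc:
            self.errors.append(f"{message.get('service')}: {exc!r}")
            self.retry_queue.append(message)
            return False

    def drain(self) -> int:
        """Worker step: retry each queued message once, count successes."""
        delivered = 0
        for _ in range(len(self.retry_queue)):
            if self.dispatch(self.retry_queue.popleft()):
                delivered += 1
        return delivered
```

A worker would call `drain()` on a schedule. Messages that fail again simply go back on the queue for the next pass.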

I’ve done one or two things on the Open Correspondence site as well. I’ve tidied up the source XML and the sources XML as well to expose them so I need to update the site itself. The next thing I think we need to do is to start writing stuff to expose the underlying data and to show what you can do with the data. One of the things that I want to do is to write a function which I can put behind either Protovis or Javascript Infovis Toolkit to convert a SPARQL query into the relevant JSON and I’m thinking of using Lee Feigenbaum’s sparql.js script. Quite possibly I need to write some sort of API to the dataset to allow other queries to be run.
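The conversion step might look something like the sketch below. It assumes the endpoint returns the standard SPARQL JSON results format (a `head.vars` list plus `results.bindings`) and flattens it into plain rows that a charting library could consume. It is written in Python rather than JavaScript, and `flatten_sparql_results` is a hypothetical name, not part of sparql.js.

```python
import json
from typing import Any, Dict, List


def flatten_sparql_results(raw: str) -> List[Dict[str, Any]]:
    """Turn SPARQL JSON results into a list of {variable: value} rows."""
    doc = json.loads(raw)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables (e.g. from OPTIONAL) are absent from a binding.
        rows.append({
            var: binding[var]["value"] if var in binding else None
            for var in variables
        })
    return rows
```

This throws away the type and datatype information attached to each value, which is usually fine for drawing a chart but would need keeping for anything that re-queries the data.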

My friend Simon Biles, who owns Thinking Security, and I have been talking about a Knowledge Management project which is slightly aligned with some stuff I’ve been thinking about around storing RSS items and web pages for research. He’s thinking in terms of MS Office documents which means a little investigation into the various types of structured storage in Office, and the ways that Office has changed, in order to mine different types of documents. It does appear at first glance though that newer versions of Office and Open Office are similar in terms of finding the metadata, both being collections of XML documents in an archive.
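To make the "XML documents in an archive" point concrete, here is a small Python sketch that opens a document as a zip file and pulls out any Dublin Core fields. It assumes the metadata lives in docProps/core.xml for the newer Office formats and in meta.xml for OpenDocument files; the function name is hypothetical and older binary Office files are not handled at all.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict

# Dublin Core namespace, used by both formats for title, creator, etc.
DC = "{http://purl.org/dc/elements/1.1/}"
# Assumed locations of the metadata part inside each kind of archive.
METADATA_PARTS = ("docProps/core.xml", "meta.xml")


def read_dublin_core(source) -> Dict[str, str]:
    """Return Dublin Core fields found in an Office or OpenDocument file.

    `source` is a path or a file-like object containing the zip archive.
    """
    found: Dict[str, str] = {}
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        for part in METADATA_PARTS:
            if part not in names:
                continue
            root = ET.fromstring(archive.read(part))
            for element in root.iter():
                if element.tag.startswith(DC) and element.text:
                    found[element.tag[len(DC):]] = element.text.strip()
    return found
```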

Weeknotes: Talks, Open Correspondence, XMPP

Sunday, July 25th, 2010

I gave a talk at the Oxford Geek Nights about Open Correspondence and letters. At some point I really ought to learn how to give talks. Anyhow Russell Davies was the main speaker and he showed how you could make physical objects from data derived from social networks. (He has a marvellously sane post about the Raoul Moat Facebook page.) Anyhow it’s gathered some people who are interested in contributing. Now I’ve finished the book, I’ve got more time to make changes to the codebase which urgently needs it. Finishing off stuff really. Then making the real changes.

Accounts has been slightly on hold since the wages needed to be run and I didn’t see that accounts or operations would be happy with figures potentially changing.

The main project has been setting up a notification service to set up the service layer correctly. I’ve finally got the server working so I’m just building a framework. I thought of porting parts of the djabberd project into PHP, though for now I’m just looking at parts of it. XMPP is certainly a useful tool for getting machines to speak to each other and for developing event driven services.

BBC’s use of Semantic Web technology in World Cup

Tuesday, July 13th, 2010

Just caught this story on ReadWrite Web about the BBC website’s use of semantic web technology during the World Cup.  Jem Rayfield explains more on the BBC Internet blog about the use of technology.

I’ve still got a fair amount of reading to do but this is the sort of project that makes me rethink the Open Letters project and how it could be used by other sites. It has also given me food for thought for work as well.

Weeknotes: documentation, prototyping and cats

Sunday, July 11th, 2010

I’ve spent most of the week trying to persuade colleagues that rewrites are needed to existing services. I’ve also finally managed to get the initial promise of working from home so hopefully I’ll be able to get the rewrite started on the “quiet” days away from the office. (Although the cat can drive me nuts before she goes to sleep at 10am).

Still working on the accounts project which keeps unravelling a series of underlying problems. Most of them we know about but they appear in all sorts of odd places.

Assuming the world doesn’t fall on my head next time I’m in the office, I’m going to try and spend the day at home on a “Fedex” day. I’m taking the notion from an issue of Wired where they were talking about different ways of working and Atlassian mentioned “Fedex” days where you spend a day building a prototype. What I’d really like to get prototyped is the service bus / queuing system. So fingers crossed.

The impetus came from updating the disaster recovery documentation and writing the first department of the service status documentation (which I wrote after getting the last bit of debugging finished). I know that documentation is not everybody’s favourite thing but I find it useful in rethinking the system and making sure it fits together.

I’ve made time to rewrite the load function for Open Letters. I’ve got the document building the letters in XML and written a rough upload script. Next task is to rewrite the main.py script, test the XML loading and then finish tidying up the initial document.

I’m also looking forward to Textcamp so it’ll be great to get the load finished (as it normalises the function) and get on with doing a presentation for the camp.

I’m also coming to the end of writing my book on children’s fantasy. Whilst not technical in an IT sense, I’m thinking of the next project on the New Weird and how to use IT to visualise influences and timelines. The one thing that worries me is archiving the web pages needed for the research, which I need to look into as I’m not sure whether it is technically illegal.

Weeknotes: All quiet on the accounting front

Sunday, June 27th, 2010

It’s been a week of relative frustration with priorities suddenly being shifted and the infrastructure road map looking more and more unclear.

The soap server is largely debugged and ready for more extensive testing on the server and the back end has now been rewritten to capture more data. I cannot help feeling that it will change once more services go online to scale more efficiently but right now I don’t have the expertise to do it. I’ll get there.

On a different tack, I’m back on the accounting project that I was on several months ago and making some headway in that. It’s grown since I was last involved in it but nothing that a decent set of specs and roadmaps cannot solve in terms of making it manageable.

I’ve been thinking about my next book project which is on the New Weird and genre over the last 15 years and wondering how to use dbpedia’s influencedBy and influence terms in terms of showing how writers influence each other over a century. I’m tempted to put the data into a large rdf sheet and then use javascript or PHP to transform it into JSON to see if you can use the Simile timeline software usefully or if I need to find / write something more appropriate. It does have to wait for me to finish the current book.

I forgot to link to the Open Correspondence blog post on the Open Knowledge Foundation’s blog which was posted a few days ago.

Weeknotes: PHP, SOAP, and Open Letters

Sunday, June 20th, 2010

It has been a fairly quiet week with the boss away. I’ve managed to complete a service to upload details from spreadsheets sent via email.

I’ve also managed to complete a SOAP service in PHP to listen for status updates and I’m just doing the final tests on it now. Once it’s up it can be repurposed for other companies. One of the things that I think will come up is how to store XML files most efficiently as MySQL 5 appears to be tied to uploading files rather than just taking POST strings. I’m thinking of using something like Oracle’s BDB XML database (though the license appears to preclude our uses) or eXist but that is something to come back to much later.

I’ve been thinking about the Open Correspondence site and the best way to allow it to be extended by other people. I think that the best way forward is to create an internal XML format which the load command can use and anybody can use to create their own files and databases. It’s along the lines of the stuff I partially did some work on in the Open Shakespeare project.

Given the boss is away, time for finishing more things off next week. I’ve also created a Trac instance for internal purposes but I think it’ll help with that bane of developing live – documentation.

Weeknotes: Data, service buses and trac

Sunday, June 13th, 2010

I’ve succumbed and I’ve got a microslot at the next Oxford Geek Nights where I’m talking about the Open Correspondence website. I’ve downloaded the rest of the Gutenberg copies of the Dickens letters but just need an evening to make some headway with transforming them.

I spent a fair amount of this week trying to get a status update server built using SOAP and PHP which has been an ‘interesting’ task but seems to have finally got there. Having done some debugging at home on Friday, I’ve got to test the whole thing on Monday on the test server.

I’ve also been debugging the csv uploads into the database and refactoring the code so that there is more re-use of similar objects. On top of that I started the documentation for the services and realised that I’d written most of the upload service for invoices as well. Bonus… So all I need do really is to spend a couple of days finishing things off at work so that the first versions of the services can go out.

Whilst doing all of that though, I realised that the queueing system that I was working on was only part of a solution to get all of our services working together. Instead of just queuing, I need to start thinking more along the lines of an enterprise service bus. So that’ll keep me busy then for a couple of weeks. My notebook has various notes and doodles, much to the enjoyment of my boss, who thinks it’s all old-fashioned.

I’ve also started putting together a Trac instance for work to see if it scales and helps with ticketing and information across our department’s groups. It’ll probably be sidelined for this week whilst I try and get everything put together again with regards to the data uploads.

Weeknotes: Pylons, Python and printing

Sunday, May 30th, 2010

I’ve been doing some more work to the Open Correspondence website (which is now functional  thanks to Rufus Pollock’s help). In part I’ve been cleaning up the urls for the data controller (which is still coming along) and trying to tie the views in together. Being happier with Apache and PHP I spent some time looking for how to rewrite the urls until I came across Andre Kollel’s blog post about the internal workings of the middleware in the Pylons framework.  The more I do on the project, the more I learn about both Python and Pylons.

One of the next things to do is to reformat the dates into human readable format. I had thought of using Python’s datetime strftime to reformat the date from its current ISO format (YYYY-MM-DD) into day, month year. Unfortunately, the method states ” years before 1900 cannot be used.” A slight cramp in the plan. However there is an Activestate recipe by Andrew Dalke which might do the trick or at least point me in the right direction. It is one of the things to be tidied up at some point.
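One way round the limitation is to avoid strftime altogether and build the string from the date's own fields. The sketch below is not the Activestate recipe. It assumes Python 3.7 or later for `date.fromisoformat`, and `humanise_iso_date` is a hypothetical name.

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def humanise_iso_date(iso_date: str) -> str:
    """Convert 'YYYY-MM-DD' into 'D Month YYYY' without calling strftime."""
    parsed = date.fromisoformat(iso_date)  # raises ValueError on bad input
    return f"{parsed.day} {MONTHS[parsed.month - 1]} {parsed.year}"
```

For example, `humanise_iso_date("1843-12-19")` gives "19 December 1843". Because the month names come from a tuple rather than the locale, the output is the same on every machine.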

It is a good feeling to have the site running now. The next task is to write the tests and then to refactor the code. It is very PHPish and needs to be made more Pythonic. I’ve got an idea for trying to create a dendrogram around the textReferred element and to discover the letters and correspondents around the books that Dickens was writing. One of the things is to continue loading the other volumes of Dickens’s letters into the site. So version 0.2 is a little way off but the light at the end of the tunnel is not a train this time.

Workwise has been a little hectic. I must make some time to write a method to allow our admin team to resubmit applications. Like so many things it is a balance between a five minute job and the two hour ones that need to be done. The major job for the week though was getting the automated printing working.

One of the jobs that admin do is to go through each client and create the packs for them. Using HTMLtools, I’ve managed to compile the html into PDF and then convert the PDF into a PostScript file for a printer. I’ve managed to use the Line Printer Remote protocol to send the job to the printer. It is a simple enough command:

lpr -S <ip address/name of print server> -P <name of printer queue> [-o l] <name of file>

(The optional -o l, a lowercase L, marks the file as binary.)

Windows doesn’t appear to support the full protocol but enough to be useful. The -o switch appears to only define whether the file is binary or not rather than specifying the paper type and so on. Annoying but it can be got around.
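For scripting the print run, the command can be assembled as an argument list and handed to something like `subprocess.run`. The sketch below only builds the list. It assumes the Windows lpr switches as I understand them (`-S` for the server, `-P` for the printer queue, and `-o l`, a lowercase L, for binary files), and `build_lpr_command` is a hypothetical helper.

```python
from typing import List


def build_lpr_command(server: str, queue: str, filename: str,
                      binary: bool = True) -> List[str]:
    """Assemble the argument list for a Windows lpr print job."""
    command = ["lpr", "-S", server, "-P", queue]
    if binary:
        # Lowercase L: send the file as-is, e.g. for PostScript.
        command += ["-o", "l"]
    command.append(filename)
    return command
```

Passing a list rather than a single string avoids quoting problems when file names contain spaces.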

Anyhow it got me thinking about other ways of using commands to explore how texts can be converted and changed into useful objects. It brings me back to the use of psbook for printing but how to make it useful for an average user who does not necessarily want to run various commands. I had a conversation with my friend Darren Nash, editorial director of Orbit books, about the future of publishing and he opined that small presses would come to the fore. I think, certainly in genre, that this is correct. It would be interesting to see how existing tools could be used towards these ends rather than constantly re-inventing the wheel.

Now that the first version of letters is out the way, time to go over other projects. I’ve got a yen to try and create something from Milton’s Areopagitica, appropriate I think as it is a cry for free presses.

Weeknotes: Data mining, XML and bibliographies

Sunday, May 23rd, 2010

It seems to have been a week of frantic completion and refactoring.

The first half was spent frantically converting html pages into PDFs using Verypdf’s HTMLtools server product. All in all the manual is very helpful and the test server could be set up quickly. It might have helped the other end if I’d remembered to break the file up for printing but that turned out to be a 10 minute job to put back into production. The next task is to transfer it from the test server and onto the production one but that’ll need to wait for networking to tweak it a little.

I spent some time refactoring the call recordings archive. For some reason the archiving solution that I hacked up in November decided to start failing in March after it was changed. Despite being put back to its original state it never quite got back to working as it did. I’ve been trying to tweak it on and off but never found the time to complete it. I finally just made the time on Friday afternoon to look at it properly. I’d been thinking about item based filtering after reading the first chapter of Toby Segaran’s Programming Collective Intelligence. (On the back of this, I think I’ll be buying his Beautiful Data at some point.) Although this is not really an intelligent programme as such, the techniques have shown some real promise in the hurried tests. Using a Redis datastore, the percentage of found recordings is way up. Fingers crossed for Monday morning when I can see how the scripts ran over the weekend. I also spent some time simplifying the matching algorithm so that I didn’t have to account for so many edge cases when dealing with time.

It seems that we are approaching some sort of real-time status update system at work. I’ve sort of been arguing for this for a while to remove the bottlenecks of having each system dependent on another one. One of our suppliers is sending us XML data so I’ve been playing with XPath 1.0 (since XPath 2.0 apparently isn’t directly supported by PHP but there might be a way of passing the data to Java which adds unnecessary overhead) to extract the relevant values. Anyhow the core is running but I still need to fully test it and add in security.
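The extraction step is roughly the following, shown here in Python with the standard library's ElementTree (which supports only a subset of XPath 1.0) rather than in PHP. The element and attribute names are invented for the example, since the supplier's real schema isn't described.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def extract_statuses(xml_text: str) -> Dict[str, str]:
    """Map each update's id attribute to the text of its status child.

    The <update id="..."><status> layout is a made-up example schema.
    """
    root = ET.fromstring(xml_text)
    statuses: Dict[str, str] = {}
    # The [@id] predicate skips any update element without an id.
    for update in root.findall("./update[@id]"):
        statuses[update.get("id")] = update.findtext("status", default="")
    return statuses
```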

I’ve also been asked to design and implement a queueing system for the main internal server. I’ve run up a quick high level overview but the detail still needs to be worked on. I’m pushing it back to June so that I can clear the decks of the older projects that are still on the board.

I had a chat with Jonathan Gray, a sound guy who does far too much, about digital humanities ideas. We’ve agreed to keep closer contact with each other about the area and to encourage each other into actually doing stuff (I have half a moleskin of ideas – time for more code, less talk then).  He proposed the Bibliographica idea in January and the team wrote a blog entry for the Open Knowledge Foundation blog. It is an idea that I’m looking forward to playing with and trying to embed data from. (http://bibliographica.org/)

One of the things that I’ve been thinking about though is that increasingly when we do research, we store web pages, blog entries and so on. Whilst there is a way of recording these in a footnote (http:example.org accessed on <insert date> type thing), there does not appear to be a way of building a local archive of these with the relevant metadata for later retrieval. Don’t know about anybody else but I’ve got a fair few pages dotted around my hard drive for projects and I’d like a way of storing these properly and to be able to integrate them into bibliographies or research notes. I know that there is the WARC format (Library of Congress link and the WARC tools Google code project) to play with so I need to make time to do that.
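As a thought experiment, a local archive could be as simple as the saved page plus a small metadata file beside it. The Python sketch below is not WARC and makes no attempt to be compatible with it. The file layout, field names and function names are all assumptions made for illustration.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def archive_page(archive_dir: Path, url: str, body: bytes,
                 accessed: date) -> Path:
    """Save a page and a JSON metadata record; return the record's path."""
    digest = hashlib.sha256(body).hexdigest()
    archive_dir.mkdir(parents=True, exist_ok=True)
    # The content hash doubles as the file name, so duplicates collapse.
    (archive_dir / f"{digest}.html").write_bytes(body)
    record = {
        "url": url,
        "accessed": accessed.isoformat(),
        "sha256": digest,
        "bytes": len(body),
    }
    metadata_path = archive_dir / f"{digest}.json"
    metadata_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return metadata_path


def footnote(metadata_path: Path) -> str:
    """Build a footnote-style citation from a stored metadata record."""
    record = json.loads(metadata_path.read_text(encoding="utf-8"))
    return f"{record['url']} (accessed on {record['accessed']})"
```

Keeping the metadata in a separate file means the footnote can be regenerated for a bibliography without touching the saved page.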

I had a mini-hack on the Open Correspondence project last Sunday intending to update a couple of pages and got a little more done than that. The database needs rebuilding but the purl reference (http://purl.org/letter) now points to the schema. It is so close that I can’t wait to actually start hacking the data. Time to do the last little bits like tidy up the parser, use the weaving history API to embed a timeline and start using JENA, ARC and Chris Gutteridge’s Graphite library which worked out of the box (though I haven’t used it for much yet).

Goals for this week are to finish the Open Correspondence bits, update the trac instance with the various ‘todo’s, write a blog post for the Open Knowledge Foundation for Open Correspondence, do some major testing this week at work on various XML exports and imports. I should just be about caught up then. With any luck…