<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Aust Gate &#187; redis</title>
	<atom:link href="http://austgate.co.uk/tags/redis/feed/" rel="self" type="application/rss+xml" />
	<link>http://austgate.co.uk</link>
	<description>Open Knowledge and Literature</description>
	<lastBuildDate>Tue, 08 May 2012 20:33:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>More autocompletion with Redis and Drupal</title>
		<link>http://austgate.co.uk/2011/07/more-autocompletion-with-redis-and-drupal/</link>
		<comments>http://austgate.co.uk/2011/07/more-autocompletion-with-redis-and-drupal/#comments</comments>
		<pubDate>Tue, 19 Jul 2011 20:26:22 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[drupal]]></category>
		<category><![CDATA[redis]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=372</guid>
		<description><![CDATA[Last week I began working on an auto-complete function for Drupal 7 forms, using Redis behind it. I needed to get some county data, and possibly other sorts, into some forms so that it can be standardised. One of the things that I&#8217;ve been trying to do is to make sure that [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I began working on <a title="Auto-Complete with Redis and Drupal 7" href="http://austgate.co.uk/2011/07/auto-completing-drupal-with-redis/" target="_blank">an auto-complete function</a> for <a title="Drupal CMS" href="http://drupal.org/" target="_blank">Drupal 7</a> forms, using Redis behind it.</p>
<p>I needed to get some county data, and possibly other sorts, into some forms so that it can be standardised. One of the things that I&#8217;ve been trying to do is to make sure that data is clean across various systems.</p>
<p>One of the issues that has come up is the choice of connection library for the Redis server. I&#8217;ve used <a title="PHPRedis" href="https://github.com/nicolasff/phpredis" target="_blank">Nicolas Favre-Felix&#8217;s phpredis</a> library and the <a title="Rediska library" href="http://rediska.geometria-lab.net/" target="_blank">Rediska</a> library. The matters that I&#8217;ve been considering are the size and complexity of each library.</p>
<p>I do like phpredis but, as it is a C extension, it needs compiling, which puts a potential barrier in the way of its use in Drupal: the user may not have root access to compile it. As earlier posts might show, I loved Rediska but it seems overly large and complex for such a slight task. I was running some tests today on another project and it did not appear to be closing its connections properly (an issue which might be &#8216;QDH&#8217;able in the short term). Both libraries are complete and large, which is what makes them great Swiss Army knives for Redis and PHP.</p>
<p>But I would like something small and light that has access to a subset of commands and is pure PHP. I suppose a long term desire might be to implement some of the pub/sub commands but I cannot think of a use at the moment.</p>
<p>I&#8217;ve taken ideas from <a title="Antirez on Redis, autocomplete and Redis" href="http://antirez.com/post/autocomplete-with-redis.html" target="_blank">Salvatore Sanfilippo&#8217;s post</a>, which uses Ruby, and ported these into PHP so the code uses Redis&#8217;s sorted sets to get the relevant items. I need to complete the code to see how it fares under AJAX in Drupal. It would be a great sidetrack to follow some of the other possibilities such as search prediction, or perhaps this might go sideways into using that to pull up homonyms or synonyms where these are stored.</p>
<p>But that is definitely another day and another project.</p>
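<p>As a rough sketch of the sorted set idea, and not the module&#8217;s actual code: every prefix of a word is added as a member with the same score, the complete word is stored with a trailing * marker, and a lookup jumps to the typed prefix and reads forward until the members stop matching. The version below is in Python rather than PHP and uses a sorted list as a stand-in for the Redis sorted set, so it runs without a server. The names <code>build_index</code> and <code>complete</code> and the county list are illustrative assumptions; <code>bisect_left</code> plays the part of ZRANK and the forward walk that of ZRANGE.</p>
<pre><code class="language-python">import bisect


def build_index(words):
    """Return a sorted list that stands in for a Redis sorted set."""
    entries = set()
    for word in words:
        word = word.lower()
        # Every prefix of the word becomes a member (ZADD with one shared score).
        for end in range(1, len(word) + 1):
            entries.add(word[:end])
        # The complete word is stored with a trailing "*" marker.
        entries.add(word + "*")
    # Members with equal scores are ordered lexicographically, like sorted().
    return sorted(entries)


def complete(index, prefix, limit=10):
    """Return up to `limit` completions for `prefix`."""
    prefix = prefix.lower()
    results = []
    # bisect_left plays the role of ZRANK: where the prefix sits in the order.
    start = bisect.bisect_left(index, prefix)
    # Walking forward from that position plays the role of ZRANGE.
    for entry in index[start:]:
        if not entry.startswith(prefix):
            break  # we have left the run of members sharing the prefix
        if entry.endswith("*"):
            results.append(entry[:-1])
            if len(results) == limit:
                break
    return results


counties = ["Devon", "Derbyshire", "Dorset", "Durham", "Kent"]
index = build_index(counties)
print(complete(index, "de"))  # prints ['derbyshire', 'devon']
</code></pre>
<p>The cost of this layout is storage, since each word contributes one member per letter, which is probably acceptable for a short validated list such as counties.</p>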
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/07/more-autocompletion-with-redis-and-drupal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Auto-completing Drupal with Redis</title>
		<link>http://austgate.co.uk/2011/07/auto-completing-drupal-with-redis/</link>
		<comments>http://austgate.co.uk/2011/07/auto-completing-drupal-with-redis/#comments</comments>
		<pubDate>Tue, 12 Jul 2011 17:33:32 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[drupal]]></category>
		<category><![CDATA[redis]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=369</guid>
		<description><![CDATA[I&#8217;ve been working on some functions for a forthcoming site at Janet and have been looking at the user functionality in some of our forms. In a reversal of roles, I&#8217;ve been trying to find ways of making it easier for users to complete the forms for various products and services which has taken me down [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working on some functions for a forthcoming site at Janet and have been looking at the user functionality in some of our forms. In a reversal of roles, I&#8217;ve been trying to find ways of making it easier for users to complete the forms for various products and services which has taken me down some interesting avenues.</p>
<p>One of them is the idea of autocomplete. Nothing new there but something that can be easily overlooked when looking at the data instead. User Experience (UX) isn&#8217;t the only reason I&#8217;ve been looking at this. I&#8217;m also trying to find ways of keeping some of our data cleaner than it is, and limiting options to a validated set currently seems to be the best way of doing this. To that end, as I&#8217;m still in the early stages of learning Drupal, I searched the Drupal forums and came across this comment (<a title="Autocomplete code in Drupal forums" href="http://drupal.org/node/1117562#comment-4452968" target="_blank">http://drupal.org/node/1117562#comment-4452968</a>) which has really helped me understand this. I&#8217;ve only implemented a cursory version of this code at the moment, throwing some rough values into the array to return, but the results appear to be bearing fruit.</p>
<p>My intention is to back this onto a <a title="Redis key value store" href="http://redis.io" target="_blank">Redis</a> store which will hold the cached data so that the site is not putting undue strain onto the main database and so that the data store can be shared with other applications on the site. I&#8217;ve got the <a title="Rediska library" href="http://rediska.geometria-lab.net/" target="_blank">Rediska</a> library which I&#8217;m currently using but it has far more functions in it than I&#8217;ll ever need so I might roll my own stripped-down, pure PHP library. I&#8217;ll see what time I have.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/07/auto-completing-drupal-with-redis/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Storing and cleaning data</title>
		<link>http://austgate.co.uk/2011/06/weeknotes-storing-and-cleaning-data/</link>
		<comments>http://austgate.co.uk/2011/06/weeknotes-storing-and-cleaning-data/#comments</comments>
		<pubDate>Sun, 19 Jun 2011 13:15:13 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[data sets]]></category>
		<category><![CDATA[node]]></category>
		<category><![CDATA[redis]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=350</guid>
		<description><![CDATA[This week has been soft launching a CRM system for the Janet project. Hopefully these would be just user bugs but it has highlighted some interesting data cleaning issues. These are going to be inherent in the exchange of data between two or more systems, especially when one is a long-term pre-existing one. This has [...]]]></description>
			<content:encoded><![CDATA[<p>This week has been soft launching a CRM system for the Janet project. Hopefully these would be just user bugs but it has highlighted some interesting data cleaning issues. These are going to be inherent in the exchange of data between two or more systems, especially when one is a long-term pre-existing one.</p>
<p>This has long-term implications in terms of continuing to ensure that the data is clean and standardised. One of the forthcoming projects is based on our technical documents and on converting them from existing formats (when these are fully confirmed) into the as yet undesigned and unbuilt system. As part of this I&#8217;ve been looking at Chris Gutteridge&#8217;s <a title="Chris Gutteridge's Grinder" href="https://github.com/cgutteridge/Grinder" target="_blank">Grinder</a>, a parser for getting RDF data out of Excel and CSV files. I was reminded of Grinder whilst reading his article about Linked Data at the University of Southampton in the final ever Nodalities. Whilst Grinder itself may not be of initial use, it does give me some clues about the possibilities of transforming the data.</p>
<p>The project also forces me to think about how the programme would run, and I suspect that it would be from the command line. If this is a safe assumption, then it means that I need to get back to Perl or use Python. Much as I like PHP, I&#8217;m not sure it is a command line language. I know it can be run as one but it always makes me nervous as I don&#8217;t really consider it a system administration or data munging language. In either case, Perl and Python mean another re-learning curve, especially Perl which I last used at JISCMail a couple of years ago.</p>
<p>A side project that I&#8217;ve been looking at is the real-time data storage of feeds for later mining and use. I&#8217;ve been thinking of using Node.js (and actually starting something!) and Redis to run in the background. A little side something, methinks. It does mean me learning more about Node though and gives me something tangible to build. I&#8217;ve been having a little search around the Net and came across an older post by <a title="Marshall Kirkpatrick on Realtime web" href="http://www.nten.org/blog/2009/10/28/ten-useful-examples-realtime-web-action" target="_blank">Marshall Kirkpatrick on the NTEN blog about realtime data</a> whilst reading about <a title="Elegant Code blog on node event loops" href="http://elegantcode.com/2010/11/19/taking-baby-steps-with-node-js-threads-vs-events/" target="_blank">event loops in Node on the Elegant Code</a> blog. Of course, once it is stored, it must be processed to be useful but that is the next step.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/06/weeknotes-storing-and-cleaning-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Book Hackday and using Node with Redis</title>
		<link>http://austgate.co.uk/2011/05/book-hackday-and-using-node-with-redis/</link>
		<comments>http://austgate.co.uk/2011/05/book-hackday-and-using-node-with-redis/#comments</comments>
		<pubDate>Wed, 11 May 2011 20:18:15 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[node.js]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[redis]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=333</guid>
		<description><![CDATA[I&#8217;ve bumped into Marcus Povey in a few places but the last time was at the Oxford Geek Night. He kindly pointed me in the direction of Paul Squires of Perera when he heard that I wanted to organise a text hacking day. Paul is one of the people behind Book Hackday this coming Saturday [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve bumped into <a title="Marcus Povey's site" href="http://www.marcus-povey.co.uk/" target="_blank">Marcus Povey</a> in a few places but the last time was at the Oxford Geek Night. He kindly pointed me in the direction of Paul Squires of <a title="Perera Media" href="http://www.pereramedia.com/about-perera/about-us" target="_blank">Perera</a> when he heard that I wanted to organise a text hacking day. Paul is one of the people behind <a title="Book Hackday site" href="http://www.bookhackday.com" target="_blank">Book Hackday</a> this coming Saturday, which is now backed up with the <a title="Book Hackers site" href="http://www.bookhackers.com" target="_blank">BookHackers</a> site. So there is no need to organise much more than bringing a brain and some interest, and being at the <a title="Free Word centre" href="http://www.freewordonline.com/" target="_blank">Free Word</a> centre for 10am (it lasts until 8pm).</p>
<p>I&#8217;ve just spent the evening using <a title="Node JS website" href="http://nodejs.org" target="_blank">Node.js</a> and <a title="Redis website" href="http://redis.io" target="_blank">Redis</a>, making a change from usually using Redis with PHP or occasional Python. I&#8217;ve been using the latter for logging work and messaging but have been intrigued by the idea of having a system constantly pushing out annotations if added to a document between distributed editors. The idea is that these would all be saved and pushed out to users of a particular document and stored in Redis.</p>
<p>Node strikes me as a good way of creating an efficient back-end system for polling Redis (or perhaps using Redis&#8217;s Pub/Sub architecture to achieve a similar end). I would need to look at whether the PUBLISH command would save the data though.</p>
<p>From this evening&#8217;s hack, I&#8217;ve got the polling from the server working and need to separate out the pushing into the store from the polling to retrieve all notes, comments and so on. I suppose the next thing after that is to pop in an HTML page or template for a front end.</p>
<p>I&#8217;ll do some more tomorrow but I&#8217;m making headway with Node which is cool as I&#8217;ve been trying to find something meatier to do with it before introducing it into work, even as a development project.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/05/book-hackday-and-using-node-with-redis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scripting announced for Redis</title>
		<link>http://austgate.co.uk/2011/04/scripting-announced-for-redis/</link>
		<comments>http://austgate.co.uk/2011/04/scripting-announced-for-redis/#comments</comments>
		<pubDate>Wed, 27 Apr 2011 18:52:16 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[nosql]]></category>
		<category><![CDATA[redis]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=331</guid>
		<description><![CDATA[I&#8217;ve just come across this blog post via the Redis Google group by Salvatore &#8216;Antirez&#8217; Sanfilippo on introducing some scripting into the Redis key-value datastore. I&#8217;ve played around with Redis again as part of a logging system, having used it as a really basic queue system in a previous life. I may not play with [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve just come across this <a title="Antirez on Redis and scripting" href="http://antirez.com/post/redis-and-scripting.html" target="_blank">blog post</a> via the Redis Google group by Salvatore &#8216;Antirez&#8217; Sanfilippo on introducing some scripting into the Redis key-value datastore.</p>
<p>I&#8217;ve played around with Redis again as part of a logging system, having used it as a really basic queue system in a previous life. I may not play with it immediately but there is a lot to chew over in the post about writing scripts against Redis.</p>
<p>Initially I&#8217;m looking at trying to write a search script for a day&#8217;s worth of logs, so nothing major, but I&#8217;m also hoping to capture counts for services and parts of a platform being used. It&#8217;ll probably have very few defined commands to avoid bloat, although it will initially be in PHP (since I&#8217;m using it for other parts of the platform). It might change into a different language (Lua seems to be the one being muttered about) if performance requires it.</p>
<p>It has been a while since I&#8217;ve really played with Redis but I&#8217;m glad to come back to it and become re-acquainted with it.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/04/scripting-announced-for-redis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding a space for NoSQL</title>
		<link>http://austgate.co.uk/2010/07/187/</link>
		<comments>http://austgate.co.uk/2010/07/187/#comments</comments>
		<pubDate>Tue, 20 Jul 2010 19:11:26 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[nosql]]></category>
		<category><![CDATA[redis]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=187</guid>
		<description><![CDATA[ReadWriteWeb have a post on NoSQL (again?) by Audrey Watters which is a brief overview of the area. The original post points to the Heroku blog, where Adam Wiggins outlines the uses of NoSQL. I&#8217;m not an expert by any means but use Redis on a daily basis with the Rediska PHP library. I remember having [...]]]></description>
			<content:encoded><![CDATA[<p><a title="ReadWriteWeb on NoSQL" href="http://www.readwriteweb.com/cloud/2010/07/cassandra-predicting-the-futur.php" target="_blank">ReadWriteWeb</a> have a post on NoSQL (again?) by Audrey Watters which is a brief overview of the area. The original post points to the Heroku blog, where Adam Wiggins <a title="Heroku blog on NoSQL" href="http://blog.heroku.com/archives/2010/7/20/nosql/" target="_blank">outlines the uses of NoSQL</a>. I&#8217;m not an expert by any means but use Redis on a daily basis with the Rediska PHP library. I remember having an argument with the IT director when I originally proposed using Redis but I&#8217;m glad that the gamble has paid off. The caching system that uses it is now far more productive than the earlier version.</p>
<p>Our base database is MySQL, which I like a fair amount for what we do with it, but all I needed to do was cache some data. The scripts write a fair amount of data to the cache and then there is one read process to read the entire list before updating the main database. At least I know that the data has some sort of security. It is not a panacea or similar cure-all but it does have a place in development for certain jobs.</p>
<p>Best tool and all that?</p>
<p>I can understand why <a title="Cassandra, Twitter and NoSQL" href="http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html" target="_blank">Twitter are not using Cassandra</a> in the main service but are still using it for other projects.  For now. Systems and priorities change and perhaps it will happen in some way.</p>
<p>Despite its meteoric rise, NoSQL is not the answer to everything. It does have a useful place though.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/07/187/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Redis, PHP, mail and SOAP</title>
		<link>http://austgate.co.uk/2010/06/weeknotes-redis-php-mail-and-soap/</link>
		<comments>http://austgate.co.uk/2010/06/weeknotes-redis-php-mail-and-soap/#comments</comments>
		<pubDate>Sun, 06 Jun 2010 11:05:18 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[redis]]></category>
		<category><![CDATA[soap]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=164</guid>
		<description><![CDATA[I&#8217;ve spent some time writing a queueing library using Redis as a backend. I started with the notion that it would need to be a FIFO queue but didn&#8217;t want to only use the in-built parts of PHP as a stack using array_pop or array_push. Whilst it might be faster, it doesn&#8217;t allow for queue [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve spent some time writing a queueing library using Redis as a backend. I started with the notion that it would need to be a FIFO queue but didn&#8217;t want to only use the in-built parts of PHP as a stack using array_pop or array_push. Whilst it might be faster, it doesn&#8217;t allow for queue storage if the worker / router calling the queue does not run until a certain time so I looked at Redis. I drew some inspiration from <a title="MEMQ blog post" href="http://abhinavsingh.com/blog/2010/02/memq-fast-queue-implementation-using-memcached-and-php-only/" target="_blank">MEMQ</a>, a queue implementation using memcached. I wrote a quick set of functions to handle connection, enqueueing and dequeueing with the ever present Rediska as the underlying Redis connection library. I&#8217;m tempted to revisit this and to write my own connection to remove the reliance on Rediska. What I did learn was how to increase and decrease the number of items that could be dequeued. For some stupid reason, I&#8217;d got into my head that it would either be one or all items.</p>
<p>However if you think about the LRANGE command, you can read as many items as you want, drop them into an array and iterate across them (LLEN only returns the length of the list, and LRANGE does not remove anything, so a follow-up LTRIM is needed to actually pop the items). I need to try this but you could feasibly call items from the middle of the list by changing the start and end points in LRANGE. Normally I&#8217;d do something like LRANGE &lt;list name&gt; 0 -1 for all items or LRANGE &lt;list name&gt; 0 1 for the first two, since both indexes are inclusive, but if you know there are 30 items and only want 5 from position 20 then you could use LRANGE &lt;list name&gt; 20 24 to achieve the result. It is not really germane to the queueing that I&#8217;ve been looking at (for system updates where I need everything or just the first item) but could be a useful adaptation for somebody else.</p>
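<p>To make the index arithmetic concrete, here is a minimal sketch in Python (not the PHP and Rediska code from the queueing library) that imitates the inclusive start and stop offsets of Redis&#8217;s LRANGE on a plain list, so it runs without a server. <code>lrange</code> and <code>dequeue_batch</code> are hypothetical helper names. The second one assumes that a batch is read with LRANGE and then removed with LTRIM, since LRANGE on its own leaves the list untouched and LLEN only reports its length.</p>
<pre><code class="language-python">def lrange(items, start, stop):
    """Imitate Redis LRANGE: start and stop are both inclusive offsets."""
    size = len(items)
    # Negative offsets count back from the tail, so -1 is the last item.
    first = start if start >= 0 else max(size + start, 0)
    last = stop if stop >= 0 else size + stop
    # Python slices exclude the end offset, hence the + 1.
    return items[first:max(last + 1, 0)]


def dequeue_batch(queue, count):
    """Read the first `count` items, then drop them from the queue."""
    if count >= 1:
        batch = lrange(queue, 0, count - 1)
    else:
        batch = []
    # Stand-in for LTRIM: keep only what lies after the batch.
    del queue[:len(batch)]
    return batch


queue = [f"update-{n}" for n in range(30)]

print(len(lrange(queue, 0, -1)))  # every item: 30
print(lrange(queue, 0, 1))        # the first two items
print(lrange(queue, 20, 24))      # five items starting at position 20
print(dequeue_batch(queue, 5))    # removes and returns update-0 to update-4
print(len(queue))                 # 25 items remain
</code></pre>
<p>On a real server the read and the trim would probably need wrapping in MULTI/EXEC so that two workers cannot take the same batch.</p>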
<p>The main challenge this week has been reading Excel attachments from email. PHP&#8217;s <a title="PHP's imap functions" href="http://php.net/manual/en/book.imap.php" target="_blank">imap</a> library allows you to read the structure of an email but is curiously reticent in retrieving data if you have MIME parts. I spent the best part of a day and a half getting a script to iterate over an incoming email, filter the parts so that it only explored those with an attachment MIME type, and then retrieve any attachments, either from a flat structure or by iterating over each part before calling imap_fetchbody(). So far the fix appears to work and has allowed me to create a prototype mail service for receiving email data. It seems odd that in the era of web services financial data is still sent by insecure methods but we must accommodate.</p>
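<p>The part-walking logic can be illustrated outside PHP. The sketch below uses Python&#8217;s standard <code>email</code> package rather than the imap functions, so it shows the general shape of the filter and not the script described above. The sample message, the addresses, the <code>excel_attachments</code> name and the list of accepted MIME types are all assumptions made for the example.</p>
<pre><code class="language-python">from email.message import EmailMessage

# MIME types treated as Excel files in this sketch (an assumed list).
EXCEL_TYPES = {
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def excel_attachments(message):
    """Return (filename, bytes) pairs for every Excel attachment found."""
    found = []
    # walk() visits the message itself and every nested MIME part in turn,
    # so flat and multipart structures are handled by the same loop.
    for part in message.walk():
        if part.is_multipart():
            continue  # containers hold other parts, not data
        if part.get_content_disposition() != "attachment":
            continue  # skip the body text and inline parts
        if part.get_content_type() in EXCEL_TYPES:
            # decode=True undoes the base64 transfer encoding.
            found.append((part.get_filename(), part.get_payload(decode=True)))
    return found


# Build a small sample message to run the filter against.
sample = EmailMessage()
sample["From"] = "supplier@example.org"
sample["To"] = "reports@example.org"
sample["Subject"] = "Weekly figures"
sample.set_content("Figures attached.")
sample.add_attachment(
    b"placeholder spreadsheet bytes",
    maintype="application",
    subtype="vnd.ms-excel",
    filename="figures.xls",
)

for name, data in excel_attachments(sample):
    print(name, len(data))
</code></pre>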
<p>I&#8217;ve also been looking at PHP&#8217;s <a title="PHP's soap functions" href="http://php.net/manual/en/book.soap.php" target="_blank">SOAP</a> library to create a status update service which will probably utilise <a title="Wikipedia on Service Orientated Architecture" href="http://en.wikipedia.org/wiki/Service-oriented_architecture" target="_blank">Service Orientated Architecture</a> to create a stable, scalable service. Initially I created a <a title="W3 on WSDL" href="http://www.w3.org/TR/wsdl" target="_blank">WSDL</a> file using the <a title="Eclipse ide" href="http://www.eclipse.org/" target="_blank">Eclipse IDE</a> but that threw all sorts of issues and I ended up using Zend&#8217;s WSDL generator tool running across the existing server. I must look into this but there might be a conflict in versions of WSDL as well as a first-time learning curve. I&#8217;m hoping to get the first version of the service up this week.</p>
<p>I suspect that this week is going to be spent completing the commission and service status services, as well as possibly doing some documentation as it is beginning to pile up.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/06/weeknotes-redis-php-mail-and-soap/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Data mining, XML and bibliographies</title>
		<link>http://austgate.co.uk/2010/05/weeknotes-data-mining-xml-and-bibliographies/</link>
		<comments>http://austgate.co.uk/2010/05/weeknotes-data-mining-xml-and-bibliographies/#comments</comments>
		<pubDate>Sun, 23 May 2010 10:57:25 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[open_bibliography]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[redis]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=155</guid>
		<description><![CDATA[It seems to have been a week of frantic completion and refactoring. The first half was spent frantically converting HTML pages into PDFs using VeryPDF&#8217;s HTMLtools server product. All in all the manual is very helpful and the test server could be set up quickly. It might have helped the other end if I&#8217;d [...]]]></description>
			<content:encoded><![CDATA[<p>It seems to have been a week of frantic completion and refactoring.</p>
<p>The first half was spent frantically converting HTML pages into PDFs using VeryPDF&#8217;s <a title="VeryPDF htmltools command line manual" href="http://www.verypdf.com/htmltools/html-tools.html" target="_blank">HTMLtools</a> server product. All in all the manual is very helpful and the test server could be set up quickly. It might have helped the other end if I&#8217;d remembered to break the file up for printing but that turned out to be a 10 minute job to put back into production. The next task is to transfer it from the test server and onto the production one but that&#8217;ll need to wait for networking to tweak it a little.</p>
<p>I spent some time refactoring the call recordings archive. For some reason the archiving solution that I hacked up in November decided to start failing in March after it was changed. Despite being put back to its original state it never quite got back to working as it did. I&#8217;ve been trying to tweak it on and off but never found the time to complete it. I finally just made the time on Friday afternoon to look at it properly. I&#8217;d been thinking about item based filtering after reading the first chapter of Toby Segaran&#8217;s <a title="OReilly page for Programming Collective Intelligence" href="http://oreilly.com/catalog/9780596529321/" target="_blank">Programming Collective Intelligence</a>. (On the back of this, I think I&#8217;ll be buying his <a title="O'Reilly page for Beautiful Data" href="http://oreilly.com/catalog/9780596157128/" target="_blank">Beautiful Data</a> at some point.) Although this is not really an intelligent programme as such, the techniques have shown some real promise in the hurried tests. Using a Redis datastore, the percentage of found recordings is way up. Fingers crossed for Monday morning when I can see how the scripts ran over the weekend. I also spent some time simplifying the matching algorithm so that I didn&#8217;t have to account for so many edge cases when dealing with time.</p>
<p>It seems that we are approaching some sort of real-time status update system at work. I&#8217;ve sort of been arguing for this for a while to remove the bottlenecks of having each system dependent on another one. One of our suppliers is sending us XML data so I&#8217;ve been playing with XPath 1.0 (since XPath 2.0 apparently isn&#8217;t directly supported by PHP, although there might be a way of passing the data to Java, which adds unnecessary overhead) to extract the relevant values. Anyhow the core is running but I still need to fully test it and add in security.</p>
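<p>For a flavour of the kind of XPath 1.0 expression involved, here is a small Python sketch using the XPath subset in the standard library&#8217;s ElementTree, not the PHP code itself. The element names (<code>statuses</code>, <code>status</code>, <code>reference</code>), the <code>state</code> attribute and the <code>references_in_state</code> helper are invented for the illustration, as the supplier&#8217;s real schema is not shown here.</p>
<pre><code class="language-python">import xml.etree.ElementTree as ET

# Build a small hypothetical supplier feed in memory.
feed = ET.Element("statuses")
for ref, state in [("A100", "complete"), ("A101", "pending"), ("A102", "complete")]:
    item = ET.SubElement(feed, "status", {"state": state})
    ET.SubElement(item, "reference").text = ref


def references_in_state(root, state):
    """Return the reference of every status element in the given state."""
    # Attribute predicates such as [@state='...'] are part of XPath 1.0 and
    # of the subset that ElementTree understands.
    path = f".//status[@state='{state}']/reference"
    return [node.text for node in root.findall(path)]


print(references_in_state(feed, "complete"))  # prints ['A100', 'A102']
</code></pre>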
<p>I&#8217;ve also been asked to design and implement a queueing system for the main internal server. I&#8217;ve run up a quick high level overview but the detail still needs to be worked on. I&#8217;m pushing it back to June so that I can clear the decks of the older projects that are still on the board.</p>
<p>I had a chat with <a title="Jonathan Gray's blog" href="http://jonathangray.org/" target="_blank">Jonathan Gray</a>, a sound guy who does far too much, about digital humanities ideas. We&#8217;ve agreed to keep closer contact with each other about the area and to encourage each other into actually doing stuff (I have half a Moleskine of ideas &#8211; time for more code, less talk then). He proposed the <a title="Jonathan Gray on Bibliographica" href="http://austgate.co.uk/2010/01/bibliographica-open-bibliographic-sourcing-and-maintenance/" target="_blank">Bibliographica idea</a> in January and the team wrote <a title="Bibliographica entry on the blog" href="http://blog.okfn.org/2010/05/20/bibliographica-an-introduction/" target="_blank">a blog entry</a> for the Open Knowledge Foundation blog. It is an idea that I&#8217;m looking forward to playing with and trying to embed data from. (<a href="http://bibliographica.org/">http://bibliographica.org/</a>)</p>
<p>One of the things that I&#8217;ve been thinking about though is that, increasingly, when we do research we store web pages, blog entries and so on. Whilst there is a way of recording these in a footnote (http://example.org accessed on &lt;insert date&gt; type thing), there does not appear to be a way of building a local archive of these with the relevant metadata for later retrieval. I don&#8217;t know about anybody else but I&#8217;ve got a fair few pages dotted around my hard drive for projects and I&#8217;d like a way of storing these properly and to be able to integrate them into bibliographies or research notes. I know that there is the WARC format (<a title="Library of Congress on WARC" href="http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml" target="_blank">Library of Congress</a> link and the <a title="WARC tools on Google code" href="http://code.google.com/p/warc-tools/" target="_blank">WARC tools</a> Google code project) to play with so I need to make time to do that.</p>
<p>I had a mini-hack on the Open Correspondence project last Sunday intending to update a couple of pages and got a little more done than that. The database needs rebuilding but the PURL reference (<a title="Letter schema PURL" href="http://purl.org/letter" target="_blank">http://purl.org/letter</a>) now points to the schema. It is so close that I can&#8217;t wait to actually start hacking the data. Time to do the last little bits like tidy up the parser, use the weaving history API to embed a timeline and start using <a title="jena sourceforge archive" href="http://jena.sourceforge.net/" target="_blank">JENA</a>, <a title="ARC website" href="http://arc.semsol.org" target="_blank">ARC</a> and Chris Gutteridge&#8217;s <a title="Graphite rdf library" href="http://graphite.ecs.soton.ac.uk/" target="_blank">Graphite</a> library which worked out of the box (but as yet I haven&#8217;t used it for much).</p>
<p>Goals for this week are to finish the Open Correspondence bits, update the trac instance with the various &#8216;todo&#8217;s, write a blog post for the Open Knowledge Foundation for Open Correspondence, and do some major testing at work on various XML exports and imports. I should be just about caught up then. With any luck&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/05/weeknotes-data-mining-xml-and-bibliographies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Redis, RDF, rdflib and openletters</title>
		<link>http://austgate.co.uk/2010/05/weeknotes-redis-rdf-rdflib-and-openletters/</link>
		<comments>http://austgate.co.uk/2010/05/weeknotes-redis-rdf-rdflib-and-openletters/#comments</comments>
		<pubDate>Sat, 15 May 2010 14:57:14 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[redis]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=152</guid>
		<description><![CDATA[I&#8217;ve been trying to play catch up this week at work. One of the projects that I&#8217;ve been working on is the temporary storage of information. For one reason or another, one of the workers has decided to occasionally throw a fit and not do its job properly (on top of a connection that appears [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been trying to play catch up this week at work.</p>
<p>One of the projects that I&#8217;ve been working on is the temporary storage of information. For one reason or another, one of the workers has decided to occasionally throw a fit and not do its job properly (on top of a connection that appears to fail at odd times). What I really needed was a temporary store to save the parsed information so that if something failed, we didn&#8217;t lose everything. To that end, I&#8217;ve started looking at <a title="Redis code base" href="http://code.google.com/p/redis/" target="_blank">Redis</a> in more detail and started using the Windows build of version 1.2.1 (available on <a title="Aspninja and redis" href="http://www.aspninja.com/2010/01/23/using-redis-on-asp-net-example-twitter-clone-retwis-c/" target="_blank">aspninja.com</a>) with the <a title="Rediska library" href="http://rediska.geometria-lab.net/" target="_blank">Rediska</a> library. At some point I&#8217;ll sit down and compile it on my laptop under Cygwin to get the latest version.</p>
<p>I ended up using the PEAR version of Rediska and managed to get it up and running fairly quickly. One of the things that I needed to do was to create a new instance of the list in each method, having split the set and get methods into two workers. The speed of Redis is fantastic and the server happily runs on the test server, caching the data and allowing another worker to load it into a copy of the MySQL tables that it will eventually update. I found the Rediska library really easy to use and I&#8217;ll be using it for various projects at home to do some processing rather than using MySQL all the time. <a title="Simon Willison on redis" href="http://simonwillison.net/2010/Apr/25/redis/" target="_blank">Simon Willison</a> has a post which links to <a title="Simon Willison on redis" href="http://simonwillison.net/static/2010/redis-tutorial/" target="_blank">a tutorial on Redis</a> that I found extremely useful and that encouraged me to find out more about the server in future.</p>
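<p>As a rough illustration of the set and get workers described above, here is a minimal Python sketch of the pattern. It is not the Rediska (PHP) code from work: the <code>ListStore</code> class is a hypothetical in-memory stand-in that only mimics the append and pop behaviour of a Redis list (the RPUSH and LPOP commands), and the key name <code>parsed:items</code> and both worker functions are made up for the example.</p>

```python
import json
from collections import deque


class ListStore:
    """Hypothetical in-memory stand-in for a Redis list store."""

    def __init__(self):
        self._lists = {}

    def rpush(self, key, value):
        # Append to the tail of the list, like the Redis RPUSH command.
        self._lists.setdefault(key, deque()).append(value)
        return len(self._lists[key])

    def lpop(self, key):
        # Take from the head of the list, like LPOP; None when empty.
        items = self._lists.get(key)
        return items.popleft() if items else None


def parse_worker(store, key, records):
    # Save each parsed record as soon as it is ready, so a later
    # failure does not throw away the work already done.
    for record in records:
        store.rpush(key, json.dumps(record))


def load_worker(store, key):
    # Drain the list in arrival order. A real loader would write each
    # record to the MySQL tables instead of collecting them in a list.
    loaded = []
    while True:
        raw = store.lpop(key)
        if raw is None:
            break
        loaded.append(json.loads(raw))
    return loaded


store = ListStore()
parse_worker(store, "parsed:items", [{"id": 1}, {"id": 2}])
rows = load_worker(store, "parsed:items")
```

<p>With a real Redis server the stand-in would be replaced by a client connection; the point of the sketch is only that the two workers share nothing except the list key.</p>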
<p>I&#8217;ve been working on the RDF exports for the <a title="Open Correspondence website" href="http://opencorrespondence.org" target="_blank">open letters</a> project, which are yet to go live. The main job has been making sure that the exports validate using the RDF validator and pulling in the data. A future task is to finish tidying up the data but I&#8217;m trying to get the letter html template figured out. Since Python isn&#8217;t the main language that I now use (work is entirely based on PHP), I&#8217;ve been taking a look at the <a title="open shakespeare website" href="http://openshakespeare.org" target="_blank">Open Shakespeare</a> code and found the RDFa work that I did a year ago and had completely forgotten about. It would be good to get RDFa into open correspondence but I think that is a later task. The main thing is to complete the initial port. I managed to get www.purl.org/letter forwarding to the site but need to get a schema page up and the PURL pointing to the right page.</p>
<p>One of the things that I&#8217;ve been trying to play with is <a title="rdflib python library" href="http://code.google.com/p/rdflib/" target="_blank">RDFlib</a> on Windows. I built it successfully on my last laptop (Windows XP, Cygwin) but for some reason version 2.4.2 would not build on Vista, even under easy_install. I&#8217;ve been trying with version 3 (which was released on May 13th according to the news group) and apparently the <a title="rdfextras project" href="http://code.google.com/p/rdfextras/" target="_blank">rdfextras</a> project has a pure Python version of the SPARQL parser, the part which was failing to build. I&#8217;ll be trying that once the current work on open correspondence has been completed, to explore what we can do with the data.</p>
<p>Ben O&#8217;Steen talked at the Open Knowledge conference after me and one of the things he talked about was the psutils package. I&#8217;ve found it on <a title="Cygwin site" href="http://www.cygwin.com" target="_blank">Cygwin</a> and downloaded it, so it would be good to have fun with that one or to find accessible <a title="PSUtils Windows port" href="http://gnuwin32.sourceforge.net/packages/psutils.htm" target="_blank">Windows ports</a> for people who don&#8217;t necessarily want to download Cygwin.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/05/weeknotes-redis-rdf-rdflib-and-openletters/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

