<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Aust Gate &#187; weeknotes</title>
	<atom:link href="http://austgate.co.uk/category/weeknotes/feed/" rel="self" type="application/rss+xml" />
	<link>http://austgate.co.uk</link>
	<description>Open Knowledge and Literature</description>
	<lastBuildDate>Mon, 23 Jan 2012 18:10:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Weeknotes: Documents and data</title>
		<link>http://austgate.co.uk/2011/07/weeknotes-documents-and-data/</link>
		<comments>http://austgate.co.uk/2011/07/weeknotes-documents-and-data/#comments</comments>
		<pubDate>Sun, 03 Jul 2011 14:39:22 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[documents]]></category>
		<category><![CDATA[drupal]]></category>
		<category><![CDATA[linked_data]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=364</guid>
		<description><![CDATA[The main project this week (apart from the ongoing one of moving and virtualising servers) is to begin work on our technical documents. I&#8217;m trying to move them onto the web and make them useful, not only in terms of reading about them but also to make them linkable. I&#8217;m trying to get them out [...]]]></description>
			<content:encoded><![CDATA[<p>The main project this week (apart from the ongoing one of moving and virtualising servers) is to begin work on our technical documents.</p>
<p>I&#8217;m trying to move them onto the web and make them useful, not only in terms of reading about them but also to make them linkable. I&#8217;m trying to get them out of being placed on a web site as Word or PDF downloads and move them into being web pages with comments. Drupal 7&#8217;s inbuilt book module is probably the way to go and is producing some really nice results in the hacking I managed on Friday. There is a certain pleasure now in that I began the hack at 8:30 and within an hour, I had a working document (albeit I wanted to mess around with the URLs to make them nicer and far more meaningful). It had comments and was generally felt to be good.</p>
<p>The next task was to work on a way of doing Frequently Asked Questions (FAQs). Having begun some of the work using the <a title="Frequently Asked Questions Drupal module" href="http://drupal.org/project/faq" target="_blank">Frequently Asked Questions module</a>, I decided it had too many issues for us (including not being able to control where the page was, and it did not appear to play nicely with the <a title="Pathauto Drupal module" href="http://drupal.org/project/pathauto" target="_blank">Pathauto rewriting module</a>), so I wrote my own content type which we can manipulate via the Views module to create sets of FAQs. When I&#8217;ve got more time, I may come back to the module and try to help fix some bugs.</p>
<p>Whilst neither of these is a finished item, it was a pleasant day hacking and creating, getting prototypes ready in a day. I&#8217;m taking this as a sign of increasing familiarity with Drupal. I do, however, need to find a morning to finish the Sugar SOAP integration module and tidy that up. Ideally I&#8217;d like to find a way of integrating it with the current module to offer swappable backends.</p>
<p>I&#8217;ve also started looking at using <a title="Redis website" href="http://redis.io" target="_blank">Redis</a> for caching again in a major way to ensure that various static fields of data, such as UK counties, can have a common reference to reduce data cleaning issues such as county being written as co., co and county. I&#8217;m also looking at the issue of Linked Data and how to integrate the ideas into our current projects. For now I&#8217;m rereading <a title="Tim Berner's-Lee on Linked Data" href="http://www.w3.org/DesignIssues/LinkedData.html" target="_blank">Tim Berners-Lee&#8217;s guide</a>, linked from the <a title="Linked Data website" href="http://linkeddata.org/" target="_blank">linkeddata.org </a>website and formulating ideas and refining the ones I currently have.</p>
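<p>A minimal sketch of the common-reference idea, assuming a plain Python dictionary as a stand-in for whatever Redis structure ends up holding the list. The function name <code>normalise_county</code> and the sample entries are hypothetical and only illustrate folding the co., co and county variants onto one canonical value.</p>

```python
import re

# Hypothetical canonical list; in practice this could be loaded from Redis.
# Keys are the county name with any "county"/"co" word removed, lowercased.
CANONICAL_COUNTIES = {
    "durham": "County Durham",
    "kent": "Kent",
    "oxfordshire": "Oxfordshire",
}

# Matches the standalone words "county" or "co", with an optional full stop.
_COUNTY_WORD = re.compile(r"\b(?:county|co)\b\.?", re.IGNORECASE)


def normalise_county(raw):
    """Return the canonical county name, or None if it is not in the list."""
    without_county_word = _COUNTY_WORD.sub(" ", raw)
    # Collapse whitespace and lowercase so variants share one lookup key.
    key = " ".join(without_county_word.lower().split())
    return CANONICAL_COUNTIES.get(key)
```

<p>Returning <code>None</code> for an unknown value, rather than guessing, leaves the decision about new entries to whoever maintains the list.</p>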
<p>Ambition might get the better of me but at least I feel like I want to take all of this on and to try to improve skills and learn more. In the meantime, I have some serious hills to climb.</p>
<p>Update: This post has got me rethinking the Open Correspondence RDF and Linked Data. The more I delve, the greater my sense of needing to rethink that part of the project and to complete the correspondence links. Most of them are there but need complete linking. I also need to look at Python&#8217;s <a title="Python's RDFLib code" href="http://www.rdflib.net/" target="_blank">RDFlib</a> and perhaps make better use of the SPARQL queries and stores. I sense an evening or several of experimentation before a hacking weekend to resolve these issues.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/07/weeknotes-documents-and-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Drupal, NoSQL and data</title>
		<link>http://austgate.co.uk/2011/06/weeknotes-drupal-nosql-and-data/</link>
		<comments>http://austgate.co.uk/2011/06/weeknotes-drupal-nosql-and-data/#comments</comments>
		<pubDate>Sun, 26 Jun 2011 11:02:47 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[weeknotes]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=362</guid>
		<description><![CDATA[It has been an interesting week which I would rather forget. However, I will not, and it made me rethink quite a few assumptions. On the plus side, I&#8217;ve managed to write some of the documentation for the portal and map the processes which need to be coded next week. The major thing that I [...]]]></description>
			<content:encoded><![CDATA[<p>It has been an interesting week which I would rather forget. However, I will not, and it made me rethink quite a few assumptions. On the plus side, I&#8217;ve managed to write some of the documentation for the portal and map the processes which need to be coded next week.</p>
<p>The major thing that I have completed is the basic integration of Drupal 7 with SugarCRM Community edition. It definitely works with version 6.12, as this is what I have at the moment, but I&#8217;m going to upgrade to 6.2. I do not see any issues with this apart from having to remap one or two fields. I&#8217;m hoping, next week now, to split off some of the changes and to offer them to the original project as patches so that the main <a title="Drupal Webform2Sugar project" href="http://drupal.org/project/webform2sugar" target="_blank">webform2sugar</a> project can bring them on board or not as they will.</p>
<p>In tandem with the data cleaning project mentioned last week, I am looking at caching data using <a title="Redis website" href="http://redis.io" target="_blank">Redis</a> behind forms to offer fixed lists of data. Although we commonly use PHP, I am strongly thinking of writing the readers and document parsers in either Perl or Python. What I might do is write some test scripts in both and benchmark them, but I also have to weigh up how well they handle MS Word (probably largely the 2003 version rather than the 2007 one) and PDF documents, as I will be moving them across platforms.</p>
<p>I&#8217;ve also been thinking about NoSQL stores for other pieces of data and projects which are being worked on. The <a title="HighScalability on NoSQL use cases" href="http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html" target="_blank">HighScalability blog</a> has a great piece on what to look for under which circumstances for SQL and NoSQL databases.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/06/weeknotes-drupal-nosql-and-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>JISCMail to migrate to new platform</title>
		<link>http://austgate.co.uk/2011/06/jiscmail-to-migrate-to-new-platform/</link>
		<comments>http://austgate.co.uk/2011/06/jiscmail-to-migrate-to-new-platform/#comments</comments>
		<pubDate>Sun, 19 Jun 2011 13:25:52 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[jiscmail]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=352</guid>
		<description><![CDATA[I see from Twitter that JISCMail announced that they have funding for another year which is a good thing (in the Sellar and Yeatman sense). It does mention that they are migrating onto a new platform (though, it is implied, keeping the current mail system) but does not mention what this might be. The statement [...]]]></description>
			<content:encoded><![CDATA[<p>I see from Twitter that <a title="JISCMail funding news" href="http://www.jiscmail.ac.uk/news/2011/june2011.html" target="_blank">JISCMail announced that they have funding</a> for another year which is a good thing (in the Sellar and Yeatman sense). It does mention that they are migrating onto a new platform (though, it is implied, keeping the current mail system) but does not mention what this might be.</p>
<p>The statement linked to is more than terse but I wait with interest to see what this new platform is and what it will do to &#8220;provide new benefits beyond JISCMail’s traditional boundaries&#8221;. I did hear mention of social networking in passing and integration with social networks was being discussed whilst I was there.</p>
<p>Further announcements to come&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/06/jiscmail-to-migrate-to-new-platform/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Storing and cleaning data</title>
		<link>http://austgate.co.uk/2011/06/weeknotes-storing-and-cleaning-data/</link>
		<comments>http://austgate.co.uk/2011/06/weeknotes-storing-and-cleaning-data/#comments</comments>
		<pubDate>Sun, 19 Jun 2011 13:15:13 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[data sets]]></category>
		<category><![CDATA[node]]></category>
		<category><![CDATA[redis]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=350</guid>
		<description><![CDATA[This week has been spent soft-launching a CRM system for the Janet project. Hopefully the problems found are just user bugs, but it has highlighted some interesting data cleaning issues. These are going to be inherent in the exchange of data between two or more systems, especially when one is a long-term pre-existing one. This has [...]]]></description>
			<content:encoded><![CDATA[<p>This week has been spent soft-launching a CRM system for the Janet project. Hopefully the problems found are just user bugs, but it has highlighted some interesting data cleaning issues. These are going to be inherent in the exchange of data between two or more systems, especially when one is a long-term pre-existing one.</p>
<p>This has long-term implications in terms of continuing to ensure that the data is clean and standardised, given that one of the forthcoming projects is based on our technical documents and converting them from existing formats (when these are fully confirmed) into the as yet unbuilt and undesigned system. As part of this I&#8217;ve been looking at Chris Gutteridge&#8217;s <a title="Chris Gutteridge's Grinder" href="https://github.com/cgutteridge/Grinder" target="_blank">Grinder,</a> a parser for getting RDF data out of Excel and CSV files. I was reminded of Grinder whilst reading his article about Linked Data at the University of Southampton in the final ever Nodalities. Whilst Grinder itself may not be of initial use, it does give me some clues about the possibilities of transforming the data.</p>
<p>The project also forces me to think about how the programme would run, and I suspect it would be from the command line. If this is a safe assumption, then it means that I need to get back to Perl or use Python. Much as I like PHP, I&#8217;m not sure it is a command line language. I know it can be run as one but it always makes me nervous as I don&#8217;t really consider it a system administration or data munging language. In either case, Perl and Python mean another re-learning curve, especially Perl which I last used at JISCMail a couple of years ago.</p>
<p>A side project that I&#8217;ve been  looking at is the real-time data storage of feeds for later mining and use. I&#8217;ve been thinking of using Node.js (and actually starting something!) and Redis to run in the background. A little side something, methinks. It does mean me learning more about Node though and gives me something tangible to build. I&#8217;ve been having a little search around the Net and came across an older post by <a title="Marshall Kirkpatrick on Realtime web" href="http://www.nten.org/blog/2009/10/28/ten-useful-examples-realtime-web-action" target="_blank">Marshall Kirkpatrick on the NTEN blog about realtime data</a> whilst reading about <a title="Elegant Code blogon node event loops" href="http://elegantcode.com/2010/11/19/taking-baby-steps-with-node-js-threads-vs-events/" target="_blank">event loops in Node on the Elegant Code</a> blog. Of course, once it is stored, it must be processed to be useful but that is the next step.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/06/weeknotes-storing-and-cleaning-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Open Correspondence toolkit and converting XML into JSON</title>
		<link>http://austgate.co.uk/2011/05/weeknotes-open-correspondence-toolkit-and-converting-xml-into-json/</link>
		<comments>http://austgate.co.uk/2011/05/weeknotes-open-correspondence-toolkit-and-converting-xml-into-json/#comments</comments>
		<pubDate>Thu, 26 May 2011 19:25:47 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[weeknotes]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=342</guid>
		<description><![CDATA[I&#8217;ve been quiet for a bit though generally because I&#8217;ve been quite busy on projects and exploring ideas. After Book Hackday, I&#8217;ve written a post about beginning to develop the Open Correspondence toolkit for the Open Knowledge Foundation&#8217;s Notebook blog. I was also contacted regarding converting the TEI XML pages into JSON, which I am [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been quiet for a bit though generally because I&#8217;ve been quite busy on projects and exploring ideas.</p>
<p>After Book Hackday, I&#8217;ve written a post about <a title="Open Correspondence toolkit" href="http://notebook.okfn.org/2011/05/25/mining-the-personal-using-open-correspondence-to-explore-correspondents/" target="_blank">beginning to develop the Open Correspondence toolkit</a> for the Open Knowledge Foundation&#8217;s Notebook blog. I was also contacted regarding converting the TEI XML pages into JSON, which I am currently working on.</p>
<p>Once I&#8217;ve done some more work on it, I&#8217;ll post the code and more about it.</p>
<p>I&#8217;ve been working on another project which may or may not be open. It is certainly interesting but I am not sure I can say much more than that. I hope to have a blog post up soon about it but I am rather excited by it and its possibilities.</p>
<p>Meanwhile, the work project continues apace with some surprising outcomes for me. Following watching a video on Facebook&#8217;s architecture, I&#8217;m beginning to see certain parts very differently. I really do hope to write more on this but I&#8217;ve got some building to do and a bit more delving and reading that needs completion.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/05/weeknotes-open-correspondence-toolkit-and-converting-xml-into-json/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Open Correspondence updates</title>
		<link>http://austgate.co.uk/2011/03/weeknotes-open-correspondence-updates/</link>
		<comments>http://austgate.co.uk/2011/03/weeknotes-open-correspondence-updates/#comments</comments>
		<pubDate>Tue, 08 Mar 2011 10:01:37 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[timelines]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=298</guid>
		<description><![CDATA[I&#8217;ve bitten the bullet and done it. I&#8217;ve uploaded the current changes to the Open Correspondence site. The current changes are: additional fields in the RDF endpoint.  I still need to do some major work to JSON and XML which I hope to do for the next update. a basic text search a basic set [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve bitten the bullet and done it. I&#8217;ve uploaded the current changes to the Open Correspondence site.</p>
<p>The current changes are:</p>
<ul>
<li>additional fields in the RDF endpoint. I still need to do some major work to JSON and XML which I hope to do for the next update.</li>
<li>a basic text search</li>
<li>a basic set of geographic data in the collection</li>
<li>better linking from the letters to the correspondent and geographical data (NB it is still incomplete)</li>
<li>some mapping with <a title="Open Layers Javascript mapping website" href="http://openlayers.org/" target="_blank">Open Layers</a> JavaScript.</li>
<li>a <a title="Simile timeline " href="http://www.simile-widgets.org/timeline/" target="_blank">Simile</a> timeline (which is a bit slow at the moment).</li>
</ul>
<p>Admittedly some of this is exposing work already there but hidden. However I&#8217;ve also been working on some unicode fixes to the underlying XML which is used by the project which has meant rebuilding the tables and the Xapian indexes.</p>
<p>Following a request on the Open Literature mailing list, I&#8217;m looking at the idea of using Python&#8217;s <a title="Python Natural Language Toolkit" href="http://www.nltk.org/" target="_blank">NLTK</a> to create some linguistic API wrappers around the Xapian search. It strikes me that these letters can be used to create a corpus of Dickens&#8217;s language where you can explore the language used in family correspondence (to his daughters and wife), to other authors (Wilkie Collins) and to readers. That is a longer project though in terms of building the relevant indexes.</p>
<p>I&#8217;m also looking at the idea of clustering a collection of letters to a correspondent and seeing what happens (for some reason, the current script is looking at Wilkie Collins). There is also a set of queries that one might run against letters discussing books and the publication dates to view the distribution. I&#8217;m working on these latter questions at the moment for intended release later this week but I do foresee it being delayed a while.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/03/weeknotes-open-correspondence-updates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Conferences and Open Correspondence</title>
		<link>http://austgate.co.uk/2011/02/weeknotes-conferences-and-open-correspondence/</link>
		<comments>http://austgate.co.uk/2011/02/weeknotes-conferences-and-open-correspondence/#comments</comments>
		<pubDate>Sun, 20 Feb 2011 15:55:37 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[dev8d]]></category>
		<category><![CDATA[linked_data]]></category>
		<category><![CDATA[mobile_web]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=291</guid>
		<description><![CDATA[On Wednesday I went to the JISC dev8d conference. I wish I could have gone for both days but time doesn&#8217;t permit at the moment. In all, I had a thought-provoking day and managed to catch talks on the Mobile Web (which I wasn&#8217;t expecting) and Linked Data. Whilst I didn&#8217;t attend the programming [...]]]></description>
			<content:encoded><![CDATA[<p>On Wednesday I went to the JISC <a title="JISC funded dev8d conference" href="http://www.dev8d.org/" target="_blank">dev8d</a> conference. I wish I could have gone for both days but time doesn&#8217;t permit at the moment. In all, I had a thought-provoking day and managed to catch talks on the Mobile Web (which I wasn&#8217;t expecting) and Linked Data. Whilst I didn&#8217;t attend the programming workshops on languages such as <a title="Clojure website" href="http://clojure.org/" target="_blank">Clojure</a> or <a title="Erlang site" href="http://www.erlang.org/" target="_blank">Erlang</a> (at the moment I don&#8217;t have a need to use either), I was looking for matters that might be useful for my impending move to <a title="Janet website" href="http://ja.net" target="_blank">Janet</a>. (This is one of the reasons why I haven&#8217;t posted recently &#8211; I was either preparing or convinced I hadn&#8217;t got the job.)</p>
<p>I bumped into <a title="Eamonn Neylon on Twitter" href="http://twitter.com/eneylon" target="_blank">Eamonn Neylon</a> and we went along to the Mobile Web session with Mike Jones from Bristol and the <a title="Molly project: open source mobile portal" href="http://mollyproject.org/" target="_blank">Molly project</a>. They outlined the two main approaches (either via the various app markets or having a front end which caters for the different phones) and issues such as being sandboxed from the hardware layer at the moment. It would seem from them that you need to do both ideally, though development time doesn&#8217;t always allow. The session was slightly hijacked by the Python 2 versus 3 question and if Molly would ever use Python 3 but we gradually got back on track. The main barrier to entry would be the lack of standardisation so you need to target the platforms as well as the hardware issues.</p>
<p>I stayed for the Linked Data session, in which <a title="Chris Gutteridge's page at ECS" href="http://www.ecs.soton.ac.uk/people/cjg" target="_blank">Chris Gutteridge</a> sort of took control from the array of speakers. The main focus became notions of openness (as defined in the Open Knowledge Definition) and how it is perceived by the academic community, bringing back up the ideas of attribution on the web (an issue which is partly cultural). The issue of clear licensing came up again as well, but there does seem to be some clarification needed on the different models.</p>
<p>I did go to some of the lightning talks before lunch but they didn&#8217;t leave much of an impression this time (though Chris Gutteridge did plug his <a title="Q&amp;D RDF browser" href="http://graphite.ecs.soton.ac.uk/browser/" target="_blank">Q&amp;D RDF Browser</a> which I&#8217;m thinking of using). After an excellent lunch, I wandered into basecamp where I plugged in my laptop and worked on <a title="Open Correspondence" href="http://www.opencorrespondence.org" target="_blank">Open Correspondence</a> whilst waiting for the session on the Linked Data API. I spent a couple of hours working on it to fix some bugs and little things for the next version to go live (though I discovered another one in places with some missing; it is not huge and just needs a couple of hours). <a title="Rufus Pollock's site" href="http://rufuspollock.org" target="_blank">Rufus Pollock</a> and <a title="Jo Walsh's site" href="http://frot.org/" target="_blank">Jo Walsh</a> popped by so we managed to catch up and do some hacking. Rufus suggested using <a title="Python's flask " href="http://flask.pocoo.org/" target="_blank">Flask</a> which I think I&#8217;ll use for some smaller projects in the future (and for some reason <a title="Backbone.js Github" href="http://documentcloud.github.com/backbone/" target="_blank">Backbone</a> was mentioned but I am not sure how).</p>
<p>I went along to <a title="epimorphics site" href="http://www.epimorphics.com/web/" target="_blank">Chris Dollin</a>&#8217;s talk on his <a title="ELDA implementation of Linked Data" href="http://elda.googlecode.com/hg/deliver-elda/src/main/docs/index.html" target="_blank">eLDA</a> library and how the Linked Data API works. It seems like an eminently sensible solution to removing the complexity of semantic technologies from the user and to make it easier to use. It is certainly something which will be useful to get my head around completely.</p>
<p>The day, as suggested by <a title="Devcsi page" href="http://devcsi.ukoln.ac.uk/" target="_blank">Mahendra Mahey</a>, was definitely more useful when doing something and just cracking on with it. We need more days like this as it provides a collegiate atmosphere to try new things and take a look at different technologies which might not appear on the normal radar. The friendly atmosphere was great as well. I&#8217;ll book both days off if it comes around next year.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/02/weeknotes-conferences-and-open-correspondence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Places in Open Correspondence</title>
		<link>http://austgate.co.uk/2011/02/weeknotes-places-in-open-correspondence/</link>
		<comments>http://austgate.co.uk/2011/02/weeknotes-places-in-open-correspondence/#comments</comments>
		<pubDate>Sun, 06 Feb 2011 13:25:55 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[place_names]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=288</guid>
		<description><![CDATA[I&#8217;ve been doing some work to Open Correspondence over the last couple of weeks. I started re-parsing the letters to expose some more metadata, mainly placenames and to normalise them. I&#8217;ve finally done the first pass of this update which I&#8217;m hoping to make live soon once I&#8217;ve updated the controllers and re-checked the other [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been doing some work to Open Correspondence over the last couple of weeks. I started re-parsing the letters to expose some more metadata, mainly placenames and to normalise them.</p>
<p>I&#8217;ve finally done the first pass of this update which I&#8217;m hoping to make live soon once I&#8217;ve updated the controllers and re-checked the other data improvements. Whilst it is not perfect, it is a lot better than it was. I think that the next week will be spent going over the endpoints and the Pylons controllers so that the data is cleaner than at present and correctly linked.</p>
<p>It has been a useful exercise in that I&#8217;ve started rewriting the parser for the letters (an ongoing large job I was thinking of doing when I come to the next set of letters) and putting some of the earlier thoughts into place.</p>
<p>Once I&#8217;m happy with these updates, I&#8217;ll update the site which does mean rebuilding the databases and endpoints. However once it is done, it should be a lot cleaner  and I can then start looking at the correspondents and linking into other data sources like dbpedia.org. I think that the first task though might be to restart work on the clients that I had been putting together  as a basic development kit.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/02/weeknotes-places-in-open-correspondence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Arts funding, Open Correspondence</title>
		<link>http://austgate.co.uk/2011/01/weeknotes-arts-funding-open-correspondence/</link>
		<comments>http://austgate.co.uk/2011/01/weeknotes-arts-funding-open-correspondence/#comments</comments>
		<pubDate>Sun, 16 Jan 2011 20:44:33 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[arts_funding]]></category>
		<category><![CDATA[linked_data]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=276</guid>
		<description><![CDATA[I&#8217;ve been doing some updating this week rather than anything new. I was going to spend time trying to complete the places section of the Open Correspondence website. It needs some tidying up as the endpoint has had some changes made to it. I did come across an issue which has implications in exposing other [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been doing some updating this week rather than anything new. I was going to spend time trying to complete the <a title="Open Correspondence places index" href="http://www.opencorrespondence.org/place/" target="_blank">places section of the Open Correspondence</a> website. It needs some tidying up as the endpoint has had some changes made to it. I did come across an issue which has implications in exposing other pieces of metadata, such as people who are being referred to.</p>
<p>Firstly, I need to work out a more exact way of mapping the data in the database or flat file. I think what I really need is to use something like:</p>
<ul>
<li>place</li>
<li>address</li>
<li>city</li>
<li>latitude</li>
<li>longitude</li>
<li>description</li>
<li>url</li>
</ul>
<p>The data that I have is not quite as granular as this. Yet. When I&#8217;ve done this, I need to build the mapping so that if a place is entered, say <a title="Wikipedia page on Hotel Meurice Paris" href="http://en.wikipedia.org/wiki/H%C3%B4tel_Meurice" target="_blank">Hotel Meurice, Paris</a>, then I can return the details and latitude / longitude to render an Open Layers map. That&#8217;s almost the easiest bit really.</p>
<p>The second issue is the difference in names. Over time and in the heat of writing, names can change subtly. Take, for instance, <a title="Wikipedia page on Gads Hill Place" href="http://en.wikipedia.org/wiki/Gads_Hill_Place" target="_blank">Gads Hill Place</a>, one of Dickens&#8217;s homes, which is now a school. In the letters it is referred to as</p>
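<p>A minimal sketch of such a mapping, assuming the seven fields listed above and an in-memory dictionary in place of the database or flat file. The names <code>Place</code>, <code>GAZETTEER</code>, <code>find_place</code> and <code>map_centre</code> are hypothetical, and the coordinates are approximate values included only for illustration.</p>

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Place:
    place: str
    address: str
    city: str
    latitude: float
    longitude: float
    description: str
    url: str


# Hypothetical store keyed on a lowercased, whitespace-normalised name.
GAZETTEER = {
    "hotel meurice, paris": Place(
        place="Hotel Meurice",
        address="228 rue de Rivoli",
        city="Paris",
        latitude=48.8652,   # approximate, for illustration only
        longitude=2.3281,   # approximate, for illustration only
        description="Hotel in Paris",
        url="http://en.wikipedia.org/wiki/H%C3%B4tel_Meurice",
    ),
}


def find_place(name: str) -> Optional[Place]:
    """Look a place up by name, ignoring case and extra whitespace."""
    return GAZETTEER.get(" ".join(name.lower().split()))


def map_centre(name: str) -> Optional[Tuple[float, float]]:
    """Return (longitude, latitude) for the map, or None if unknown."""
    found = find_place(name)
    if found is None:
        return None
    return (found.longitude, found.latitude)
```

<p>The longitude-first ordering is an assumption about what the mapping code would expect and is easy to swap.</p>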
<ol>
<li>Gad&#8217;s Hill Place,</li>
<li>Gad&#8217;s Hill Place, Higham</li>
<li>Gad&#8217;s Hill</li>
</ol>
<p>It can also be known as Gadshill Place or Gads Hill Place. I need to find a way of reconciling the terms. Firstly I need to develop a way of checking inside a term and then returning it if it is a new term or returning the mapped version if it matches a known term. Secondly I need to fuzzy match the strings so that any near differences (using the <a title="Levenshtein edit distance code" href="http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Levenshtein_distance#Python" target="_blank">Levenshtein edit distance</a>) can be checked and the term either ignored or excluded.</p>
<p>These issues will also affect the correspondent code which is being created. I suspect that anything with names will have the same issues. For instance, Wilkie Collins is known in the letters as <a href="http://opencorrespondence.org/correspondent/view/Mr%20W%20Wilkie%20Collins">Mr W Wilkie Collins</a> and <a href="http://opencorrespondence.org/correspondent/view/Mr%20Wilkie%20Collins">Mr Wilkie Collins</a>. In the current implementation of the site, these are two different entities which is clearly wrong. They are the same entity but there is a subtle difference which is not accounted for.</p>
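<p>The two steps could be sketched roughly as follows. This is an illustration rather than the parser&#8217;s actual code: <code>canonical_place</code>, its thresholds and the key-building rule are assumptions, and the edit distance is a standard dynamic-programming version rather than the one on the linked page.</p>

```python
import re


def levenshtein(a, b):
    """Number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def _key(term):
    """Lowercase and drop spaces, apostrophes and other punctuation."""
    return re.sub(r"[^a-z0-9]", "", term.lower())


def canonical_place(term, known_places, max_distance=2, min_overlap=6):
    """Return the known place that term refers to, or term itself if new."""
    key = _key(term)
    for canonical in known_places:
        target = _key(canonical)
        # Step 1: check inside the term (one key contained in the other).
        shorter, longer = sorted((key, target), key=len)
        if len(shorter) >= min_overlap and shorter in longer:
            return canonical
        # Step 2: fuzzy match on near differences.
        if levenshtein(key, target) <= max_distance:
            return canonical
    return term
```

<p>With <code>["Gads Hill Place"]</code> as the known list, all three variants from the letters map to that one entry, while an unrelated name is returned unchanged as a new term. The thresholds would need tuning against the real letters.</p>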
<p>So to deal with this, I am going back to the parsing library and building these in instead. Whilst it is a slower way of dealing with these issues, it provides a chance of doing any necessary information and site re-thinking.</p>
<p>As part of this, I downloaded some <a title="TEI website" href="http://www.tei-c.org/index.xml" target="_blank">TEI </a>guidelines from the <a title="TEI Guidelines on California Digital Library" href="http://www.cdlib.org/groups/stwg/index.html" target="_blank">California Digital Library</a> to use to build the base metadata export. Ideally what I&#8217;m hoping to do is to create the data as a Python dictionary and then reformat into HTML, HTML &amp; RDFa, RDF, JSON or XML. It should allow me to export the same data for each type.</p>
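<p>As a rough sketch of that idea, the same dictionary can be handed to one small function per format. Only JSON and XML are shown, using the standard library; the field names, the <code>letter</code> root element and the <code>export</code> helper are made up for illustration and are not the TEI structure.</p>

```python
import json
import xml.etree.ElementTree as ET


def to_json(record: dict) -> str:
    """Serialise the record as a JSON object."""
    return json.dumps(record, sort_keys=True)


def to_xml(record: dict, root_tag: str = "letter") -> str:
    """Serialise the record as flat XML, one child element per key.

    The keys must be valid XML element names.
    """
    root = ET.Element(root_tag)
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# One entry per output type; HTML, RDFa and RDF would be added here.
EXPORTERS = {"json": to_json, "xml": to_xml}


def export(record: dict, fmt: str) -> str:
    """Export the same record in the requested format."""
    return EXPORTERS[fmt](record)
```

<p>Calling <code>export(record, "json")</code> and <code>export(record, "xml")</code> on the same dictionary keeps the data in one place, so a new format only needs a new function in <code>EXPORTERS</code>.</p>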
<p>I&#8217;m sure at times I&#8217;ll wonder what I started but it needs doing if the site is to accept more authors. After that, back to search.</p>
<p>On a separate note, I have also done some work on the <a title="Arts funding search" href="http://austgate.co.uk/development/search_arts.php" target="_blank">Arts Funding search</a>. I&#8217;ve given it a re-skin and used the <a title="jQuery accordion widget" href="http://jqueryui.com/demos/accordion/" target="_blank">Accordion widget</a> from jQuery UI. It also has some more search options built in so that the data can be searched by date and amount as well as political constituency and art form. The search needs to take in some arguments such as &lt; or &gt; or equals in the amount but that can come later. I&#8217;ve been reading <a title="Jeni Tennison on Linked data on data.gov.uk" href="http://data.gov.uk/blog/guest-post-developers-guide-linked-data-apis-jeni-tennison" target="_blank">Jeni Tennison&#8217;s post</a> on the data.gov.uk blog to work out how best to expose the data using Linked Data.</p>
<p>Whilst writing this post, it occurs to me that although Linked Data is an awesome way of exposing data, useful search is still an important part of any content-driven website. As blogged before, I have implemented an early version of a Xapian search. As Tim Bray has noted, advanced search might have fewer users but it is more likely to be used by the heavier users, so it deserves to have time taken on it.</p>
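<p>One way the amount comparison could be handled is sketched here in Python for brevity. The short operator codes, the <code>filter_by_amount</code> helper and the shape of the grant records are all assumptions, not how the search is currently built.</p>

```python
import operator

# Hypothetical codes that a search form could send for the comparison.
COMPARATORS = {
    "lt": operator.lt,  # less than
    "gt": operator.gt,  # greater than
    "eq": operator.eq,  # equals
}


def filter_by_amount(grants, comparator, amount):
    """Keep the grants whose amount satisfies the chosen comparison.

    `grants` is assumed to be a list of dictionaries with an "amount" key.
    """
    compare = COMPARATORS[comparator]
    return [grant for grant in grants if compare(grant["amount"], amount)]
```

<p>Passing a short code rather than the symbol itself avoids having to escape it in a query string, and an unknown code raises a <code>KeyError</code> instead of being silently ignored.</p>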
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/01/weeknotes-arts-funding-open-correspondence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Books and places for Open Correspondence</title>
		<link>http://austgate.co.uk/2010/11/weeknotes-books-and-places-for-open-correspondence/</link>
		<comments>http://austgate.co.uk/2010/11/weeknotes-books-and-places-for-open-correspondence/#comments</comments>
		<pubDate>Sun, 21 Nov 2010 12:54:36 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[places]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=246</guid>
		<description><![CDATA[Progress on the next version of Open Correspondence has been a bit slower than I would have liked. Sleep is, however, useful to being alert enough to write code. I&#8217;ve gone back to some of the work that I was doing for the first version of the site way back last year. As part [...]]]></description>
			<content:encoded><![CDATA[<p>Progress on the next version of Open Correspondence has been a bit slower than I would have liked. Sleep is, however, useful to being alert enough to write code.</p>
<p>I&#8217;ve gone back to some of the work that I was doing for the first version of the site way back last year. As part of the move to Linked Data, I&#8217;ve been working on a URI for places and books. Places, as noted in previous posts, has come together and is just in need of some tidying up. I&#8217;ve managed to create an index page from the RDF endpoint using rdflib to parse the triples looking for the geo: namespace and then putting the items into a set to remove the duplicates. This needs changing as sets are unordered and I&#8217;d like the page to be ordered so that a place can be found quickly. Perhaps a better option would be to place the raw data into a dictionary and cast to a list to sort at the last moment (or more simply sort the keys in the dictionary&#8230;) and then to remove the duplicates such as Gad&#8217;s Hill, which is analogous to Gadshill. Both are used but refer to the same entity, so I need to do a difference on the string (probably using difflib or a variant) to identify the changes and clean up the URIs.</p>
<p>With the books, I had created a table of the publication dates and the titles, so all I need to do is to map each book&#8217;s variant titles. For example, &#8220;The Adventures of Nicholas Nickleby&#8221; is better known as &#8220;Nicholas Nickleby&#8221; or plain &#8220;Nickleby&#8221; in the letters. It might be easiest to put this into a dictionary at the moment rather than another table and to call that. I would also need to get some sort of introduction (and perhaps in the future create an Open Dickens site for the novels).</p>
<p>I&#8217;m sure I can do this and get it working in a few hours. I must make the time now that I&#8217;ve had a small break.</p>
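<p>A small sketch of the dictionary approach, using <code>difflib.SequenceMatcher</code> from the standard library to fold near-duplicates together before sorting. It works on a plain list of names rather than triples from rdflib, and the <code>similar</code> and <code>build_index</code> helpers and the 0.85 threshold are assumptions picked for illustration.</p>

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when two names are close enough to be treated as one place."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def build_index(names, threshold: float = 0.85):
    """Group near-duplicate names and return the groups in sorted order.

    The first spelling seen for a place becomes the key for its group.
    """
    index = {}
    for name in names:
        # Reuse an existing key if this name is a near match for it.
        match = next((known for known in index
                      if similar(known, name, threshold)), None)
        index.setdefault(match or name, set()).add(name)
    # Sort only at the last moment, once the duplicates are folded in.
    return [(key, sorted(variants)) for key, variants in sorted(index.items())]
```

<p>Given <code>["Gadshill", "Hotel Meurice", "Gad's Hill"]</code>, the two Gadshill spellings end up under a single key and the index comes back in alphabetical order. Every name is compared with every existing key, which is fine for a short index but would need rethinking for a large one.</p>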
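<p>The dictionary could look something like the sketch below. The <code>BOOK_VARIANTS</code> name and the two helpers are hypothetical, and only the Nickleby titles mentioned above are filled in.</p>

```python
# Full title mapped to the shorter forms used in the letters.
BOOK_VARIANTS = {
    "The Adventures of Nicholas Nickleby": ["Nicholas Nickleby", "Nickleby"],
}


def build_lookup(variants: dict) -> dict:
    """Invert the mapping so that every variant points at its full title."""
    lookup = {}
    for canonical, alternatives in variants.items():
        for title in [canonical, *alternatives]:
            lookup[title.lower()] = canonical
    return lookup


def canonical_title(title: str, lookup: dict):
    """Return the full title for a variant, or None if it is not known."""
    return lookup.get(title.strip().lower())
```

<p>Building the inverted lookup once means each mention in a letter is a single dictionary access, and the result can be joined to the existing table of publication dates by full title.</p>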
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/11/weeknotes-books-and-places-for-open-correspondence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

