<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Aust Gate &#187; open_correspondence</title>
	<atom:link href="http://austgate.co.uk/tags/open_correspondence/feed/" rel="self" type="application/rss+xml" />
	<link>http://austgate.co.uk</link>
	<description>Open Knowledge and Literature</description>
	<lastBuildDate>Tue, 08 May 2012 20:33:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Weeknotes: Open Correspondence and TextCamp</title>
		<link>http://austgate.co.uk/2012/02/weeknotes-open-correspondence-and-textcamp/</link>
		<comments>http://austgate.co.uk/2012/02/weeknotes-open-correspondence-and-textcamp/#comments</comments>
		<pubDate>Sun, 19 Feb 2012 14:20:02 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[textcamp]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=476</guid>
		<description><![CDATA[It has been a while since I&#8217;ve written a weeknote. Must get back into the habit. Open Correspondence Development on the Open Correspondence project has been somewhere between slow and stalled for a while. I have been doing bits and pieces, but sitting down with Mark McGillivray of Cottage Labs and the Open Knowledge Foundation brought some [...]]]></description>
			<content:encoded><![CDATA[<p>It has been a while since I&#8217;ve written a weeknote. Must get back into the habit.</p>
<p><span style="text-decoration: underline;">Open Correspondence</span></p>
<p>Development on the <a title="Open Correspondence site" href="http://www.opencorrespondence.org/" target="_blank">Open Correspondence</a> project has been somewhere between slow and stalled for a while. I have been doing bits and pieces, but sitting down with Mark McGillivray of Cottage Labs and the Open Knowledge Foundation brought some clarity. The <a title="Textus project" href="http://wiki.okfn.org/Projects/Textus" target="_blank">Textus project</a> was announced recently and I have been talking with the developers about putting the data onto that platform. It seems to me that it is better to pool resources and to contribute where I can. There are parts of the existing project that I like and others that need more work before I am happy with them, so now seems the right time to move onto the developing platform.</p>
<p><span style="text-decoration: underline;">Textcamp</span></p>
<p>At Textcamp last September, one of the sessions covered DIY Bookscanners (<a title="Textcamp post" href="http://austgate.co.uk/2011/08/thinking-about-texts-and-communities-at-textcamp/" target="_blank">Austgate post on Textcamp</a>). One of the actions on the Textus wiki was OCRing text. I have posted previously about <a title="Austgate and Tesseract" href="http://austgate.co.uk/2011/11/using-tesseract-with-python-for-ocr/" target="_blank">playing with Tesseract</a> and, seeing this, I emailed the humanities-dev list to explore the possibilities. To this end, I have volunteered to work on the area and will write a blog post about it. There is already a large amount of existing work, so I am perhaps not developing anything new. However, it would, I think, be interesting to develop a stand-alone system that is flexible and downloadable. Like other OKF projects, it will be a Python project, but it will also be a hardware project that tries to extend some of the existing projects.</p>
<p><span style="text-decoration: underline;">Other Bits</span></p>
<p>I&#8217;ve been working on an indexing project which appears to be coming together quite nicely. Hopefully I&#8217;ll be able to say some more shortly but it depends on a conversation that has yet to be had.</p>
<p>Next week, after a break, is a return to work and to data. The Dev8d conference provided me with some ideas and clarity on one or two things, so time to put them into practice.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2012/02/weeknotes-open-correspondence-and-textcamp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Marking up Open Correspondence with TEI XML</title>
		<link>http://austgate.co.uk/2011/03/marking-up-open-correspondence-with-tei-xml/</link>
		<comments>http://austgate.co.uk/2011/03/marking-up-open-correspondence-with-tei-xml/#comments</comments>
		<pubDate>Sun, 20 Mar 2011 11:03:26 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[tei]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=303</guid>
		<description><![CDATA[As part of the next version of Open Correspondence, I&#8217;ve been working on the XML and JSON mark-up. As part of the XML, I&#8217;ve been using the TEI mark-up for the letters. I once heard this described as &#8220;XML for people who don&#8217;t think XML is flexible enough&#8221;. Now I can see why. It is [...]]]></description>
			<content:encoded><![CDATA[<p>As part of the next version of <a title="Open Correspondence site" href="http://www.opencorrespondence.org" target="_blank">Open Correspondence</a>, I&#8217;ve been working on the XML and JSON mark-up.</p>
<p>As part of the XML, I&#8217;ve been using the <a title="TEI P5 XML mark-up" href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DS.html" target="_blank">TEI mark-up</a> for the letters. I once heard this described as &#8220;XML for people who don&#8217;t think XML is flexible enough&#8221;. Now I can see why. It is a highly flexible solution to digitising texts but can be confusing, especially when switching between versions. I believe the original model that I had been working on was P4 but the current one is P5, so I had to negotiate that change and make sure that I had the correct elements in the blocks. Even then, a section can offer two or three different versions of the same element, and I do wonder about the wisdom of that compared with simplifying to a set of extensible elements that may or may not be used. I&#8217;m intending to use the schema again and to really get my head around it rather than tinkering at the edges.</p>
<p>I&#8217;ve attempted this conversion before but think that I&#8217;ve finally got it to a point which is nearly there. What I would really like to do is to put together some sort of tool kit as a core to the Open Correspondence project. Clearly this would be a long-term project and would need more research but it might be useful to other projects.</p>
<p>As well as marking up texts, it would be useful to use the XML mark-up to convert the text into other formats such as Mobipocket or the Kindle formats to allow a user to create their own e-publication. It would also be useful to find a way of using the XML in conjunction with the <a title="psbook command pages" href="http://www.tardis.ed.ac.uk/~ajcd/psutils/psbook.html" target="_blank">psbook</a> command to create a print version of a letter or collection. This does mean that I need to convert the XML into a PostScript file (which raises a host of questions at the moment &#8211; such as converting structured format into layout format) and then print it.</p>
<p>I&#8217;ve also been playing around with the correspondent collections and the way of marking up collections in TEI. I had thought of this as working on creating printable collections and making the data re-usable for printing. Equally it might allow the data to be used in answer to Jonathan Gray&#8217;s question regarding identifying the letters written to a particular correspondent.</p>
<p>Once I have the XML working and validated, I&#8217;ll look at the JSON output. That would draw a line under this part of the project and allow me to move on. I&#8217;m aiming for a release towards the end of March or the middle of April, in keeping with a six-week release schedule.</p>
<p>The next thing after that is to begin answering Jonathan&#8217;s questions in terms of a tool kit to identify weaknesses and to try and write some code to re-use and re-mix the data. I would hope that would be in the next release towards the end of May.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/03/marking-up-open-correspondence-with-tei-xml/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding and mapping influences</title>
		<link>http://austgate.co.uk/2011/03/finding-and-mapping-influences/</link>
		<comments>http://austgate.co.uk/2011/03/finding-and-mapping-influences/#comments</comments>
		<pubDate>Wed, 16 Mar 2011 18:49:12 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[letters]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[rdf]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=307</guid>
		<description><![CDATA[The awesome Jonathan Gray posted an intriguing question on his blog about mapping influence in intellectual history. What he is trying to do is to map the possible routes of influence between people. In his case, it is philosophers; in mine, authors. One of the driving ideas behind the Open Correspondence RDF was to begin [...]]]></description>
			<content:encoded><![CDATA[<p>The awesome Jonathan Gray posted an intriguing question on his blog about <a title="Jonathan Gray on mapping intellectual history and influence" href="http://jonathangray.org/2011/02/20/who-read-what-mapping-influence-in-intellectual-history/" target="_blank">mapping influence in intellectual history</a>. What he is trying to do is to map the possible routes of influence between people. In his case, it is philosophers; in mine, authors.</p>
<p>One of the driving ideas behind the <a title="Open Correspondence RDF schema" href="http://www.opencorrespondence.org/schema" target="_blank">Open Correspondence RDF</a> was to begin identifying the people to whom Dickens wrote about books. Out of this I would like to create some visualisations of the data. You could possibly do this for the places, for example track his letters for one of the US tours.</p>
<p>But back to the original question. I believe this can be done (as I&#8217;ve been working on the XML issues) using Python&#8217;s rdflib. The major issue would be to get this working across version 2.4 and 3 so that any released code would be cross-platform.</p>
<p>Jonathan: as an open call, I&#8217;d love to work with you on this.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/03/finding-and-mapping-influences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adding linguistic interfaces to Open Correspondence</title>
		<link>http://austgate.co.uk/2011/03/adding-linguistic-interfaces-to-open-correspondence/</link>
		<comments>http://austgate.co.uk/2011/03/adding-linguistic-interfaces-to-open-correspondence/#comments</comments>
		<pubDate>Wed, 09 Mar 2011 11:18:59 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[open_correspondence]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=301</guid>
		<description><![CDATA[I&#8217;ve been playing around with the Python NLTK package, in particular the WordNet interface. WordNet is hosted by Princeton University. I mentioned that I was going to look at this and the idea of allowing a search for lemmas of a word. It came about from a question posed on the Open Literature mailing list regarding [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been playing around with the Python <a title="Python NLTK package website" href="http://www.nltk.org/" target="_blank">NLTK</a> package, in particular the <a title="NLTK WordNet interface" href="http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html" target="_blank">WordNet interface</a>. <a title="WordNet lexical database" href="http://wordnet.princeton.edu/" target="_blank">WordNet</a> is hosted by Princeton University. I mentioned that I was going to look at this and the idea of allowing a search for lemmas of a word. It came about from a question posed on the Open Literature mailing list regarding whether you could search it with lemmas.</p>
<p>Xapian does word stemming but not lemmas, which are slightly different. In stemming, the word production should appear as produc*, since produc is treated as the base of the word. However, that is nonsense as a word. The base of the word is produce, which is what the WordNet lemma returns.</p>
<p>Using the API notes, I&#8217;ve been playing around with the following:</p>
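<blockquote><p>from nltk.corpus import wordnet as wn</p>
<p>word = "author"  # the term being searched for<br />
word_lem = set()  # a set drops duplicate lemma names<br />
for synset in wn.synsets(word):<br />
&nbsp;&nbsp;&nbsp;&nbsp;# lemmas() and name() are methods in NLTK 3; older releases exposed them as attributes<br />
&nbsp;&nbsp;&nbsp;&nbsp;word_lem.update(lemma.name() for lemma in synset.lemmas())</p>
<p>ret_lem = list(word_lem)</p></blockquote>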
<p>Having used set to remove any duplicates, I can return the list of the lemmas that WordNet gives. You have to go through a <a title="Wikipedia on Synsets" href="http://en.wikipedia.org/wiki/Synsets" target="_blank">Synset</a> if you don&#8217;t know the exact part of speech of a word (verb, adverb, adjective or noun), since the lemma constructor requires it to produce the lemma. That&#8217;s fine and I can return the names using lemma.name, but the part of speech is in the synset and I&#8217;m not sure how to retrieve it. It would be useful to send back so that a user can see the part of speech and determine whether it is of interest or not.</p>
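<p>One possible way of sending the part of speech back with each lemma is sketched below. It assumes NLTK 3, where a synset offers pos() and lemmas() and a lemma offers name() as methods. The function name lemmas_with_pos is made up for illustration and is not part of NLTK or of the site.</p>

```python
def lemmas_with_pos(synsets):
    """Return sorted, de-duplicated (lemma name, part of speech) pairs.

    `synsets` is any iterable of objects offering pos() and lemmas(),
    which is how NLTK 3 synsets behave (an assumption of this sketch).
    """
    pairs = set()
    for synset in synsets:
        for lemma in synset.lemmas():
            # The part of speech lives on the synset, not on the lemma.
            pairs.add((lemma.name(), synset.pos()))
    return sorted(pairs)
```

<p>With NLTK and the WordNet corpus installed, this would be called as lemmas_with_pos(wn.synsets("author")), using the same wn import as above.</p>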
<p>In the first instance though, I can return the related synsets to the user through an API, yet to be written, and link them to the Xapian search so that they can search for the term if of interest. It begins the opening up of the letters as a linguistic dataset since the tone and language might vary across the letters depending on the correspondent. One would expect letters to his family to be less formal than to a business colleague or fellow author. I&#8217;m aiming to have an early draft up shortly with some improved XML and JSON handling for the individual letters.</p>
<p>Given that I really did not do that well in the linguistics module at the University of Leicester, I&#8217;m surprised that this has been the first API module being developed. It makes sense though but I need to find a way of getting back to the original purpose of the site.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/03/adding-linguistic-interfaces-to-open-correspondence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Open Correspondence updates</title>
		<link>http://austgate.co.uk/2011/03/weeknotes-open-correspondence-updates/</link>
		<comments>http://austgate.co.uk/2011/03/weeknotes-open-correspondence-updates/#comments</comments>
		<pubDate>Tue, 08 Mar 2011 10:01:37 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[timelines]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=298</guid>
		<description><![CDATA[I&#8217;ve bitten the bullet and done it. I&#8217;ve uploaded the current changes to the Open Correspondence site. The current changes are: additional fields in the RDF endpoint.  I still need to do some major work to JSON and XML which I hope to do for the next update. a basic text search a basic set [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve bitten the bullet and done it. I&#8217;ve uploaded the current changes to the Open Correspondence site.</p>
<p>The current changes are:</p>
<ul>
<li>additional fields in the RDF endpoint. I still need to do some major work to JSON and XML which I hope to do for the next update.</li>
<li>a basic text search</li>
<li>a basic set of geographic data in the collection</li>
<li>better linking from the letters to the correspondent and geographical data (NB it is still incomplete)</li>
<li>some mapping with <a title="Open Layers Javascript mapping website" href="http://openlayers.org/" target="_blank">Open Layers</a> JavaScript.</li>
<li>a <a title="Simile timeline " href="http://www.simile-widgets.org/timeline/" target="_blank">Simile</a> timeline (which is a bit slow at the moment).</li>
</ul>
<p>Admittedly some of this is exposing work that was already there but hidden. However, I&#8217;ve also been working on some Unicode fixes to the underlying XML used by the project, which has meant rebuilding the tables and the Xapian indexes.</p>
<p>Following a request on the Open Literature mailing list, I&#8217;m looking at the idea of using Python&#8217;s <a title="Python Natural Language Toolkit" href="http://www.nltk.org/" target="_blank">NLTK</a> to create some linguistic API wrappers around the Xapian search. It strikes me that these letters can be used to create a corpus of Dickens&#8217;s language where you can explore the language used in family correspondence (to his daughters and wife), to other authors (Wilkie Collins) and to readers. That is a longer project though in terms of building the relevant indexes.</p>
<p>I&#8217;m also looking at the idea of clustering a collection of letters to a correspondent and seeing what happens (for some reason, the current script is looking at Wilkie Collins). There is also a set of queries that one might run against letters discussing books and the publication dates to view the distribution. I&#8217;m working on these latter questions at the moment for an intended release later this week, but I do foresee it being delayed a while.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/03/weeknotes-open-correspondence-updates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Places in Open Correspondence</title>
		<link>http://austgate.co.uk/2011/02/weeknotes-places-in-open-correspondence/</link>
		<comments>http://austgate.co.uk/2011/02/weeknotes-places-in-open-correspondence/#comments</comments>
		<pubDate>Sun, 06 Feb 2011 13:25:55 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[place_names]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=288</guid>
		<description><![CDATA[I&#8217;ve been doing some work to Open Correspondence over the last couple of weeks. I started re-parsing the letters to expose some more metadata, mainly placenames and to normalise them. I&#8217;ve finally done the first pass of this update which I&#8217;m hoping to make live soon once I&#8217;ve updated the controllers and re-checked the other [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been doing some work to Open Correspondence over the last couple of weeks. I started re-parsing the letters to expose some more metadata, mainly placenames and to normalise them.</p>
<p>I&#8217;ve finally done the first pass of this update which I&#8217;m hoping to make live soon once I&#8217;ve updated the controllers and re-checked the other data improvements. Whilst it is not perfect, it is a lot better than it was. I think that the next week will be spent going over the endpoints and the Pylons controllers so that the data is cleaner than at present and correctly linked.</p>
<p>It has been a useful exercise in that I&#8217;ve started rewriting the parser for the letters (an ongoing large job I was thinking of doing when I come to the next set of letters) and putting some of the earlier thoughts into place.</p>
<p>Once I&#8217;m happy with these updates, I&#8217;ll update the site, which does mean rebuilding the databases and endpoints. However, once it is done, it should be a lot cleaner and I can then start looking at the correspondents and linking into other data sources like dbpedia.org. I think that the first task though might be to restart work on the clients that I had been putting together as a basic development kit.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/02/weeknotes-places-in-open-correspondence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Arts funding, Open Correspondence</title>
		<link>http://austgate.co.uk/2011/01/weeknotes-arts-funding-open-correspondence/</link>
		<comments>http://austgate.co.uk/2011/01/weeknotes-arts-funding-open-correspondence/#comments</comments>
		<pubDate>Sun, 16 Jan 2011 20:44:33 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[arts_funding]]></category>
		<category><![CDATA[linked_data]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=276</guid>
		<description><![CDATA[I&#8217;ve been doing some updating this week rather than anything new. I was going to spend time trying to complete the places section of the Open Correspondence website. It needs some tidying up as the endpoint has had some changes made to it. I did come across an issue which has implications in exposing other [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been doing some updating this week rather than anything new. I was going to spend time trying to complete the <a title="Open Correspondence places index" href="http://www.opencorrespondence.org/place/" target="_blank">places section of the Open Correspondence</a> website. It needs some tidying up as the endpoint has had some changes made to it. I did come across an issue which has implications in exposing other pieces of metadata, such as people who are being referred to.</p>
<p>Firstly, I need to work out a more exact way of mapping the data in the database or flat file. I think what I really need is to use something like:</p>
<ul>
<li>place</li>
<li>address</li>
<li>city</li>
<li>latitude</li>
<li>longitude</li>
<li>description</li>
<li>url</li>
</ul>
<p>The data that I have is not quite as granular as this. Yet. When I&#8217;ve done this, I need to build the mapping so that if a place is entered, say <a title="Wikipedia page on Hotel Meurice Paris" href="http://en.wikipedia.org/wiki/H%C3%B4tel_Meurice" target="_blank">Hotel Meurice, Paris</a>, then I can return the details and latitude / longitude to render an Open Layers map. That&#8217;s almost the easiest bit really.</p>
<p>The second issue is the difference in names. Over time and in the heat of writing, names can change subtly. Take, for instance, <a title="Wikipedia page on Gads Hill Place" href="http://en.wikipedia.org/wiki/Gads_Hill_Place" target="_blank">Gads Hill Place</a>, one of Dickens&#8217;s homes, which is now a school. In the letters it is referred to as:</p>
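<p>The record and lookup described above might be sketched as follows. This is only an illustration: the Place type, the PLACES table and both function names are hypothetical, and the sample address and coordinates are approximate placeholders rather than data from the site.</p>

```python
from collections import namedtuple

# Field names follow the list of fields proposed in the post.
Place = namedtuple(
    "Place", "place address city latitude longitude description url"
)

# Hypothetical sample record; address and coordinates are approximate.
PLACES = {
    "hotel meurice, paris": Place(
        place="Hotel Meurice",
        address="Rue de Rivoli",
        city="Paris",
        latitude=48.865,
        longitude=2.328,
        description="Illustrative entry",
        url="http://en.wikipedia.org/wiki/H%C3%B4tel_Meurice",
    ),
}


def find_place(name):
    """Look up a place by name, ignoring case and surrounding whitespace."""
    return PLACES.get(name.strip().lower())


def map_centre(name):
    """Return (longitude, latitude) for centring a map, or None if unknown."""
    record = find_place(name)
    if record is None:
        return None
    return (record.longitude, record.latitude)
```

<p>Keeping the lookup key lower-cased means that the entered text does not have to match the stored capitalisation exactly.</p>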
<ol>
<li>Gad&#8217;s Hill Place,</li>
<li>Gad&#8217;s Hill Place, Higham</li>
<li>Gad&#8217;s Hill</li>
</ol>
<p>It can also be known as Gadshill Place or Gads Hill Place. I need to find a way of reconciling the variants. Firstly, I need to develop a way of checking inside a term, returning it unchanged if it is a new term or returning the mapped version if it matches a known one. Secondly, I need to fuzzy match the strings so that any near differences (using the <a title="Levenshtein edit distance code" href="http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Levenshtein_distance#Python" target="_blank">Levenshtein edit distance</a>) can be checked and the term then either ignored or excluded.</p>
<p>These issues will also affect the correspondent code which is being created. I suspect that anything with names will have the same issues. For instance, Wilkie Collins is known in the letters as <a href="http://opencorrespondence.org/correspondent/view/Mr%20W%20Wilkie%20Collins">Mr W Wilkie Collins</a> and <a href="http://opencorrespondence.org/correspondent/view/Mr%20Wilkie%20Collins">Mr Wilkie Collins</a>. In the current implementation of the site, these are two different entities, which is clearly wrong. They are the same entity but there is a subtle difference which is not accounted for.</p>
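<p>The two steps could be sketched roughly as below: first a containment check, then a fallback to edit distance. The function canonical_place and the threshold of two edits are assumptions made for illustration, not the parsing library&#8217;s actual code.</p>

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def canonical_place(term, known, max_distance=2):
    """Map `term` onto a known place name, or return it unchanged if new.

    `max_distance` is an assumed threshold, chosen only for this sketch.
    """
    key = term.lower().strip(" ,")
    if not key:
        return term
    # Step 1: one name sits inside the other, e.g. "Gad's Hill"
    # inside "Gad's Hill Place".
    for name in known:
        low = name.lower()
        if key in low or low in key:
            return name
    # Step 2: fall back to a small edit distance for spelling variants.
    for name in known:
        if levenshtein(key, name.lower()) <= max_distance:
            return name
    return term
```

<p>The containment check is deliberately crude and would need tightening before it met short or ambiguous names.</p>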
<p>So to deal with this, I am going back to the parsing library and building these in instead. Whilst it is a slower way of dealing with these issues, it provides a chance of doing any necessary information and site re-thinking.</p>
<p>As part of this, I downloaded some <a title="TEI website" href="http://www.tei-c.org/index.xml" target="_blank">TEI</a> guidelines from the <a title="TEI Guidelines on California Digital Library" href="http://www.cdlib.org/groups/stwg/index.html" target="_blank">California Digital Library</a> to use to build the base metadata export. Ideally what I&#8217;m hoping to do is to create the data as a Python dictionary and then reformat it into HTML, HTML &amp; RDFa, RDF, JSON or XML. It should allow me to export the same data for each type.</p>
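<p>As a rough sketch of the one-dictionary, many-formats idea, the standard library can already turn a dictionary into JSON and into simple XML. The keys, the sample values and the flat element layout are invented for illustration and are not TEI.</p>

```python
import json
import xml.etree.ElementTree as ET


def to_json(letter):
    """Serialise the letter dictionary as a JSON string."""
    return json.dumps(letter, sort_keys=True)


def to_xml(letter, root_tag="letter"):
    """Serialise the same dictionary as a flat XML string (not TEI)."""
    root = ET.Element(root_tag)
    for key in sorted(letter):
        child = ET.SubElement(root, key)
        child.text = str(letter[key])
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record, used only to show the two outputs.
letter = {
    "author": "Charles Dickens",
    "correspondent": "Mr Wilkie Collins",
    "place": "Tavistock Square",
}
json_output = to_json(letter)
xml_output = to_xml(letter)
```

<p>Each further format (HTML, RDFa, RDF) would then be one more function reading the same dictionary.</p>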
<p>I&#8217;m sure at times I&#8217;ll wonder what I started but it needs doing if the site is to accept more authors. After that, back to search.</p>
<p>On a separate note, I have also done some work on the <a title="Arts funding search" href="http://austgate.co.uk/development/search_arts.php" target="_blank">Arts Funding search</a>. I&#8217;ve given it a re-skin and used the <a title="jQuery accordion widget" href="http://jqueryui.com/demos/accordion/" target="_blank">Accordion widget</a> from jQuery UI. It also has some more search options built in so that the data can be searched by date and amount as well as political constituency and art form. The search needs to take in some arguments such as &lt; or &gt; or equals in the amount but that can come. I&#8217;ve been reading <a title="Jeni Tennison on Linked data on data.gov.uk" href="http://data.gov.uk/blog/guest-post-developers-guide-linked-data-apis-jeni-tennison" target="_blank">Jeni Tennison&#8217;s post</a> on the data.gov.uk blog to work out how best to expose the data using Linked Data.</p>
<p>Whilst writing this post, it occurs to me that whilst Linked Data is an awesome way of exposing data, useful search is still an important part of any content driven website. As blogged before, I have implemented an early version of a Xapian search. As Tim Bray has noted, advanced search might have a smaller use but it is more likely to be used by the heavier users so deserves to have time taken on it.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/01/weeknotes-arts-funding-open-correspondence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Contextualising places in time</title>
		<link>http://austgate.co.uk/2010/11/contextualising-places-in-time/</link>
		<comments>http://austgate.co.uk/2010/11/contextualising-places-in-time/#comments</comments>
		<pubDate>Sun, 21 Nov 2010 16:46:15 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[linked_data]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[place_names]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=241</guid>
		<description><![CDATA[As part of the Open Correspondence project, I&#8217;ve started to look at place names and locations to build a set of temporal and spatial data for the letters to allow for geographical queries. As part of the search, I came across a reference to Sean Gillies&#8217; useful blog post talking about modelling historical place names [...]]]></description>
			<content:encoded><![CDATA[<p>As part of the Open Correspondence project, I&#8217;ve started to look at place names and locations to build a set of temporal and spatial data for the letters to allow for geographical queries.</p>
<p>As part of the search, I came across a reference to Sean Gillies&#8217; useful blog post talking about <a title="Sean Gillies on historical placenames" href="http://sgillies.net/blog/1032/modeling-historical-places-for-pleiades/" target="_blank">modelling historical place names</a> for the Pleiades project. What intrigues me about the places is that they don&#8217;t exist in amber. They change and adapt.</p>
<p>Playing around with Open Layers (and inspired by Jo Walsh&#8217;s <a title="Jo Walsh's Mapping Hacks on historical maps" href="http://mappinghacks.com/2010/03/21/a-re-education-in-openstreetmap/" target="_blank">piece on historical maps on Mapping Hacks</a>), I&#8217;ve become interested in the idea of placing a historical map on top of a current street map so that you can see what a place looks like now and also when, for example, Dickens lived in <a title="Wikipedia on Tavistock Square" href="http://en.wikipedia.org/wiki/Tavistock_Square" target="_blank">Tavistock Square</a> or <a title="Wikipedia on Gad's Hill Place" href="http://en.wikipedia.org/wiki/Gads_Hill_Place" target="_blank">Gad&#8217;s Hill Place</a>. How has it changed? What did it look like then? What does it look like now? Does it even exist?</p>
<p>Whilst that may not aid textual analysis, it could be tied into historical and social queries about the letters. By adding this layer of data, which one might not normally think about in terms of letters data, we can find out other things of interest.</p>
<p>I think for now, I&#8217;ll try not to go too far down this road only so that I can get the other bits of data fixed first.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/11/contextualising-places-in-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Books and places for Open Correspondence</title>
		<link>http://austgate.co.uk/2010/11/weeknotes-books-and-places-for-open-correspondence/</link>
		<comments>http://austgate.co.uk/2010/11/weeknotes-books-and-places-for-open-correspondence/#comments</comments>
		<pubDate>Sun, 21 Nov 2010 12:54:36 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[places]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=246</guid>
		<description><![CDATA[Progress on the next version of Open  Correspondence has been a bit slower than I would have like. Sleep is, however, useful to being alert enough to write code. I&#8217;ve gone back to the some of the work that I was doing for the first version of the site way back last year. As part [...]]]></description>
			<content:encoded><![CDATA[<p>Progress on the next version of Open  Correspondence has been a bit slower than I would have like. Sleep is, however, useful to being alert enough to write code.</p>
<p>I&#8217;ve gone back to some of the work that I was doing for the first version of the site way back last year. As part of the move to Linked Data, I&#8217;ve been working on a URI for places and books. Places, as noted in previous posts, has come together and is just in need of some tidying up. I&#8217;ve managed to create an index page from the RDF endpoint, using rdflib to parse the triples looking for the geo: namespace and then putting the items into a set to remove the duplicates. This needs changing, as sets are unordered and I&#8217;d like the page to be ordered so that a place can be found quickly. Perhaps a better option would be to place the raw data into a dictionary and cast to a list to sort at the last moment (or, more simply, sort the keys in the dictionary&#8230;) and then to remove the duplicates such as Gad&#8217;s Hill, which is a variant spelling of Gadshill. Both are used but refer to the same entity, so I need to do a diff on the strings (probably using difflib or a variant) to identify the variants and clean up the URIs.</p>
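<p>A minimal sketch of how that clean-up might look, using only the standard library. The function names, the sample place list and the 0.9 cutoff are all assumptions for illustration; the real names would come from the rdflib query, which is not shown here.</p>

```python
import difflib
import re


def normalise(name):
    """Lower-case a place name and drop spaces and punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def merge_variants(names, cutoff=0.9):
    """Group spellings that appear to refer to the same place.

    Returns a dict mapping a canonical spelling (the first one seen)
    to the sorted list of all of its variants.
    """
    groups = {}  # canonical spelling: set of variants seen
    keys = {}    # normalised form: canonical spelling
    for name in names:
        key = normalise(name)
        # get_close_matches returns the best candidates at or above the cutoff
        match = difflib.get_close_matches(key, list(keys), n=1, cutoff=cutoff)
        if match:
            canonical = keys[match[0]]
        else:
            canonical = name
            keys[key] = canonical
        groups.setdefault(canonical, set()).add(name)
    return {place: sorted(variants) for place, variants in groups.items()}


# Hypothetical sample data standing in for the names pulled from the triples
places = ["Tavistock Square", "Gad's Hill", "Gadshill", "Broadstairs", "Gad's Hill"]
index = merge_variants(places)

# Sorting the dictionary keys gives the ordered index page
for place in sorted(index):
    print(place, index[place])
```

<p>Keeping the variants alongside the canonical spelling means the old URIs could still be redirected rather than simply thrown away. The cutoff would need tuning against the real data, since too low a value would merge places that are genuinely different.</p>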
<p>With the books, I had created a table of the publication dates and the titles, so all I need to do is to map each book&#8217;s variant titles. &#8220;The Adventures of Nicholas Nickleby&#8221;, for example, is better known as &#8220;Nicholas Nickleby&#8221; or plain &#8220;Nickleby&#8221; in the letters. It might be easiest to put this into a dictionary for the moment, rather than another table, and to call that. I would also need to get some sort of introduction for each book (and perhaps in the future create an Open Dickens site for the novels).</p>
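<p>The dictionary idea could be sketched along these lines. The names BOOK_VARIANTS, build_lookup and canonical_title are hypothetical, and only the one title mentioned above is filled in.</p>

```python
# Canonical title mapped to the shorter forms used in the letters.
# Only the example from the post is listed; the rest would be added by hand.
BOOK_VARIANTS = {
    "The Adventures of Nicholas Nickleby": ["Nicholas Nickleby", "Nickleby"],
}


def build_lookup(variants):
    """Invert the mapping so that any known form leads to the canonical title."""
    lookup = {}
    for canonical, aliases in variants.items():
        lookup[canonical.lower()] = canonical
        for alias in aliases:
            lookup[alias.lower()] = canonical
    return lookup


def canonical_title(mention, lookup):
    """Return the canonical title for a mention, or None if it is unknown."""
    return lookup.get(mention.strip().lower())


LOOKUP = build_lookup(BOOK_VARIANTS)
print(canonical_title("Nickleby", LOOKUP))
```

<p>Matching is case-insensitive but otherwise exact, so a misspelt title in a letter would come back as None and could be logged for checking by hand.</p>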
<p>I&#8217;m sure I can get this working in a few hours. Must make the time now I&#8217;ve had a small break.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/11/weeknotes-books-and-places-for-open-correspondence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Open Correspondence, Xapian and Linked Data</title>
		<link>http://austgate.co.uk/2010/11/weeknotes-open-correspondence-xapian-and-linked-data/</link>
		<comments>http://austgate.co.uk/2010/11/weeknotes-open-correspondence-xapian-and-linked-data/#comments</comments>
		<pubDate>Sun, 07 Nov 2010 10:58:20 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[charles dickens]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=233</guid>
		<description><![CDATA[After last week&#8217;s server move, we discovered one or two things that needed to be changed before they could go live. The main thing was the Xapian search which I had been working on. The initial version kept the Xapian server on the local machine and used that to index and search the letters but [...]]]></description>
			<content:encoded><![CDATA[<p>After last week&#8217;s server move, we discovered one or two things that needed to be changed before they could go live. The main thing was the Xapian search which I had been working on. The initial version kept the Xapian server on the local machine and used that to index and search the letters, but the new version is distributed across machines, so it required a small change.</p>
<p>Opening a &#8220;one box wonder&#8221; Xapian search in Python is done via:</p>
<blockquote><p>xapian.WritableDatabase(db_path, xapian.DB_CREATE_OR_OPEN)</p></blockquote>
<p>where db_path is the database name that you want to give the index. The index is then opened for searching using:</p>
<blockquote><p>xapian.Database(db_path)</p></blockquote>
<p>Since the project uses Pylons, the controller used a path out to the .ini file loaded at runtime to link to the correct database.</p>
<p>Using the documentation on the <a title="Xapian Documentation on remote backends" href="http://xapian.org/docs/remote.html" target="_blank">Xapian site for remote backends</a> and the <a title="Xapian Python bindings documentation" href="http://xapian.org/docs/bindings/python/" target="_blank">Python bindings</a>, I was able to adjust the code quickly so that xapian.WritableDatabase is replaced by:</p>
<blockquote><p>xapian.remote_open_writable("&lt;host name&gt;", &lt;port number&gt;)</p></blockquote>
<p>and is opened by:</p>
<blockquote><p>xapian.remote_open("&lt;host name&gt;", &lt;port number&gt;)</p></blockquote>
<p>Once that is set up, all you need to do is start the TCP server, which is what I&#8217;ve been looking at. I downloaded the tar.gz file of Xapian-core from the Xapian site, configured and built it on Ubuntu Lucid Lynx, and then ran xapian-tcpsrv --port &lt;port number&gt; &lt;database name&gt; in a new terminal window, which allowed me to test the connections and get them ready for going live.</p>
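<p>As a rough sketch of how the controller might choose between the local and the remote index from the .ini settings: the key names xapian.host, xapian.port and xapian.db_path, the section name and the sample values are all assumptions rather than the project&#8217;s actual configuration, and the code is written for Python 3. The xapian import is kept inside the function so that the settings logic can be run without the bindings installed.</p>

```python
import configparser

# Hypothetical settings; the key names are illustrative only
SAMPLE_INI = """
[app:main]
xapian.host = search.example.org
xapian.port = 8100
xapian.db_path = /var/data/letters.db
"""


def search_settings(ini_text, section="app:main"):
    """Work out whether the index is remote or local from the ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    conf = parser[section]
    host = conf.get("xapian.host")
    if host:
        # The port is read as an integer, not a string
        return ("remote", host, conf.getint("xapian.port"))
    return ("local", conf["xapian.db_path"])


def open_for_search(settings):
    """Open a read-only database using the calls described in this post."""
    import xapian  # imported lazily; requires the Xapian Python bindings

    if settings[0] == "remote":
        _, host, port = settings
        return xapian.remote_open(host, port)
    return xapian.Database(settings[1])


print(search_settings(SAMPLE_INI))
```

<p>Leaving the host setting out of the .ini file would fall back to the &#8220;one box wonder&#8221; set-up, which is handy for testing on a development machine.</p>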
<p>Changes are afoot on the Open Correspondence site as well. As part of a conversation that involved Keith Alexander, of <a title="Talis Platform" href="http://www.talis.com/platform" target="_blank">Talis</a>, the project is going to evolve in a more Linked Data direction, with references to the books, magazines, correspondents and so on. I&#8217;d already started going in this direction with the correspondent links (such as <a title="Georgina Hogarth correspondent link on Open Correspondence" href="http://www.opencorrespondence.org/letters/correspondent/Miss%20Hogarth" target="_blank">http://www.opencorrespondence.org/letters/correspondent/Miss%20Hogarth</a>), so this is really an extension of where we need to go to connect to other resources such as DBpedia, Wikipedia and so on. The fact that it is <a title="Dickens 2012 website" href="http://www.dickens2012.org" target="_blank">Dickens&#8217;s bi-centenary in 2012</a> gives an added boost to the project. The Linked Data approach gives us the chance to create some sort of framework for future expansion and for linking together data sources, not only at a literary level but also socially. It also encourages me to sort out the content negotiation work that was started, and to try to follow the FAQs that the <a title="Pedantic Web group site" href="http://pedantic-web.org/" target="_blank">Pedantic Web</a> group have posted, to make sure that the site follows the best standards that it can and to build them into future developments and directions.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/11/weeknotes-open-correspondence-xapian-and-linked-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

