<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Aust Gate &#187; simile</title>
	<atom:link href="http://austgate.co.uk/tags/simile/feed/" rel="self" type="application/rss+xml" />
	<link>http://austgate.co.uk</link>
	<description>Open Knowledge and Literature</description>
	<lastBuildDate>Sun, 25 Jul 2010 15:19:13 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Mining the Letters of Charles Dickens</title>
		<link>http://austgate.co.uk/2009/07/mining-the-letters-of-charles-dickens/</link>
		<comments>http://austgate.co.uk/2009/07/mining-the-letters-of-charles-dickens/#comments</comments>
		<pubDate>Tue, 14 Jul 2009 07:41:13 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[charles dickens]]></category>
		<category><![CDATA[simile]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=81</guid>
		<description><![CDATA[As an aside, I&#8217;ve started a small project to begin visualising ways of searching the letters of Charles Dickens and to explore the Simile library which MIT have produced. It was originally an extension to the DSpace repository tool, but Rufus Pollock used it in the Open Knowledge Foundation&#8217;s Weaving History project, to which I contributed the [...]]]></description>
			<content:encoded><![CDATA[<p>As an aside, I&#8217;ve started a small project to begin visualising ways of searching the letters of Charles Dickens and to explore the <a title="Simile project page at MIT" href="http://simile.mit.edu/" target="_blank">Simile</a> library which MIT have produced.</p>
<p>It was originally an extension to the DSpace repository tool, but Rufus Pollock used it in the Open Knowledge Foundation&#8217;s <a title="Microfacts website" href="http://www.microfacts.org" target="_blank">Weaving History</a> project, to which I contributed the <a title="Milton threads on Microfacts" href="http://www.microfacts.org/thread/read/831cf372-1d28-4c98-ab55-c19899fa3840" target="_blank">Milton</a> JSON data file. Originally I&#8217;d used it just for biographical timelines but, thinking about it, I wondered how you could use it to mine datasets like the letters of Charles Dickens.</p>
<p>Dickens was a prolific letter writer (the Pilgrim edition extends to 12 thick volumes). I don&#8217;t have access to that data, but I did download the first volume (of three) that his daughters edited.</p>
<p>Using Perl, I have extracted the date and recipient tags and converted the text file into JSON (as part of a larger process of converting the file into XML and using XSL to transform the data). I then created a view of the data in <a title="Letters of Dickens project" href="/development/dickensletter.php" target="_blank">tabular form</a>, so that you can easily find the dates of the letters sent to particular people.</p>
<p>I&#8217;ve also used the same data set to produce a fairly <a title="Timeline of Dickens' letters" href="http://www.austgate.myzen.co.uk/development/timeline.php" target="_blank">basic timeline of the letters</a>, which is now being rewritten: it needs updating to work with the new version of Timeline.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2009/07/mining-the-letters-of-charles-dickens/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
