<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Aust Gate &#187; open_literature</title>
	<atom:link href="http://austgate.co.uk/tags/open_literature/feed/" rel="self" type="application/rss+xml" />
	<link>http://austgate.co.uk</link>
	<description>Open Knowledge and Literature</description>
	<lastBuildDate>Tue, 08 May 2012 20:33:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Using Tesseract with Python for OCR</title>
		<link>http://austgate.co.uk/2011/11/using-tesseract-with-python-for-ocr/</link>
		<comments>http://austgate.co.uk/2011/11/using-tesseract-with-python-for-ocr/#comments</comments>
		<pubDate>Sun, 27 Nov 2011 18:38:59 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[open_literature]]></category>
		<category><![CDATA[tesseract]]></category>
		<category><![CDATA[textcamp]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=415</guid>
		<description><![CDATA[Following several conversations with Alex Butterworth over pots of tea in the crypt of St Mary&#8217;s Church in Oxford, I&#8217;ve been having a look at Python and its bindings with the Tesseract library. A quick Google search brought me to this post by Roy on building an HTTP service using Tornado. I am fairly new [...]]]></description>
			<content:encoded><![CDATA[<p>Following several conversations with Alex Butterworth over pots of tea in the crypt of St Mary&#8217;s Church in Oxford, I&#8217;ve been having a look at Python and its bindings for the <a title="Tesseract OCR library" href="http://code.google.com/p/tesseract-ocr/" target="_blank">Tesseract</a> library.</p>
<p>A quick Google search brought me to this post by Roy on building an <a title="OCR, Tornado" href="http://www.morethantechnical.com/2011/01/25/10-lines-of-code-ocr-http-service-with-python-tesseract-and-tornado/" target="_blank">HTTP service using Tornado</a>. I am fairly new to <a title="Tornado's website" href="http://www.tornadoweb.org/" target="_blank">Tornado</a> but have been looking at it for an experiment at work. However, I have been using <a title="Flask microsite app" href="http://flask.pocoo.org" target="_blank">Flask</a> for other projects such as a quick and dirty RDF browser largely based on Chris Gutteridge&#8217;s PHP browser. (At some point very soon, I need to get back to this but other projects are slightly more pressing at the moment.)</p>
<p>After a quick upgrade to version 0.8, I have managed to put something together a little like Roy&#8217;s script but I&#8217;m hoping to go further and add a storage layer before tidying it all up.</p>
<p>Unlike Roy&#8217;s script, I&#8217;ve pushed the Tesseract and file handling code outside of the server. In the long term, I&#8217;d like to split out the file handling and storage facilities from the web server which means looking at the storage. As a quick step, I&#8217;ve popped in a link to a MySQL database but a far better option would probably be a NoSQL database like CouchDB or similar. I suppose a Key / Value store like Redis could be used as well (<a title="Redisfs " href="http://blog.steve.org.uk/i_updated_my_redis_based_filesystem.html" target="_blank">Redisfs apparently does something like this</a>) as a back end. I&#8217;m keeping options open.</p>
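<p>A minimal sketch of that separation, in Python, might look like the following. The OCR step is passed in as a plain function, so the same code could sit behind Flask, Tornado or a directory-watching script. All of the names here (open_store, process_image, fetch_text, fake_ocr and the scans table) are hypothetical, and SQLite from the standard library stands in for the MySQL link mentioned above; a real version would replace fake_ocr with a call to whichever Tesseract binding is installed.</p>

```python
import sqlite3
from pathlib import Path


def open_store(db_path=":memory:"):
    """Open (or create) the store; SQLite stands in for MySQL here."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scans (filename TEXT PRIMARY KEY, text TEXT)"
    )
    return conn


def process_image(conn, image_path, ocr):
    """Run the supplied OCR function on one file and store the result.

    `ocr` is any callable taking a file path and returning text, which
    keeps Tesseract itself out of the storage and web layers.
    """
    path = Path(image_path)
    text = ocr(path)
    conn.execute(
        "INSERT OR REPLACE INTO scans (filename, text) VALUES (?, ?)",
        (path.name, text),
    )
    conn.commit()
    return text


def fetch_text(conn, filename):
    """Return the stored text for a file name, or None if it is unknown."""
    row = conn.execute(
        "SELECT text FROM scans WHERE filename = ?", (filename,)
    ).fetchone()
    return row[0] if row else None


def fake_ocr(path):
    """Hypothetical stand-in for a real Tesseract call."""
    return "recognised text from " + path.name
```

<p>A web handler would then only need to save the upload and call process_image, while a different back end (CouchDB, Redis) could be tried by rewriting open_store and fetch_text alone.</p>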
<p>I am tempted to use RabbitMQ to notify various workers that a file exists. That suggests that, if I&#8217;m hoping to use this as a book scanner back end (discussed at the Textcamp event in August), I need to add an automated set of scripts that reads a directory and handles moving, storing and scanning the files. Perhaps Tornado might be a long-term answer but realistically it is not needed for a test project.</p>
<p>Also, Tesseract will need some training, as I&#8217;ve discovered this evening playing with some newspaper text and seeing some of the results. As one of the reasons I began this was to preserve old fanzines and newspaper articles which I&#8217;ve kept for research but which are now degrading, that might be a problem.</p>
<p>Either way, this is a way of moving ahead with the book scanner conversation and building something small to scratch some itches.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/11/using-tesseract-with-python-for-ocr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Working on the Panton Principles for Open Literature and Humanities</title>
		<link>http://austgate.co.uk/2011/10/working-on-the-panton-principles-for-open-literature-and-humanities/</link>
		<comments>http://austgate.co.uk/2011/10/working-on-the-panton-principles-for-open-literature-and-humanities/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 17:38:48 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[open_literature]]></category>
		<category><![CDATA[principles]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=402</guid>
		<description><![CDATA[The, it appears indefatigable, James Harriman-Smith and I, amongst others, had been talking about porting the Panton Principles to Open Literature and Humanities uses. After a Skype call, we created a first draft which is now online on the Open Literature wiki: http://wiki.openliterature.net/Principles and on the Open Literature mailing list. One of the matters that [...]]]></description>
			<content:encoded><![CDATA[<p>The, it appears indefatigable, James Harriman-Smith and I, amongst others, had been talking about porting the <a title="Panton Principles" href="http://pantonprinciples.org/" target="_blank">Panton Principles</a> to Open Literature and Humanities uses. After a Skype call, we created a first draft which is now online on the Open Literature wiki: <a title="Open Literature principles" href="http://wiki.openliterature.net/Principles" target="_blank">http://wiki.openliterature.net/Principles</a> and on the Open Literature mailing list.</p>
<p>One of the matters that did concern us was the word &#8220;data&#8221; and what this might mean to literature and humanities. One assumption that we had was that it perhaps had a more defined meaning to scientists. But what is data to humanities? Is it the manuscript, the notes, or the published work? We decided that &#8216;Work&#8217; might be a better word for the overarching principle.</p>
<p>One of the important issues is re-use, and the risk of the re-used work subsequently being closed down and made non-open. The major party that we had in mind was Google Books. Whilst they are making good and admirable strides in the digitising of out of print works, there is no API or metadata store that can be used to mix up the data or to mine it in any other way. Effectively we end up where we started: with a technically open text tied up in ways that cannot be re-used.</p>
<p>Re-use and re-mix are extremely important within digital humanities. Influence and building on earlier works are central to movements like Modernism, and to ensuring that works and authors remain accessible. Works are adapted and take on their own lives or segue from such moments.</p>
<p>The final major point was that citations and the underlying cited text should be open. Whilst the core of the principles is about the work and ensuring that it can be worked on, a fair amount of work goes into notes and annotations to the text (made with tools such as the great <a title="Annotation tool" href="http://www.annotateit.org" target="_blank">Annotate It</a>) and these provide a meta work for people to build on. It is vital for debate that these are not put into a closed arena, not just for the sharing of notes but also for building on the notes. They might also be gathered into a new work, or into an annotated version of a work that builds upon it with communal notes.</p>
<p>This does represent a step forward in open literature and digital humanities. I really hope that debate does start and that these can be developed and made concrete.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/10/working-on-the-panton-principles-for-open-literature-and-humanities/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thinking about texts and communities at Textcamp</title>
		<link>http://austgate.co.uk/2011/08/thinking-about-texts-and-communities-at-textcamp/</link>
		<comments>http://austgate.co.uk/2011/08/thinking-about-texts-and-communities-at-textcamp/#comments</comments>
		<pubDate>Sun, 14 Aug 2011 12:33:01 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[open_literature]]></category>
		<category><![CDATA[textcamp]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=378</guid>
		<description><![CDATA[Having gone to Textcamp yesterday, I started playing with Wordle and IBM&#8217;s Many Eyes at the suggestion of Dave Flanders of the JISC. As James Harriman-Smith, the organiser and Open Literature co-ordinator for the Open Knowledge Foundation, had suggested that this year is the anniversary of the manuscript of Alexander Pope&#8216;s An Essay in Criticism, [...]]]></description>
			<content:encoded><![CDATA[<p>Having gone to <a title="Textcamp on Open Literature" href="http://wiki.openliterature.net/Text_Camp_2011" target="_blank">Textcamp</a> yesterday, I started playing with Wordle and IBM&#8217;s Many Eyes at the suggestion of <a title="David Flanders JISC staff page" href="http://www.jisc.ac.uk/contactus/staff/davidfflanders" target="_blank">Dave Flanders</a> of the <a title="JISC website" href="http://www.jisc.ac.uk/" target="_blank">JISC</a>. As <a title="James Harriman-Smith's OKF page" href="http://okfn.org/members/jameshs/" target="_blank">James Harriman-Smith</a>, the organiser and Open Literature co-ordinator for the Open Knowledge Foundation, had suggested that this year is the anniversary of the manuscript of <a title="Wikipedia on Alexander Pope" href="http://en.wikipedia.org/wiki/Alexander_Pope" target="_blank">Alexander Pope</a>&#8217;s <a title="Wikipedia on Essay on Criticism" href="http://en.wikipedia.org/wiki/An_Essay_on_Criticism" target="_blank">An Essay on Criticism</a>, I popped the Gutenberg text into Wordle to see what it <a title="Wordle on Pope's Essay in Criticism" href="http://www.wordle.net/show/wrdl/3912697/Essay_in_Criticism" target="_blank">shows as a tag cloud</a>. <a title="Wordle: Essay in Criticism" href="http://www.wordle.net/show/wrdl/3912697/Essay_in_Criticism"><img style="padding: 4px; border: 1px solid #ddd;" src="http://www.wordle.net/thumb/wrdl/3912697/Essay_in_Criticism" alt="Wordle: Essay in Criticism" align="left" /></a> The dominance of wit is not a surprise as Wit in poetry was a prized quality for Pope and Dryden. There are some small issues, such as &#8216;still&#8217; and &#8216;Still&#8217; being counted separately; perhaps this could be rectified by making everything lower case, but that presents other issues if two words are similar but the capitalisation suggests a different intonation. 
As I&#8217;ve <a title="Post on Word clouds" href="http://austgate.co.uk/2010/10/tagging-the-revolution-exploring-edmund-burkes-reflections-on-the-revolution-in-france/" target="_blank">blogged before</a>, word clouds are great but not if they don&#8217;t link so, at some point in the future, I&#8217;ll sit down and actually upload a table to create a useful tag cloud. John Levin, of <a title="John Levin's blog Anterotesis on ECCO" href="http://anterotesis.com/wordpress/2011/08/making-the-tcp-ecco-texts-accessible/" target="_blank">Anterotesis</a>, loaded a CSV file of the recently released ECCO files. He loaded Volume Four of Defoe&#8217;s Tour of the Whole Island of Great Britain, which features Scotland.</p>
<div id="attachment_383" class="wp-caption alignleft" style="width: 190px"><a href="http://austgate.co.uk/wp-content/uploads/2011/08/oenvq.jpg"><img class="size-medium wp-image-383" title="Wordcloud of Defoe's journey" src="http://austgate.co.uk/wp-content/uploads/2011/08/oenvq-180x300.jpg" alt="Wordcloud of Defoe's journey taken at Textcamp by Dave Flanders" width="180" height="300" /></a><p class="wp-caption-text">Wordcloud of Defoe&#39;s journey taken at Textcamp</p></div>
<p>Using the Many Eyes Word Cloud, we can see that Scotland is unsurprisingly the largest item but Lord and Earl are also popular, suggesting that he stopped with or met the aristocracy rather than just travelling randomly. Dave Flanders and John created some cool visualisations using the tool which allow you to follow words in the text and to see which words they are most often linked to (using bigrams, I would suppose) in a tree fashion. It is certainly something that I will be looking up later for &#8220;quick win&#8221; visualisations.</p>
<p>One of the intriguing projects that was suggested was building our own DIY bookscanner using links currently stored on the <a title="DIY Bookscanner" href="http://wiki.openliterature.net/Tcamp11/DIYD" target="_blank">Textcamp 2011 wiki pages</a>. I think that Dave Flanders might be organising a hack weekend to actually build the machine for real use. I find it interesting, and think that it would be cool to see whether it can also be built at home or using iPhone / Android OSes, which would also entail a software hack unless an app already exists. That is something to explore later.</p>
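<p>I don&#8217;t know how Many Eyes builds its word tree internally, but the bigram idea can be sketched in a few lines of Python: count, for every word, which words directly follow it. The function names and the sample sentence below are made up for illustration (the sentence is not taken from Defoe), and the tokeniser lower-cases everything, which is the simple fix for the &#8216;still&#8217; / &#8216;Still&#8217; problem at the cost of losing any meaning carried by capitalisation.</p>

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lower-cased words, so 'Still' and 'still' match."""
    return re.findall(r"[a-z']+", text.lower())


def following_words(text):
    """Map each word to a Counter of the words that directly follow it."""
    words = tokenize(text)
    followers = defaultdict(Counter)
    # Pair every word with its successor: these pairs are the bigrams.
    for first, second in zip(words, words[1:]):
        followers[first][second] += 1
    return followers


# An invented sentence, used only to exercise the functions.
sample = "The Earl of Mar met the Earl of Fife and the Lord of the Isles"
tree = following_words(sample)
print(tree["earl"].most_common(1))
print(tree["the"].most_common(1))
```

<p>Walking from a word to its most common followers, and then to their followers, gives the branches of a tree like the ones shown on the day.</p>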
<p>Mark MacGillivray, of OKFN and <a title="Cottage Labs" href="http://cottagelabs.com/" target="_blank">Cottage Labs</a>, and Brian Hole of <a title="Ubiquity Press" href="http://www.ubiquitypress.com/" target="_blank">Ubiquity Press</a>, spoke about Open Access and making scholarship open while retaining its rigour. Using Open Access, we should be able to share the data, the ways of interpreting it and the final interpretation which is published.</p>
<p>The science community has been doing this for some while and things like the Panton Principles and Science Commons are showing the way. One of the ideas was to write a handbook on how to use openness in literature, which is something that we need to address and build on. We ought to write an open guide / manual and build on / develop the Panton Principles where necessary as a core set of principles to work with.</p>
<p>Days like Textcamp and Book Hackday are extremely useful for thinking about this and working on the ideas. It is easy to get into echo chambers of mailing lists and blogs; we need these events to meet new people, be challenged to explain ourselves and either build on the day or go away with ideas to test and try out. The day has got me excited about using word clouds again and doing a bit more work on them to make them a useful tool. It has also got me excited about book scanning and doing some hardware hacking (which I&#8217;ve not really done before).</p>
<p>Running the Pope essay through Wordle makes me excited about testing what we can do with the ECCO TEI documents that John Levin links to. Can we hyperlink to other texts, authors and events that are mentioned in it (not just with the annotator tool but in generated HTML), or use HTML 5 to embed audio links to further discussions or pronunciation (for example Byron&#8217;s Don Juan, which has been argued to be pronounced &#8220;Jew-an&#8221; rather than &#8220;Hwan&#8221;, and the arguments for and against)?</p>
<p>Perhaps that gets to one of the issues that arose in the break-out discussions in the kitchen. After the lightning talk about digital publishing, there seemed to be an argument about whether current digital publishing was really pushing the boundaries or flailing around. I do think that it has some real benefits for niche publishing but these have not been fully explored. The model will need to change and perhaps become more open in those senses, perhaps linking the raw data to the interpretation earlier to allow the relevant community to peer review the data earlier. Just a suggestion. There are two distinct communities: the top-down business layer, and the grass roots layer of activists, data developers and so on. Both would appear to have broadly similar aims, but the question is how to bring them together in a way that is useful for both to learn from. Don&#8217;t get me wrong here as I believe I&#8217;m at the grass roots layer, but I think that the two sides could have a dialogue which gets around the issues that the music and film industries have found themselves in, i.e. confrontation. We are here to disrupt and make because we are passionate. We care about the industry. Publishing is an industry which needs to change and transform itself. Put the two together and there are ways of moving forward. My hope is that in future events, we could get some more publishers along to the event.</p>
<p>The other important thing is that these conversations carry on afterwards. The round table discussions were great, as were the break-out ones in the kitchen, but they need to carry on or we create our own echo chamber, which reduces the value of what happened yesterday.</p>
<p>Whilst I did not do as much coding as I wanted to yesterday, I met some new people and caught up with colleagues. The fact that organisations such as JISC are supporting events like this shows their underlying importance and use to the community. We&#8217;ve started, now we need to carry on by chatting, blogging, sharing and doing more of these events.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/08/thinking-about-texts-and-communities-at-textcamp/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Exposing the Classic Serial data</title>
		<link>http://austgate.co.uk/2011/01/exposing-the-classic-serial-data/</link>
		<comments>http://austgate.co.uk/2011/01/exposing-the-classic-serial-data/#comments</comments>
		<pubDate>Sun, 23 Jan 2011 16:33:37 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[linked_data]]></category>
		<category><![CDATA[open_literature]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=286</guid>
		<description><![CDATA[I&#8217;ve just been listening to the serialisation of Wilkie Collins&#8217; The Moonstone on Radio Four in its Classic serial slot. Whilst  listening (and remembering how much I had enjoyed it when I read it years ago), I began thinking about trying to expose it as Linked Data so that the book&#8217;s publication detail could be [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve just been listening to the serialisation of Wilkie Collins&#8217; The Moonstone on Radio Four in its <a title="BBC Radio Four classic serial" href="http://www.bbc.co.uk/programmes/b00xp2cs" target="_blank">Classic serial</a> slot.</p>
<p>Whilst listening (and remembering how much I had enjoyed it when I read it years ago), I began thinking about trying to expose it as Linked Data so that the book&#8217;s publication detail could be linked into bibliographic data, and the actors&#8217; details could be linked as well, so that you could see who appeared regularly, or which companies were regularly used by the BBC to create these dramatisations. If the bibliographic detail could be suitably developed, then you could also query the publication dates and authors.</p>
<p>I think that might well be a future small project to start doing (perhaps one evening). I guess it could be a script to scrape the relevant pages, query dbpedia (since I&#8217;m using PHP on this server, using ARC) and return a relevant form.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/01/exposing-the-classic-serial-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Open Correspondence</title>
		<link>http://austgate.co.uk/2010/11/weeknotes-open-correspondence-2/</link>
		<comments>http://austgate.co.uk/2010/11/weeknotes-open-correspondence-2/#comments</comments>
		<pubDate>Mon, 01 Nov 2010 10:52:41 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[couch]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[open_literature]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=228</guid>
		<description><![CDATA[I&#8217;ve been talking with Rufus Pollock about moving the Open Correspondence web site as we&#8217;ve had the occasional snafu with bringing the site back up after maintenance. I&#8217;m pleased to say that we managed the move last night and the site is back up, DNS moved and so on. The one thing that really surprised [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been talking with Rufus Pollock about moving the Open Correspondence web site as we&#8217;ve had the occasional snafu with bringing the site back up after maintenance. I&#8217;m pleased to say that we managed the move last night and the site is back up, DNS moved and so on. The one thing that really surprised me is that the original site was running the project with a <a title="SQLite database site" href="http://www.sqlite.org/" target="_blank">SQLite</a> db engine which ships natively with Pylons. I use MySQL on both Linux and Windows and I believe that it has been tested (or currently runs) on PostgreSQL but I might have misunderstood that conversation. Possibly it explains why the original endpoint could be flaky and disappear.</p>
<p>I&#8217;ve been working on two things for the new version: timeline and full text search. Xapian has now been installed as the search engine but the move has meant that I need to make one or two changes, so it is not live yet. I hope to have the issue resolved shortly as it is not huge really.</p>
<p>There is also a nascent <a title="Timeline of Dickens's letters" href="http://www.opencorrespondence.org/timeline/index" target="_blank">timeline of all of Dickens&#8217;s letters</a> which still has some issues, such as taking its time to load as there are around 1000 items. I&#8217;ve a feeling that Rufus might be looking at this when he has a moment.</p>
<p>Last week, I posted the next steps that I&#8217;d like to take with the site to the Open Knowledge Foundation help and open literature lists. The next thing is to start exploring geographical information and to expose that data.</p>
<p>The upshot is that it is time to re-look at the parsing methods and to really beef them up. They sufficed for us to get the project ported from the original PHP and to get the site up, but now I think I have to relook at each method and write more unit tests (and combine them into the openletters tests, as they are currently separate). As the project gets bigger, the value of unit tests becomes much more apparent: ensuring that we have not broken anything far outweighs the time taken to set them up. It is a habit that I need to force myself to keep up when developing. (I already do this for some of the systems that I&#8217;m building at work using PHPUnit.)</p>
<p>The second point that comes from this is storing the data. Currently the site reparses the data for the end points but we&#8217;ve been talking about using <a title="CouchDb apache site" href="http://couchdb.apache.org/" target="_blank">CouchDB</a> and possibly using <a title="GeoCouch site" href="http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-the-future-is-now%3A2010-05-03%3Aen,CouchDB,Python,Erlang,geo" target="_blank">GeoCouch</a> for the next version. The idea is that we can then store the data and transform it to the correct format when requested.</p>
<p>In part I&#8217;ve decided the only way of finding out how the site might be used is to, well, dog food it. So I&#8217;ve also started writing a <a title="Open Correspondence Java client" href="http://bitbucket.org/austgate/opencorrespondence-java" target="_blank">Java client</a> using <a title="Jena RDF toolkit" href="http://jena.sourceforge.net/" target="_blank">Jena</a> for SPARQL to retrieve lists from the RDF endpoint and to represent them in XML, JSON or HTML. Currently the SPARQL query is built (though it needs changes due to last night&#8217;s move, as the RDF endpoint moved) and I need to complete the conversion from a List&lt;QuerySolution&gt; into a readable form like XML and so on. The idea is that it will come as a JAR which can be packaged onto the class path of a WAR or another system.</p>
<p>I&#8217;ve also got a Python script to cluster the letters on the go, which I will commit to the Python bitbucket repo once I&#8217;ve got a bit more done on it. It currently only builds the initial matrix, so I need to visualise the data next.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/11/weeknotes-open-correspondence-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Making Milton sparql</title>
		<link>http://austgate.co.uk/2010/10/making-milton-sparql/</link>
		<comments>http://austgate.co.uk/2010/10/making-milton-sparql/#comments</comments>
		<pubDate>Tue, 26 Oct 2010 20:56:07 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[open_literature]]></category>
		<category><![CDATA[open_milton]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=225</guid>
		<description><![CDATA[I&#8217;ve been going over some ideas that have been bubbling in my mind for a while about using RDF to load in further details about a text in question. I&#8217;ve gone back to an old Milton file, the Areopagitica, that I created for another project but never really used. Essentially it&#8217;s part of the Burke [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been going over some ideas that have been bubbling in my mind for a while about using RDF to load in further details about a text in question. I&#8217;ve gone back to an old Milton file, the <a title="Milton's areopagitica" href="http://austgate.co.uk/development/miltonoutput.html" target="_blank">Areopagitica</a>, that I created for another project but never really used. Essentially it&#8217;s part of the Burke stuff that I&#8217;ve been looking at in other posts. I&#8217;ve started using the HTML version but will get around to updating the XML/XSL version which is in dire need of an overhaul.</p>
<p>What I&#8217;ve been thinking about is software re-usability and I realised that quite a few of my queries are very similar in terms of looking for abstracts and influences on a text. I&#8217;ve started collecting various scripts together and putting them together to form a small library to re-use across texts to develop over time.</p>
<p>I&#8217;ve been writing some scripts which wrap around the ARC toolkit and query dbpedia. They still have a fair few bugs and I need to spend some time getting rid of them. At the moment, the abstract overwrites all the text if you scroll over the title or Milton&#8217;s name.</p>
<p>Whilst PHP is the language I use at work, it is not always my language of choice, so I&#8217;ve started porting some of these ideas across to Java and Python. As part of this, I&#8217;m hoping to finish some command line scripts and tools that I&#8217;ve been working on for the Open Correspondence project, which will hopefully feed into the project.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/10/making-milton-sparql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tagging the revolution &#8211; exploring Edmund Burke&#8217;s Reflections on the Revolution in France</title>
		<link>http://austgate.co.uk/2010/10/tagging-the-revolution-exploring-edmund-burkes-reflections-on-the-revolution-in-france/</link>
		<comments>http://austgate.co.uk/2010/10/tagging-the-revolution-exploring-edmund-burkes-reflections-on-the-revolution-in-france/#comments</comments>
		<pubDate>Sun, 03 Oct 2010 17:29:54 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[edmund_burke]]></category>
		<category><![CDATA[open_literature]]></category>
		<category><![CDATA[tag_cloud]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=214</guid>
		<description><![CDATA[Over the weekend, I read an interesting article, &#8220;Edmund Burke: How did a long-dead Irishman become the hottest thinker of 2010?&#8220;, by Amol Rajan in the Independent on the philosopher, Edmund Burke. In the past I&#8217;ve read his musings on the sublime in &#8220;A Philosophical Enquiry into the Origin of our Ideas of the Sublime [...]]]></description>
			<content:encoded><![CDATA[<p>Over the weekend, I read an interesting article, &#8220;<a title="Amol Rajan writing in the Independent about Edmund Burke" href="http://www.independent.co.uk/news/uk/politics/edmund-burke-how-did-a-longdead-irishman-become-the-hottest-thinker-of-2010-2094434.html" target="_blank">Edmund Burke: How did a long-dead Irishman become the hottest thinker of 2010?</a>&#8221;, by Amol Rajan in the Independent on the philosopher, Edmund Burke. In the past I&#8217;ve read his musings on the sublime in &#8220;A Philosophical Enquiry into the Origin of our Ideas of the Sublime and the Beautiful&#8221; and seen his &#8220;Reflections on the Revolution in France&#8221; on the shelf but have not yet pushed myself to explore it. The article that I read changed that as it managed to draw out the relevance of his argument to the current ConDem / LibCon coalition. It was written as a response to David Marquand&#8217;s article, &#8220;<a title="Prospect Magazine on Edmund Burke and the big society" href="http://www.prospectmagazine.co.uk/2010/10/edmund-burke-big-society/" target="_blank">Patron saint of the big society</a>&#8221; in Prospect magazine.</p>
<p>I have not read the Reflections yet but I thought it might be fun to get the Gutenberg text, strip it down to the text itself and begin building a tag cloud as an exercise in text mining. What I&#8217;m trying to achieve is a visualisation of the text and the keywords. That will probably take me a couple of attempts in cleaning the information up and making it relevant.</p>
<p>Having done that, I&#8217;d like to make the cloud relevant by clustering words together and linking them into the text or search. That is one of the things that really annoys me with some tag clouds. Pretty pictures but no linking. I always think of it as why bother? Seriously, time is taken to mine the data, then visualise it but not to make it relevant or link it into the source material somehow.</p>
<p>For now, I&#8217;m going to leave it as a pretty picture version (sorry&#8230;) as I try to make it more relevant and release bits of it as I get it done.</p>
<p><a title="Tag Cloud of Edmund Burke's Reflections" href="http://austgate.co.uk/development/tag.php" target="_blank">Tag cloud</a> of Burke&#8217;s Reflections on the Revolution in France.</p>
<p>On the todo list is linking the words to either a search or the text, and contextualising them by creating a corpus of words.</p>
<p>This follows on from some stuff that I was doing a while ago to explore how to make tag clouds and then use them. I was reading Jim Bumgardner&#8217;s <a title="Jim Bumgardner on Tag Clouds and link to OReilly website" href="http://oreilly.com/catalog/9780596527945/" target="_blank">Building Tag Clouds in Perl and PHP</a> (O&#8217;Reilly, 2006). Lots to do but something to chip away at.</p>
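<p>As a rough sketch of what that linking needs (in Python rather than the PHP behind the page above, and not the code that generated it), each tag can carry the positions where the word occurs alongside its count and font size. The function name, the short stop list and the size range are all assumptions made for the example.</p>

```python
import re
from collections import Counter

# A tiny, hand-picked stop list; a real one would be much longer.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "it", "that"}


def tag_cloud(text, top=5, min_size=10, max_size=36):
    """Return (word, count, font_size, positions) for the most common words.

    `positions` holds the word offsets where each term occurs, which is
    the hook a page would need to link a tag back into the text.
    """
    words = re.findall(r"[a-z]+", text.lower())
    positions = {}
    for index, word in enumerate(words):
        if word not in STOP_WORDS:
            positions.setdefault(word, []).append(index)
    counts = Counter({word: len(where) for word, where in positions.items()})
    if not counts:
        return []
    highest = max(counts.values())
    cloud = []
    for word, count in counts.most_common(top):
        # Scale linearly between the smallest and largest font size.
        size = min_size + (max_size - min_size) * count // highest
        cloud.append((word, count, size, positions[word]))
    return cloud
```

<p>The positions could just as easily be paragraph numbers or anchors in the HTML; the point is only that the cloud keeps a route back to the source rather than being a picture.</p>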
<p>Update: Updated the Prospect link which is now available and corrected a spelling error</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/10/tagging-the-revolution-exploring-edmund-burkes-reflections-on-the-revolution-in-france/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Weeknotes: All quiet on the accounting front</title>
		<link>http://austgate.co.uk/2010/06/weeknotes-all-quiet-on-the-accounting-front/</link>
		<comments>http://austgate.co.uk/2010/06/weeknotes-all-quiet-on-the-accounting-front/#comments</comments>
		<pubDate>Sun, 27 Jun 2010 13:48:12 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[open_literature]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=174</guid>
		<description><![CDATA[It&#8217;s been a week of relative frustration with priorities suddenly being shifted and the infrastructure road map looking more and more unclear. The SOAP server is largely debugged and ready for more extensive testing on the server and the back end has now been rewritten to capture more data. I cannot help feeling that it [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been a week of relative frustration with priorities suddenly being shifted and the infrastructure road map looking more and more unclear.</p>
<p>The SOAP server is largely debugged and ready for more extensive testing on the server, and the back end has now been rewritten to capture more data. I cannot help feeling that it will have to change to scale more efficiently once more services go online, but right now I don&#8217;t have the expertise to do it. I&#8217;ll get there.</p>
<p>On a different tack, I&#8217;m back on the accounting project that I was on several months ago and making some headway in that. It&#8217;s grown since I was last involved in it, but it is nothing that a decent set of specs and roadmaps cannot make manageable.</p>
<p>I&#8217;ve been thinking about my next book project, which is on the New Weird and genre over the last 15 years, and wondering how to use DBpedia&#8217;s <a title="dbpedia ontology influencedBy term" href="http://dbpedia.org/ontology/influencedBy" target="_blank">influencedBy</a> and <a title="dbpedia's influence term" href="http://dbpedia.org/ontology/influence" target="_blank">influence</a> terms to show how writers influence each other over a century. I&#8217;m tempted to put the data into a large RDF sheet and then use JavaScript or PHP to transform it into JSON, to see whether the Simile timeline software can be used usefully or whether I need to find / write something more appropriate. It does have to wait until I finish the current book.</p>
<p>I forgot to link to the <a title="Open Correspondence post on OKF blog" href="http://blog.okfn.org/2010/06/16/open-correspondence/" target="_blank">Open Correspondence blog post</a> on the Open Knowledge Foundation&#8217;s blog which was posted a few days ago.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/06/weeknotes-all-quiet-on-the-accounting-front/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Pylons, Python and printing</title>
		<link>http://austgate.co.uk/2010/05/weeknotes-pylons-python-and-printing/</link>
		<comments>http://austgate.co.uk/2010/05/weeknotes-pylons-python-and-printing/#comments</comments>
		<pubDate>Sun, 30 May 2010 10:22:41 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[open_literature]]></category>
		<category><![CDATA[printing]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=157</guid>
		<description><![CDATA[I&#8217;ve been doing some more work to the Open Correspondence website (which is now functional thanks to Rufus Pollock&#8217;s help). In part I&#8217;ve been cleaning up the URLs for the data controller (which is still coming along) and trying to tie the views in together. Being happier with Apache and PHP I spent some time [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been doing some more work to the Open Correspondence website (which is now functional thanks to Rufus Pollock&#8217;s help). In part I&#8217;ve been cleaning up the URLs for the data controller (which is still coming along) and trying to tie the views in together. Being happier with Apache and PHP, I spent some time looking for how to rewrite the URLs until I came across <a title="Andre Kollel on Pylons" href="http://blog.andrekolell.de/2009/04/26/the-pylons-web-framework/" target="_blank">Andre Kolell&#8217;s blog post</a> about the internal workings of the middleware in the <a title="Pylons framework" href="http://pylonshq.com/" target="_blank">Pylons framework</a>. The more I do on the project, the more I learn about both Python and Pylons.</p>
<p>One of the next things to do is to reformat the dates into a human-readable format. I had thought of using the strftime method in Python&#8217;s <a title="python's date time module" href="http://docs.python.org/library/datetime.html" target="_blank">datetime</a> module to reformat the date from its current ISO format (YYYY-MM-DD) into day, month and year. Unfortunately, the documentation for the method states that &#8220;years before 1900 cannot be used.&#8221; A slight cramp in the plan, given that the letters are all from the nineteenth century. However, there is an <a title="Andrew Dalke's Activestate date recipe" href="http://code.activestate.com/recipes/306860-proleptic-gregorian-dates-and-strftime-before-1900/" target="_blank">Activestate recipe</a> by Andrew Dalke which might do the trick or at least point me in the right direction. It is one of the things to be tidied up at some point.</p>
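<p>One way around the limit, sketched below, is to skip strftime altogether and build the string by hand. This is a minimal sketch and not the Activestate recipe or the site&#8217;s code: the function name and the English month list are assumptions for illustration, and it relies only on datetime.date accepting years from 1 to 9999.</p>

```python
import datetime

# English month names, indexed from 0 (an assumption; strftime would use the locale).
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def humanise_iso_date(iso_date):
    """Turn 'YYYY-MM-DD' into 'D Month YYYY' without calling strftime."""
    year, month, day = (int(part) for part in iso_date.split("-"))
    # datetime.date validates the values and raises ValueError for a bad date.
    date = datetime.date(year, month, day)
    return "{0} {1} {2}".format(date.day, MONTHS[date.month - 1], date.year)


print(humanise_iso_date("1843-12-19"))  # prints: 19 December 1843
```

<p>Because the string is assembled directly from the day, month and year attributes, the year range of strftime never comes into play.</p>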
<p>It is a good feeling to have the site running now. The next task is to write the tests and then to refactor the code. It is very PHPish and needs to be made more Pythonic. I&#8217;ve got an idea for trying to create a dendrogram around the textReferred element and to discover the letters and correspondents around the books that Dickens was writing. One of the things still to do is to continue loading the other volumes of Dickens&#8217;s letters into the site. So version 0.2 is a little way off, but the light at the end of the tunnel is not a train this time.</p>
<p>Work has been a little hectic. I must make some time to write a method to allow our admin team to resubmit applications. Like so many things, it is a balance between the five-minute jobs and the two-hour ones that need to be done. The major job for the week, though, was getting the automated printing working.</p>
<p>One of the jobs that admin do is to go through each client and create the packs for them. Using HTMLtools, I&#8217;ve managed to compile the HTML into PDF and then convert the PDF into a PostScript file for a printer. I&#8217;ve managed to use the <a title="Line Printer Remote protocol wikipedia page" href="http://en.wikipedia.org/wiki/Line_Printer_Remote" target="_blank">Line Printer Remote</a> protocol to send the job to the printer. It is a simple enough command:</p>
<p>lpr -S &lt;ip address/name of printer&gt; -P &lt;name of print queue&gt; (-o &lt;optional: -o l sets file to binary&gt;) &lt;name of file&gt;</p>
<p>Windows doesn&#8217;t appear to support the full protocol, but it supports enough to be useful. The -o switch appears only to define whether the file is binary or not, rather than specifying the paper type and so on. Annoying, but it can be got around.</p>
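<p>For anyone wanting to script the same call, here is a hedged Python sketch that assembles the lpr arguments and hands them to the command. It assumes the Windows lpr client, whose usage text gives -S for the server, -P for the print queue and -o l (a lower-case L) for binary files such as PostScript. The function names, the address and the queue name are made up for the example.</p>

```python
import subprocess


def build_lpr_command(server, queue, filename, binary=True):
    """Return the lpr argument list for sending filename to a print queue."""
    command = ["lpr", "-S", server, "-P", queue]
    if binary:
        # "-o l" (lower-case L) marks the file as binary in the Windows client.
        command += ["-o", "l"]
    command.append(filename)
    return command


def send_to_printer(server, queue, filename):
    """Run lpr; check_call raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(build_lpr_command(server, queue, filename))


# Hypothetical address, queue and file names, shown without sending anything.
print(" ".join(build_lpr_command("192.0.2.10", "packs", "client.ps")))
```

<p>Keeping the argument list separate from the call makes it easy to check what would be sent before a printer is involved.</p>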
<p>Anyhow, it got me thinking about other ways of using commands to explore how texts can be converted and changed into useful objects. It brings me back to the use of psbook for printing, and to how to make it useful for an average user who does not necessarily want to run various commands. I had a conversation about the future of publishing with my friend Darren Nash, editorial director of Orbit books, who opined that small presses would come to the fore. I think, certainly in genre, that this is correct. It would be interesting to see how existing tools could be used towards these ends rather than constantly re-inventing the wheel.</p>
<p>Now that the first version of letters is out of the way, it is time to go over other projects. I&#8217;ve got a yen to try and create something from Milton&#8217;s <a title="Wikipedia on the Areopagitica" href="http://en.wikipedia.org/wiki/Areopagitica" target="_blank">Areopagitica</a>, which seems appropriate as it is a cry for free presses.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/05/weeknotes-pylons-python-and-printing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Date set for Textcamp</title>
		<link>http://austgate.co.uk/2010/05/date-set-for-textcamp/</link>
		<comments>http://austgate.co.uk/2010/05/date-set-for-textcamp/#comments</comments>
		<pubDate>Wed, 05 May 2010 08:45:26 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[open_literature]]></category>
		<category><![CDATA[textcamp]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=150</guid>
		<description><![CDATA[The provisional date for Textcamp, announced on the Twitter feed, is August 21st.]]></description>
			<content:encoded><![CDATA[<p>The provisional date for <a title="Textcamp website" href="http://textcamp.org/index.php/Main_Page" target="_blank">Textcamp</a>, announced on the <a title="Textcamp twitter feed" href="http://twitter.com/textcamp" target="_blank">Twitter feed</a>, is August 21st.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/05/date-set-for-textcamp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

