<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Aust Gate &#187; Open Knowledge</title>
	<atom:link href="http://austgate.co.uk/category/openknowledge/feed/" rel="self" type="application/rss+xml" />
	<link>http://austgate.co.uk</link>
	<description>Open Knowledge and Literature</description>
	<lastBuildDate>Tue, 08 May 2012 20:33:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Beginning APC with Drupal</title>
		<link>http://austgate.co.uk/2011/10/beginning-apc-with-drupal/</link>
		<comments>http://austgate.co.uk/2011/10/beginning-apc-with-drupal/#comments</comments>
		<pubDate>Mon, 31 Oct 2011 19:18:41 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[drupal]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=406</guid>
		<description><![CDATA[I&#8217;ve been looking at performance in PHP as a side project. I decided to install it on Snow Leopard, having already set up PEAR and PECL. Using pecl install apc, I downloaded APC which appeared to be fine. However when I ran some scripts, I got the error: Fatal error: Unknown: apc_fcntl_unlock failed: in Unknown [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been looking at performance in PHP as a side project. </p>
<p>I decided to install APC on Snow Leopard, having already set up PEAR and PECL. Using pecl install apc, I downloaded APC, which appeared to install cleanly. However, when I ran some scripts, I got the error:<br />
Fatal error: Unknown: apc_fcntl_unlock failed: in Unknown on line 0 </p>
<p>which was definitely not expected.</p>
<p>I found some useful instructions on the PHP bug tracker (https://bugs.php.net/bug.php?id=59750 ). I ran pecl uninstall apc, then installed the beta version, answering &#8220;no&#8221; to pthread mutex locks and &#8220;yes&#8221; to spinlocks. You will also need to add apc.rfc1867 = 1 to the php.ini file for use with Drupal 7.</p>
<p>Having done this, I no longer get any APC errors in the status report. Once I have got my head around Organic Groups, I&#8217;m looking forward to getting into APC and improving Drupal performance. </p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/10/beginning-apc-with-drupal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Working on the Panton Principles for Open Literature and Humanities</title>
		<link>http://austgate.co.uk/2011/10/working-on-the-panton-principles-for-open-literature-and-humanities/</link>
		<comments>http://austgate.co.uk/2011/10/working-on-the-panton-principles-for-open-literature-and-humanities/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 17:38:48 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[open_literature]]></category>
		<category><![CDATA[principles]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=402</guid>
		<description><![CDATA[The, it appears indefatigable, James Harriman-Smith and I, amongst others, had been talking about porting the Panton Principles to Open Literature and Humanities uses. After a Skype call, we created a first draft which is now online on the Open Literature wiki: http://wiki.openliterature.net/Principles and on the Open Literature mailing list. One of the matters that [...]]]></description>
			<content:encoded><![CDATA[<p>The, it appears indefatigable, James Harriman-Smith and I, amongst others, had been talking about porting the <a title="Panton Principles" href="http://pantonprinciples.org/" target="_blank">Panton Principles</a> to Open Literature and Humanities uses. After a Skype call, we created a first draft which is now online on the Open Literature wiki: <a title="Open Literature principles" href="http://wiki.openliterature.net/Principles" target="_blank">http://wiki.openliterature.net/Principles</a> and on the Open Literature mailing list.</p>
<p>One of the matters that did concern us was the word &#8220;data&#8221; and what this might mean to literature and humanities. One assumption that we had was that it perhaps had a more defined meaning to scientists. But what is data to humanities? Is it the manuscript, the notes, or the published work? We decided that &#8216;Work&#8217; might be a better word for the overarching principle.</p>
<p>One important issue is re-use, and in particular the risk that a re-used work is subsequently closed down and made non-open. The major party that we had in mind was Google Books. Whilst they are making good and admirable strides in digitising out-of-print works, there is no API or metadata store that can be used to mix up the data or to mine it in any other way. Effectively we end up where we started: with a technically open text tied up in ways that prevent re-use.</p>
<p>Re-use and re-mix are extremely important within digital humanities. Influence and building on earlier works are central to movements like Modernism, and also to ensuring that works and authors remain accessible. Works are adapted and take on their own lives or segue from such moments.</p>
<p>The final major point was that citations and the underlying cited text should be open. Whilst the core of the principles is about the work and ensuring that it can be worked on, a fair amount of work goes into notes and annotations to the text (made with tools such as the great <a title="Annotation tool" href="http://www.annotateit.org" target="_blank">Annotate It</a>) and these provide a meta work for people to build on. It is vital for debate that these are not put into a closed arena, not just for the sharing of notes but also for building on them. They might also be gathered into a new work, or into an annotated edition that builds upon the original with communal notes.</p>
<p>This does represent a step forward in open literature and digital humanities. I really hope that debate does start and that these principles can be developed and made concrete.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/10/working-on-the-panton-principles-for-open-literature-and-humanities/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The shameful jailing of our cultural heritage</title>
		<link>http://austgate.co.uk/2011/09/the-shameful-jailing-of-our-cultural-heritage/</link>
		<comments>http://austgate.co.uk/2011/09/the-shameful-jailing-of-our-cultural-heritage/#comments</comments>
		<pubDate>Sun, 04 Sep 2011 16:01:30 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=395</guid>
		<description><![CDATA[Having had some fun and games restoring my laptop after the combination of Norton AntiVirus and Windows decided to lock up completely, I&#8217;ve just re-installed Ubuntu so apologies if you are waiting for anything from me. I&#8217;ve just come across this post from Philippe Aigrain on his blog (originally linked from OKF&#8217;s Open Humanities mailing [...]]]></description>
			<content:encoded><![CDATA[<p>Having had some fun and games restoring my laptop after the combination of Norton AntiVirus and Windows decided to lock up completely, I&#8217;ve just re-installed Ubuntu so apologies if you are waiting for anything from me.</p>
<p>I&#8217;ve just come across this post from Philippe Aigrain on his blog (originally linked from OKF&#8217;s Open Humanities mailing list) regarding the <a title="Phillipe Agrain on the British Library and Google" href="http://paigrain.debatpublic.net/?p=3448&amp;lang=en" target="_blank">recent agreement signed by the British Library and Google</a>. The upshot appears to be that the present and future rights to public domain works have been given to Google. Sorry, public domain should stay in the public domain.</p>
<p>I was at a conference in Oxford a couple of years ago where some of these issues were being discussed, and it sounded like the Library was having to do these sorts of deals with Google (and, I believe at the time, Microsoft &#8211; who pulled out eventually) because of the costs and technical expertise needed. I&#8217;m pretty sure that these could be sourced or bodged in Britain and Europe if the will is there. I rather fear that, in the UK at least, we do not have the will, particularly at the moment. Yet again, we might be echoing Eric Schmidt&#8217;s point made in the recent <a title="Eric Schmidt's MacTaggart speech" href="http://www.guardian.co.uk/media/interactive/2011/aug/26/eric-schmidt-mactaggart-lecture-full-text" target="_blank">MacTaggart speech</a> at the 2011 Edinburgh TV festival: that the UK helped invent so many things and then failed to follow through. Rather than being in the technical sphere, we&#8217;re now doing this in the cultural sphere &#8211; and this surely cannot be right, can it?</p>
<p>The Open Knowledge Foundation, with Creative Commons and Centrum Cyfrowe, are <a title="OKF on GLAM workshop" href="http://blog.okfn.org/2011/09/03/open-glam-workshop-warsaw-15th-september-2011/" target="_blank">organising a workshop on opening cultural, library and museum metadata</a>.</p>
<p>One hopes that the British Library, or any of our cultural mavens, will be going. We have a rich cultural heritage; we cannot have it locked away in certain companies.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/09/the-shameful-jailing-of-our-cultural-heritage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thinking about texts and communities at Textcamp</title>
		<link>http://austgate.co.uk/2011/08/thinking-about-texts-and-communities-at-textcamp/</link>
		<comments>http://austgate.co.uk/2011/08/thinking-about-texts-and-communities-at-textcamp/#comments</comments>
		<pubDate>Sun, 14 Aug 2011 12:33:01 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[open_literature]]></category>
		<category><![CDATA[textcamp]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=378</guid>
		<description><![CDATA[Having gone to Textcamp yesterday, I started playing with Wordle and IBM&#8217;s Many Eyes at the suggestion of Dave Flanders of the JISC. As James Harriman-Smith, the organiser and Open Literature co-ordinator for the Open Knowledge Foundation, had suggested that this year is the anniversary of the manuscript of Alexander Pope&#8217;s An Essay on Criticism, [...]]]></description>
			<content:encoded><![CDATA[<p>Having gone to <a title="Textcamp on Open Literature" href="http://wiki.openliterature.net/Text_Camp_2011" target="_blank">Textcamp</a> yesterday, I started playing with Wordle and IBM&#8217;s Many Eyes at the suggestion of <a title="David Flanders JISC staff page" href="http://www.jisc.ac.uk/contactus/staff/davidfflanders" target="_blank">Dave Flanders</a> of the <a title="JISC website" href="http://www.jisc.ac.uk/" target="_blank">JISC</a>. As <a title="James Harriman-Smith's OKF page" href="http://okfn.org/members/jameshs/" target="_blank">James Harriman-Smith</a>, the organiser and Open Literature co-ordinator for the Open Knowledge Foundation, had suggested that this year is the anniversary of the manuscript of <a title="Wikipedia on Alexander Pope" href="http://en.wikipedia.org/wiki/Alexander_Pope" target="_blank">Alexander Pope</a>&#8217;s <a title="Wikipedia on Essay on Criticism" href="http://en.wikipedia.org/wiki/An_Essay_on_Criticism" target="_blank">An Essay on Criticism</a>, I popped the Gutenberg text into Wordle to see what it <a title="Wordle on Pope's Essay in Criticsm" href="http://www.wordle.net/show/wrdl/3912697/Essay_in_Criticism" target="_blank">shows as a tag cloud</a>. <a title="Wordle: Essay in Criticism" href="http://www.wordle.net/show/wrdl/3912697/Essay_in_Criticism"><img style="padding: 4px; border: 1px solid #ddd;" src="http://www.wordle.net/thumb/wrdl/3912697/Essay_in_Criticism" alt="Wordle: Essay in Criticism" align="left" /></a> The dominance of wit is not a surprise as Wit in poetry was a prized quality for Pope and Dryden. There are some small issues, such as &#8216;still&#8217; and &#8216;Still&#8217; being counted separately. Perhaps this could be rectified by making everything lower case, but that presents other issues if two words are similar but the capitalisation suggests a different intonation. 
As I&#8217;ve <a title="Post on Word clouds" href="http://austgate.co.uk/2010/10/tagging-the-revolution-exploring-edmund-burkes-reflections-on-the-revolution-in-france/" target="_blank">blogged before</a>, word clouds are great, but not if they don&#8217;t link to anything, so at some point in the future I&#8217;ll sit down and actually upload a table to create a useful tag cloud. John Levin, of <a title="James Levin's blog onAnterotesis on Ecco" href="http://anterotesis.com/wordpress/2011/08/making-the-tcp-ecco-texts-accessible/" target="_blank">Anterotesis</a>, loaded a CSV file of the recently released ECCO files. He loaded Volume Four of Defoe&#8217;s Tour of the Whole Island of Great Britain, which features Scotland.</p>
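<p>The &#8216;still&#8217; / &#8216;Still&#8217; question above can be tried out before anything goes into Wordle. What follows is a minimal Python sketch, not what Wordle itself does: the word_counts helper and the sample sentence are made up for illustration, and the tokenising regular expression is a deliberately crude assumption.</p>
<pre>
```python
import re
from collections import Counter

def word_counts(text, fold_case=True):
    """Count words, optionally folding case so 'Still' and 'still' merge."""
    words = re.findall(r"[A-Za-z']+", text)
    if fold_case:
        words = [w.lower() for w in words]
    return Counter(words)

# Illustrative sample only, not a quotation from the Gutenberg text.
sample = "Still the wit is Wit, and still the critic is a critic."

folded = word_counts(sample)
unfolded = word_counts(sample, fold_case=False)

# Compare how often 'still' is seen with and without case folding.
print(folded["still"], unfolded["still"], unfolded["Still"])
```
</pre>
<p>Keeping both sets of counts makes it possible to fold case for the cloud while still checking where capitalisation might matter.</p>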
<div id="attachment_383" class="wp-caption alignleft" style="width: 190px"><a href="http://austgate.co.uk/wp-content/uploads/2011/08/oenvq.jpg"><img class="size-medium wp-image-383" title="Wordcloud of Defoe's journey" src="http://austgate.co.uk/wp-content/uploads/2011/08/oenvq-180x300.jpg" alt="Wordcloud of Defoe's journey taken at Textcamp by Dave Flanders" width="180" height="300" /></a><p class="wp-caption-text">Wordcloud of Defoe&#39;s journey taken at Textcamp</p></div>
<p>Using the Many Eyes Word Cloud, we can see that Scotland is unsurprisingly the largest item but Lord and Earl are also popular, suggesting that he stopped with or met the aristocracy rather than just travelling randomly. Dave Flanders and John created some cool visualisations using the tool which allow you to follow words in the text and to see, in a tree fashion, which words are most often linked to them (using bigrams I would suppose). It is certainly something that I will be looking up later for &#8220;quick win&#8221; visualisations.</p>
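<p>My supposition about bigrams can be illustrated with a short Python sketch. It is only a guess at the idea and not how Many Eyes actually builds its tree; the follower_counts helper and the sample sentence are invented for the example.</p>
<pre>
```python
import re
from collections import Counter, defaultdict

def follower_counts(text):
    """Map each word to a Counter of the words that immediately follow it."""
    words = re.findall(r"[a-z']+", text.lower())
    followers = defaultdict(Counter)
    # Pair every word with its successor: these pairs are the bigrams.
    for first, second in zip(words, words[1:]):
        followers[first][second] += 1
    return followers

# Made-up sample text, not taken from Defoe.
sample = "the earl of mar met the lord of the isles and the earl of fife"
followers = follower_counts(sample)

# The most frequent follower of a word is the thickest branch of its tree.
print(followers["the"].most_common(1))
print(followers["earl"].most_common(1))
```
</pre>
<p>Each word&#8217;s counter is one level of the tree; following the most common follower repeatedly walks down a branch.</p>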
<p>One of the intriguing projects that was suggested was building our own DIY bookscanner using links currently stored on the <a title="DIY Bookscanner" href="http://wiki.openliterature.net/Tcamp11/DIYD" target="_blank">Textcamp 2011 wiki pages</a>. I think that Dave Flanders might be organising a hack weekend to actually build the machine for real use. I find it interesting, and think that it would also be cool to see if one can be built at home or using iPhone / Android OSes, which also entails a software hack unless an app already exists. That is something to explore later.</p>
<p>Mark MacGillivray, of OKFN and <a title="Cottage Labs" href="http://cottagelabs.com/" target="_blank">Cottage Labs</a>, and Brian Hole of <a title="Ubiquity Press" href="http://www.ubiquitypress.com/" target="_blank">Ubiquity Press</a>, spoke about Open Access and making scholarship open while retaining its rigour. Using Open Access, we should be able to share the data, the ways of interpreting it and the final interpretation which is published.</p>
<p>The science community has been doing this for some while and things like the Panton Principles and Science Commons are showing the way. One of the ideas was to write a handbook on how to use openness in literature, which is something that we need to address and build on. We ought to write an open guide / manual and build on / develop the Panton Principles where necessary as a core set of principles to work with.</p>
<p>Days like Textcamp and Book Hackday are extremely useful for thinking about this and working on the ideas. It is easy to get into echo chambers of mailing lists and blogs; we need these events to meet new people, be challenged to explain ourselves and to either build on the day or go away with ideas to test and try out. The day has got me excited about using word clouds again and doing a bit more work to make them a useful tool. It has also got me excited about book scanning and doing some hardware hacking (which I&#8217;ve not really done before).</p>
<p>Running the Pope essay through Wordle makes me excited about testing what we can do with the ECCO TEI documents that John Levin links to. Can we hyperlink to other texts, authors and events that are mentioned in it (not just with the annotator tool but in generated HTML), or use HTML 5 to embed audio links to further discussions or pronunciation (for example Byron&#8217;s Don Juan, which has been argued to be pronounced &#8220;Jew-an&#8221; rather than &#8220;Hwan&#8221;, and the arguments for and against)?</p>
<p>Perhaps that gets to one of the issues that arose in the break-out discussions in the kitchen. After the lightning talk about digital publishing, there seemed to be an argument about whether current digital publishing was really pushing the boundaries or flailing around. I do think that it has some real benefits for niche publishing but these have not been fully explored. The model will need to change and perhaps become more open in those senses, perhaps linking the raw data to the interpretation earlier to allow the relevant community to peer review the data earlier. Just a suggestion. There are two distinct communities: the top-down business layer and the grass roots layer of activists, data developers and so on. Both would appear to have broadly similar aims; the question is how to put them together in a way that lets both learn. Don&#8217;t get me wrong here, as I believe I&#8217;m at the grass roots layer, but I think that both sides could have a dialogue which gets around the issues that the music and film industries have found themselves in, i.e. confrontation. We are here to disrupt and make because we are passionate. We care about the industry. Publishing is an industry which needs to change and transform itself. Put the two together and there are ways of moving forward. My hope is that we can get some more publishers along to future events.</p>
<p>The other important thing is that these conversations carry on afterwards. The round table discussions were great, as were the break-out ones in the kitchen, but they need to carry on or we create our own echo chamber, which reduces the value of what happened yesterday.</p>
<p>Whilst I did not do as much coding as I wanted to yesterday, I met some new people and caught up with colleagues. The fact that organisations such as JISC are supporting events like this shows their underlying importance and use to the community. We&#8217;ve started, now we need to carry on by chatting, blogging, sharing and doing more of these events.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/08/thinking-about-texts-and-communities-at-textcamp/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Open Correspondence toolkit and converting XML into JSON</title>
		<link>http://austgate.co.uk/2011/05/weeknotes-open-correspondence-toolkit-and-converting-xml-into-json/</link>
		<comments>http://austgate.co.uk/2011/05/weeknotes-open-correspondence-toolkit-and-converting-xml-into-json/#comments</comments>
		<pubDate>Thu, 26 May 2011 19:25:47 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[weeknotes]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=342</guid>
		<description><![CDATA[I&#8217;ve been quiet for a bit though generally because I&#8217;ve been quite busy on projects and exploring ideas. After Book Hackday, I&#8217;ve written a post about beginning to develop the Open Correspondence toolkit for the Open Knowledge Foundation&#8217;s Notebook blog. I was also contacted regarding converting the TEI XML pages into JSON, which I am [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been quiet for a bit, mostly because I&#8217;ve been quite busy on projects and exploring ideas.</p>
<p>After Book Hackday, I&#8217;ve written a post about <a title="Open Correspondence toolkit" href="http://notebook.okfn.org/2011/05/25/mining-the-personal-using-open-correspondence-to-explore-correspondents/" target="_blank">beginning to develop the Open Correspondence toolkit</a> for the Open Knowledge Foundation&#8217;s Notebook blog. I was also contacted regarding converting the TEI XML pages into JSON, which I am currently working on.</p>
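<p>As a rough idea of the shape such a conversion could take, here is a minimal Python sketch using only the standard library. It is an assumption about one possible approach, not the Open Correspondence code: the element_to_dict helper and the tiny letter are hypothetical, and mixed content (text that follows a child element) is ignored for brevity.</p>
<pre>
```python
import json
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # TEI P5 namespace

def tag(name):
    """Return a namespace-qualified tag name in ElementTree's {uri}name form."""
    return "{%s}%s" % (TEI_NS, name)

def element_to_dict(elem):
    """Recursively turn an element into a plain dict that json can serialise."""
    node = {"tag": elem.tag.split("}")[-1]}  # drop the namespace prefix
    if elem.attrib:
        node["attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node

# Hypothetical letter, built in code so the example stays self-contained.
# A real document would be loaded with ET.parse(path).getroot().
body = ET.Element(tag("body"))
letter = ET.SubElement(body, tag("div"), {"type": "letter"})
ET.SubElement(letter, tag("opener")).text = "My dear Sir,"
ET.SubElement(letter, tag("p")).text = "An invented sentence standing in for the letter text."

letter_json = json.dumps(element_to_dict(body), indent=2)
print(letter_json)
```
</pre>
<p>Keeping the element names as values rather than keys means repeated elements, such as several paragraphs, survive the conversion in document order.</p>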
<p>Once I&#8217;ve done some more work on it, I&#8217;ll post the code and more about it.</p>
<p>I&#8217;ve been working on another project which may or may not be open. It is certainly interesting but I am not sure I can say much more than that. I hope to have a blog post up soon about it but I am rather excited by it and its possibilities.</p>
<p>Meanwhile, the work project continues apace with some surprising outcomes for me. After watching a video on Facebook&#8217;s architecture, I&#8217;m beginning to see certain parts very differently. I really do hope to write more on this but I&#8217;ve got some building to do, and a bit more delving and reading to complete.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/05/weeknotes-open-correspondence-toolkit-and-converting-xml-into-json/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Book Hackday and using Node with Redis</title>
		<link>http://austgate.co.uk/2011/05/book-hackday-and-using-node-with-redis/</link>
		<comments>http://austgate.co.uk/2011/05/book-hackday-and-using-node-with-redis/#comments</comments>
		<pubDate>Wed, 11 May 2011 20:18:15 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[node.js]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[redis]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=333</guid>
		<description><![CDATA[I&#8217;ve bumped into Marcus Povey in a few places but the last time was at the Oxford Geek Night. He kindly pointed me in the direction of Paul Squires of Perini when he heard that I wanted to organise a text hacking day. Paul is one of the people behind Book Hackday this coming Saturday [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve bumped into <a title="Marcus Povey's site" href="http://www.marcus-povey.co.uk/" target="_blank">Marcus Povey</a> in a few places but the last time was at the Oxford Geek Night. He kindly pointed me in the direction of Paul Squires of <a title="Perini Media" href="http://www.pereramedia.com/about-perera/about-us" target="_blank">Perini</a> when he heard that I wanted to organise a text hacking day. Paul is one of the people behind <a title="Book Hackday site" href="http://www.bookhackday.com" target="_blank">Book Hackday</a> this coming Saturday, which is now backed up with the <a title="Book Hackers site" href="http://www.bookhackers.com" target="_blank">BookHackers</a> site. So there is no need to organise much more than bringing a brain and some interest, and being at the <a title="Free Word centre" href="http://www.freewordonline.com/" target="_blank">Free Word</a> centre for 10am (it lasts until 8pm).</p>
<p>I&#8217;ve just spent the evening using <a title="Node JS website" href="http://nodejs.org" target="_blank">Node.js</a> and <a title="Redis website" href="http://redis.io" target="_blank">Redis</a>, making a change from usually using Redis with PHP or occasionally Python. I&#8217;ve been using Redis for logging work and messaging but have been intrigued by the idea of having a system that constantly pushes out annotations as they are added to a document shared between distributed editors. The idea is that these would all be stored in Redis and pushed out to users of a particular document.</p>
<p>Node strikes me as a good way of creating an efficient back-end system by polling Redis (or perhaps using Redis&#8217;s Pub/Sub architecture to achieve a similar end). I would need to look at whether the Publish command would save the data though.</p>
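<p>On the question of whether Publish saves the data: in Redis, PUBLISH only delivers a message to clients subscribed at that moment and does not store it, so anything that has to survive needs writing to a key as well. The Python sketch below illustrates that split with a small in-memory stand-in rather than a real Redis connection; the FakeRedis class, the add_annotation helper and the key names are all hypothetical, and only the command names rpush, lrange and publish mirror real Redis commands.</p>
<pre>
```python
from collections import defaultdict

class FakeRedis:
    """In-memory stand-in exposing the three Redis commands used below."""
    def __init__(self):
        self.lists = defaultdict(list)
        self.subscribers = defaultdict(list)

    def rpush(self, key, value):
        self.lists[key].append(value)
        return len(self.lists[key])

    def lrange(self, key, start, end):
        # Simplified: end is inclusive and only -1 is handled as "last item".
        items = self.lists[key]
        stop = len(items) if end == -1 else end + 1
        return items[start:stop]

    def publish(self, channel, message):
        # Delivered to current subscribers only; nothing is kept.
        for callback in self.subscribers[channel]:
            callback(message)
        return len(self.subscribers[channel])

def add_annotation(client, doc_id, note):
    """Store the note first, then announce it to whoever is listening."""
    client.rpush("annotations:" + doc_id, note)
    client.publish("annotations:" + doc_id, note)

client = FakeRedis()
received = []
add_annotation(client, "doc1", "note written before anyone subscribed")
client.subscribers["annotations:doc1"].append(received.append)
add_annotation(client, "doc1", "note written after subscribing")

print(received)                                   # only what was heard live
print(client.lrange("annotations:doc1", 0, -1))  # everything that was stored
```
</pre>
<p>A late-joining editor would read the stored list once to catch up and then rely on the published messages, which keeps polling to a minimum.</p>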
<p>From this evening&#8217;s hack, I&#8217;ve got the polling from the server working and need to separate out the pushing into the store from the polling to retrieve all notes/comments/et al. I suppose the next thing after that is to pop in an HTML page or template for a front end.</p>
<p>I&#8217;ll do some more tomorrow but I&#8217;m making headway with Node which is cool as I&#8217;ve been trying to find something meatier to do with it before introducing it into work, even as a development project.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/05/book-hackday-and-using-node-with-redis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Marking up Open Correspondence with TEI XML</title>
		<link>http://austgate.co.uk/2011/03/marking-up-open-correspondence-with-tei-xml/</link>
		<comments>http://austgate.co.uk/2011/03/marking-up-open-correspondence-with-tei-xml/#comments</comments>
		<pubDate>Sun, 20 Mar 2011 11:03:26 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[tei]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=303</guid>
		<description><![CDATA[As part of the next version of Open Correspondence, I&#8217;ve been working on the XML and JSON mark-up. As part of the XML, I&#8217;ve been using the TEI mark-up for the letters. I once heard this described as &#8220;XML for people who don&#8217;t think XML is flexible enough&#8221;. Now I can see why. It is [...]]]></description>
			<content:encoded><![CDATA[<p>As part of the next version of <a title="Open Correspondence site" href="http://www.opencorrespondence.org" target="_blank">Open Correspondence</a>, I&#8217;ve been working on the XML and JSON mark-up.</p>
<p>As part of the XML, I&#8217;ve been using the <a title="TEI P5 XML mark-up" href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DS.html" target="_blank">TEI mark-up</a> for the letters. I once heard this described as &#8220;XML for people who don&#8217;t think XML is flexible enough&#8221;. Now I can see why. It is a highly flexible solution to digitising texts but can be confusing, especially when switching between versions. I believe the original model that I had been working on was P4 but the current one is P5, so I had to negotiate that change and make sure that I had the correct elements in the blocks. Even then, there can be two or three different versions of the same element in a section, and I do have to wonder about the wisdom of that rather than simplifying to a smaller set of extensible elements that may or may not be used. I&#8217;m intending to use the schema again and to really get my head around it rather than tinkering at the edges.</p>
<p>I&#8217;ve attempted this conversion before but think that I&#8217;ve finally got it to a point which is nearly there. What I would really like to do is to put together some sort of tool kit as a core to the Open Correspondence project. Clearly this would be a long-term project and would need more research but it might be useful to other projects.</p>
<p>As well as marking up texts, it would be useful to use the XML mark-up to convert the text into other formats such as Mobipocket or the Kindle formats to allow a user to create their own e-publication. It would also be useful to find a way of using the XML in conjunction with the <a title="psbook command pages" href="http://www.tardis.ed.ac.uk/~ajcd/psutils/psbook.html" target="_blank">psbook</a> command to create a print version of a letter or collection. This does mean that I need to convert the XML into a PostScript file (which raises a host of questions at the moment &#8211; such as converting structured format into layout format) and then print it.</p>
<p>I&#8217;ve also been playing around with the correspondent collections and the way of marking up collections in TEI. I had thought of this as working on creating printable collections and making the data re-usable for printing. Equally it might allow the data to be used in answer to Jonathan Gray&#8217;s question regarding identifying the letters written to a particular correspondent.</p>
<p>When I can get the XML working and validated, then I&#8217;ll look at the JSON output. It would draw a line under this part of the project and allow me to move on. I&#8217;m aiming for a release towards the end of March or middle of April, in keeping with trying to stick to a six-week schedule.</p>
<p>The next thing after that is to begin answering Jonathan&#8217;s questions in terms of a tool kit to identify weaknesses and to try and write some code to re-use and re-mix the data. I would hope that would be in the next release towards the end of May.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/03/marking-up-open-correspondence-with-tei-xml/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding and mapping influences</title>
		<link>http://austgate.co.uk/2011/03/finding-and-mapping-influences/</link>
		<comments>http://austgate.co.uk/2011/03/finding-and-mapping-influences/#comments</comments>
		<pubDate>Wed, 16 Mar 2011 18:49:12 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[letters]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[rdf]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=307</guid>
		<description><![CDATA[The awesome Jonathan Gray posted an intriguing question on his blog about mapping influence in intellectual history. What he is trying to do is to map the possible routes of influence between people. In his case, it is philosophers; in mine, authors. One of the driving ideas behind the Open Correspondence RDF was to begin [...]]]></description>
			<content:encoded><![CDATA[<p>The awesome Jonathan Gray posted an intriguing question on his blog about <a title="Jonathan Gray on mapping intellectual history and influence" href="http://jonathangray.org/2011/02/20/who-read-what-mapping-influence-in-intellectual-history/" target="_blank">mapping influence in intellectual history</a>. What he is trying to do is to map the possible routes of influence between people. In his case, it is philosophers; in mine, authors.</p>
<p>One of the driving ideas behind the <a title="Open Correspondence RDF schema" href="http://www.opencorrespondence.org/schema" target="_blank">Open Correspondence RDF</a> was to begin identifying the people to whom Dickens wrote about books. Out of this I would like to create some visualisations of the data. You could possibly do this for the places, for example track his letters for one of the US tours.</p>
<p>But back to the original question. I believe this can be done (as I&#8217;ve been working on the XML issues) using Python&#8217;s rdflib. The major issue would be to get this working across versions 2.4 and 3 so that any released code would be cross-platform.</p>
<p>Jonathan: as an open call, I&#8217;d love to work with you on this.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/03/finding-and-mapping-influences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adding linguistic interfaces to Open Correspondence</title>
		<link>http://austgate.co.uk/2011/03/adding-linguistic-interfaces-to-open-correspondence/</link>
		<comments>http://austgate.co.uk/2011/03/adding-linguistic-interfaces-to-open-correspondence/#comments</comments>
		<pubDate>Wed, 09 Mar 2011 11:18:59 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[open_correspondence]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=301</guid>
		<description><![CDATA[I&#8217;ve been playing around with the Python NLTK package, in particular the WordNet interface. WordNet is hosted by Princeton University. I mentioned that I was going to look at this and at the idea of allowing a search for the lemmas of a word. It came about from a question posed on the Open Literature mailing list regarding [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been playing around with the Python <a title="Python NLTK package website" href="http://www.nltk.org/" target="_blank">NLTK</a> package, in particular the <a title="NLTK WordNet interface" href="http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html" target="_blank">WordNet interface</a>. <a title="WordNet lexical database" href="http://wordnet.princeton.edu/" target="_blank">WordNet</a> is hosted by Princeton University. I mentioned that I was going to look at this and at the idea of allowing a search for the lemmas of a word. It came about from a question posed on the Open Literature mailing list regarding whether you could search it with lemmas.</p>
<p>Xapian does word stemming but not lemmatisation, which is slightly different. In stemming, the word production would appear as something like produc*, since produc is treated as the base of the word. However, produc is not a word in its own right. The dictionary form is produce, which is what the WordNet lemma returns.</p>
<p>Using the API notes, I&#8217;ve been playing around with the following:</p>
<blockquote><p>from nltk.corpus import wordnet as wn</p>
<p># author holds the word to look up; the set discards duplicate lemma names<br />
word_lem = set(lemma.name for i in wn.synsets(author) for lemma in i.lemmas)</p>
<p>ret_lem = list(word_lem)</p></blockquote>
<p>Having used a set to remove any duplicates, I can return the list of the lemmas that WordNet gives. You have to go through a <a title="Wikipedia on Synsets" href="http://en.wikipedia.org/wiki/Synsets" target="_blank">Synset</a> if you don&#8217;t know the exact part of speech of a word (verb, adverb, adjective or noun), since the lemma constructor requires it to produce the lemma. That&#8217;s fine, and I can return the names using lemma.name. However, the part of speech is held in the synset and I&#8217;m not sure how to retrieve it. It would be useful to send back so that a user can see the part of speech and decide whether the lemma is of interest.</p>
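<p>One possible way to recover the part of speech is to record each synset&#8217;s tag next to the lemma names it yields. The sketch below is written against the method-style calls of NLTK 3 (synset.lemmas(), synset.pos() and lemma.name()), which is an assumption about the installed version; older releases exposed these as plain attributes. The function name lemmas_with_pos and the FakeLemma and FakeSynset classes are hypothetical, the latter two existing only so that the sketch runs without the WordNet corpus installed.</p>
<pre>
```python
from collections import defaultdict


def lemmas_with_pos(synsets):
    """Map each lemma name to the sorted parts of speech it appears under."""
    found = defaultdict(set)
    for synset in synsets:
        # Assumes the NLTK 3 style, where these are method calls.
        for lemma in synset.lemmas():
            found[lemma.name()].add(synset.pos())
    return {name: sorted(tags) for name, tags in found.items()}


# Hypothetical stand-ins that mimic the three calls used above.
class FakeLemma:
    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class FakeSynset:
    def __init__(self, pos, names):
        self._pos = pos
        self._lemmas = [FakeLemma(n) for n in names]

    def pos(self):
        return self._pos

    def lemmas(self):
        return self._lemmas


# Invented sample data: one noun synset and one verb synset.
sample = [FakeSynset("n", ["writer", "author"]), FakeSynset("v", ["author"])]
result = lemmas_with_pos(sample)
print(result)
```
</pre>
<p>With NLTK and its WordNet corpus installed, the same function should accept the real objects, for example lemmas_with_pos(wn.synsets(author)) after the import shown earlier. Keeping the tags in a set per lemma means a word that is both a noun and a verb is reported once with both tags.</p>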
<p>In the first instance though, I can return the related synsets to the user through an API, yet to be written, and link them to the Xapian search so that they can search for the term if it is of interest. This begins to open up the letters as a linguistic dataset, since the tone and language might vary across the letters depending on the correspondent. One would expect Dickens&#8217;s letters to his family to be less formal than those to a business colleague or fellow author. I&#8217;m aiming to have an early draft up shortly with some improved XML and JSON handling for the individual letters.</p>
<p>Given that I really did not do that well in the linguistics module at the University of Leicester, I&#8217;m surprised that this is the first API module to be developed. It makes sense, though I need to find a way of getting back to the original purpose of the site.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/03/adding-linguistic-interfaces-to-open-correspondence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weeknotes: Open Correspondence updates</title>
		<link>http://austgate.co.uk/2011/03/weeknotes-open-correspondence-updates/</link>
		<comments>http://austgate.co.uk/2011/03/weeknotes-open-correspondence-updates/#comments</comments>
		<pubDate>Tue, 08 Mar 2011 10:01:37 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[weeknotes]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[timelines]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=298</guid>
		<description><![CDATA[I&#8217;ve bitten the bullet and done it. I&#8217;ve uploaded the current changes to the Open Correspondence site. The current changes are: additional fields in the RDF endpoint.  I still need to do some major work to JSON and XML which I hope to do for the next update. a basic text search a basic set [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve bitten the bullet and done it. I&#8217;ve uploaded the current changes to the Open Correspondence site.</p>
<p>The current changes are:</p>
<ul>
<li>additional fields in the RDF endpoint. I still need to do some major work to JSON and XML which I hope to do for the next update.</li>
<li>a basic text search</li>
<li>a basic set of geographic data in the collection</li>
<li>better linking from the letters to the correspondent and geographical data (NB it is still incomplete)</li>
<li>some mapping with <a title="Open Layers Javascript mapping website" href="http://openlayers.org/" target="_blank">Open Layers</a> JavaScript.</li>
<li>a <a title="Simile timeline " href="http://www.simile-widgets.org/timeline/" target="_blank">Simile</a> timeline (which is a bit slow at the moment).</li>
</ul>
<p>Admittedly, some of this exposes work that was already there but hidden. However, I&#8217;ve also been working on some Unicode fixes to the underlying XML used by the project, which has meant rebuilding the tables and the Xapian indexes.</p>
<p>Following a request on the Open Literature mailing list, I&#8217;m looking at the idea of using Python&#8217;s <a title="Python Natural Language Toolkit" href="http://www.nltk.org/" target="_blank">NLTK</a> to create some linguistic API wrappers around the Xapian search. It strikes me that these letters can be used to create a corpus of Dickens&#8217;s language where you can explore the language used in family correspondence (to his daughters and wife), to other authors (Wilkie Collins) and to readers. That is a longer project though in terms of building the relevant indexes.</p>
<p>I&#8217;m also looking at the idea of clustering a collection of letters to a correspondent and seeing what happens (for some reason, the current script is looking at Wilkie Collins). There is also a set of queries that one might run against letters discussing books and their publication dates to view the distribution. I&#8217;m working on these latter questions at the moment, with release intended for later this week, though I do foresee it being delayed a while.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2011/03/weeknotes-open-correspondence-updates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

