<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Aust Gate &#187; open_bibliography</title>
	<atom:link href="http://austgate.co.uk/tags/open_bibliography/feed/" rel="self" type="application/rss+xml" />
	<link>http://austgate.co.uk</link>
	<description>Open Knowledge and Literature</description>
	<lastBuildDate>Tue, 08 May 2012 20:33:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Weeknotes: Data mining, XML and bibliographies</title>
		<link>http://austgate.co.uk/2010/05/weeknotes-data-mining-xml-and-bibliographies/</link>
		<comments>http://austgate.co.uk/2010/05/weeknotes-data-mining-xml-and-bibliographies/#comments</comments>
		<pubDate>Sun, 23 May 2010 10:57:25 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[open_bibliography]]></category>
		<category><![CDATA[open_correspondence]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[redis]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=155</guid>
		<description><![CDATA[It seems to have been a week of frantic completion and refactoring. The first half was spent frantically converting HTML pages into PDFs using VeryPDF&#8217;s HTMLtools server product. All in all the manual is very helpful and the test server could be set up quickly. It might have helped the other end if I&#8217;d [...]]]></description>
			<content:encoded><![CDATA[<p>It seems to have been a week of frantic completion and refactoring.</p>
<p>The first half was spent frantically converting HTML pages into PDFs using VeryPDF&#8217;s <a title="VeryPDF htmltools command line manual" href="http://www.verypdf.com/htmltools/html-tools.html" target="_blank">HTMLtools</a> server product. All in all the manual is very helpful and the test server could be set up quickly. It might have helped the other end if I&#8217;d remembered to break the file up for printing, but that turned out to be a 10-minute job to put back into production. The next task is to transfer it from the test server onto the production one, but that&#8217;ll need to wait for networking to tweak it a little.</p>
<p>I spent some time refactoring the call recordings archive. For some reason the archiving solution that I hacked up in November decided to start failing in March after it was changed. Despite being put back to its original state, it never quite got back to working as it did. I&#8217;ve been trying to tweak it on and off but never found the time to complete it. I finally just made the time on Friday afternoon to look at it properly. I&#8217;d been thinking about item-based filtering after reading the first chapter of Toby Segaran&#8217;s <a title="OReilly page for Programming Collective Intelligence" href="http://oreilly.com/catalog/9780596529321/" target="_blank">Programming Collective Intelligence</a>. (On the back of this, I think I&#8217;ll be buying his <a title="O'Reilly page for Beautiful Data" href="http://oreilly.com/catalog/9780596157128/" target="_blank">Beautiful Data</a> at some point.) Although this is not really an intelligent programme as such, the techniques have shown some real promise in the hurried tests. Using a Redis datastore, the percentage of found recordings is way up. Fingers crossed for Monday morning, when I can see how the scripts ran over the weekend. I also spent some time simplifying the matching algorithm so that I didn&#8217;t have to account for so many edge cases when dealing with time.</p>
<p>It seems that we are approaching some sort of real-time status update system at work. I&#8217;ve sort of been arguing for this for a while to remove the bottlenecks of having each system dependent on another one. One of our suppliers is sending us XML data so I&#8217;ve been playing with XPath 1.0 to extract the relevant values (XPath 2.0 apparently isn&#8217;t directly supported by PHP; there might be a way of passing the data to Java, but that adds unnecessary overhead). Anyhow, the core is running but I still need to fully test it and add in security.</p>
<p>I&#8217;ve also been asked to design and implement a queueing system for the main internal server. I&#8217;ve run up a quick high-level overview but the detail still needs to be worked on. I&#8217;m pushing it back to June so that I can clear the decks of the older projects that are still on the board.</p>
<p>I had a chat with <a title="Jonathan Gray's blog" href="http://jonathangray.org/" target="_blank">Jonathan Gray</a>, a sound guy who does far too much, about digital humanities ideas. We&#8217;ve agreed to keep in closer contact with each other about the area and to encourage each other into actually doing stuff (I have half a Moleskine of ideas &#8211; time for more code, less talk then). He proposed the <a title="Jonathan Gray on Bibliographica" href="http://austgate.co.uk/2010/01/bibliographica-open-bibliographic-sourcing-and-maintenance/" target="_blank">Bibliographica idea</a> in January and the team wrote <a title="Bibliographica entry on the blog" href="http://blog.okfn.org/2010/05/20/bibliographica-an-introduction/" target="_blank">a blog entry</a> for the Open Knowledge Foundation blog. It is an idea that I&#8217;m looking forward to playing with and trying to embed data from. (<a href="http://bibliographica.org/">http://bibliographica.org/</a>)</p>
<p>One of the things that I&#8217;ve been thinking about, though, is that increasingly when we do research, we store web pages, blog entries and so on. Whilst there is a way of recording these in a footnote (http://example.org accessed on &lt;insert date&gt; type thing), there does not appear to be a way of building a local archive of these with the relevant metadata for later retrieval. I don&#8217;t know about anybody else but I&#8217;ve got a fair few pages dotted around my hard drive for projects and I&#8217;d like a way of storing these properly and to be able to integrate them into bibliographies or research notes. I know that there is the WARC format (<a title="Library of Congress on WARC" href="http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml" target="_blank">Library of Congress</a> link and the <a title="WARC tools on Google code" href="http://code.google.com/p/warc-tools/" target="_blank">WARC tools</a> Google code project) to play with, so I need to make time to do that.</p>
<p>I had a mini-hack on the Open Correspondence project last Sunday intending to update a couple of pages and got a little more done than that. The database needs rebuilding but the PURL reference (<a title="Letter schema PURL" href="http://purl.org/letter" target="_blank">http://purl.org/letter</a>) now points to the schema. It is so close that I can&#8217;t wait to actually start hacking the data. Time to do the last little bits like tidy up the parser, use the weaving history API to embed a timeline and start using <a title="jena sourceforge archive" href="http://jena.sourceforge.net/" target="_blank">JENA</a>, <a title="ARC website" href="http://arc.semsol.org" target="_blank">ARC</a> and Chris Gutteridge&#8217;s <a title="Graphite rdf library" href="http://graphite.ecs.soton.ac.uk/" target="_blank">Graphite</a> library, which worked out of the box (though I haven&#8217;t used it for much yet).</p>
<p>Goals for this week are to finish the Open Correspondence bits, update the Trac instance with the various &#8216;todo&#8217;s, write a blog post for the Open Knowledge Foundation about Open Correspondence, and do some major testing at work on various XML exports and imports. I should just be about caught up then. With any luck&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/05/weeknotes-data-mining-xml-and-bibliographies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bibliographica &#8211; open bibliographic sourcing and maintenance</title>
		<link>http://austgate.co.uk/2010/01/bibliographica-open-bibliographic-sourcing-and-maintenance/</link>
		<comments>http://austgate.co.uk/2010/01/bibliographica-open-bibliographic-sourcing-and-maintenance/#comments</comments>
		<pubDate>Sun, 24 Jan 2010 11:37:20 +0000</pubDate>
		<dc:creator>iain_emsley</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Open Knowledge]]></category>
		<category><![CDATA[open_bibliography]]></category>
		<category><![CDATA[open_service]]></category>

		<guid isPermaLink="false">http://austgate.co.uk/?p=113</guid>
		<description><![CDATA[Jonathan Gray of the Open Knowledge Foundation has a thought-provoking post on the need for an Open Bibliographic Service which he calls Bibliographica. As he writes: lists of publications are an absolutely critical part of scholarship. They articulate the contours of a body of knowledge, and define the scope and focus of scholarly enquiry [...]]]></description>
			<content:encoded><![CDATA[<p>Jonathan Gray of the Open Knowledge Foundation has a thought-provoking post on the need for an Open Bibliographic Service which he calls <a title="Jonathan Gray on Bibliographica" href="http://jonathangray.org/2010/01/22/bibliographica/" target="_blank">Bibliographica</a>. As he writes:</p>
<blockquote><p>lists of publications are an absolutely critical part of scholarship. They articulate the contours of a body of knowledge, and define the scope and focus of scholarly enquiry in a given domain. Furthermore such lists are always changing. Books and articles are published and translated all the time. Works fall in and out of fashion. ‘Secondary’ reference works can become obsolete &#8211; considered interesting more for what they say about a particular intellectual period than what they say about their subject matter.</p></blockquote>
<p>I&#8217;ve been working on my own book as an independent researcher and wanted to know the common books and articles in the area. As a user I wanted to know what was published in a particular area and what the points of commonality are, to identify key works. Jonathan&#8217;s idea would be a help for this and, perhaps more importantly, provide a shared platform.</p>
<p>As he identifies, sites like Amazon and LibraryThing allow the user to create lists of books, but over time fashions change and books fall into and out of favour. Being able to compile searchable, sortable lists would allow the user to develop comprehensive lists (and also allow the intellectual historian to figure out zeitgeists from lists). It would also realise the web&#8217;s potential for knowledge sharing, which should go beyond mere surfing and into finding the source material and perhaps surprising links between data sets.</p>
<p>His specification, I think, offers a fertile starting point. It appears to source from and link to existing sources rather than re-invent the wheel and to also use existing technologies and ontologies like <a title="MARC website" href="http://www.loc.gov/marc/" target="_blank">MARC</a> and <a title="Dublin Core" href="http://dublincore.org/" target="_blank">Dublin Core</a>. I think that the specification is also sensible in its identification of users and groups to create and edit lists. It mentions that the service could be run by individual universities, but what would be extremely useful (but perhaps would not happen) is if these installations could then link to each other via interfaces to create continually updated communal resources rather than being individual silos.</p>
<p>Perhaps this is a slightly off-topic thought but I&#8217;d love to know which books referred to each other, so that we could examine whether Foo, when writing Bar, had read the book by Baz, which would be an indicator of influence.</p>
<p>The Bibliographica idea mixes &#8220;traditional&#8221; scholarship with crowd sourcing and is a sensible and potentially useful idea and service. I think it would need to build a critical mass of data and sources to be really useful but it could encourage use of resources.</p>
<p>UPDATE: Just one of those thoughts I had whilst making some lemon tea. Actually, one of the challenges would be normalising the data so that records can be updated and pulled in from the external sources.</p>
]]></content:encoded>
			<wfw:commentRss>http://austgate.co.uk/2010/01/bibliographica-open-bibliographic-sourcing-and-maintenance/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

