During the previously blogged dinner with Ben and Rufus, we talked about the nascent work on the letters project. Both have “encouraged” me (it didn’t take too much persuasion, it must be said) to move the project to the Open Knowledge Foundation and to port it to Python with a Redis backend rather than the current PHP/MySQL setup. I hope that the move will be complete soon.
Archive for March, 2010
A change to the Letters project
Sunday, March 28th, 2010
Textcamp announced
Sunday, March 28th, 2010
Had dinner with Rufus Pollock and Ben O’Steen on Monday in Oxford. As part of the discussions, the notion of Textcamp was raised and Ben has created the Textcamp website with an associated blog. It is a slightly bigger concept than I had had in mind but the approach, I think, will allow the creation of a wider community and a place to publicly follow up any ideas that get thrown up. I like the idea of hacking texts as well and it will be great to have a place to discuss ideas and to learn. Equally, Ben’s post makes it clear that it should be friendly and helpful, leading up to a Barcamp-style event. It is slated to run in August or September. I can’t wait.
Exporting and querying Dickens data
Sunday, March 21st, 2010
As a follow-up to the post about the proposed ontology, I’ve started trying to create a SPARQL endpoint. At some point soon, I want to use the new version of ARC as the version I’ve got here is a little out of date. After that, the next thing should be to allow the endpoint’s output to be converted into other forms like JSON.
UPDATE: I’ve created an endpoint using the default ARC settings here: http://austgate.co.uk/dickens/endpoint.php
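A rough sketch in Python of how the endpoint might be queried from a script and the results flattened into rows. It assumes the endpoint accepts the standard SPARQL protocol `query` parameter and can return the W3C SPARQL JSON results format; neither has been checked against the ARC defaults. The function names and the sample query are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://austgate.co.uk/dickens/endpoint.php"

# Hypothetical query: list ten triples of any kind.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request using the standard SPARQL protocol 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Assumption: the endpoint honours this Accept header for JSON results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_sparql_json(text):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given row, so fall back to None.
        rows.append({v: binding[v]["value"] if v in binding else None
                     for v in variables})
    return rows


def run_query(query, endpoint=ENDPOINT):
    """Send the query over the network and return the flattened rows."""
    with urlopen(build_request(query, endpoint)) as response:
        return rows_from_sparql_json(response.read().decode("utf-8"))
```

Calling `run_query(SAMPLE_QUERY)` would need the endpoint to be reachable; `rows_from_sparql_json` can be tried offline on any saved response.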
Creating the text ontology
Thursday, March 18th, 2010
I’ve been working quietly on ideas for an ontology to describe relationships in a letter, from the correspondent to the people referred to in the text. It is intended to complement and extend the Dublin Core and FOAF (Friend of a Friend) namespaces. Anyhow, I’ve decided to publish a first set of thoughts on it, having sat on the project for a while. I’ve sort of thought of it as using the text namespace in the text, which I am currently doing, but it is not set in stone.
Simple Ontology for Relationships in Texts
Text namespace
austgate.co.uk/ontology/text
Definition: An ontology which allows text items, such as letters, to be linked together. It extends and complements Dublin Core (DC) and Friend of a Friend (FOAF).
Terms
Appearsin
The term is used to denote a work in which a character appears. For example:
Dear Alice,
As you may know I am coming to the end of the latest draft of the Ponsonby diaries. Bob Ponsonby is making his way across the marshes…
The character Bob Ponsonby could be referenced as text:Appearsin to denote his appearance in the work. This allows queries to find documents where the characters from a work appear, rather than just individual characters. It would usually be considered as a collection of text:Character references.
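To make the query idea concrete, here is a minimal sketch in Python using plain tuples in place of a real triple store. Everything in it is hypothetical: the identifiers, the second character, and the `text:mentions` predicate, which is not part of the ontology and only stands in for whatever links a letter to a character.

```python
# Hypothetical data: (subject, predicate, object) triples as plain tuples.
TRIPLES = [
    ("character/BobPonsonby", "text:appearsin", "work/PonsonbyDiaries"),
    ("character/JanePonsonby", "text:appearsin", "work/PonsonbyDiaries"),
    ("letter/1", "text:mentions", "character/BobPonsonby"),
    ("letter/2", "text:mentions", "character/JanePonsonby"),
    ("letter/3", "text:mentions", "character/SomeoneElse"),
]


def characters_in(work, triples):
    """Return every character linked to the work by text:appearsin."""
    return {s for s, p, o in triples if p == "text:appearsin" and o == work}


def letters_for_work(work, triples):
    """Return letters mentioning any character that appears in the work."""
    cast = characters_in(work, triples)
    return sorted(s for s, p, o in triples
                  if p == "text:mentions" and o in cast)


print(letters_for_work("work/PonsonbyDiaries", TRIPLES))
```

The point of the two-step lookup is that a letter is found through the work, without the caller having to know the individual characters in advance.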
Character
A fictional person who is referenced in the text. This element is used to disambiguate between fictional and non-fictional characters. Non-fictional, i.e. real, people are denoted by foaf:Person. Character is a subset of foaf:Person and is intended for fictional people. For example, in a letter from an author to an agent, the author may be describing their latest project:
Dear Alice,
As you may know I am coming to the end of the latest draft of the Ponsonby diaries. Bob Ponsonby is making his way across the marshes…
In the example, Alice is a real person and could be denoted as such by using foaf:Person but Bob Ponsonby is equally a name and a person. Since he is fictional in this letter, he could be denoted as text:Character in any RDF representation to allow users to link documents where the character is mentioned.
<!-- rdf:about names the resource by its full URI -->
<text:character
rdf:about="http://austgate.co.uk/Dickens/characters/pickwick">
<foaf:name>Mr. Pickwick</foaf:name>
<text:appearsin
rdf:resource="http://austgate.co.uk/Dickens/works/pickwickpapers" />
</text:character>
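As a sketch of how such a description might be read back, the Python below parses a character entry with the standard library XML parser. It is plain XML handling, not a full RDF parser. The sample wraps the entry in an rdf:RDF element with namespace declarations and uses rdf:about, since that attribute takes a full URI. The URI chosen for the text namespace (with a trailing `#`) is an assumption, as is the function name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"
# Assumption: the text namespace is served from this URI with a trailing '#'.
TEXT_NS = "http://austgate.co.uk/ontology/text#"

DOCUMENT = """<rdf:RDF xmlns:rdf="{rdf}" xmlns:foaf="{foaf}" xmlns:text="{text}">
  <text:character rdf:about="http://austgate.co.uk/Dickens/characters/pickwick">
    <foaf:name>Mr. Pickwick</foaf:name>
    <text:appearsin rdf:resource="http://austgate.co.uk/Dickens/works/pickwickpapers" />
  </text:character>
</rdf:RDF>""".format(rdf=RDF_NS, foaf=FOAF_NS, text=TEXT_NS)


def read_characters(xml_text):
    """Return a list of (uri, name, works) tuples, one per text:character."""
    root = ET.fromstring(xml_text)
    found = []
    for node in root.iter("{%s}character" % TEXT_NS):
        uri = node.get("{%s}about" % RDF_NS)
        name = node.findtext("{%s}name" % FOAF_NS)
        # A character may appear in several works, so collect them all.
        works = [w.get("{%s}resource" % RDF_NS)
                 for w in node.findall("{%s}appearsin" % TEXT_NS)]
        found.append((uri, name, works))
    return found


for uri, name, works in read_characters(DOCUMENT):
    print(name, "appears in", works)
```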
Correspondent
This field denotes the correspondent of the letter. It is a subset of foaf:Person as it should denote a real person. (However it is perfectly possible for a fictional letter to be written and in this case it would perhaps be inappropriate to use foaf:Person).
textReferred
This refers to a text (book, verse or similar) which is referred to in the letter being serialised. It is intended to allow graphs to be built between the letters in which a text is mentioned, showing what an author was doing or thinking about that text around the time of, or after, writing it. It is designed to allow for some contextualisation of the referred work. It could also be used to build a reading list, or to identify possible influences or forgotten works that the author was aware of at the time.
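A small Python sketch of the reading-list idea, assuming each letter has already been reduced to an identifier, a year and the texts it refers to. The records, the years and the second title are invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (letter id, year written, texts referred to).
LETTERS = [
    ("letter/20", 1838, ["Another Work"]),
    ("letter/12", 1836, ["Pickwick Papers"]),
    ("letter/15", 1837, ["Pickwick Papers", "Another Work"]),
]


def letters_by_text(letters):
    """Map each referred text to the letters that mention it, in date order."""
    index = defaultdict(list)
    for letter_id, year, texts in sorted(letters, key=lambda rec: rec[1]):
        for title in texts:
            index[title].append((year, letter_id))
    return dict(index)


def reading_list(letters):
    """List each referred text once, ordered by the first year it is mentioned."""
    index = letters_by_text(letters)
    return sorted(index, key=lambda title: index[title][0][0])


print(reading_list(LETTERS))
print(letters_by_text(LETTERS)["Another Work"])
```

Grouping by text first keeps the dated mentions together, which is what a graph of an author's engagement with a work over time would be drawn from.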
Work
The term denotes a type of text, in this case a book. It would be a collection of Dublin Core terms.
<text:work rdf:about="http://austgate.co.uk/dickens/work/pickwick">
<dc:title>Pickwick Papers</dc:title>
<!-- dc:creator is the Dublin Core element for a work's author -->
<dc:creator
rdf:resource="http://austgate.co.uk/dickens/people/CharlesDickens" />
<dc:publisher>Chapman and Hall</dc:publisher>
</text:work>
I’m still working on applying some of this to my letters project (which came about partly out of curiosity about the idea). Many thanks to Brian Matthews of the e-Science department of the STFC, though any mistakes or oversights are entirely mine.
Growing and using data
Wednesday, March 17th, 2010
Just seen an article on TechCrunch by Bradford Cross of Flightcaster regarding the growth of data on the Web. He appears to argue that data and its uses will drive the Web soon, writing:
the data age is less about the raw size of your data, and more about the cool stuff you can do with it. Now that there is so much data, it is time to unlock its value.
It seems fairly straightforward given the lower barriers to growth and the tools available to create and access data.
There are issues with this, such as learning how best to leverage these for the user and to gain the most benefit. It’ll certainly be an interesting time, and Cross identifies a few technologies and ideas which may or may not gain currency but will spark debate nonetheless.