Exploring Charles Dickens’s networks

As part of the ongoing Open Correspondence rewrite, I’ve started working on some visualisations after a conversation with Rufus Pollock during one of the Humanities calls. One of the first was a force-directed graph linking all the correspondents to the authors (well, one author at the moment). Although I am aware of SigmaJS, I decided to use D3’s force-directed graphs as an initial exploration of whether such graphs are useful to the project.

It lays the basis for future graphs though.

I have put together two network graphs giving an overview of all the letters, and one that looks at letters linked to the novel Bleak House.

The first graph just shows the correspondents linked to Charles Dickens, with a count of letters for each. It uses only two groups: the first is the root (in this case the author, Charles Dickens) and the second is an arbitrary number shared by every linked correspondent. Whilst it looks fairly uniform and boring, it does give a sense of the relationships as expressed in the number of letters to each recipient.
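
As a rough sketch of that two-group structure, the data might be built along these lines. The record shape (a `recipient` field), the function name `buildGraph` and the sample letters are assumptions for illustration, not the project's actual code or data; the `nodes`/`links` layout with index-based `source` and `target` follows the shape commonly used in D3's force layout examples.

```javascript
// Made-up sample records; the real Open Correspondence data is not shown here.
const sampleLetters = [
  { recipient: "Wilkie Collins" },
  { recipient: "John Forster" },
  { recipient: "Wilkie Collins" },
];

// Build a nodes/links structure with two groups:
// group 1 is the root author, group 2 is every correspondent.
function buildGraph(author, letters) {
  // Count the letters sent to each recipient.
  const counts = new Map();
  for (const letter of letters) {
    counts.set(letter.recipient, (counts.get(letter.recipient) || 0) + 1);
  }

  const nodes = [{ name: author, group: 1 }];
  const links = [];
  for (const [recipient, count] of counts) {
    nodes.push({ name: recipient, group: 2 });
    // source 0 is the root; value carries the number of letters.
    links.push({ source: 0, target: nodes.length - 1, value: count });
  }
  return { nodes, links };
}

const graph = buildGraph("Charles Dickens", sampleLetters);
console.log(JSON.stringify(graph, null, 2));
```

The `value` on each link is what a force layout could use to vary link width or distance by the number of letters.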

In the second graph, I changed the groups to the known years in the letters, limiting them to a range between 1840 and 1872. I’m guessing that months could be added, but that would make the potential range of groups rather large. Initial attempts just used the force arguments for charge and link. With a few groups this was fine, but with a large number of links and groups the graph became sprawling and unreadable. To solve this, I read the D3 Force Gravity wiki and added gravity to bind the graph more closely together.

This, I think, gives the viewer a sense of the relationships.
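
A minimal sketch of the year grouping and the gravity setting might look like this. It assumes the D3 version 3 force layout (`d3.layout.force()` with `charge`, `linkDistance`, `gravity` and `size`, which is where the gravity setting lives); the helper names and all of the numeric values are illustrative guesses rather than the ones used in the actual graphs.

```javascript
const FIRST_YEAR = 1840;
const LAST_YEAR = 1872;
const UNKNOWN_GROUP = 0; // assumed fallback for letters with no usable year

// Use the year itself as the group when it falls inside the known range.
function groupForYear(year) {
  const inRange =
    Number.isInteger(year) && year >= FIRST_YEAR && year <= LAST_YEAR;
  return inRange ? year : UNKNOWN_GROUP;
}

// Apply charge, link distance and gravity to a D3 v3 force layout.
// The layout is passed in, so this file does not need D3 to load.
function applyForceSettings(force, width, height) {
  return force
    .charge(-120) // repulsion between nodes (illustrative value)
    .linkDistance(40) // target length of each link (illustrative value)
    .gravity(0.3) // pull towards the centre, keeps the graph compact
    .size([width, height]);
}

// In a page with D3 v3 loaded, this could be called as:
//   const force = applyForceSettings(d3.layout.force(), 960, 600);
```

Raising the gravity value pulls the nodes more tightly towards the centre; lowering it lets a graph with many groups spread out again.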

An idea that I had was to change the groups so that the focus, or the target, was the year or decade but this was very confusing and does not give a sense of cohesion or collection. Although it does give a quick overview of the groups of letters by decade, it distracts from the original set of relationships which might give a user a better understanding of them. I suspect that time dimensions are better and more clearly expressed using time lines and charts but this is a set of experiments.

Both give a good overview of the figures, but what happens if we are looking for a particular novel? One of the aims of the Open Correspondence project was to explore the ways in which information about texts, such as novels or plays, was transmitted. In effect, this is like looking for hash tags in Twitter, where they are used to highlight a word, set of words or phrase.

I went back to the original dataset and re-parsed it, looking for all mentions of Bleak House. There were not as many as I had thought, but the collection that I have contains only around 900 letters, which is not the full dataset. I nevertheless expressed the relationships as a graph and a time line (not yet published) to show how we might want to visualise them. They show one outlier, a letter to the author Wilkie Collins in 1862; the rest fall in 1852 and 1853, when the novel was being serialised and then published in book form.
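
The search for mentions can be sketched roughly as follows. The `text` field, the function names and the sample letters are assumptions made for the example; the real parsing code is not shown in this post. The pattern ignores case and allows any whitespace between the words of the title, since a title can be split across a line break in the source text.

```javascript
// Build a case-insensitive pattern that tolerates line breaks or
// repeated spaces between the words of a title.
function mentionPattern(title) {
  const words = title
    .trim()
    .split(/\s+/)
    .map((word) => word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")); // escape regex characters
  return new RegExp(words.join("\\s+"), "i");
}

// Return only the letters whose text mentions the title.
function findMentions(letters, title) {
  const pattern = mentionPattern(title);
  return letters.filter((letter) => pattern.test(letter.text));
}

// Made-up sample letters for illustration only.
const sampleTexts = [
  { year: 1852, text: "The first number of Bleak\nHouse is out." },
  { year: 1850, text: "I am hard at work on Copperfield." },
  { year: 1853, text: "I have finished bleak house at last." },
];

console.log(findMentions(sampleTexts, "Bleak House").map((l) => l.year));
```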

I did create a quick time chart, but I have not yet published it because it is a little too small and messy; it will appear shortly. What it does show, unsurprisingly, is that publication sparks a cluster of letters. I also ran the numbers for the publication of David Copperfield, which showed a similar pattern, although the clusters are longer and more sustained. At first glance, this might show that Copperfield was more popular, or that the collection editors (Georgina Hogarth and Mamie Dickens) either kept or had more letters pertaining to this novel, or there might be some other reason.
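
Running the numbers for a chart like this comes down to a tally per year. A small sketch, assuming each matching letter carries a numeric `year` field (an assumed name) and using made-up sample data:

```javascript
// Count letters per year and return [year, count] pairs in date order,
// a shape that is easy to feed into a bar or time chart.
function countByYear(letters) {
  const counts = new Map();
  for (const { year } of letters) {
    counts.set(year, (counts.get(year) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => a[0] - b[0]);
}

// Made-up sample: a cluster around publication plus one later outlier.
const sampleMentions = [
  { year: 1853 },
  { year: 1852 },
  { year: 1853 },
  { year: 1862 },
];

console.log(countByYear(sampleMentions));
```

Tallying two novels the same way gives two series that can be laid side by side to compare how long each cluster lasts.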

The visualisation does not necessarily give answers, but it does give the reader a tool for diving into the collections. So force-directed graphs do have a real use in this respect for showing clusters, and I learned more about creating and manipulating the JavaScript to make them a little clearer and friendlier. I think that there is still a way to go to make them more usable.

The next post will explore time lines and charts since D3 can be used to create these as can Simile, a project that I have used before but not for some time.


Using Redis as a store

A few months ago I started on a work project to do some work on social media imports for a CRM. The idea was to query a contact’s Twitter stream, if it existed, and show it to screen. I updated the existing module to prevent it re-querying Twitter immediately so that the IP address was


More STOMPing with Drupal and Bean

Recently I scratched an itch and posted a module onto Drupal.org as a sandbox. It is perhaps slightly misnomered in “Bean Stomp” which is its development name. It is an integration of Jeff Mesnil’s Stomp over WebSockets JavaScript library which allows real time updating of a page without polling or similar overhead. I had been


Using RPC/Encoding on an Apache Camel route

I have been looking at Apache Servicemix for an integration project recently to route messages between services in different network zones and written in different languages together. The CXF project which is used to set up and wire together web services provides many methods of doing this and can either build a service from its


Reflecting on log driven programming

Antirez recently posted “Log driven programming is a real productivity booster” on his blog. He mused on using notes to keep focus on what you’re doing at the moment. Personally I’ve always preferred notebooks or the back of envelopes if that is the only thing to hand. His point still stands though. It


Showing error messages and redirecting in SugarCRM

I was working on applying some rules to SugarCRM this week. The logic hooks have allowed this to be built in before saving or deleting a file. Thinking about the UI and the actions, I wanted to create a set of redirects which would show an error message if something could not take place. Normally,


Parsing ActiveMQ statistics to check on queue health

In my last post about monitoring ActiveMQ, I looked at various advisory queues. I also mused on the possibility of reading the stats from the queues and topics and parsing them so that if issues occur, they can be dealt with quickly rather than waiting for an issue to be reported. Although this information is


Using advisory queues to monitor activity on ActiveMQ

In my last post on ActiveMQ, I mused on using mirrored queues to keep a message store for varying reasons. It is a naive way of doing this but it showed how such a thing can be done quickly. As part of some ongoing explorations into operations and governance of MQ systems, I have also


Simple message storing with ActiveMQ

I have been doing some digging into monitoring queues on ActiveMQ and pushing the messages into a message store for whatever reason. The main reason for doing this would be monitoring error messages and giving developers and operations a way of exploring what is happening in a queue if data at a recipient system


Attending the Open Humanities Hack

I’ve just come back from a couple of excellent days of Humanities Hacking, organised by the King’s College, London Digital Humanities department and the Open Knowledge Foundation. To be fair, it went slightly differently than I thought it would. After an interesting start trying to find the room we were in, a few of us