Looking at mentions and users in a Twitter message

I was preparing for the recent OK Festival and discovered that the Weird Council, a conference on the awesome China Miéville, had taken place. As you may guess, I am a bit of a fan. Unfortunately I had not been aware of it at the time, so I followed it on Twitter.

Whilst on my travels, I got curious and decided to search for the #mieville2012 hashtag to see what had happened and how Twitter might be used to explore reactions to works. More on that another day.

One of the things that I wanted to look at was the way that users mention each other and how that might be represented. Thinking in terms of letters, it shows a more interesting network: the connections and conversations between users. In the JSON object (as I had requested JSON), I noticed the “to_user” key (the screen name), along with “to_user_id” (the Twitter user id) and “to_user_id_screen” (the public name).

I decided to use this as a way of finding the sub-network (if we assume Twitter is a network). I parsed the JSON, found the links and made a quick bar chart of how many times each screen name had been mentioned, the assumption being that those who are most mentioned are the loci of the network. As a quick start this was good and gave me some data to work with.
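As a rough illustration of that first pass, here is a minimal Python sketch. It assumes the tweets have already been pulled out of the response into a list of objects; the sample data, the function names and the text-based bar chart are all made up for illustration, and only the “to_user” key comes from the JSON I was looking at.

```python
import json
from collections import Counter

# Hypothetical sample data; real tweets carry many more keys.
SAMPLE_JSON = """
[
  {"text": "@alice loved the keynote", "to_user": "alice", "to_user_id": 1},
  {"text": "@Alice and @bob great panel", "to_user": "Alice", "to_user_id": 1},
  {"text": "no mention here", "to_user": null, "to_user_id": null},
  {"text": "@bob see you there", "to_user": "bob", "to_user_id": 2}
]
"""


def count_to_user(tweets):
    """Count how often each screen name appears in the "to_user" key."""
    counts = Counter()
    for tweet in tweets:
        name = tweet.get("to_user")
        if name:  # skip tweets where the key is missing or null
            counts[name.lower()] += 1  # screen names are case-insensitive
    return counts


def text_bar_chart(counts):
    """Render the counts as a crude text bar chart, most mentioned first."""
    lines = []
    for name, n in counts.most_common():
        lines.append(f"{name:<15} {'#' * n} ({n})")
    return "\n".join(lines)


tweets = json.loads(SAMPLE_JSON)
mention_counts = count_to_user(tweets)
print(text_bar_chart(mention_counts))
```

Note that the second sample tweet names two users but only contributes one count, because “to_user” holds a single name.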

However, whilst looking at some of the tweets that mention users, I noticed something that did not make much sense. If the “to_user” key is filled, then the tweet contains a user name. Great. There is a problem though. The “to_user” key only contains one name, whereas a tweet can contain several.

From the (non-exhaustive) reading that I’ve been doing, these keys appear to be used in the statuses/mentions API call. I suspect that Twitter has an internal way of handling these which is not exposed, so that if a tweet has two names, it will be shown in the authenticated user’s timeline.

To obtain the network, the tweets need to be parsed for each @<username> and those names extracted. Of course, the fact that a name is mentioned does not necessarily mean that the author knows that user or is a friend. That is a different set of questions, but it is worth keeping in mind when machine reading tweets: any user name parsed out in this way should be checked.
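A minimal sketch of that parsing step in Python might look like the following. The regular expression is my own approximation of what a Twitter screen name looks like (letters, digits and underscores, up to 15 characters), not Twitter's own rule, and the “from_user” and “text” keys and the sample tweets are assumptions for illustration.

```python
import re

# Approximate pattern for a screen name; the lookbehind avoids matching
# the "@" inside an email address such as a@b.com.
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])")


def extract_mentions(text):
    """Return the distinct screen names mentioned in a tweet, in order."""
    seen = []
    for name in MENTION_RE.findall(text):
        key = name.lower()  # treat @Alice and @alice as the same user
        if key not in seen:
            seen.append(key)
    return seen


def mention_edges(tweets):
    """Build (author, mentioned) pairs, one per distinct mention in each tweet."""
    edges = []
    for tweet in tweets:
        author = tweet["from_user"].lower()
        for name in extract_mentions(tweet["text"]):
            if name != author:  # ignore self-mentions
                edges.append((author, name))
    return edges


# Hypothetical sample tweets.
sample_tweets = [
    {"from_user": "carol", "text": "@alice @bob what did you make of the panel?"},
    {"from_user": "alice", "text": "thanks @carol, mail me at a@b.com"},
]
edges = mention_edges(sample_tweets)
print(edges)
```

The resulting pairs could be fed into a graph library or simply counted, but each name is still only a string found in the text; whether it belongs to a real account is a separate check.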

So why look at this? Well, I am interested in exploring the social graph of letters in the nineteenth century, and in delving into texts and how they are mentioned in a social context, which bears on how ideas spread.

Given that I was having some issues with connectivity at the time and could not get hold of some source material, I dug into Twitter. Also, I am a screaming Miéville fan boy. I am not 100% sure how this will play out but I am having fun finding out. It could be a different way of looking at how texts have a social life.