A glimpse into the wormhole

The High Scalability blog posted a link to Facebook’s new posts search system and the Facebook Notes written about it by a member of the engineering team.

One of the sections mentioned the Wormhole publish/subscribe system that they developed to push data across multiple data centres in near real time. At a very basic level, they appear to have developed a three-tier push system that can also be queried to retrieve missed updates if part of the system fails.

A statistic on its effects is:

Compared to the previous system, Wormhole reduced CPU utilization on UDBs by 40% and I/O utilization by 60%, and reduced latency from a day down to a few seconds

(A UDB is a User Database, which is replicated and used to connect the data to other services.) That might suggest that the original system was a polling one and that they have now moved to a data streams model. Given the size of Facebook, I doubt it was a small task!

The use of pub/sub in this way is becoming a familiar pattern (think DataSift and HighScalability’s post about their architecture), and I think Gnip does something similar. RabbitMQ talks about it in their tutorials about topics and logging.

What changes is doing it in near real time and being able to recover from crashes and failures.
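
To make the recovery idea concrete, here is a minimal sketch in Python of a pub/sub broker that pushes messages to subscribers but also keeps a log that a subscriber can query after a failure. This is not how Wormhole is implemented; the class names (Broker, Subscriber), the fetch_since method and the per-topic offsets are all assumptions made up for illustration.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker (hypothetical): pushes messages and keeps a log per topic."""

    def __init__(self):
        self.log = defaultdict(list)          # topic -> list of messages, in order
        self.subscribers = defaultdict(list)  # topic -> list of Subscriber objects

    def subscribe(self, topic, subscriber):
        self.subscribers[topic].append(subscriber)

    def publish(self, topic, message):
        # Append to the log first so the message can be replayed later.
        self.log[topic].append(message)
        offset = len(self.log[topic]) - 1
        # Push to every subscriber that is currently reachable.
        for subscriber in self.subscribers[topic]:
            if subscriber.online:
                subscriber.receive(topic, offset, message)

    def fetch_since(self, topic, offset):
        # Query path: return (offset, message) pairs from the given offset onwards.
        return list(enumerate(self.log[topic]))[offset:]


class Subscriber:
    """Toy subscriber (hypothetical) that remembers the next offset it expects."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.next_offset = defaultdict(int)  # topic -> next offset to consume
        self.seen = []

    def receive(self, topic, offset, message):
        if offset < self.next_offset[topic]:
            return  # already processed, ignore the duplicate
        self.seen.append(message)
        self.next_offset[topic] = offset + 1

    def recover(self, broker, topic):
        # After a crash, come back online and pull whatever was missed.
        self.online = True
        for offset, message in broker.fetch_since(topic, self.next_offset[topic]):
            self.receive(topic, offset, message)


broker = Broker()
indexer = Subscriber("indexer")
broker.subscribe("posts", indexer)

broker.publish("posts", "post-1")   # pushed straight away
indexer.online = False              # simulate a failure
broker.publish("posts", "post-2")   # missed by the subscriber
broker.publish("posts", "post-3")   # missed by the subscriber
indexer.recover(broker, "posts")    # catch up from the log

print(indexer.seen)
```

The push path keeps latency low while everything is healthy, and the log plus the remembered offset is what lets a subscriber catch up without the publisher having to resend anything. A real system would need durable storage and delivery across data centres, which this sketch leaves out.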


Random Links

The Observer’s New Review section had a short piece on the British Library’s Beautiful Science exhibition, “How visualising data has changed life … and saved lives”. That section also ran a larger article on the new wave of small press magazines and their design aesthetic. The common theme seems to be having a focus. Wondering […]


In which PandoDaily draws from its roots – journalism and presentation

PandoDaily ran a piece called “From word games to spy games” on encryption and the NSA’s attempts to undermine it, written by David Holmes and Explainer Music (where he is a co-founder). Whilst the content interests me, it was the way that the piece was put together that really intrigued me. The subheadings had […]


How not to build a messaging network

This talk was originally given at the lightning talks session of the Oxford Accu group. The updated slides are on my account on Slideshare. These are the fleshed out notes of a talk that I gave that evening based on experiences and experiments. It does contain some responses to questions and points raised. It is […]


Weeknotes – Blogging

I put out a tweet asking for any advice on lightweight blogging engines. I was looking at options to replace desktop notes. Having been told about bolt.cm and sculpin.io, I have added them to my list of software to look at. I have been using Ghost as a first experiment and am looking at […]


A quick skim into mining Twitter data

This is a variant on the text prepared for a short talk at the Open Science evening at the Oxford e-Research Centre on Wednesday 27th November. Peter Murray-Rust also spoke at the event on the AMI software and the Chemical Tagger. This is a brief talk about some work that I have been doing in […]


Weeknotes

A quiet couple of weeks with a project being changed from C to C++ to allow me to use some extra libraries. I’ve also been diving into some new texts to look at some relationships that I have been musing on for a while. So I am in the process of exploring how to visualise […]


Repost of Principles for Open Humanities and Literature

A while ago, I posted about the Panton Principles for Humanities and Literature. The Panton Principles are a set of guidelines for the development of Open Science and, at the last Open Knowledge Foundation conference in London, I badgered Jonathan Gray about the idea of porting them to Literature and Humanities. One Sunday afternoon […]


Weeknotes – catching up

I’ve been a little lax in catching up with week notes. Apart from running about the place, I’ve been diving into Perl and shell scripting to visualise some log files. It looks like there are some new avenues to go with it. The major project was getting the Open Correspondence project back up with some help […]


KimDotcom suggestion on stopping piracy

Came across this via a retweet on Twitter from @KimDotCom’s Twitter feed. “How to stop piracy: 1. Create great content 2. Make it easy to buy 3. Same day global release 4. Works on any device 5. Fair price” (Kim Dotcom, @KimDotcom, September 19, 2013). It seems to be common sense and, well, […]