How not to build a messaging network

This talk was originally given at the lightning talks session of the Oxford Accu group. The updated slides are on my account on Slideshare. These are the fleshed out notes of a talk that I gave that evening based on experiences and experiments. It does contain some responses to questions and points raised.

It is a truth universally acknowledged, that components of a distributed system in possession of data must be in want of a method of sharing it. With apologies to Jane Austen.

Mama, we have an engagement…

There are various methods of sharing information between systems. Shared databases, web services, polling, publish/subscribe and push architectures. There are changes between states and these need to be shared some how.

Publish and Subscribe is one method (used by many such as DataSift link to the HA article)) but I’m going to talk about push architectures and some of the experiences that I’ve had with them. This is not because I have something against them, I just have more experience with

In brief, push architectures are based on system A pushing data to system B on a change of state. Generally, but not always, this uses a broker that is aware of any connected systems and what type of messages that they listen for. (There are brokerless systems such as ZeroMQ that can work differently but that is another day.)

The brokers that I have been using keep these in memory and are aware of the connections. Most of this is great until the client fails without disconnecting cleanly. At this point you have a data zombie. Data zombies like a byte to eat. Unfortunately your data might being eaten and unless there is a copy on the network somewhere, then that is it. By default my experience of RabbitMq is that it will not keep copies of messages, you need to add your own store on to them and ActiveMQ can occasionally allow you to recover them but sometimes the admin UI prevents this. By default ZeroMQ certainly does not keep messages, this would need to be developed and might have a performance impact if not implemented correctly.

Reconnection and heartbeating is a must for the clients to make sure that they are not only up but also responding and sending messages. Heartbeating is also important to ensure that the broker is also working and hasn’t run out of memory or crashed uncleanly. I have written some code to explore this with Node.js and Perl for a production system to integrate with Nagios.

What messages are you sending? Messaging works most efficiently with smaller payloads, such as strings, JSON, XML encoding and so on. These can create some interesting problems if a header is not set correctly to alert the broker what type of message is being sent. I know files can be sent across brokers and systems. I’ve done it before but I really think that it would be more efficient to use existing protocols for sending these where necessary. May be I’ll change my mind some time.

Some messages brokers (as I’ve not tested every one) tend towards unicode. These are sometimes hidden by a content type like XML or JSON but can bite when sending plain strings in streaming applications, for instance taking data from Twitter’s streaming api. Unless defensive coding is taken onboard, which one ought to be doing, then the broker might well complain or even just throw an error.

Where are you firewalls and ports? If you’re messaging system is going to be across a LAN, then this is a slightly moot point. Although HTTP is used as the main transport mechanism to prevent yet another protocol for operations, ports need to be opened and remain open. I found NMap to be very useful to test the route across Eth0 from the producer server to the client to make sure that no firewalls were blocking the message path. Bare in mind that if an application, say on a web server, is connected to a remote backend, it might have a different IP to the server’s IP. Without traffic, firewalls will close up without warning leaving both client and broker convinced that the other is available. Heartbeating will help with this but it is a useful thing to be thinking about.

What messaging systems do you have? If so, do they speak the same protocols? You might start out with one, and another team uses a different one. Can you guarantee that they both speak the same protocol? How do you get the two to talk to each other?

If they do not, then you need to use a network forwarding bridge (ActiveMQ used to suggest bridging), or shovel (RabbitMQ has the shovel concept) or even a tool like Apache Camel. These are simply different names for a piece of code that consumes a message, transforms it and then produces it to the recipient broker. In some cases, you might need to write a piece of code that sits in the middle of the connections or the edge of network zones. It does have a speed and performance hit though as I discovered using some very unoptimised PHP code over one Christmas when I originally came across the question: http://austgate.co.uk/2011/12/bridging-message-queues-in-stomp-and-amqp/. It would probably be better off to write a custom connector for the relevant broker (or use one if it already exists) and avoid having intermediate code but you might be in a position where this is not possible.

If the latter, can you separate the networks? It might be better if the system needs high performance then it might be an idea to remove the bridges rethink the network. what ever language is used, there are always performance hits as messages are decoded and re-encoded.

These are only thoughts and suggestions from experiences in production and testing various systems. I am always trying to extend my own experience and try things with messaging queues.

Update: the Slideshare link as incorrect