Reading remote data sources into Drupal 7, part one – the custom code way

I’ve been looking at integrating a variety of data sources into a Drupal project for various reasons, but generally from the perspective of pulling data from existing sources. These do not necessarily go into the same system and, equally, only parts may be required by different systems.

In a way, I am trying to move towards a publish once, pull anywhere model using web services (and a message queue where necessary, though that would be overkill here).

Since I was on a day off, I thought I would spend some time digging around and having a play. I have been putting stuff together for SugarCRM (using YUI in a Cloud Connector) but thought it would be fun to pull the same data through into a Drupal 7 instance. A little searching took me to this post by Larry Garfield, Remote Data in Drupal: Museums and the Web 2009, which offers some useful information.

So why not dive in and use the Services module? For starters, I want to experiment and find a way of doing this myself. Secondly, my impression of Services is that it is a great way of providing web services so that other applications can use Drupal, but it does not consume them – it may be an answer for another project. As the Services handbook page puts it:

Services is a standardized API for Drupal that allows you to create “services”, or a collection of methods, intended for consumption by remote applications.

Larry Garfield takes the time in his post to outline three ways of viewing remote data in Drupal. There is a certain undercurrent of Drupal being the primary system consuming the data, which I think reflects his use cases, whereas in mine Drupal is just one of several systems potentially consuming the data. This is why I am ignoring the third option (the wipe/rebuild import) for now. It may be useful later if the data were only ever going to be consumed by one system, but right now it would mean your system holds two copies of the same data – the very problem I am trying to get away from! Of course, an option might be to import the data into Drupal and then publish it with Services. That, however, defeats part of the point of using remote data sources, which is that I am interested in providing a unified set of data web services. This latter point might be of use in building a data repository with Drupal (but that really is another project!).

My first attempt is the easier version: setting up a module with functions that read the remote data source and return the data as an array to Drupal. It then has functions to explore the data and theme it, in keeping with how Drupal should work (and how decently built modules do) – the layers of logic are separated. Since I am only interested in proving the idea, I have not worried too much about routing and transforming; for now, I’m trying to get a simple page working with a table. Since this is data, we really want more interesting things to happen to it, but again, that would be another day. This first option merely pulls in data and lets the Drupal theming layer take over.
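A minimal sketch of such a module might look like the following. The module name (remote_source), the path and the endpoint URL are all hypothetical placeholders; the structure – a menu entry, a fetch function that decodes the JSON into an array, and a page callback that hands off to the theme layer – is the point.

```php
<?php
// remote_source.module – a sketch only; names and the endpoint URL
// are invented for illustration.

/**
 * Implements hook_menu().
 */
function remote_source_menu() {
  $items['remote-data'] = array(
    'title' => 'Remote data',
    'page callback' => 'remote_source_page',
    'access arguments' => array('access content'),
    'type' => MENU_NORMAL_ITEM,
  );
  return $items;
}

/**
 * Fetches the remote source and decodes the JSON body into an array.
 */
function remote_source_fetch() {
  $response = drupal_http_request('http://example.com/api/records');
  if ($response->code != 200 || empty($response->data)) {
    return array();
  }
  return drupal_json_decode($response->data);
}

/**
 * Page callback: no nodes are created, the data is simply passed on
 * to whatever theming we choose.
 */
function remote_source_page() {
  $data = remote_source_fetch();
  if (empty($data)) {
    return t('No data could be retrieved from the remote service.');
  }
  // Theming of $data happens here (see below for one option).
  return '';
}
```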

No nodes are created and the data is pulled across from the service on every request. This does present a performance issue, since no caching is involved. For now, this is not an issue.

For display purposes, I wrapped the data in the theme_table function to output a simple table. For production purposes, I would probably be better off creating my own theme template to control the output absolutely. Better still (and this is a completely different topic), I could find a story to tell with my data and, since it is largely numeric, use charts and graphs (jQuery and so on, here I come!) to pull out patterns for the user.
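The theme_table wrapping can be sketched like this. It assumes a helper such as the hypothetical remote_source_fetch() above that returns the decoded JSON as an array, and the field names ('name', 'date_entered', 'amount') are invented for illustration.

```php
<?php
// Sketch of a page callback rendering remote data with theme('table').
// Field names are assumptions, not the real service's schema.
function remote_source_page() {
  $data = remote_source_fetch();
  $header = array(t('Name'), t('Date entered'), t('Amount'));
  $rows = array();
  foreach ($data as $record) {
    // check_plain() sanitises the remote values before output.
    $rows[] = array(
      check_plain($record['name']),
      check_plain($record['date_entered']),
      check_plain($record['amount']),
    );
  }
  return theme('table', array(
    'header' => $header,
    'rows' => $rows,
    'empty' => t('No records returned by the remote service.'),
  ));
}
```

Swapping theme('table') for a custom theme function registered via hook_theme() would give the absolute control over markup mentioned above.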

Looking at the network profiler in Chrome, it would appear the whole call took 703 ms. The web service took 539 ms to respond and 74 ms to deliver the JSON data. The remaining 164 ms, once the waiting was over and the data had been received, was spent calling the JavaScript and theme functions.

Now, to be fair, the web service being called is unoptimised and would probably be a little faster on a proper server, but it hints at the major issue with constantly going back to the web service: the page is at the mercy of the back-end service and whatever caching it offers. I don’t have any caching on this test case, but a production API should.
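On the Drupal side, the simplest mitigation would be to wrap the fetch in Drupal 7’s cache API. A sketch, again assuming the hypothetical remote_source_fetch() helper; the five-minute lifetime is an arbitrary choice:

```php
<?php
// Cache the decoded remote data so every page view does not hit the
// back-end service. The cache ID and lifetime are illustrative only.
function remote_source_fetch_cached() {
  $cache = cache_get('remote_source:records');
  if ($cache && $cache->expire > REQUEST_TIME) {
    return $cache->data;
  }
  $data = remote_source_fetch();
  if (!empty($data)) {
    // Keep the data for five minutes before going back to the service.
    cache_set('remote_source:records', $data, 'cache', REQUEST_TIME + 300);
  }
  return $data;
}
```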

Despite the fair amount of custom code you need to write to do this, this approach gains you the flexibility to control exactly what happens to the data.

The next method to try would be what Larry Garfield calls ‘Lazy bridge nodes’, which builds on this first, loose version to leverage the power of Drupal’s internals for the developer. Hopefully this will give me both the flexibility of custom code and the ability to hook into contributed modules and node methods to pull out the data and make it useful for the reader.