A few months ago I started a work project on social media imports for a CRM. The idea was to query a contact’s Twitter stream, if it existed, and show it on screen.
I updated the existing module to stop it re-querying Twitter on every request, so that our IP address would not be banned. Part of this involved caching the response to the local file system, so that the pop-up window could show the latest few tweets and, if the cached copy was over a certain age, refresh it.
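The caching behaviour described above is a read-through cache with a maximum age. Here is a minimal Python sketch of that pattern, not the module’s actual PHP code. `fetch` is a hypothetical stand-in for the real Twitter call, and both the JSON-file-per-contact layout and the 15-minute age limit are assumptions.

```python
import json
import os
import tempfile
import time

MAX_AGE_SECONDS = 15 * 60  # assumed freshness window, not taken from the original module


def cached_stream(contact_id, fetch, cache_dir=None, max_age=MAX_AGE_SECONDS):
    """Return a contact's stream from the local cache, refreshing it when stale.

    `fetch` is a hypothetical callable that queries the remote API.
    """
    cache_dir = cache_dir or tempfile.gettempdir()
    path = os.path.join(cache_dir, f"{contact_id}.json")
    try:
        age = time.time() - os.path.getmtime(path)
        if age < max_age:
            with open(path, encoding="utf-8") as fh:
                return json.load(fh)  # fresh enough: no remote call
    except FileNotFoundError:
        pass  # no cached copy yet
    data = fetch(contact_id)  # stale or missing: query the API once
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
    return data
```

Every call within the age window is served from disk, which is exactly the repeated file reading and writing that the rest of this post looks to replace.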
The project was postponed, but it has been bugging me for a while, so I thought I would come back to it. The one thing that worried me was writing the data to a file each time. I know that Larry Garfield has written a post comparing the various ways of reading a file in PHP, which shows that their performance isn’t as awful as is often claimed, but I wanted a solution that avoided continually reading from and writing to disk. I settled on an old friend, Redis, an in-memory key-value store.
As readers of this blog might know, I have used Redis in the past for various things, such as messaging (pub/sub for transient messages, or lists for a more queue-like structure) and data processing, with keys holding the data and sets holding the lists of relevant keys. I figured that the latter approach could serve as a proper store for the social data, with a series of differently named sets grouping related streams so that they could be retrieved together easily. I came across Steve Kemp’s redisFS project, and it made me think about caching the streams with some sort of file-system-style metadata.
As part of this small project (which is as yet unfinished), I decided to use the Pairtree FS spec to turn names into keys, and I was aware of the existing Python implementation. I began porting it to PHP, though I rewrote the client implementation rather than porting it directly.
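To show what Pairtree does to a name, here is a short Python sketch of the spec’s identifier cleaning and path splitting. It follows the published rules as I understand them and is not the library’s PHP code. First, each byte outside visible ASCII, or any of `" * + , < = > ? \ ^ |`, becomes `^` plus two lowercase hex digits. Next, `/`, `:` and `.` are swapped for `=`, `+` and `,`. Finally, the cleaned string is split into two-character pairs. The function names are hypothetical.

```python
SPECIAL = '"*+,<=>?\\^|'  # characters Pairtree always hex-encodes
TO_CLEAN = str.maketrans("/:.", "=+,")
FROM_CLEAN = str.maketrans("=+,", "/:.")


def clean_id(identifier: str) -> str:
    """Pairtree identifier cleaning: hex-encode first, then single-char swaps."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in SPECIAL:
            out.append(f"^{byte:02x}")
        else:
            out.append(ch)
    return "".join(out).translate(TO_CLEAN)


def unclean_id(cleaned: str) -> str:
    """Reverse clean_id: undo the swaps, then decode the ^xx escapes."""
    s = cleaned.translate(FROM_CLEAN)
    raw = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "^":
            raw.append(int(s[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(s[i]))
            i += 1
    return raw.decode("utf-8")


def to_ppath(identifier: str) -> str:
    """Split the cleaned identifier into two-character path segments."""
    cleaned = clean_id(identifier)
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join(pairs) + "/"
```

For example, `to_ppath("ark:/13030/xt12t3")` gives `ar/k+/=1/30/30/=x/t1/2t/3/`. A string like that can be used directly as a Redis key name, because Redis keys are arbitrary strings.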
The main operations that I wanted to get working were getting and setting data, and creating the keys in conformance with the Pairtree spec. The other requirement was something analogous to the ‘ls -l’ operation, returning the keys along with the time each was created. I’ll add the stream size and a state flag (whether the key is readable, writable or appendable, plus a mechanism to manage this) in the near future.
Rather than using redisFS’s separate keys, I thought that I would use Redis’s hash structure, so that the metadata could be kept together with the data under one key rather than my having to know another key name to retrieve it. This may or may not work for large-scale operations, but I haven’t tried that. Yet.
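A rough sketch of that hash-per-key layout, with get, set and an ‘ls -l’-style listing, might look like the Python below. It is illustrative only, and several details are assumptions: the field names `data` and `created`, the `streams:index` set that records which keys exist, and the `FakeRedis` class. `FakeRedis` is a tiny in-memory stand-in so the sketch runs without a server. With redis-py, `redis.Redis(decode_responses=True)` offers `hset(name, mapping=...)`, `hgetall`, `sadd` and `smembers` with the same shapes.

```python
import time


class FakeRedis:
    """Minimal in-memory stand-in for the few redis-py calls used below."""

    def __init__(self):
        self.hashes, self.sets = {}, {}

    def hset(self, name, mapping):
        # Redis stores hash values as strings, so mimic that here.
        self.hashes.setdefault(name, {}).update({k: str(v) for k, v in mapping.items()})

    def hgetall(self, name):
        return dict(self.hashes.get(name, {}))

    def sadd(self, name, *values):
        self.sets.setdefault(name, set()).update(values)

    def smembers(self, name):
        return set(self.sets.get(name, set()))


class StreamStore:
    """Each stream lives in one hash holding the data plus its metadata."""

    INDEX = "streams:index"  # hypothetical set listing every stored key

    def __init__(self, client, clock=time.time):
        self.r, self.clock = client, clock

    def set(self, key, data):
        self.r.hset(key, mapping={"data": data, "created": int(self.clock())})
        self.r.sadd(self.INDEX, key)

    def get(self, key):
        return self.r.hgetall(key).get("data")

    def ls(self):
        """Rough analogue of `ls -l`: (key, created timestamp) pairs."""
        rows = []
        for key in sorted(self.r.smembers(self.INDEX)):
            rows.append((key, int(self.r.hgetall(key)["created"])))
        return rows
```

Keeping an index set avoids scanning the whole keyspace for the listing. The cost is that every write touches two Redis structures.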
The other major piece of work is versioning the streams where required, so that earlier versions of the data can be recovered or viewed. There are two ways of looking at this: archive or audit.
In the first case, archive, I would want to see how the data changed and be able to roll back to, or retrieve, earlier versions. Alternatively, I may simply want to store the older versions of the entity.
In the second case, audit, I am more interested in when a change was made: older versions are kept so that, if the data appears to be incorrect, I can identify when, where and how an entity changed.
These use cases are the reason for the $hashlib variable and the new id functions. I can see a few ways of achieving this, such as something along the lines of how version control systems like Git work, or using the hash to identify each file version, but I really want to do a bit more digging before this can be completed.
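As an illustration of the Git-like option only, here is a Python sketch of content-addressed versioning. It is not the library’s design, which is still open. Each version’s content is stored once under its hash digest, and a per-key history of (digest, timestamp) pairs supports both the archive view (fetch or roll back to an earlier version) and the audit view (when each change happened). `VersionedStream` and its methods are hypothetical names. The `algorithm` parameter is my guess at the kind of role `$hashlib` might play, which is an assumption. Unlike Git, it hashes the raw content without Git’s `blob <size>` header.

```python
import hashlib
import time


class VersionedStream:
    """Content-addressed versions: each distinct body is stored once by digest."""

    def __init__(self, algorithm="sha1", clock=time.time):
        self.algorithm, self.clock = algorithm, clock
        self.objects = {}  # digest -> content
        self.history = {}  # key -> [(digest, timestamp), ...], oldest first

    def put(self, key, content):
        digest = hashlib.new(self.algorithm, content.encode("utf-8")).hexdigest()
        self.objects.setdefault(digest, content)
        versions = self.history.setdefault(key, [])
        if not versions or versions[-1][0] != digest:  # skip writes that change nothing
            versions.append((digest, self.clock()))
        return digest

    def get(self, key, version=-1):
        """Archive view: the latest version by default, or an earlier one by index."""
        digest, _ = self.history[key][version]
        return self.objects[digest]

    def log(self, key):
        """Audit view: every (digest, timestamp) recorded for the key."""
        return list(self.history[key])

    def rollback(self, key, version):
        """Re-append an earlier version so that the roll-back is itself recorded."""
        return self.put(key, self.get(key, version))
```

Because identical content hashes to the same digest, rolling back costs one history entry rather than another copy of the data.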
The README includes a short piece of PHP that demonstrates the library’s operation in a simple fashion.
This is more of a personal project, but I’ve been trying some test-driven development with it, which was a real head trip to start with (frustratingly slow, but its value really showed in some recent refactoring). In time I want to close the raised issues and also write the code it was originally intended for. It has pushed my own knowledge and use of Redis further, and it has made me think about various storage and file system issues that I have brushed against in the past but never dived into properly.