Weeknotes – Scripting and scraping

It has been a while since I last posted a weeknote, so I thought I would try to get back into the habit.

I’ve been gluing together profiling tools into a vaguely generic framework for profiling software at both the IO level and the CPU level. Shell scripting handled running the tools and killing off processes at the right moment, but the real challenge came in the log files.
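The "run, then kill at the right moment" pattern might look something like this in shell. The sampler here is a stand-in for whatever profiling tool is being wrapped (iostat, sar, and friends); the function name, filenames, and the three-second window are all my own invention for illustration.

```shell
#!/bin/sh
# Sketch: start a sampler in the background, profile for a fixed
# window, then stop it cleanly. `sample` stands in for a real
# IO/CPU profiling tool.
sample() {
  while :; do date +%s; sleep 1; done
}

sample > samples.log &      # background the sampler, note its PID
PID=$!
sleep 3                     # the profiling window
kill "$PID"                 # kill off the process at the right moment
wait "$PID" 2>/dev/null
wc -l < samples.log
```

The same shape generalises: swap the stand-in for the real tool, and drive several of these from one wrapper script to profile IO and CPU in the same run.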

As Tomas Doran discusses in his Message::Passing talk at the London Perl Workshop, dates in log files don’t match each other. Nor does the log data tend to contain exactly what you need. In one case I extended a Python program to extract the parts of the data I needed, but Perl came to the rescue most often. I get why it is disliked (or now out of favour), but in the Practical Extraction and Reporting department (well, you know the rest of the commonly quoted backronym) the language shows its strength. Most of the mining was really normalisation and conversion, plus routing data to different files where needed.
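As a small illustration of the date-normalisation problem, here is a sketch that rewrites an Apache-style timestamp into ISO 8601 so logs from different tools can be lined up. The input format, function name, and example value are my own assumptions, not taken from the post.

```shell
#!/bin/sh
# Sketch: normalise an Apache-style timestamp (01/Feb/2013:10:15:59)
# to ISO 8601 (2013-02-01T10:15:59) using only POSIX shell.
normalise_date() {
  d=${1%%/*}; rest=${1#*/}          # day, then the remainder
  mon=${rest%%/*}; rest=${rest#*/}  # month name, then the remainder
  y=${rest%%:*}; t=${rest#*:}       # year, then the time-of-day
  case $mon in
    Jan) m=01;; Feb) m=02;; Mar) m=03;; Apr) m=04;;
    May) m=05;; Jun) m=06;; Jul) m=07;; Aug) m=08;;
    Sep) m=09;; Oct) m=10;; Nov) m=11;; Dec) m=12;;
  esac
  printf '%s-%s-%sT%s\n' "$y" "$m" "$d" "$t"
}

normalise_date "01/Feb/2013:10:15:59"
```

In practice this is the sort of thing a Perl one-liner eats for breakfast, but the shape of the job is the same: pick the fields apart, map the awkward bits, and emit a format everything else agrees on.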

The next stage was visualising some of the data, and gnuplot proved invaluable for the smaller graphs. Once the raw data hit a certain size it proved less useful (okay, it wouldn’t load the file), so the next task is to work out how to visualise it. I’ve got some ideas, but I’m on leave for the next few days and trying to have some sort of break (which probably means spending some of that time on these problems and how I’d like to solve them 😉).