I’ve been thinking about full text searching for the letters project and trying to find various solutions that are open source. On the Open Shakespeare and Open Milton sites, we used the Xapian project which is an excellent search engine. However I wanted to try and find a way of getting a search running using PHP and MySQL which is what the site uses at the moment although I’d be happy to also use Perl. (I also wanted to impose a limit to use technologies that I currently use at my current job.)
I started with reading an article on the Zend site that offers an overview of setting up a table to run with a Full-Text index. As the article mentions, you have to ensure that the column which is being searched is either VARCHAR or TEXT as MySQL. If it is not in either form, then just alter the column using
ALTER TABLE <tablename> MODIFY <column> TEXT
(or VARCHAR but TEXT is probably preferable). What the Zend article does not mention is that the table type needs to by MyISAM rather than use InnoDB (which means that transactions won’t work on the table). Having made the alteration, I ran the query:
SELECT * , MATCH (<table>) AGAINST (‘<search term>’) AS score FROM <table> WHERE MATCH (<table>) AGAINST (‘<search term>’)
The table returns all the columns with a score against the term.
The SQL code just needs calling as you would any other form of database code. I’m still playing with this but I’ve been ordering the table by the score descending (ORDER BY score DESC) so that the most relevant results are posted for the user.
I do think that I need to do some pre-processing on my own results set to highlight relevance and to extract further semantic meanings for results. For example the publisher ‘Chapman and Hall’ that I could run on the Dickens letters (http://austgate.co.uk/dickens/search.php?term=Chapman&submit=Submit+Query) could equally pull up other businesses or people. I still need to write a parser that can make some sort of judgement even if it is a guess.
I’m sure as I carry on developing the engine and bringing everything together for the project, I’ll have further thoughts on the creation of an engine and creating a more advanced version. This does at least give me a start using current tools (though it is perhaps not as good as Xapian but sometimes you have to at least learn some of the basics).