Full-text search: MySQL vs MongoDB vs Sphinx
You have indexed all your favorite music sites, now you has an index with hundreds of thousands of pages and you want to allow your users to make searches in the best and fastest way possible?
This is the work for a full text search engine!
Yes, but which one to choose?
There are many Search Engines very valid; this article will discuss the two databases with full text index supported by OpenWebSpider: MySQL and MongoDB; and in the end we will compare their results with a software that does this by profession: Sphinx Search (read how to configure openwebspider and Sphinx).
OpenWebSpider, from the very beginning, has always used MySQL as storage for indexed pages. The MySQL’s FULLTEXT index does not actually need any special configuration and, for not too large indexes, works very well.
From version 0.2.3 of OpenWebSpider has been introduced support for storing pages over MongoDB.
But, in practice: Who is faster in full text searches?
This test was performed on a VM Windows 8.1 with 3Gb of RAM with an index of about 15000 pages.
Let’s try some common search query:
|live music -news||0.100||1.460|
|music live summer -news||0.105||1.621|
|live music -summer||0.081||2.025|
|music live summer community -news -itunes||0.105||2.145|
Unfortunately, this test showed that MongoDB works perfectly when it comes to storing documents (as they say with mongodb you can “scale the universe”) but not so good when you have to make full text searches on a large number of pages with complex search queries.
I think that the development team will have to improve things in future versions.
I have not reported in the table the results obtained by sphinx because they are always below 10 milliseconds.
I’ll probably have to raise my index to hundreds of thousands of pages before putting Sphinx under pressure.
In conclusion, I can confirm that MySQL works very well and that all search queries performed took less than 150ms.
MongoDB can be considered an excellent storage for pages, but right now the power of its search engine is not enough for an index of medium size. Furthermore, particularly complex query require a time which would be in any case unacceptable.
As suspected, Sphinx comes out as the absolute winner. All search queries performed, notwithstanding their complexity, took few ms.