About 9 months ago we started using a MySQL database to store the bayes information for SpamAssassin. At first it worked great. It significantly reduced the load on our disk drives that used to house that information in the users home directories. Over time SpamAssassin became more popular and now it looks like over 3000 accounts are using it to filter their email and now the sa_bayes database is over 4.1GB in size. Having a database of that size on an underpowered server (dual Athalon MP 2000+, 3GB ram, 2 IDE disk drives) it just could not keep up. Probably because the disks are just too slow for this kind of application and load.
I tried to diagnose the bottleneck and it appears as though the bayes_token table ( now over 3GB ) is frequently locked causing other access to it to stall for a bit. So, my plan was to move the table from MyISAM to InnoDB and hope that row level locking would help. Well it turns out that even after days of trying, I just could not get our current database altered to be InnoDB in a reasonable amount of time. As a result I created new tables on a different database server and pointed spamd at the now empty but InnoDB based tables. For now performance is great, hopefully InnoDB will scale much better than MyISAM did otherwise I will be doing this again in a few months.
If anyone else is using MySQL to house their SpamAssassin database I would like to hear how you do it. Perhaps there is something simple that I missed.
Leave a Reply