At the beginning of the project I had several options for president storage for CppCMS:
- Use standard SQL databases like MySQL, PostgreSQL.
- Use embedded SQL data base like Sqlite3
- Use Berkeley DB
- Implement my own storage model.
At the beginning I wanted to use MySQL. At certain point I decided to switch to Berkeley DB, and now I come back to the original question: "what data storage to use?"
In order to make a proper design I had run lots of micro benchmarks and finally I had chosen Berkeley DB. However I never used queries from real DB. And now, when I had implemented a simple blog I decided to run benchmarks on the situation that is closer to real one.
I had reimplemented an operation of fetching single post using MySQL and imported the database that included 5000 articles with 10 comments each one and tested.
First of all I had found that there is almost no performance difference in number of pages per second between MySQL and Berkeley DB driven web site. However it was clear for me that an amount of the time that the system had spend in kernel mode was significantly higher with MySQL then with BDB. It was obvious that I had done some something wrong.
So I prepared a simple program that had queried Berkeley DB and MySQL in exact way the application does. And I had found that MySQL and BDB runs in almost same speed, when MySQL application spends about 40% of time in kernel mode (IPC between server and client).
After looking deeply I had found my big design mistake I done with planning my DB. In order to simplify and BDB interface I had separated table that contains texts from the tables that contains meta data. Thus each query had required additional join on another table that had added about 40% of execution time. I hadn't craated such separated table In MySQL, but about same time was wasted by MySQL client on IPC to the server.
I had changed the structure of my BDB tables and run tests once more:
=========================================== Database | runtime for | pages per | 15,000 queries | second ========================================== MySQL MyISAM | 2.75 | 5,450 MySQL InnoDB | 2.77 | 5,420 BDB Old Tables | 2.36 | 6,340 BDB New Tables | 1.67 | 8,980 ==========================================
Berkeley DB is about 65% faster then MySQL because it does not waste time on IPC. On the other hand, it is not so much faster. Also I hadn't used any kind of optimizations for MySQL as prepared statements etc.
Many questions and no answers...
Berkeley DB is indeed faster, but isn't the price for that 65% of performance too high?
When I work with BDB, I do miss things like simple queries for development purposes, I still have no idea how to upgrade my DB if I add a new field and much more.
On the other hand, I want to work with my DB as I work with STL and not to use some fancy staff like ORM or embed SQL to my C++ programs.
And usability? How many developers know what Berkeley DB is and how many know what SQL is? The answer is obvious. On the other hand, how many web developers know to design indexes correctly and write good queries? The answer is obvious as well... (BTW, learning BDB had helped me a lot in writing better SQL)
So... Many questions, many points to think about.