The Goals
When I started working on this project, many people were (and still are) extremely skeptical about its future. I was asked many questions, such as: "Are you a masochist?", "Who do you think will ever use it?", "Hardware is cheap, we do not need such things!", etc.
In this post I'll explain my point of view and show why a "C++ Web Development Framework" is more than just "yet another crazy idea of another geek".
Goals
The goal is very simple -- to create a web framework for developing high-load web sites.
Many popular CMSs, like phpBB or WordPress, are not suited for high-load sites. The usual load they can sustain is between 5 and 50 pages per second. Applying sophisticated caching techniques can increase their performance to 200-400 pages per second.
However, caching is not always applicable because of the "dynamic" nature of such sites. A good example of such a CMS is a forum. It can include a large amount of personal information for each visitor, such as personal messages, the pages the visitor has seen, etc. Thus caching will not work as expected in these situations.
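To make the caching problem concrete, here is a minimal sketch of a page cache keyed by URL and visitor. Everything in it (the `PageCache` class, the `simulate` helper, the `/forum/index` URL) is hypothetical and only illustrates the argument; it is not code from any real CMS.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical page cache keyed by (url, user id).
// User id 0 stands for "the page is the same for everyone".
class PageCache {
public:
    // Returns the cached page, "rendering" it on a miss.
    const std::string &get(const std::string &url, int user_id) {
        auto key = std::make_pair(url, user_id);
        auto it = pages_.find(key);
        if (it != pages_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        return pages_.emplace(key, "<html>" + url + "</html>").first->second;
    }
    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::map<std::pair<std::string, int>, std::string> pages_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};

// Simulates `visitors` distinct users, each requesting the same URL once.
// When `personalized` is true, every user needs a separate cache entry.
PageCache simulate(int visitors, bool personalized) {
    assert(visitors >= 0);
    PageCache cache;
    for (int user = 1; user <= visitors; ++user)
        cache.get("/forum/index", personalized ? user : 0);
    return cache;
}
```

With 100 visitors, `simulate(100, false)` renders the page once and serves the other 99 requests from the cache, while `simulate(100, true)` misses on every request, because no two visitors can share an entry.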
Thus sophisticated (in terms of development and hardware) scaling methods have to be applied. They usually do not scale proportionally to the amount of hardware and human resources invested in solving the problem.
Thus another approach is proposed.
Where Is The Bottleneck?
In a typical web server solution there are many components that can become a bottleneck that should be removed:
- Client -- slow download, "bloated" pages.
- Traffic -- low bandwidth connection.
- Web Server -- high concurrency loads.
- CMS -- heavy, inefficient CMS.
- CMS--xxSQL connection -- heavy traffic between the CMS and xxSQL.
- xxSQL -- slow database.
The client side is usually not a big problem today, unless you put 500K images on your web pages.
The traffic depends entirely on your service provider and can be controlled only by the amount of money you pay.
The web server: there are many high-performance web servers, like Lighttpd, that serve extremely high-load web sites such as YouTube. It is not a problem anymore.
Thus we are left with the last three: the CMS, the CMS-DB connection and the DB itself.
Looking at the DB, there is not much we can do beyond what modern database algorithms already do. If you have a DB of 1TB, nothing will help, but if you have a good DB query caching system, a good connection and a fast API, then you have a good chance of significantly improving the overall performance of the system.
Last but not least -- the CMS itself.
To understand its relative share in the whole CMS-connection-DB chain, I benchmarked the popular web forum software phpBB. I found that it is able to generate about 50 pages per second, while the underlying MySQL DB can serve, per second, about 1,000 batches of the requests needed to create a single page.
Thus it seems that the CMS can be an important bottleneck as well.
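A back-of-the-envelope reading of these two numbers can be sketched as follows. It assumes that both benchmarks ran on the same machine and that database time and CMS time simply add up; neither is stated in the measurements. The helper names are made up for illustration.

```cpp
#include <cassert>

// Time spent per page, in milliseconds, for a throughput in pages per second.
double ms_per_page(double pages_per_second) {
    assert(pages_per_second > 0);
    return 1000.0 / pages_per_second;
}

// Fraction of the total page time that is NOT spent in the database,
// assuming (for illustration only) that DB time and CMS time add up.
double cms_share(double total_pages_per_second, double db_pages_per_second) {
    double total_ms = ms_per_page(total_pages_per_second);
    double db_ms = ms_per_page(db_pages_per_second);
    return (total_ms - db_ms) / total_ms;
}
```

With the figures above, 50 pages per second is about 20 ms per page and 1,000 query batches per second is about 1 ms per page, so `cms_share(50, 1000)` comes out at about 0.95. Under these assumptions roughly 95% of the page time is spent outside the database.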
Components
Thus, in order to create a web development framework for high-load sites, we need to choose the best component for each of the last three elements in the "data to customer" chain.
Content Management System
We need one of the fastest high-level programming languages that allows rapid object-oriented development.
Dynamically typed languages like PHP/Perl/Python/Ruby are interpreted or compiled to byte code, thus they are generally slower than natively compiled languages. Thus they are not suitable for this purpose.
JIT languages like Java or C# are not compiled to native code ahead of time and are still slower than natively compiled ones. Because of their garbage collection model they use about twice as much memory as required. Thus they are not the preferred choice.
Compiled languages like C++/C/Pascal are well suited for high-performance problems; however, they introduce more debugging and stability issues and are relatively harder to develop in.
Thus I chose C++, a powerful and still quite easy (in terms of development) language to write in, thanks to the STL and the Boost libraries.
Database
There are several options:
SQL-based solutions like MySQL, MS SQL, PostgreSQL or Sqlite3. They all use text as an intermediate protocol, (almost) all of them serve their queries over IPC, and they require caching solutions like memcached -- which still works over the network.
Embedded DBs -- databases that are accessed through a C++ API. There are several options:
- Sqlite3 -- still uses text as an intermediate protocol and does not allow high concurrency.
- Berkeley DB -- a high-performance, multi-threaded, transactional DB with a C++ API and an in-process cache that can be shared between different threads. It has other problems, such as a very complex API and a binary data representation that is problematic for version upgrades.
Thus Berkeley DB was chosen as the most suitable solution for permanent storage. It has its own problems, which can be solved, but its performance, which is close to that of direct memory access, is the most important feature we need.
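The property that matters here, a cache living inside the process and shared by all worker threads, can be illustrated with a small stand-in. This sketch does not use Berkeley DB or its API; `SharedCache` and `fill_concurrently` are hypothetical names, and a plain mutex-protected hash map takes the place of the real cache.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical in-process key/value cache guarded by a mutex, so that
// all worker threads of one process share a single copy of the data.
class SharedCache {
public:
    void put(const std::string &key, const std::string &value) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_[key] = value;
    }
    // Copies the value into `out` and returns true if the key exists.
    bool get(const std::string &key, std::string &out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = data_.find(key);
        if (it == data_.end())
            return false;
        out = it->second;
        return true;
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, std::string> data_;
};

// Starts `threads` workers that each store `per_thread` distinct keys,
// then waits for all of them to finish.
void fill_concurrently(SharedCache &cache, int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&cache, t, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                cache.put("post:" + std::to_string(t) + ":" + std::to_string(i), "body");
        });
    }
    for (auto &w : workers)
        w.join();
}
```

A lookup in such a cache is a function call and a memory read, with no socket, no query text to parse and no serialization. That is the kind of access cost the text above has in mind when it compares Berkeley DB to direct memory access.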
Summary
There is a long way to go to create a fully usable web development framework. But this blog is the first example of such a system, powered by C++, Boost, Oracle Berkeley DB, FastCGI and Lighttpd, running on Debian Linux.


Comments
Well done :)
Thanks :)
Hello Sir, what you are doing is extremely good, there are thousands of souls interested in having a c++ web framework. just remember to abstract the database layer to include mssql, mysql, and xml, haha.
Actually, at this point I have chosen Berkeley DB as the primary storage.
In the future I may consider using another DB. However, nothing prevents you from using any other DB abstraction layer in CppCMS.