Boost Namespace Renamer

Sunday, May 24, 2009, by artyom

I have written a small Python script that solves the problem of collisions between different versions of the Boost libraries. This is especially critical for library projects that aim to provide backward binary compatibility.

Script source: http://art-blog.no-ip.info/files/rename.py

Running:

./rename.py /path/to/boost/tree new_namespace_name

Notes:

No guarantees: if this script eats your cat or destroys all your sources, it is your problem.
This is a very early version, but I managed to build the most important Boost libraries, such as thread, program-options, system, iostreams, date-time, regex, serialization and signals, and to run some regression tests.
The script does not update the Jam files, so some conditional settings (like symbol exports for DLLs) will probably not work. I assume that each project that picks up some code from Boost provides its own build system.

What's Next?

Sunday, May 3, 2009, by artyom; Posted in: Progress, FastCGI, Framework

The road map of the project includes two important milestones:

Refactoring of the CppCMS core components, including:
- Removal of the dependency on CgiCC. Today only about 5% of the CgiCC library is used, and many features are either unsupported or poorly supported by it. For example, file upload handling in CgiCC is very primitive, limited and error prone, cookie support is buggy, and so on.
- Using Boost.Asio as the internal event handler, because:
  1. It provides transparent synchronous and asynchronous event handling, allowing a future implementation of server push technologies.
  2. It provides efficient timer-based event handling.
- Removal of the dependency on libfcgi and a Boost.Asio-friendly implementation of the FastCGI/SCGI connectors, as well as HTTP connectors.
- Support for plug-in applications in the CppCMS framework.
- Faster compilation through wider use of the pimpl idiom and removal of unnecessary classes.
Better support for i18n and l10n:
- Transparent support of std::wstring in forms, including automatic encoding checks and conversion.
- Support of std::locale for localizing output such as numbers, dates, monetary values, translations and so on.
- Optional support of ICU (icu::UnicodeString and icu::Locale) to add features that std::locale lacks and to allow replacing std::locale features with more correct implementations provided by ICU.

These changes will significantly break API backward compatibility, but it should be possible to adapt existing code to the new API almost "mechanically".

Unicode in 2009? Why is it so hard?

Wednesday, April 15, 2009, by artyom; Posted in: Framework, Unicode and Localization

In my view, one of the most painful gaps in C++ is the lack of good Unicode support. C++ provides some support via std::wstring and std::locale, but it is quite limited for real-life purposes.

This definitely makes life harder for C++ (web) developers.

However, there are several tools and toolkits that provide such support. I checked six of them: the ICU library with its C++, Java and Python bindings, Qt3 and Qt4, glib/pango, and the native support of Java/JDK, C++ and Python.

I ran a somewhat challenging correctness test:

To Upper: Is German ß converted to SS?
To Lower: Is Greek Σ converted to σ in the middle of the word and to ς at its end?
Word Boundaries: Are Chinese 中文 actually two words?

Basic features such as encoding conversion and simple case conversion, like "Артём" (my name in Russian) to "АРТЁМ", worked well in all tools. But the results of the more complicated tests were quite bad:

Results

Toolkit	To Upper Case	To Lower Case	Word Boundaries
C++	Fail	Fail	No Support
C++/ICU	Ok	Ok	Ok
C++/Qt4	Ok	Fail	Ok
C++/Qt3	Fail	Fail	No Support
C/glib+pango	Ok	Ok	Fail
Java/JDK	Ok	Ok	Fail
Java/ICU4j	Ok	Ok	Ok
Python	Fail	Fail	No Support
Python/PyICU	Ok	Ok	Ok

Description

ICU: Provides great support, but... it has a very unfriendly, old-fashioned API from a C++ developer's point of view. The documentation is really bad.

Qt4: Gives good results, has a friendly API and great documentation, but, as we can see, some tests failed. Generally useful in web projects.

Qt3: Provides very basic Unicode support; there is no reason to use it any more, especially now that Qt 4.5 is released under the LGPL.

C++/STL: Even where basic support exists, the API is not friendly to STL containers and requires explicit use of char * or wchar_t * and manual buffer allocation.

Glib: Gives quite good basic functionality, but finding word boundaries with Pango is really painful and does not work for Chinese. It has a very nice, quite well documented C API. It uses UTF-8 internally, which makes life easier when working with C strings. It still requires wrapping its functionality in C++ classes or pulling in the huge GtkMM.

Python: Has very basic native Unicode support. PyICU has terrible documentation.

Java: The JDK provides quite good Unicode support, and it can be replaced quite easily by ICU4J (in fact, most of the JDK's Unicode support is based on ICU).

Summary

It is a shame that in 2009 there is no high-quality, well-documented, C++-friendly toolkit for working with Unicode.

For real projects I would take the QtCore part of Qt4 or wrap the ICU library with a friendly API.
Glib is good as well, and, very importantly, it is available on most UNIX systems.

When will there be a Boost.ICU or Boost.Unicode, just like there is Boost.Math or Boost.Asio?

Is the Database the Bottleneck of Web Services?

Saturday, April 11, 2009, by artyom

A common belief among web developers is that the database is the bottleneck of their web services. Indeed, if a search in the database takes $O(\log n)$, it should probably take most of the time, because most of the other processing takes $O(1)$.

However, complexity theory tells us one important thing... such asymptotic estimates only hold for sufficiently big $n$. How big is that? 1 GB, 1,000 GB, 1,000,000 GB or even more?
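To put numbers on this: a lookup that costs about $\log_2 n$ steps grows very slowly with $n$. For a million, a billion and a trillion records:

$$\log_2 10^{6} \approx 19.9, \qquad \log_2 10^{9} \approx 29.9, \qquad \log_2 10^{12} \approx 39.9$$

Each thousandfold increase in data adds only about 10 steps, so for realistic sizes the constant factors (disk and network access, query parsing and so on) can easily matter more than the logarithm itself.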

So let's take one of the biggest web projects as an example: Wikipedia, or rather the Wikimedia server farm.

The facts:

The Wikimedia server farm includes about 300 servers.
The main path for serving web pages includes:
- 95 Squid servers: the upstream caches that serve 78% of the requests (the hit ratio).
- 144 Apache+PHP servers that generate the remaining 22% of the pages.
- 20 MySQL master/slave servers.
The other servers are used for various purposes such as search, static files, image rescaling and more.
Memcached adds a further 7% to the hit ratio of the Apache servers.

So how can the SQL servers be the bottleneck of the system when they make up less than 10% of the server farm? Assuming a balanced system, it is obvious that Apache and PHP consume most of the computational resources of the Wikimedia server farm.
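With the figures listed above, the share of the hardware works out as follows (259 counts only the Squid, Apache+PHP and MySQL servers on the main path):

$$\frac{20}{300} \approx 6.7\%, \qquad \frac{20}{95 + 144 + 20} = \frac{20}{259} \approx 7.7\%, \qquad \frac{144}{259} \approx 55.6\%$$

That is, the database servers are well under a tenth of the hardware, whereas the Apache+PHP servers alone make up more than half of the main path.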

So:

Is the database the bottleneck of web services?

Definitely not.

Would switching to CppCMS server-side technology improve performance?

Definitely yes.

CppCMS 0.0.4 Released

Saturday, February 21, 2009, by artyom; Posted in: Progress, Framework, Cache

Version 0.0.4 of CppCMS has been released.

It includes optimizations required for using it in embedded systems.

Normal Embedded Build:

Caching is completely removed. A small memory footprint is very important for embedded systems, so caching data in memory is of little use.
Zlib compression is removed, which removes the dependency on the boost::iostreams, zlib and bzip2 libraries.
mod-prefork is removed.
Dynamic template loading is removed: this feature requires exporting symbols from the binary in order to make RTTI work, which increases its size. Thus, all templates must be statically compiled into the binary.

Embedded CGI Mode:

The FastCGI and SCGI APIs are removed.
mod-thread and mod-process are removed, including all thread pool facilities.
The file-based session backend is changed to work properly in CGI mode, including garbage collection of sessions that have timed out.

Downloads are available from the SourceForge project page.

Some rights reserved: the content of this blog is available under the Creative Commons Attribution License 2.5 Israel.