Introducing Boost.Locale
After a long period of hesitation I came to understand that standard C++ locale facets are a no-go, and I started developing localization tools based on ICU that work in a C++-friendly way. Thus, Boost.Locale was born. It has just been announced to the Boost community for preliminary review.
Boost.Locale provides case conversion and folding, Unicode normalization, collation, numeric, currency and date-time formatting, message formatting, boundary analysis and code-page conversion in a C++-aware way.
For example, to display a currency value it is enough to write:
cout << as::currency << 123.34 << endl;
This prints a currency value such as "$123.34". Want to spell out a number?
cout << as::spellout << 1024 << endl;
Very simple. And there is much more. This library will be the base for CppCMS localization support, and I hope it will also be accepted into Boost at some point in the future.
I've tested this library with:
- Linux GCC 4.1 and 4.3, with ICU 3.6 and 3.8
- Windows MSVC-9 (VC 2008), with ICU 4.2
- Windows MinGW with ICU 4.2
- Windows Cygwin with ICU 3.8
Documentation
- Full tutorials: http://cppcms.sourceforge.net/boost_locale/docs/
- Doxygen reference: http://cppcms.sourceforge.net/boost_locale/docs/doxy/html/
Source Code
It is available from the SVN repository:
svn co https://cppcms.svn.sourceforge.net/svnroot/cppcms/boost_locale/trunk
Building
You need CMake 2.4 or above and ICU 3.6 or above (4.2 is recommended). Check out the code and run:
cmake /path/to/boost_locale/libs/locale
make
Input and comments are welcome.
Localization in 2009 and the Broken C++ Standard
There are many goodies in the upcoming C++0x standard. Both the core language and the standard library were significantly improved.
However, one important part of the library remains broken: localization.
Let's write a simple program that prints a number to a file in C++:
#include <iostream>
#include <fstream>
#include <locale>
int main()
{
    // Set global locale to system default;
    std::locale::global(std::locale(""));
    // open file "number.txt"
    std::ofstream number("number.txt");
    // write a number to file and close it
    number<<13456<<std::endl;
}
And in C:
#include <stdio.h>
#include <locale.h>
int main()
{
    setlocale(LC_ALL,"");
    FILE *f=fopen("number.txt","w");
    /* the ' flag asks for locale-specific digit grouping; %d matches the int argument */
    fprintf(f,"%'d\n",13456);
    fclose(f);
    return 0;
}
Let's run both programs with the en_US.UTF-8 locale and observe the following number in the output file:
13,456
Now let's run the programs with a Russian locale, LC_ALL=ru_RU.UTF-8 ./a.out. The C version gives us, as expected:
13 456
The C++ version, however, produces:
13<?>456
Incorrect UTF-8 output! What happened? What is the difference between the C library and the C++ library, which both use the same locale database?
According to the locale, the thousands separator in Russian is U+2002 (EN SPACE), a code point that requires more than one byte in UTF-8. Now take a look at the C++ number formatting provider, std::numpunct: its member function thousands_sep returns a single character. In the C locale definition the thousands separator is represented as a string, so it is not limited to a single character as it is in the C++ standard class.
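To make the mismatch concrete, here is a minimal sketch that is not part of the original programs; utf8_encode, comma_grouping and format_grouped are hypothetical names introduced only for illustration. It encodes U+2002 to show how many bytes the separator needs, checks at compile time that std::numpunct<char>::thousands_sep() returns one char while the C struct lconv stores a char*, and installs a custom facet to show that the separator slot in C++ really is a single char.

```cpp
#include <cassert>
#include <clocale>
#include <locale>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Encode one Unicode code point as UTF-8 (sketch: surrogates are not rejected).
std::string utf8_encode(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The C++ facet hands back exactly one char ...
static_assert(std::is_same<
        decltype(std::declval<const std::numpunct<char> &>().thousands_sep()),
        char>::value, "numpunct<char>::thousands_sep() returns a single char");
// ... while the C locale structure stores a whole string.
static_assert(std::is_same<decltype(std::lconv::thousands_sep), char *>::value,
        "lconv::thousands_sep is a C string");

// Hypothetical facet: the only thing we can override is a one-char separator.
struct comma_grouping : std::numpunct<char> {
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Format a number with the facet above installed on a string stream.
std::string format_grouped(long value)
{
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new comma_grouping));
    out << value;
    return out.str();
}
```

Calling utf8_encode(0x2002) yields the three bytes E2 80 82, which cannot fit into the single char that do_thousands_sep is allowed to return. A narrow stream therefore has no correct way to emit such a separator in a UTF-8 locale.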
This was just one simple and easily reproducible problem with the C++ standard locale facets. There are many more:
- std::time_get is not symmetric with std::time_put (as strftime/strptime are in C) and does not allow easy parsing of times with AM/PM marks.
- std::ctype is very simplistic, assuming that toupper/tolower can be done on a per-character basis (case conversion may change the number of characters, and it is context dependent).
- std::collate does not support collation strength (case sensitive or insensitive).
- There is no way to specify a timezone different from the global timezone in time formatting and parsing.
- Time formatting/parsing always assumes Gregorian calendar.
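The std::ctype limitation from the list above can be seen directly in its interface. The following is a minimal sketch, with upper_per_char as a hypothetical helper name: because the facet maps exactly one character to one character, the result always has the same length as the input, so a conversion such as German "ß" to "SS" cannot be expressed.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Upper-case a string the only way std::ctype allows: one character at a time.
std::wstring upper_per_char(const std::wstring &in,
                            const std::locale &loc = std::locale::classic())
{
    const std::ctype<wchar_t> &ct = std::use_facet<std::ctype<wchar_t> >(loc);
    std::wstring out;
    for (wchar_t c : in)
        out += ct.toupper(c);   // wchar_t in, wchar_t out: no room for "SS"
    // A one-to-one mapping can never change the number of characters.
    assert(out.size() == in.size());
    return out;
}
```

For example, upper_per_char(L"stra\u00dfe") still has six characters whatever locale is passed, whereas a full Unicode upper-casing of that word has seven.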
It's very frustrating that in 2009 such annoying, easily reproducible bugs exist and make the localization facilities totally useless in certain locales.
All the work I have recently done on localization support in the CppCMS framework has convinced me of an important decision: ICU will be a mandatory dependency and will provide most of the localization facilities by default, because native C++ localization is a no-go.
The question is: "Will the C++0x committee revisit localization support in C++0x?"
Message to Blog Readers
This web site will be down for a couple of days. Sorry for the inconvenience.
CppCMS meets Comet
One of the major requirements for the framework refactoring was support for Comet. Now, with the introduction of asynchronous request handling and persistent application servers, it becomes reality.
Client Side
The simple chat client is an HTML page that uses the Dojo toolkit. It does the following:
Submits new messages to the server application by posting form using XHR:
function send_data() {
    var kw = {
        url : "/chat/post",
        form : "theform"
    };
    dojo.xhrPost(kw);
    dojo.byId("message").value="";
    return false;
}
Receives new messages from the server using long poll via XHR:
var message_count = 0;
function read_data() {
    dojo.xhrGet( {
        url: "/chat/get/" + message_count,
        timeout: 120000,
        handleAs: "text",
        load: function(response, ioArgs) {
            dojo.byId("messages").innerHTML = response + '<br/>' + dojo.byId("messages").innerHTML;
            message_count++;
            read_data();
            return response;
        },
        error: function(response,ioArgs) {
            read_data();
            return response;
        }
    });
}
dojo.addOnLoad(read_data);
So, the client side is quite simple (although the error handling should be much better).
Server Side
First we create our long-running asynchronous application, which receives two kinds of requests: "/post", carrying new data, and "/get/NN", asking for message number NN. We assign these calls to two member functions, post and get.
class chat : public cppcms::application {
public:
    chat(cppcms::service &srv) : cppcms::application(srv)
    {
        dispatcher().assign("^/post$",&chat::post,this);
        dispatcher().assign("^/get/(\\d+)$",&chat::get,this,1);
    }
Now, this class includes two data members:
private:
    std::vector<std::string> messages_;
    std::vector<cppcms::intrusive_ptr<cppcms::http::context> > waiters_;
These are the history of all chat messages, messages_, and all pending get requests that can't be satisfied because the message does not exist yet, waiters_.
Each "waiter" is actually a pointer to a request/response context that can be used for message transport.
Now, when a new message arrives, the post member function is called:
void post()
{
    if(request().request_method()=="POST") {
        if(request().post().find("message")!=request().post().end()) {
            messages_.push_back(request().post().find("message")->second);
            broadcast();
        }
    }
    release_context()->async_complete_response();
}
If the message field is found in the POST data, it is added to the messages_ list and all waiters are notified using the broadcast() member function.
At the end, the current request context is released and completed.
Broadcasting is done as follows:
void broadcast()
{
    for(unsigned i=0;i<waiters_.size();i++) {
        waiters_[i]->response().set_plain_text_header();
        waiters_[i]->response().out() << messages_.back();
        waiters_[i]->async_complete_response();
        waiters_[i]=0;
    }
    waiters_.clear();
}
For each pending request the last message is written and the request is closed. After that, the list of pending requests is cleared.
When a get request arrives, it is handled by the get(std::string no) member function. First of all we check whether the requested message exists; if so, we just return it to the user.
unsigned pos=atoi(no.c_str());
if(pos < messages_.size()) {
    response().set_plain_text_header();
    response().out()<<messages_[pos];
    release_context()->async_complete_response();
}
Otherwise, if the requested message is the next one, which does not exist yet, we add the request context to the pending list waiters_:
else if(pos == messages_.size()) {
    waiters_.push_back(release_context());
}
If the requested message number is too far ahead, which is probably a client error, we just set the status to "404 Not Found" and complete the response.
else {
    response().status(404);
    release_context()->async_complete_response();
}
Now, all we need to do is add the application to the main running loop under the script name "/chat" and start the service.
cppcms::service service(argc,argv);
cppcms::intrusive_ptr<chat> app=new chat(service);
service.applications_pool().mount(app,"/chat");
service.run();
Summary
So, a simple chat service was written with about 50 lines of C++ code and about the same amount of JavaScript code.
I must admit that it is too simplistic and not efficient. For example, a newly connected client receives all messages one by one rather than in bulk (this can be easily fixed), and I do not handle timeouts and disconnects. But the general idea is quite clear:
- An asynchronous long-running application that handles all requests is created.
- It manages all outstanding requests and uses them for server-side push.
This is actually a base for future development of tools like XML-RPC and JSON-RPC that allow a client to call server-side objects asynchronously; it can also be used to implement any other Comet protocol.
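The waiter bookkeeping described in this post can be tried without a web server. What follows is a minimal sketch and not CppCMS code: chat_model, its reply callback type and the result enum are hypothetical names, and a std::function stands in for the cppcms::http::context pointer that a real waiter would hold.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the chat application: callbacks replace HTTP contexts.
class chat_model {
public:
    typedef std::function<void(const std::string &)> reply;
    enum result { delivered, waiting, not_found };

    // Mirrors post(): store the message and push it to every parked request.
    void post(const std::string &message)
    {
        messages_.push_back(message);
        const std::string last = messages_.back();
        std::vector<reply> ready;
        ready.swap(waiters_);           // waiters_ is already empty while callbacks run
        for (std::size_t i = 0; i < ready.size(); i++)
            ready[i](last);
    }

    // Mirrors get(): answer at once, park the request, or reject it.
    result get(std::size_t pos, reply on_message)
    {
        assert(on_message);             // an empty callback could never be answered
        if (pos < messages_.size()) {
            on_message(messages_[pos]); // message already exists
            return delivered;
        }
        if (pos == messages_.size()) {
            waiters_.push_back(std::move(on_message)); // next message: wait for it
            return waiting;
        }
        return not_found;               // too far ahead, the "404" case
    }

    std::size_t pending() const { return waiters_.size(); }

private:
    std::vector<std::string> messages_;
    std::vector<reply> waiters_;
};
```

A caller would register a callback with get(0, ...), see it parked as waiting, and receive the text as soon as post("hello") is called. Swapping the waiter list out before invoking the callbacks is a design choice of this sketch, so that a callback may register itself again without disturbing the loop.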
Progress Report on CppCMS v1
For quite a long time most of the work has been done in the new refactoring branch, while trunk stays silent. So I decided to open a window and show some of the changes:
Dependencies:
I have removed almost all dependencies, with the big exception of the Boost libraries.
Because the internal structure changed, mostly through the introduction of asynchronous event handling, I could not use the existing FastCGI implementations because of their synchronous API. I also decided to remove CgiCC, which was very problematic in terms of installation, portability and, most importantly, the quality of the implementation and the ability to communicate with its primary developer.
So, at this point you only need the latest Boost library. That's all. When the job is complete it will be very easy to create deb/rpm packages for the most popular distributions.
Server APIs:
In addition to the supported FastCGI and SCGI protocols, a direct HTTP protocol is supported, so you no longer need an external web server for debugging purposes. It is also useful for embedding web applications.
Localization is now fully integrated with C++ std::locale and allows using the correct facets for each supported language and translation.
Windows will now be one of the officially supported platforms.
There is still a lot of work to do to make the new version as useful as the current stable version of CppCMS:
- Integrate all template system back.
- Integrate cache and sessions management back.
- Rewrite forms classes that currently work with CgiCC.
- Rewrite support of CGI API for embedded systems.
But there are many good points that are already visible.