
Introducing Boost.Locale

Sunday, November 8, 2009, by artyom ; Posted in: Progress, Unicode and Localization; 0 comments

After a long period of hesitation I came to a conclusion: the standard C++ locale facets are a no-go. So I started developing localization tools based on ICU that work in a C++ friendly way. Thus, Boost.Locale was born. It has just been announced to the Boost community for preliminary review.

Boost.Locale provides case conversion and folding, Unicode normalization, collation, numeric, currency, and date-time formatting, message formatting, boundary analysis and code-page conversions in a C++ aware way.

For example, to display a currency value it is enough to write:

cout << as::currency  << 123.34 << endl;

And a currency value like "$123.34" is formatted. Spelling out a number?

cout << as::spellout << 1024 << endl;

Very simple. And there is much more. This library will be the base for CppCMS localization support, and I hope it will also be accepted into Boost at some point in the future.

I've tested this library with:

Documentation

Source Code

The source is available from the SVN repository:

svn co https://cppcms.svn.sourceforge.net/svnroot/cppcms/boost_locale/trunk

Building

You need CMake 2.4 or above and ICU 3.6 or above (4.2 recommended). Check out the code and run:

cmake /path/to/boost_locale/libs/locale
make

Input and comments are welcome.

Localization in 2009 and broken standard of C++.

Thursday, October 8, 2009, by artyom ; Posted in: Unicode and Localization; 5 comments

There are many goodies in the upcoming C++0x standard. Both the core language and the standard library have been significantly improved.

However, there is one important part of the library that remains broken -- localization.

Let's write a simple program that prints a number to a file in C++:

#include <iostream>
#include <fstream>
#include <locale>


int main()
{
        // Set global locale to system default;
        std::locale::global(std::locale(""));

        // open file "number.txt"
        std::ofstream number("number.txt");

        // write a number to file and close it
        number<<13456<<std::endl;
}

And in C:

#include <stdio.h>
#include <locale.h>

int main()
{
        setlocale(LC_ALL,"");
        FILE *f=fopen("number.txt","w");
        /* the ' flag (POSIX) enables locale digit grouping; %d matches the int argument */
        fprintf(f,"%'d\n",13456);
        fclose(f);
        return 0;
}

Let's run both programs with the en_US.UTF-8 locale and observe the following number in the output file:

13,456

Now let's run both programs with the Russian locale: LC_ALL=ru_RU.UTF-8 ./a.out. The C version gives us, as expected:

13 456

Whereas the C++ version produces:

13<?>456

Incorrect UTF-8 output text! What happened? What is the difference between the C library and the C++ library, given that they use the same locale database?

According to the locale, the thousands separator in Russian is U+2002 -- EN SPACE, a code point that requires more than one byte in UTF-8 encoding. But let's take a look at the C++ number formatting provider, std::numpunct. Its member function thousands_sep returns a single character, so a multi-byte separator cannot be represented, which is why the output above is broken. In the C locale definition, by contrast, the thousands separator is represented as a string, so there is no single-character limitation as in the C++ standard class.
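To make the difference concrete, here is a minimal sketch, assuming a C++11 compiler. The names single_char_sep and format_grouped are made up for illustration and are not part of any library. The two static_asserts only restate the types involved; the facet shows that whatever separator you want has to fit into one char, so a three-byte UTF-8 sequence such as "\xE2\x80\x82" (U+2002) has no way in.

```cpp
#include <clocale>
#include <locale>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// C side: lconv stores the separator as a NUL-terminated string,
// so a multi-byte UTF-8 sequence fits.
static_assert(std::is_same<decltype(std::lconv::thousands_sep), char *>::value,
              "C keeps the thousands separator as a string");

// C++ side: numpunct<char>::thousands_sep() returns exactly one char.
static_assert(std::is_same<
                  decltype(std::declval<const std::numpunct<char> &>().thousands_sep()),
                  char>::value,
              "C++ keeps the thousands separator as a single char");

// Hypothetical facet for illustration: groups digits by three and
// separates the groups with the single char it was given.
class single_char_sep : public std::numpunct<char> {
public:
    explicit single_char_sep(char sep) : sep_(sep) {}

protected:
    char do_thousands_sep() const override { return sep_; }
    std::string do_grouping() const override { return "\3"; }

private:
    char sep_;
};

// Format value using the facet above; the locale takes ownership of the facet.
std::string format_grouped(long value, char sep)
{
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new single_char_sep(sep)));
    out << value;
    return out.str();
}
```

Calling format_grouped(13456, ' ') works because an ASCII space is one byte. There is simply no overload that would accept a std::string separator, and that is the limitation described above.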

This was just one simple and easily reproducible problem with the C++ standard locale facets. There are many more:

It's very frustrating that in 2009 such annoying, easily reproducible bugs exist and make the localization facilities totally useless in certain locales.

All the work I have recently done on localization support in the CppCMS framework has convinced me of an important decision: ICU will be a mandatory dependency and will provide most of the localization facilities by default, because native C++ localization is a no-go...

The question is: "Will the C++0x committee revisit localization support in C++0x?"

Message to Blog Readers

Friday, August 28, 2009, by artyom ; 2 comments

This web site will be down for a couple of days. Sorry for the inconvenience.

CppCMS meets Comet

Thursday, August 27, 2009, by artyom ; Posted in: Progress, Framework, Comet; 8 comments

One of the major requirements for the framework refactoring was support for Comet. Now, with the introduction of asynchronous request handling and persistent application servers, it becomes a reality.

Client Side

There is an HTML source of a simple chat client that uses the Dojo toolkit. It does the following:

  1. Submits new messages to the server application by posting form using XHR:

     function send_data() {
             var kw = {
                     url : "/chat/post",
                     form : "theform"
             };
             dojo.xhrPost(kw);
             dojo.byId("message").value="";
             return false;
     }
    
  2. Receives new messages from the server using long poll via XHR:

     var message_count = 0;
     function read_data() {
             dojo.xhrGet( {
                     url: "/chat/get/" + message_count,
                     timeout: 120000,
                     handleAs: "text",
                     load: function(response, ioArgs) {
                             dojo.byId("messages").innerHTML =
                                     response
                                     + '<br/>'
                                     + dojo.byId("messages").innerHTML;
                             message_count++;
                             read_data();
                             return response;
                     },
                     error: function(response,ioArgs) {
                             read_data();
                             return response;
                     }
    
             });
     }
     dojo.addOnLoad(read_data);
    

So, the client side is quite simple (although the error handling should be much better).

Server Side

First we create our long-running asynchronous application, which receives two kinds of requests: "/post", with new data, and "/get/NN", to receive message number NN. We assign these calls to two member functions, post and get.

class chat : public cppcms::application {
public:
    chat(cppcms::service &srv) : cppcms::application(srv)
    {
        dispatcher().assign("^/post$",&chat::post,this);
        dispatcher().assign("^/get/(\\d+)$",&chat::get,this,1);
    }

Now, this class includes two data members:

private:
    std::vector<std::string> messages_;
    std::vector<cppcms::intrusive_ptr<cppcms::http::context> > waiters_;

The history of all chat messages is kept in messages_, and all pending get requests that can't be satisfied yet, because the message does not exist yet, are kept in waiters_.

Each "waiter" is actually a pointer to a request/response context that can be used for message transport.

Now, when a new message arrives, the post member function is called:

void post()
{
    if(request().request_method()=="POST") {
        if(request().post().find("message")!=request().post().end()) {
            messages_.push_back(request().post().find("message")->second);
            broadcast();
        }
    }
    release_context()->async_complete_response();
}

If the request contains a message, it is added to the messages_ list and all waiters are notified using the broadcast() member function.

At the end, the current request context is released and completed.

The broadcasting is done as follows:

void broadcast()
{
    for(unsigned i=0;i<waiters_.size();i++) {
        waiters_[i]->response().set_plain_text_header();
        waiters_[i]->response().out() << messages_.back();
        waiters_[i]->async_complete_response();
        waiters_[i]=0;
    }
    waiters_.clear();
}

For each pending request, the last message is written and the request is closed. After that, the list of pending requests is cleared.

When a get request arrives, it is handled by the get(std::string no) member function. First of all we check whether the requested message exists; if so, we just return it to the user.

unsigned pos=atoi(no.c_str());
if(pos < messages_.size()) {
    response().set_plain_text_header();
    response().out()<<messages_[pos];
    release_context()->async_complete_response();
}

Otherwise, if the requested message is the next one, which does not exist yet, we add the request context to the pending list waiters_:

else if(pos == messages_.size()) {
    waiters_.push_back(release_context());
}

If the requested message is too far ahead (probably a client error), we just set the status to "404 Not Found" and return the response.

else {
    response().status(404);
    release_context()->async_complete_response();
}
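The post/get logic above can be tried out without CppCMS at all. The following is a rough sketch, assuming a C++11 compiler, in which the request/response context is replaced by a plain callback. The names chat_room, reply_fn and waiting_count are hypothetical stand-ins and not part of the CppCMS API.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the CppCMS response context: a callback that
// receives an HTTP-like status code and a body.
using reply_fn = std::function<void(int status, const std::string &body)>;

class chat_room {
public:
    // New message: store it, then answer every pending long poll with it.
    void post(const std::string &message)
    {
        messages_.push_back(message);
        const std::string latest = messages_.back();
        // Take the waiters out first, so a callback that polls again
        // is queued for the next message instead of being lost.
        std::vector<reply_fn> to_notify;
        to_notify.swap(waiters_);
        for (const reply_fn &reply : to_notify)
            reply(200, latest);
    }

    // Long poll for message number pos.
    void get(std::size_t pos, reply_fn reply)
    {
        if (pos < messages_.size())
            reply(200, messages_[pos]);            // already known: answer now
        else if (pos == messages_.size())
            waiters_.push_back(std::move(reply));  // next message: park the request
        else
            reply(404, "");                        // too far ahead: client error
    }

    // Number of parked requests.
    std::size_t waiting_count() const { return waiters_.size(); }

private:
    std::vector<std::string> messages_;
    std::vector<reply_fn> waiters_;
};
```

A get(0, ...) on an empty room parks the callback, the next post(...) fires it with status 200, and a get for a position beyond the end of the history is answered with 404 right away.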

Now, all we need to do is add the application to the main running loop under the script name "/chat" and start the service.

cppcms::service service(argc,argv);
cppcms::intrusive_ptr<chat> app=new chat(service);
service.applications_pool().mount(app,"/chat");
service.run();

Summary

So, a simple chat service was written with about 50 lines of C++ code and about the same amount of JavaScript code.

I must admit that it is too simplistic and not efficient. For example, if a new client connects, it receives all messages one by one and not in bulk (this can be easily fixed), and I do not handle timeouts and disconnects. But the general idea is quite clear:

This is actually a base for future development of tools like XML-RPC and JSON-RPC that allow a client to asynchronously call server-side objects; it can also be used to implement other Comet protocols.
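As for the bulk delivery mentioned in the summary, one possible shape of the fix is sketched below, assuming C++11. The names bulk_reply and collect_from are hypothetical. The sketch also assumes the client would advance its message counter by the returned count instead of by one, which the JavaScript above does not do yet. Messages are joined newest first with "<br/>", because the client prepends each response to the page.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reply type: how many messages are included, and their joined text.
struct bulk_reply {
    std::size_t count;
    std::string body;
};

// Collect every message from position pos onward in one response.
bulk_reply collect_from(const std::vector<std::string> &messages, std::size_t pos)
{
    bulk_reply reply;
    reply.count = 0;
    // Walk backwards so the newest message comes first in the body.
    for (std::size_t i = messages.size(); i > pos; --i) {
        if (reply.count > 0)
            reply.body += "<br/>";
        reply.body += messages[i - 1];
        ++reply.count;
    }
    return reply;
}
```

With a history of "a", "b", "c", a client that has already seen one message would ask for position 1 and get "c<br/>b" with a count of 2. An empty reply (count 0) means the request should be parked as before.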

Progress Report on CppCMS v1

Monday, August 17, 2009, by artyom ; Posted in: Progress, Framework; 3 comments

It has been quite a long time since most of the work moved to the new refactoring branch... Meanwhile trunk stays silent. So, I decided to open a window and show some of the new changes:

There is still a lot of work to do to make the new version as useful as the current stable CppCMS version:

But there are many good points that are already visible.
