
Thread Safe Implementation of GNU gettext

4/26/08, by artyom ; Posted in: Progress, Templates, Unicode and Localization; 0 comments

There is a widely available software internationalization tool called GNU gettext. It is used as the base for almost all FOSS software tools. It has bindings for almost every language and supports many platforms, including Win32.

How does it work? Wherever you need to display a string that may be shown in a language other than English, you just write:

printf(gettext("Hello World\n"));

And you get the required translation for this string (if available).

In 99% of cases this is good enough. However, as you can see, there is no "target language" parameter. The language is defined for the entire application.

What happens if you need to display this string in different languages? You need to switch the locale, and this operation is not thread safe. In most cases you do not need to do this, because almost all applications "talk" in the single language that the user asked for. However, this is not the case for web-based applications.

Certain web applications allow you to display content in several languages: think of a government site that should display information in three languages: Hebrew, Arabic and English. So you may need to define the translation for each session you open or use.

So, if you write a multithreaded FastCGI application that supports different languages in a single instance, you can't use gettext.

My Simple Gettext Implementation

So I need the ability to read gettext translation files (.mo files) and translate strings according to their content.

Actually, translating a single string with gettext is quite simple. I found a simple library that does the job.
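To give a feeling for why a single lookup is simple, here is a minimal sketch of reading a .mo image that has already been loaded into memory, following the file layout described in the GNU gettext manual (magic number, string count, then two tables of length/offset pairs). The class name MoCatalog and its methods are made up for this example; this is neither the library mentioned above nor the transtext code. It scans linearly and ignores the hash table and the encoding, so treat it as an illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical read-only view of a .mo file that was loaded into memory.
class MoCatalog {
public:
    explicit MoCatalog(std::vector<unsigned char> image)
        : data_(std::move(image)), swap_(false) {
        std::uint32_t magic = u32(0);
        if (magic == 0xde120495u) swap_ = true;  // file uses the other byte order
        else if (magic != 0x950412deu) throw std::runtime_error("not a .mo file");
        count_ = u32(8);    // number of strings
        orig_ = u32(12);    // offset of the table of original strings
        trans_ = u32(16);   // offset of the table of translated strings
    }

    // Returns the translated forms of msgid: one string for a plain entry,
    // several for a plural entry, none if msgid is not in the catalog.
    std::vector<std::string> lookup(const std::string& msgid) const {
        std::vector<std::string> forms;
        for (std::size_t i = 0; i < count_; ++i) {
            // A plural entry stores "singular\0plural"; the singular part is the key.
            if (msgid != text(orig_ + 8 * i).c_str()) continue;
            std::string tr = text(trans_ + 8 * i);
            std::size_t start = 0;
            while (true) {  // plural translations are separated by NUL bytes
                std::size_t end = tr.find('\0', start);
                forms.push_back(tr.substr(start, end - start));
                if (end == std::string::npos) break;
                start = end + 1;
            }
            break;
        }
        return forms;
    }

private:
    std::vector<unsigned char> data_;
    bool swap_;
    std::size_t count_, orig_, trans_;

    // Reads one (length, offset) table entry and returns the string it points to.
    std::string text(std::size_t entry) const {
        std::size_t len = u32(entry), off = u32(entry + 4);
        if (off > data_.size() || len > data_.size() - off)
            throw std::runtime_error("bad string entry");
        return std::string(reinterpret_cast<const char*>(data_.data()) + off, len);
    }

    // Reads a 32-bit value at a byte offset, swapping bytes if the file needs it.
    std::uint32_t u32(std::size_t off) const {
        if (off > data_.size() || data_.size() - off < 4)
            throw std::runtime_error("truncated .mo file");
        std::uint32_t v;
        std::memcpy(&v, data_.data() + off, 4);
        if (swap_) v = (v >> 24) | ((v >> 8) & 0xff00u) | ((v << 8) & 0xff0000u) | (v << 24);
        return v;
    }
};
```

Because the catalog object is never modified after construction, several threads can call lookup on it at the same time, and each session can simply hold a pointer to the catalog of its own language.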

However, that library has a significant limitation: it does not support plural forms.

Let's assume you want to use "ngettext" and display the following:

printf(ngettext("The article was published one day ago",
                "The article was published %d days ago",n),n);

So in Hebrew you expect to see

המאמר פורסם לפני יום אחד

For a single day.

המאמר פורסם לפני יומיים

For two days (a pair).

המאמר פורסם לפני 5 ימים

For n in the range of 3 to 10 days, and finally:

המאמר פורסם לפני 20 יום

For n higher than 10.

Translation files include this information as a "C" expression that calculates the index of the string. In the case of English it would be:

n==1 ? 0 : 1

And the translation index will be 0 for singular and 1 for plural.

In the case of Hebrew it will look like:

n==1 ? 0 : (n==2 ? 2 : (n>10 ? 3 : 1))

Here index 0 denotes the singular, 1 the plural, 2 the pair, and 3 the plural for counts higher than 10.
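As a quick illustration, the two rules above can be written as ordinary C++ functions. This is only a sketch: the names english_plural, hebrew_plural and pick_form are invented for this example and are not part of gettext or transtext.

```cpp
#include <cassert>
#include <string>
#include <vector>

// English rule quoted above: n==1 ? 0 : 1
unsigned english_plural(unsigned long n) {
    return n == 1 ? 0 : 1;
}

// Hebrew rule quoted above: 0 = single, 1 = plural, 2 = pair, 3 = plural above 10
unsigned hebrew_plural(unsigned long n) {
    return n == 1 ? 0 : (n == 2 ? 2 : (n > 10 ? 3 : 1));
}

// Hypothetical helper: select the translated form that the rule points at.
const std::string& pick_form(const std::vector<std::string>& forms, unsigned index) {
    assert(index < forms.size());  // a rule must never point past the stored forms
    return forms[index];
}
```

With four Hebrew forms stored in index order, pick_form(forms, hebrew_plural(5)) selects the form at index 1, and a count of 20 selects the form at index 3.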

Thus, in order to support ngettext functionality correctly, I had to implement parsing and compilation of a simple C expression into a C++ function (some kind of lambda function) and then use it to fetch the correct strings.
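The idea can be sketched with a small recursive-descent parser that turns the expression text into nested closures. This is not the transtext code, just one possible way to do it; PluralParser and compile_plural are hypothetical names. To keep it short, the sketch assumes it receives only the expression itself, and it handles only the variable n, integer literals, parentheses, and the operators ?: || && == != < > <= >= % and !.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

using U = unsigned long;
// A compiled plural rule: maps the count n to a plural-form index.
using PluralFn = std::function<U(U)>;

// Hypothetical recursive-descent compiler for plural expressions.
class PluralParser {
public:
    explicit PluralParser(const std::string& text) : s_(text), pos_(0) {}
    PluralFn parse() {
        PluralFn f = ternary();
        skip();
        if (pos_ != s_.size()) throw std::runtime_error("unexpected trailing input");
        return f;
    }

private:
    std::string s_;
    std::size_t pos_;

    void skip() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    bool digit_next() const {
        return pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]));
    }
    // Consume tok if it is the next token in the input.
    bool eat(const std::string& tok) {
        skip();
        if (s_.compare(pos_, tok.size(), tok) != 0) return false;
        pos_ += tok.size();
        return true;
    }
    // cond ? a : b  (lowest precedence, right associative)
    PluralFn ternary() {
        PluralFn c = binary(0);
        if (!eat("?")) return c;
        PluralFn a = ternary();
        if (!eat(":")) throw std::runtime_error("expected ':'");
        PluralFn b = ternary();
        return [c, a, b](U n) -> U { return c(n) ? a(n) : b(n); };
    }
    // Left-associative binary operators; one row per C precedence level.
    PluralFn binary(std::size_t level) {
        static const std::vector<std::vector<std::string>> ops = {
            {"||"}, {"&&"}, {"==", "!="}, {"<=", ">=", "<", ">"}, {"%"}};
        if (level == ops.size()) return unary();
        PluralFn l = binary(level + 1);
        for (;;) {
            bool matched = false;
            for (const std::string& op : ops[level]) {
                if (!eat(op)) continue;
                PluralFn r = binary(level + 1);
                l = combine(op, l, r);
                matched = true;
                break;
            }
            if (!matched) return l;
        }
    }
    // Build the closure for one binary operator; || and && short-circuit.
    static PluralFn combine(const std::string& op, PluralFn l, PluralFn r) {
        if (op == "||") return [l, r](U n) -> U { return l(n) || r(n); };
        if (op == "&&") return [l, r](U n) -> U { return l(n) && r(n); };
        if (op == "==") return [l, r](U n) -> U { return l(n) == r(n); };
        if (op == "!=") return [l, r](U n) -> U { return l(n) != r(n); };
        if (op == "<=") return [l, r](U n) -> U { return l(n) <= r(n); };
        if (op == ">=") return [l, r](U n) -> U { return l(n) >= r(n); };
        if (op == "<") return [l, r](U n) -> U { return l(n) < r(n); };
        if (op == ">") return [l, r](U n) -> U { return l(n) > r(n); };
        // Remaining operator is "%"; a zero divisor yields 0 in this sketch.
        return [l, r](U n) -> U { U d = r(n); return d ? l(n) % d : 0; };
    }
    PluralFn unary() {
        if (!eat("!")) return primary();
        PluralFn e = unary();
        return [e](U n) -> U { return !e(n); };
    }
    PluralFn primary() {
        if (eat("(")) {
            PluralFn e = ternary();
            if (!eat(")")) throw std::runtime_error("expected ')'");
            return e;
        }
        if (eat("n")) return [](U n) -> U { return n; };
        if (!digit_next()) throw std::runtime_error("unexpected token");
        U v = 0;
        while (digit_next()) v = v * 10 + static_cast<U>(s_[pos_++] - '0');
        return [v](U) -> U { return v; };
    }
};

// Compile the text of a plural expression into a callable rule.
PluralFn compile_plural(const std::string& expr) {
    return PluralParser(expr).parse();
}
```

The expression is parsed once, when the translation file is loaded. After that, calling the returned function for a given n only walks the closures, and since they hold no mutable state, the same compiled rule can be shared between threads.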

I implemented all of this in a simple C++ library that can transparently use both GNU gettext and my thread-safe implementation.

I called it transtext. It is part of the templates project, but the code is actually independent and can be compiled separately.

Another advantage of this library is that you can actually get access to the function that calculates a plural expression, so if you want to store pointers to messages instead of performing a hash-based lookup, you can do this.

Limitations

This library has several limitations:

  1. It does not support encoding conversion yet, so if the encoding of the ".mo" file and the encoding of the target text are different, this will not work.
  2. It does not implement the full path lookup that GNU gettext uses to find translation files.
  3. There is no support for d(n)gettext and dc(n)gettext (but this is not really required in the case of a thread-safe implementation).

The library API is a C++ API, but it can easily be wrapped with C functions for use in C projects.
