CppCMS Blog :: Unicode and Localization http://blog.cppcms.com/ A blog on CppCMS - C++ Web Development Framework
How not to do Unicode... http://blog.cppcms.com/post/105
<div style="direction:ltr">
<p>It all started with a small problem: how to print Unicode text to the Windows console, with the option of redirecting it to a file.</p>
<p>Let's say we have a program Hello that prints a few words in several languages to the screen.</p>
<pre><code>#include &lt;stdio.h&gt;

int main()
{
    printf("Мир Peace Ειρήνη\n");
    return 0;
}
</code></pre>
<p>The program above is trivial and works fine under Windows if the current console code page is set to UTF-8. If it is not, this can be fixed from within the program by calling <code>SetConsoleOutputCP(CP_UTF8)</code>.</p>
<p>Now a simple tweak... Instead of standard C <code>printf</code> we use standard C++ <code>std::cout</code>... It works fine with GCC. But under Visual C++ it prints squares...</p>
<p>If I try redirection, <code>test.exe &gt;test.txt</code>, I get perfectly fine UTF-8 text...</p>
<p>I started researching the issue and found a <a href="http://blogs.msdn.com/b/michkap/archive/2008/03/18/8306597.aspx">post</a> by one of the Windows Unicode gurus, <a href="http://blogs.msdn.com/b/michkap/">Michael Kaplan</a>.</p>
<p>I tried to run <code>_setmode(_fileno(stdout), _O_U8TEXT)</code> as recommended by Microsoft's Unicode guru and...
My program crashed on the first attempt to write to the output stream.</p>
<p>Continuing to search for an answer, I came across this <a href="http://connect.microsoft.com/VisualStudio/feedback/details/431244/std-ostream-fails-to-write-utf-8-encoded-string-to-console">bug report</a>...</p>
<p>Short summary:</p>
<ul>
<li>User: Can't print UTF-8 to console with std::cout</li>
<li>MS: Closing - this is by design, see Michael Kaplan's article about writing to console</li>
<li>User: But if I do what is suggested, the program crashes, and I still can't write Unicode to console</li>
<li>MS: Reactivate the ticket if it does not work</li>
<li>User: It does not!</li>
<li>MS: Use wide output...</li>
<li>User: I'd rather use fprintf in the first place!?</li>
</ul>
<p>To summarize...</p>
<p>If you use Visual C++ you can't use UTF-8 to print text to <code>std::cout</code>.</p>
<p>If you still want to, please read this <a href="http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/">amazingly long article</a> about how to make <code>wcout</code> and <code>cout</code> work. It does not really give a simple solution, though: it ends up redefining the stream buffers...</p>
<p>So please, if you design an API or an operating system, <strong>do not use</strong> any kind of "Wide" API... This is the wrong way to do Unicode.</p>
<p>Which reminds me... Spread around:</p>
<p><a href="http://www.utf8everywhere.org/">http://www.utf8everywhere.org/</a></p>
<p>Related Posts: <a href="http://blog.cppcms.com/post/62">http://blog.cppcms.com/post/62</a></p>
</div>
The lecture slides and the poster from the August Penguin 2011 conference.
http://blog.cppcms.com/post/85
<div style="direction:ltr">
<p>I gave a lecture at the <a href="http://en.wikipedia.org/wiki/August_Penguin">August Penguin</a> conference about Boost.Locale and presented a poster about the CppCMS project.</p>
<ul>
<li>The slides for the lecture about Boost.Locale:
<ul>
<li><a href="http://art-blog.no-ip.info/files/August-Pinguin-Boost-Locale.odp">odp</a></li>
<li><a href="http://art-blog.no-ip.info/files/August-Pinguin-Boost-Locale.pdf">pdf</a></li>
</ul>
</li>
<li>The poster <a href="http://art-blog.no-ip.info/files/august-pinguin-cppcms-poster.pdf">pdf</a></li>
</ul>
</div>
CppCMS 0.99.8 and Boost.Locale 4.0.0 Released http://blog.cppcms.com/post/82
<div style="direction:ltr">
<p>New versions of CppCMS and Boost.Locale have been released.</p>
<h3>New Features:</h3>
<ul>
<li><p>Boost.Locale is updated to the latest version, the one that is going to be merged into the Boost svn tree.</p>
<p>It includes some breaking changes:</p>
<ul>
<li><p>Redesigned boundary analysis interface:</p>
<p>Instead of <code>mapping</code>, <code>token_iterator</code> and <code>break_iterator</code>, new classes that provide the same functionality were introduced:</p>
<p><code>segment_index</code> and <code>boundary_point_index</code>, together with the elements that can be iterated, <code>segment</code> and <code>boundary_point</code>.</p>
<p>See: <a href="http://cppcms.sourceforge.net/boost_locale/html/boundary_analysys.html">http://cppcms.sourceforge.net/boost_locale/html/boundary_analysys.html</a></p></li>
<li><p>Updated messages interface: messages now use the same character type for the key and the output message, i.e.</p>
<pre><code>std::wstring wh = translate(L"hello").str();
std::string  h  = translate("hello").str();
</code></pre>
<p>Instead of</p>
<pre><code>std::wstring wh = translate("hello").str&lt;wchar_t&gt;();
std::string  h  = translate("hello").str&lt;char&gt;();
</code></pre>
<p>This allows using non-US-ASCII keys
transparently.</p></li>
<li><p>Updated the <code>date_time</code> interface to be more consistent with Boost.DateTime and Boost.Chrono. Operations are now more type safe.</p></li>
</ul>
</li>
<li><p>Added support for the SunStudio compiler on OpenSolaris.</p></li>
<li><p>New nightly tests: Linux Armel and Solaris/SunStudio.</p></li>
</ul>
<h3>Bug Fixes:</h3>
<ul>
<li>Fixed a bug that effectively disabled gzip compression in CppCMS 0.99.7</li>
<li><p>Some compilation and testing fixes for older versions of Mac OS X/Darwin 8.</p>
<p>Note: Darwin 8 is not supported due to bugs in its standard C library, but there should be no problems with newer Mac OS X versions.</p></li>
<li>Fixes to support ICU 4.8</li>
<li>Fixes to support gcc-4.6 and gcc-4.0</li>
<li>Fixes to support Python 2.3.5</li>
</ul>
<h3>Note to SVN-trunk users</h3>
<p>Do not forget to untar the updated cppcms_boost.tar.bz2 file.</p>
</div>
Boost.Locale was accepted into Boost http://blog.cppcms.com/post/79
<div style="direction:ltr">
<p>Now it is official.
Boost.Locale <a href="http://article.gmane.org/gmane.comp.lib.boost.devel/218369">was accepted</a> into Boost.</p>
<p>So the localization part of CppCMS will be spread all over the C++ world.</p>
</div>
Formal Review of Boost.Locale starts today http://blog.cppcms.com/post/78
<div style="direction:ltr">
<p>The <a href="http://permalink.gmane.org/gmane.comp.lib.boost.devel/217586">formal review</a> of the <a href="http://cppcms.sourceforge.net/boost_locale/html/index.html">Boost.Locale</a> library starts today.</p>
<p>I hope it will pass the review and be accepted as an official Boost library.</p>
</div>
It is official, Boost.Locale on its way to formal review http://blog.cppcms.com/post/76
<div style="direction:ltr">
<p>Now it is official: <a href="http://cppcms.sourceforge.net/boost_locale/html/index.html">Boost.Locale</a> is scheduled for a <a href="http://www.boost.org/community/reviews.html">formal review</a> on <a href="http://www.boost.org/community/review_schedule.html">April 7-16</a>.</p>
<p>Boost.Locale is an important part of CppCMS, as it was developed for its needs. However, I found the library so important and useful that I decided to "Boostify" it and make it ready for a formal Boost review.</p>
<p>What it provides:</p>
<ul>
<li>Message formatting based on gettext dictionaries</li>
<li>Localized formatting and parsing of numbers, dates, currency (and more)</li>
<li>Collation</li>
<li>Text manipulations like case handling and Unicode normalization</li>
<li>Text boundary analysis</li>
<li>Support for non-Gregorian calendars like the Hebrew calendar.</li>
<li>And much more</li>
</ul>
<p>Most of these features are based on the state-of-the-art Unicode library <a href="http://site.icu-project.org/">ICU</a>, but many of them can also be handled using only the standard operating system API, which significantly reduces the library's size and its requirements for external components.</p>
<p>Most important is
that it provides a platform-independent and uniform interface for C++ localization and internationalization, tightly integrated with C++ iostreams and the existing <code>std::locale</code> framework.</p>
<p>The most up-to-date version of the library and documentation will be released soon.</p>
</div>
CppCMS 0.0.7 and 0.99.3-beta3 released http://blog.cppcms.com/post/66
<div style="direction:ltr">
<p>This is a security fix release for the stable branch of CppCMS and both a security and a feature release for the CppCMS 1.x.x branch.</p>
<p>All users are encouraged to update to the latest version.</p>
<p>If upgrading is not possible, don't use the "hmac" session backend; switch to the "aes" backend or to a server-side session storage backend.</p>
<h2>Changelog 0.0.7</h2>
<ul>
<li>Bugfix of hmac backend: generation of signature with too small block size</li>
</ul>
<h2>Changelog 0.99.3</h2>
<p>Security:</p>
<ul>
<li>Bugfix of hmac backend: generation of signature with too small block size</li>
</ul>
<p>Features:</p>
<ul>
<li>New version of Boost.Locale</li>
<li><p>Added support for multiple hmac cookie signatures:</p>
<p>Built in: hmac-md5, hmac-sha1<br/> With libgcrypt: hmac-sha224, hmac-sha256, hmac-sha384, hmac-sha512<br/> By default hmac now uses sha1 instead of the less secure md5</p></li>
</ul>
<p>Bugs:</p>
<ul>
<li>Fixed a memory leak in the aes session encryptor</li>
<li>Fixed incorrect validation of UTF-8 encoding that could let some illegal sequences pass through.</li>
<li>Fixed missing attributes of some form widgets</li>
<li>Fixed incorrect code generation for the <code>foreach</code> loop in templates</li>
<li>Fixed a race condition where dispatch and context assignment might not happen simultaneously</li>
</ul>
</div>
Boost.Locale v3 preview version is released http://blog.cppcms.com/post/65
<div style="direction:ltr">
<p>I want to announce a preview of the third version of Boost.Locale:</p>
<ul>
<li>Tutorial: <a
href="http://cppcms.sourceforge.net/boost_locale/html/tutorial.html">http://cppcms.sourceforge.net/boost_locale/html/tutorial.html</a></li>
<li>Reference: <a href="http://cppcms.sourceforge.net/boost_locale/html/index.html">http://cppcms.sourceforge.net/boost_locale/html/index.html</a></li>
<li>Downloads: <a href="https://sourceforge.net/projects/cppcms/files/boost_locale/">https://sourceforge.net/projects/cppcms/files/boost_locale/</a></li>
</ul>
<p>The significant changes are:</p>
<ul>
<li>Implemented multiple localization backends:
<ul>
<li>icu - the default and recommended backend, based on the ICU library</li>
<li>std - based on the C++ standard library localization support</li>
<li>posix - based on the POSIX 2008 API (newlocale, strftime_l,...)</li>
<li>winapi - based on Windows API functions</li>
</ul>
</li>
<li>Significantly simplified locale generation.</li>
<li>Improvements in UTF-8 handling by ICU where possible</li>
<li>Thread safety fixes when using the ICU library</li>
<li>Fixed std::codecvt facet support to handle UTF-16 instead of UCS-2 only.</li>
<li>Removed support for compilers lacking wide character support: gcc-3.4 on Windows is no longer supported; a recent gcc-4.x with support for wide streams and strings is required, for example gcc-4.5</li>
</ul>
<p>Tested Platforms:</p>
<ul>
<li>Compilers: GCC (3.4, 4.2, 4.3, 4.5, 4.5/c++0x), Intel 11.0, MSVC 2008, SunStudio/stlport</li>
<li>Operating Systems: Linux, FreeBSD, OpenSolaris, Windows XP, Cygwin 1.7, (TODO Mac OS X)</li>
<li>ICU version: 3.6 to 4.4</li>
</ul>
<p>It will soon be integrated into CppCMS 1.x.x.</p>
</div>
First beta version of CppCMS 1.x.x is officially out! http://blog.cppcms.com/post/63
<div style="direction:ltr">
<p>Hello all CppCMS users.</p>
<p>The first beta version of CppCMS 1.x.x is available for download from <a href="https://sourceforge.net/projects/cppcms/files/">SourceForge</a>.
The build instructions can be found <a href="http://cppcms.sourceforge.net/wikipp/en/page/cppcms_1x_build">here</a>.</p>
<p>This version is very different from the CppCMS 0.0.x branch: it fixes many design flaws of the previous version and is an almost 90% rewrite of the original code according to a new design.</p>
<p>It also includes many <a href="http://cppcms.sourceforge.net/wikipp/en/page/cppcms_1x_whats_new">important features</a>.</p>
<p>Most significant ones:</p>
<ul>
<li><p>Full CppCMS core rewrite that introduced:</p>
<ul>
<li>Asynchronous programming support</li>
<li>Removal of 3rd-party libraries from the core API.</li>
<li>Stable API and ABI through all major releases.</li>
</ul>
</li>
<li>Improved Ajax support with the introduction of JSON-RPC</li>
<li>Powerful i18n and l10n</li>
<li>Native Windows support, including support for MSVC.</li>
<li>And much more...</li>
</ul>
<p>So now the CppCMS beta is ready.</p>
<p>Hopefully the first release candidate will be ready within about 3 months, and the first stable release is expected at the end of 2010 or the beginning of 2011.</p>
</div>
Surviving Windows Development http://blog.cppcms.com/post/62
<div style="direction:ltr">
<p>One of the issues that had bothered me in CppCMS on Windows was the lack of full support for Unicode file names.</p>
<p>It is well known that on Windows, standard library functions like <code>fopen</code> or <code>std::fstream</code> cannot use UTF-8 encoded file names the way they can on all "normal" operating systems.</p>
<p>So, in order to deal with such issues, I added the <code>booster::nowide</code> library, which makes this conversion transparent.
All functions in <code>booster::nowide</code> take UTF-8 encoded strings and convert them internally to wide strings for the brain-damaged Win32 Wide API.</p>
<p>I also created an implementation of <code>std::fstream</code> over stdio, especially for those Windows compilers that do not "extend" their standard libraries with non-standard wide-crap.</p>
<p>So now CppCMS is fully Unicode capable on Win32. If you use CppCMS on Windows, please note that all paths are represented as UTF-8 strings and not as "ANSI" ones.</p>
<p>When will Windows finally enable UTF-8 locales? Maybe in Windows 15 (if it survives till then).</p>
<p>If you are interested in why I hate the "wide" API so much, <a href="http://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmful">read this</a>.</p>
<p>You can download a standalone version of the "nowide" library here:<br/> <a href="http://art-blog.no-ip.info/files/nowide.zip">http://art-blog.no-ip.info/files/nowide.zip</a></p>
</div>
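<p>To make the idea of "take UTF-8, convert internally to wide strings" more concrete, here is a minimal, portable sketch of such a conversion. It is only an illustration: the function name <code>utf8_to_utf16</code> is hypothetical, and this is not the code used inside <code>booster::nowide</code>. It assumes that invalid input should be rejected with an exception.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper, NOT part of booster::nowide: decode a UTF-8 string
// and re-encode it as UTF-16, the encoding the Win32 "wide" API expects.
// Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;   // code point being decoded
        std::size_t len;    // number of bytes in this sequence
        std::uint32_t min;  // smallest code point valid for this length

        if (lead < 0x80)                { cp = lead;        len = 1; min = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; min = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, surrogates and values above U+10FFFF.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points outside the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

<p>On Windows, where <code>wchar_t</code> holds UTF-16 code units, a wrapper in this style would copy the result into a <code>std::wstring</code> and pass it to a wide function such as <code>_wfopen</code>. On other systems the UTF-8 string can be passed to <code>fopen</code> unchanged.</p>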