Posts in category ‘Unicode and Localization’.
How not to do Unicode...
It all started with a small problem: how to print Unicode text to the Windows console while still being able to redirect the output to a file.
Let's say we have a program, Hello, that prints a few words in several languages to the screen:
#include <stdio.h>
int main()
{
    printf("Мир Peace Ειρήνη\n");
    return 0;
}
The program above is trivial and works fine under Windows if the current console code page is set to UTF-8. This can also be arranged from within the program by calling SetConsoleOutputCP(CP_UTF8).
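As a rough illustration of that fix, the sketch below switches the console output code page before printing. The helper names `use_utf8_console` and `print_hello` are made up for this example, and it assumes the source file is saved as UTF-8 and that the compiler passes the bytes of the literal through unchanged. Outside Windows the helper does nothing.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: ask the Windows console to interpret output bytes as UTF-8.
// Returns true on success; on other platforms there is nothing to do.
bool use_utf8_console()
{
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

// Same output as the Hello program above, printed after switching the code page.
// Returns whatever printf returns (a negative value on error).
int print_hello()
{
    use_utf8_console();
    return std::printf("Мир Peace Ειρήνη\n");
}
```

Calling `print_hello()` from `main` should be enough. Whether the text is actually readable still depends on the console font.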
Now a simple tweak: instead of the standard C printf we use the standard C++ std::cout. It works fine with GCC, but under Visual C++ it prints squares...
If I try redirection, test.exe >test.txt, I get perfectly fine UTF-8 text...
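The std::cout variant of the program might look like the sketch below. The function names are hypothetical, and the same assumption applies as before: the source file is UTF-8 and the literal's bytes reach the stream unchanged. The second helper captures what a narrow stream receives, which fits the observation that the redirected file holds valid UTF-8. The bytes themselves are fine, so the trouble appears to sit between the stream and the console.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Writes the greeting through any narrow C++ stream, e.g. print_hello(std::cout).
void print_hello(std::ostream &out)
{
    out << "Мир Peace Ειρήνη\n";
}

// Captures the bytes a narrow stream receives, without involving the console.
std::string captured_hello()
{
    std::ostringstream capture;
    print_hello(capture);
    return capture.str();
}
```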
I started researching the issue and found a post by Michael Kaplan, one of the Windows Unicode gurus.
I tried calling _setmode(_fileno(stdout), _O_U8TEXT) as recommended by Microsoft's Unicode guru and... my program crashed on the first attempt to write to the output stream.
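For reference, the call in question looks roughly like the sketch below. As far as I can tell from Microsoft's `_setmode` documentation, a stream in `_O_U8TEXT` mode is meant for wide print functions only, and narrow print functions on such a stream trigger an assert. That would match the crash described here. The helper names are invented for this example, and the non-Windows branch is only a placeholder so the sketch compiles elsewhere.

```cpp
#include <cstdio>
#include <cwchar>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

// Hypothetical helper: put stdout into the CRT's UTF-8 "Unicode mode".
// Only meaningful with the Microsoft C runtime.
bool set_stdout_u8text()
{
#ifdef _WIN32
    return _setmode(_fileno(stdout), _O_U8TEXT) != -1;
#else
    return true; // placeholder: nothing comparable is needed here
#endif
}

// After the mode switch, only wide output is expected to be safe on Windows.
// Writing narrow text (printf, std::cout) is what is reported to fail.
int print_hello_wide()
{
    return std::wprintf(L"Мир Peace Ειρήνη\n");
}
```

This is the "use wide output" route, so every print statement has to change from narrow to wide strings.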
Searching further for an answer, I came across this bug report...
Short summary:
- User: Can't print UTF-8 to the console with std::cout
- MS: Closing - this is by design, see Michael Kaplan's article about writing to the console
- User: But if I do what is suggested the program crashes, and I still can't write Unicode to the console
- MS: Reactivate the ticket if it does not work
- User: It does not!
- MS: Use wide output...
- User: I'd rather use fprintf in the first place!?
To summarize...
If you use Visual C++, you can't use UTF-8 to print text to std::cout.
If you still want to, please read this amazingly long article about how to make wcout and cout work. It does not really give a simple solution, though: in the end it falls back to redefining the stream buffers...
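To give an idea of what redefining a stream buffer involves, here is a minimal sketch. It is not the code from that article. The class name `utf8_console_buf` and its `pending()` accessor are invented for illustration. The sketch assumes that every flush happens on a character boundary, and that a real console should be written with `WriteConsoleW` while redirected output gets the raw UTF-8 bytes.

```cpp
#include <cstdio>
#include <ostream>
#include <streambuf>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical stream buffer: collects UTF-8 bytes and emits them on flush.
class utf8_console_buf : public std::streambuf {
public:
    // Bytes collected since the last flush (exposed for inspection).
    const std::string &pending() const { return pending_; }

protected:
    // No put area is set up, so every character written arrives here.
    int_type overflow(int_type c) override
    {
        if (!traits_type::eq_int_type(c, traits_type::eof()))
            pending_.push_back(traits_type::to_char_type(c));
        return traits_type::not_eof(c);
    }

    // Called on flush; assumes pending_ ends on a character boundary.
    int sync() override
    {
        bool ok = write_utf8(pending_);
        pending_.clear();
        return ok ? 0 : -1;
    }

private:
    static bool write_utf8(const std::string &text)
    {
        if (text.empty())
            return true;
#ifdef _WIN32
        HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD mode = 0;
        if (GetConsoleMode(out, &mode)) { // a real console, not a file or pipe
            int bytes = static_cast<int>(text.size());
            int len = MultiByteToWideChar(CP_UTF8, 0, text.data(), bytes, nullptr, 0);
            if (len <= 0)
                return false;
            std::wstring wide(static_cast<std::size_t>(len), L'\0');
            MultiByteToWideChar(CP_UTF8, 0, text.data(), bytes, &wide[0], len);
            DWORD written = 0;
            return WriteConsoleW(out, wide.data(), static_cast<DWORD>(len),
                                 &written, nullptr) != 0;
        }
#endif
        // Redirected output (or a non-Windows system): pass the bytes through.
        return std::fwrite(text.data(), 1, text.size(), stdout) == text.size();
    }

    std::string pending_;
};
```

A stream built on it, for example `utf8_console_buf buf; std::ostream out(&buf);`, has to be flushed explicitly, since nothing is written until `sync()` runs. That much machinery for a single line of text rather supports the complaint above.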
So please, if you design an API or an operating system, do not use any kind of "Wide" API... This is the wrong way to do Unicode.
Which reminds me... Spread the word:
http://www.utf8everywhere.org/
Related Posts: http://blog.cppcms.com/post/62
The lecture slides and the poster from August Penguin 2011 conference.
I gave a lecture at the August Penguin conference about Boost.Locale and presented a poster about the CppCMS project.
CppCMS 0.99.8 and Boost.Locale 4.0.0 Released
New versions of CppCMS and Boost.Locale have been released.
New Features:
Boost.Locale is updated to the latest version that is going to be merged into Boost svn tree.
It includes some breaking changes:
Redesigned boundary analysis interface:
Instead of using mapping, token_iterator and break_iterator, new classes that provide the same functionality were introduced: segment_index, boundary_point_index and the elements that can be iterated, segment and boundary_point.
See: http://cppcms.sourceforge.net/boost_locale/html/boundary_analysys.html
Updated messages interface: messages now use the same character type for the key and the output message, i.e.
std::wstring wh = translate(L"hello").str();
std::string h = translate("hello").str();
Instead of
std::wstring wh = translate("hello").str<wchar_t>();
std::string h = translate("hello").str<char>();
This allows non-US-ASCII keys to be used transparently.
Updated the date_time interface to be more consistent with Boost.DateTime and Boost.Chrono. Operations are more type safe now.
Introduced support for the SunStudio compiler on OpenSolaris.
New nightly tests: Linux Armel and Solaris/SunStudio.
Bug Fixes:
- Fixed bug that virtually disabled gzip compression in CppCMS 0.99.7
- Some compilation and testing fixes for older versions of Mac OS X/Darwin 8.
  Note: Darwin 8 is not supported due to bugs in the standard C library, but there should be no problems with newer Mac OS X versions.
- Fixes to support ICU 4.8
- Fixes to support gcc-4.6 and gcc-4.0
- Fixes to support Python 2.3.5
Note to SVN-trunk users
Do not forget to untar the updated cppcms_boost.tar.bz2 file.
Boost.Locale was accepted into Boost
Now it is official. Boost.Locale was accepted into Boost.
So the localization part of CppCMS will spread all over the C++ world.
Formal Review of Boost.Locale starts today
The formal review of the Boost.Locale library starts today.
I hope it passes the review and is accepted as an official Boost library.