SLUG Mailing List Archives

[coders] [solved] Re: Converting a UTF-8 string to a wchar_t (in C)

On 14/12/2006, at 1:15 AM, Andre Pang wrote:

> I have a C string (char*) that's encoded in UTF-8. I'd like to convert this to a wide string (wchar_t*). I've done plenty of reading about mbstowcs(3), iconv(3) and friends, and from what I understand, I have two options:
>
> So far, I've tried (2) -- the iconv() method -- and it doesn't work for me. It seems to work fine if the characters are ASCII, but the moment it actually hits any non-ASCII characters, iconv() throws a return code of -1 and errno's set to EILSEQ. I'm assuming there are some bugs in my code, which is no surprise considering how annoying iconv() is to use.

Well, this was slightly bizarre. I changed the destination encoding from:

const iconv_t utf8ToWCharTIconvDescriptor = iconv_open("WCHAR_T", "UTF-8");

to:

const iconv_t utf8ToWCharTIconvDescriptor = iconv_open("UCS-4-INTERNAL", "UTF-8");

And then iconv() did its job merrily. (I had even put in a setlocale(LC_CTYPE, "") call before trying "WCHAR_T", to no avail.)
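For anyone following along, here is a minimal sketch of the whole conversion, assuming a 4-byte wchar_t. The helper name utf8_to_wcs and its to_encoding parameter are made up for illustration and are not from the code discussed in this thread. "UCS-4-INTERNAL" is a GNU libiconv name and may not be recognised by other iconv implementations, so the encoding name is passed in rather than hard-coded.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
 * allocated wide string, using the given iconv target encoding name
 * (for example "UCS-4-INTERNAL" with GNU libiconv, or "WCHAR_T").
 * Assumption: the target encoding matches the in-memory layout of wchar_t.
 * Returns NULL on failure; the caller frees the result. */
wchar_t *utf8_to_wcs(const char *utf8, const char *to_encoding)
{
    assert(utf8 != NULL && to_encoding != NULL);

    iconv_t cd = iconv_open(to_encoding, "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;                /* encoding name not supported here */

    size_t in_left = strlen(utf8);
    /* Worst case: one wide character per input byte, plus the terminator. */
    size_t out_chars = in_left + 1;
    wchar_t *out = calloc(out_chars, sizeof *out);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)utf8;    /* iconv() wants char **, input is only read */
    char *out_ptr = (char *)out;
    size_t out_left = (out_chars - 1) * sizeof *out;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {         /* e.g. EILSEQ for invalid UTF-8 */
        free(out);
        return NULL;
    }
    return out;                     /* calloc() left the terminator zeroed */
}
```

Calling utf8_to_wcs(s, "UCS-4-INTERNAL") mirrors the descriptor above. The function returns NULL if iconv_open() rejects the name or if the input is not valid UTF-8.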

I realise this means that I'm relying on wchar_t being UCS-4, but I've got #ifdefs around it so that the code is only enabled on architectures I've explicitly listed.
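One possible shape for such a guard, as a sketch rather than the exact #ifdefs used here: instead of listing architectures, it asks the compiler directly. __STDC_ISO_10646__ is the standard C macro that an implementation defines when wchar_t values are Unicode code points, and WCHAR_MAX gives the range. The names WCHAR_T_IS_UCS4 and wchar_t_is_ucs4 are hypothetical. Note that some platforms with a 4-byte Unicode wchar_t do not define __STDC_ISO_10646__, so this check can be stricter than necessary.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* Enable the UCS-4 code path only when the implementation promises that
 * wchar_t holds ISO 10646 code points and is wide enough for all of them. */
#if defined(__STDC_ISO_10646__) && WCHAR_MAX >= 0x10FFFF
#define WCHAR_T_IS_UCS4 1
#else
#define WCHAR_T_IS_UCS4 0
#endif

/* Hypothetical run-time accessor for the compile-time decision. */
int wchar_t_is_ucs4(void)
{
#if WCHAR_T_IS_UCS4
    /* Sanity check: the highest Unicode code point must fit in wchar_t. */
    assert((wchar_t)0x10FFFF == 0x10FFFF);
#endif
    return WCHAR_T_IS_UCS4;
}
```

The iconv-based conversion can then be wrapped in #if WCHAR_T_IS_UCS4, with an #error or a different code path in the #else branch.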

(Note that the destination's "UCS-4-INTERNAL" rather than simply "UCS-4", since "UCS-4" seems to be synonymous with "UCS-4BE"; this is obviously incorrect on little-endian platforms.)
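Where "UCS-4-INTERNAL" is not available, one alternative is to pick an explicitly endian name at run time. The sketch below assumes the iconv implementation accepts "UCS-4LE" and "UCS-4BE"; the function names native_ucs4_name and first_code_unit are hypothetical and exist only to illustrate the byte-order point.

```c
#include <assert.h>
#include <iconv.h>
#include <stdint.h>
#include <string.h>

/* Return the explicitly endian UCS-4 name that matches this machine. */
const char *native_ucs4_name(void)
{
    const uint32_t probe = 1;
    /* On a little-endian machine the low-order byte is stored first. */
    return *(const unsigned char *)&probe == 1 ? "UCS-4LE" : "UCS-4BE";
}

/* Convert the first character of a UTF-8 string into *out, using the given
 * target encoding, and store the raw 4 bytes exactly as iconv wrote them.
 * Returns 0 on success, -1 on failure. */
int first_code_unit(const char *utf8, const char *to_encoding, uint32_t *out)
{
    assert(utf8 != NULL && to_encoding != NULL && out != NULL);

    iconv_t cd = iconv_open(to_encoding, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *in_ptr = (char *)utf8;    /* iconv() wants char **, input is only read */
    size_t in_left = strlen(utf8);
    char *out_ptr = (char *)out;
    size_t out_left = sizeof *out;

    *out = 0;
    /* For inputs longer than one character iconv() stops with E2BIG once
     * the 4-byte buffer is full, so success is judged by out_left instead. */
    (void)iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    return out_left == 0 ? 0 : -1;
}
```

On a little-endian machine, first_code_unit("A", "UCS-4BE", &v) stores the bytes 00 00 00 41, which reads back as a byte-swapped value, while the name returned by native_ucs4_name() yields the expected 0x41.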

% Andre Pang : trust.in.love.to.save  <http://www.algorithm.com.au/>