Using UTF on Windows

Strings in D are Unicode strings in a UTF encoding: char[] holds UTF-8 and wchar[] holds UTF-16. The Microsoft Windows API normally offers two versions of each function that takes a string as input: an "A" version and a "W" version. The "W" version accepts UTF-16 string parameters, which maps directly onto wchar[] strings in D. Some earlier versions of Windows (95, 98 and ME), however, do not implement many of the "W" versions of the API functions. Using the "A" versions is problematic because they accept strings in various code page encodings, which are not directly compatible with D's UTF-8 char[] strings.
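As a small illustration of the encoding difference described above, the following sketch compares the UTF-8 and UTF-16 lengths of the same text. It assumes a current D2 compiler and Phobos; utf16Length is a hypothetical helper introduced only for this example, and the sample text is arbitrary.

```d
import std.utf : toUTF16;

/// Number of UTF-16 code units needed for a UTF-8 string
/// (hypothetical helper, for illustration only).
size_t utf16Length(const(char)[] s)
{
    return toUTF16(s).length;
}

void main()
{
    string name = "caf\u00E9";        // the last character is U+00E9
    assert(name.length == 5);         // UTF-8: c, a, f, 0xC3, 0xA9
    assert(utf16Length(name) == 4);   // UTF-16: c, a, f, 0x00E9
    assert(name[$ - 1] >= 0x80);      // non-ASCII bytes have the high bit set
}
```

The same text therefore has a different byte layout in each encoding, which is why a char[] cannot simply be handed to a "W" function, nor to an "A" function once it contains non-ASCII characters.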

The right way to deal with this is to first detect whether the running version of Windows supports the "W" functions. If it does not, convert the UTF-16 wchar[] string to the code page encoding that the "A" functions accept. This technique is used in the Phobos runtime library, and is excerpted here:

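private import core.sys.windows.windows;
private import std.string;
private import std.utf;

int useWfuncs = 1;

static this()
{
    // Win 95, 98, ME do not implement the W functions;
    // they set the high bit of the value returned by GetVersion()
    useWfuncs = (GetVersion() < 0x80000000);
}

// Convert a UTF-8 string to a 0-terminated string in the
// system's default ANSI code page, for use with the A functions
const(char)* toMBSz(const(char)[] s)
{
    // Only need to do this if any chars have the high bit set
    foreach (char c; s)
    {
        if (c >= 0x80)
        {   char[] result;
            int i;
            const(wchar)* ws = std.utf.toUTF16z(s);
            // First call: query the required buffer size, including the terminating 0
            result.length = WideCharToMultiByte(0, 0, ws, -1, null, 0, null, null);
            // Second call: perform the conversion into the buffer
            i = WideCharToMultiByte(0, 0, ws, -1, result.ptr, cast(int) result.length, null, null);
            assert(i == result.length);
            return result.ptr;
        }
    }
    // Pure 7-bit ASCII: the UTF-8 bytes can be passed through unchanged
    return std.string.toStringz(s);
}

uint getAttributes(const(char)[] name)
{
    uint result;

    if (useWfuncs)
        result = GetFileAttributesW(std.utf.toUTF16z(name));
    else
        result = GetFileAttributesA(toMBSz(name));
    return result;
}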
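
The two tests that the code above relies on (the high bit of the GetVersion() result, and the high bit of each UTF-8 code unit) can be tried out without calling the Windows API. The sketch below is only an illustration under that assumption: supportsWideApi and isPlainAscii are hypothetical helpers, not Phobos or Windows functions, and the version values are samples chosen for the example.

```d
/// True when a GetVersion()-style value has its high bit clear,
/// which is the condition the static constructor above checks.
/// (Hypothetical helper, for illustration only.)
bool supportsWideApi(uint versionValue)
{
    return (versionValue & 0x8000_0000) == 0;
}

/// True when every code unit is 7-bit ASCII, so no code page
/// conversion is needed before calling an "A" function.
/// (Hypothetical helper, for illustration only.)
bool isPlainAscii(const(char)[] s)
{
    foreach (char c; s)
    {
        if (c >= 0x80)
            return false;
    }
    return true;
}

void main()
{
    // Sample values, chosen only to show the two cases
    assert(supportsWideApi(0x0A28_0105));    // high bit clear: use the W functions
    assert(!supportsWideApi(0xC000_0A04));   // high bit set: fall back to the A functions

    assert(isPlainAscii("readme.txt"));
    assert(!isPlainAscii("r\u00E9sum\u00E9.txt"));
}
```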

This article was originally published at http://digitalmars.com/techtips/windows_utf.html.