Contents of this lesson:

- Unicode versus ANSI
- Why UNICODE should be defined in the source code
- How to include the [windows.h] header file in C++
This lesson’s main points:

- The Windows API is based on UTF-16 encoded text, called “Unicode”.
- In C++ that means using the wchar_t type, & friends.
- Microsoft’s T-macros support Windows 9x, if you want that.
> Unicode versus ANSI
About the simplest GUI application you can make is one that just presents a message box, consisting of a single function call in main. Almost every GUI framework offers a message box function, and so, of course, does the Windows API. The official online documentation of the Windows API’s MessageBox shows its signature as …
```cpp
int WINAPI MessageBox(
    __in_opt HWND hWnd,
    __in_opt LPCTSTR lpText,
    __in_opt LPCTSTR lpCaption,
    __in UINT uType
    );
```
WINAPI is a macro that originally depended on the platform, and that expands to a non-standard specification of the machine code level calling convention. __in_opt and __in are macros that expand to nothing, but that purportedly aid certain C level tools and programming techniques. I write “purportedly” because they’re frequently wrong, as they are above, and then it’s difficult to see how they can really help with anything. This visual noise is called header annotations or SAL annotations, where SAL is short for “Standard Annotation Language”, Microsoft’s pretentious name for this “language”. They’re best ignored.
Ignoring the macro-based visual noise, then, and also ignoring the WINAPI, the declaration becomes …
```cpp
int MessageBox(
    HWND hWnd,
    LPCTSTR lpText,
    LPCTSTR lpCaption,
    UINT uType
    );
```
which looks pretty much like a standard C++ function declaration!
However, there’s more macro-based obfuscation here. Both MessageBox and LPCTSTR are names that depend on whether the UNICODE symbol is defined when the [windows.h] header is included. MessageBox is directly a macro (as is almost every Windows API function name!), while LPCTSTR is defined in terms of a conditionally compiled definition:
| Symbol: | UNICODE defined: | UNICODE not defined: |
|---|---|---|
| MessageBox | MessageBoxW | MessageBoxA |
| LPCTSTR | wchar_t const* | char const* |
So depending on whether UNICODE is defined when [windows.h] is included, the documentation’s single declaration expands to one of the two declarations …
| “Unicode” version: | “ANSI” version: |
|---|---|
| int MessageBoxW( HWND hWnd, wchar_t const* lpText, wchar_t const* lpCaption, UINT uType ); | int MessageBoxA( HWND hWnd, char const* lpText, char const* lpCaption, UINT uType ); |
Both versions are provided by the [user32.dll] DLL, and both versions can be called in the same program. The wchar_t based MessageBoxW version is called the Unicode version, because the strings are encoded as UTF-16 (a 16-bit encoding of Unicode). From which you can infer that the wchar_t type in Windows is 16 bits, guaranteed. The Unicode version is the basic version, and the char based MessageBoxA version is just a wrapper that translates the strings from Windows ANSI encoding to UTF-16, and calls MessageBoxW. Most Unicode based API functions (but not all) have such an ANSI wrapper.
OK, let’s try this, with this source code:
```cpp
#undef  UNICODE
#define UNICODE
#undef  STRICT
#define STRICT
#include <windows.h>

int main()
{
    DWORD const infoboxOptions = MB_OK | MB_ICONINFORMATION | MB_SETFOREGROUND;

    char const* const    narrowText = "It's a 日本国 кошка, LOL!";
    wchar_t const* const wideText   = L"It's a 日本国 кошка, LOL!";

    MessageBox( 0, wideText, L"Unicode (wide) text:", infoboxOptions );
    MessageBoxA( 0, narrowText, "ANSI (narrow) text:", infoboxOptions );
}
```
Detailed instructions:
TODO #1:
- Run Visual C++ Express.
- Bring up the New Project dialog, e.g. via the menus [File → New → Project…].
- Choose a Win32 Project:
- Type in a project name like “unicode_versus_ansi”.
- Click the OK button, but DO NOT click Finish yet.
- In the Application Wizard, choose Application Settings (it’s up at the left):
- Still in the Application Wizard, tick off that you want an Empty Project. This also prevents precompiled headers (non-standard preprocessor behavior), even though the precompiled header check box is disabled and has a tick mark in it. The dialog has a bug: it prevents you from changing the precompiled header choice as long as the selection is “Windows Application”, and it fails to remove the check mark, but the Empty Project choice works in spite of this bug:
- Click the Finish button in the Application Wizard; the new project is created.
- To add a main source code file: right-click the project name or e.g. the Source Files project folder and [Add → New Item…], choose type C++ File (.cpp), name it e.g. [main.cpp] (without the square brackets), and add the source code shown above, including the Russian and Chinese characters (you can just copy and paste them):
- Save, even if you have red underlinings as shown above (it’s just Microsoft’s “Intellisense”™ code completion feature going amok). Most probably a dialog will now pop up asking whether you want to save as Unicode, as shown below, due to the Russian and Chinese characters. Just click Yes in order to save as UTF-8 with BOM:
- Build. You should get a number of warnings about characters that cannot be represented, and unless you have already fixed the project settings you should get a linking error about WinMain, which means that the silly beast failed to recognize the standard main function.
- Fix the project settings to support standard main, by setting entry point mainCRTStartup, as in the first lesson.
- Build and run:
As you can see in the second message box, the char based literal ended up as a Windows ANSI string value in the executable, with question marks for the characters that could not be represented, in spite of the UTF-8 encoded C++ source code. That’s because Visual C++ has Windows ANSI as its (as of this writing still undocumented) C++ execution character set. In order to deal correctly with e.g. international file names, it’s a good idea to base modern Windows programs entirely on Unicode; it’s also more efficient.
> Why UNICODE should be defined in the source code
The UNICODE-dependent declarations in [windows.h] were originally meant to allow you to compile the same code “as Unicode” (i.e. using Unicode functions and wchar_t based strings) and “as ANSI” (i.e. using ANSI functions and char based strings). For example, the same source that would be compiled as Unicode for Windows NT 4.0 and Windows 2000 could be compiled as ANSI for Windows 9x, since Windows 9x was a kind of hybrid Windows without the Unicode API. The Windows 9x version would then have reduced functionality, but it would at least exist!
For that target-switching usage it makes sense to have UNICODE defined (or not) in the project settings, instead of in the code.
But how well does that practice hold up for modern programming, as of 2011? Not very well. For it is very difficult to avoid introducing direct wchar_t dependencies, so that without UNICODE defined the compilation simply fails, as illustrated below:
TODO #2:
- Open the Visual C++ Express solution from lesson 1 (probably named “default_gui_project”).
- In the project settings dialog, drill down to [Configuration Properties → General], and click in the edit field for the Character Set value:
- In the drop down list for the field, select e.g. Use Multi-Byte Character Set, which does not make UNICODE defined.
- Click OK in the project settings dialog.
- Build, using e.g. keypress [F7] or via the menus [Build → Build Solution], and note that (if your code is as described in lesson 1) you now get a compilation error!
```
1>d:\winfolders\alf\my documents\visual studio 2010\projects\default_gui_project\default_gui_project.cpp(165): error C2664: 'DrawTextA' : cannot convert parameter 2 from 'const wchar_t *' to 'LPCSTR'
1>          Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast
```
DrawTextA is the ANSI version of DrawText, selected because UNICODE is now not defined. And DrawTextA is not very happy with a wchar_t based argument! So, even for that minimal program a dependency on wchar_t had sneaked in, and the simply-compile-with-or-without-UNICODE-defined scheme no longer worked!
Point: defining the UNICODE symbol, or not, via the project settings will in general just introduce a way to make compilation fail, like above. Perhaps it will fail for someone who inadvertently changes this setting. Or perhaps it will fail for someone building the software without the Visual C++ Express solution, for example building it with compilation from the command line. The UNICODE symbol should therefore instead be defined in the source code. For example, you can create a wrapper for [windows.h] that defines UNICODE before including [windows.h].
> How to include the [windows.h] header file in C++
The previous section showed that a simple include of [windows.h], like …
```cpp
#include <windows.h>
```
is not good enough. One should as a minimum define the UNICODE symbol before the include, like …
```cpp
#define UNICODE
#include <windows.h>
```
But then the compiler may issue some silly-warning about UNICODE already having been defined (e.g. via project settings), so that one should ideally either put that definition within an #ifndef, or more concisely just undefine the symbol before defining it:
```cpp
#undef  UNICODE
#define UNICODE
#include <windows.h>
```
For consistency one should then also define the runtime library’s corresponding symbol _UNICODE in the same way. The UNICODE symbol that we’ve so far dealt with affects the declarations that you get from the Windows API [windows.h] header, while the _UNICODE symbol affects the declarations that you get from the (non-standard) [tchar.h] runtime library header. Then the including code can look like this:
```cpp
#undef  _UNICODE
#define _UNICODE        // For [tchar.h]
#undef  UNICODE
#define UNICODE         // For [windows.h]
#include <windows.h>
```
But hey, the [tchar.h] header depends not only on _UNICODE but also on a symbol _MBCS which, if defined, says that declarations suitable for a non-Unicode “multi-byte character set” are desired. For example, the Use Multi-Byte Character Set setting chosen in the previous section defines _MBCS. Since _MBCS is in direct conflict with _UNICODE it’s best to detect it and err out as early as possible, instead of letting the compilation go blithely on to some arbitrary later error:
```cpp
#ifdef _MBCS
#   error "_MBCS (multi-byte character set) is defined, but only Unicode is supported"
#endif
#undef  _UNICODE
#define _UNICODE        // For [tchar.h]
#undef  UNICODE
#define UNICODE         // For [windows.h]
#include <windows.h>
```
That takes care of the Windows API and runtime library declarations that depend on the chosen character encoding. Well, except the choice of startup function, which for standards-compliance should just be main regardless of the chosen encoding. And the encoding should be UTF-16 Unicode anyway… 🙂
In addition to UNICODE, C++ code should best define the NOMINMAX symbol before including [windows.h]. Otherwise two lowercase macros min and max will be defined, and will then wreak havoc with code using these names from the standard C++ library. In the 1990’s it was also generally necessary to define STRICT before including [windows.h] in C++ code, so as to get type safe declarations that could be used from C++, but somewhere along the road STRICT definitions became the default. Still, it does not hurt to also define STRICT. A minimal general inclusion of [windows.h] for C++ can therefore look like this:
```cpp
#ifdef _MBCS
#   error "_MBCS (multi-byte character set) is defined, but only Unicode is supported"
#endif
#undef  _UNICODE
#define _UNICODE        // For [tchar.h]
#undef  UNICODE
#define UNICODE         // For [windows.h]
#undef  NOMINMAX
#define NOMINMAX        // C++ standard library compatibility
#undef  STRICT
#define STRICT          // C++ type-checking compatibility
#include <windows.h>
```
To be practically useful the assumed minimum version of Windows should also be specified. This serves both to include declarations of functions introduced with that version, and to exclude declarations of functions introduced after that version. The main problem is that there are a host of different preprocessor symbols used for this version selection: WINVER, _WIN32_WINNT, _WIN32_WINDOWS, _WIN32_IE, and, as of this writing the latest most shiny and new one, NTDDI_VERSION …
The documentation is unfortunately rather vague & misleading, but happily the generated code for the 1st lesson’s program contained this clear advice, in the [targetver.h] file:
“Including [SDKDDKVer.h] defines the highest available Windows platform. If you wish to build your application for a previous Windows platform, include [WinSDKVer.h] and set the _WIN32_WINNT macro to the platform you wish to support before including [SDKDDKVer.h].”
Relevant possible values for _WIN32_WINNT include _WIN32_WINNT_WIN2K, _WIN32_WINNT_WINXP, _WIN32_WINNT_WS03, _WIN32_WINNT_VISTA, _WIN32_WINNT_WS08 and _WIN32_WINNT_WIN7, where e.g. _WIN32_WINNT_WIN2K is defined as 0x0500, signifying Windows NT version 5.00. As I’m writing this many people are still running Windows XP, but in just half a year or so (mid 2012) I expect nearly all those people and institutions to have upgraded, with everybody then running either Windows 7 or Windows 8. Or Mac or Linux.
```cpp
#ifdef _MBCS
#   error "_MBCS (multi-byte character set) is defined, but only Unicode is supported"
#endif
#undef  _UNICODE
#define _UNICODE        // For [tchar.h]
#undef  UNICODE
#define UNICODE         // For [windows.h]
#undef  NOMINMAX
#define NOMINMAX        // C++ standard library compatibility
#undef  STRICT
#define STRICT          // C++ type-checking compatibility
#ifndef _WIN32_WINNT
#   define _WIN32_WINNT _WIN32_WINNT_WINXP      // Minimum version as per 2011/2012.
#endif
#include <SDKDDKVer.h>
#include <windows.h>
```
Finally, [windows.h] will by default include a lot of headers that one will normally not need. It’s easy to explicitly include such a header where it’s actually needed, but it’s not easy to make sure that code which needs a header actually includes it itself, when [windows.h] is also including it. Also, all these headers may affect build times negatively, to some extent, and they may wreak havoc with your code.
To avoid all that, just define WIN32_LEAN_AND_MEAN before including [windows.h].
OK, it’s not really as easy as “just” indicates. If you do define WIN32_LEAN_AND_MEAN then you might get in trouble with e.g. the GDI+ headers, but if you don’t define WIN32_LEAN_AND_MEAN then you might get in trouble with e.g. the Winsock v2 headers. For that matter, just defining NOMINMAX, which is necessary to avoid trouble with the standard C++ library, can land you in trouble with the GDI+ headers. Given that, it’s probably best to just ignore the issues with low quality headers. Then the including code can look like this:
```cpp
#pragma once
#ifdef _MBCS
#   error "_MBCS (multi-byte character set) is defined, but only Unicode is supported"
#endif
#undef  _UNICODE
#define _UNICODE        // For [tchar.h]
#undef  UNICODE
#define UNICODE         // For [windows.h]
#undef  NOMINMAX
#define NOMINMAX        // C++ standard library compatibility
#undef  STRICT
#define STRICT          // C++ type-checking compatibility
#ifndef _WIN32_WINNT
#   define _WIN32_WINNT _WIN32_WINNT_WINXP      // Minimum version as per 2011/2012.
#endif
#ifdef _MSC_VER         // Visual C++ only (the SDK is Visual C++ specific).
#   include <SDKDDKVer.h>       // E.g. g++ 4.4.1 does not support this file/scheme.
#endif
#undef  WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
```
This is, of course, best put in a reusable wrapper! 🙂
TODO #3:
- Define a reusable C++ wrapper for [windows.h], and test it in e.g. a program that displays a message box.
Acknowledgments:
- @DeadMG over at Stack Overflow encouraged me to not just ignore WINAPI. At first I decided to show what it expanded to. But then I ended up adding just “and also ignoring the WINAPI”… 🙂
Cheers, & enjoy!,
– Alf
Thank you for your lessons. Up to now I have only done MFC-GUI-programming and I hope your lessons will help me in understanding “Naked”-API-GUI-programming for Windows a bit better.
ERROR:
I discovered that the MinGW g++ 4.4.1 compiler does not support the [SDKDDKVer.h] header. The Microsoft SDK is after all written for Visual C++. So I have now made the inclusion of that header conditional on _MSC_VER, i.e. only when the compiler is Visual C++.

You said:
“…But how well does that practice hold up, for modern programming as of 2011? Not very well. For it is very difficult to avoid introducing direct wchar_t dependencies, so that without UNICODE defined the compilation simply fails, as illustrated below…”
and
“…DrawTextA is the ANSI version of DrawText, selected because UNICODE is now not defined. And DrawTextA is not very happy with a wchar_t based argument! So, even for that minimal program a dependency on wchar_t had sneaked in, and the simply-compile-with-or-without-UNICODE-defined scheme no longer worked!…”
Your statement is only correct, in that it wouldn’t work, because your sample code describes exactly how it should *NOT* be done. Your sample code, as you have mentioned, only works with Unicode builds. That’s because you’re supposed to use the TCHAR and TEXT macros so that the same code can compile properly regardless of whether the project is ANSI or Unicode. I understand you were trying to make a point, but your statement is saying that it won’t work, and you provide sample code which supports this, but your sample code is showing the worst possible example of how to use ANSI/Unicode specific strings. However, nowhere in your article do I see it mentioned that it could very easily work if you were to do things the right way.
The proper way of using strings for ANSI and/or Unicode:

- Use TCHAR instead of char or wchar_t.
- Use the TEXT macro (or _T) for all strings, instead of specifically the L prefix or no prefix for ANSI strings.

```cpp
TCHAR const* const myText = _T("This works, LOL");
MessageBox( NULL, myText, _T("ANSI and/or UNICODE text"), MB_OK );
```

The above code will compile properly regardless of whether the project is set to ANSI or Unicode. Technically, this is how strings and the API should always be used.
For someone who wants to support Windows 9x while using MFC in DLLs, and unwilling to recompile MFC or not knowing about Microsoft’s Layer for Unicode, using the T macros is indeed a possible solution. Not “proper” as you write, but one possible solution for supporting that archaic technology.

However, other than that it’s a lot of work and complication and sheer ugliness, for negative gain. For, exactly who constitute the market for crippled applications that don’t handle international characters? That’s what you get with T macros when you build for ANSI. I don’t think any modern users will be impressed by your app’s ability to run on Windows 9x, if you managed to tackle all the other problems involved in targeting Windows 9x…

Cheers & hth.,

– Alf

PS: Upshot: do not use the Microsoft T macros. Since the year 2000, when Microsoft introduced the Layer for Unicode, they’ve been a very ugly solution looking for some problem to solve. And that problem has as of 2012 not yet surfaced.

My suggestion was only for developing code which would compile properly for both ANSI and Unicode builds.
Using TCHAR instead of char/wchar_t, and using the T macro instead of the L prefix/no prefix, is pretty easy, as a solution to this specific issue.
I was not recommending for anyone do it this way for “normal” (ansi-only or multibyte-only or unicode-only) development.
I have my reasons for occasionally developing for both ansi and unicode, and it has absolutely nothing to do with Windows 9x whatsoever.
If you have a more practical way to write code which would compile properly for both ansi and unicode, I would be very grateful to hear about it.
I agree with sheer ugliness, but unfortunately, I have yet to find a better way.
Thanks.
I’m learning a lot from these tutorials. I have a bit of time to experiment with this right now while I’m waiting for several machines to finish their Windows updates. 🙂
Strangely, on this particular machine, the Asian characters display fine in the source code, but not in the message box, yet the Cyrillic (I think) characters display fine both in the source and in the message box.
I’ve been thinking for a while about supporting international characters in apps that I work on, and this has further motivated me. One thing I’d like to find out is how to eliminate the “universal-character-name” warnings. I suppose I could use a “#pragma” safely, but I wonder if it’s possible to change the “current codepage” to accommodate these characters properly.
Thanks again for all your efforts.