Win32 Programming Lesson 6: Everything you never wanted to know about strings
Before We Begin
Several of you probably had problems with character types in the last assignment, especially when reading the command line. Why? Because in Windows, strings aren't always strings: depending on how you compile, a "string" may be a sequence of chars or a sequence of wide characters.
Why?
Traditionally, a C string is a sequence of bytes terminated by a NUL character ('\0'). Unfortunately, a single byte can represent only 256 different characters, and that's too few for some languages (Kanji being the classic example).
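DBCS
To fix this problem, the double-byte character set (DBCS) was created. In a DBCS each character consists of either 1 or 2 bytes. This means functions like strlen don't work correctly: strlen counts bytes, not characters. Helper functions exist (for example CharNext, CharPrev and IsDBCSLeadByte), but the solution is ugly. Enter Unicode.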
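WBCS
Wide Byte Character Set == Unicode. Work on Unicode began in 1988 (!); the Unicode Consortium itself was incorporated in 1991. See http://www.unicode.org for more information than you could ever want. On Windows, each wide character is 16 bits long (a UTF-16 code unit); characters outside the Basic Multilingual Plane need two of them, a so-called surrogate pair.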
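Why bother?
Unicode enables easy data exchange between languages, lets you create a single binary that supports all languages, and improves execution efficiency: Windows NT/2000 work in Unicode internally, so an ANSI call has to convert its strings to Unicode before doing the real work.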
History
Unicode really is much more of a Windows 2000 thing; support in Windows 98 was lacking. However, looking to the future, we'll ignore the old 16-bit application space. Windows CE is Unicode only.
Writing Unicode Code…
It's possible to write pure Unicode applications using several new functions in the C run-time library (RTL). However, using macros, you can just as easily write code that builds as *both* ANSI and Unicode.
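Unicode types
The wide-character type is wchar_t. Microsoft's C headers declare it as typedef unsigned short wchar_t; (string.h is one header that provides it); in standard C++ it is a built-in type. Note that wchar_t szBuffer[100] allocates 100 characters, not 100 bytes (200 bytes on Windows). Wide strings break strcat, strcpy, etc.; use the equivalent functions with wcs replacing str, e.g. wcscat.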
A Better Way
tchar.h introduces a series of macros that allow the program to use Unicode or not, depending on compilation options. It creates a new type, TCHAR, which is equivalent to char if _UNICODE is not defined, and to wchar_t if it is.
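Problems
Imagine each of these declarations:
TCHAR *szError = "Error";          /* breaks when _UNICODE is defined: narrow literal, wide pointer */
wchar_t *szError = "Error";        /* always wrong: narrow literal, wide pointer */
TCHAR *szError = L"Error";         /* breaks when _UNICODE is not defined: wide literal, narrow pointer */
TCHAR *szError = _TEXT("Error");   /* correct: _TEXT adds the L prefix only in Unicode builds */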
Windows Unicode data types
WCHAR: a Unicode character. PWSTR: pointer to a Unicode string. PCWSTR: pointer to a constant Unicode string.
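Windows API Revisited
CreateWindowEx doesn't really exist… There are two functions, CreateWindowExA and CreateWindowExW. The A version takes ANSI (code-page) strings and the W version takes Unicode strings. WinUser.h maps the name CreateWindowEx to one of them depending on whether UNICODE is defined. Note that the Windows headers test UNICODE while tchar.h tests _UNICODE, so define both.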
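Unicode Gotchas
Use the types BYTE and PBYTE for raw bytes, the generic type TCHAR (and friends) for characters, and the TEXT macro for literals. Beware of string arithmetic: sizeof(szBuffer) is a number of bytes, not the number of characters the buffer can hold, so divide by sizeof(szBuffer[0]). Similarly, malloc takes a byte count, so allocate (length + 1) * sizeof(TCHAR).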
Windows functions
Use lstrcat, lstrcmp, lstrcmpi, lstrcpy and lstrlen instead of their wcs/str counterparts. Some of them (lstrcmp and lstrcmpi) call the Windows function CompareString internally, which is useful for fancy language-aware comparisons. There is a whole host of these functions (like CharLower and CharLowerBuff…).
Type Conversion
Of course, sometimes you have to convert between ANSI (multibyte) and Unicode strings in a program. Use MultiByteToWideChar to make wide characters, and WideCharToMultiByte to make regular (multibyte) characters.
Your own DLLs
You can write your DLLs to provide both ANSI and Unicode support. For example, imagine a routine that reverses a string:
BOOL StringReverseW(PWSTR pWideCharStr)
Instead of writing a completely separate StringReverseA, have it convert its argument to a wide string, call StringReverseW, and then convert the result back.
Prototype
BOOL StringReverseW(PWSTR pWideCharStr);
BOOL StringReverseA(PSTR pMultibyteStr);

#ifdef UNICODE
#define StringReverse StringReverseW
#else
#define StringReverse StringReverseA
#endif
Not-too-difficult Assignment
Sort n words from the command line in ascending alphabetical order (or in descending order if the -d flag is set), and have your program compile and run easily with either MBCS or UNICODE defined.
Next Class: Simple Kernel Objects…