Secure Programming, Chapter 2: Strings
Overview
● Arrays and their Problems
● Character Strings
● Common String Manipulation Errors
● String Vulnerabilities and Exploits
● Mitigation Strategies
● String Handling Functions, the bad and the good
● Runtime Protection Strategies
● Some Notable Vulnerabilities
● Summary
Arrays and their Problems
1) Hard to determine size.
2) Size defaults may not work.
3) Easy to index an array out of bounds.
4) Easy to write non-portable code (non-consistent handling, for example).
5) Size parameters may be wrong (see 3)).
6) Array copying may overflow the array.
7) Pointer arithmetic may be incorrect.
Character Strings
The problem: Many strings come from outside:
• Command line arguments
• Environment variables
• Console or other input
• Text files
• Network connections
Strings are not a built-in type in C/C++, though there is (some) library support.
Character Strings: String Data Type
Most people implement a string as a null-terminated array of characters, addressed by a pointer. Strings have all the problems of arrays, magnified because most string manipulation is done through procedures.
Five Important terms for arrays:
1. Bound = number of elements in the array.
2. Lo = Address of the first element of the array.
3. Hi = Address of the last element of the array.
4. TooFar = The address of the one-too-far element of the array = Hi + 1 = Lo + Bound (in pointer arithmetic).
5. Target size (Tsize) = sizeof(array); equal to Bound for arrays of char.
Character Strings: String Data Type
Two more terms for strings.
1. Null-terminated if there is a null character within the array.
2. Length: For null-terminated strings, the number of characters before the (first) null terminator.
Problem with determining array size (clear procedure)
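The size problem can be sketched as follows. Inside a procedure an array parameter is only a pointer, so sizeof no longer reports the array's size and the element count has to be passed in. This is a minimal sketch; the two-parameter clear() signature and the ARRAY_COUNT macro are illustrative assumptions, not the book's code.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array. Gives a wrong answer if 'a' is a pointer. */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical clear(): the caller supplies the count because
   sizeof(array) here would be the size of a pointer, not the array. */
void clear(int *array, size_t count) {
    assert(array != NULL);
    for (size_t i = 0; i < count; ++i) {
        array[i] = 0;
    }
}
```

A caller that still sees the real array would write clear(dis, ARRAY_COUNT(dis)) for some int dis[12].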
Character Strings: String Data Type
More problems:
What characters? The "Execution Character Set"
Locale: setlocale() function
Basic execution character set: 26 UC/LC letters, 10 digits, 29 graphic characters, space, 33 control characters including HT, VT, FF, BEL, BS, CR, NL, NUL, DEL.
The execution character set may contain many characters and require multiple bytes to represent a character (multibyte character set); the basic character set is still present. Locale-specific shift states.
Character Strings: UTF-8
Can represent any character in the Unicode character set, using 1-4 bytes.
Code points 0-127: 1 byte.
Otherwise: the first byte starts with as many 1 bits as the total number of bytes in the sequence, followed by a 0 bit; all succeeding bytes start with 10.
Thus:
If leading 0, a 1-byte code.
If leading 11, start of a multibyte code.
If leading 10, continuation of a multibyte code.
(Watch out for vulnerabilities!)
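The leading-bit rules can be sketched in C as below. The function names are hypothetical, and the sketch only classifies bytes: it does not reject overlong or otherwise malformed sequences, which is one place the vulnerabilities come from.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the sequence introduced by 'lead', or 0 if 'lead'
   is a continuation byte (10xxxxxx) or cannot start a sequence. */
size_t utf8_seq_len(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1;  /* 0xxxxxxx */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Counts characters by counting every byte that is not a continuation
   byte. Assumes well-formed input; performs no validation. */
size_t utf8_count(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

Note that the character count and the byte count (what strlen reports) differ as soon as one multibyte character is present.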
Wide Strings
16 or 32 bit characters
Terminated with a null wide character.
As is the case with regular strings (with caveats!)
● Pointers point to the left-most character.
● The length is the number of wide characters preceding the null wide character.
● The value is the sequence of code values of the contained wide characters, in order.
String Literals
Enclosed in double quotes “
Wide string literals prefixed by L
String literal tokens are concatenated together. If any of them is prefixed by L, the string is a wide string. Example in text, page 34. Null appended, used to initialize a static array.
In C, a string literal is not const-qualified, so code that modifies it compiles, but modification is "forbidden" (undefined behavior).
Watch for declarations of the form:
const char s[3] = "abc"; // Not a null-terminated string. Use:
const char s[] = "abc";
Strings in C++
● Proliferation of string classes.
● Standardized (STL) down to:
  ● string = typedef for basic_string<char>
  ● wstring = typedef for basic_string<wchar_t>
● Also allows:
  ● null-terminated byte string (NTBS)
  ● NTMBS (null-terminated multibyte string) is an NTBS that contains a sequence of valid multibyte characters and ends in the same shift state it starts.
Strings in C++ (2)
basic_string class template specializations are safer than NTBS, but NTBS are required all over the place:
● Literals are NTBS
● Existing libraries need NTBS or NTMBS
string objects are passed by value or reference, while C strings are passed by pointer.
Thank goodness for the member function data(), aka c_str().
Character Types
Three types:
● Plain
● Signed
● Unsigned
May cause compiler warnings if the wrong type is used.
int
Some gotchas:
● getc and friends return an int so that EOF (a negative value, typically -1) is distinct from every character value.
● Functions in ctype.h (cctype) like isalpha accept an int because they might be passed the result of a getc or similar.
● In C, a character constant has type int, so sizeof('a') is sizeof(int) (typically 4), not 1. In C++ a character constant has type char and its size is 1.
Wide character literals have type wchar_t and multicharacter literals have type int.
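The getc gotcha can be sketched as follows. The helper name count_bytes is hypothetical; the point is only that the variable receiving getc's result is an int, so a data byte of 0xFF is not mistaken for EOF.

```c
#include <assert.h>
#include <stdio.h>

/* Counts the bytes remaining on 'in'. */
long count_bytes(FILE *in) {
    assert(in != NULL);
    long n = 0;
    int c;  /* int, not char: must hold EOF as well as every byte value */
    while ((c = getc(in)) != EOF) {
        ++n;
    }
    return n;
}
```

With char c instead, the loop could stop early at a 0xFF byte (where plain char is signed) or never see EOF at all (where it is unsigned).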
Unsigned char and wchar_t
Unsigned char: all bits handled equally; pure binary. No padding bits, no trap representation, no sign extension, etc.
wchar_t: Can be used for natural-language character data. For characters in the basic character set, it does not matter, except for type compatibility issues.
Sizing String headaches
Three important numbers:
Size = number of bytes allocated to the array (sizeof(a))
Count = number of elements in the array (maybe different from size!)
Length = Number of characters before null terminator.
Notes:
If characters are wide, size may be 2*count or 4*count (depends on the platform's sizeof(wchar_t)).
Length MUST be smaller than count, to leave room for the null terminator.
See Program fragments in book, pages 40-41.
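The three numbers can be told apart in a short sketch. This is not the book's program fragment; COUNT_OF, the array wide, and bounded_wide_length are names assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Count: number of elements in a true array (not a pointer). */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Size is sizeof(wide), count is 10, length is 3. */
static const wchar_t wide[10] = L"abc";

/* Length of the wide string held in 'a', never looking past 'count'
   elements. A return value equal to 'count' means no terminator was found. */
size_t bounded_wide_length(const wchar_t *a, size_t count) {
    assert(a != NULL);
    size_t i = 0;
    while (i < count && a[i] != L'\0') {
        ++i;
    }
    return i;
}
```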
Common String Manipulation Errors
● Use of gets NONONONONONONONO!!!!!!!!!!
● Improperly bounded string copies. Do not use:
  ● strcpy()
  ● strcat()
  ● sprintf()
● Watch out for:
  ● Input strings
  ● Environment strings
  ● Parameter strings.... (see programs, pp 42-47)
Common String Manipulation Errors
● Sizing strings:
  ● Do not use strlen for wide strings; use wcslen
  ● Multiply the result (plus one for the terminator) by sizeof(wchar_t)
  Programs, pages 41-42
● Improperly bounded string input. Do not use:
  ● gets
  ● cin of string with unbounded length
  ● Unbounded string scanf
See programs pp 42-43 (the program on page 43 is a typical implementation of gets)
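Sizing a wide string for allocation can be sketched as below: wcslen gives the length in wide characters, one is added for the null wide character, and the total is multiplied by sizeof(wchar_t) to get bytes. The helper name wide_copy is an assumption for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Returns a heap copy of 'src', or NULL if allocation fails.
   The caller frees the result. */
wchar_t *wide_copy(const wchar_t *src) {
    assert(src != NULL);
    size_t count = wcslen(src) + 1;                  /* elements, with terminator */
    wchar_t *dst = malloc(count * sizeof(wchar_t));  /* bytes */
    if (dst == NULL) {
        return NULL;
    }
    wmemcpy(dst, src, count);
    return dst;
}
```

Using strlen here, or forgetting the multiplication, would under-allocate and turn the copy into a buffer overflow.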
Common String Manipulation Errors
● Careless copying and concatenation of strings
  Program, page 44
● Watch for strcpy, strcat, memcpy, sprintf, etc.
● Off-by-one errors (see program, page 47)
● Null termination errors (pp 49-49)
● String truncation
● If you implement them yourself, you may still be in trouble! (page 50)
String Vulnerabilities and Exploits
● Where does your data come from? Are you sure?
Program on page 51 is bad:
● Uses gets
● Doesn't even check the return value of gets
String Vulnerabilities and exploits
(see ASM code, pp 56-58)
Effect called “Stack Smashing”
Example follows (remember the code from IsPasswordOK?)
String Vulnerabilities and exploits
● Code Injection:
  ● Injection of malicious address and malicious code
  ● Must be acceptable as legitimate input
  ● May not cause abnormal termination
  ● Must result in execution of the malicious code.
● IsPasswordOK is vulnerable (page 65)
● Exploit with fgets and strcpy on page 66 (unclear; obviously not tested).
String Vulnerabilities and exploits
Arc injection, aka return-into-libc, includes:
Branching to an existing function
system(), exec(), setuid() are favorites
Example of vulnerable code, page 70
Defeats memory-based protection schemes such as non-executable stacks, because no new code is injected.
String Vulnerabilities and exploits
Return-Oriented Programming
“gadget” = sequence of instructions followed by return.
Turing-complete gadget sets exist for several targets, including x86 and Solaris libc, and there is a compiler that generates such programs.
Programs use the stack; values are pushed/popped, and return addresses can be skipped for branching.
Actually similar to FORTH programming.
Mitigation Strategies
Two kinds:
Prevent buffer overflows
Detect buffer overflows and recover securely
Best to do defense in depth and apply both.
Mitigation Strategies
Preventing Buffer Overflows:
CERT recommends using a consistent plan for managing strings.
Three models:
1) Caller allocates and frees
Most likely to prevent memory leaks
2) Callee allocates, caller frees
Ensures sufficient memory is available
3) Callee allocates and frees (only available in C++)
Most secure of the three solutions
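The first two models can be contrasted in C with a minimal sketch; the third needs C++ (for example std::string). The function names greet_into and greet_alloc are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Model 1: caller allocates and frees. Returns 0 on success,
   -1 if the output did not fit (buf then holds a truncated string). */
int greet_into(char *buf, size_t size, const char *name) {
    assert(buf != NULL && name != NULL);
    int n = snprintf(buf, size, "Hello, %s", name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Model 2: callee allocates, caller frees. Returns NULL on failure. */
char *greet_alloc(const char *name) {
    assert(name != NULL);
    size_t size = strlen("Hello, ") + strlen(name) + 1;
    char *buf = malloc(size);
    if (buf != NULL) {
        snprintf(buf, size, "Hello, %s", name);
    }
    return buf;
}
```

In model 1 the caller owns the buffer, so a leak is unlikely, but the caller must guess a size. In model 2 the callee sizes the buffer exactly, but the caller must remember to call free().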
Mitigation Strategies
Mitigation strategies:
Caller allocates and frees: the C <string.h> family is expanded with C11 (Annex K) functions:
strcpy_s strcat_s strncpy_s strncat_s
See example 2.5, 2.6, pages 74,75
Mitigation Strategies
Callee allocates, caller frees
Biggest problems:
DOS attack by exhausting memory
Dynamic memory management errors
Example 2.7 p 77
fmemopen() and open_memstream(), both returning FILE * (signatures, p 78), do memory "I/O"
Example code, page 79
Dynamic allocation disallowed in safety-critical systems
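Memory "I/O" with open_memstream() can be sketched as follows, assuming a POSIX.1-2008 system. This is not the book's example; the helper name join_words is hypothetical. The stream grows its buffer as needed, and the caller frees the buffer after the stream is closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Joins 'n' words with single spaces into a heap-allocated string.
   Returns NULL on failure. The caller frees the result. */
char *join_words(const char **words, size_t n) {
    assert(words != NULL);
    char *buf = NULL;
    size_t len = 0;
    FILE *out = open_memstream(&buf, &len);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; ++i) {
        if (fprintf(out, "%s%s", i ? " " : "", words[i]) < 0) {
            fclose(out);
            free(buf);
            return NULL;
        }
    }
    if (fclose(out) != 0) {  /* buf and len are final only after close */
        free(buf);
        return NULL;
    }
    return buf;
}
```

The two listed problems still apply: an attacker who controls the input size can exhaust memory, and the caller can leak or double-free the result.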
String Handling Functions, the bad and the good
gets: replace with fgets or getchar
Examples 2.9, 2.10, pp 84-86
… or gets_s
Example 2.11, page 87
… or getline() (~= getdelim())
Example 2.12, p88
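A bounded replacement for gets built on fgets might look like the sketch below. It is not one of the book's examples; the name read_line and its return codes are assumptions chosen for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (size bytes) and strips the newline.
   Returns 0 on success, 1 if the line was too long (the excess is
   discarded), -1 on end of file or read error. */
int read_line(FILE *in, char *buf, size_t size) {
    assert(in != NULL && buf != NULL);
    assert(size > 1 && size <= INT_MAX);
    if (fgets(buf, (int)size, in) == NULL) {
        return -1;
    }
    char *nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';
        return 0;
    }
    /* No newline stored: either the input ended, or the line did not fit. */
    int c;
    int truncated = 0;
    while ((c = getc(in)) != EOF && c != '\n') {
        truncated = 1;
    }
    return truncated;
}
```

Unlike gets, the function can never write past buf, and it reports truncation instead of silently leaving the rest of the line for the next read.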
String Handling Functions, the bad and the good
strcpy() and strcat()
Fixes:
Allocate required space dynamically
strncpy and strncat are not recommended.
strlcpy() and strlcat() (always null-terminate result)
strcpy_s and strcat_s (implementation, page 91)
strdup() (dynamically allocated, requires free()).
Summary, pp 92-93
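The "allocate required space dynamically" fix can be sketched as a concatenation helper that measures both inputs first. The name concat_alloc is hypothetical, and the sketch uses only standard C library calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding a followed by b,
   or NULL on failure. The caller frees the result. */
char *concat_alloc(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t la = strlen(a);
    size_t lb = strlen(b);
    if (la > SIZE_MAX - lb - 1) {
        return NULL;  /* la + lb + 1 would wrap around */
    }
    char *out = malloc(la + lb + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);  /* also copies b's terminator */
    return out;
}
```

Because the destination is sized from the inputs, there is no fixed buffer for strcat to overflow; the cost is the memory management burden noted for strdup().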
String Handling Functions, the bad and the good
strncpy() and strncat() (p 93)
See strncpy_s (p 95) and strncat_s (pp 97-98)
strndup() (uses dynamic memory allocation)
Summary on p 99
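One reason strncpy() needs care is that it does not null-terminate the destination when the source is at least as long as the limit. A minimal sketch of the usual workaround follows; the helper name copy_truncating is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Copies src into dst (dst_size bytes), always null-terminating.
   Returns 1 if src was truncated, 0 otherwise. */
int copy_truncating(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);  /* leaves room for the terminator */
    dst[dst_size - 1] = '\0';         /* strncpy alone may not terminate */
    return strlen(src) >= dst_size;
}
```

Reporting truncation matters because a silently shortened string can itself be a security problem.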
String Handling Functions, the bad and the good
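memcpy() and memmove(): replace with memcpy_s() and memmove_s() (C11 Annex K), respectively.
Watch out for strlen(). The bounded alternatives strnlen() (POSIX) and strnlen_s() (C11 Annex K) behave identically, except that strnlen_s() also returns 0 for a null pointer.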
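The danger of strlen() is that it keeps reading until it finds a null character, even past the end of the array. A bounded check with POSIX strnlen() might look like this sketch; the helper name is_terminated is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/* Returns 1 if buf holds a null terminator within its first 'size'
   bytes, 0 otherwise. Never reads beyond buf[size - 1]. */
int is_terminated(const char *buf, size_t size) {
    assert(buf != NULL);
    return strnlen(buf, size) < size;
}
```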
Runtime Protection Strategies
Detection and recovery
Provided via:
input validation
the compiler and its runtime system (e. g. array bounds checking)
Operating system
Runtime Protection Strategies: Input Validation
Input data size checking.
Object size checking (with __builtin_object_size()). Use by defining _FORTIFY_SOURCE=n for n ⩾ 1 (p 104, 105)
Runtime Protection Strategies: The compiler, runtime system
Visual Studio Compiler-Generated Runtime Checks
Turn on with flags: /RTCs turns on checks for:
Local variable overflows (including arrays)
Use of uninitialized variables
Stack pointer corruption
Can be tweaked: #pragma runtime_checks("s", off/restore)
Runtime Bounds Checkers:
Libsafe
Libverify
CRED
Runtime Protection Strategies: The compiler, runtime system
Stack Canaries:
StackGuard
GCC's Stack-Smashing Protector aka ProPolice
-fstack-protector[-all] -Wstack-protector
C++ .NET stack overrun detection capability /GS
Recommend adding: #pragma strict_gs_check(on)
Recommend compiling with the /GS flag and linking with /GS-compiled libraries.
Runtime Protection Strategies: The Operating System
Address space layout randomization
Linux (PaX project, 2000)
Windows, since Vista
Mac OS X since 2007/2011, iOS since 4.3
Nonexecutable Stacks
W^X
Data Execution Prevention (Microsoft Visual Studio)
PaX marked stack as non-executable
StackGap