sendmail coding standards - proofpoint

22
SENDMAIL CODING STANDARDS This document describes stylistic and policy issues surrounding the sendmail software. There are a few principles that will underly this entire document: 1. An attractive, consistent style should be used throughout. The code should appear to the customer as the creation of a team working together rather than a group of individuals working coincidently at the same company. Even minor stylistic differences can disrupt the reading. This style will be optimized for line printer output rather than terminal output, since most people reading through the code to understand it will tend to do it from a paper copy. 2. Maintenance is 80% of the task -- optimize for that, not the original coding. 3. Interface to functions will be as consistent as possible. The number of basic concepts will be minimized to keep the learning curve as easy as possible. 4. Code duplication will be kept to a minimum. This has many advantages: a smaller number of function names must be learned, bugs need only be fixed once, the size of the code is minimized, and the number of names that clutter up the symbol table are minimized. 5. Documentation is obligatory. 1. CODE As a general rule, opt for clarity over compilation or execution efficiency. Major performance improvements are achieved through algorithmic changes, not bit twiddling. A good general rule is "performance is easier to add than clarity." 1.1. Comments and Documentation Comments are an integral part of a program, not seasoning that can be added later. Although some of the following standards may seem obvious, a description may be helpful. We will use a three-level hierarchy of comments. The top level describes the functioning of the various procedures of the system. In general they avoid implementation details but are explicit about interfacing characteristics. The second level is embedded inside procedures, and describes implementation at a high level. The third level should give helpful comments about what is going on, and should be highly implementation dependent. Try to use reasonable spelling and grammar in comments. Besides tarnishing our corporate image, bad spelling makes it harder for people to read comments. 1.1.1. Procedure prologue comments Each procedure should have a block "prologue" comment preceding it. The form should be as follows (using the appropriate commenting conventions for the language being used): /* ** NAME -- one line description of function. ** ** Any description needed of the routine. This ** should not be too detailed, since as you change ** the program you will have to remember to change ** the comments.

Upload: others

Post on 19-Mar-2022

11 views

Category:

Documents


0 download

TRANSCRIPT

SENDMAIL CODING STANDARDS

This document describes stylistic and policy issues surrounding the sendmail software. There are a few principles that will underly this entire document:

1. An attractive, consistent style should be used throughout. The code should appear to the customer as the creation of a team working together rather than a group of individuals working coincidently at the same company. Even minor stylistic differences can disrupt the reading. This style will be optimized for line printer output rather than terminal output, since most people reading through the code to understand it will tend to do it from a paper copy.

2. Maintenance is 80% of the task -- optimize for that, not the original coding.

3. Interface to functions will be as consistent as possible. The number of basic concepts will be minimized to keep the learning curve as easy as possible.

4. Code duplication will be kept to a minimum. This has many advantages: a smaller number of function names must be learned, bugs need only be fixed once, the size of the code is minimized, and the number of names that clutter up the symbol table are minimized.

5. Documentation is obligatory.

1. CODE

As a general rule, opt for clarity over compilation or execution efficiency. Major performance improvements are achieved through algorithmic changes, not bit twiddling. A good general rule is "performance is easier to add than clarity."

1.1. Comments and Documentation

Comments are an integral part of a program, not seasoning that can be added later. Although some of the following standards may seem obvious, a description may be helpful.

We will use a three-level hierarchy of comments. The top level describes the functioning of the various procedures of the system. In general they avoid implementation details but are explicit about interfacing characteristics. The second level is embedded inside procedures, and describes implementation at a high level. The third level should give helpful comments about what is going on, and should be highly implementation dependent.

Try to use reasonable spelling and grammar in comments. Besides tarnishing our corporate image, bad spelling makes it harder for people to read comments.

1.1.1. Procedure prologue comments

Each procedure should have a block "prologue" comment preceding it. The form should be as follows (using the appropriate commenting conventions for the language being used):

/*

** NAME -- one line description of function.

**

** Any description needed of the routine. This

** should not be too detailed, since as you change

** the program you will have to remember to change

** the comments.

**

** Parameters:

** parameter -- short description. This

** can be multiple lines if needed.

** none -- if no parameters.

**

** Returns:

** return values, including error returns.

** "none" if no return value.

**

** Side Effects:

** List of side effects this routine may

** have, such as changing global variables,

** system state (working directory, etc.),

** or file modes. Use "none" or omit this

** section entirely if no side effects.

**

** Warnings:

** Anything the user should be careful about

** when using this routine, e.g., the return

** buffer points to static memory. This

** section can be omitted entirely if not

** used.

*/

These fields should clearly determine the functioning of the module. If you find that the semantics are hard to describe, then they can probably be changed to be clearer -- and a better program can be produced.

This level of comment blocks should not contain descriptions of implementation details under most circumstances. Although it is important for comments to exist, it is even more important that they be accurate; try to make your job easy by keeping comments explaining implementation near the code they describe.

The exception to this rule is that occasionally you will need a subroutine that will not have semantic meaning by itself. This sort of subroutine will typically only be one function. If you were coding in Pascal it would be local to one procedure. In this case the procedure should not be on a separate page, and need not have a full header. It should still have a small header that explains its function.

1.1.2. Block comments

Where procedure prologue comments may be considered chapters of a novel, block comments are the paragraphs. They should have the same indent as the code they describe to avoid obscuring the structure of the program, and look something like:

/*

** This comment describes approximately what will be

** happening in the next piece of code. There are two

** asterisks at the beginning of each line, and two spaces

** between the asterisks and the text.

*/

A blank line should be left both before and after these comments to help them stand out. Once again, too much detail can be a pain when you have to change things. But also use the visibility of these comments

to advantage: if you have an important trick that you use this is the right way to display an explanation. Naturally, it is better to avoid tricks entirely, but sometimes they are necessary, and they should be documented clearly.

Indenting can also be used in these comments to add clarity. For example, a comment might appear like:

/*

** This is approximately what I do here, with the text

** indented two spaces.

** At a one-tab indent, I point out the nasty trick

** that I am using.

** At two tabs I may give more detail.

** Back a level, I warn about dependence on the

** "vfork" system call.

*/

Indenting like this can work because it should be very rare that you are doing major tasks deeply nested in a procedure. If you find that you are, ask yourself if that task might not be better off in a sub-procedure. Remember, breaking up your program into a series of relatively straight line tasks enhances understandability, and hence simplifies debugging.

A very useful description to include in block comments are "fixpoints," i.e., explanations of what we know to be true at that point in the code. These resemble assertions fixpoint might say "'tree' now points to the complete query tree." The code to collect this might be complex, but at the beginning of each block of code it is nice to have something we know (or believe) to be true.

1.1.3. Explanatory comments

At the very bottom level are the one-line explanatory comments. It should be possible to keep them to one line because the preceding higher-level comments should have been effective at telling you what is going on, so these just give details. If they do need to be multi-line, consider elevating them to block comments.

The format of these comments is:

/* tell what is going on next */

There should be a blank line before such a comment but no blank line afterwards. The indent should be the prevailing indent of the code. There is one space between the open comment and the text, and one space at the end before the close comment. Complete sentences are not required here.

This style of code is also useful on the end of a struct component to tell what that field does, or

whatever. The explanation of what the data structure itself is for should happen in a block comment, so not too much detail should be needed. In such situations line up the beginnings of the comments.

Don't use comments on the same line as procedural code. The lengths of code lines vary so much they can be easily lost and generally muddy rather than clarify the code.

1.2. Variables

Global variables should have descriptive (but not absurdly long) names. Keep them unique in the first seven characters; we will have to port to systems with short names. Globals should always be capitalized. You can make them more readable by capitalizing multiple words, e.g., "LineNumber."

If the first character of a global variable is an underscore, the capitalization described above should be maintained. For example, "_TimeZone" is a global variable.

Local variables used frequently can be single characters, and probably should be since it can help keep the page neat. But variables like Boolean flags should have more descriptive names. In any case, keep the naming consistent: if you want to use "e" to point to an envelope, it should be called "e" throughout the program.

Libraries must be particularly careful in their choice of names. Since these are loaded in with users' programs, names that are not intended to be visible should be hidden. The names in the code can remain clear by using a header file of "#define" lines to turn the external names into internal names. For example, a library can use the name "setline" in the code, but change the name using:

#define setline _Isetline

Some systems may be able to make the names completely inaccessible through loader options or by massaging the assembler output of the C compiler.

1.3. Declarations

Global variables should be declared in header files as

extern <type> (tab) <variable>; (tabs) /* comment */

A series of globals should line up. For example:

extern char *FileName; /* name of current input file */

extern int LineNumber; /* current input line number */

Declarations of structure member names should line up. In general structures represent some complex data structure and should be treated as carefully as procedural code: well commented, indented to show

structure, etc. Structures should be renamed with typedef's if they are to be at all central to the

algorithm. So, a sample declaration might be:

/*

** NODE -- a node in our call graph.

** There is one node per datum....

*/

typedef struct node NODE;

struct node

{

char *n_name; /* symbolic node name */

int n_value; /* node value */

NODE *n_next; /* next in chain */

short n_flags; /* flags, see below */

};

/* bit values for n_flags */

#define N_PERM 0x00001 /* keep this value */

#define N_INTERN 0x00002 /* created internally */

#define N_IGNORE 0x00004 /* ignore this node this pass */

Local declarations (i.e., declarations within a procedure) should be as close to their use as possible. For example, a local variable used only within a particular scope should be declared inside that scope rather than at the top of the routine. For example:

void

func(nodelist)

struct NODE *nodelist;

{

register NODE *n;

for (n = nodelist; n != NULL; n = n->n_next)

{

int i;

/* process n using i as a temporary */

}

/* more processing */

}

As shown above, the declarations of parameters as well as the declarations of local variables should be indented one tab stop. However, the names of local variables should not be tabbed out, to avoid situations such as:

register struct descriptor *d;

int i;

extern struct descriptor *openr __P((char *));

Put declarations of scalars before declarations of buffers to improve the efficiency of the generated object code on machines with byte offset instructions.

Declarations should be separated from the body of the code by one blank line.

Procedures should be prototyped in header files using the __P macro:

extern void dropenvelope __P((ENVELOPE *));

Note the use of two sets of parentheses.

1.4. Procedural Code

A consistent coding style is important to insure that code becomes universally readable. The human mind catches patterns faster than details of content, and so we want the patterns of the code in the client software to be consistent.

1.4.1. General rules; tabbing

The most important rule is that the physical structure of the code should reflect the logical structure of the algorithm. Comments should follow this rule also. The white space on the page is not wasted space.

Curly braces should be on a line by themselves (with the exception of the closing curly brace on

a do statement) and should line up vertically (no exceptions); i.e., given an open curly brace the matching

close brace should be vertically aligned below it. Code within the braces should be indented one tab stop in.

Tab stops should always be a full eight spaces (that is, don't set the "tabstop" variable in vi to 4 so you

can fit more on the page, or mix lines with four spaces and full tabstops). If you find that you are running off the end of the line, your nesting structure is probably getting too deep for easy comprehension; breaking your algorithm up into two routines may make sense.

Another way to avoid rampant tabs is to make good use of the break, continue,

and return statements. For example, use:

for (i = 0; i < NITEMS; i++)

{

if (!qualifies(datum[i]))

continue;

/* process datum[i] */

}

rather than

for (i = 0; i < NITEMS; i++)

{

if (qualifies(datum[i]))

{

/* process datum[i] */

}

}

1.4.2. Spacing standards

As per standard mathematical convention, functions and procedures should never have a space between the name and the left parenthesis. Keywords should always have a space after them. For example, use:

r = random();

while (*p != '\0')

In virtually all cases, binary operators should have a space on both sides for clarity, but unary operators should abut their operand. Parentheses should not have spaces on their concave side unless needed for clarity in complex expressions. For example:

x = -a * (b + c);

Parentheses should be added in expressions where needed to emphasize structure, even where not strictly necessary, and extra spaces can be added to make this even more apparent; it should never be

necessary to draw in the binding rules to understand a complex if clause. Commas and semicolons

should be followed by a space or end of line but not preceded by a space. For example:

for (i = 0; i < MAX; i++)

Avoid trailing white space on lines.

1.4.3. Procedure declarations

Procedure declarations should use K&R style declarations (there are still some environments without ANSI compilers). An exception is for procedures that take variable number of arguments; they have to be

declared using #ifdef:

void

/*VARARGS1*/

#ifdef __STDC__

syserr(const char *fmt, ...)

#else /* __STDC__ */

syserr(fmt, va_alist)

const char *fmt;

va_dcl

#endif /* __STDC__ */

{

...

Any type information for a procedure return value should be on a separate line preceding the procedure declaration, so that editor searches of

/^proc

will work. Parameter declarations should be indented one tab stop to align with the local declarations. For example, a procedure header should look like:

static char *

proc(x, y)

int x;

char *y;

{

char c;

float f;

/* code for proc */

}

All procedures should have a return value declared, even if the return value is int.

1.4.4. Statements

Do not put two statements on the same line. A person scanning the code has a significant advantage if they can know that there is only one statement per line. For example, use:

if (cond)

process();

rather than

if (cond) process();

Switch statements should have the following format:

switch (expr)

{

case 1:

code 1;

break;

case 2:

code 2;

break;

default:

default code;

break;

}

The case code should be in one tab stop from the switch statement. The case labels themselves should be indented two spaces. This avoids using up two tab stops for each switch statement, and closely mimics the if-else form:

if (expr == 1)

code 1;

else if (expr == 2)

code 2;

else

default code;

This latter form should be used if the selection criteria are too complex to be represented in a switch.

Breaks should be included on all case clauses, including the last one; if the intent is to fall through to the

next case, put a comment to that effect.

Do statements should have the form:

do

{

code;

} while (condition);

The while should be on the same line as the close brace to emphasize that this is the end of a do,

rather than the used even if the code is only one line. This is the only case in procedural code where a right brace does not occur on a line by itself.

When executing a while or for for side effects only, use the continue statement to provide clarity,

e.g.,

for (p = str; *p != '\0'; p++)

continue;

This makes the desired effect more visible than a semicolon on the for line or even on a line by itself.

A particularly wonderful and yet horrible operator is the comma operator. In general, do not use the comma operator where a block will do, e.g.,

if (cond)

{

a = 1;

b = 2;

}

not

if (cond)

a = 1, b = 2;

However, this can be a most useful operator to emphasize parallelism. For example, consider:

for (pa = a, pb = b; *pa != '\0'; pa++, pb++)

{

if (*pa != *pb)

break;

}

This emphasizes that pa and pb are being advanced in tandem and maintains the symmetry of the algorithm. Other than cases such as this, the comma should be considered a dangerous operator.

In a condition part of if, while, or for tests against zero or NULL should be explicit:

if (p != NULL)

- not -

if (p)

However, tests against TRUE and FALSE should be implicit:

if (send_to_log)

- not -

if (send_to_log == TRUE)

Generally speaking, tests against zero are more efficient than tests against explicit values. For example, to check a system call for failure, it is best to use

if (read(...) < 0)

- not -

if (read(...) == -1)

If testing against a constant, the variable expression should come first as shown in all previous examples; do not use

if (NULL == fopen(...))

Avoid excessive braces. For example, use:

if (a > 0)

a++;

- not -

if (a > 0)

{

a++;

}

However, in some cases this can make sense to create symmetry:

if (a > 0)

{

a++;

}

else

{

a--;

deadbeef = TRUE;

}

If you have an if-else clause with one short section and one very long section, put the short section first, inverting the condition if necessary.

1.5. Conditional Compilation

Conditional compilation (#if) should be limited. When needed for portability, a generic routine should be defined that then has different implementations on different architectures; where possible, Posix routines should be used. For example,sendmail has a freediskspace routine that returns the number of blocks of space available on a given filesystem; the (extremely) system-dependent implementation is in one place in conf.c, while other files just call this. When conditional compilation is required, #if is preferred

over #ifdef. This permits defaults to be set as

#ifndef NEEDFOOBAR

# define NEEDFOOBAR 1

#endif /* !NEEDFOOFAR */

At compile time, this can then be overridden with -DNEEDFOOBAR=0. Note also the use of a single space indent after the # to indicate nesting.

The comment on a closing #endif can be omitted if the commented-out section is so small that the opening statement is likely to still be in view and unambiguously matched to the closing statement. Many people find

#ifndef NEEDFOOBAR

# define NEEDFOOBAR 1

#endif

more readable than the example above, especially when it is part of a long pattern of short, similar

statements.

1.6. Semantics

This section describes the standards that are not purely syntactic.

1.6.1. Arguments to procedures

An old adage has it that "if you have twelve arguments to a procedure, you have forgotten one." Except under very unusual circumstances, there should be at most around four parameters to a procedure. This can be relaxed if they are logically grouped; e.g., a procedure taking two sets of {type, length, value} parameters can be logically thought of as having two sets of three parameters rather than six parameters.

Related procedures should have semantically similar arguments. For example, a set of procedures used to send typed data values should either always send a pointer or never send a pointer, and sets of {type, length, value} parameters should always be in that order.

1.6.2. Depth of control structures

The nesting depth of control structures should be kept low. It is possible to design code where the flow of control merges together frequently. The general structure of this sort of code is a straight line of control structures; complex nesting is achieved by calling procedures. If each procedure has semantic meaning then the human mind needn't struggle to comprehend the depth of control structure.

1.6.3. Assertions

As a side effect of simple control structures, it becomes easy to insert assertions in the code. At any point where the control flow remerges it should be possible to say something about the state. Assertions are to be encouraged because they detect errors early.

1.6.4. Flow of control

In order to insure that libraries can run in many environments, libraries should not use the technique of letting the user redefine a routine in order to perform special operations. For example, do not let the user define a routine "print_error" to override a library routine. Instead, use a callback routine to point to the appropriate routine:

extern int (*ErrPrFunc)();

extern int myerrfunc();

ErrPrFunc = myerrfunc;

This sort of construct should be rare.

1.7. Miscellaneous

1.7.1. Duplicate #include's

The contents of header files should be surrounded by:

#ifndef SOMEDEFINE

# define SOMEDEFINE

...

#endif /* !SOMEDEFINE */

This allows users to include the header files more than once without diagnostics.

1.7.2. C Strings

To avoid buffer overflows when using C strings, i.e., char arrays, the functions snprintf(), strlcpy(), strlcat(), etc, should be used instead of their counterparts that do not enforce length restrictions.

1.7.3. sizeof() vs Macros

If you define a C string

char buf[MAXSIZE];

then the length of buf should be specified via sizeof(buf) instead of MAXSIZE inside the function. This

avoids errors when a different macro is used to define the length of buf. However, make sure you don't

mistakenly use sizeof(str) if str is a pointer to char, not an array of char (unless you really want the size of

the pointer, not the size of the array, i.e., the length of the string/buffer).

If allocating memory for structures, it is preferred to specify sizeof(struct) instead of sizeof(type-of-struct). For example:

struct my_struct *mys;

then use:

mys = (struct my_struct *) malloc(sizeof(*mys));

instead of

mys = (struct my_struct *) malloc(sizeof(struct my_struct));

1.8. Portability

Client software programs will be written with an eye toward portability between hardware architectures and operating systems. As we gain experience in how to do this, this section will be expanded.

1.8.1. Syntactic Limitations

This section describes constructs that are supposedly legal in C that do not port to certain target environments.

1.8.2. Referencing Header Files

One thing we want to be able to do is to maximize the flexibility of module location. For example, it may be useful to move a module to a private subtree, linking it to the existing modules in the original subtree to conserve disk space or reflect other changes. Also, there is some possibility that the structure of our development subtree will differ from the structure of the distribution subtree.

In order to make this work, references to header files (in #include statements) should not mention any directory (the syntax of directory names are different on different operating systems). For example, to include the header file "../h/xxlib.h" just use

#include <xxlib.h>

The Makefile will then include a -I flag to the C compiler to tell it where to find this header file. Typically

this will say "-I$H", and the H macro will be defined appropriately. This is also needed to provide cross-machine portability.

To make the dependencies work correctly, system header files will need to have the dependencies change from "xxlib.h" to "$H/xxlib.h"; this is easily done with a sed script that is run automatically as part of the "depend" function. This sed script can also be used to perform transitive closure on header files.

1.8.3. Structure Member Names

All members of all structures must be unique. This is most effectively done by prepending all structure names by a unique prefix that will typically indicate the usage, e.g.,

struct passwd

{

char *pw_name;

char *pw_passwd;

int pw_uid;

....

};

1.8.3.1. Structure alignment

Try to align structures so that an element would fall on a boundary that matches the size of the object to minimize padding. For example, on some architectures doubles are aligned on 8-byte boundaries; try to arrange the structure so this alignment would occur anyway. Use:

struct

{

long l;

short s;

double d;

};

Not:

struct

{

short s;

double d;

long l;

};

The first example is guaranteed to consume at most sixteen bytes, while the second could consume as much as twenty bytes.

1.8.4. Naming and Variable Usage

1.8.4.1. Side effects in parameter lists

Do not depend on evaluation order of parameter lists. For example, the call:

f(i++, i);

is not guaranteed to pass the same value of i on all systems.

1.8.4.2. Register variables

Rank register variables in order of importance so that architectures with few registers get the important variables in registers.

Do put pointers dereferenced (e.g., "*p" or "p->x") more than once in a register.

Do put array indices in registers if referenced more than once.

Do not put simple scalars that are not used as array indices or pointers that are never dereferenced in registers. The cost of copying the value into the register exceeds the benefit on many architectures.

1.8.5. Byte Ordering

The order of bytes as packed into words is highly nonportable and dependencies should be avoided whenever possible. This is unavoidable for certain types of applications, but the use should be minimized.

1.8.6. Data Types

1.8.6.1. Choice of data type

Use short for 16 bit integers. Use long for 32 bit important (e.g., array indexes). Typically the size

of int is chosen to be whatever size is more efficient for a particular architecture.

Note that with the advent of 64 bit systems, this advice is a little bit out of date and code should begin using the int32_t type if available.

1.8.6.2. Sign extension

Do not depend on assignments from char to int to do (or not do) sign extension.

Do not depend on the right shift operator (`>>') to do sign extension.

1.8.6.3. Enum type

Do not use type enum as it is not supported by all C compilers.

1.8.6.4. Void casts

Use the cast "(void)" to explicitly ignore the results of a function that returns a value:

(void) unlink(tempfile);

In the unlikely event that you have a compiler that doesn't support the void type, you may have to

include:

typedef int void;

to avoid compilation errors. These days such compilers should be very rare.

1.8.6.5. Pointers are not integers

Never assume that the sizeof(char *) is the same as sizeof(int).

1.9 Standard C

If your project chooses to use Standard C, rather than K&R C, the following guidelines apply. They're written using the assumption that C89 is widely available, and that some features of C99 are desirable and should be used whenever they can be emulated if they're missing.

For example, it is easy to supply <inttypes.h> and <stdbool.h> header files if they're missing; but it is impossible to emulate variable length arrays. Consequently, bool and int_fast32_t can be used, but VLAs cannot.

Beware: many C99 extensions have been supported by gcc for quite a while. Consider compiling your code with multiple compilers to help catch accidental use of more common extensions. AIX's xlc seems to have the fewest extensions amongst the compilers we commonly use.

1.9.1 Comments

Do not use // comments, even if your compiler supports them.

1.9.2 Integer Types

C guarantees sizes (and always has) for its existing types:

char must be at least 8 bits;

short and int must be at least 16 bits;

long must be at least 32 bits.

C99 also made an new extra-long integer type standard that already existed in many pre-99 compilers:

long long must be at least 64 bits.

In practice, all machines our commercial releases are ported to support a 64-bit type.

Not all of these 64-bit types are called "long long". Not all of these 64-bit types are printed using "%lld".

1.9.2.1 <inttypes.h>

C99 has introduced a new header, <inttypes.h>, that defines a large number of new integer types in preparation for easily extending integers to arbitrary widths beyond "long long", and to provide a number of ancillary services for existing derived integer types like size_t and ptrdiff_t.

Very few, if any, of the systems we port to have a Standards-compliant <inttypes.h>. If you decide to use this mechanism, expect to spend some time building it first. It is possible to build <inttypes.h> and <stdbool.h> for any given machine; it is near impossible to build portable, generic ones.

It is tempting to use the new types to more strongly document your assumptions about integer sizes, but be aware of the trade-offs in legibility and ease of use with existing C library functions as you do so.

Familiarize yourself with all of the new <inttypes.h> header (and the <stdint.h> header it includes) and use the types and macros in it where appropriate, with care.

Use (u)int_fast<number>_t (for <number> in 8, 16, 32, 64) to refer to an (un)signed integer that you require to simply be able to hold that many bits. Alternatively, just use char, short, long, and long long, as before.

Use (u)int_least<number>_t (for <number> in 8, 16, 32, 64) to store an (un)signed integer in as small a space as possible. Alternatively, just use char, short, long, and long long, as before.

These types must exist in any C implementation. When providing your own <inttypes.h> for an

implementation without the header, they can portably and safely default to char, short, long, and long

long, respectively. Pitfalls:

Do not use (u)int<number>_t unless you really need _exactly_ that many bits. Unlike the previous two classes of integer types, the int<number>_t types are not guaranteed to exist on all platforms. The same is true for bit sizes other than the traditional ones. To test whether a type exists, use #ifdef on its limit. E.g.,

#ifdef INT8_MAX

/* code requiring signed integers with exactly 8 bits */

#endif /* INT8_MAX */

Be aware that you are programming with standard C library functions that mostly did not know about <inttypes.h> when they were conceived, and that casual mixing of traditional integer types and the numeric ones may not yield more correct code than had you stuck to the traditional types.

To test for over- or underflow in connection with the numbered types, use the new (U)INT_(FAST/LEAST)<number>_MAX and INT_(FAST/LEAST)<number>_MIN limits.

To declare an integer constant with at least a certain width, use (U)INT<width>_C(<number>). (Note that there are no corresponding _C macros for the FAST and LEAST types; the unadorned macros already have the "at least that many bits" semantics.)

To print an integer using printf and relatives (or scan it using sscanf), use the PRI(u|d)(LEAST|FAST|)<number> macros like this:

int_fast32_t my_value = 0;

printf("You are visitor %6" PRIuFAST32 "\n", my_value);

Note that the "%" and any formatting parameters like precision or padding are not part of the

macro.

Unless you have a good reason not to, use typedefs to centrally define the integers that your application data types map to. These are design decisions that should be made once and then merely used, not repeated in each module that touches the data. When creating typedefs for your integral application data, it is often a good idea to also define the ancillary macros at the same time:

upper limit (and lower limit, if the type is signed)

printf and scanf specifiers

integer constant declarators

For example, to fully define a type "my_uid" as a signed 64-bit integer, one might use: typedef int_fast64_t my_uid;

#define MY_UID_MAX INT_FAST64_MAX

#define MY_UID_MIN INT_FAST64_MIN

#define MY_PRIdUID PRIdFAST64

#define MY_SCNdUID SCNdFAST64

#define MY_UID_C(x) INT64_C(x)

1.9.3 Boolean Types

C99 has introduced a new header, <stdbool.h>, that defines the type "bool" (as a macro evaluating to _Bool) with values "true" and "false".

Not all the platforms we port to have such a header; if you're using this mechanism, be prepared to supply one if it isn't there; here is a sample:

typedef unsigned int _Bool;

#define true 1

#define false 0

#define bool _Bool

#define __bool_true_false_are_defined 1

(See section 7.16 of the C standard.)

Use booleans in place of an unsigned integer type, 1, and 0 to document an assumption about the range of values.

Do not use explicit comparisons with true or false. Don't use casts to (bool) (or the underlying built-in C type, _Bool) to convert any non-zero integer to 1. C99 guarantees that this is possible, but it is impossible to emulate without compiler support.

1.9.4 Functions

When name and parameters of a function fit comfortably onto a single line, use a single line.

void

usage(char const *program_name)

{

/* ... */

}

When there are more parameters, or to be more formal, give each parameter a separate line, indented by a tab. Pull the closing parenthesis onto the end of the last parameter.

void

usage_exit(

FILE *stream,

char const *program_name,

int exit_code)

{

/* ... */

}

Varargs functions are always declared using the <stdarg.h> form:

void

usage_exit(

FILE *stream,

char const *program_name,

char consts *fmt,

...)

{

/* ... */

}

Function declarations (in header files, for example) look just like function definitions, except that the return type is pulled into the same line as the function name.

void usage(char const *program_name);

If your project uses exclusively Standard C compilers, do not use the __P() macro mentioned in section 1.3.

1.9.5 Macros

Use # to "stringize" a macro parameter.

#define printval(e) printf("%s is %d\n", #e, (e))

Use ## to "paste" two tokens together.

#define chain_init(chain) \

((chain ## _head = NULL), \

(chain ## _tail = & chain ## _head))

(Formerly, these would have been written

#define printval(e) printf("%s is %d\n", "e", (e)) /* BAD */

#define chain_init(chain) \

((chain/**/_head = NULL), \

(chain/**/_tail = & chain/**/_head))

respectively, and would have been not quite portable.)

Avoid defining macros with variable arguments (a new feature added in C99); if you can't avoid it, offer a static inline function as an alternative, and embed the macro definition in a feature test for C99.

#if __STDC_VERSION__ > 199901L

# define log(...) my_log(__FILE__, __LINE__, __VA_ARGS__)

#else /* __STDC_VERSION__ > 199901L */

static void inline log(char const *fmt, ...)

{

va_list ap;

va_start(fmt, ap);

my_vlog(__FILE__, 0, fmt, ap);

va_end(ap);

}

#endif /* __STDC_VERSION__ > 199901L */

1.9.6 The type qualifier "const"

Use "const" to let the compiler enforce your promise not to modify a value, and to allow users of your interface to call your functions with values they wish to promise not to modify.

Do not use const to indicate that a value's meaning stays the same, even if its real C structure contents may change. For example, do not declare "ob" const in the example below:

value *

get_value(object *ob)

{

if (ob->value == NULL)

ob->value = load_object(ob->pathname);

return ob->value;

}

When writing functions that return their arguments (or pointers into their arguments) without modifying them, it is impossible to write a type-safe function that works both for const and non-const arguments.

In this case, choose type-unsafety. Return a non-const value, but accept a const one:

char *

strchr(char const *str, int ch)

{

while (*str != '\0')

if ((unsigned char)*str++ == ch)

return (char *)--str;

return NULL;

}

You'll have to add a cast; but better you, once, than the users of the function every time they want to modify the result.

It is perfectly Standard-conforming, although risky, to cast a pointer to a const object into a pointer to a modifiable object, as long as the object isn't actually modified.

Do not use "const" value declarations in place of #defines and enums. That works well in C++ and in Java, but not in C. (For example, const ints cannot be used as cases in switch statements.)

1.9.7 Function pointers

If function pointers are a rare and deliberate event in your code (for example, the single call to the

"compar" function in an implementation of a binary search algorithm), explicitly dereference the pointer to

document that this isn't a normal function call.

result = (*compar)(ptr1, ptr2);

If you're writing a dynamically linked module that uses function pointers as a replacement of statically bound functions, or if you're emulating an object-based dispatch table in the style of C++'s virtual objects:

if (stream->read(stream, buffer, sizeof(buffer)) <= 0)

{

stream->close(stream);

...

}

omitting the (*...) is guaranteed to work and will be less distracting.

When assigning to a function pointer, use the name of the function you're assigning:

filter_callback = my_filter_callback;

do not use & .

filter_callback = &my_filter_callback; /* BAD */

(The function name without () already is the function's address; while taking its address is allowed and yields the same value, it is unnecessary, and does not add clarity since the use cannot be confused with a function call.)

1.9.8 The "register" storage-class specifier

Don't use "register" unless you've looked at the assembler code and know that it makes a difference.

1.9.9 The offsetof() macro

Use offsetof to compute the offset of a struct member from the beginning of its struct. The resulting integer has the type size_t and is usable as a constant initializer.

Not all platforms support offsetof. If you have to supply your own,

#ifndef offsetof

# define offsetof(str, mem) ((size_t)&((str *)0)->mem)

#endif /* ! offsetof */

will work on most platforms.

1.9.10 The type size_t.

Use the unsigned integer type size_t for anything that is the size of an object or derived from the size of an object. For example, the number of elements in an array can always be stored in a size_t. Array- or loop indexes should almost always be size_t.

Keep in mind that size_t cannot store values less than zero; this loop

void

print_string_array_backwards(

char const * const *a,

size_t nelems)

{

size_t i;

for (i = nelems - 1; i >= 0; i--) /* BAD */

printf("%lu: %s\n", (unsigned long) i, a[i]);

}

never terminates. Instead, rewrite the loop head as

for (i = nelems; i-- > 0;)

1.9.11 Enums

Enumerations are a convenient way to define automatically incremented constant tags, and to document the connection between a declaration and the value they're supposed to have.

By default, enumerations are zero-based. If you need the number of elements in a set, it's a common trick to define that constant simply as the last value in the enumeration.

It's good style to tag that "last value" assumption with an explicit comment; otherwise, people adding to the enum may overlook the special status of the last entry.

enum defcon

{

DEFCON_BLUE,

DEFCON_GREEN,

DEFCON_YELLOW,

DEFCON_ORANGE,

DEFCON_RED,

DEFCON_N /* Must be last */

};

If the enumeration constants are used as labels in a "switch" statement over the enumerated type, this will lead gcc -Wswitch to complain that one of the enumeration constants isn't handled. To work around that, #define the maximum, don't include it as an enum value:

enum defcon

{

DEFCON_BLUE,

DEFCON_GREEN,

DEFCON_YELLOW,

DEFCON_ORANGE,

DEFCON_RED

};

#define DEFCON_N (DEFCON_RED + 1) /* Update if adding values */

Do not use enumerations for values that should be available during preprocessing (e.g., in #ifdef).

Standard C allows trailing commas in enumeration lists. Don't use that feature; AIX xlc doesn't support it.

enum primary

{

red,

yellow,

blue, /* BAD */

};

Be aware that enum types deliver less type checking than you probably think they do. In particular, the standard allows assignments from one enum type to another, and assignments of values other than the defined range. The enumerated constants have type "int", not the enum type.

1.9.12 String literal concatenation

C89 implicitly concatenates adjacent strings. This makes for a very convenient way of writing multi-line string constants:

char const message[] =

"Subject: Greetings!\n"

"\n"

"Hi, how are you?\n";

In K&R, this string would have had to be flushed-left, with \ before each hard newline. Use the new form; it's much easier to read.