The Dark Side of the C Programming Language

Mixed Operators

What is the value of c after the following code is executed: 0 or 1?

int a = -1, b = 1, c;
c = a+++b;
printf("a, b, c = %d, %d, %d\n", a, b, c);

This code snippet is interpreted by the compiler as follows:

c = a++ + b;

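When splitting source text into tokens, the compiler always forms the longest valid token (the "maximal munch" rule), so a+++b becomes a ++ + b rather than a + ++b. Because ++ follows a, it is a post-increment: the expression uses the old value of a (-1), and a is incremented afterwards. Thus, the execution result is: a, b, c = 0, 1, 0.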

The Mystery of the Exclamation Mark

This code segment compiles and executes successfully, although compilers such as GCC typically warn about the multi-character constant:

int a = 'cat!';
printf("a = %d\n", a);

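Upon execution, the variable a holds an implementation-defined value that stays the same on every run. With GCC, for example, the character codes are packed into the int one byte after another, giving 1667331105 (0x63617421, the ASCII codes of c, a, t and !). Experiment by removing the exclamation mark at the end or by extending the term cat to exceed three bytes, for instance, to cats, and note the results.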

Switch-Case Code Fragment

A switch statement jumps directly to the matching case label. The declaration of b before the first label is in scope, but its initializer is never executed, so b is left uninitialized.

int a = 0;

switch (a) {
    // code before the first case label is never executed
    int b = a + 10;
    case 0:
        printf("b = %d\n", b);
        break;

    default:
        break;
}

The program may print b = 0, but because b is never initialized, its value is indeterminate and the output is not guaranteed.

define and typedef

Defining two byte-pointer types, one with #define and one with typedef, and then checking the sizes of the declared variables with sizeof reveals:

#define BYTE_PTR1 unsigned char *
typedef unsigned char * BYTE_PTR2;

BYTE_PTR1 pByte1, pByte2;
BYTE_PTR2 pByte3, pByte4;

printf("%zu, %zu, %zu, %zu\n",
       sizeof(pByte1), sizeof(pByte2), sizeof(pByte3), sizeof(pByte4)); // 8, 1, 8, 8

Given that the execution environment is a 64-bit machine, the size of a pointer is 8 bytes. The type of pByte2 is actually unsigned char, not unsigned char*. This discrepancy arises because the preprocessor expands the first declaration textually to:

unsigned char * pByte1, pByte2;

The * binds only to pByte1, so the second variable, pByte2, is a plain unsigned char rather than a pointer.

struct Alignment

Consider two structs as follows, declaring a variable Tom to represent the Student structure:

struct Human {
    char name[10];
    int gender;
};
struct Student {
    struct Human info;
    char school[10];
    float grade;
};
struct Student Tom;

When examined separately, the size of each field within a structure is apparent. Yet, viewing the structure as a whole complicates matters. On typical platforms, int and float must be stored at addresses that are multiples of 4 so that the CPU can access them efficiently. The compiler therefore inserts padding bytes after each 10-byte char array, and the total size is rounded up to a multiple of 4. Below are the results for the sizes of each individual field and the total structure:

sizeof(Tom.info.name)   = 10
sizeof(Tom.info.gender) = 4
sizeof(Tom.info)        = 16    // 10 + 4 = 14, aligned to 16
sizeof(Tom.school)      = 10
sizeof(Tom.grade)       = 4
sizeof(Tom)             = 32    // 16 + 10 + 4 = 30, aligned to 32

Automatic Type Conversion

C is not a strongly typed language: many conversions between different arithmetic types need no explicit cast, because the compiler performs them implicitly. Does the following outcome match your expectations?

unsigned int a = 1;
int b = -100;
(a + b > 0) ? puts("a + b > 0") : puts("a + b <= 0");

The result of executing this program is a + b > 0. When an int and an unsigned int are mixed in an arithmetic expression, the int operand is converted to unsigned int. With a 32-bit unsigned int, -100 becomes 4294967196, so the sum is 4294967197, which is greater than zero.

Conversion between char* and float

Standard method:

static char str[20];
// convert from float to char* (snprintf never writes past the buffer)
snprintf(str, sizeof str, "%f", 1.2345);
// convert from char* to float (atof returns a double)
printf("float = %f\n", atof(str));

Unconventional method:

// convert from float to char*
printf("%a", 1.2345);
// output: 0x1.3c083126e978dp+0
// convert from char* to float
printf("float = %f\n", atof("0x1.3c083126e978dp+0"));

The Shortest C Program

main;

It compiles, but generates warning messages, which can be reduced by modifying it to:

int main;

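Both programs link because the linker only needs a symbol named main, but here main is an int variable rather than a function. Executing either one crashes, typically with a bus error or a segmentation fault depending on the platform.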

Constants and Pointers

int a = 3, b = 4;
const int c = 5;
const int* ptr1 = &a;
int* const ptr2 = &b;
const int* const ptr3 = &c;

ptr1 is a pointer to a constant: the value at the pointed-to address cannot be modified through it.

ptr2 is a constant pointer: the address it holds cannot be changed.

ptr3 is a constant pointer to a constant: neither the value at the pointed-to address nor the address itself can be changed.

ptr1 = &b;
// *ptr1 = 10; // does not compile

The stored address can be modified, but the pointed-to content cannot be.

*ptr2 = 10;
// ptr2 = &a;  // does not compile

The pointed-to content can be modified, but the stored address cannot be.

// *ptr3 = 10; // does not compile
// ptr3 = &b;  // does not compile

Neither the pointed-to content nor the stored address can be modified.

C Language Pointer’s Clockwise/Spiral Rule

	     +-------+
	     | +-+   |
	     | ^ |   |
	char *str[10];
	^    ^   |   |
	|    +---+   |
	+------------+

Beginning with the variable str and interpreting in a clockwise direction: str is an array of 10 pointers to char.

	     +--------------------+
	     | +---+              |
	     | |+-+|              |
	     | |^ ||              |
	char *(*fp)( int, float *);
	 ^   ^ ^  ||              |
	 |   | +--+|              |
	 |   +-----+              |
	 +------------------------+

Initiating from the variable fp and explaining in a clockwise manner: fp is a pointer to a function accepting an int and a pointer to float, and it returns a pointer to a char.

Let’s use this approach to discuss the examples involving constants and pointers:

	         +-------+
	         |   +-+ |
	         |   ^ | |
	const int* ptr1; |
	    ^    ^     | |
	    |    +-----+ |
	    +------------+

ptr1: It is a pointer to a constant int.

	   +----------------+
	   |   +----------+ |
	   |   |     +-+  | |
	   |   |     ^ |  | |
	int* const ptr2;  | |
	^  ^   ^       |  | |
	|  |   +-------+  | |
	|  +--------------+ |
	+-------------------+

ptr2: It is a constant pointer to an int.

	         +----------------+
	         |   +----------+ |
	         |   |     +-+  | |
	         |   |     ^ |  | |
	const int* const ptr3;  | |
	    ^    ^   ^       |  | |
	    |    |   +-------+  | |
	    |    +--------------+ |
	    +---------------------+

ptr3: It is a constant pointer to a constant int.

Once you grasp this rule, even complex declarations can be demystified by tracing the spiral and translating each step into an English phrase.

Object-Oriented Programming with Function Pointers

Declaring function pointers within a structure enables C to emulate class functionalities of higher-level languages, facilitating simple inheritance. The invocation of function pointers resembles calling methods within a class.

#include <stdio.h>
#include <string.h>

// tAnimal structure
struct tAnimal {
    int legs;
    int gender;
    const char* (*react)(struct tAnimal *self, const char *status);
};
// tHuman structure; its first member is a tAnimal to simulate inheritance
struct tHuman {
    struct tAnimal base;
    char country[10];
    // A function pointer with the same name simulates an override
    const char* (*react)(struct tHuman *self, const char *status);
};

static const char* animalReact(struct tAnimal *animal, const char *status) {
    (void)animal; // unused in this example
    if ( strcmp(status, "tired") == 0 )
        return "rest";
    else if ( strcmp(status, "hungry") == 0 )
        return "eat";
    else
        return "idle";
}

static const char* HumanReact(struct tHuman *human, const char *status) {
    if ( strcmp(status, "rich") == 0 )
        return "retire";
    else
        // Fall back to the "base class" behavior
        return human->base.react(&human->base, status);
}

int main(void) {

    struct tHuman aHuman;
    // Register function pointers to the corresponding functions
    aHuman.base.react = &animalReact;
    aHuman.react = &HumanReact;

    puts( aHuman.base.react(&aHuman.base, "rich") ); // idle
    puts( aHuman.react(&aHuman, "rich") );           // retire

    return 0;
}

Dynamic Memory Allocation Issues

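Besides memory leaks, dynamic memory allocation in C often faces issues like program crashes caused by double frees. A subtler, often overlooked complication is that a non-NULL return value from malloc does not always mean the memory is usable: for a request of zero bytes, the result is implementation-defined and may be either NULL or a unique pointer that must not be dereferenced.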

int arySize = 0, *pAry = NULL;

// Dynamically allocate an int array of size 0
pAry = (int*)malloc(sizeof(int) * arySize);

if ( pAry != NULL ) {
    // Common checks would make you think everything is fine
    printf("Let's do some array operation!\n");
    free(pAry);
}

On implementations where malloc(0) returns a non-NULL pointer, running this program prints: Let's do some array operation!

Rare C Language Keywords

auto register signed volatile restrict

In C, variables are classified into storage classes, including auto for automatic storage duration and register for suggesting storage in a CPU register to improve efficiency. Local variables not declared as static or extern default to auto: they are allocated when their block is entered and released when it exits.

In C, integer types such as short, int, and long default to signed if the unsigned keyword is not explicitly used, so they can hold both positive and negative values unless otherwise specified. The exception is plain char, whose signedness is implementation-defined.

The volatile keyword tells the compiler that a variable's value may change through means outside the program's visible control flow. This prevents the compiler from optimizing away reads and writes to the variable, which is crucial when interacting with hardware that may update memory on its own. Note that volatile by itself does not make an access atomic, so it is not a replacement for proper synchronization between threads.

int square(volatile int *ptr) { 
	return *ptr * *ptr;
}
int main(int argc, char* argv[]) {
	int volatile * pReg = (int volatile *) 0x1234; // Assuming the value is 5
	square(pReg);
	
	return 0;
}

The function appears to compute the square of the value the pointer refers to. However, because ptr is volatile, each *ptr is a separate read of the register, and the value may be changed by hardware or by another thread between the two reads. The function then behaves like the following program. (The compiled result is actually assembly language; C is used here only to express the concept.)

int square(volatile int *ptr) {
	int a, b;
	a = *ptr;      // At this time, *ptr is 5
	b = *ptr;      // By now, the value at address 0x1234 has been changed elsewhere and is no longer 5
	return a * b;  // The result is not the correct square value
}

So why use this keyword if it invites such trouble? Wouldn't it be better not to use it? Consider the following situation: a waitLoop() function that waits until *pReg becomes true before it returns.

int * pReg = (int *) 0x1234;

void waitLoop() {
	while (*pReg != true) {
		sleep(1000);
	}
}	

Here, pReg points to the memory address 0x1234, whose value is initially false, so waitLoop() continues to wait. After some time, the hardware changes the value at 0x1234 to true, and the loop should exit. It seems there is no problem, but because pReg is not declared volatile, an optimizing compiler may assume that *pReg is never modified (since no other expression in the program changes it), so the code may effectively become the following. (The compiled result is actually assembly language; C is used here only to express the concept.)

void waitLoop() {
	while (true) {
		sleep(1000);
	}
}

The result is an infinite loop. This situation typically arises when hardware modifies memory locations that the software also reads, which is why the volatile keyword is common in embedded systems: it forces the program to re-read the value every time it is used.

The restrict keyword, used exclusively with pointers, is a promise by the programmer that, within the pointer's scope, the object it points to is accessed only through that pointer. This allows the compiler to assume that no other pointer aliases the data, enabling optimizations that can improve program efficiency. Breaking the promise results in undefined behavior.

Further Reading