ANSI/ISO C++ Professional Programmer's Handbook


13

C Language Compatibility Issues

by Danny Kalev

Introduction

C is a subset of C++. Theoretically, every valid C program is also a valid C++ program. In practice, however, there are some subtle incompatibilities and silent differences between the seemingly common portion of both languages. Most of these differences can be diagnosed by the compiler. Others are more evasive and, in rare conditions, they can have surprising effects.

Although it seems that most of the time legacy C code is combined with newer C++ code, the opposite is also true: C++ code is used in C-based applications. For example, transaction-processing monitors of relational databases that are written in C interact with code modules that are written in C++. This chapter first discusses the differences between ISO C and the C subset of ANSI/ISO C++, and it demonstrates how to migrate legacy C code to a C++ environment. Next, you will explore the underlying object model of C++, including the memory layout of objects, member functions, virtual member functions, virtual base classes, and access specifiers, and you will learn how C code can access C++ objects.

Differences Between ISO C and the C Subset of ANSI/ISO C++

With a few minor differences, C++ is a superset of C. The following sections outline the differences between the C subset of C++ and ISO C.

Function Parameter List

In pre-Standard C, the parameter list of a function was declared as follows:

/* pre-standard C, still valid in ISO C, invalid in C++*/
int negate (n)
int n; /* parameter declaration  appears here*/
{
  return -n;
}

In other words, only the parameters' names appeared in the parentheses, whereas their types were declared before the opening brace. Undeclared parameters defaulted to int. In ISO C, as in C++, both the names and types of the parameters must appear in the parentheses:

/* ISO C and C++ */
int negate (int n)
{
  return -n;
}

The old-style parameter list is still legal in ISO C, but it is deprecated. In C++, it is illegal. Legacy C code that contains an old-style parameter list has to be changed to become valid in C++.

Function Declaration

In C, functions can be called without having to be declared first. In C++, a function cannot be called without a previous declaration or definition. For example

/* valid in C but not in C++ */
int main()
{
  int n;
  n = negate(5); /* undeclared function; valid in C but not in C++ */
  return 0;
}

Functions can be declared in C, just as in C++:

/* C/C++ */
int negate(int n);
int main()
{
  int n;
  n= negate(5);
  return 0;
}

The use of a function declaration (also called a function prototype) in C is recommended because it enables the compiler to detect mismatches in type and argument number. However, it is not a requirement.

Empty Parameter List

In C, a function that is declared with an empty list of parameters such as

int f();  
void g( int i)
{
  f(i);  /* valid in C but not in C++ */
}

can take any number of arguments of any type. In C++, such a function does not take any arguments.

Implicit int Declarations

In C and in pre-Standard C++, the default type for missing declarations is int. For example

/* valid in C but not in C++ */
void  func()
{
  const k =0; /*int type assumed in C; invalid in C++*/
}

ISO C is currently being revised to disallow implicit int declarations.

Repeated Declarations of Global Variables

In C, global variables can be declared more than once without the extern specifier. As long as a single initialization (at most) for the same variable is used, the linker resolves all the repeated declarations into a single entity:

/* valid in C but not in C++ */
int flag;
int num;
int flag; /* repeated declaration of a global variable */
void func()
{
  flag = 1;
}

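In C++, an entity must be defined exactly once. A repeated definition in the same translation unit, as in the preceding example, is a compile-time error; repeated definitions of the same entity in separate translation units result in a link-time error.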

Implicit Casting of void Pointers

In C, a void pointer is implicitly cast to any other pointer type in assignments and initializations. For example

/* valid in C but not C++*/
#include <stdlib.h>
long * p_to_int()
{
  long * pl = malloc(sizeof(long)); /* implicit conversion of void* to long* */
  return pl;
}

In general, implicit conversion of void * is undesirable because it can result in bugs that can otherwise be detected by the compiler. Consider the following example:

/* valid in C but not C++*/
#include <stdlib.h>
long * p_to_int()
{
  long * pl = malloc(sizeof(short)); /* oops!  */
  return pl;
}

In C++, void pointers have to be cast explicitly to the desired type. The explicit cast makes the programmer's intention clearer and reduces the likelihood of unpleasant surprises.

The Underlying Representation of NULL Pointers

NULL is an implementation-defined null pointer constant. C implementations usually define NULL as follows:

 #define NULL  ((void*)0)  

However, in C++, NULL is usually defined as the literal 0 (or 0L), but never as void *:

const int NULL = 0; //some C++ implementations use this convention
#define NULL 0 //others might use this convention

The difference in the underlying representations of NULL pointers between C and C++ derives from the fact that C++ pointers are strongly typed, whereas in C they are not. If C++ retained C's convention, a C++ statement such as

char * p = NULL;

would be expanded into something similar to

char * p = (void*) 0;   // compile time error: incompatible pointer types

Because 0 is the universal initializer for all pointer types in C++, it is used instead of the traditional C convention; in fact, many programmers simply use the literal 0 or 0L instead of NULL.

Default Linkage Type of Global const Variables

In C, the default linkage of global const variables is extern. An uninitialized const variable is implicitly zero initialized. For example

/*** valid in C but not C++ ***/
/* file error_flag.h */
const int error; /*default extern linkage */
/*** end file ***/
#include "error_flag.h"
int func();
int main()
{
  int status = func();
  if( status == error)
  {
    /*do something */
  }
  return 0;
}

In C++, a global const variable that is not explicitly declared extern has internal linkage, as if it were declared static. In addition, a const variable must be initialized.

Null-Terminated Character Arrays

In C, character arrays can be initialized with a string literal without the null-terminating character. For example

/*** valid in C but not C++ ***/
const char message[5] =  "hello"; /* does not contain a null terminator */

In C++, character arrays that are initialized by a string literal must be large enough to contain a null terminator.

Assignment of Integers to an enum Type

In C, the assignment of integers to an enumerated type is valid. For example

/*** valid in C but not C++ ***/
enum Status {good, bad};
void func()
{
  enum Status stat = 1;  /* integer assignment */
}

In C++, enums are strongly typed. Only enumerators of the same enum type can be assigned to an enum variable. Explicit type casting is required otherwise. For example

//C++
enum Status {good, bad};
void func()
{
  Status stat = static_cast<Status> (1);  // stat = bad
}

Definition of Structs in a Function Parameter List and Return Type

In C, a struct can be defined in a function parameter list as well as in its return type. For example

/*** valid in C but not C++ ***/
/* struct definition in return type and parameter list of a function */
struct Stat { int code; char msg[10];} 
    logon (struct User { char username[8];  char pwd[8];} u );

In C++, this is illegal.

Bypassing an Initialization

A jump statement unconditionally transfers control. A jump statement is one of the following: a goto statement, a transfer from the condition of a switch statement to a case label, a break statement, a continue statement, or a return statement. In C, the initialization of a variable can be skipped by a jump statement, as in the following example:

/*** valid in C but not C++ ***/
int main()
{
  int n=1;
  switch(n)
  {
  case 0:
    int j=0;
    break;
  case 1: /* skip initialization of j */
    j++;  /* undefined */
    break;
  default:
    break;
  }
  return 0;
}

In C++, bypassing an initialization is illegal.

Quiet Differences Between C and C++

The differences that have been presented thus far are easily diagnosed by a C++ compiler. There are, however, semantic differences between C and C++ in the interpretation of certain constructs. These differences might not result in a compiler diagnostic; therefore, it is important to pay attention to them.

The Size of an enum Type

In C, the size of an enumeration equals sizeof(int). In C++, the underlying type for an enumeration is not necessarily an int; it can be smaller. Furthermore, if an enumerator's value is too large to be represented as an unsigned int, the implementation is allowed to use a larger unit. For example

enum { SIZE = 5000000000UL };

The Size of A Character Constant

In C, the result of applying the operator sizeof to a character constant -- for example, sizeof('c'); -- equals sizeof(int). In C++, on the other hand, the expression sizeof('c'); equals sizeof(char).

Predefined Macros

C and C++ compilers define the following macros:

__DATE__ /*a literal containing compilation date in the form "Apr 13 1998" */
__TIME__ /*a literal containing the compilation time in the form "10:01:07" */
__FILE__  /*a literal containing the name of the source file being compiled */
__LINE__ /* current line number in the source file */

C++ compilers exclusively define the following macro:

__cplusplus

Standard-compliant C compilers define the following macro symbol:

__STDC__

Whether a C++ compiler also defines the macro symbol __STDC__ is implementation-dependent.

Default Value Returned from main()

In C, when control reaches the end of main() without encountering a return statement, the effect is that of returning an undefined value to the environment. In C++, however, main() implicitly executes a

return 0;

statement in this case.


NOTE: You might have noticed that the code listings throughout the book contain an explicit return statement at the end of main(), even though this is not necessary. There are two reasons for this: First, many compilers that do not comply with the Standard issue a warning message when a return statement is omitted. Secondly, the explicit return statement is used to return a nonzero value in the event of an error.

Migrating From C to C++

Resolving the syntactic and semantic differences between C and C++ is the first step in migrating from C to C++. This process ensures that C code can compile under a C++ compiler, and that the program behaves as expected. There is another clear advantage of compiling C code under a C++ compiler: The tighter type checking that is applied by a C++ compiler can detect potential bugs that a C compiler does not detect. The list of discrepancies between C and C++ that was previously presented is mostly a result of loopholes and potential traps in C that were fixed in C++. An issue that is of concern, however, is performance -- does a C++ compiler produce object code that is less efficient than the code produced by a C compiler? This topic is discussed in more detail in Chapter 12, "Optimizing Your Code." However, it is important to note that a good C++ compiler can outperform a good C compiler because it can exercise optimization techniques that C compilers normally do not support, such as function inlining and the named return value (also discussed in Chapter 12).

Nonetheless, in order to benefit from the robustness of object-oriented programming, more substantial code modifications are required. Fortunately, the transition from procedural to object-oriented programming can be performed gradually. The following section demonstrates a technique of wrapping bare functions with an additional code layer to minimize the dependency on implementation details. Following that is a discussion of how to use full-fledged classes that wrap legacy code in order to gain more of the benefits of object-orientation.

Function Wrappers

Low-level code such as infrastructure routines and API functions can be used by different teams for the same project. Normally, this code is developed and maintained by a third party vendor or a specific team in the project. For example

int retrievePerson (int key, Person* recordToBefilled); /* C function */

A problem can arise when the interface of retrievePerson() changes: Every occurrence of a function call has to be tracked down and modified accordingly. Consider how such a small change can affect existing programs:

/*
 function modification: key is now a char * instead of an int
 every call to this function has to be modified accordingly
*/
int retrievePerson (const char * key, Person* recordToBefilled);

As you saw in Chapter 5, "Object-Oriented Programming and Design," one of the most noticeable weaknesses of procedural programming is its vulnerability to such changes; however, even in strict procedural programming you can localize their impact by using a wrapper function. A wrapper function calls the vulnerable function and returns its result. Following is an example:

/* A wrapper function */
int WrapRetrievePerson(int key, Person* recordToBefilled)
{
  return retrievePerson (key, recordToBefilled);
}

A wrapper provides a stable interface to a code fragment that is used extensively and that is vulnerable to changes. When using a wrapper function, a change in the interface of an API function is reflected only in the definition of its corresponding wrapper function. Other parts of the program are not affected by the change. This is very similar to the way in which a class's accessors and mutators provide indirect access to its nonpublic members. In the following example, the function wrapper's body has been modified due to the change in the type of key from int to char *. Note, however, that its interface remains intact:

/*** file DB_API.h ***/
typedef struct
{
  char first_name[20];
  char last_name[20];
  char address [50];
} Person;
/* Person has to be declared before it is used in a prototype */
int retrievePerson (const char *strkey, Person* precordToBefilled);
/*** file DB_API.h ***/
#include <stdio.h>
#include "DB_API.h"
int WrapRetrievePerson(int key, Person* precordToBefilled) /* remains intact */
{
  /* wrapper's implementation modified according to API's modification */
  char strkey[100];
  sprintf (strkey, "%d", key);  /* convert int to a string */
  return retrievePerson (strkey, precordToBefilled);
}

By systematically applying this technique to every function that is maintained by other teams or vendors, you can ensure a reasonable level of interface stability even when the underlying implementation changes.

Although the function wrapper technique offers protection from changes in implementation details, it does not provide other advantages of object-oriented programming, including encapsulation of related operations in a single class, constructors and destructors, and inheritance. The next phase in the migration process is to encapsulate a collection of related functions into a single wrapper class. This technique, however, requires familiarity with object-oriented concepts and principles.

Designing Legacy Code Wrapper Classes

In many frameworks that were originally written in C and then ported to C++, a common -- but wrong -- practice was to wrap C functions in a single wrapper class. Such a wrapper class provides as its interface a series of operations mapped directly to the legacy functions. The following networking functions provide an example:

/*** file: network.h ***/
#ifndef NETWORK_H
#define NETWORK_H
    /* functions related to UDP protocol */
int UDP_init();
int UDP_bind(int port);
int UDP_listen(int timeout);
int UDP_send(char * buffer);
    /* functions related to X.25 protocol */
int X25_create_virtual_line();
int X25_read_msg_from_queue(char * buffer);
    /* general utility functions */
int hton(unsigned int); //reverse bytes from host to network order
int ntoh(unsigned int); //reverse bytes from network to host order
#endif
/*** network.h ***/

A naive implementation of a class wrapper might simply embed all these functions in a single class as follows:

#include "network.h"
class Networking
{
private:
//...stuff
public:
  //constructor and destructor
  Networking();
  ~Networking();
  //members
  int UDP_init();
  int UDP_bind(int port);
  int UDP_listen(int timeout);
  int UDP_send(char * buffer);
  int X25_create_virtual_line();
  int X25_read_msg_from_queue(char * buffer);
  int hton(unsigned int); //reverse bytes from host to network order
  int ntoh(unsigned int); //reverse bytes from network to host order
};

However, this method of implementing a wrapper class is not recommended. X.25 and UDP protocols are used for different purposes and have almost nothing in common. Bundling the interfaces of these two protocols together can cause maintenance problems in the long term, and it undermines the very reason for moving to an object-oriented design in the first place. Furthermore, due to its amorphous interface, Networking is not an ideal base for other derived classes.

The problem with Networking and similar classes is that they do not genuinely embody an object-oriented policy. They are merely a collection of unrelated operations. A better design approach is to divide the legacy functions into meaningful, self-contained units and wrap each unit by a dedicated class. For example

#include "network.h"
class UDP_API
{
private:
//...stuff
public:
  //constructor and destructor
  UDP_API();
  ~UDP_API();
  //members
  int UDP_init();
  int UDP_bind(int port);
  int UDP_listen(int timeout);
  int UDP_send(char * buffer);
};
class X25_API
{
private:
//...stuff
public:
  //constructor and destructor
  X25_API();
  ~X25_API();
  //members
  int X25_create_virtual_line();
  int X25_read_msg_from_queue(char * buffer);
};
class Net_utility
{
private:
//...stuff
public:
  //constructor and destructor
  Net_utility();
  ~Net_utility();
  //members
  int hton(unsigned int); //reverse bytes from host to network order
  int ntoh(unsigned int); //reverse bytes from network to host order
};

Now each class offers a coherent interface. Another advantage is a simpler usage protocol; users of class X25_API, for instance, are not forced to accept the interface of UDP protocol, and vice versa.

Multilingual Environments


NOTE: In this section, the distinction between C code and C++ is indicated explicitly by file extensions. The .h extension is used for C header files, whereas C++ header files are indicated by the .hpp extension. Similarly, .c and .cpp extensions are used for C and C++ source files, respectively. In addition, only C-style comments are used in C files.

Thus far, this chapter has concentrated on a unidirectional migration process: from C to C++. Nevertheless, many systems are not confined to a single programming language. A typical information system can simultaneously use one programming language for the graphical interface, another language for accessing data from a database, and a third language for server applications. Often, these languages have to share data and code with one another. This section focuses on how to combine C and C++ code in a bilingual system that uses both these languages simultaneously.

The easiest way to ensure compatibility between code modules that are written in C and C++ is to adhere to the common denominator of these languages. Then again, using C++ as a procedural language ("better C") isn't worth the bother -- you can simply stick to C. Combining object-oriented C++ code with procedural C code into a seamless executable is more challenging -- but it offers many advantages.

C and C++ Linkage Conventions

By default, C++ functions have C++ linkage, which is incompatible with C linkage. Consequently, global C++ functions cannot be called from C code unless they are explicitly declared as having a C linkage.

Forcing C Linkage on A C++ Function

To override the default C++ linkage, a C++ function has to be declared extern "C". For example

// filename decl.hpp
extern "C" void f(int n); //force C linkage so that f() can be called from C
                          // code although it is compiled by a C++ compiler
 // decl.hpp

The extern "C" prefix instructs a C++ compiler to apply C linkage to the function f() rather than the default C++ linkage. This means that a C++ compiler does not apply name mangling to f(), either (see the following sidebar, "What's in Name Mangling?"). Consequently, a call to f() from C code is properly resolved by a C linker. A C++ linker can also locate the compiled version of f() even though it has a C linkage type. In other words, declaring C++ functions as extern "C" guarantees interoperability between C++ and C (as well as other procedural languages that use the C calling convention). However, forcing C linkage has a price: It is impossible to overload another version of f() that is also declared as extern "C". For example

// filename decl.hpp
extern "C" void f(int n);
extern "C" void f(float f); //error, second C linkage of f is illegal
// decl.hpp

Note that you can declare additional overloaded versions of f() as long as they are not declared extern "C":

// filename decl.hpp
extern "C" void f(int n); //OK, can be called from C and C++ code
void f(float f); //OK, no C linkage used. Can be called only from C++ code
void f(char c); //OK, no C linkage used. Can be called only from C++ code
// decl.hpp

How does it work? A call to the function from C code is translated to a CALL assembly directive, followed by the function name. Declaring a C++ function as extern "C" ensures that the name that is generated for it by a C++ compiler is identical to the name that a C compiler expects. On the other hand, if the called function is compiled by a C++ compiler without the extern "C" specifier, it has a mangled name but a C compiler still places the nonmangled name after the CALL directive, resulting in a link-time error.


What's in Name Mangling?
Name mangling (the more politically correct term, although rarely used, is name decoration) is a method used by a C++ compiler to generate unique names for identifiers in a program. The exact details of the algorithm are compiler-dependent, and they might vary from one version to another. Name mangling ensures that entities with seemingly identical names get unique identifications. The resultant mangled name contains all the information that might be needed by the linker, including linkage type, scope, calling convention, and so on. For instance, when a global function is overloaded, the generated mangled name for each overloaded version is unique. Name mangling is also applied to variables. Thus, a local variable and a global variable with the same user-given name still get distinct mangled names.
How is the mangled name synthesized? The compiler picks the user-given name of an identifier and decorates it with additional affixes to indicate a variable of a fundamental type, a class, or a function. For a function, the mangled name embeds its scope and linkage type, the namespace in which it is declared, the list of parameters, the parameters' passing mechanism, and the parameters' cv-qualifications. A mangled name of a member function incorporates additional information such as the class name, whether it is a const member function, and other implementation-dependent details that the linker and the runtime environment might need.
Following is an example: For a global function void func(int);, a given compiler can generate the corresponding mangled name __x_func@i@, where the affix x indicates a function, func is the function's user-given name, @ indicates the beginning of the parameter list, i indicates the type of the parameter, and the closing @ sign signals the end of the parameter list. An overloaded version of func() has a different mangled name because it has a different parameter list.
The original user-given name can be reproduced from the mangled name, so linkers in general can issue error messages in a human-readable format.

As was previously stated, the name mangling scheme of a given compiler can change from one version to another (for example, if the new version supports namespaces, whereas the previous one did not). This is one of the reasons you often have to recompile your code with every compiler upgrade. Another important implication is that, usually, the linker and the compiler need to come from the same vendor and have compatible versions. This ensures that they share the same naming conventions and that they produce compatible binary code.

Calling C++ Code from C Code

Up until now, you have observed the C++ side of the story. A C program cannot #include the header file decl.hpp because the extern "C" specifier is not recognized by a C compiler. To ensure that the declaration can be parsed by a C compiler, extern "C" needs to be visible to a C++ compiler -- but not to a C compiler. A C++ function with C linkage has to be declared in two distinct forms, one for C++ and another for C. This can be achieved by using separate C and C++ header files. The C header file looks similar to the following:

/*** filename decl.h ***/
void f(int n);  /* identical to the C++ header but no extern "C" here */
 /*** decl.h ***/

The header file can be #included in the C source file that calls the function f(). For example

/*** filename do_something.c ***/
#include "decl.h"
void do_something()
{
  f(5);
}
/*** do_something.c ***/

Keeping separate header files for C and C++ is not an elegant solution, however. The header files have to remain in sync all the time, and when many header files are used, this can turn into a serious maintenance problem. A better alternative is to use one or more C header files for the declarations. For example

/*** filename f.h ***/
void f(int n);  /* identical to the C++ header but no extern "C" here */
 /*** f.h ***/
/*** filename g.h ***/
void g(const char * pc, int n);   
 /*** g.h ***/

Next, the C header files are #included in a C++ header file that contains an extern "C" block:

// filename decl.hpp
extern "C"
{
#include "f.h"
#include "g.h"
}
// decl.hpp

The effect of an extern "C" block is as if every declaration in the #included header files had a preceding extern "C" specifier. Another alternative is to modify the C header file directly by adding an #ifdef directive to make the extern "C" declaration visible only to a C++ compiler. For example

/*** filename decl.h ***/
#ifdef __cplusplus
extern "C"  { /* visible only to a C++ compiler */
#endif
void g(const char * pc, int n);
void f(int n);
#ifdef __cplusplus
} /* visible only to a C++ compiler */
#endif
 /*** decl.h ***/

This way, only one header file is needed. However, it is not always possible to modify the C header files directly. In such cases, the preceding technique needs to be used. Please note that a C++ function called from C code is an ordinary C++ function. It can instantiate objects, invoke their member functions, or use any other C++ feature. However, some implementations might require special configuration settings to ensure that the linker has access to the C++ libraries and template codes.

Compiling main()

Functions can be compiled by either a C compiler or a C++ compiler. However, a C++ compiler should compile main(). This enables a C++ compiler to take care of templates, static initialization, and additional implementation-dependent operations for which main() is responsible. Compiling main() under a C compiler will most likely result in link-time errors due to the different semantics of main() in C and C++.

Minimize the Interface Between C and C++ Code

In general, you can call a C function from C++ code without special adjustments. The opposite, as you have seen, is also possible -- but it requires additional adjustments. It is therefore recommended that you keep the interface between the two languages at a minimum. Declaring every C++ function as extern "C", for example, is not recommended. Not only does this convention imply additional modifications to the header files, it also disables overloading. Remember also that you cannot declare a member function extern "C". For C++ functions that have to be called from C code, it might be advantageous to use a function wrapper that has an extern "C" specifier. In this case, the wrapped C++ functions can have the C++ linkage. For example

void g(const char * pc, int n);  //extern "C" is unnecessary
void f(int n);
extern "C" void f_Wrapper(int n) //only the wrapper function is called from C
{
  f(n);
}
extern "C" void g_Wrapper(const char *pc,  int n)
{
  g(pc, n);
}

Mixing <iostream> Classes with <stdio.h> Functions

It is possible to use both <iostream> classes and <stdio.h> library functions in the same program, as long as they do not access the same file. For example, you can use the <iostream> object cin to read data from the keyboard, and then use <stdio.h> functions to write the data to a disk file, as in the following program:

#include <iostream>
#include <cstdio>
using namespace std;
int main()
{
  int num;
  cin>>num;
  cout<<"you entered: "<< num <<endl;
  FILE *fout = fopen("data.dat", "w");
  if (fout) //write num to a disk file
  {
    fprintf(fout, "%d\n", num);
    fclose(fout); //close the file only if it was opened successfully
  }
  return 0;
}

It is even possible to use <iostream> and <stdio.h> to manipulate the same file; for instance, a program can send output to both stdout and cout, although this is not recommended. To enable simultaneous access to the same file, you first have to call ios::sync_with_stdio(true); to synchronize the I/O operations. Note, however, that this synchronization degrades performance. Therefore, only use it when <iostream> and <stdio.h> access the same file. For example

#include <iostream>
#include <cstdio>
using namespace std;
int main()
{
  ios::sync_with_stdio(true);//enable mixed I/O
  int num;
  printf("please enter a number\n");
  cin>>num;
  cout<<"you entered: "<< num << ". please enter another one" << endl;
  scanf("%d", &num);
  return 0;
}

Normally, you won't write such code. However, when a large application combines legacy C functions that use <stdio.h> and C++ objects that use <iostream>, I/O synchronization is unavoidable because, ultimately, the same low-level system resources are used by both <stdio.h> and <iostream>.

The fact that <iostream> and <stdio.h> can be combined is a major advantage. Otherwise, the migration process from C to C++ might be much fussier, and making C and C++ code work together might prove to be very difficult.

Accessing a C++ Object in C Code

Can C code, which of course is unaware of object semantics, access the data members of a C++ object directly? The short answer is, "Yes, but". There are some guarantees about the underlying memory layout of an object; C code can take advantage of these guarantees and treat a C++ object as an ordinary data struct, provided that all the following restrictions apply to the class of the object in question:

The Underlying Representation of an Object in Memory

Examine these restrictions in more detail, given the following declaration of the class Date:

class Date
{
public:
  int day;
  int month;
  int year;
  //constructor and destructor
  Date(); //current date
  ~Date();
  //a non-virtual member function
  bool isLeap() const;
  bool operator == (const Date& other);
};

The Standard guarantees that within every instance of class Date, data members are set down in the order of their declarations (static data members are stored outside the object and are therefore ignored). There is no requirement that members be set down in contiguous memory regions; the compiler can insert additional padding bytes (more on this in Chapter 11, "Memory Management") between data members to ensure proper alignment. However, this is also the practice in C, so you can safely assume that a Date object has a memory layout that is identical to that of the following C struct:

/*** filename POD_Date.h***/
struct POD_Date
/* the following struct has memory layout that is identical
to a Date object */
{
  int day;
  int month;
  int year;
};
/*** POD_Date.h***/

Consequently, a Date object can be passed to C code and treated as if it were an instance of POD_Date. That the memory layout in C and C++ is identical in this case might seem surprising; class Date defines member functions in addition to data members, yet there is no trace of these member functions in the object's memory layout. Where are these member functions stored? C++ treats nonstatic member functions as static functions. In other words, member functions are ordinary functions. They are no different from global functions, except that they take an implicit this argument, which ensures that they are called on an object and that they can access its data members. An invocation of a member function is transformed to a function call, whereby the compiler inserts an additional argument that holds the address of the object. Consider the following example:

void func()
{
  Date d;
  bool leap = d.isLeap(); //1
}

The invocation of the member function isLeap() in (1) is transformed by a C++ compiler into something such as

_x_isLeap?Date@KPK_Date@(&d); //pseudo C++ code

What was that again? Parse it carefully. The parentheses contain the this argument, which is inserted by the compiler in every nonstatic member function call. As you already know, function names are mangled. _x_isLeap?Date@KPK_Date@ is a hypothetical mangled name of the member function bool Date::isLeap() const;. In the hypothetical C++ compiler, every mangled name begins with an underscore to minimize the potential for conflicts with user-given names. Next, the x indicates a function, as opposed to a data variable. isLeap is the user-given name of the function. The ? is a delimiter that precedes the name of the class. The @ that follows the class name indicates the parameter list, which begins with a KPK and Date to indicate a const pointer to a const Date (the this argument of a const member function is a const pointer to a const object). Finally, a closing @ indicates the end of the parameter list. _x_isLeap?Date@KPK_Date@ is, therefore, the underlying name of the member function bool Date::isLeap() const;. Other compilers are likely to use different name mangling schemes, but the details are quite similar to the example presented here. You must be thinking: "This is very similar to the way procedural programming manipulates data." It is. The crucial difference is that the compiler, rather than the human programmer, takes care of these low-level details.
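The transformation can be imitated by hand. The following sketch uses a simplified stand-in for class Date (its constructor takes explicit values so that the result is predictable) and a hypothetical free function, Date_isLeap(), that receives the object's address explicitly. The names are illustrative only; a real compiler generates its own mangled name.

```cpp
// Simplified stand-in for the chapter's Date class.
class Date
{
public:
    int day;
    int month;
    int year;
    Date(int d, int m, int y) : day(d), month(m), year(y) {}
    bool isLeap() const
    {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }
};

// Hand-written equivalent of the function a compiler might generate for
// Date::isLeap(): an ordinary function with an explicit "this" argument,
// declared as a const pointer to a const Date.
bool Date_isLeap(const Date* const self)
{
    return (self->year % 4 == 0 && self->year % 100 != 0)
        || self->year % 400 == 0;
}

// Calls the member function and its free-function counterpart on the
// same object and reports whether the two agree.
bool same_result(int year)
{
    Date d(1, 1, year);
    return d.isLeap() == Date_isLeap(&d);
}
```

The call d.isLeap() and the call Date_isLeap(&d) do the same work; the only difference is who supplies the address of d.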

The C++ Object Model is Efficient

The object model of C++ is the underlying mechanism that supports object-oriented concepts such as constructors and destructors, encapsulation, inheritance, and polymorphism. The underlying representation of class member functions has several advantages. It is very efficient in terms of execution speed and memory usage because an object does not store pointers to its member functions. In addition, the invocation of a nonvirtual member function does not involve additional lookup and pointer dereferencing. A third advantage is backward compatibility with C; an object of type Date can be passed to C code safely because the binary representation of such an object complies with the binary representation of a corresponding C struct. Other object-oriented languages use a radically different object model, which might not be compatible with either C or C++. Most of them use reference semantics. In a reference-based object model, an object is represented as a reference (a pointer or a handle) that refers to a memory block in which data members and pointers to functions are stored. There are some advantages to reference semantics; for example, reference counting and garbage collection are easier to implement in such languages, and indeed such languages usually provide automatic reference counting and garbage collection. However, garbage collection also incurs additional runtime overhead, and a reference-based model breaks down backward compatibility with C. The C++ object model, on the other hand, enables C++ compilers to be written in C, and (as you read in Chapter 6, "Exception Handling,") early C++ compilers were essentially C++-to-C translators.
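The claim that member functions leave no trace in the object can be checked at compile time. The sketch below assumes a C++11 compiler (static_assert is not available in C++98) and a simplified Date whose constructor takes explicit values. The function pod_date_year() is a hypothetical stand-in for code that would normally be compiled as C.

```cpp
#include <cstddef>   // offsetof
#include <cstring>   // std::memcpy

// Simplified stand-in for the chapter's Date class.
class Date
{
public:
    int day;
    int month;
    int year;
    Date(int d, int m, int y) : day(d), month(m), year(y) {}
    ~Date() {}
    bool isLeap() const
    {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }
};

// The C view of the same data, as in POD_Date.h.
struct POD_Date
{
    int day;
    int month;
    int year;
};

// Member functions add nothing to the object: sizes and offsets agree.
static_assert(sizeof(Date) == sizeof(POD_Date), "size differs");
static_assert(offsetof(Date, day) == offsetof(POD_Date, day), "day moved");
static_assert(offsetof(Date, month) == offsetof(POD_Date, month), "month moved");
static_assert(offsetof(Date, year) == offsetof(POD_Date, year), "year moved");

// Stands in for a C function: it knows only the plain struct.
extern "C" int pod_date_year(const POD_Date* p)
{
    return p->year;
}

// Copies the object's bytes into a POD_Date and hands that to the C side.
int year_seen_by_c(const Date& d)
{
    POD_Date raw;
    std::memcpy(&raw, &d, sizeof raw);
    return pod_date_year(&raw);
}
```

Copying the bytes with memcpy() is a deliberately cautious choice for this sketch; it avoids accessing a Date through a pointer to an unrelated struct type while still relying on the identical layout.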

Memory Layout of Derived Objects

The Standard does not specify the memory layout of base class subobjects in a derived class. In practice, however, all C++ compilers use the same convention: The base class subobject appears first (in left-to-right order in the event of multiple inheritance), and data members of the derived class follow. C code can access derived objects, as long as the derived class abides by the same restrictions that were specified previously. For example, consider a nonpolymorphic class that inherits from Date and has additional data members:

class DateTime: public Date
{
public: //additional members
  long time;
  bool PM; //display time in AM or PM?
  DateTime();
  ~DateTime();
  long getTime() const;
};

The two additional data members of DateTime are appended after the three members of the base class Date, so the memory layout of a DateTime object is equivalent to the following C struct:

/*** filename POD_Date.h***/
struct POD_DateTime
{
  int day;
  int month;
  int year;
  long time;
  bool PM;
};
/*** POD_Date.h***/

In a similar vein, the nonpolymorphic member functions of DateTime have no effect on the size or memory layout of the object.
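Because the layout of a derived object rests on convention rather than on a guarantee, it can be worth verifying on the compiler at hand. The following sketch assumes C++11 (for <type_traits>) and simplified constructors; offset_in() and layouts_match() are hypothetical helpers. Under the C++11 rules, DateTime is not a standard-layout class, because both it and its base declare data members, so the offsets are measured at runtime instead of with offsetof.

```cpp
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout (C++11)

class Date
{
public:
    int day;
    int month;
    int year;
    Date() : day(1), month(1), year(1970) {}
    ~Date() {}
};

class DateTime : public Date
{
public:
    long time;
    bool PM;
    DateTime() : time(0), PM(false) {}
    ~DateTime() {}
    long getTime() const { return time; }
};

// The presumed C equivalent.
struct POD_DateTime
{
    int day;
    int month;
    int year;
    long time;
    bool PM;
};

// Byte offset of member m within obj, measured at runtime.
template <typename Member>
std::size_t offset_in(const DateTime& obj, const Member& m)
{
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&m) -
        reinterpret_cast<const char*>(&obj));
}

// True if this compiler lays out DateTime exactly like POD_DateTime.
bool layouts_match()
{
    DateTime dt;
    return sizeof(DateTime) == sizeof(POD_DateTime)
        && offset_in(dt, dt.day)   == offsetof(POD_DateTime, day)
        && offset_in(dt, dt.month) == offsetof(POD_DateTime, month)
        && offset_in(dt, dt.year)  == offsetof(POD_DateTime, year)
        && offset_in(dt, dt.time)  == offsetof(POD_DateTime, time)
        && offset_in(dt, dt.PM)    == offsetof(POD_DateTime, PM);
}
```

A result of true from layouts_match() confirms the convention for one compiler and one set of options only; it is an observation, not a promise of the language.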

The compatible memory layout of nonpolymorphic C++ objects and C structs has many useful applications. For example, it enables relational databases to retrieve and insert objects into a database table. Data Manipulation Languages, such as SQL, that do not support object semantics, can still treat a "live" object as a raw chunk of memory. In fact, several commercial databases rely on this compatibility to provide an object-oriented interface with an underlying relational data model. Another application is the capability to transmit objects as a stream of bytes from one machine to another.

Support for Virtual Member Functions

What happens when an object becomes polymorphic? In this case, backward compatibility with C is trickier. As was noted previously, the compiler is allowed to insert additional data members to a class in addition to user-declared data members. These members can be padding bytes that ensure proper alignment. In the case of virtual functions, an additional member is inserted into the class: a pointer to the virtual table, or _vptr. The _vptr holds the address of a static table of function pointers (as well as the runtime type information of a polymorphic class; see Chapter 7, "Runtime Type Identification"). The exact position of the _vptr is implementation-dependent. Traditionally, it was placed after the class's user-declared data members. However, some compilers have moved it to the beginning of the class for performance reasons. Theoretically, the _vptr can be located anywhere inside the class -- even among user-declared members.
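The presence of the extra member can be observed indirectly. The sketch below, which assumes C++11 for <type_traits>, compares a plain class with a polymorphic one that has the same user-declared data members. The class PlainDate and the helper vptr_adds_size() are hypothetical names; the size difference is typical of current compilers rather than something the Standard prescribes.

```cpp
#include <type_traits>  // std::is_polymorphic, std::is_standard_layout

// Three ints and a nonvirtual member function.
class PlainDate
{
public:
    int day;
    int month;
    int year;
    bool isLeap() const
    {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }
};

// The same data members, but with virtual member functions.
class PolyDate
{
public:
    int day;
    int month;
    int year;
    virtual ~PolyDate() {}
    virtual bool isLeap() const
    {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }
};

// True if the polymorphic class occupies more storage than the plain
// one, which is what a compiler-inserted _vptr would cause.
bool vptr_adds_size()
{
    return sizeof(PolyDate) > sizeof(PlainDate);
}
```

The type traits report the property that matters for C compatibility: PolyDate is polymorphic and therefore is not a standard-layout class, so no C struct is guaranteed to match it.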

A virtual member function, like a nonvirtual member function, is an ordinary function. When a derived class overrides it, however, multiple distinct versions of the function exist. It is not always possible to determine at compile time which of these functions needs to be invoked. For example

#include <iostream>
using namespace std;
class PolyDate
{
public:
  //PolyDate has the same members as Date but it's polymorphic
  virtual void name() const { cout<<"PolyDate"<<endl;}
};
class PolyDateTime: public PolyDate
{
public:
  // the same members as DateTime but it's polymorphic
  void name() const { cout<<"PolyDateTime"<<endl;} //override PolyDate::name()
};

When these classes are compiled, the hypothetical compiler generates two underlying functions that correspond to PolyDate::name() and PolyDateTime::name():

      // mangled name of void PolyDate::name() const
_x_name?PolyDate@KPK_PolyDate@
      // mangled name of void PolyDateTime::name() const;
_x_name?PolyDateTime@KPK_PolyDateTime@

So far, there's nothing unusual about this. You already know that a member function is an ordinary function that takes an implicit this argument. Because you have defined two versions of the same virtual function, you also expect to find two corresponding functions, each of which has a distinct mangled name. However, unlike nonvirtual functions, the compiler cannot always transform an invocation of a virtual member function into a direct function call. For example

void func(const PolyDate* pd)
{
  pd->name();
}

func() can be located in a separate source file, which might have been compiled before class PolyDateTime was defined. Therefore, the invocation of the virtual function name() has to be deferred until runtime. The compiler transforms the function call into something such as

(* pd->_vptr[2]) (pd);

Analyze it; the member _vptr points to the internally-generated virtual table. The first member of the virtual table is usually saved for the address of the destructor, and the second might store the address of the class's type_info. Any other user-defined virtual member functions are located in higher positions. In this example, the address of name() is stored at the third position in the virtual table (in practice, the name of the _vptr is also mangled). Thus, the expression pd->_vptr[2] returns the address of the function name() associated with the current object. pd, in the second occurrence, represents the this argument.
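The dispatch mechanism can be emulated with ordinary structs and function pointers. The sketch below is a hand-written imitation, not what any particular compiler emits: every name in it (EmuDate, EmuVtbl, call_name(), and so on) is hypothetical, and the table has a single slot at index 0 rather than the third position used in the text.

```cpp
#include <string>

struct EmuDate;                                 // forward declaration
typedef const char* (*NameFn)(const EmuDate*);  // slot type: receives "this"

// Hand-written "virtual table": an array of function pointers.
struct EmuVtbl
{
    NameFn slots[1];
};

// Base "class": the table pointer plays the role of the _vptr.
struct EmuDate
{
    const EmuVtbl* vptr;
    int day;
    int month;
    int year;
};

// Derived "class": the base subobject comes first.
struct EmuDateTime
{
    EmuDate base;
    long time;
};

// The two versions of name(), each taking an explicit "this".
const char* EmuDate_name(const EmuDate*)     { return "PolyDate"; }
const char* EmuDateTime_name(const EmuDate*) { return "PolyDateTime"; }

// One static table per "class".
const EmuVtbl date_vtbl     = { { EmuDate_name } };
const EmuVtbl datetime_vtbl = { { EmuDateTime_name } };

// Equivalent of pd->name(): index the table, pass the object as "this".
const char* call_name(const EmuDate* pd)
{
    return (*pd->vptr->slots[0])(pd);
}

// Builds one object of each kind and dispatches through a base pointer.
std::string name_via_table(bool use_derived)
{
    EmuDate d = { &date_vtbl, 1, 1, 2000 };
    EmuDateTime dt = { { &datetime_vtbl, 1, 1, 2000 }, 0L };
    return use_derived ? call_name(&dt.base) : call_name(&d);
}
```

call_name() never learns which kind of object it received; the table pointer stored in the object selects the function at runtime, which is the essence of the transformed call shown above.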

Clearly, defining a corresponding C struct is more precarious in this case and requires intimate acquaintance with the compiler's preferred position of the _vptr as well as with its size. There is another hazard here: The value of the _vptr is transient, which means that it might have a different value, according to the address space of the process that executes the program. Consequently, when an entire polymorphic object is stored in a file and retrieved later, the retrieved data cannot be used as a valid object. For all these reasons, accessing polymorphic objects from C code is dangerous and should generally be avoided.

Virtual Inheritance

C code cannot safely access objects that have a virtual base class either. The reason is that a virtual base is usually represented in the form of a pointer to a shared instance of the virtual subobject. Here again, the position of this pointer among user-defined data members is implementation-dependent. Likewise, the pointer holds a transient value, which can change from one execution of the program to another.
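The shared instance can be observed from within C++. In the following sketch, which assumes C++11 for static_assert and <type_traits>, the class names Base, Left, Right, and Both are hypothetical; they form the usual diamond in which both intermediate classes inherit virtually from Base.

```cpp
#include <type_traits>  // std::is_standard_layout

struct Base
{
    int id;
    Base() : id(0) {}
};

struct Left : virtual Base
{
    int l;
    Left() : l(1) {}
};

struct Right : virtual Base
{
    int r;
    Right() : r(2) {}
};

// Both contains exactly one Base subobject, shared by Left and Right.
struct Both : Left, Right
{
};

// A class with a virtual base is never a standard-layout class.
static_assert(!std::is_standard_layout<Both>::value,
              "virtual bases rule out a C-compatible layout");

// True if the Base reached through Left is the very same subobject as
// the Base reached through Right.
bool base_is_shared()
{
    Both b;
    Left& as_left = b;
    Right& as_right = b;
    const Base* via_left = &as_left;    // conversion to the virtual base
    const Base* via_right = &as_right;
    return via_left == via_right;
}
```

Each conversion to Base has to locate the shared subobject at runtime, which is why the object carries additional bookkeeping that a C struct cannot describe portably.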

Different Access Specifiers

The fourth restriction on the legality of accessing C++ objects from C code states that all the data members of the class are declared without an intervening access specifier. This means, theoretically, that the memory layout of a class that looks similar to the following

class AnotherDate
{
private:
  int day;
private:
  int month;
private:
  int year;
public:
  //constructor and destructor
  AnotherDate(); //current date
  ~AnotherDate();
  //a non-virtual member function
  bool isLeap() const;
  bool operator == (const AnotherDate& other);
};

might differ from a class that has the same data members declared in the same order, albeit without any intervening access specifiers. In other words, for class AnotherDate, an implementation is allowed to place the member month before the member day, year before month, or whatever. Of course, this nullifies any compatibility with C code. However, in practice, all current C++ compilers ignore the access specifiers and store the data members in the order of declaration. So C code that accesses a class object that has multiple access specifiers might work -- but there is no guarantee that the compatibility will remain in the future.
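Later revisions of the language offer a way to ask the compiler about this property. The sketch below goes beyond the Standard described in this book: it assumes C++11, in which the ordering guarantee covers members that share the same access control, and in which std::is_standard_layout reports whether a class meets the layout requirements. The class names SameAccess and MixedAccess are hypothetical.

```cpp
#include <type_traits>  // std::is_standard_layout (C++11)

// All data members are private, although the specifier is repeated.
class SameAccess
{
private:
    int day;
private:
    int month;
private:
    int year;
public:
    SameAccess() : day(1), month(1), year(1970) {}
    int sum() const { return day + month + year; }
};

// Data members with different access control.
class MixedAccess
{
public:
    int day;
private:
    int month;
public:
    int year;
    MixedAccess() : day(1), month(1), year(1970) {}
    int sum() const { return day + month + year; }
};

// Under the C++11 rules, repeating the same specifier is harmless,
// whereas mixing specifiers forfeits the standard-layout property.
static_assert(std::is_standard_layout<SameAccess>::value,
              "same access control keeps a standard layout");
static_assert(!std::is_standard_layout<MixedAccess>::value,
              "mixed access control is not standard-layout");
```

A static_assert of this kind, placed next to a class that is shared with C code, turns a silent layout assumption into a compile-time check.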

Conclusions

The creators of C++ have attempted to preserve, as closely as possible, backward compatibility with C. Indeed, almost without exception, every C program is also a valid C++ program. Still, there are some subtle differences between the seemingly common denominator of the two languages. Most of them, as you might have noted, derive from the improved type-safety of C++ -- for example, the obligatory declaration of a function prior to its usage, the need to use explicit cast of void pointers to the target pointer, the deprecation of implicit int declarations, and the enforcement of a null terminator in a string literal. Other discrepancies between the two languages derive from the different rules of type definition.

C code can be called directly from C++ code. Calling C++ code from C is also possible under certain conditions, but it requires additional adjustments regarding the linkage type and it is confined to global functions exclusively. C++ objects can be accessed from C code, as you have seen, but here again, there are stringent constraints to which you must adhere.



© Copyright 1999, Macmillan Computer Publishing. All rights reserved.