ANSI/ISO C++ Professional Programmer's Handbook


5

Object-Oriented Programming and Design

by Danny Kalev

Introduction

C++ is the most widely used object-oriented programming language today. The success of C++ has been a prominent factor in making object-oriented design and programming a de facto standard in today's software industry. Yet, unlike other object-oriented programming languages (some of them have been around for nearly 30 years), C++ does not enforce object-oriented programming -- it can be used as a "better C", as an object-based language, or as a generic programming language. This flexibility, which is unparalleled among programming languages, makes C++ a suitable programming language in any domain area -- real time, embedded systems, data processing, numerical computation, graphics, artificial intelligence, or system programming.

This chapter begins with a brief survey of the various programming styles that are supported by C++. Next, you will focus on various aspects of object-oriented design and programming.

Programming Paradigms

A programming paradigm defines the methodology of designing and implementing software, including the building blocks of the language, the interaction between data structures and the operations applied to them, program structure, and how problems are generally analyzed and solved. A programming language provides the linguistic means (keywords, preprocessor directives, program structure) as well as the extra-linguistic capabilities, namely standard libraries and programming environment, to support a specific programming paradigm. Usually, a given programming language is targeted for a specific application domain, for example, string manipulation, mathematical applications, simulations, Web programming and so on. C++, however, is not confined to any specific application domain. Rather, it supports many useful programming paradigms. The following sections discuss some of the most prominent programming paradigms that C++ supports.

Procedural Programming

C++ is, with a few minor exceptions, a superset of ISO C. As such, it can be used as a procedural programming language, albeit with tighter type checking and several enhancements that improve design and coding: reference variables, inline functions, default arguments, and the bool type. Procedural programming is based on a separation between functions and the data that they manipulate. In general, functions rely on the physical representation of the data types that they manipulate. This dependency is one of the most problematic aspects in the maintenance and extensibility of procedural software.

Procedural Programming Is Susceptible To Design Changes

Whenever the definition of a type changes (as a result of porting the software to a different platform, changes in the customer's requirement, and so on), the functions that refer to that type have to be modified accordingly. The opposite is also true: When a function is being changed, its arguments might be affected as well; for instance, instead of passing a struct by value, it might be passed by address to optimize performance. Consider the following:

struct Date //pack data in a compact struct
{
  char day;
  char month;
  short year;
};
bool isDateValid(Date d); //pass by value
void getCurrentDate(Date * pdate); //changes its argument, address needed
void initializeDate (Date* pdate); //changes its argument, address needed

Data structures, such as Date, and the group of associated functions that initialize, read, and test it are very common in software projects in which C is the predominant programming language. Now suppose that due to a change in the design, Date is required to also hold the current time stamp in seconds. Consequently, a change in the definition of Date is made:

struct Date
{
  char day;
  char month;
  short year;
  long seconds;
}; //now less compact than before

All the functions that manipulate Date have to be modified to cope with the change. An additional change in the design adds one more field to store millionths of a second in order to make a unique timestamp for database transactions. The modified Date is now:

struct Date
{
  char day;
  char month;
  short year;
  long seconds;
  long millionths;
};

Once more, all the functions that manipulate Date have to be modified to cope with the change. This time, even the interface of the functions changes because Date now occupies at least 12 bytes. Functions that are passed a Date by value are modified to accept a pointer to Date.

bool isDateValid(Date* pd); // pass by address for efficiency

Drawbacks of Procedural Programming

This example is not fictitious. Such frequent design changes occur in almost every software project. The budget and time overhead that are produced as a result can be overwhelming; indeed, they sometimes lead to the project's discontinuation. The attempt to avoid -- or at least to minimize -- these overheads has led to the emergence of new programming paradigms.

Procedural programming enables only a limited form of code reuse, that is, by calling a function or using a common user-defined data structure. Nonetheless, the tight coupling between a data structure and the functions that manipulate it considerably narrows their reusability potential. A function that computes the square root of a double cannot be applied to a user-defined struct that represents a complex, for example. In general, procedural programming languages rely on static type checking, which ensures better performance than dynamic type checking -- but it also compromises the software's extensibility.

Procedural programming languages provide a closed set of built-in data types that cannot be extended. User-defined types are either unsupported or they are "second class citizens" in the language. The user cannot redefine built-in operators to support them. Furthermore, the lack of abstraction and information hiding mechanisms forces users to expose the implementation details. Consider the standard C functions atof(), atoi(), and atol(), which convert a C-string to double, int, and long, respectively. Not only do they force the user to pay attention to the physical data type of the return value (even on platforms where an int and a long have the same size), they also prohibit the use of other data types.

Why Procedural Programming Still Matters

In spite of its noticeable drawbacks, procedural programming is still the preferred programming paradigm in some specific application domains, such as embedded and time critical systems. Procedural programming is also widely used in machine generated code because code reuse, extensibility, and maintenance are immaterial in this case. Many SQL interpreters, for example, translate the high-level SQL statements into C code that is then compiled.

Procedural programming languages -- such as C, Pascal, or Fortran -- produce the most efficient machine code among high-level programming languages. In fact, development teams that are reluctant to adopt object orientation often point to performance degradation as the major deterring factor.

The evolution of C++ is unique among programming languages. The job of its creators might have been a lot easier had they chosen to design it from scratch, without guaranteeing backward compatibility with C. Yet this backward compatibility is one of its strengths: It enables organizations and programmers to benefit from C++ without having to trash hundreds of millions of lines of working C code. Furthermore, C programmers can easily become productive in C++ even before they have fully mastered object-oriented programming.

Object-Based Programming

The limitations of procedural programming have led researchers and developers alike to find better methods of separating implementation details from interfaces. Object-based programming enables them to create user-defined types that behave like first class citizens. User-defined types can bundle data and meaningful operations in a single entity -- a class. Classes also support information hiding, thereby separating implementation details such as physical representation and underlying bookkeeping from the set of services that a class provides, or its interface. Users of a class are allowed to access its interface, but they cannot access its implementation details. The separation between the implementation -- which might vary rather frequently due to design changes, portability, and efficiency -- and the stable interface is substantial. This separation ensures that changes in the design are localized to a single entity -- the class implementation; the class users, on the other hand, are not affected. To assess the importance of object-based programming, examine a simple minded Date class:

class Date
{
private:
  char day;
  char month;
  short year;
public:
  bool isValid();
  Date getCurrent();
  void initialize();
};

Object-Based Programming Localizes Changes In Implementation Details

Now suppose that you have to change the definition of Date to support time:

class Date
{
private:
  char day;
  char month;
  short year;
  long secs;
public:
  bool isValid();
  Date getCurrent();
  void initialize ();
};

The addition of a new data member does not affect the interface of Date. The users of Date don't even know that a new field has been added; they continue to receive the same services from the class as before. Of course, the implementer of Date has to modify the code of the member functions to reflect the change. Therefore, Date::initialize() has to initialize one more field. Still, the change is localized only to the definition of Date::initialize() because users cannot access the underlying representation of Date. In procedural programming, however, users can access the data members of Date directly.
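The following is a minimal sketch of this localization. The member function bodies and the clientCode() function are assumptions made for illustration (the original only declares the members, and getCurrent() is omitted here for brevity). The point is that clientCode() uses only the public interface, so it compiles unchanged whether or not secs exists.

```cpp
#include <cassert>

// Hypothetical implementation of the extended Date class; the function
// bodies are illustrative assumptions, not the original author's code.
class Date
{
private:
  char day;
  char month;
  short year;
  long secs; // the newly added field, invisible to users of the class
public:
  // Only this definition had to change when secs was added.
  void initialize() { day = 1; month = 1; year = 2000; secs = 0; }
  bool isValid() const
  {
    return day >= 1 && day <= 31 && month >= 1 && month <= 12 && secs >= 0;
  }
};

// Client code written before secs was introduced. It relies solely on
// the public interface, so it needs no modification.
void clientCode()
{
  Date d;
  d.initialize();
  assert(d.isValid());
}
```

In the procedural version, by contrast, every function that touches the struct's fields directly would have to be revisited.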

Abstract Data Types

Classes such as Date are sometimes called concrete types, or abstract data types (not to be confused with abstract classes; see the sidebar titled "Abstract Data Types Versus Abstract Classes" later in this chapter).

These classes can meet a vast variety of needs in clean and easy-to-maintain capsules that separate implementation from interface. C++ provides the necessary mechanisms for data abstraction in the form of classes, which bundle data with a full set of associated operations. Information hiding is achieved by means of the private access specifier, which restricts the access to data members to class members exclusively.

Operator Overloading

In object-based languages, the user can extend the definition of a built-in operator to support a user-defined type (operator overloading is discussed in Chapter 3, "Operator Overloading"). This feature provides a higher level of abstraction by giving user-defined types the status of built-in types. For example:

class Date
{
private:
  char day;
  char month;
  short year;
  long secs;
public:
  bool operator < (const Date& other);
  bool operator == (const Date& other);
  //...other member functions
};
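As a sketch of how such operators might be defined and used, consider the version below. The constructor, the const qualification of the operators, and the field-by-field comparison order are assumptions added for illustration; the declaration above does not specify them.

```cpp
#include <cassert>

// Illustrative sketch: the constructor and the comparison logic are
// assumptions; the original shows only the operator declarations.
class Date
{
private:
  char day;
  char month;
  short year;
  long secs;
public:
  Date(int d, int m, int y, long s = 0)
    : day(static_cast<char>(d)), month(static_cast<char>(m)),
      year(static_cast<short>(y)), secs(s) {}

  bool operator==(const Date& other) const
  {
    return year == other.year && month == other.month &&
           day == other.day && secs == other.secs;
  }

  // Compare from the most significant field (year) down to the least.
  bool operator<(const Date& other) const
  {
    if (year != other.year) return year < other.year;
    if (month != other.month) return month < other.month;
    if (day != other.day) return day < other.day;
    return secs < other.secs;
  }
};

void compareDates()
{
  Date earlier(1, 9, 1998);
  Date later(15, 6, 1999);
  assert(earlier < later);      // reads like a comparison of built-in types
  assert(!(later < earlier));
  assert(earlier == Date(1, 9, 1998));
}
```

Declaring the operators const lets them be applied to const Date objects as well, which is usually what users of a value-like type expect.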

Characteristics of Object-Based Programming

In a way, object-based programming can be thought of as a subset of object-oriented programming; that is, some common principles are adhered to in both paradigms. Unlike object-oriented programming, however, object-based programming does not use inheritance. Rather, each user-defined class is a self-contained entity that is neither derived from a more general type, nor does it serve as a base for other types. The lack of inheritance in this paradigm is not coincidental. Proponents of object-based programming claim that inheritance complicates design, and that it might propagate bugs and deficiencies in a base class to its subclasses. Furthermore, inheritance also implies polymorphism, which is a source for additional design complexities. For instance, a function that takes a base object as an argument also knows how to handle any object that is publicly derived from that base.

Advantages of Object-Based Programming

Object-based programming overcomes most of the shortcomings of procedural programming. It localizes changes, it decouples implementation details from the interface, and it supports user-defined types. The Standard Library provides a rich set of abstract data types, including string, complex, and vector. These classes are designed to provide an abstraction for very specific uses, for example, character manipulations and complex numbers arithmetic. They are not derived from a more general base class, and they are not meant to be used as a base for other classes.


Abstract Data Types Versus Abstract Classes
The terms abstract data type and abstract class refer to two entirely different concepts, although both of them use the word abstract due to a historical accident. An abstract data type (also called a concrete type) is a self-contained, user-defined type that bundles data with a set of related operations. It behaves in the same way as does a built-in type. However, it is not extensible nor does it exhibit dynamic polymorphism. In contrast, an abstract class is anything but an abstract data type. It is not a data type (normally, abstract classes do not contain any data members), nor can you instantiate an object thereof. An abstract class is merely a skeletal interface, that specifies a set of services or operations that other (nonabstract) classes implement. Unfortunately, the distinction between the two concepts is often confused. Many people erroneously use the term abstract data type when they are actually referring to an abstract class.

Limitations of Object-Based Programming

Object-based programming is advantageous for specific uses. However, it cannot capture real-world relationships that exist among objects. The commonality that exists among a floppy disk and a hard disk, for instance, cannot be expressed directly in an object-based design. A hard disk and a floppy disk can both store files; they can contain directories and subdirectories, and so on. However, the implementer has to create two distinct and autonomous entities in this case, without sharing any common features that the two have.

Object-Oriented Programming

Object-oriented programming overcomes the limitations of object-based programming by providing the necessary constructs for defining class hierarchies. A class hierarchy captures the commonality among similar -- and yet distinct -- types. For example, the classes Mouse and Joystick are two distinct entities, yet they share many common features that can be factored out into a common class, PointingDevice, which serves as a base class for both. Object-oriented programming is based on the foundations of object-based programming such as information hiding, abstract data typing, and encapsulation. In addition, it supports inheritance, polymorphism, and dynamic binding.
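A rough sketch of such a hierarchy might look as follows. Only the class names PointingDevice, Mouse, and Joystick come from the text; every member (move(), getX(), getY(), name()) and the helper moveRight() are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <string>

// Common features are factored out into the base class.
class PointingDevice
{
private:
  int x, y; // cursor position, shared by every pointing device
public:
  PointingDevice() : x(0), y(0) {}
  virtual ~PointingDevice() {}
  void move(int dx, int dy) { x += dx; y += dy; }
  int getX() const { return x; }
  int getY() const { return y; }
  virtual std::string name() const = 0; // each device identifies itself
};

class Mouse : public PointingDevice
{
public:
  std::string name() const { return "mouse"; }
};

class Joystick : public PointingDevice
{
public:
  std::string name() const { return "joystick"; }
};

// Works with any class derived from PointingDevice.
int moveRight(PointingDevice& device, int steps)
{
  device.move(steps, 0);
  return device.getX();
}

void trackDevices()
{
  Mouse m;
  Joystick j;
  assert(moveRight(m, 3) == 3);
  assert(moveRight(j, 2) == 2);
  assert(m.name() == "mouse" && j.name() == "joystick");
}
```

The cursor-tracking code is written once, in the base class, instead of being duplicated in two autonomous classes as an object-based design would require.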

Characteristics of Object-Oriented Programming

Object-oriented programming languages differ from one another, sometimes considerably. Smalltalk programmers who migrate to C++, for instance, find the differences between the two languages somewhat daunting. The same can be said, of course, for C++ programmers who migrate to Smalltalk or Eiffel. However, several common characteristics that exist in all object-oriented programming languages distinguish them from non-object-oriented ones. These characteristics are presented in the following sections.

Inheritance

Inheritance enables a derived class to reuse the functionality and interface of its base class. The advantages of reuse are enormous: faster development time, easier maintenance, and simpler extensibility. The designer of class hierarchies captures the generalizations and commonality that exist among related classes. The more general operations are located in classes that appear higher in the derivation graph. Often, the design considerations are application-specific. For instance, the classes Thesaurus and Dictionary might be treated differently in an online ordering system of a bookstore and in a computerized library of the linguistics department in some university. In the bookstore's online ordering system, the classes Thesaurus and Dictionary can inherit from a common base class called Book:

#include <string>
#include <list>
using namespace std;
class Review{/*...*/};
class Book
{
private:
  string author;
  string publisher;
  string ISBN;
  float list_price;
  list<Review> readers_reviews;
public:
  Book();
  const string& getAuthor() const;
  //...
};

Classes Dictionary and Thesaurus are defined as follows:

class Dictionary : public Book
{
private:
 int languages; //bilingual, trilingual etc.
 //...
};
class Thesaurus: public Book
{
private:
 int no_of_entries;
//...
};

However, the computerized library of the linguistics department might use a different hierarchy:

class Library_item
{
private:
  string Dewey_classification; 
  int copies;
  bool in_store;
  bool can_be_borrowed;
  string author;
  string publisher;
  string ISBN;
public:
  Library_item();
  const string& getDewey_classification() const;
  //...
};
class Dictionary : public Library_item
{
private:
 int languages;
 bool phonetic_transcription;
 //...
};
class Thesaurus: public Library_item
{
private:
 int entries;
 int century; //historical period of the language, e.g., Shakespeare's era
//...
};

These two hierarchies look different because they serve different purposes. However, the crucial point is that the common functionality and data are "elevated" to the base class that is extended by more specialized classes. Introducing a new class, for example Encyclopedia, to either the bookstore ordering system or the computerized library of the linguistics department is much easier in an object-oriented environment. That is because most of the functionality of the new class already exists in the base class, whatever it might be. On the other hand, in an object-based environment, every new class has to be written from scratch.
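To illustrate, here is a sketch of how Encyclopedia might be added to the bookstore hierarchy. The Book class shown is a simplified stand-in (the reviews list is dropped, and the constructor and getListPrice() are assumptions), and the volumes member of Encyclopedia is a hypothetical detail. The strings used in the example are placeholders, not real bibliographic data.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the bookstore's Book base class; the constructor
// and the reduced member set are assumptions made to keep the sketch short.
class Book
{
private:
  std::string author;
  std::string publisher;
  std::string ISBN;
  float list_price;
public:
  Book(const std::string& a, const std::string& p,
       const std::string& isbn, float price)
    : author(a), publisher(p), ISBN(isbn), list_price(price) {}
  const std::string& getAuthor() const { return author; }
  float getListPrice() const { return list_price; }
};

// The new class adds only what is specific to an encyclopedia;
// everything else is inherited from Book.
class Encyclopedia : public Book
{
private:
  int volumes; // hypothetical member
public:
  Encyclopedia(const std::string& a, const std::string& p,
               const std::string& isbn, float price, int vols)
    : Book(a, p, isbn, price), volumes(vols) {}
  int getVolumes() const { return volumes; }
};

void addEncyclopedia()
{
  // Placeholder values for illustration only.
  Encyclopedia e("Various", "Some Publisher", "0-000-00000-0", 499.0f, 24);
  assert(e.getAuthor() == "Various"); // inherited from Book
  assert(e.getVolumes() == 24);       // added by Encyclopedia
}
```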

Polymorphism

Polymorphism is the capability of different objects to react in an individual manner to the same message. Polymorphism is widely used in natural languages. Consider the verb to close: It means different things when applied to different objects. Closing a door, closing a bank account, or closing a program's window are all different actions; their exact meaning depends on the object on which the action is performed. Similarly, polymorphism in object-oriented programming means that the interpretation of a message depends on its object. C++ has three mechanisms of static (compile-time) polymorphism: operator overloading, templates, and function overloading.

Operator Overloading

Applying operator +=, for example, to an int or a string is interpreted by each of these objects in an individual manner. Intuitively, however, you can predict what results will be, and you can find some similarities between the two. Object-based programming languages that support operator overloading are, in a limited way, polymorphic as well.
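A short sketch of this behavior, using only the built-in int type and std::string (the function name plusEquals() is arbitrary):

```cpp
#include <cassert>
#include <string>

void plusEquals()
{
  int n = 2;
  n += 3;               // for an int: arithmetic addition
  assert(n == 5);

  std::string s("ab");
  s += "cd";            // for a string: concatenation
  assert(s == "abcd");
}
```

The same operator token selects a different operation for each type, and the selection is made at compile time.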

Templates

A vector<int> and a vector<string> react differently; that is, they execute a different set of instructions when they receive the same message. However, you can expect similar behavior (templates are discussed in detail in Chapter 9, "Templates"). Consider the following example:

vector < int > vi;
vector < string > names;
string name("Bjarne");
vi.push_back( 5 ); // add an integer at the end of the vector
names.push_back (name); //add a string at the end of the vector

Function Overloading

Function overloading is a third form of polymorphism. In order to overload a function, a different list of parameters is used for each overloaded version. For example, a set of valid overloaded versions of a function named f() might look similar to the following:

void f(char c, int i);
void f(int i, char c); //order of parameters is significant
void f(string & s);
void f();
void f(int i);
void f(char c);

Note, however, that a function that differs only by its returned type is illegal in C++:

int f();  //error; differs from void f(); above only by return type
int f(float f);  //fine - unique signature
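The sketch below shows how the compiler picks an overloaded version according to the arguments of each call. To make the selection observable, each version returns a descriptive string; that return type is an assumption of this sketch, since the declarations above return void.

```cpp
#include <cassert>
#include <string>

// Each overload returns a tag naming itself, so the version chosen by
// the compiler can be checked. The tags are illustrative only.
std::string f(char, int) { return "f(char, int)"; }
std::string f(int, char) { return "f(int, char)"; }
std::string f(int)       { return "f(int)"; }
std::string f(char)      { return "f(char)"; }
std::string f()          { return "f()"; }

void showOverloads()
{
  assert(f('a', 1) == "f(char, int)");
  assert(f(1, 'a') == "f(int, char)"); // order of arguments is significant
  assert(f(7) == "f(int)");
  assert(f('x') == "f(char)");
  assert(f() == "f()");
}
```

Every call here is an exact match for one version, so no implicit conversions are involved in the resolution.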

Dynamic Binding

Dynamic binding takes the notion of polymorphism one step further. In dynamic binding, the meaning of a message depends on the object that receives it; yet, the exact type of the object can be determined only at runtime. Virtual member functions are a good example of this. The specific version of a virtual function might not be known at compile time. In this case, the call resolution is delayed to runtime, as in the following example:

#include <iostream>
using namespace std;
class base
{
  public: virtual void f() { cout<< "base"<<endl;}
};
class derived : public base
{
  public: void f() { cout<< "derived"<<endl;} //overrides base::f
};
void identify(base & b) // the argument can be an instance
                        // of base or any object derived from it
{
  b.f(); //base::f or derived::f? resolution is delayed to runtime
}
//a separate translation unit
int main()
{
  derived d;
  identify(d); // argument is an object derived from base
  return 0;
}

The function identify can receive any object that is publicly derived from class base -- even objects of subclasses that were defined after identify was compiled.

Dynamic binding has numerous advantages. In this example, it enables the user to extend the functionality of base without having to modify identify in any way. In procedural and object-based programming, such flexibility is nearly impossible. Furthermore, the underlying mechanism of dynamic binding is automatic. The programmer doesn't need to implement the code for runtime lookup and dispatch of a virtual function, nor does he or she need to check the dynamic type of the object.
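As a sketch of this extensibility, the variation below adds a class after identify() has been written. It departs from the example above in two assumed details: f() returns a string instead of printing, so that the result can be checked, and base gains a virtual destructor. The class late_addition is a hypothetical name.

```cpp
#include <cassert>
#include <string>

class base
{
public:
  virtual ~base() {}
  virtual std::string f() const { return "base"; }
};

// Written against base only; it knows nothing about derived classes.
std::string identify(const base& b)
{
  return b.f(); // resolved at runtime according to the dynamic type of b
}

// A hypothetical class added later. identify() is not modified,
// yet it dispatches to the new override.
class late_addition : public base
{
public:
  std::string f() const { return "late_addition"; }
};

void extendWithoutChange()
{
  base b;
  late_addition x;
  assert(identify(b) == "base");
  assert(identify(x) == "late_addition");
}
```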

Techniques Of Object-Oriented Programming

Up until now, the discussion has focused on the general characteristics of object-oriented programming and design. This part presents C++-specific practical techniques and guidelines of object-oriented programming.

Class Design

Classes are the primary unit of abstraction in C++. Finding the right classes during analysis and design is perhaps the most important phase in the lifetime of an object-oriented software system. The common guidelines for finding classes state that a class should represent a real-world object; others maintain that nouns in natural languages should represent classes. This is true to some extent, but nontrivial software projects often have classes that exist nowhere except the programming domain. Does an exception represent a real-world object? Do function objects (which are discussed in Chapter 10, "STL and Generic Programming") and smart pointers have an equivalent outside the programming environment? Clearly, the relationship between real-world entities and objects is not 1:1.

Finding the Classes

The process of finding the right classes is mostly derived from the functional requirements of the application domain. That is, a designer can decide to represent a concept as a class (rather than, for example, a member function within a different class or a global function) when it serves the needs of the application. This is usually done by means of CRC (Class, Responsibility, Collaboration) cards or any other method.

Common Design Mistakes with Classes

No two object-oriented languages are alike. The programming language also affects the design. As you learned in Chapter 4, "Special Member Functions: Default Constructor, Copy Constructor, Destructor, and Assignment Operator," C++ has a distinct symmetry between constructors and destructors that most other object-oriented languages do not have. Objects in C++ can automatically clean up after themselves. C++ also enables you to create local objects with automatic data storage. In other languages, objects can only be created on heap memory. C++ is also one of just a few languages that support multiple inheritance. C++ is a strongly-typed language with static type checking. As much as design gurus insist on separating pure design from implementation artifacts (that is, language-specific behavior), such language-specific features do affect the overall design. But of course, design mistakes do not result only from the interference of other languages.

Object-orientation is not a panacea. Some common pitfalls can lead to monstrous applications that need constant maintenance, that perform unsatisfactorily, and that only eventually -- or never -- reach production. Some of these design mistakes are easy to detect.

Gigantic Classes

There are no standardized methods for measuring the size of a class. However, many small specialized classes are preferred to a bulky single class that contains hundreds of member functions and data members. But such bulky classes do get written. Class std::string has a fat interface of more than 100 member functions; clearly, this is an exception to the rule and, to be honest, many people consider this to be a compromise between conflicting design approaches. Still, ordinary programs rarely use all these members. More than once I've seen programmers extending a class with additional member functions and data members instead of using more plausible object-oriented techniques such as subclassing. As a rule, a class that exceeds a 20-30 member function count is suspicious.

Gigantic classes are problematic for at least three reasons: Users of such classes rarely know how to use them properly; the implementation and interface of such classes tend to undergo extensive changes and bug-fixes; and they are not good candidates for reuse because the fat interface and intricate implementation details can fit only a very limited usage. In a sense, large classes are very similar to large functions -- they are noncohesive and difficult to maintain.

Exposing Implementation Details

Declaring data members with public access is, almost without exception, a design flaw. Still, even vendors of popular frameworks resort to this deprecated programming style. It might be tempting to use public data members because it saves the programmer the bother of writing trivial accessors and mutators (getters and setters, respectively). This approach cannot be recommended, however, because it results in maintenance difficulties and it compromises the class's reliability. Users of such classes tend to rely heavily on their implementation details; even if they normally avoid such dependencies, they might feel that the exposure of the implementation details implies that they are not supposed to change. Sometimes there is no other choice -- the class implementer has not defined any other method of accessing data members of a class. The process of modifying or extending such classes becomes a maintenance nightmare. Infrastructure components, such as Date or string classes, can be used dozens of times within a single source file. It is not hard to imagine what it is like when dozens of programmers, each producing dozens of source files, have to chase every source line that refers to any one of these classes. This is exactly what caused the notorious Year 2000 Bug. If, on the other hand, data members are declared private, users cannot access them directly. When the implementation details of the class are modified, only accessors and mutators need to be modified, but the rest of the code remains intact.

There is another danger in exposing implementation details. Due to indiscriminate access to data members and helper functions, users can inadvertently tamper with the object's internal data members. They might delete memory (which is supposed to be deleted by the destructor), or they might change the value of a file handle, and so on, with disastrous results. Therefore, it is always a better design choice to hide implementation details of an object.

The "Resource Acquisition Is Initialization" Idiom

Many objects of various kinds share a similar characterization: They must be acquired by means of initialization prior to their usage; then they can be used, and then they have to be released explicitly. Objects such as File, CommunicationSocket, DatabaseCursor, DeviceContext, OperatingSystem, and many others have to be opened, attached, initialized, constructed, or booted, respectively, before you can use them. When their job is done, they have to be flushed, detached, closed, released, or logged out, respectively. A common design mistake is to have the user request explicitly for the initialization and release operations to take place. A much better choice is to move all initialization action into the constructor and all release actions into the destructor. This technique is called resource acquisition is initialization (The C++ Programming Language, 3rd ed., page 365). The advantage is a simplified usage protocol. Users can start using the object right after it has been created, without bothering with whether the object is valid or whether further arbitrary initialization actions have to be taken. Furthermore, because the destructor also releases all its resources, users are free from that hassle too. Please note that this technique usually requires an appropriate exception handling code to cope with exceptions that are thrown during construction of the object.
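A minimal sketch of the idiom, applied to a file, might look like this. The File class, its members, and the helper functions are hypothetical; the sketch assumes the C standard I/O functions from <cstdio> as the underlying resource and reports a failed open by throwing std::runtime_error.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical File wrapper: the constructor acquires the resource and
// the destructor releases it, so users never call open or close.
class File
{
private:
  std::FILE* handle;
  File(const File&);            // copying disabled: declared, not defined
  File& operator=(const File&);
public:
  File(const std::string& path, const char* mode)
    : handle(std::fopen(path.c_str(), mode))
  {
    if (handle == 0)
      throw std::runtime_error("cannot open " + path);
  }
  ~File() { std::fclose(handle); } // handle is never null here

  void write(const std::string& text) { std::fputs(text.c_str(), handle); }
  std::string readLine()
  {
    char buffer[256];
    if (std::fgets(buffer, static_cast<int>(sizeof buffer), handle) == 0)
      return std::string();
    return buffer;
  }
};

// Writes a line and reads it back; neither object is closed explicitly.
void roundTrip(const std::string& path)
{
  {
    File out(path, "w");   // usable right after construction
    out.write("hello\n");
  }                        // destructor flushes and closes the file here
  File in(path, "r");
  assert(in.readLine() == "hello\n");
}

// A failed acquisition surfaces as an exception, so a half-initialized
// File object never exists.
bool openFails(const std::string& path)
{
  try
  {
    File f(path, "r");
    return false;
  }
  catch (const std::runtime_error&)
  {
    return true;
  }
}
```

Because the constructor throws when the file cannot be opened, callers need a handler for that case, which is the exception handling requirement mentioned above.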

Classes and Objects

Unlike some other object-oriented programming languages, C++ makes a clear distinction between a class, which is a user-defined type, and an object, which is an instance thereof. There are several features for manipulating the state of a class rather than the state of individual objects. These features are discussed in the following sections.

Static Data Members

A static member is shared by all instances of its class. For that reason, it is sometimes termed a class variable. Static members are useful in synchronization objects. For example, a file lock can be implemented using a static data member. An object that is trying to access this file has to check first whether the file is being processed by another user. If the file is available, the object turns the flag on and the user can process the file safely. Other users are not allowed to access the file until the flag is reset to false. When the object that is processing the file is finished, it has to turn off the flag, enabling another object to access it.

class fileProc
{
private:
  FILE *p;
  static bool Locked;
public:
//...
  bool isLocked () const;
  //...
};
bool fileProc::Locked;
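The locking protocol described above can be sketched as follows. This is a simplified variant of fileProc: the FILE pointer is omitted, and tryLock(), unlock(), and the owner member are hypothetical additions. Note that a plain bool flag such as this one is not safe across threads; the sketch only demonstrates that a single flag is shared by all objects.

```cpp
#include <cassert>

// Simplified variant of fileProc; tryLock(), unlock(), and owner are
// hypothetical members added for this sketch.
class fileProc
{
private:
  static bool Locked; // one flag shared by all fileProc objects
  bool owner;         // true if this particular object holds the lock
public:
  fileProc() : owner(false) {}
  ~fileProc() { unlock(); }
  bool isLocked() const { return Locked; }
  bool tryLock()
  {
    if (Locked) return false; // another object is processing the file
    Locked = true;
    owner = true;
    return true;
  }
  void unlock()
  {
    if (owner) { Locked = false; owner = false; }
  }
};
bool fileProc::Locked; // static storage: zero-initialized to false

void lockDemo()
{
  fileProc a, b;
  assert(a.tryLock());
  assert(b.isLocked());  // b sees the flag that a turned on
  assert(!b.tryLock());  // so b must wait
  a.unlock();
  assert(b.tryLock());   // now the file is available to b
}
```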

Static Member Functions

A static member function in a class can directly access only other static members of its class. Unlike ordinary member functions, a static member function can be invoked even when no object instance exists. For example:

class stat
{
private:
  int num;
public:
  stat(int n = 0) {num=n;}
  static void print() {cout <<"static member function" <<endl;}
};
int main()
{
  stat::print(); //no object instance required
  stat s(1);
  s.print();//still, a static member function can be called from an object
  return 0;
}

Static members are used in the following cases:

A Pointer to Member Cannot Refer To a Static Member Function

It is illegal to assign the address of a static class member to a pointer to member. However, you can take the address of a static member function of a class and treat it as if it were an ordinary function. For example

class A
{
public:
  static  void f();
};
int main()
{
  void (*p) () = &A::f; //OK, ordinary pointer to function
}

You can do this because a static member function is essentially an ordinary function, which doesn't take an implicit this argument.
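The contrast with a genuine pointer to member can be sketched as follows. Here f() returns an int rather than void so that the call can be checked, and the nonstatic member g() is a hypothetical addition.

```cpp
#include <cassert>

class A
{
public:
  static int f() { return 1; }  // no implicit this argument
  int g() const { return 2; }   // nonstatic: requires an object
};

void pointerKinds()
{
  int (*pf)() = &A::f;           // ordinary pointer to function
  int (A::*pmf)() const = &A::g; // pointer to member function
  A a;
  assert(pf() == 1);             // invoked without any object
  assert((a.*pmf)() == 2);       // must be bound to an object
}
```

The two pointer types are not interchangeable: assigning &A::f to pmf, or &A::g to pf, is rejected by the compiler.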

Defining a Class Constant

When you need a constant integer member in a class, the easiest way to create one is by using a const static member of an integral type; unlike other static data members, such a member can be initialized within the class body (see also Chapter 2, "Standard Briefing: The Latest Addenda to ANSI/ISO C++"). For example

class vector
{
private:
  int v_size;
  const static int MAX = 1024; //a single MAX is shared by all vector objects
  char *p;
public:
  vector() {p = new char[MAX]; }
  vector( int size)
  {
    if (size <= MAX)
      p = new char[size] ;
    else
     p = new char[MAX];
  }
};

Designing Class Hierarchies

After identifying a set of potential classes that might be required for the application, it is important to correctly identify the interactions and relationships among the classes to specify inheritance, containment, and ownership. The design of class hierarchies, as opposed to designing concrete types, requires additional considerations that are discussed in this section.

Private Data Members Are Preferable To Protected Ones

Data members of a class are usually a part of its implementation. They can be replaced when the internal implementation of the class is changed; therefore, they need to be hidden from other classes. If derived classes need to access these data members, they need to use accessor methods instead of directly accessing data members of a base class. Consequently, no modification is required for derived classes when a change is made in the base class.

Here's an example:

class Date
{
private:
  int d,m,y; //how a date is represented is an implementation detail
public:
  int Day() const {return d; }
};
class DateTime : public Date
{
private:
  int hours;
  int minutes;
  int seconds;
public:
//...additional member functions
};

Now assume that class Date is used mostly on display devices, so it has to supply some method of converting its d,m,y members into a displayable string. In order to enhance performance, a design modification is made: Instead of the three integers, a single string now holds the date representation. Had class DateTime relied on the internal implementation of Date, it would have had to be modified as well. But because it can access Date's data members only through access methods, all that is required is a small change in the Date::Day() member function. Please note that accessor methods are usually inlined anyway, so their use does not incur additional runtime overhead.
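
The design change described above might look roughly like the following sketch. The "dd/mm/yyyy" format, the Text() accessor, and the members added to DateTime are assumptions made for illustration; the point is only that DateTime compiles unchanged because it reaches Date through Day().

```cpp
#include <cstdlib>
#include <iomanip>
#include <sstream>
#include <string>

class Date
{
private:
  std::string repr;  // new representation; assumed format: "dd/mm/yyyy"
public:
  Date(int d, int m, int y)
  {
    std::ostringstream os;
    os << std::setfill('0') << std::setw(2) << d << '/'
       << std::setw(2) << m << '/' << std::setw(4) << y;
    repr = os.str();
  }
  // Only this accessor had to change: it now parses the first two digits.
  int Day() const { return std::atoi(repr.substr(0, 2).c_str()); }
  const std::string& Text() const { return repr; }  // hypothetical accessor
};

class DateTime : public Date  // not affected by the change in Date
{
private:
  int hours;
  int minutes;
  int seconds;
public:
  DateTime(int d, int m, int y, int h, int mi, int s)
    : Date(d, m, y), hours(h), minutes(mi), seconds(s) {}
  bool IsFirstOfMonth() const { return Day() == 1; }  // uses the accessor only
  int SecondsSinceMidnight() const
  { return (hours * 60 + minutes) * 60 + seconds; }
};
```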

Declaring Virtual Base Class Destructors

A base class needs to have its destructor declared virtual. In doing so, you ensure that the correct destructor is always called, even in the following case:

class Base
{
private:
  char *p;
public:
  Base() { p = new char [200]; }
  ~ Base () {delete [] p; } //non virtual destructor, bad
};
class Derived : public Base
{
private:
  char *q;
public:
  Derived() { q = new char[300]; }
  ~Derived() { delete [] q; }
  //...
};
void destroy (Base & b)  
{ 
  delete &b; 
}
int main()
{
  Base *pb = new Derived(); //200 + 300 bytes allocated
  //... meddle with pb
  destroy (*pb);  //OOPS! undefined behavior; typically only Base's destructor runs
  //were Base's destructor virtual, the correct destructor would be called
  return 0;
}
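
For comparison, here is a minimal sketch of the corrected design, in which the base destructor is virtual. The static counter destroyed and the function destroyThroughBase() are hypothetical; they exist only to make the order of destruction observable.

```cpp
class Base
{
public:
  static int destroyed;  // hypothetical counter, for demonstration only
  Base() {}
  virtual ~Base() { ++destroyed; }  // virtual: deleting through Base* is safe
};
int Base::destroyed = 0;

class Derived : public Base
{
public:
  ~Derived() { ++destroyed; }  // runs first; ~Base() runs right after it
};

int destroyThroughBase()
{
  Base::destroyed = 0;
  Base *pb = new Derived;
  delete pb;  // invokes ~Derived() and then ~Base()
  return Base::destroyed;
}
```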

Virtual Member Functions

Virtual member functions enable subclasses to extend or override the behavior of a base class. Deciding which members in a class can be overridden by a derived class is not a trivial issue. A class that overrides a virtual member function is only committed to adhere to the prototype of the overridden member function -- not to its implementation. A common mistake is to declare all member functions as virtual "just in case". In this respect, C++ makes a clear-cut distinction between abstract classes that provide pure interfaces as opposed to base classes that provide implementation as well as an interface.

Extending A Virtual Function in A Derived Class

There are cases in which you want a derived class to extend a virtual function defined in its base class rather than override it altogether. It can be done quite easily in the following way:

class shape
{
  //...
public:
  virtual void draw();
  virtual void resize(int x, int y) { clearscr(); /*...*/ }
};
class rectangle: public shape
{
  //...
public:  
  virtual void resize (int x, int y)
  {
    shape::resize(x, y);  //explicit call to the base's virtual function
    //add functionality
    int size = x*y;
    //...
  }
};

The overriding function in a derived class should invoke an overridden function of its base class using its fully-qualified name.

Changing Access Specification of A Virtual Function

The access specification of a virtual member function that is defined in a base class can be changed in a derived class. For example

class Base
{
public:
  virtual void Say() { cout<<"Base";}
};
class Derived : public Base
{
private: //access specifier changed; legal but not a good idea
  void Say() {cout <<"Derived";} // overriding Base::Say()
};

Although this is legal, it does not work as expected when pointers or references are used; a pointer or reference to Base can also refer to any object that is publicly derived from Base:

Derived d;
Base *p = &d;
p->Say(); //OK, invokes Derived::Say()

Because the actual binding of a virtual member function is postponed to runtime, the compiler cannot detect that a nonpublic member function will be called; it assumes that p points to an object of type Base, in which Say() is a public member. As a rule, do not change the access specification of a virtual member function in a derived class.
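
The following sketch reproduces the effect in a form that can be checked. It is a variation on the classes above rather than the original code: Say() returns a string instead of writing to cout, and sayThroughBase() is a hypothetical helper.

```cpp
#include <string>

class Base
{
public:
  virtual std::string Say() { return "Base"; }
  virtual ~Base() {}
};
class Derived : public Base
{
private:  // access specifier changed in the derived class
  std::string Say() { return "Derived"; }
};

std::string sayThroughBase()
{
  Derived d;
  Base *p = &d;
  // d.Say();       // error: Say() is private in Derived
  return p->Say();  // compiles: access is checked against the static type Base
}
```

The private member of Derived is reachable through a pointer to Base, which defeats the purpose of making it private.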

Virtual Member Functions Should Not Be Private

As you saw previously, it is customary to extend virtual functions in a derived class by first invoking the base class's version of that function; then extend it with additional functionality. This can't be done when a virtual function is declared private.

Abstract Classes and Interfaces

An abstract class is one that has at least one pure virtual member function, that is, a non-implemented placeholder that must be implemented by its derived class. Instances of an abstract class cannot be created because it is intended to serve as a design skeleton for concrete classes that are derived from it, and not as an independent object. See the following example:

class File  //abstract class; serves as interface
{
public:
  int virtual open() = 0;  //pure virtual
  int virtual close() = 0; //pure virtual	
};
class diskFile: public File
{
private:
  string filename;
  //...
public:
  int open() {/*...*/}
  int close () {/*...*/}
};
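
A short sketch of how client code can rely on the interface alone is shown below. The opened flag, isOpen(), the zero return codes, and the function reopen() are assumptions added for illustration; a virtual destructor is also added to the interface.

```cpp
class File  // abstract class; serves as interface
{
public:
  virtual int open() = 0;
  virtual int close() = 0;
  virtual ~File() {}
};
class diskFile : public File
{
private:
  bool opened;
public:
  diskFile() : opened(false) {}
  int open() { opened = true; return 0; }    // 0 is assumed to mean success
  int close() { opened = false; return 0; }
  bool isOpen() const { return opened; }
};

// Works with any concrete class derived from File.
int reopen(File& f)
{
  int rc = f.close();
  return rc != 0 ? rc : f.open();
}
// File f;  // error: cannot create an instance of an abstract class
```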

Use Derivation Instead of Type-Fields

Suppose that you have to implement an internationalization helper class that manages the necessary parameters of every natural language that is currently supported by a word processor. A naive implementation might rely on type-fields to indicate the specific language that is currently being used (for example, the interface language in which menus are displayed).

class FontResource {/*...*/};
class Internationalization
{
public:
  enum Lang {English, Hebrew, Danish};
private:
  Lang lg; //type field
  FontResource fonts;
public:
  Internationalization(Lang lang) : lg(lang) {}
  int Loadfonts(Lang lang);
};

Every modification in Internationalization affects all its users, even when they are not supposed to be affected. When adding support for a new language, the users of the already-supported languages have to recompile (or download, which is worse) the new version of the class. Moreover, as time goes by and support for new languages is added, the class becomes bigger and more difficult to maintain, and it tends to contain more bugs. A much better design approach is to use derivation instead of type-fields. For example

class Internationalization //now a base class
{
protected: //accessible to derived classes in this simplified example
  FontResource fonts;
public:
  Internationalization ();
  virtual int Loadfonts();
  virtual void SetDirectionality();
};
class English : public Internationalization
{
public:
  English();
  int Loadfonts() { fonts = TimesNewRoman; return 0; }
  void SetDirectionality(){}//do nothing; default: left to right
};
class Hebrew : public Internationalization
{
public:
  Hebrew();
  int Loadfonts() { fonts = David; return 0; }
  void SetDirectionality() { directionality = right_to_left;}
};

Derivation simplifies class structure and localizes the changes that are associated with a specific language to its corresponding class without affecting others.
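
A compact, self-contained variation on this design is sketched below. The accessors FontName() and RightToLeft() and the function Describe() are hypothetical stand-ins for the font and directionality members above; they are used here only so that the behavior can be observed.

```cpp
#include <string>

class Internationalization  // base class: the interface seen by clients
{
public:
  virtual ~Internationalization() {}
  virtual std::string FontName() const = 0;           // hypothetical accessor
  virtual bool RightToLeft() const { return false; }  // default: left to right
};
class English : public Internationalization
{
public:
  std::string FontName() const { return "TimesNewRoman"; }
};
class Hebrew : public Internationalization
{
public:
  std::string FontName() const { return "David"; }
  bool RightToLeft() const { return true; }
};

// Client code depends only on the base class. Supporting a new language
// means adding a new derived class, not editing this function.
std::string Describe(const Internationalization& lang)
{
  return lang.FontName() + (lang.RightToLeft() ? " (RTL)" : " (LTR)");
}
```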

Overloading A Member Function Across Class Boundaries

A class is a namespace. The scope for overloading a member function is confined to a class but not to its derived classes. Sometimes the need arises to overload the same function in its class as well as in a class that is derived from it. However, using an identical name in a derived class merely hides the base class's function, rather than overloading it. Consider the following:

class B
{
public:
  void func();
};
class D : public B
{
public:
  void func(int n); //now hiding B::func, not overloading it
};
D d;
d.func();//compilation error. B::func is invisible in d;
d.func(1); //OK, D::func takes an argument of type int

In order to overload -- rather than hide -- a function of a base class, the function name of the base class has to be injected explicitly into the namespace of the derived class by a using declaration. For example

class D : public B
{
public:
  using B::func; // inject the name of a base member into the scope of D
  void func(int n); // D now has two overloaded versions of func()
};
D d;
d.func ( ); // OK
d.func ( 10 ); // OK
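
The same technique in a form that can be compiled and checked is sketched below. The function bodies, the return values, and callBothOverloads() are hypothetical. The using declaration is placed in the public section because the injected name takes the access of the place where the using declaration appears.

```cpp
class B
{
public:
  int func() { return 1; }
};
class D : public B
{
public:
  using B::func;  // bring B::func() into the scope of D, with public access
  int func(int n) { return n; }
};

int callBothOverloads()
{
  D d;
  return d.func() + d.func(10);  // both overloads are visible in D
}
```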

Deciding Between Inheritance and Containment

When designing a class hierarchy, you often face a decision between inheritance, or is-a, and containment, or has-a, relation. The choice is not always immediately apparent. Assume that you are designing a Radio class, and you already have the following classes implemented for you in some library: Dial and ElectricAppliance. It is obvious that Radio is derived from ElectricAppliance. However, it is not so obvious that Radio is also derived from Dial. In such cases, check whether there is always a 1:1 relationship between the two. Do all radios have one and only one dial? They don't. A radio can have no dials at all -- a transmitter/receiver adjusted to a fixed frequency, for example. Furthermore, it might have more than one dial -- FM and AM dials. Hence, your Radio class needs to be designed to have Dial(s) rather than being derived from Dial. Note that the relationship between Radio and ElectricAppliance is 1:1 and corroborates the decision to derive Radio from ElectricAppliance.
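
The outcome of this analysis can be sketched as follows. The members of Dial and ElectricAppliance, the constructor parameters, and the use of std::vector to hold the dials are assumptions made for illustration; the library classes mentioned in the text are not specified any further.

```cpp
#include <cstddef>
#include <vector>

class ElectricAppliance
{
public:
  explicit ElectricAppliance(int v) : voltage(v) {}
  int getVoltage() const { return voltage; }
private:
  int voltage;
};
class Dial
{
public:
  Dial() : position(0) {}
  void Turn(int steps) { position += steps; }
  int Position() const { return position; }
private:
  int position;
};
class Radio : public ElectricAppliance  // is-a: every radio is an appliance
{
public:
  Radio(int v, std::size_t nDials) : ElectricAppliance(v), dials(nDials) {}
  std::size_t DialCount() const { return dials.size(); }
  Dial& GetDial(std::size_t i) { return dials.at(i); }
private:
  std::vector<Dial> dials;  // has-a: zero, one, or several dials
};
```

A fixed-frequency receiver is then simply a Radio constructed with zero dials, and an AM/FM radio is one constructed with two.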

The Holds-a Relation

Ownership defines the responsibility for the creation and the destruction of an object. An object is an owner of some other resource if and only if it has the responsibility for both constructing and destroying it. In this respect, an object that contains another object also owns it because its constructor is responsible for the invocation of the embedded object's constructor. Likewise, its destructor is responsible for invoking the embedded object's destructor. This is the well-known has-a relationship. A similar relationship is holds-a. It is distinguished from has-a by one factor: ownership. A class that indirectly contains -- by means of a reference or a pointer -- another object that is constructed and destroyed independently is said to hold that object. Here's an example:

class Phone {/*...*/};
class Dialer {/*...*/};
class Modem
{
private:	
  Phone* pline;
  Dialer& dialer;
public:
  Modem (Phone *pp, Dialer& d) : pline(pp), dialer(d) {}
//Phone and Dialer objects are constructed and destroyed
//independently of Modem
};
void f()
{
  Phone phone;
  Dialer dialer;
  Modem modem(&phone, dialer);
  //...use modem
}

Modem uses Phone and Dialer. However, it is not responsible for constructing or destroying them.

Empty Classes

A class that contains no data members and no member functions is an empty class. For example

class PlaceHolder {};

An empty class can serve as a placeholder for a yet-to-be defined class. Imagine an interface class that serves as a base for other classes; instead of waiting for its full implementation to be completed, it can be used this way in the interim. Additionally, an empty class can also be used as a means of forcing derivation relationship among classes that are not originally descended from one base class. (This is a bottom-up design). Finally, it can be used as a dummy argument to distinguish between overloaded versions of a function. In fact, one of the standard versions of operator new (see also Chapter 11, "Memory Management") uses this technique:

#include <new>
using namespace std;
int main()
{
  try
  {
    int *p = new int[100]; //exception-throwing new
  }
  catch(bad_alloc & new_failure) {/*..*/}
  int *p = new (nothrow) int [100]; // exception-free version of new
  if (p) 
  {/*..*/}
  return 0;
}

The nothrow argument is of type nothrow_t, which is an empty class by itself.
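
The same technique can be applied to user-defined functions. In the following sketch, the empty class Checked and both versions of divide() are hypothetical; the dummy argument carries no data and serves only to select an overload.

```cpp
class Checked {};  // empty class used only as a tag (hypothetical name)

// Unchecked version: the caller guarantees that divisor is nonzero.
int divide(int dividend, int divisor)
{
  return dividend / divisor;
}
// Checked version, selected by the extra dummy argument;
// it returns 0 instead of dividing by zero.
int divide(int dividend, int divisor, Checked)
{
  return divisor == 0 ? 0 : dividend / divisor;
}
```

A caller selects the checked version by passing a temporary tag object, as in divide(10, 0, Checked()).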

Using structs as A Shorthand for Public Classes

Traditionally, structs serve as data aggregates. However, in C++ a struct can have constructors, a destructor, and member functions -- just like a class. The only difference between the two is the default access type: By default, the members and base classes of a class are private, whereas those of a struct are public. Consequently, structs are sometimes used as shorthand for classes whose members are all public. Abstract classes are a good example of classes that have all public members.

#include <cstdio>
#include <string>
using namespace std;
struct File //interface class. all members are implicitly public
{
  virtual int Read()  = 0;
  File(FILE *);
  virtual ~File() = 0;
};
class TextFile: public File //public inheritance must be stated; TextFile is a class
{
private:
  string path;
public:
  int Flush();
  int Read();
};
class UnicodeFile : TextFile //implicit private inheritance
{
public:
  wchar_t convert(char c);
};

Friendship

A class can grant access to its members on a selective basis by declaring external classes and functions as friends. A friend has full access to all the grantor's members, including private and protected ones. Friendship is sometimes unjustly criticized for exposing implementation details. However, this is radically different from declaring data members as public because friendship enables the class to declare explicitly which clients can access its members; in contrast, a public declaration provides indiscriminate access to a member. Here's an example:

class Date
{
  private:
    int day, month, year;
  public:
    friend bool operator ==( const Date & d1, const Date& d2);
};
bool operator ==( const Date & d1, const Date& d2)
{
  return (d1.day == d2.day) &&
           (d1.month == d2.month) &&
           (d1.year == d2.year);
}

Remember that friendship is not inherited, so nonpublic members of any class that is derived from Date are not accessible to operator ==.

Nonpublic Inheritance

When a derived class inherits from a nonpublic base, the is-a relationship between a derived object and its nonpublic base does not exist. For example:

class Mem_Manager {/*..*/};
class List: private Mem_Manager {/*..*/};
void OS_Register( Mem_Manager& mm);
int main()
{
  List li;
  OS_Register( li ); //compile time error; conversion from
                     //List & to Mem_Manager& is inaccessible
  return 0;
}

Class List has a private base, Mem_Manager, which is responsible for its necessary memory bookkeeping. However, List is not a memory manager by itself. Therefore, private inheritance is used to block its misuse. Private inheritance is similar to containment. As a matter of fact, the same effect might have been achieved by making Mem_Manager a member of class List. Protected inheritance is used in class hierarchies for similar purposes.
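
The containment alternative mentioned above might be sketched like this. The members of Mem_Manager and the functions Add() and Reserved() are hypothetical; the original classes are elided in the text.

```cpp
class Mem_Manager
{
public:
  Mem_Manager() : blocks(0) {}
  void Reserve() { ++blocks; }
  int Blocks() const { return blocks; }
private:
  int blocks;
};
class List
{
private:
  Mem_Manager mm;  // has-a: the manager is an implementation detail of List
public:
  void Add() { mm.Reserve(); }  // bookkeeping is delegated to the member
  int Reserved() const { return mm.Blocks(); }
};
```

As with private inheritance, a List object cannot be passed to a function that expects a Mem_Manager&.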

Common Root Class

In many frameworks and software projects, all classes are forced to be descendants of one common root class, which is usually named Object. This design policy prevails in other OO languages such as Smalltalk and Java, whose classes are derived from class Object implicitly. However, imitating this in C++ incurs many compromises and potential bugs. It creates artificial kinship among classes that have absolutely nothing in common. Bjarne Stroustrup addresses the issue: "Now what is the common relationship between a smile, the driver of my CD-ROM reader, a recording of Richard Strauss' Don Juan, a line of text, my medical records, and a real-time clock? Placing them all in a single hierarchy when their only shared property is that they are programming artifacts (they are all "objects") is of little fundamental value and can cause confusion." (The C++ Programming Language, 3rd ed., page 732).

If you are looking for genericity, that is, if you need an algorithm/container/function that works for every data type, you might find that templates serve you better. Moreover, a common root design policy also forces you to refrain from multiple inheritance entirely because any class that is derived simultaneously from two or more base classes faces the dreadful derivation diamond problem: It embeds more than one base subobject. Finally, the common root class usually serves as a means of implementing exception handling and RTTI, both of which are integral parts of C++ anyway.
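
As a small illustration of the template alternative, the following function works for any type that supports operator<, without requiring the types to share a base class. The name largest() is hypothetical.

```cpp
// Generic function: no common root class is needed, only operator<.
template <class T>
const T& largest(const T& a, const T& b)
{
  return a < b ? b : a;
}
```

The same function can be instantiated for int, for double, or for any user-defined class that defines operator<.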

Forward Declarations

Consider the following common situation in which classes refer to one another:

//file: bank.h
class Report
{
public:
  void Output(const Account& account); // compile time error;
                                             // Account is not declared yet
};
class Account
{
private:
  Report rep;
public:
  void Show() {rep.Output(*this);}
};

An attempt to compile this header file causes compilation errors because the compiler does not recognize the identifier Account as a class name when class Report is compiled. Even if you relocate the declaration of class Account and place it before class Report, you encounter the same problem: Report is referred to from Account. For that purpose, a forward declaration is required. A forward declaration introduces the name of a class to the compiler before the class is defined, so the name can be used in declarations that do not need the complete definition. For example

//file: bank.h
class Account; //forward declaration
class Report
{
public:
  void Output(const Account& account); //fine
};
class Account
{
private:
  Report rep;
public:
  void Show() {rep.Output(*this);}
};

The forward declaration in the beginning of the source file enables class Report to refer to class Account even though its definition has not yet been seen. Note that only references and pointers can refer to a forward-declared class.
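
The following sketch shows what is and is not allowed with a forward-declared class. The count and balance members and their accessors are hypothetical; they were added only so that the two classes have something to exchange.

```cpp
class Account;  // forward declaration: Account is an incomplete type here

class Report
{
public:
  Report() : count(0) {}
  void Output(const Account& account);  // OK: reference to an incomplete type
  int Count() const { return count; }
  // Account copy;  // error: a member object requires the full definition
private:
  int count;  // hypothetical: number of accounts reported so far
};
class Account
{
private:
  Report rep;  // OK: Report is fully defined at this point
  int balance;
public:
  explicit Account(int b) : balance(b) {}
  int Balance() const { return balance; }
  int Show() { rep.Output(*this); return rep.Count(); }
};
// Defined after Account is complete, so its members can be used here.
void Report::Output(const Account& account)
{
  if (account.Balance() >= 0)
    ++count;
}
```

Note that the body of Report::Output() has to appear after the definition of Account, because it calls a member function of Account.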

Local Classes

A class can be declared inside a function or a block. In such cases, it is not visible from anywhere else, and instances thereof can only be created within the scope in which it is declared. This can be useful if you need to hide an ancillary object that is not to be accessible or used anywhere else. For example

void f(const char *text)
{
  class Display  //local helper class; visible only in f()
  {
    const char *ps;
  public:
    Display(const char *t) : ps(t) {}
    ~Display() { cout<<ps; }
  };
  Display ucd(text);  //local object of type Display
}

A local class has no linkage.

Multiple Inheritance

Multiple inheritance was introduced to C++ in 1989. It isn't an exaggeration to say that it has been the most controversial feature ever added to C++. The opponents of multiple inheritance maintain that it adds an unnecessary complexity to the language, that every design model that uses multiple inheritance can be modeled with single inheritance, and that it complicates compiler writing. Of the three arguments, only the third one is true. Multiple inheritance is optional. Designers who feel that they can make do without it are never forced to use it. The added level of complexity that is ascribed to multiple inheritance is not a compelling argument either because the same criticism is applicable to other language features such as templates, operator overloading, exception handling, and so on.

Multiple inheritance enables the designer to create objects that are closer to their real-world reality. A fax modem card is essentially a modem and a fax combined in one. Similarly, a fax_modem class that is publicly derived from both fax and modem represents the concept of a fax/modem better than a single inheritance model does. But the most compelling argument in favor of multiple inheritance is that some designs cannot be realized without it. For example, implementing the Observer pattern in Java is nearly impossible because Java lacks multiple inheritance ("Java vs. C++ -- A Critical Comparison," C++ Report, January 1997). Observer is not the only pattern that relies on multiple inheritance -- Adapter and Bridge also do (ibid.).

Using Multiple Inheritance to Conjoin Features

Derived classes can combine the functionality of several base classes simultaneously, by means of multiple inheritance. Trying to achieve the same effect using single inheritance can be very difficult, to say the least. For example

class Archive; //assumed to be defined elsewhere
class Persistent //abstract base class used by all persistence-supporting objects
{
public:
  virtual void WriteObject(void *pobj, size_t sz) = 0;
  virtual void* ReadObject(Archive & ar) = 0;
};
class Date {/*...*/};
class PersistentDate: public Date, public Persistent
{ /*..*/}; //can be stored and retrieved

Virtual Inheritance

Multiple inheritance can lead to a problem known as the DDD (or dreadful diamond of derivation), as shown in the following case:

class ElectricAppliance
{
private:
  int voltage;
  int Hertz ;
public:
  //...constructor and other useful methods
  int getVoltage () const { return voltage; }
  int getHertz() const {return Hertz; }
};
class Radio : public ElectricAppliance {/*...*/};
class Tape : public ElectricAppliance {/*...*/};
class RadioTape: public Radio, public Tape { /*...*/};
int main()
{
  RadioTape rt;
  //the following statement is a compilation error: ambiguous call.
  //Two copies of getVoltage() exist in rt: one from Radio and one
  //from Tape. Furthermore, which voltage value should be returned?
  int voltage = rt.getVoltage();
  return 0;
}

The problem is obvious: rt is derived simultaneously from two base classes, each of which has its own copy of the methods and data members of ElectricAppliance. Consequently, rt has two copies of ElectricAppliance. This is the DDD. However, giving up multiple inheritance leads to a design compromise. In such cases, where duplication of data and methods from a common base class is undesirable, use virtual inheritance:

class Radio : virtual public ElectricAppliance {/*...*/};
class Tape : virtual public ElectricAppliance {/*...*/};
class RadioTape: public Radio, public Tape
{/*...*/};

Now class RadioTape contains a single instance of ElectricAppliance that is shared by Radio and Tape; therefore, there are no ambiguities and no need to give up the powerful tool of multiple inheritance.

int main()
{
  RadioTape rt;
  int voltage = rt.getVoltage(); //now OK
  return 0;
}
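
One way to convince yourself that only one ElectricAppliance subobject exists is to count constructor calls, as in the following sketch. The static counter constructed, the default voltage of 220, and the function countBaseSubobjects() are hypothetical and serve only as instrumentation.

```cpp
class ElectricAppliance
{
public:
  static int constructed;  // hypothetical counter, for demonstration only
  ElectricAppliance() : voltage(220) { ++constructed; }
  int getVoltage() const { return voltage; }
private:
  int voltage;
};
int ElectricAppliance::constructed = 0;

class Radio : virtual public ElectricAppliance {};
class Tape : virtual public ElectricAppliance {};
class RadioTape : public Radio, public Tape {};

int countBaseSubobjects()
{
  ElectricAppliance::constructed = 0;
  RadioTape rt;
  rt.getVoltage();  // unambiguous: there is one shared ElectricAppliance
  return ElectricAppliance::constructed;
}
```

With the virtual keyword removed from both Radio and Tape, the constructor would run twice and the call to getVoltage() would no longer compile.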

How does C++ ensure that only a single instance of a virtual base class exists, regardless of the number of classes derived from it? This is implementation-dependent. However, all implementations currently use an additional level of indirection to access a virtual base class, usually by means of a pointer. Consider the standard stream classes:

//Note: this is a simplified description of iostream classes
class ostream: virtual public ios { /*..*/ };
class istream: virtual public ios { /*..*/ };
class iostream : public istream, public ostream { /*..*/ };

In other words, each object in the iostream hierarchy has a pointer to the shared instance of the ios subobject. The additional level of indirection has a slight performance overhead. It also implies that the location of virtual subobjects is not known at compile time; therefore, RTTI might be needed to access virtual subobjects in some circumstances (this is discussed further in Chapter 7, "Runtime Type Identification").

When multiple inheritance is used, the memory layout of such an object is implementation-dependent. The compiler can rearrange the order of the inherited subobjects to improve memory alignment. In addition, a virtual base can be moved to a different memory location. Therefore, when you are using multiple inheritance, do not assume anything about the underlying memory layout of an object.

Non-virtual Multiple Inheritance

Virtual inheritance is used to avoid multiple copies of a base class in a multiply-inherited object, as you just saw. However, there are cases in which multiple copies of a base are needed in a derived class. In such cases, virtual inheritance is intentionally avoided. For example, suppose you have a scrollbar class that serves as a base for two other subclasses:

class Scrollbar
{
private:
  int x;
  int y;
public:
  void Scroll(units n);
  //...
};
class HorizontalScrollbar : public Scrollbar {/*..*/};
class VerticalScrollbar : public Scrollbar {/*..*/};

Now imagine a window that has both a vertical scrollbar and a horizontal one. It can be implemented and used in the following way:

class MultiScrollWindow: public VerticalScrollbar, 
                         public HorizontalScrollbar {/*..*/};
MultiScrollWindow msw;
msw.HorizontalScrollbar::Scroll(5);   // scroll left
msw.VerticalScrollbar::Scroll(12);   //...and up

The user can scroll such a window up and down as well as left and right. For this purpose, the window object has to have two distinct Scrollbar subobjects. Therefore, virtual inheritance is intentionally avoided in this case.
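
The presence of two independent Scrollbar subobjects can be verified with a sketch such as the following. It simplifies the classes above: a single pos member replaces x and y, Scroll() takes an int rather than the units type, and Position() is a hypothetical accessor.

```cpp
class Scrollbar
{
private:
  int pos;
public:
  Scrollbar() : pos(0) {}
  void Scroll(int n) { pos += n; }
  int Position() const { return pos; }  // hypothetical accessor
};
class HorizontalScrollbar : public Scrollbar {};
class VerticalScrollbar : public Scrollbar {};

// Nonvirtual inheritance: the window embeds two distinct Scrollbar subobjects.
class MultiScrollWindow : public VerticalScrollbar,
                          public HorizontalScrollbar {};
```

Scrolling through one base leaves the position stored in the other base untouched, which is the behavior a window with two scrollbars needs.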

Choosing Distinct Names for Member Functions

When two or more classes serve as base classes in multiple inheritance, you want to choose a distinct name for each member function in order to avoid name ambiguity. Consider the following concrete example:

class AudioStreamer //real-time sound player
{
public:
  void Play();
  void Stop();
};
class VideoStreamer //real-time video player
{
public:
  void Play();
  void Stop();
};
class AudioVisual: public AudioStreamer, public VideoStreamer {/*...*/};
AudioVisual player;
player.Play(); //error:  AudioStreamer::Play() or VideoStreamer::Play() ?

One way to overcome the ambiguity is specifying the function's fully-qualified name:

player.AudioStreamer::Play(); //fine but tedious

However, a preferable solution is the use of distinct names for member functions in the base classes:

class AudioStreamer
{
public:
  void au_Play();
};
class VideoStreamer
{
public:
  void vd_Play();
};
player.au_Play(); //now distinct

Conclusions

C++ is used today in fields as diverse as embedded systems, database engines, Web engines, financial systems, artificial intelligence, and more. This versatility can be attributed to its flexibility of programming styles, backward compatibility with C, and the fact that it is the most efficient object-oriented programming language in existence.

As a procedural programming language, C++ offers a tighter type-checking than C does. It also provides better memory management, inline functions, default arguments, and reference variables, which make it a "better C".

Object-based programming solves some of the noticeable weaknesses of procedural programming by bundling data types and the set of operations that are applied to them in a single entity. The separation of implementation details from an interface localizes changes in the design, thereby yielding more robust and extensible software. However, it does not support class hierarchies.

Object-oriented programming relies on encapsulation, information hiding, polymorphism, inheritance, and dynamic binding. These features enable you to design and implement class hierarchies. The advantages of object-oriented programming over object-based programming are faster development time, easier maintenance, and simpler extensibility.

C++ supports advanced object-oriented programming features such as multiple inheritance, static and dynamic polymorphism, and a clear-cut distinction between a class and an object. Object-oriented design begins with locating the classes and their interrelations: inheritance, containment, and ownership. The symmetry between constructors and destructors is the basis for useful design idioms such as "resource acquisition is initialization" and smart pointers.

An additional programming paradigm that is supported in C++, generic programming, is not directly related to object-oriented programming. In fact, it can be implemented in procedural languages as well. Nonetheless, the combination of object-oriented programming and generic programming makes C++ a very powerful language indeed, as you will read in Chapter 10.


© Copyright 1999, Macmillan Computer Publishing. All rights reserved.