Chapter 14: Polymorphism

We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Send your email to Frank Brokken.
Please state the document version you're referring to, as found in the title (in this document: 5.2.0a) and please state the paragraph you're referring to.
All mail received is seriously considered, and new (sub)releases of the Annotations will normally reflect your suggestions for improvements. Except for the incidental case I will not otherwise acknowledge the receipt of suggestions for improvements. Please don't misinterpret this for lack of appreciation.

As we have seen in chapter 13, C++ provides the tools to derive classes from base classes, and to use base class pointers to address derived objects. As we've seen, when using a base class pointer to address an object of a derived class, the type of the pointer determines which member function will be used. This means that a Vehicle *vp, pointing to a Truck object, will incorrectly compute the truck's combined weight in a statement like vp->getweight(). The reason for this should now be clear: vp calls Vehicle::getweight() and not Truck::getweight(), even though vp actually points to a Truck.

Fortunately, a remedy is available. In C++ it is possible for a Vehicle *vp to call a function Truck::getweight() when the pointer actually points to a Truck.

The terminology for this feature is polymorphism: it is as though the pointer vp changes its type from a base class pointer to a pointer to the class of the object it actually points to. So, vp might behave like a Truck * when pointing to a Truck, and like an Auto * when pointing to an Auto, etc. (In one of the Star Trek movies, Capt. Kirk was in trouble, as usual. He met an extremely beautiful lady who, however, later on changed into a hideous troll. Kirk was quite surprised, but the lady told him: ``Didn't you know I am a polymorph?'')

Polymorphism is realized by a feature called late binding. This refers to the fact that the decision which function to call (a base class function or a function of a derived class) cannot be made compile-time, but is postponed until the program is actually executed: the actual member function to be used is selected run-time.

14.1: Virtual functions

The default behavior of the activation of a member function via a pointer or reference is that the type of the pointer (or reference) determines the function that is called. E.g., a Vehicle * will activate Vehicle's member functions, even when pointing to an object of a derived class. This is referred to as early or static binding, since the type of function is known compile-time. The late or dynamic binding is achieved in C++ using virtual member functions.

A member function becomes a virtual member function when its declaration starts with the keyword virtual. Once a function is declared virtual in a base class, it remains a virtual member function in all derived classes; even when the keyword virtual is not repeated in a derived class.

As far as the vehicle classification system is concerned (see section 13.1) the two member functions getweight() and setweight() might well be declared virtual. The relevant sections of the class definitions of the class Vehicle and Truck are shown below. Also, we show the implementations of the member functions getweight() of the two classes:

    class Vehicle
    {
        public:
            virtual int getweight() const;
            virtual void setweight(int wt);
    };

    class Truck: public Auto
    {
        public:
            void setweight(int engine_wt, int trailer_wt);
            int getweight() const;
    };

    int Vehicle::getweight() const
    {
        return (weight);
    }

    int Truck::getweight() const
    {
        return (Auto::getweight() + trailer_wt);
    }

Note that the keyword virtual only needs to appear in the definition of the Vehicle base class. There is no need (but there is also no penalty) to repeat it in derived classes: once virtual, always virtual. On the other hand, a function may be declared virtual anywhere in a class hierarchy: the compiler will be perfectly happy if getweight() is declared virtual in Auto, rather than in Vehicle. However, the specific characteristics of virtual member functions would then, for the member function getweight(), only appear with Auto (and its derived classes) pointers or references. With a Vehicle pointer, static binding will remain to be used. The effect of late binding is illustrated below:

    Vehicle
        v(1200);            // vehicle with weight 1200
    Truck
        t(6000, 115,        // truck with cabin weight 6000, speed 115,
          "Scania",         // make Scania, trailer weight 15000
          15000);
    Vehicle
        *vp;                // generic vehicle pointer

    int main()
    {
        vp = &v;                            // see (1) below 
        cout << vp->getweight() << endl;

        vp = &t;                            // see (2) below 
        cout << vp->getweight() << endl;

        cout << vp->getspeed() << endl;     // see (3) below 
    }

Since the function getweight() is defined virtual, late binding is used:

at (1), Vehicle::getweight() is called.
at (2) Truck::getweight() is called.
at (3) a compilation error is generated: getspeed() is not a member of Vehicle, and hence not callable via a Vehicle *.

The example illustrates that when using a pointer to a class, only the functions which are members of that class can be called. These functions may be virtual, but this only influences the type of binding (early vs. late).
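The difference between early and late binding can be made visible in a small, self-contained sketch. The classes below are simplified stand-ins for the classes of chapter 13: Truck is derived directly from Vehicle, and the member plainweight() as well as the two helper functions are assumptions introduced only for this illustration.

```cpp
// Simplified stand-ins: Truck is derived directly from Vehicle here, and
// plainweight(), weightVia() and plainWeightVia() are hypothetical names.
class Vehicle
{
    int d_weight;

    public:
        Vehicle(int wt)
        :
            d_weight(wt)
        {}
        virtual ~Vehicle()
        {}
        virtual int getweight() const       // late binding
        {
            return d_weight;
        }
        int plainweight() const             // early binding
        {
            return d_weight;
        }
};

class Truck: public Vehicle
{
    int d_trailer_weight;

    public:
        Truck(int wt, int trailer_wt)
        :
            Vehicle(wt),
            d_trailer_weight(trailer_wt)
        {}
        int getweight() const               // overrides Vehicle::getweight()
        {
            return Vehicle::getweight() + d_trailer_weight;
        }
        int plainweight() const             // merely hides Vehicle::plainweight()
        {
            return Vehicle::plainweight() + d_trailer_weight;
        }
};

// Both functions only know about a Vehicle pointer.
int weightVia(Vehicle const *vp)
{
    return vp->getweight();         // selected run-time
}

int plainWeightVia(Vehicle const *vp)
{
    return vp->plainweight();       // always Vehicle::plainweight()
}
```

When a Truck's address is passed to both functions, only weightVia() includes the trailer's weight: the non-virtual plainweight() is selected from the type of the pointer.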

A virtual member function cannot be a static member function: a virtual member function is still an ordinary member function in that it has a this pointer. As static member functions have no this pointer, they cannot be declared virtual.

14.2: Virtual destructors

When the operator delete releases memory which is occupied by a dynamically allocated object, or when an object goes out of scope, the appropriate destructor is called to ensure that memory allocated by the object is also deleted. Now consider the following code fragment (cf. section 13.1):

    Vehicle
        *vp = new Land(1000, 120);

    delete vp;          // object destroyed

In this example an object of a derived class (Land) is destroyed using a base class pointer (Vehicle *). For a `standard' class definition this will mean that the destructor of Vehicle is called, instead of the destructor of the Land object. This not only results in a memory leak when memory is allocated in Land, but it will also prevent any other task, normally performed by the derived class' destructor from being completed (or, better: started). A Bad Thing.

In C++ this problem is solved using a virtual destructor. By applying the keyword virtual to the declaration of a destructor the appropriate derived class destructor is activated when the argument of the delete operator is a base class pointer. In the following partial class definition the declaration of such a virtual destructor is shown:

    class Vehicle
    {
        public:
            virtual ~Vehicle();
            virtual int getweight() const;
    };

By declaring a virtual destructor, the above delete operation (delete vp) will correctly call Land's destructor, rather than Vehicle's destructor. Land's destructor, in turn, calls the destructors of its base classes once its own tasks have been completed.

From this discussion we are now able to formulate the following situations in which a destructor should be defined:

A destructor should be defined when memory is allocated and managed by objects of the class.
A virtual destructor should be defined if the class contains at least one virtual member function.

In the second case, the destructor will have no special tasks to perform. The virtual destructor will therefore often be defined empty. For example, the definition of Vehicle::~Vehicle() may be as simple as:

    Vehicle::~Vehicle()
    {}

Often this will be part of the class interface as an inline destructor.
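The order in which destructors are activated can be observed in the following minimal sketch. The global string g_log and the function destroyViaBase() are hypothetical helpers, and Land is derived without the constructor arguments used in chapter 13.

```cpp
#include <string>

std::string g_log;      // hypothetical log of destructor calls

class Vehicle
{
    public:
        virtual ~Vehicle()
        {
            g_log += "~Vehicle ";
        }
};

class Land: public Vehicle
{
    int *d_data;        // memory owned by the Land part of the object

    public:
        Land()
        :
            d_data(new int[10])
        {}
        ~Land()         // virtual, since ~Vehicle() is virtual
        {
            delete[] d_data;
            g_log += "~Land ";
        }
};

// Destroys a Land object through a base class pointer and returns
// the destructors that were called, in the order of their completion.
std::string destroyViaBase()
{
    g_log.clear();

    Vehicle
        *vp = new Land;

    delete vp;          // ~Land() first, then ~Vehicle()
    return g_log;
}
```

Because ~Vehicle() is virtual, the memory allocated by the Land part is released even though delete only sees a Vehicle *.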

14.3: Pure virtual functions

Until now the base class Vehicle contained its own, concrete, implementations of the virtual functions getweight() and setweight(). In C++ it is however also possible only to mention virtual member functions in a base class, without actually defining them. The functions are concretely implemented in a derived class. This approach defines a protocol, which has to be followed in the derived classes. This implies that derived classes must take care of the actual definition: the C++ compiler will not allow the definition of an object of a class in which one or more member functions are left undefined. The base class thus enforces a protocol by declaring a function by its name, return value and arguments. The derived classes must take care of the actual implementation. The base class itself defines therefore only a model or mold, to be used when other classes are derived. Such base classes are also called abstract classes.

The functions which are only declared in the base class are called pure virtual functions. A function is made pure virtual by preceding its declaration with the keyword virtual and by postfixing it with = 0. An example of a pure virtual function occurs in the following listing, where the definition of a class Object requires the implementation of the conversion operator operator string():

    #include <string>

    class Object
    {
        public:
            virtual operator std::string() const = 0;
    };

Now, all classes derived from Object must implement the operator string() member function, or their objects cannot be constructed. This is neat: all objects derived from Object can now always be considered string objects, so they can, e.g., be passed to functions expecting a string or, following an explicit conversion to string, be inserted into ostream objects.
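As a minimal sketch of this protocol, consider the following classes. The class Person, its data member and the function describe() are hypothetical; a virtual destructor was added to Object in line with the previous section.

```cpp
#include <string>

class Object
{
    public:
        virtual ~Object()
        {}
        virtual operator std::string() const = 0;
};

// Hypothetical derived class implementing the protocol.
class Person: public Object
{
    std::string d_name;

    public:
        Person(std::string const &name)
        :
            d_name(name)
        {}
        operator std::string() const
        {
            return "Person: " + d_name;
        }
};

// Accepts any object of a class derived from Object.
std::string describe(Object const &obj)
{
    return obj;     // converted by the derived class' operator
}
```

A definition like `Object obj;` is rejected by the compiler, as Object is an abstract class; a Person, on the other hand, can be constructed and passed to describe().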

Should the virtual destructor of a base class be a pure virtual function? The answer to this question is no: a class such as Vehicle should not require derived classes to define a destructor. In contrast, Object::operator string() can be a pure virtual function: in this case the base class defines a protocol which must be adhered to.

Realize what would happen if we would define the destructor of a base class as a pure virtual destructor: according to the compiler, the derived class object can be constructed: as its destructor is defined, the derived class is not a pure abstract class. However, inside the derived class destructor, the destructor of its base class is implicitly called. This destructor was never defined, and the linker will loudly complain about an undefined reference to, e.g., Virtual::~Virtual().
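The linker error only occurs because the pure virtual destructor is left undefined. C++ allows a pure virtual function to be given a definition outside of the class interface, so a class can be made abstract through its destructor as long as that destructor is defined anyway. The following sketch illustrates this; the class names Abstract and Concrete and the function useConcrete() are assumptions made for this example only.

```cpp
class Abstract              // hypothetical class name
{
    public:
        virtual ~Abstract() = 0;    // pure: no Abstract objects can be defined
};

Abstract::~Abstract()       // nevertheless defined: it is called implicitly
{}                          // by the destructors of derived classes

class Concrete: public Abstract
{
    public:
        int value() const
        {
            return 42;
        }
};

int useConcrete()
{
    Abstract
        *ap = new Concrete;         // accepted: Concrete is not abstract

    int ret = static_cast<Concrete *>(ap)->value();

    delete ap;                      // ~Concrete(), then ~Abstract()
    return ret;
}
```

Here Concrete's implicitly provided destructor overrides the pure virtual one, and the definition of Abstract::~Abstract() satisfies the linker.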

Often, but not necessarily always, pure virtual member functions are const member functions. This allows the construction of constant derived class objects. In other situations this might not be necessary (or realistic), and non-constant member functions might be required. The general rule for const member functions applies also to pure virtual functions: if the member function will alter the object's data members, it cannot be a const member function. Often abstract base classes have no data members. However, the prototype of the pure virtual member function must be used again in derived classes. If the implementation of a pure virtual function in a derived class alters the data of the derived class object, then that function cannot be declared as a const member function. Therefore, the designer of an abstract base class should carefully consider whether a pure virtual member function should be a const member function or not.

14.4: Virtual functions in multiple inheritance

As mentioned in chapter 13 it is possible to derive a class from several base classes. Such a derived class inherits the properties of all its base classes. Of course, the base classes themselves may be derived from classes yet higher in the hierarchy.

A slight difficulty in multiple inheritance may arise when more than one `path' leads from the derived class to the base class. This is illustrated in the code example below: a class Derived is doubly derived from a class Base:

    class Base
    {
        int d_field;
        public:
            void setfield(int val)
                { d_field = val; }
            int getfield() const
                { return d_field; }
    };

    class Derived: public Base, public Base
    {
    };

Due to the double derivation, the functionality of Base now occurs twice in Derived. This leads to ambiguity: when the function setfield() is called for a Derived object, which function should that be, since there are two? As a class may not be specified more than once as a direct base class, the C++ compiler will refuse to generate code and will (correctly) identify an error.

The above code clearly duplicates its base class in the derivation. Such a duplication can here easily be avoided. But duplication of a base class can also occur through nested inheritance, where an object is derived from, say, an Auto and from an Air (see the vehicle classification system, section 13.1). Such a class would be needed to represent, e.g., a flying car (such as the one in James Bond vs. the Man with the Golden Gun...). An AirAuto would ultimately contain two Vehicles, and hence two weight fields, two setweight() functions and two getweight() functions.

14.4.1: Ambiguity in multiple inheritance

Let's investigate closer why an AirAuto introduces ambiguity, when derived from Auto and Air.

An AirAuto is an Auto, hence a Land, and hence a Vehicle.
However, an AirAuto is also an Air, and hence a Vehicle.

The duplication of Vehicle data is further illustrated in figure 13.

figure 13: Duplication of a base class in multiple derivation.

The internal organization of an AirAuto is shown in figure 14.

figure 14: Internal organization of an AirAuto object.

The C++ compiler will detect the ambiguity in an AirAuto object, and will therefore fail to compile a statement like:

    AirAuto
        cool;

    cout << cool.getweight() << endl;

The question of which member function getweight() should be called, cannot be answered by the compiler. The programmer has two possibilities to resolve the ambiguity explicitly:

First, the function call where the ambiguity occurs can be modified. This is done with the scope resolution operator:
```
    // let's hope that the weight is kept in the Auto
    // part of the object..
    cout << cool.Auto::getweight() << endl;
```
Note the position of the scope operator and the class name: before the name of the member function itself.

Second, a dedicated function getweight() could be created for the class AirAuto:

    int AirAuto::getweight() const
    {
        return(Auto::getweight());
    }

The second possibility from the two above is preferable, since it relieves the programmer who uses the class AirAuto of special precautions.
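Both explicit solutions are combined in the following sketch. The classes are simplified stand-ins: Auto and Air are derived directly from Vehicle, and the constructors as well as the functions autoPartWeight() and airPartWeight() are assumptions made for this illustration.

```cpp
// Simplified stand-ins for the vehicle classification system.
class Vehicle
{
    int d_weight;

    public:
        Vehicle(int wt)
        :
            d_weight(wt)
        {}
        int getweight() const
        {
            return d_weight;
        }
};

class Auto: public Vehicle
{
    public:
        Auto(int wt)
        :
            Vehicle(wt)
        {}
};

class Air: public Vehicle
{
    public:
        Air(int wt)
        :
            Vehicle(wt)
        {}
};

class AirAuto: public Auto, public Air      // contains two Vehicles
{
    public:
        AirAuto(int auto_wt, int air_wt)
        :
            Auto(auto_wt),
            Air(air_wt)
        {}
        int getweight() const               // second solution
        {
            return Auto::getweight();
        }
};

int autoPartWeight(AirAuto const &cool)     // first solution
{
    return cool.Auto::getweight();
}

int airPartWeight(AirAuto const &cool)
{
    return cool.Air::getweight();
}
```

The two helper functions may return different values for the same object, which shows that an AirAuto constructed this way really contains two separate weight fields.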

However, apart from these explicit solutions, there is a more elegant one, which will be introduced in the next section.

14.4.2: Virtual base classes

As illustrated in figure 14, more than one object of the class Vehicle is present in an AirAuto. The result is not only an ambiguity in the functions which access the weight data, but also the presence of two weight fields. This is somewhat redundant, since we can assume that an AirAuto has just one weight.

We can achieve the situation that only one Vehicle will be contained in an AirAuto. This is done by ensuring that the base class which is multiply present in a derived class, is defined as a virtual base class. For the class AirAuto this means that the derivation of Land and Air is changed:

    class Land: virtual public Vehicle
    {
        // etc
    };

    class Air: virtual public Vehicle
    {
        // etc
    };
    
    class AirAuto: public Land, public Air
    {
    };

The virtual derivation ensures that via the Land route, a Vehicle is only added to a class when a virtual base class was not yet present. The same holds true for the Air route. This means that we can no longer say via which route a Vehicle becomes a part of an AirAuto; we can only say that there is an embedded Vehicle object. The internal organization of an AirAuto after virtual derivation is shown in figure 15.

figure 15: Internal organization of an AirAuto object when the base classes are virtual.
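That only one Vehicle remains can be checked with a small sketch. The Vehicle class shown here is reduced to a single weight field, and the functions sharedWeight() and unambiguous() are hypothetical helpers.

```cpp
class Vehicle
{
    int d_weight;

    public:
        Vehicle()
        :
            d_weight(0)
        {}
        void setweight(int wt)
        {
            d_weight = wt;
        }
        int getweight() const
        {
            return d_weight;
        }
};

class Land: virtual public Vehicle
{};

class Air: virtual public Vehicle
{};

class AirAuto: public Land, public Air
{};

// The weight is set via the Land route and read via the Air route.
int sharedWeight()
{
    AirAuto
        cool;

    cool.Land::setweight(1500);
    return cool.Air::getweight();
}

// Without scope resolution the calls are no longer ambiguous.
bool unambiguous()
{
    AirAuto
        cool;

    cool.setweight(900);
    return cool.getweight() == 900;
}
```

As both routes lead to the same embedded Vehicle, the value stored via Land is the value retrieved via Air.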

There are several points worth noting when using virtual derivation:

When the base classes of a class using multiple derivation are themselves virtually derived from a base class (as shown above), the base class constructor which is normally called when the derived class constructor is called is no longer called: its base class initializer is ignored. Instead, the base class constructor will be called independently from the derived class constructors. Assume we have two classes, Derived1 and Derived2, both (possibly virtually) derived from Base. We will address the question which constructors will be called when a class Final: public Derived1, public Derived2 is defined. To distinguish the several constructors that are involved, we will use Base1() to indicate the Base class constructor that is called as base class initializer for Derived1 (and analogously: Base2() belonging to Derived2), while Base() indicates the default constructor of the class Base. Apart from the Base class constructor, we use Derived1() and Derived2() to indicate the base class initializers for the class Final. We now distinguish the following situations for constructors of the class Final: public Derived1, public Derived2:
- classes: Derived1: public Base, Derived2: public Base
  This is the normal, non virtual multiple derivation. There are two Base classes in the Final object, and the following constructors will be called (in the mentioned order):
  - Base1(), Derived1(), Base2(), Derived2()
- classes: Derived1: public Base, Derived2: virtual public Base
  Only Derived2 uses virtual derivation. For the Derived2 part the base class initializer will be omitted, and the default Base class constructor will be called. Furthermore, this `detached' base class constructor will be called first:
  - Base(), Base1(), Derived1(), Derived2()
  Note that Base() is called first, not Base1(). Also note that, as only one derived class uses virtual derivation, there are still two Base class objects in the eventual Final class. Merging of base classes only occurs with multiple virtual base classes.
- classes: Derived1: virtual public Base, Derived2: public Base
  Only Derived1 uses virtual derivation. For the Derived1 part the base class initializer will now be omitted, and the default Base class constructor will be called instead. Note the difference with the first case: Base1() is replaced by Base(). Should Derived1 happen to use the default Base constructor, no difference would be noted here with the first case:
  - Base(), Derived1(), Base2(), Derived2()
- classes: Derived1: virtual public Base, Derived2: virtual public Base
  Here both derived classes use virtual derivation, and so only one Base class object will be present in the Final class. Note that now only one Base class constructor is called: for the detached (merged) Base class object:
  - Base(), Derived1(), Derived2()
Virtual derivation is, in contrast to virtual functions, a pure compile-time issue: whether a derivation is virtual or not defines how the compiler builds a class definition from other classes.
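The last of the four situations can be traced with the following sketch, in which every constructor appends its name to a global string. The string g_order, the Base(int) constructor and the two orderFor...() functions are hypothetical and serve only to make the order of the calls visible.

```cpp
#include <string>

std::string g_order;            // hypothetical log of constructor calls

class Base
{
    public:
        Base()
        {
            g_order += "Base() ";
        }
        Base(int)
        {
            g_order += "Base(int) ";
        }
};

class Derived1: virtual public Base
{
    public:
        Derived1()
        :
            Base(1)             // ignored when Derived1 is itself a base class
        {
            g_order += "Derived1() ";
        }
};

class Derived2: virtual public Base
{
    public:
        Derived2()
        :
            Base(2)             // ignored as well in that case
        {
            g_order += "Derived2() ";
        }
};

class Final: public Derived1, public Derived2
{
    public:
        Final()                 // no Base initializer: Base() is called
        {
            g_order += "Final()";
        }
};

std::string orderForFinal()
{
    g_order.clear();
    Final
        f;
    return g_order;
}

std::string orderForDerived1()
{
    g_order.clear();
    Derived1
        d;
    return g_order;
}
```

For a Final object the log shows one call of the default Base constructor, followed by Derived1() and Derived2(). Only when a Derived1 object is defined on its own is its Base(1) initializer actually used.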

Summarizing, using virtual derivation avoids ambiguity in the calling of member functions of a base class. Furthermore, duplication of data members is avoided.

14.4.3: When virtual derivation is not appropriate

In contrast to the previous definition of a class such as AirAuto, situations may arise where the double presence of the members of a base class is appropriate. To illustrate this, consider the definition of a Truck from section 13.4:

    class Truck: public Auto
    {
        int d_trailer_weight;

        public:
            Truck();
            Truck(int engine_wt, int sp, char const *nm,
                   int trailer_wt);

            void setweight(int engine_wt, int trailer_wt);
            int getweight() const;
    };

    Truck::Truck(int engine_wt, int sp, char const *nm,
                  int trailer_wt)
    : 
        Auto(engine_wt, sp, nm)
    {
        d_trailer_weight = trailer_wt;
    }

    int Truck::getweight() const
    {
        return
        (                           // sum of:    
            Auto::getweight() +     //   engine part plus    
            d_trailer_weight        //   the trailer    
        );    
    }

This definition shows how a Truck object is constructed to contain two weight fields: one via its derivation from Auto and one via its own int d_trailer_weight data member. Such a definition is of course valid, but it could also be rewritten. We could derive a Truck from an Auto and from a Vehicle, thereby explicitly requesting the double presence of a Vehicle; one for the weight of the engine and cabin, and one for the weight of the trailer. A small point of interest here is that a derivation like

    class Truck: public Auto, public Vehicle

is of no use: a Vehicle is already part of an Auto, so the additional, direct Vehicle base class cannot be reached unambiguously from a Truck, and compilers will reject the derivation or at least warn about it. An intermediate class solves the problem: we derive a class TrailerVeh from Vehicle, and Truck from Auto and from TrailerVeh. All ambiguities concerning the member functions are then solved for the class Truck:

    class TrailerVeh: public Vehicle
    {
        public:
            TrailerVeh(int wt);
    };

    TrailerVeh::TrailerVeh(int wt)
    : 
        Vehicle(wt)
    {
    }
    
    class Truck: public Auto, public TrailerVeh
    {
        public:
            Truck();
            Truck(int engine_wt, int sp, char const *nm,
                   int trailer_wt);

            void setweight(int engine_wt, int trailer_wt);
            int getweight() const;
    };

    Truck::Truck(int engine_wt, int sp, char const *nm,
                  int trailer_wt)
    : 
        Auto(engine_wt, sp, nm), 
        TrailerVeh(trailer_wt)
    {
    }

    int Truck::getweight() const
    {
        return
            (                               // sum of:
                Auto::getweight() +        //   engine part plus
                TrailerVeh::getweight()    //   the trailer
            );
    }

14.5: Run-Time Type identification

C++ offers two ways to retrieve the type of objects and expressions while the program is running. The possibilities of C++'s run-time type identification are somewhat limited compared to languages like Java. Normally, C++ uses static type checking and static type identification. Static type checking and determination is possibly safer and certainly more efficient than run-time type identification, and should therefore be used wherever possible. Nonetheless, C++ offers run-time type identification by providing the dynamic_cast and typeid operators.

The dynamic_cast<>() operator can be used to convert a base class pointer or reference to a derived class pointer or reference.
The typeid operator returns the actual type of an expression.

These operators operate on class type objects, containing at least one virtual member function.

14.5.1: The dynamic_cast operator

The dynamic_cast<>() operator is used to convert a base class pointer or reference to, respectively, a derived class pointer or reference.

A dynamic cast is performed run-time. A prerequisite for the use of the dynamic cast operator is the existence of at least one virtual member function in the base class.

In the following example a pointer to the class Derived is obtained from the Base class pointer bp:

    #include <iostream>
    using namespace std;

    class Base
    {
        public:
            virtual ~Base()
            {}
    };

    class Derived: public Base
    {
        public:
            char const *toString() const
            {
                return "Derived object";
            }
    };

    int main()
    {
        Base
            *bp;
        Derived
            *dp,
            d;

        bp = &d;

        dp = dynamic_cast<Derived *>(bp);   // 0 if *bp is not a Derived

        if (dp)
            cout << dp->toString() << endl;
        else
            cout << "dynamic cast conversion failed\n";
    }

Note the test: in the if condition the success of the dynamic cast is checked. This must be done run-time, as the compiler can't do this all by itself. If a base class pointer is provided the dynamic cast operator returns 0 on failure, and a pointer to the requested derived class on success. Consequently, if there are multiple derived classes, a series of checks could be performed to find the actual derived class to which the pointer points (in the next example each derived class merely offers a toString() member):

    #include <iostream>
    using namespace std;

    class Base
    {
        public:
            virtual ~Base()
            {}
    };

    class Derived1: public Base
    {
        public:
            char const *toString() const
            {
                return "Derived1 object";
            }
    };

    class Derived2: public Base
    {
        public:
            char const *toString() const
            {
                return "Derived2 object";
            }
    };

    int main()
    {
        Base
            *bp;
        Derived1
            *d1,
            d;
        Derived2
            *d2;

        bp = &d;

        if ((d1 = dynamic_cast<Derived1 *>(bp)))
            cout << d1->toString() << endl;
        else if ((d2 = dynamic_cast<Derived2 *>(bp)))
            cout << d2->toString() << endl;
    }

Alternatively, a reference to a base class object may be available. In this case the dynamic_cast<>() operator will throw an exception if it fails. For example:

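    #include <iostream>
    #include <typeinfo>
    using namespace std;

    class Base
    {
        public:
            virtual ~Base() 
            {}
            virtual char const *toString() const
            {
                return "Base object";
            }
    };

    class Derived1: public Base
    {
        public:
            char const *toString() const
            {
                return "Derived1 object";
            }
    };

    class Derived2: public Base
    {
        public:
            char const *toString() const
            {
                return "Derived2 object";
            }
    };

    void process(Base &b)
    {
        try
        {
            cout << dynamic_cast<Derived1 &>(b).toString() << endl;
            return;
        }
        catch (std::bad_cast const &)       // b is not a Derived1
        {}

        try
        {
            cout << dynamic_cast<Derived2 &>(b).toString() << endl;
            return;
        }
        catch (std::bad_cast const &)       // b is not a Derived2 either
        {}
    }
            
    int main()
    {
        Derived1
            d;

        process(d);
    }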

In this example the type std::bad_cast is introduced. A std::bad_cast object is thrown as an exception if the dynamic cast of a reference to a base class object fails. Apparently bad_cast is the name of a type. In section EMPTYENUM the construction of such a type is discussed.

The dynamic cast operator is a handy tool when an existing base class cannot or should not be modified (e.g., when the sources are not available), and a derived class may be modified instead. Code receiving a base class pointer or reference may then perform a dynamic cast to the derived class to be able to use the derived class' functionality.

Casts from a base class reference or pointer to a derived class reference or pointer are called downcasts.
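The two ways in which a failing downcast is reported can be compared in the following sketch. The functions isDerived1() and refIsDerived1() are hypothetical helpers, and the classes are empty apart from the virtual destructor that is required for dynamic casts.

```cpp
#include <typeinfo>

class Base
{
    public:
        virtual ~Base()
        {}
};

class Derived1: public Base
{};

class Derived2: public Base
{};

// Pointer version: a failing downcast results in a 0-pointer.
bool isDerived1(Base *bp)
{
    return dynamic_cast<Derived1 *>(bp) != 0;
}

// Reference version: a failing downcast throws std::bad_cast.
bool refIsDerived1(Base &b)
{
    try
    {
        Derived1 &d1 = dynamic_cast<Derived1 &>(b);
        (void)d1;               // only the success of the cast matters here
        return true;
    }
    catch (std::bad_cast const &)
    {
        return false;
    }
}
```

Both functions return true for a Derived1 object and false for a Derived2 object; they differ only in how the failure is detected.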

14.5.2: The typeid operator

As with the dynamic_cast<>() operator, the typeid operator is usually applied to base class objects that are actually derived class objects. Similarly, the base class should contain one or more virtual functions.

In order to use the typeid operator, source files must

    #include <typeinfo>

Actually, the typeid operator returns an object of type type_info, which may, e.g., be compared to other type_info objects.

The class type_info may be implemented differently by different implementations, but at the very least it has the following interface:

    class type_info
    {
        public:
            virtual ~type_info();
            bool operator==(const type_info &other) const;
            bool operator!=(const type_info &other) const;
            char const *name() const;
        private:
            type_info(type_info const &other);
            type_info &operator=(type_info const &other);
    };

Note that this class has a private copy constructor and overloaded assignment operator. This prevents the normal construction or assignment of a type_info object. Type_info objects are constructed and returned by the typeid operator. Implementations, however, may choose to extend or elaborate the type_info class and provide, e.g., lists of functions that can be called with a certain class.

If the typeid operator is given a base class reference (where the base class contains at least one virtual function), it will indicate that the type of its operand is the derived class. For example:

    class Base;     // contains >= 1 virtual functions
    class Derived: public Base;

    Derived
        d;
    Base
        &br = d;

    cout << typeid(br).name() << endl;

In this example the typeid operator is given a base class reference. It will print a name denoting Derived (e.g., the text ``Derived''; the exact text depends on the implementation), being the class name of the class br actually refers to. If Base did not contain virtual functions, a name denoting Base would have been printed.
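Since the text returned by name() is not portable, the effect of the virtual function is more reliably shown by comparing type_info objects. The following sketch does so; all four class names and the two functions are assumptions introduced for this illustration.

```cpp
#include <typeinfo>

class PolyBase                  // contains a virtual member
{
    public:
        virtual ~PolyBase()
        {}
};

class PolyDerived: public PolyBase
{};

class PlainBase                 // no virtual members at all
{};

class PlainDerived: public PlainBase
{};

// With a virtual member the actual (dynamic) type is reported.
bool dynamicTypeSeen()
{
    PolyDerived
        d;
    PolyBase
        &br = d;

    return typeid(br) == typeid(PolyDerived);
}

// Without virtual members the type of the reference itself is reported.
bool staticTypeSeen()
{
    PlainDerived
        d;
    PlainBase
        &br = d;

    return typeid(br) == typeid(PlainBase);
}
```

Both functions return true: in the first case typeid inspects the object br refers to, in the second case it merely uses the declared type of br.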

The typeid operator can be used to determine the name of the actual type of expressions, not just of class type objects. For example:

    cout << typeid(12).name() << endl;     // prints:  int
    cout << typeid(12.23).name() << endl;  // prints:  double

Note, however, that the above example is suggestive at most of the type that is printed. It may be int and double, but this is not necessarily the case. If portability is required, make sure no tests against static, built-in strings are required. Check out what your compiler produces in case of doubt.

In situations where the typeid operator is applied to determine the type of a derived class, it is important to realize that a base class reference is used as the argument of the typeid operator. Consider the following example:

    class Base;     // contains at least one virtual function
    class Derived: public Base;

    Base
        *bp = new Derived;      // base class pointer to derived object

    if (typeid(bp) == typeid(Derived *))    // 1: false
        ...
    if (typeid(bp) == typeid(Base *))       // 2: true
        ...
    if (typeid(bp) == typeid(Derived))      // 3: false
        ...
    if (typeid(bp) == typeid(Base))         // 4: false
        ...

Here, (1) returns false as a Base * is not a Derived *. (2) returns true, as the two pointer types are the same, (3) and (4) return false as pointers to objects are not the objects themselves.

On the other hand, if *bp is used in the above expressions, then (1) and (2) return false as an object (or reference to an object) is not a pointer to an object, whereas with

    if (typeid(*bp) == typeid(Derived))      // 3: true
        ...
    if (typeid(*bp) == typeid(Base))         // 4: false
        ...

we see that (3) now returns true: *bp actually refers to a Derived class object, and typeid(*bp) will return typeid(Derived).

A similar result is obtained if a base class reference is used:

    Base
        &br = *bp;
    
    if (typeid(br) == typeid(Derived))      // 3: true
        ...
    if (typeid(br) == typeid(Base))         // 4: false
        ...

14.6: Deriving classes from `streambuf'

The class streambuf (see section 5.7 and figure 4) has many (protected) virtual member functions (see section 5.7.1) that are used by the stream classes using streambuf objects. By deriving a class from the class streambuf these member functions may be overridden in the derived classes, thus implementing a specialization of the class streambuf for which the standard istream and ostream objects can be used.

Basically, a streambuf interfaces to some device. The normal behavior of the stream-class objects remains unaltered. So, a string extraction from a streambuf object will still return a consecutive sequence of non white space delimited characters. If the derived class is used for input operations, the following member functions are serious candidates to be overridden. Examples in which some of these functions are overridden will be given later in this section:

int streambuf::pbackfail(int c):
This member is called when
- gptr() == 0: no buffering used,
- gptr() == eback(): no more room to push back,
- *gptr() != c: a different character than the next character to be read must be pushed back.
If c == EOF then the input device must be backed up one character, otherwise c must be prepended to the characters to be read. The function returns EOF on failure. On success any other value (e.g., 0) can be returned. The function is called when other attempts to push back a character fail.
streamsize streambuf::showmanyc():
This member must return a guaranteed lower bound on the number of characters that can be read from the device before uflow() or underflow() returns EOF. By default 0 is returned (meaning at least 0 characters will be returned before the latter two functions will return EOF). When a positive value is returned then the next call to the u(nder)flow() member will not return EOF.
int streambuf::uflow():
By default, this function calls underflow(). If underflow() fails, EOF is returned. Otherwise, the next available character is returned and the input position is advanced by one character (as if by gbump(1)), so that the returned character becomes part of the backup sequence. This is different from underflow(), which also returns the next available character, but does not alter the input position.
int streambuf::underflow():
This member is called when
- there is no input buffer (eback() == 0)
- gptr() >= egptr(): there are no more pending input characters.
It returns the next available input character, which is the character at gptr(), or the first available character from the input device.
Since this member is eventually used by other member functions for reading characters from a device, at the very least this member function must be overridden for new classes derived from streambuf.
streamsize streambuf::xsgetn(char *buffer, streamsize n):
This member function should act as if the return values of n calls of sbumpc() are assigned to consecutive locations of buffer. If EOF is returned then reading stops. The actual number of characters read is returned. Overridden versions could optimize the reading process by, e.g., directly accessing the input buffer.
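As a minimal sketch of the statement that underflow() is the one member that must be overridden for input, consider the following class. The class OneCharBuf and the function firstWord() are hypothetical; a std::string stands in for the device, and a one-character input buffer is assumed:

```cpp
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical example class: hands out the characters of a string,
// refilling a one-character input buffer at each underflow() call.
class OneCharBuf: public std::streambuf
{
    std::string d_data;                 // stands in for the device
    std::string::size_type d_next;      // index of the next device character
    char d_ch;                          // the one-character input buffer

    public:
        explicit OneCharBuf(std::string const &data)
        :
            d_data(data),
            d_next(0)
        {}

    protected:
        int underflow()
        {
            if (gptr() < egptr())               // a character is still pending
                return traits_type::to_int_type(*gptr());

            if (d_next == d_data.size())        // the device is exhausted
                return traits_type::eof();

            d_ch = d_data[d_next++];            // fetch one device character
            setg(&d_ch, &d_ch, &d_ch + 1);      // the buffer now holds it
            return traits_type::to_int_type(d_ch);
        }
};

// Extracts the first white-space delimited word through an istream.
std::string firstWord(std::string const &text)
{
    OneCharBuf buf(text);
    std::istream in(&buf);
    std::string word;
    in >> word;
    return word;
}
```

Note that the default uflow() is sufficient here: it calls underflow() and then advances the input position, so the next request reaches the end of the one-character buffer and triggers another underflow() call.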

When the derived class is used for output operations, the next member functions should be considered:

int streambuf::overflow(int c):
This member is called to write characters from the pending sequence to the output device. Unless c is EOF, it is, at least logically, appended to the pending sequence. So, if the pending sequence consists of the characters 'h', 'e', 'l' and 'l', and c == 'o', then eventually `hello' will be written to the output device.
Since this member is eventually used by other member functions for writing characters to a device, at the very least this member function must be overridden for new classes derived from streambuf.
streamsize streambuf::xsputn(char const *buffer, streamsize n):
This member function should act as if n consecutive locations of buffer are passed to sputc(). If EOF is returned by this latter member, then writing stops. The actual number of characters written is returned. Overridden versions could optimize the writing process by, e.g., directly accessing the output buffer.

For derived classes using buffers and supporting seek operations, consider these member functions:

streambuf *streambuf::setbuf(char *buffer, streamsize n):
This member function is called by the pubsetbuf() member function.
pos_type streambuf::seekoff(off_type offset, ios::seekdir way, ios::openmode mode = ios::in | ios::out):
This member function is called to reset the position of the next character to be processed. It is called by pubseekoff(). The new position or an invalid position (e.g., -1) is returned.
pos_type streambuf::seekpos(pos_type offset, ios::openmode mode = ios::in | ios::out):
This member function acts similarly to seekoff(), but operates with absolute rather than relative positions.
int streambuf::sync():
This member function flushes all pending characters to the device, and/or resets an input device to the position of the first pending character, waiting in the input buffer to be consumed. It returns 0 on success, -1 on failure. As the default streambuf is not buffered, the default implementation also returns 0.
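The interplay between an output buffer, overflow() and sync() may be sketched as follows. The class BufferedSink is hypothetical, a std::string stands in for the output device, and the buffer size of 8 is an arbitrary choice made for this illustration:

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical example class: collects characters in a small buffer and
// appends them to a string (the `device') when the buffer is flushed.
class BufferedSink: public std::streambuf
{
    enum { SIZE = 8 };
    char d_buffer[SIZE];
    std::string &d_device;

    public:
        explicit BufferedSink(std::string &device)
        :
            d_device(device)
        {
            setp(d_buffer, d_buffer + SIZE);    // define the output buffer
        }
        ~BufferedSink()
        {
            sync();                             // write what is still pending
        }

    protected:
        int overflow(int c)                     // called when the buffer is full
        {
            sync();                             // flush: makes room again
            if (!traits_type::eq_int_type(c, traits_type::eof()))
            {
                *pptr() = traits_type::to_char_type(c);
                pbump(1);
            }
            return traits_type::not_eof(c);
        }
        int sync()                              // called by flush and endl
        {
            d_device.append(pbase(), pptr() - pbase());
            setp(d_buffer, d_buffer + SIZE);    // the buffer is empty again
            return 0;
        }
};
```

Characters inserted into an ostream using a BufferedSink only reach the string once the buffer fills up, once the stream is flushed, or once the BufferedSink object is destroyed.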

Next, consider the following problem, which will be solved by constructing a class capsbuf that is derived from streambuf. The problem is to construct a streambuf which writes its information to the standard output stream in such a way that the first character of each white-space delimited series of characters is capitalized. The class capsbuf obviously needs an overridden overflow() member and a minimal awareness of its state. Its state changes between `Capitalize' and `Literal' as follows:

The start state is `Capitalize';
Change to `Capitalize' after processing a white-space character;
Change to `Literal' after processing a non-whitespace character.

A simple variable to remember the last character allows us to keep track of the current state. Since `Capitalize' is similar to `last character processed is a white space character' we can simply initialize the variable with a white space character, e.g., the blank space. Here is the initial definition of the class capsbuf:

#include <iostream>
#include <streambuf>
#include <ctype.h>

class capsbuf: public std::streambuf
{
    int d_last;

    public:
        capsbuf()
        :
            d_last(' ')
        {}

    protected:
        int overflow(int c)             // interface to the device.
        {
            std::cout.put(isspace(d_last) ? toupper(c) : c);
            d_last = c;
            return c;
        }
};

An example of a program using capsbuf is:

    #include "capsbuf1.h"
    
    using namespace std;
    
    int main()
    {
        capsbuf     cb;
    
        ostream     out(&cb);
    
        out << hex << "hello " << 32 << " worlds" << endl;

        return 0;
    }
    /*
        Generated output:

        Hello 20 Worlds
    */

Note the use of the insertion operator, and note that all type and radix conversions (inserting hex and the value 32, coming out as the ASCII-characters '2' and '0') are neatly done by the ostream object. The real purpose in life for capsbuf is to capitalize series of ASCII-characters, and that's what it does very well.
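To check the capitalizing behavior without inspecting terminal output, the same idea can be sketched with the target stream passed to the constructor. This is a variant made for illustration, not the capsbuf class itself: the class CapsTo and the function capsDemo() are hypothetical names, and the explicit EOF test in overflow() is an addition:

```cpp
#include <cctype>
#include <ios>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical variant of capsbuf: writes to any ostream, not just cout.
class CapsTo: public std::streambuf
{
    std::ostream &d_out;
    int d_last;                         // the last character processed

    public:
        explicit CapsTo(std::ostream &out)
        :
            d_out(out),
            d_last(' ')                 // start in the `Capitalize' state
        {}

    protected:
        int overflow(int c)
        {
            if (c == traits_type::eof())        // nothing to write
                return traits_type::not_eof(c);

            d_out.put(static_cast<char>(
                        std::isspace(d_last) ? std::toupper(c) : c));
            d_last = c;
            return c;
        }
};

// Repeats the insertions of the example program, returning the result.
std::string capsDemo()
{
    std::ostringstream sink;
    CapsTo cb(sink);
    std::ostream out(&cb);

    out << std::hex << "hello " << 32 << " worlds";
    return sink.str();                  // "Hello 20 Worlds"
}
```

As CapsTo defines no output buffer, every character inserted into out arrives at overflow() one at a time, which is what makes the one-variable state sufficient.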

Next, we realize that inserting characters into streams can also be done using a construction like

    cout << cin.rdbuf();

or, boiling down to the same thing:

    cin >> cout.rdbuf();

Realizing that this is all about streams, we now try, in the above main() function:

    cin >> out.rdbuf();

We compile and link the program to the executable caps, and start:

    echo hello world | caps

Unfortunately, nothing happens. Nor is there any reaction if we try the statement cin >> cout.rdbuf(). What's wrong here?

The difference between cout << cin.rdbuf(), which does produce the expected results, and our use of cin >> out.rdbuf() is that the operator>>(streambuf *) member function (and its insertion counterpart) only does a streambuf-to-streambuf copy if the respective stream modes are set up correctly. So, the argument of the extraction operator must point to a streambuf into which information can be written. By default, no stream mode is set for a plain streambuf object. As there is no constructor for a streambuf accepting an ios::openmode, we force the ios::out mode by defining an output buffer using setp(). By doing so we define a buffer, but we don't want to use it, so we make its size 0. Note that this is different from passing 0-valued arguments to setp(), as that would indicate `no buffering', which would not alter the default situation. Although any non-0 value could be used for the empty [begin, begin) range, we decided to define a (dummy) local char variable in the constructor, and use [&dummy, &dummy) to define the empty buffer. This effectively makes capsbuf an output buffer, thus activating the

    istream::operator>>(streambuf *)

member. Here is the revised constructor of the class capsbuf:

    capsbuf::capsbuf()
    :
        d_last(' ')
    {
        char
            dummy;
        setp(&dummy, &dummy);
    }

Now the program can use either

    out << cin.rdbuf();

or:

    cin >> out.rdbuf();

Actually, the ostream wrapper isn't really needed here:

    cin >> &cb;

would have produced the same results.

It is not clear whether the setp() solution proposed here is actually a kludge. After all, shouldn't the ostream wrapper around cb inform the capsbuf that it should act as a streambuf for doing output operations?

14.7: A polymorphic exception class

Earlier in the Annotations (section 8.3.1) we hinted at the possibility of designing a class Exception whose process() member would behave differently, depending on the kind of exception that was thrown. Now that we've introduced polymorphism, we can further develop this example.

It will now probably be clear that our class Exception should be a polymorphic base class (i.e., a base class having virtual member functions), from which special exception handling classes can be derived. It could even be argued that Exception can be an abstract base class declaring only pure virtual member functions. In the discussion in section 8.3.1 a member function severity() was mentioned which might not be a proper candidate for a pure virtual member function, but for that member we can now use the completely general dynamic_cast<>() operator.

The (abstract) base class Exception is designed as follows:

    #ifndef EXCEPTION_H_
    #define EXCEPTION_H_

    #include <iostream>
    #include <string>

    class Exception
    {
                                    // inserts the (virtual) string conversion
        friend std::ostream &operator<<(std::ostream &str, Exception const &e)
        {
            return str << e.operator std::string();
        }

        public:
            virtual ~Exception()
            {}
            virtual void process() const = 0;
            virtual operator std::string() const
            {
                return d_reason;
            }
        protected:
            Exception(char const *reason)
            :
                d_reason(reason)
            {}
            std::string
                d_reason;
    };
    #endif

The operator string() member function of course replaces the toString() member used in section 8.3.1. The friend operator<<() function is using the (virtual) operator string() member so that we're able to insert an Exception object into an ostream. Apart from that, notice the use of a virtual destructor, doing nothing.

A derived class FatalException: public Exception could now be defined as follows (using a very basic process() implementation indeed):

    #ifndef FATALEXCEPTION_H_
    #define FATALEXCEPTION_H_

    #include <cstdlib>              // for std::exit()
    #include "exception.h"

    class FatalException: public Exception
    {
        public:
            FatalException(char const *reason)
            :
                Exception(reason)
            {}
            void process() const
            {
                std::exit(1);
            }
    };
    #endif

The translation of the example at the end of section 8.3.1 to the current situation can now easily be made (using derived classes WarningException and MessageException), constructed like FatalException:

    #include <iostream>

    #include "message.h"
    #include "warning.h"

    using namespace std;

    void initialExceptionHandler(Exception const *e)
    {
        cout << *e << endl;             // show the plain-text information

        if 
        (
            dynamic_cast<MessageException const *>(e)
            || 
            dynamic_cast<WarningException const *>(e)
        )
        {
            e->process();               // Process a message or a warning
            delete e;
        }
        else
            throw;                      // Pass on other types of Exceptions
    }
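The selection by dynamic_cast<>() shown above can be tried out in isolation with the following sketch. It is a simplified stand-in, not the set of classes developed in this section: the exceptions are thrown by value and caught by reference, process() does nothing, and the names handledLocally() and throwAndHandle() are hypothetical:

```cpp
#include <string>

class Exception
{
    std::string d_reason;

    public:
        virtual ~Exception()
        {}
        virtual void process() const = 0;

    protected:
        explicit Exception(char const *reason)
        :
            d_reason(reason)
        {}
};

class MessageException: public Exception
{
    public:
        explicit MessageException(char const *reason)
        :
            Exception(reason)
        {}
        void process() const            // nothing to do in this sketch
        {}
};

class FatalException: public Exception
{
    public:
        explicit FatalException(char const *reason)
        :
            Exception(reason)
        {}
        void process() const            // a real version might end the program
        {}
};

// Processes messages only; returns false for all other exception types.
bool handledLocally(Exception const &e)
{
    if (dynamic_cast<MessageException const *>(&e) == 0)
        return false;

    e.process();
    return true;
}

// Throws one of the two exception types and reports how it was handled.
bool throwAndHandle(bool fatal)
{
    try
    {
        if (fatal)
            throw FatalException("fatal condition");
        throw MessageException("just a message");
    }
    catch (Exception const &e)
    {
        return handledLocally(e);
    }
}
```

Here throwAndHandle(false) returns true, as the message is recognized and processed, while throwAndHandle(true) returns false, leaving the decision about the fatal exception to the caller.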

14.8: How polymorphism is implemented

This section briefly describes how polymorphism is implemented in C++. It is not necessary to understand how polymorphism is implemented if using this feature is the only intention. However, we think it's nice to know how polymorphism is at all possible. Besides, the following discussion does explain why there is a cost of polymorphism in terms of memory usage.

The fundamental idea behind polymorphism is that the compiler does not know which function to call at compile-time; the appropriate function will be selected at run-time. That means that the address of the function must be stored somewhere, to be looked up prior to the actual call. This `somewhere' place must be accessible from the object in question. E.g., when a Vehicle *vp points to a Truck object, then vp->getweight() calls a member function of Truck; the address of this function is determined from the actual object which vp points to.

A common implementation is the following: An object containing virtual member functions holds as its first data member a hidden field, pointing to an array of pointers containing the addresses of the virtual member functions. The hidden data member is usually called the vpointer, the array of virtual member function addresses the vtable. Note that the discussed implementation is compiler-dependent, and is by no means dictated by the C++ ANSI/ISO standard.

The table of addresses of virtual functions is shared by all objects of the class. Multiple classes may even share the same table. The overhead in terms of memory consumption is therefore:

One extra pointer field per object, which points to:
One table of pointers per (derived) class to address the virtual functions.

Consequently, a statement like vp->getweight() first inspects the hidden data member of the object pointed to by vp. In the case of the vehicle classification system, this data member points to a table of two addresses: one pointer for the function getweight() and one pointer for the function setweight(). The actual function which is called is determined from this table.
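The lookup described above can be imitated by hand, using plain structs and function pointers. The following is only a sketch of the idea, assuming the layout discussed in this section. It is not what any particular compiler generates, and all names in it (VTable, d_vptr, makeTruck(), etc.) are hypothetical:

```cpp
struct Vehicle;

struct VTable                       // one table per class
{
    int  (*getweight)(Vehicle const *);
    void (*setweight)(Vehicle *, int);
};

struct Vehicle
{
    VTable const *d_vptr;           // the `hidden' vpointer, written out
    int d_weight;
};

struct Truck
{
    Vehicle d_base;                 // base part first: shares its address
    int d_trailer;                  // with the Truck object itself
};

int vehicleGetweight(Vehicle const *v)
{
    return v->d_weight;
}

void vehicleSetweight(Vehicle *v, int weight)
{
    v->d_weight = weight;
}

int truckGetweight(Vehicle const *v)    // v is known to address a Truck
{
    Truck const *t = reinterpret_cast<Truck const *>(v);
    return t->d_base.d_weight + t->d_trailer;
}

VTable const vehicleTable = { vehicleGetweight, vehicleSetweight };
VTable const truckTable   = { truckGetweight,   vehicleSetweight };

Vehicle makeVehicle(int weight)
{
    Vehicle v = { &vehicleTable, weight };
    return v;
}

Truck makeTruck(int weight, int trailer)
{
    Truck t = { { &truckTable, weight }, trailer };
    return t;
}

// What vp->getweight() and vp->setweight() boil down to in this model:
int getweight(Vehicle const *vp)
{
    return vp->d_vptr->getweight(vp);
}

void setweight(Vehicle *vp, int weight)
{
    vp->d_vptr->setweight(vp, weight);
}
```

Calling getweight() with the address of a Truck's base part selects truckGetweight() through the table, even though the pointer's type is Vehicle const *. Only the getweight entry differs between the two tables, as Truck doesn't redefine setweight().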

The internal organization of the objects having virtual functions is further illustrated in figure 16 and figure 17.

figure 16: Internal organization of objects when virtual functions are defined.

figure 17: Complementary figure, provided by Guillaume Caumon

As can be seen from figure 16 and figure 17, all objects which use virtual functions must have one (hidden) data member to address a table of function pointers. The objects of the classes Vehicle and Auto both address the same table. The class Truck, however, introduces its own version of getweight(): therefore, this class needs its own table of function pointers.
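The per-object cost of the hidden data member can be made visible by comparing the sizes of two otherwise identical classes. This is a sketch under the assumption that the compiler uses the vpointer implementation described in this section. The standard does not prescribe the resulting sizes, and the class names are hypothetical:

```cpp
#include <cstddef>

class Plain                         // no virtual members: no hidden field
{
    int d_weight;

    public:
        Plain()
        :
            d_weight(0)
        {}
        int getweight() const
        {
            return d_weight;
        }
};

class WithVirtual                   // same data, but virtual members
{
    int d_weight;

    public:
        WithVirtual()
        :
            d_weight(0)
        {}
        virtual ~WithVirtual()
        {}
        virtual int getweight() const
        {
            return d_weight;
        }
};

// Extra bytes per object: the hidden pointer plus any alignment padding.
std::size_t hiddenBytes()
{
    return sizeof(WithVirtual) - sizeof(Plain);
}
```

With commonly used compilers hiddenBytes() is at least the size of a pointer. The table itself is not counted here, as it is stored once per class rather than in each object.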