ANSI/ISO C++ Professional Programmer's Handbook

7 Runtime Type Identification

by Danny Kalev

Introduction
Structure Of This Chapter
Making Do Without RTTI
RTTI constituents
The Cost of Runtime Type Information
Conclusions

Introduction

Originally, C++ did not provide standardized support for runtime type information (RTTI). Furthermore, its creators balked at the idea of adding RTTI support for at least two reasons. First, they wanted to preserve backward compatibility with C. Second, they were concerned about efficiency. Other RTTI-enabled languages, such as Smalltalk and Lisp, were characterized by their notoriously sluggish performance. The performance penalty of dynamic type checking results from the relatively slow process of retrieving the object's type at runtime as well as from the additional information that the system needs to store for every type. C++ designers wanted to preserve the efficiency of C.

Another claim against the addition of RTTI to the language was that, in many cases, the use of virtual member functions could serve as an alternative to explicit runtime type checking. However, the addition of multiple inheritance (and consequently, of virtual inheritance) to C++ gave overwhelming ammunition to the proponents of RTTI (multiple inheritance is discussed in Chapter 5, "Object-Oriented Programming and Design"); it became apparent that under some circumstances, static type checking and virtual functions were insufficient.

Eventually, the C++ standardization committee approved the addition of RTTI to the language. Two new operators, dynamic_cast<> and typeid, were introduced. In addition, the class std::type_info was added to the Standard Library.

Structure Of This Chapter

This chapter consists of three major parts. The limitations of virtual functions are presented first. Then, the standard RTTI constituents are explained and exemplified. Finally, RTTI performance and design issues are discussed.

Making Do Without RTTI

Virtual member functions can provide a reasonable level of dynamic typing without the need for additional RTTI support. A well-designed class hierarchy can define a meaningful operation for every virtual member function that is declared in the base class.

Suppose you have to develop a file manager application as a component of a GUI-based operating system. The files in this system are represented as icons that respond to the right click of a mouse, displaying a menu with options such as open, close, read, and so on. The underlying implementation of the file system relies on a class hierarchy that represents files of various types. In a well-designed class hierarchy, there is usually an abstract class serving as an interface:

class File //abstract, all members are pure virtual
{
public:
  virtual void open() = 0;
  virtual void read() = 0;
  virtual void write() = 0;
  virtual ~File() = 0;
};
File::~File() //pure virtual destructor must be defined
{}

At a lower level in the hierarchy, you have a set of derived classes that implement the common interface that they inherit from File. Each of these subclasses represents a different family of files. To simplify the discussion, assume that there are only two file types in this system: binary .exe files and text files.

class BinaryFile : public File
{
public:
  void open () { OS_execute(this); }  //implement the pure virtual function
  //...other member functions of File are implemented here
};
class TextFile : public File
{
public:
  void open () { Activate_word_processor (this); }
  //...other member functions of File are implemented here
  virtual void print();  // an additional member function
};

The pure virtual function open() is implemented in every derived class, according to the type of the file. Thus, in a TextFile object, open() activates a word processor, whereas a BinaryFile object invokes the operating system's API function OS_execute(), which in turn executes the program that is stored in the binary file.

There are several differences between a binary file and a text file. For example, a text file can be printed directly on a screen or a printer because it consists of a sequence of printable characters. Conversely, a binary file with an .exe extension contains a stream of bits; it cannot be printed or displayed directly on a screen. It must be converted to a text file first, usually by a utility that translates the binary data into their symbolic representations. (For instance, the sequence 0110010 in an executable file can be replaced by a corresponding mov esp, ebp assembly instruction.) In other words, an executable file must be converted to a text file in order to be viewed or printed. Therefore, the member function print() appears only in class TextFile.

In this file manager, right-clicking the mouse on a file icon opens a menu of messages (options) to which the object can respond. For that purpose, the operating system has a function that takes a reference to a File:

void OnRightClick (File & file); //operating system's API function

Obviously, no object of class File can be instantiated because File is an abstract class (see Chapter 5). However, the function OnRightClick() can accept any object that is derived from File. When the user right-clicks on a file icon and chooses the option Open, for instance, OnRightClick() invokes the virtual member function open() of its argument, and the appropriate member function is called. For example:

void OnRightClick (File & file)
{
  switch (message) //message holds the menu option that the user chose
  {
  //...
  case m_open:
    file.open();
  break;
  }
}

So far, so good. You have implemented a polymorphic class hierarchy and a function that does not depend on the dynamic type of its argument. In this case, the language support for virtual functions was sufficient for your purposes; you did not need any explicit runtime type information (RTTI). Well, not exactly. You might have noticed the lack of file printing support. Look at the definition of class TextFile again:

class TextFile : public File
{
public:
  void open () { Activate_word_processor (this); } 
  virtual void print();
};

The member function print() is not a part of the common interface that is implemented by all files in your system. It would be a design error to move print() to the abstract class File because binary files are nonprintable and cannot define a meaningful operation for it. Then again, OnRightClick() has to support file printing when it handles a text file. In this case, ordinary polymorphism in the form of virtual member functions will not do. OnRightClick() only knows that its argument is derived from File. However, this information is not sufficient to tell whether the actual object is printable. Clearly, OnRightClick() needs more information about the dynamic type of its argument in order to properly handle file printing. This is where the need for runtime type information arises. Before delving into the implementation of OnRightClick(), an overview of RTTI constituents and their role is necessary.

RTTI constituents

The operators typeid and dynamic_cast<> offer two complementary forms of accessing the runtime type information of their operand. The operand's runtime type information itself is stored in a type_info object. This section exemplifies how these three constituents are used.

RTTI Is Applicable to Polymorphic Objects Exclusively

It is important to realize that RTTI, in the sense of detecting a dynamic type, is applicable solely to polymorphic objects. A class must have at least one virtual member function in order to have RTTI support for its objects. C++ does not offer dynamic type detection for non-polymorphic classes and fundamental types. This restriction is just common sense -- an object of a fundamental type such as double or of a concrete class such as string cannot change its type at runtime. Therefore, there is no need to detect its dynamic type because it is identical to its static type. But there is also a practical reason for confining RTTI support to polymorphic classes exclusively, as you will see momentarily.

As you probably know, every object that has at least one virtual member function also contains a special data member that is added by the compiler (more on this in Chapter 13, "C Language Compatibility Issues"). This member is a pointer to the virtual function table. In a typical implementation, the runtime type information is reached through this table, which holds a pointer to the class's std::type_info object.

Class type_info

For every distinct type, C++ instantiates a corresponding RTTI object that contains the necessary runtime type information. The RTTI object is an instance of the standard class std::type_info or an implementation-defined class derived from it. (std::type_info is defined in the standard header <typeinfo>). This object is owned by the implementation and cannot be altered in any way by the programmer. The interface of type_info looks similar to the following (namespaces will be covered in Chapter 8, "Namespaces"):

namespace std { //class type_info is declared in namespace std
  class type_info
  {
  public:
    virtual ~type_info(); //type_info can serve as a base class
    bool operator==(const type_info&  rhs ) const; // enable comparison 
    bool operator!=(const type_info&  rhs ) const; // return !( *this == rhs)
    bool before(const type_info&  rhs ) const; // ordering
    const char* name() const; //return a C-string containing the type's name
  private:
    //objects of this type cannot be copied
         type_info(const type_info&  rhs );
         type_info& operator=(const type_info&  rhs);
  }; //type_info
}

In general, all instances of the same type share a single type_info object. The most widely used member functions of type_info are name() and operator==. But before you can invoke these member functions, you have to access the type_info object itself. How is it done?

Operator typeid

Operator typeid takes either an object or a type name as its argument and returns a matching const type_info object. The dynamic type of an object can be examined as follows:

void OnRightClick (File & file)
{
  if ( typeid( file)  == typeid( TextFile ) )
  {
    //received a TextFile object; printing should be enabled
  }
  else
  {
    //not a TextFile object, printing disabled
  }
}

To understand how it works, look at the following source line:

if ( typeid( file)  == typeid( TextFile ) )

The if statement tests whether the dynamic type of the argument file is TextFile (the static type of file is File, of course). The leftmost expression, typeid(file), returns a type_info object that holds the necessary runtime type information that is associated with the object file. The rightmost expression, typeid(TextFile), returns the type information that is associated with class TextFile. When typeid is applied to a class name rather than an object, it always returns a type_info object that corresponds to that class name. As you saw earlier, type_info overloads the operator ==. Therefore, the type_info object that is returned by the leftmost typeid expression is compared to the type_info object that is returned by the rightmost typeid expression. If indeed file is an instance of TextFile, the condition evaluates to true. In this case, OnRightClick() displays an additional option in the menu -- print(). If, on the other hand, file is not a TextFile, the condition evaluates to false, and the print() option is disabled.

This is all well and good, but a typeid-based solution has a drawback. Suppose that you want to add support for a new type of files, for example HTML files. What happens when the file manager application has to be extended? HTML files are essentially text files. They can be read and printed. However, they differ from plain text files in some respects. An open message applied to an HTML file launches a browser rather than a word processor. In addition, HTML files have to be converted to a printable format before they can be printed. The need to extend a system's functionality at a minimal cost is a challenge that is faced by software developers every day. Object-oriented programming and design can facilitate the task. By subclassing TextFile, you can reuse its existing behavior and implement only the additional functionality that is required for HTML files:

class HTMLFile : public TextFile
{
public:
  void open () { Launch_Browser (); }
  virtual void print();  // perform the necessary conversions to a
                         //printable format and then print file
};

This is, however, only half of the story. OnRightClick() fails badly when it receives an object of type HTMLFile. Look at it again to see why:

void OnRightClick (File & file) //operating system's API function
{
  if ( typeid( file)  == typeid( TextFile ) )
  {
    //we received a TextFile object; printing should be enabled
  }
  else //OOPS! we get here when file is of type HTMLFile
  {
  }
}

typeid returns the exact type information of its argument. Therefore, the if statement in OnRightClick() evaluates to false when the argument is an HTMLFile. But a false value implies a binary file! Consequently, printing is disabled. This onerous bug is likely to occur every time you add support for a new file type. Of course, you can modify OnRightClick() so that it performs another test:

void OnRightClick (File & file) //operating system's API function
{
  if ( (typeid( file)  == typeid( TextFile ))  
    || (typeid( file)  == typeid( HTMLFile)) ) //check for HTMLFile as well
  {
    //we received either a TextFile or an HTMLFile; printing should be enabled
  }
  else //it's a binary file, no print option
  {
  }
}

However, this solution is cumbersome and error prone. Furthermore, it imposes an unacceptable burden on the programmers who maintain this function. Not only are they required to clutter up OnRightClick() with additional code every time a new class is derived from File, but they also have to be on guard to detect any new class that has been derived from File lately. Fortunately, C++ offers a much better way to handle this situation.

NOTE: You can use typeid to retrieve the type information of non-polymorphic objects and fundamental types. However, the result refers to a type_info object that represents the static type of the operand. For example:

#include <typeinfo>
#include <iostream>
#include <string>
using namespace std;
typedef int I;
void fundamental()
{
  cout<<typeid(I).name()<<endl; //typically displays 'int'; the exact string is implementation-defined
}
void non_polymorphic()
{
  cout<<typeid(string).name()<<endl; //implementation-defined name of std::string
}

NOTE: Applying dynamic_cast<> to a fundamental type, or using it to downcast from a non-polymorphic class, is a compile-time error.

Operator dynamic_cast<>

It is a mistake to allow OnRightClick() to take care of every conceivable class type. In doing so, you are forced to modify OnRightClick() any time you add a new file class or modify an existing class. In software design, and in object-oriented design in particular, you want to minimize such dependencies. If you examine OnRightClick() closely, you can see that it doesn't really need to know whether its argument is an instance of class TextFile (or of any other class, for that matter). Rather, all it needs to know is whether its argument is a TextFile. There is a big difference between the two -- an object is-a TextFile if it is an instance of class TextFile or if it is an instance of any class derived from TextFile. However, typeid is incapable of examining the derivation hierarchy of an object. For this purpose, you have to use the operator dynamic_cast<>. dynamic_cast<> takes two arguments: the first is a type name, and the second is an object, which dynamic_cast<> attempts to cast at runtime to the desired type. For example:

dynamic_cast <TextFile &> (file); //attempt to cast file to a reference to 
                                  //an object of type TextFile

If the attempted cast succeeds, the second argument is either an instance of the class whose name appears as the first argument or an instance of a class derived from it. The preceding dynamic_cast<> expression succeeds if file is-a TextFile. This is exactly the information needed by OnRightClick() to operate properly. But how do you know whether dynamic_cast<> was successful?

Pointer Cast and Reference Cast

There are two flavors of dynamic_cast<>. One uses pointers and the other uses references. Accordingly, dynamic_cast<> returns a pointer or a reference of the desired type when it succeeds. When dynamic_cast<> cannot perform the cast, it returns a NULL pointer or, in the case of a reference, it throws an exception of type std::bad_cast. Look at the following pointer cast example:

TextFile * pTest = dynamic_cast < TextFile *> (&file); //attempt to cast 
                                                       //file address to a pointer to TextFile
if (pTest) //dynamic_cast succeeded, file is-a TextFile
{
  //use pTest
}
else // file is not a TextFile;  pTest has a NULL value
{
}

C++ does not have NULL references. Therefore, when a reference dynamic_cast<> fails, it throws an exception of type std::bad_cast. That is why you always need to place a reference dynamic_cast<> expression within a try-block and include a suitable catch-statement to handle std::bad_cast exceptions (see also Chapter 6, "Exception Handling"). For example:

try
{
  TextFile & tf = dynamic_cast < TextFile &> (file); //bind a reference; copying would slice the object
  //use tf safely,
}
catch (const std::bad_cast &)
{
  //dynamic_cast<> failed
}

Now you can revise OnRightClick() to handle HTMLFile objects properly:

void OnRightClick (File & file)
{
  try
  {
    //bind a reference; a TextFile copy would slice an HTMLFile object
    TextFile & temp = dynamic_cast<TextFile&> (file);
    //display options, including "print"
    switch (message)
    {
    case m_open:
      temp.open();  //either TextFile::open or HTMLFile::open
    break;
    case m_print:
      temp.print();//either TextFile::print or HTMLFile::print
    break;
    }//switch
  }//try
  catch (const std::bad_cast &)
  {
    // treat file as a BinaryFile; exclude "print"
  }
}// OnRightClick

The revised version of OnRightClick() handles an object of type HTMLFile appropriately because an object of type HTMLFile is-a TextFile. When the user clicks on the open message in the file manager application, the function OnRightClick() invokes the member function open() of its argument, which behaves as expected because it was overridden in class HTMLFile. Likewise, when OnRightClick() detects that its argument is a TextFile, it displays a print option. If the user clicks on this option, OnRightClick() sends the message print to its argument, which reacts as expected.

Other Uses of dynamic_cast<>

Dynamic type casts are required in cases in which the dynamic type of an object -- rather than its static type -- is necessary to perform the cast properly. Note that any attempt to use a static cast in these cases is either flagged as an error by the compiler, or -- even worse -- it might result in undefined behavior at runtime.

Cross casts

A cross cast converts a multiply-inherited object to one of its secondary base classes. To demonstrate what a cross cast does, consider the following class hierarchy:

struct A
{
  int i;
  virtual ~A () {} //enforce polymorphism; needed for dynamic_cast
};
struct B
{
  bool b;
};
struct D: public A, public B
{
  int k;
  D() { b = true; i = k = 0; } 
};
A *pa = new D;
B *pb = dynamic_cast<B*> (pa);  //cross cast; access the second base
                                //of a multiply-derived object

The static type of pa is "pointer to A", whereas its dynamic type is "pointer to D". A simple static_cast<> cannot convert a pointer to A into a pointer to B because A and B are unrelated (your compiler issues an error message in this case). A brute force cast (for example, reinterpret_cast<> or a C-style cast) has disastrous results at runtime because the compiler simply assigns pa to pb. However, the B subobject is located at a different address within D than the A subobject. To perform the cross cast properly, the value of pb has to be calculated at runtime. After all, the cross cast can be done in a translation unit that doesn't even know that class D exists! The following program demonstrates why a dynamic cast, rather than a compile-time cast, is required:

#include <iostream>
using namespace std;
int main()
{
  A *pa = new D;
  B *pb = (B*) pa;  // disastrous; pb points to the subobject A within the D object
  bool bb = pb->b;  // undefined behavior; bb has an undefined value
  cout<< "pa: " << pa << " pb: "<<pb <<endl;  // pb was not properly
                                               //adjusted; pa and pb are identical
  pb = dynamic_cast<B*> (pa); //cross cast; adjust pb correctly
  bb= pb->b; //OK, bb is true
  cout<< "pa: "<< pa << " pb: " << pb <<endl; // OK, pb was properly adjusted;
                                             // pa and pb have distinct values
  delete pa; // A has a virtual destructor, so the D object is destroyed properly
  return 0;
}

The program displays two lines of output; the first shows that the memory addresses of pa and pb are identical. The second line shows that the memory addresses of pa and pb are different after performing a dynamic cast as required.

Downcasting From a Virtual Base

A downcast is a cast from a base to a derived object. Before the introduction of RTTI to the language, downcasts were regarded as a bad programming practice. They were unsafe, and some even viewed the reliance on the dynamic type of an object a violation of object-oriented principles (see also Chapter 2, "Standard Briefing: the Latest Addenda to ANSI/ISO C++"). dynamic_cast<> enables you to use safe, standardized, and simple downcasts from a virtual base to its derived object. Look at the following example:

#include <iostream>
using namespace std;
struct V
{
  virtual ~V (){} //ensure polymorphism
};
struct A: virtual V {};
struct B: virtual V {};
struct D: A, B {};
int main()
{
  V *pv = new D;
  A* pa = dynamic_cast<A*> (pv); // downcast
  cout<< "pv: "<< pv << " pa: " << pa <<endl; // OK; pv and pa can have different
                                              //addresses, depending on the object layout
  delete pv;
  return 0;
}

V is a virtual base for classes A and B. D is multiply-inherited from A and B. Inside main(), pv is declared as a "pointer to V" and its dynamic type is "pointer to D". Here again, as in the cross cast example, the dynamic type of pv is needed in order to properly downcast it to a pointer to A. A static_cast<> would be rejected by the compiler. As you read in Chapter 5, the memory layout of a virtual subobject might be different from that of a nonvirtual subobject. Consequently, it is impossible to calculate at compile time the address of the subobject A within the object pointed to by pv. Depending on the object layout that your implementation uses, the output of the program can show that pv and pa point to different memory addresses.

The Cost of Runtime Type Information

Runtime Type Information is not free. To estimate how expensive it is in terms of performance, it is important to understand how it is implemented behind the scenes. Some of the technical details are platform-dependent. Still, the basic model that is presented here can give you a fair idea of the performance penalties of RTTI in terms of memory overhead and execution speed.

Memory Overhead

Additional memory is needed to store the type_info object of every fundamental and user-defined type. Ideally, the implementation associates a single type_info object with every distinct type. However, this is not a requirement, and under some circumstances -- for example, dynamically linked libraries -- it is impossible to guarantee that only one type_info object per class exists. Therefore, an implementation can create more than one type_info object per type.

As was previously noted, there is a practical reason that dynamic_cast<> is applicable only to polymorphic objects: An object does not store its runtime type information directly (as a data member, for example).

Runtime Type Information of Polymorphic Objects

Every polymorphic object has a pointer to its virtual functions table. This pointer, traditionally named vptr, holds the address of a dispatch table that contains the memory addresses of every virtual function in this class. The trick is to add another entry to this table. This entry points at the class's type_info object. In other words, the vptr data member of a polymorphic object points at a table of pointers, in which the address of type_info is kept at a fixed position. This model is very economical in terms of memory usage; it requires a single type_info object and a pointer for every polymorphic class. Note that this is a fixed cost, regardless of how many instances of the class actually exist in the program. The cost of retrieving an object's runtime type information is therefore a single pointer indirection, which might be less efficient than direct access to a data member; still, though, it is equivalent to a virtual function invocation.

Additional Overhead

A pointer indirection, a type_info object, and a pointer per class sound like a reasonable price to pay for RTTI support. This is not the full picture, however. The type_info objects, just like any other object, have to be constructed. Large programs that contain hundreds of distinct polymorphic classes have to construct an equivalent number of type_info objects as well.

RTTI Support Can Usually Be Toggled

This overhead is imposed even if you never use RTTI in your programs. For this reason, most compilers enable you to switch off their RTTI support (check the user's manual to see the default RTTI setting of your compiler and how it can be modified). If you never use RTTI in your programs, you can turn off your compiler's RTTI support. The results are smaller executables and slightly faster code.

typeid Versus dynamic_cast<>

Until now, this chapter has discussed the indirect cost of RTTI support. It is now time to explore the cost of its direct usage -- that is, applying typeid and dynamic_cast<>.

A typeid invocation is a constant time operation. It takes the same length of time to retrieve the runtime type information of every polymorphic object, regardless of its derivational complexity. In essence, calling typeid is similar to invoking a virtual member function. For instance, the expression typeid(obj) is evaluated into something similar to the following:

return *(obj->__vptr[0]); //return the type_info object whose address
                         // is stored at offset 0 in the virtual table of obj

Note that the pointer to a class's type_info object is stored at a fixed offset in the virtual table (usually 0, but this is implementation-dependent).

Unlike typeid, dynamic_cast<> is not a constant time operation. In the expression dynamic_cast<T&> (obj), where T is the target type and obj is the operand, the time that is needed to cast the operand to the target type depends on the complexity of the class hierarchy of obj. dynamic_cast<> has to traverse the derivation tree of obj until it has located the target object in it. When the target is a virtual base, the dynamic cast becomes even more complicated (albeit unavoidable, as you have seen); consequently, it takes longer to execute. The worst case scenario is when the operand is a deeply derived object and the target is an unrelated class type. In this case, dynamic_cast<> has to traverse the entire derivation tree of obj before it can confidently decide that obj cannot be cast to a T. In other words, a failed dynamic_cast<> is an O(n) operation, where n is the number of base classes of the operand.

You might recall the conclusion that from a design point of view, dynamic_cast<> is preferable to typeid because the former enables more flexibility and extensibility. Notwithstanding that, typeid can be less expensive at runtime than dynamic_cast<>, depending on the derivational complexity of the entities involved.

Conclusions

The RTTI mechanism of C++ consists of three components: operator typeid, operator dynamic_cast<>, and class std::type_info. RTTI is relatively new in C++. Some existing compilers do not support it yet. Furthermore, compilers that support it can usually be configured to disable RTTI support. Even when there is no explicit usage of RTTI in a program, the compiler automatically adds the necessary "scaffolding" to the resultant executable. To avert this, you can usually switch off your compiler's RTTI support.

From the object-oriented design point of view, operator dynamic_cast<> is preferable to typeid because it enables more flexibility and robustness, as you have seen. However, dynamic_cast<> can be slower than typeid because its performance depends on the proximity of its target and operand, as well as on the derivational complexity of the latter. When complex derivational hierarchies are used, the incurred performance penalty might be noticeable. It is recommended, therefore, that you use RTTI judiciously. In many cases, a virtual member function is sufficient to achieve the necessary polymorphic behavior. Only when virtual member functions are insufficient should RTTI be considered.

Following are a few additional notes to keep in mind when using RTTI:

In order to enable RTTI support, an object must have at least one virtual member function. In addition, switch on your compiler's RTTI support (please consult your user's manual for further information) if it isn't already on.

Make sure that your program has a catch-statement to handle std::bad_cast exceptions whenever you are using dynamic_cast<> with a reference. Note also that an attempt to dereference a null pointer in a typeid expression, as in typeid(*p) where p is NULL, results in a std::bad_typeid exception being thrown.

When you are using dynamic_cast<> with a pointer, always check the returned value.