"Hey, we're done!"
The previous chapters have recounted the past and the present of C++. In nearly 20 years, C++ has evolved from an experimental language into the most widely used object-oriented programming language worldwide. The importance of standardizing C++ cannot be overemphasized. Having the ANSI/ISO endorsement has several advantages:
Language stability -- C++ is probably the largest programming language in commercial use today. Learning it from scratch is a demanding and time-consuming process. It is guaranteed that, henceforth, learning C++ is a one-time investment rather than an iterative process.
Code stability -- The Standard specifies a set of deprecated features that might become obsolete in the future. Other than that, fully ANSI-compliant code is guaranteed to work in the future.
Manpower portability -- C++ programmers can switch more easily to different environments, projects, compilers, and companies.
Easier portability -- The standard defines a common denominator for all platforms and compiler vendors, enabling easier porting of software across various operating systems and hardware architectures.
The following code sample is Standard-compliant; however, some compilers will reject it, whereas others will compile it without complaints:
#include <iostream>
#include <cstddef> //for size_t
using namespace std;
void detect_int(size_t size)
{
    switch(size)
    {
    case sizeof(char):
        cout<<"char detected"<<endl;
        break;
    case sizeof(short):
        cout<<"short detected"<<endl;
        break;
    case sizeof(int):
        cout<<"int detected"<<endl;
        break;
    case sizeof(long):
        cout<<"long detected"<<endl;
        break;
    }
}
On platforms that have distinct sizes for all four integral types (for example, architectures that use 16 bits for short, 32 bits for int, and 64 bits for long), this code will compile and work as expected. On other platforms, where the size of int overlaps with the size of another integral type, the compiler will complain about identical case labels.
The point to take home from this example is that the Standard does not guarantee absolute code portability, nor does it ensure binary compatibility. However, it facilitates software porting from one platform to another by defining a common ground for the language, which an implementation is allowed to extend. This practice is almost universal: Platform-specific libraries and keywords are added to almost every C++ implementation. However, an implementation cannot alter the specifications of the Standard (otherwise, such an implementation is not Standard-compliant). As you will read in the following sections, allowing platform-specific extensions is an important factor in the success of programming languages in general; languages that have attempted to prohibit platform-specific extensions have failed to obtain a critical mass of users due to a lack of vendor support.
The previous chapters mostly focus on the hows of C++; this chapter explores the whys. It elucidates the philosophy behind the design and evolution of C++ and compares it to the evolution of other programming languages. Some features that almost made it into the Standard are then presented. Possible future additions to C++, including automatic garbage collection, object persistence, and concurrency, are discussed next. Finally, theoretical and experimental issues are discussed. The intent is not to predict the future of C++ (there is no guarantee that any of the features discussed here will ever become an integral part of the Standard), but rather to give you a broader view of the challenges of language design.
The standardization of C++ lasted nine years. STL alone added at least one more year to the original agenda. However, STL was an exception. Other features that were proposed too late were not included in the Standard. The following section lists two such features: hashed associative containers and default type arguments of function templates.
The Standard Template Library provides only one type of associative container -- the sorted associative container. The STL sorted associative containers are map, multimap, set, and multiset (see Chapter 10, "STL and Generic Programming"). However, there is another type of associative container, the hashed associative container, that should really be in the Standard Library but isn't there because it was proposed too late. The difference between a sorted associative container and a hashed associative container is that the former keeps the keys sorted according to some total order. For example, in a map<string, int>, the elements are sorted according to the lexicographical order of the strings. A hashed associative container, on the other hand, divides the keys into a number of subsets, and the association of each key to its subset is done by a hash function. Consequently, searching a key is confined to its subset rather than the entire key space. Searching a hashed associative container can therefore be faster than searching a sorted associative container under some circumstances; but unlike sorted associative containers, the performance is less predictable. There are already vendors that include hashed associative containers as an extension, and it is likely that these containers will be added to the Standard in the next revision phase.
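To illustrate the ordering difference, the following snippet is a minimal sketch that uses only the standard map mentioned above. Iterating over the map visits the keys in lexicographical order; a hashed associative container holding the same keys would return them in an order dictated by the hash function instead:

#include <iostream>
#include <map>
#include <string>
using namespace std;
void sorted_keys()
{
    map<string, int> ages;
    ages["Walter"] = 45;
    ages["Anna"] = 31;
    ages["Moshe"] = 52;
    //the keys come back sorted: Anna, Moshe, Walter
    for (map<string, int>::const_iterator it = ages.begin(); it != ages.end(); ++it)
        cout << it->first << '\t' << it->second << endl;
}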
As you read in Chapter 9, "Templates," class templates can take default type arguments. However, the Standard disallows default type arguments in function templates. This asymmetry between class templates and function templates is simply an unfortunate oversight that was discovered too late to be fixed. Default type arguments of function templates will most likely be added to C++ in the next revision of the Standard.
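A minimal illustration of the asymmetry (the template names here are arbitrary):

template <class T = int> class Stack { /*...*/ }; //fine: a class template may take a default type argument
//template <class T = int> void clear(Stack<T>& s); //ill-formed in C++98: a function template may not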
Unlike other newer languages, C++ is not an artifact of a commercial company. C++ does not bear the trademark sign, nor do any of its creators receive royalties for every compiler that is sold. Thus, C++ is not subjected to the marketing ploys and dominance battles that rage among software companies these days. Another crucial factor that distinguishes C++ from some other "would-be perfect" programming languages is the way in which it has been designed and extended through the years.
Some of you might recall the tremendous hype that surrounded Ada in its early days. Ada was perhaps the most presumptuous endeavor to create a language that was free from the deficiencies of the other programming languages that existed at that time. Ada promised to be a 100% portable language, free of subsets and dialects. It also provided built-in support for multitasking and parameterized types. The design of Ada lasted more than a decade, but it was a design by committee process rather than the design by community process that characterizes C++. The facts are known: Ada never really became the general purpose, widely used programming language it was intended to be. It is amusing to recall today that back in 1983, when Ada was released, many believed that it was the last third generation programming language to be created. Ironically, C++ was making its first steps at exactly that same time. Needless to say, the design and evolution of C++ have taken a radically different path. Other third generation languages have appeared since 1983 and -- surely -- new third generation languages will appear in the future. The factors that led to the failure of Ada as a universal and general purpose programming language can serve as a lesson in language design.
The failure of Ada can be attributed mostly to the design by committee approach. In addition, the prohibition of platform-specific extensions deterred vendors from developing libraries and tools that supported the new language. It is always surprising to learn how computer scientists and language users differ in their views about the important features of the language. C, which was created by programmers rather than by academia, offered convenience and efficiency at the expense of readability and safety. For example, the capability to write statements such as this one
if (n = v) //did the programmer mistake assignment for equality?
{
    //...do something
}
has been a source of criticism. Still, it is this very feature that enables programmers to write a complete function that consists of a single statement such as the following:
void strcpy (char * dst, const char * src)
{
    while( *dst++ = *src++ );
}
The tedium of typing long keywords is also an issue of debate. "Academic languages" usually advocate the use of verbose statements that consist of complete keywords -- for example, integer rather than int, character rather than char (as in Eiffel and other similar languages), and call func(); rather than func();. Programmers, on the other hand, feel more comfortable with truncated keywords and symbols. Look at the following:
class Derived : Base {}; //inheritance indicated by :
In other languages, inheritance is expressed by explicit keywords:
class Derived extends Base {}; //Java; full keyword indicates inheritance
C++ adopts the policy of C in this respect. Furthermore, according to Bjarne Stroustrup, one of the principles in the design of C++ says that where there is a choice between inconveniencing the compiler writer and annoying the programmer, choose to inconvenience the compiler writer (The Evolution of C++: Language Design in the Marketplace of Ideas, p.50). The implementations of operator overloading, enum types, templates, default arguments, and Koenig lookup are instances of this approach. Programmers can get along without direct language support for these features, at the cost of inconvenience: Ordinary functions can be used instead of overloaded operators, constants can replace enum types, and fully qualified names can make up for the lack of Koenig lookup. Fortunately, C++ does not force these workarounds on its users. Other languages, however, have adopted the opposite approach, namely simpler compiler writing at the cost of inconveniencing the programmers. Java, for instance, does not have enum types, operator overloading, and default arguments by design. Although these features do not incur overhead of any kind, and no one doubts their importance and usefulness, they make a compiler writer's work more difficult (originally, Java designers claimed that operator overloading was an unnecessary complexity).
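To get a feel for the kind of convenience at stake, consider a minimal sketch of Koenig lookup (the namespace, class, and function names are made up for illustration):

#include <iostream>
using namespace std;
namespace inventory
{
    class Item {};
    void report(const Item& item) { cout<<"reporting an item"<<endl; }
}
int main()
{
    inventory::Item it;
    report(it); //Koenig lookup finds inventory::report() through the argument's namespace;
                //without it, the programmer would have to write inventory::report(it)
    return 0;
}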
The benefits of object-oriented programming are not free. The automatic invocation of constructors and destructors is very handy, but it incurs additional overhead in the speed and the size of the program. Likewise, dynamic binding and virtual inheritance also impose performance penalties. But none of these features is forced on the programmer. Pure procedural C++ code (legacy C code that is ported to a C++ compiler, for example) does not pay for these features. In other words, users -- almost without exception -- have a choice between higher-level features, which impose a performance penalty, and lower-level features, which are free from these performance penalties but are more susceptible to design modifications and are harder to maintain. The "pay as you go" principle enables programmers to use C++ in diverse application domains and apply different programming paradigms according to their needs and priorities.
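As a small illustration of the principle, a class that declares no virtual member functions does not carry the hidden vptr that dynamic binding requires. The sizes in the comments assume a typical 32-bit implementation; they are implementation-dependent:

#include <iostream>
using namespace std;
struct Point //no virtual member functions -- no vptr
{
    int x, y;
};
struct VirtualPoint //one virtual member function -- a vptr is added
{
    int x, y;
    virtual ~VirtualPoint() {}
};
int main()
{
    cout<<sizeof(Point)<<endl; //typically 8
    cout<<sizeof(VirtualPoint)<<endl; //typically 12: the price of dynamic binding
    return 0;
}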
It's hard to predict which new features will be added to C++ in the future, mostly because it's hard to predict what programming in general will be like five or ten years from now. However, automatic garbage collection, concurrency, and object persistence are already implemented in many other object-oriented programming languages; in the future, they might be added to C++ as well. Rule-based programming and better support for dynamically linked libraries are other such possible extensions that C++ might or might not have in the future. The following sections discuss these features, and their incurred complications, in greater detail.
"If the programmer's convenience is that important, why doesn't C++ have a garbage collector?" is an often-heard question (garbage collection is also discussed in Chapter 11, "Memory Management"). Clearly, an automated garbage collector can make the life of a programmer easier. However, unlike objects, virtual member functions, and dynamic casts, the programmer does not have the freedom of choice with garbage collection. If garbage collection is an automatic process that is hidden from the programmer, it violates the "pay as you go" principle. The cost of automatic garbage collection is forced on users, even if they prefer to manage dynamic memory manually.
Is it possible to add automated garbage collection as a switch (very much like the capability to turn off RTTI support in some compilers)? This is an interesting question. Surely, in programs that do not use dynamic memory allocation, the programmer might want to turn the garbage collector off. The real crux is with programs that allocate memory dynamically. Consider the following example:
void f()
{
    int * p = new int;
    //...use p
}
When the garbage collector is switched on, the implementation will mark the pointer p as unreferenced when f() exits. Consequently, in the next invocation of the garbage collector, the memory pointed to by p will be released. Adding a garbage collector for such simple cases is overkill, though. The programmer can use an auto_ptr (auto_ptr is discussed in Chapter 6, "Exception handling," and in Chapter 11) to achieve the same effect. For example
#include <memory>
using namespace std;
void f()
{
    auto_ptr<int> p (new int);
    //...use p
} //auto_ptr's destructor releases p
Garbage collection is more useful when dynamic memory has to be released at a different scope from where it was allocated. For example, virtual constructors (which are discussed in Chapter 4, "Special Member Functions: Default Constructor, Copy Constructor, Destructor, and Assignment Operator") enable the user to instantiate a new object of the right type, without having to know the exact type of the source object (the example is repeated here for convenience):
class Browser
{
public:
    Browser();
    Browser( const Browser&);
    virtual Browser* construct() { return new Browser; } //virtual default constructor
    virtual Browser* clone() { return new Browser(*this); } //virtual copy constructor
    virtual void view(); //used in the examples that follow
    virtual ~Browser();
    //...
};
class HTMLEditor: public Browser
{
public:
    HTMLEditor ();
    HTMLEditor (const HTMLEditor &);
    HTMLEditor * construct() { return new HTMLEditor; } //virtual default constructor
    HTMLEditor * clone() { return new HTMLEditor (*this); } //virtual copy constructor
    virtual ~HTMLEditor();
    //...
};
In a garbage collected environment, it is possible to use a virtual constructor in the following way:
void instantiate (Browser& br)
{
    br.construct()->view();
}
Here again, the system automatically registers the unnamed pointer that is returned from br.construct() and marks it as unreferenced so that the garbage collector can later destroy its associated object and recycle its storage. In a non-garbage collected environment, instantiate() causes a memory leak because the allocated object is never deleted (it might cause undefined behavior as well because the allocated object is never destroyed). To enable this programming practice, a garbage collector is mandatory rather than optional. You might suggest that instantiate() is to be written as follows:
void instantiate (Browser& br)
{
    Browser *pbr = br.construct();
    pbr->view();
    delete pbr;
}
This way, instantiate() can be used in a non-garbage collected environment as well as in a garbage collected one: When the garbage collector is active, the delete statement is ignored (perhaps by some macro magic) and the dynamically allocated object is automatically released some time after instantiate() has exited. The delete statement is executed only when the garbage collector is inactive. However, there is another subtle problem here.
In a non-garbage collected environment, pbr is deleted right before instantiate() exits, which means that the destructor of the dynamically allocated object is also invoked at that point. Conversely, in a garbage collected environment, the destructor will be activated at an unspecified time after instantiate() exits. The programmer cannot predict when this will happen. It might take a few seconds, but it can also take hours or even days before the garbage collector is invoked the next time. Now suppose that the destructor of Browser releases a locked resource such as a database connection, a lock, or a modem. The program's behavior in a garbage collected environment is unpredictable -- the locked resource can cause a deadlock because other objects might be waiting for it, too. In order to avert such a potential deadlock, destructors can perform only operations that do not affect other objects, and locked resources have to be released explicitly by calling another member function. For example
void instantiate (Browser& br)
{
    Browser *pbr = br.construct();
    pbr->view();
    pbr->release(); //release all locked resources
    delete pbr;
}
This is, in fact, the predominant technique in garbage collected languages. Then again, to ensure interoperability between a garbage collected environment and a non-garbage collected one, programmers will have to write a dedicated member function that releases locked resources that the class acquires -- even if that class is used in a non-garbage collected environment. This is an unacceptable burden and a violation of the "pay as you go" principle. The conclusion that can be drawn from this discussion is that garbage collection cannot be optional. It is nearly impossible to write efficient and reliable programs that work in both environments. Either automatic garbage collection needs to be an integral part of the language, or it is totally out (as is the case in C++ at present).
Garbage collection cannot be optional, as you have observed. Why not, then, make it an integral part of the language? The answer lies in real-time systems, which are based on deterministic time calculations. For example, a function that has to execute within a time slot of 500 microseconds should never exceed its allotted time slice. However, the garbage collection process is non-deterministic -- it is impossible to predict when it will be invoked, and how long it will take. Therefore, languages that offer automatic garbage collection are usually disqualified for use in time-critical applications. Note that real-time programming is not confined to missile launching and low-level hardware manipulation; most modern operating systems include time-critical components that control the allocation of system resources among processes and threads. Many communication systems are also deterministic by nature. Adding an automated garbage collector to C++ would disqualify it from being used in such application domains. Because a toggled garbage collector is also impractical, C++, by design, is not a garbage collected language at present. Notwithstanding the difficulties involved in garbage collection, there are serious discussions about adding garbage collection to C++. It is too early to determine if and when this will happen.
Persistent objects can be stored in nonvolatile storage and used later in other runs of the same program or in other programs. Storing the contents of an object in persistent storage is called serialization. The process of reconstituting a serialized object from a persistent repository is called deserialization, or reconstitution. Other object-oriented languages support object persistence directly by means of a library or built-in keywords and operators. C++ does not support object persistence directly. Designing an efficient, general purpose, platform-independent model of object persistence is quite a challenge. This section exemplifies handmade solutions that make up for the lack of language support for persistence. The difficulties and complications that are associated with a handmade object persistence model demonstrate the importance of language support.
Consider the following class:
class Date
{
private:
    int day;
    int month;
    int year;
    //constructor and destructor
public:
    Date(); //current date
    ~Date();
    //...
};
Storing a Date object is a rather straightforward operation: Every data member is written to a persistent stream (usually this is a local disk file, but it can also be a file on a remote computer). The data members can be read from the stream at a later stage. For that purpose, two additional member functions are required, one for storing the object and the other for reading the stored object:
#include <fstream>
using namespace std;
class Date
{
public:
    //...
    virtual ofstream& Write(ofstream& archive);
    virtual ifstream& Read(ifstream& archive);
};
ofstream& Date::Write(ofstream& archive)
{
    archive.write( reinterpret_cast<char*> (&day), sizeof(day));
    archive.write( reinterpret_cast<char*> (&month), sizeof(month));
    archive.write( reinterpret_cast<char*> (&year), sizeof(year));
    return archive;
}
ifstream& Date::Read(ifstream& archive)
{
    archive.read( reinterpret_cast<char*> (&day), sizeof(day));
    archive.read( reinterpret_cast<char*> (&month), sizeof(month));
    archive.read( reinterpret_cast<char*> (&year), sizeof(year));
    return archive;
}
In addition to the member functions Read() and Write(), it is necessary to define a reconstituting constructor, which reads a serialized object from a stream:
Date::Date(ifstream& archive) //reconstituting constructor
{
    Read(archive);
}
For concrete classes such as Date, whose members are fundamental types, making up for the lack of standardized persistence facilities is rather straightforward. The serialization and deserialization operations merely store and read data members, respectively. Note that the class's member functions are not serialized. This is not a major issue of concern because the serialized object should be a close approximation of the binary representation of the object in memory.
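A minimal usage sketch of these facilities might look as follows (the file name is arbitrary, and error checking is omitted for brevity):

#include <fstream>
using namespace std;
void store_and_restore()
{
    ofstream out("date.dat", ios::binary);
    Date today; //current date
    today.Write(out); //serialize the object's data members
    out.close();
    ifstream in("date.dat", ios::binary);
    Date restored(in); //the reconstituting constructor reads them back
}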
Handling derived classes and classes that contain member objects is more complicated: The member functions Read() and Write() need to be redefined in every class in the hierarchy. Likewise, a reconstituting constructor is required for every class, as in the following example:
class DateTime: public Date
{
private:
    int secs;
    int minutes;
    int hours;
public:
    //...
    DateTime(ifstream& archive); //reconstituting constructor
    ofstream& Write(ofstream& archive);
    ifstream& Read(ifstream& archive);
};
ofstream& DateTime::Write(ofstream& archive)
{
    Date::Write(archive); //must invoke base class Write() first
    archive.write( reinterpret_cast<char*> (&secs), sizeof(secs));
    archive.write( reinterpret_cast<char*> (&minutes), sizeof(minutes));
    archive.write( reinterpret_cast<char*> (&hours), sizeof(hours));
    return archive;
}
ifstream& DateTime::Read(ifstream& archive)
{
    Date::Read(archive); //must invoke base class Read() first
    archive.read( reinterpret_cast<char*> (&secs), sizeof(secs));
    archive.read( reinterpret_cast<char*> (&minutes), sizeof(minutes));
    archive.read( reinterpret_cast<char*> (&hours), sizeof(hours));
    return archive;
}
DateTime::DateTime(ifstream& archive) //reconstituting constructor
{
    Read(archive);
}
Overriding the member functions Read() and Write() and serializing data members to and from a stream are error prone and can cause maintenance difficulties. Whenever data members are added or removed, or when their types are changed, the implementer has to modify these member functions accordingly -- but this is still manageable. However, deriving from classes that do not define a reconstituting constructor and the member functions Read() and Write() is more difficult to handle because a derived class can only serialize its own members -- not members of its base classes. The same difficulties exist with embedded objects. How are such subobjects serialized? It might be possible to overcome these difficulties in some cases, albeit with considerable effort. For example, a class that contains a vector can iterate through the vector's members and serialize them one by one. This is only half the story, though. A vector's state depends on other parameters, such as its capacity. Where can this information be stored if the vector object itself cannot be serialized? Serializing arrays is another conundrum. One solution is to write a header in the beginning of every serialized object that contains the number of elements. However, this won't work with reference counted objects. Most implementations of std::string are reference counted, which means that in the following code snippet, the five string objects share some of their data members:
#include <string>
using namespace std;
void single_string()
{
    string sarr[4];
    string s = sarr[0];
    for (int i = 1; i< 4; i++)
    {
        sarr[i] = s;
    }
}
Reference counting is an implementation detail that is hidden from the users of the class; it is impossible to query the string object about how many strings it represents and to serialize this datum.
Handmade object persistence is turning out to be much more complicated than it seemed at first, isn't it? And that's not all. How might such a handmade persistence model represent templates? By simply storing specializations as ordinary objects, the model fails to represent the relationship that exists among the specializations. Worse yet, multiple inheritance and virtual inheritance are even more challenging. How can a handmade persistence model ensure that a virtual subobject is serialized only once, regardless of the number of its occurrences in the inheritance graph?
Most programmers probably give in at this point, and rightfully so. It is possible to come up with a solution even to the virtual base class problem, but as soon as this problem is solved, other special cases such as function objects, static data members, reference variables, and unions present more complexities. There is another drawback in the handmade persistence model: It is not standardized, and as such, programmers have to implement it on their own. The result is a lack of uniformity and varying levels of reliability and performance. Without standardized support for object persistence, a homemade persistence model is, at best, brittle and error prone. Obviously, without standardized object persistence it is impossible to ensure simple, portable, and efficient serialization and deserialization of objects.
What might such a standardized persistence model look like? There are two basic strategies. One is library-based, whereas the other relies on core language extensions (keywords and syntax). A library-based solution is advantageous in many respects. For example, it does not extend the core language, thus avoiding additional burden for programmers who do not intend to use persistent objects. In addition, a library can be replaced by a better implementation from another vendor without having to switch to a different compiler. This practice can be seen today with people who uninstall the original STL implementation -- provided by the compiler vendor -- and replace it with another one. Still, a library-based solution has to deal with the lack of language support for persistence, and it must face the same difficulties and complications that were demonstrated previously (the intricacies and vagaries of the most widely used object distribution frameworks, namely the Distributed Component Object Model (DCOM) and the Common Object Request Broker Architecture (CORBA), prove this point). STL might have never become what it is today without built-in support for templates and operator overloading. Furthermore, the language support for templates was extended in various ways to provide the necessary constructs for STL (see Chapter 2, "Standard Briefing: The Latest Addenda to ANSI/ISO C++," and Chapter 9). Similarly, the support for persistence requires core language extensions.
The special member functions are automatically synthesized by the implementation if the programmer does not declare them explicitly and if the implementation needs them (see Chapter 4). Similarly, a language extension can be made so that another type of constructor, a reconstituting constructor, is either implicitly synthesized by the implementation when needed, or so that it can be declared by the programmer. As is the case with other constructor types, the programmers need to be allowed to override the default reconstituting constructor by defining it explicitly. The syntactic form of such a constructor must be distinct from all other constructor forms. In particular, a reconstituting constructor is not to be identified solely by its signature. In other words, the following
class A
{
    //...
public:
    A(istream& repository ); //reconstituting ctor or an ordinary constructor
};
is not recommended. It might well be the case that the programmer's intention was to define an ordinary constructor that takes an istream object by reference and not a reconstituting constructor. Furthermore, such a convention might break existing code. A better approach is to add a syntactic clue that signifies a reconstituting constructor exclusively. For example, by prefixing the constructor's name with the symbol ><
class A
{
    //...
public:
    ><A(istream& repository ); //reconstituting constructor
};
the reconstituting constructor can take a single parameter of some stream type. This parameter is optional. When the reconstituting constructor is invoked without an argument, the implementation deserializes the object from a default input stream that can be specified in the compiler's setting (similar to the default location of the standard header files). To automate the serialization process, a serializing destructor is also necessary. How might such a destructor be declared? One solution is to add another type of destructor so that classes can have two destructor types. This is, however, troublesome because the object model of C++ is based on a single destructor per class. Adding another type of destructor is ruled out then. Perhaps there is no need to define a distinct destructor type. Instead, the existing destructor can do the serialization automatically: The compiler can insert into the destructor additional code that performs the necessary serialization operations. (As you know, compilers already insert code into user-defined destructors to invoke the destructors of base classes and embedded objects.)
Automating the serialization process has drawbacks, too. Not every class has to be serialized. The overhead of serializing an object should be imposed only when the user really needs it. Furthermore, the possibility of encountering runtime exceptions during serialization is rather high. A full hard disk, a broken network connection, and a corrupted repository are only a handful of the possible runtime exceptions that can occur during the process of writing the contents of an object to a permanent storage medium. However, throwing an exception from a destructor is highly undesirable (see Chapter 6), so perhaps automatic serialization during object destruction is too risky. Apparently, there is no escape from explicitly calling a member function to do the job. There are other obstacles here: How to handle the creation and serialization of an array of objects? How to synchronize changes in the definition of a class and the contents of an object that was serialized before the change took place? Every language that supports object persistence deals with these difficulties in its own way. C++ can borrow some of these ideas, too, or it can initiate innovative ideas.
This discussion gives you some feel for why language extensions are necessary, and what kind of obstacles they overcome. However hypothetical this discussion might seem, the evolution of C++ has been a democratic process. Many of the changes and extensions were initiated by users of the language rather than Standardization committee members. STL is probably the best example of this. If you have a comprehensive proposal for such an extension, you can present it to the Standardization committee.
Concurrency is a generic term for multithreading and multiprocessing. Concurrent programming can effectively improve performance and responsiveness of an application, be it a word processor or a satellite homing system. C++ does not directly address the issues of multiprocessing, threads, and thread safety. It is important to note, however, that nothing in the Standard Library or the language itself disallows concurrency. Look at the example of exception handling: In a multithreaded environment, exception handling should be thread-safe, but a single-threaded environment can implement exception handling in a non-thread-safe manner; this is an implementation-dependent issue. Implementations are allowed to provide the necessary facilities for concurrency, and indeed many of them do so. Again, without direct support from the programming language, either by standardized libraries or by core extensions, the implementation of thread safety is more complicated and highly nonportable. There have been several proposals in the past for adding concurrency to C and C++. At present, however, none of these languages supports concurrency directly.
Multithreading, as opposed to multiprocessing, refers to the use of several control threads in a single process. Multithreading is therefore simpler to design and implement, and it enables the application to use system resources more effectively.
Because all threads in a process share the process's data, it is essential to synchronize their operation properly so that one thread does not interfere with another. For that purpose, synchronization objects are used. Various types of synchronization objects, such as mutex, critical section, lock, and semaphore offer different levels of resource allocation and protection. Unfortunately, the details and the characterizations of synchronization objects vary from platform to platform. A standard library of synchronization objects has to be flexible enough to enable users to combine platform-specific synchronization objects with standard objects. This is similar to the use of std::string and nonstandard string objects in the same program. Alternatively, the standard thread library could provide the basic interfaces of the synchronization objects, and the implementation would be platform-dependent. There is a problem with introducing multithreading support into the Standard, however: single-threaded operating systems such as DOS. Although these platforms are not very popular these days, they are still in use, and implementing a thread library on these platforms is nearly impossible.
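Setting the portability problem aside for a moment, such a standard library might declare interfaces along the following lines, with each implementation supplying the platform-specific parts (the names Mutex and Lock are hypothetical, not taken from any existing proposal):

class Mutex
{
public:
    Mutex();
    ~Mutex();
    void acquire(); //block until the calling thread owns the mutex
    void release(); //relinquish ownership
private:
    //platform-specific representation supplied by the implementation
};
class Lock //"resource acquisition is initialization" wrapper
{
public:
    explicit Lock(Mutex& m) : mutex(m) { mutex.acquire(); }
    ~Lock() { mutex.release(); }
private:
    Mutex& mutex;
};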
Perhaps the Standard can provide only the necessary features for thread safety and leave the other issues -- such as synchronization objects, event objects, instantiation, destruction of threads, and so on -- implementation-defined, as they are today. Thread safety ensures that an object can be used safely in a multithreaded environment. For example, the following thread-unsafe class
class Date
{
private:
    int day;
    int month;
    int year;
public:
    Date(); //current date
    ~Date();
    //accessors
    int getDay() const { return day; }
    int getMonth() const { return month; }
    int getYear() const { return year; }
    //mutators
    void setDay(int d) { day = d; }
    void setMonth(int m) { month = m; }
    void setYear(int y) { year = y; }
};
can become thread-safe by applying the following changes to it: At the beginning of each member function, a lock has to be acquired; in every return point of each member function, the lock has to be released.
The modified member functions now look like this:
void Date::setDay(int d)
{
    get_lock();
    day = d;
    release_lock();
}
void Date::setMonth(int m)
{
    get_lock();
    month = m;
    release_lock();
}
//etc.
This is tedious, and yet very simple to automate. The recurrent pattern is very reminiscent of the "resource acquisition is initialization" idiom (discussed in Chapter 5, "Object-Oriented Programming and Design"). You can define a class whose constructor acquires a lock, and whose destructor releases it. For example
class LockDate
{
private:
    const Date& date;
public:
    LockDate(const Date& d) : date(d) { lock(&date); }
    ~LockDate() { release(&date); }
};
A real-world lock class would probably be templatized. It would also provide timeouts and handle exceptions; however, the definition of LockDate suffices for this discussion. The member functions of Date can now be defined as follows:
int Date::getDay() const
{
    LockDate ld(*this);
    return day;
}
//...and so on
void Date::setDay(int d)
{
    LockDate ld(*this);
    day = d;
}
//etc.
This looks better than the original thread-safe version, but it's still tedious. Standard C++, however, can take you only this far; fully automated thread safety requires core language extensions.
It might not seem obvious from the example why language support for thread safety is necessary. After all, instantiating a local object in every member function is not unacceptably complicated or inefficient. The troubles begin with inheritance. Invoking a non-thread-safe inherited member function might have undefined results in this case. To ensure thread safety in inherited member functions as well, the implementer of Date has to override every inherited member function. In each override, a lock has to be acquired. Then, the parent member function is invoked, and finally, the lock is released. With a little help from the programming language, these operations can be made much easier.
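For concreteness, here is a minimal sketch of the manual pattern just described. SafeDate is a hypothetical class derived from the original, non-thread-safe Date, and LockDate is the helper class defined earlier:

class SafeDate : public Date
{
public:
    void setDay(int d)
    {
        LockDate ld(*this); //acquire the lock
        Date::setDay(d); //delegate to the inherited, non-thread-safe version
    } //LockDate's destructor releases the lock
    //...every other inherited member function has to be wrapped the same way
};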
Before Method and After Method
The CLOS programming language defines the concepts before method and after method. A before method is a sequence of operations that precedes the action of a method. An after method is a sequence of operations that succeeds the action of a method. Thus, each method (member function) in CLOS can be thought of as an object with a corresponding constructor and destructor. CLOS provides default before method and after method for each user-defined method. By default, the before method and after method do nothing. However, the user can override them to perform initialization and cleanup operations. Adopting this concept in C++ with slight modifications might simplify the implementation of thread-safe classes. One direction is to provide identical before method and after method for every member function of a class. That is, the before method and after method are defined only once, but they are automatically invoked by every member function of the class (except for the constructor and destructor). One of the benefits of this approach is that new member functions that are added to the class automatically become thread-safe, as do inherited member functions.
Several programming languages enable the user to compose inherited member functions in a derived class almost automatically. In C++, a member function of a derived class overrides rather than extends the corresponding member of the base class. It is possible to extend the inherited function by calling it explicitly before performing any other operations in the overriding member function (see Chapter 5). The following example (repeated here for convenience) shows how it is done:
class rectangle: public shape
{
    //...
    virtual void resize (int x, int y) //extends base's resize()
    {
        shape::resize(x, y); //explicit call to the base's virtual function
        //add functionality
        int size = x*y;
    }
};
There are two problems with this approach. First, if the base class name changes, the implementer of the derived class has to find every occurrence of the old qualified name and change it accordingly.
Another problem is that some member functions are meant to be extended rather than overridden. The best examples are constructors and destructors (which, luckily, the compiler takes care of), but there are other such examples. The serialization and deserialization operations that were discussed previously also need to be extended rather than overridden in a derived class.
It is very tempting to solve the first problem by adding the keyword super to the language. Smalltalk and other object-oriented languages already have it. Why not let C++ programmers enjoy it as well? super refers to the direct base class. It can be used in the following manner:
class rectangle: public shape
{
    //...
    void resize (int x, int y) //extends base's resize()
    {
        super.resize(x, y); //the name of the base class is not necessary anymore
        //add functionality
        int size = x*y;
    }
};
class specialRect: public rectangle
{
    void resize (int x, int y) //extends base's resize()
    {
        super.resize(x, y); //calls rectangle::resize()
        //add more functionality
    }
};
However, super is ambiguous in objects that have multiple base classes. An alternative solution is to add a different keyword to the language, extensible, that instructs the compiler to insert a call of the base member function in an overriding member function automatically. For example
class shape
{
public:
    extensible void resize(int x, int y);
};
class rectangle: public shape
{
public:
    void resize (int x, int y) //extends base's resize()
    {
        //shape::resize() is implicitly invoked at this point
        //add functionality
        int size = x*y;
    }
};
class specialRect: public rectangle
{
    void resize (int x, int y) //extends base's resize()
    {
        //implicitly calls rectangle::resize()
        //...add more functionality
    }
};
extensible is a specialized form of virtual, so the latter is unnecessary. Surely, extensible solves the first problem: If the base class name changes, the implementer of the derived class does not have to change the definition of the member functions. The second problem is also solved here: After a member function is declared extensible, the compiler automatically ensures that the corresponding member function of a derived class first invokes the member function of the base class.
A typical C++ application consists of a statically linked executable that contains all the code and data of the program. Although static linking is efficient in terms of speed, it's inflexible: Every change in the code requires a complete rebuild of the executable. When a dynamically linked library is used, the executable does not need to be rebuilt; the next time the application is run, it automatically picks up the new library version. The advantage of dynamically linked libraries is therefore the transparent upgrade to new releases of the library. However, this transparent "drop in" model breaks under the object model of C++ if the data layout of an object changes in the new release of the library; this is because the size of an object and the offset of its data members are fixed at compile time. There have been suggestions to extend the object model of C++ so that it can support dynamic shared libraries better. However, the costs are slower execution and increased size.
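The following sketch shows why the layout matters (Window is a made-up class, and the printed values are implementation-dependent). The size of an object and the offsets of its data members are burned into the client's code when the client is compiled, so a library release that changes the data layout invalidates them:

#include <cstddef>
#include <iostream>
using namespace std;
//the class definition the client was compiled against (version 1 of the library)
struct Window
{
    int x;
    int y;
};
int main()
{
    //both values are fixed when the client is compiled; if a later release of the
    //library adds, removes, or reorders data members, the hard-coded size and
    //offsets no longer match the new layout
    cout<<"sizeof(Window): "<<sizeof(Window)<<endl;
    cout<<"offset of y: "<<offsetof(Window, y)<<endl;
    return 0;
}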
Many commercial databases support triggers. A trigger is a user-defined rule that instructs the system to perform specific actions automatically whenever a certain data value changes. For example, imagine a database that contains two tables, Person and Bank Account. Every row in Bank Account is associated with a record in Person. Deleting a Person record automatically triggers the deletion of all its associated Bank Account records. Rules are the equivalent of triggers in software systems. William Tepfenhart and other researchers at AT&T Bell Laboratories have extended C++ to support rules (UML and C++: A Practical Guide to Object-Oriented Development, p. 137). The extended language is called R++ (the R stands for "rules"). In addition to member functions and data members, R++ defines a third kind of class member: a rule. A rule consists of a condition and an associated action that is automatically executed when the condition evaluates to true. In C++, the programmer has to test the condition manually in order to decide whether the associated action is to be executed, usually by a switch statement or an if statement. In R++, this testing is automated -- the system monitors the data members listed in the rule's condition, and whenever the condition is satisfied, the rule "fires" (that is, the associated action is executed). Rule-based programming is widely used in artificial intelligence, debugging systems, and event-driven systems. Adding this feature to C++ could considerably simplify the design and implementation of such systems.
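In plain C++, such monitoring has to be coded by hand: every member function that modifies the data listed in the rule's condition must repeat the test and invoke the action explicitly. The following sketch uses a simpler, made-up rule (notify the owner whenever the balance turns negative) for illustration:

class BankAccount
{
public:
    BankAccount() : balance(0) {}
    void withdraw(int sum)
    {
        balance -= sum;
        if (balance < 0) //in C++, the condition is tested by hand...
            notifyOwner(); //...and the associated action is invoked explicitly
    }
    void deposit(int sum)
    {
        balance += sum;
        if (balance < 0) //every mutator must repeat the same test
            notifyOwner();
    }
private:
    void notifyOwner() { /*the action associated with the rule*/ }
    int balance;
};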
Language extensions are needed to facilitate the implementation of operations that otherwise might be more difficult or even impossible. However, there is always a tradeoff involved. To use an analogy, adding an air conditioner to a car decreases its fuel efficiency and degrades its performance (Principles of Programming Languages: Design, Evaluation and Implementation, p. 327). Whether it is a beneficial tradeoff depends on various factors, such as the climate in the region where the car is used, the cost of fuel, the engine's power, and the personal preferences of its users. Note that the air conditioner can always be turned off to gain more power and increase the fuel efficiency. Ideally, new language features will not impose a performance penalty of any kind when they are not used. When the programmer deliberately uses them, they should impose as little overhead as possible or no overhead at all. There is, however, a notable difference between an air conditioner and language extensions: Extensions interact with one another. For example, the imaginary keyword super has an undesirable interaction with another language feature, namely multiple inheritance. A more realistic example is nested template arguments, where the space between the two closing angle brackets is mandatory:
Vector <Vector<char*> > msg_que(10);
Otherwise, the >> sequence is parsed as the right shift operator. In other situations, the interaction is much more complex: Koenig lookup, for instance, can have surprising results under some circumstances (as you read in Chapter 8, "Namespaces").
This chapter has presented three major proposals for language extensions: garbage collection, persistence, and concurrency. Suggestions for less radical extensions are extensible members and rules. None of these is to be taken lightly. The complexity involved in standardizing each of these is intensified even further when they interact with each other. For example, a persistence model becomes even more complicated in a thread-safe environment.
Considering the challenges that the designers of C++ have faced during the past two decades, you can remain optimistic. If you are familiar with the prestandardized implementations of container classes, RTTI, and exception handling in several well known frameworks, you are probably aware of how much better their standardized counterparts are in every way. The same will hold true if any of the features that are discussed here become part of the C++ Standard.
© Copyright 1999, Macmillan Computer Publishing. All rights reserved.